Challenges and Full-Link Solutions for Grayscale Release in Microservice Architecture

Full-link grayscale release addresses the challenges that microservices pose for grayscale release by introducing swim lanes and traffic routing. The Zadig platform integrates with Alibaba Cloud MSE and Istio to make full-link grayscale release efficient.

# Challenges of Grayscale Release in Microservice Architecture

In a traditional monolithic architecture, grayscale release is relatively simple: traffic only needs to be split at the service's entry point, which a K8s Service or any of various gateways can do. Microservice architecture, however, introduces new complexity, because services depend on one another in intricate ways. A single feature may span several services, so grayscale traffic must reach the grayscale version of each of those services at every hop of the call chain. The traditional practice of splitting traffic at a single service's entry point cannot meet this requirement.

To solve this problem, full-link grayscale release introduces the concept of a lane (swim lane). A lane extends the grayscale scope from a single service to the entire call chain of a request, ensuring that traffic matching specified rules flows only among a designated set of services, as if it were traveling in a predefined lane. Full-link grayscale release is designed specifically for microservice architectures to meet these challenges.

# Ideas for Implementing Full-Link Grayscale Release

The core of full-link grayscale release is implementing lanes. As described above, a lane defines the set of services that traffic matching specified rules is allowed to reach. There are two main implementation ideas:

# The First Idea: Complete Environment Isolation

The main challenge in implementing lanes is routing traffic to the correct service version on every inter-service call. A simple way to sidestep this problem is to replicate a complete environment containing all microservices and replace the services under grayscale with their grayscale versions. Traffic then only needs to be split by the gateway at the entry points of the two environments. Because the two environments are network-isolated, the grayscale environment naturally becomes the grayscale lane.
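The entry-point split this idea relies on can be sketched as follows. It is a minimal illustration rather than a real gateway configuration: the upstream URLs, the `x-gray` header, and the hash-based percentage rule are all hypothetical choices made for the example. Once the gateway has picked an environment, every downstream call stays inside that environment, so no per-service routing is needed.

```python
import hashlib
from typing import Dict

# Hypothetical entry points of the two fully isolated environments.
BASELINE_UPSTREAM = "http://baseline-gateway.example.internal"
GRAY_UPSTREAM = "http://gray-gateway.example.internal"


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_environment(headers: Dict[str, str], user_id: str, gray_percent: int) -> str:
    """Pick the environment for a request at the entry gateway.

    Header names are assumed to be lowercase-normalized already.
    An explicit marker header wins; otherwise a stable per-user hash sends
    roughly gray_percent% of users to the grayscale environment.
    """
    if headers.get("x-gray") == "true":
        return GRAY_UPSTREAM
    if bucket(user_id) < gray_percent:
        return GRAY_UPSTREAM
    return BASELINE_UPSTREAM
```

Hashing the user ID rather than picking randomly per request keeps each user in the same environment across requests, which makes grayscale verification reproducible.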

However, for microservice projects with many services, this approach wastes resources: the grayscale environment must also run copies of every service that is not under grayscale. If several versions need to be released in grayscale at the same time, each one needs its own complete environment, multiplying the waste.

# The Second Idea: Service Traffic Routing

If every service can route traffic itself, lanes can share the baseline (non-grayscale) services and make full use of resources, so full-link grayscale releases of multiple versions can run in the same environment. This requires two capabilities: full-link traffic routing and full-link data transmission (propagating the traffic marker along the call chain).
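The resource difference between the two ideas can be made concrete with a rough count. The symbols below are introduced only for this illustration, and the count assumes one replica per service: N is the number of services in the system, m the number of versions in grayscale at the same time, and k_i the number of services changed by version i.

```latex
% N: services in the system; m: versions in grayscale concurrently;
% k_i: services changed by version i; one replica per service assumed.
\text{extra}_{\text{isolation}} = m N,
\qquad
\text{extra}_{\text{routing}} = \sum_{i=1}^{m} k_i \;\le\; m N
```

With illustrative values N = 50, m = 3, and k_i = 4 for every version, complete environment isolation adds 3 × 50 = 150 service instances, while lanes that share baseline services add only 3 × 4 = 12.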

Traffic routing is the ability of a service to send traffic to the correct destination according to specified rules. For example, traffic carrying a grayscale marker should go preferentially to the grayscale version of the downstream service, and fall back to the baseline version when that service has no grayscale version. Full-link traffic routing requires every service in the call chain to have this capability.
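The routing rule described above (prefer the request's lane, otherwise fall back to baseline) can be sketched as a client-side instance selector. This is an illustration only: the `Instance` type, the `lane` metadata key, and the `base` default are hypothetical names, loosely modeled on instance metadata of the kind a registry such as Nacos can store.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LANE_KEY = "lane"   # hypothetical metadata key holding an instance's lane
BASE_LANE = "base"  # instances without the key are treated as baseline


@dataclass
class Instance:
    address: str
    metadata: Dict[str, str] = field(default_factory=dict)


def route(instances: List[Instance], lane: Optional[str],
          rng: Optional[random.Random] = None) -> Instance:
    """Select a downstream instance for a request carrying `lane` (or None)."""
    if rng is None:
        rng = random.Random()

    def in_lane(name: str) -> List[Instance]:
        return [i for i in instances if i.metadata.get(LANE_KEY, BASE_LANE) == name]

    # Prefer instances in the request's own lane, if this service has any.
    candidates = in_lane(lane) if lane else []
    # Otherwise fall back to the shared baseline instances.
    if not candidates:
        candidates = in_lane(BASE_LANE)
    if not candidates:
        raise LookupError("no baseline instance available for fallback")
    return rng.choice(candidates)
```

The fallback branch is what lets lanes share unchanged services: a `feature-x` request reaching a service that has no `feature-x` instance simply uses the baseline one.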

There are currently two mainstream implementations of full-link traffic routing:

  1. Based on Istio: Istio, an open-source service mesh, injects an Envoy proxy as a sidecar container into each service's Pod. The proxy transparently intercepts network traffic between services and forwards it according to specified rules, achieving full-link traffic routing without modifying existing code.
  2. Based on service discovery components: with a service registry that supports instance metadata, such as Nacos, each service instance can be tagged with attributes such as its grayscale version. Each service then reads the version information of other instances from the registry and implements traffic routing either by changing its code or through a Java Agent.

Because routing rules match on traffic markers, the marker must be carried along the entire request chain; that is, full-link data transmission is required. Every service must copy the marker from its incoming request onto each outgoing call, or the next hop cannot route correctly. In simple cases, native HTTP headers, query parameters, and similar mechanisms are enough, but complex microservice scenarios should use the tracing baggage mechanism. Baggage is a feature of distributed tracing tools that carries user-defined key-value pairs alongside the trace context; mainstream tools such as SkyWalking and OpenTelemetry support it. A distributed tracing framework also makes logging and troubleshooting easier, which is especially valuable during grayscale release.
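The propagation step can be illustrated with a minimal hand-rolled handler for the W3C `baggage` header, the format OpenTelemetry's default propagators use. This sketch only shows the mechanics: in a real service the tracing SDK's propagator does this work, the `lane` key is a hypothetical name, and `;`-delimited member properties are simply discarded here.

```python
from typing import Dict, Optional
from urllib.parse import quote, unquote

LANE_KEY = "lane"  # hypothetical baggage key carrying the traffic marker


def parse_baggage(header: str) -> Dict[str, str]:
    """Parse a W3C baggage header value into key/value pairs (properties dropped)."""
    entries: Dict[str, str] = {}
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        key_value = member.split(";", 1)[0]  # ignore ';' properties
        if "=" not in key_value:
            continue  # skip malformed members
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


def format_baggage(entries: Dict[str, str]) -> str:
    """Serialize key/value pairs back into a baggage header value."""
    return ",".join(f"{k}={quote(v, safe='')}" for k, v in entries.items())


def current_lane(incoming: Dict[str, str]) -> Optional[str]:
    """Read the lane marker from the incoming request's baggage, if any."""
    return parse_baggage(incoming.get("baggage", "")).get(LANE_KEY)


def outgoing_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Build headers for a downstream call that carry the incoming baggage forward."""
    entries = parse_baggage(incoming.get("baggage", ""))
    return {"baggage": format_baggage(entries)} if entries else {}
```

Calling `outgoing_headers` on every outbound request is what keeps the marker alive across hops; forgetting it in a single service breaks the lane for everything downstream.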

# Analysis of Pain Points in Current Enterprise Release Practices

Currently, enterprises face the following difficulties when selecting and implementing release strategies:

  1. After moving from traditional deployment models to cloud native, enterprises often lack engineers with the skills needed to transform their technical architecture, which makes it hard to know where to start with release strategies.
  2. Even when a release strategy suited to the product's current state has been chosen, the lack of an automation platform or tooling means it must be carried out manually, which can lead to skipped steps or human error and, in turn, production incidents.
  3. Only service-level grayscale release is available. Releasing each service individually is time-consuming, slows down the overall release, and makes verification less effective.

Zadig provides grayscale release solutions that help enterprises overcome these pain points.

# Solutions for Full-Link Grayscale Release Based on Zadig

Zadig offers two general-purpose full-link grayscale release solutions: "Alibaba Cloud MSE + Zadig" and "Istio + Distributed Tracing + Zadig".

# Alibaba Cloud MSE + Zadig

Alibaba Cloud MSE (Microservices Engine) makes full-link grayscale release easy for Java applications. MSE is a non-invasive, production-grade service governance product implemented as a Java Agent. It requires no changes to business code, offers governance capabilities beyond full-link grayscale, and supports all versions of Spring Boot, Spring Cloud, and Dubbo released in the past five years.

When MSE is used for grayscale release, Zadig can create grayscale environments and grayscale K8s resources. Through its release workflow orchestration, it automatically applies the labels MSE requires to those K8s resources and integrates with the MSE API, reducing repetitive manual work.
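The kind of transformation such a workflow performs can be sketched as deriving a grayscale Deployment manifest from a baseline one. This is a hypothetical illustration, not Zadig's implementation. The label key `alicloud.service.tag` is an assumption based on how MSE documentation describes tagging grayscale pods; verify it against current MSE documentation before relying on it. The `-gray` name suffix and the image override are also illustrative choices.

```python
import copy
from typing import Optional

# Assumed MSE label key for tagging grayscale instances; verify in MSE docs.
MSE_TAG_LABEL = "alicloud.service.tag"


def make_gray_deployment(base: dict, tag: str = "gray", image: Optional[str] = None) -> dict:
    """Derive a tagged grayscale Deployment manifest from a baseline manifest."""
    gray = copy.deepcopy(base)
    # Drop server-populated fields in case `base` was read from a live cluster.
    gray.pop("status", None)
    for server_field in ("uid", "resourceVersion", "creationTimestamp"):
        gray["metadata"].pop(server_field, None)

    gray["metadata"]["name"] = f"{base['metadata']['name']}-{tag}"
    spec = gray["spec"]
    # Add the tag to the selector too, so this Deployment selects only its tagged pods.
    spec["selector"].setdefault("matchLabels", {})[MSE_TAG_LABEL] = tag
    spec["template"]["metadata"].setdefault("labels", {})[MSE_TAG_LABEL] = tag
    if image:
        # Assumes the service container is the first container in the pod.
        spec["template"]["spec"]["containers"][0]["image"] = image
    return gray
```

Working on a deep copy leaves the baseline manifest untouched, so the baseline and grayscale Deployments can be applied side by side.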

# Istio + Distributed Tracing + Zadig

Istio provides full-link traffic routing non-invasively and can route traffic by weight (proportion), HTTP headers, and other conditions. Full-link data transmission, however, must be implemented by the services themselves, so they need to integrate a distributed tracing framework that supports baggage. Services that have not yet done so will incur some retrofitting cost.

Based on the specified grayscale tasks and traffic-marking rules, Zadig automatically creates the Istio VirtualService and DestinationRule resources that implement the corresponding lanes. Combined with its release workflow orchestration and its environment monitoring and management capabilities, this makes full-link grayscale release easy for developers.
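The shape of these Istio resources for a single header-marked lane can be sketched as follows. This is not the output Zadig generates; the `x-lane` header name, the `version` pod label, and the `base` subset are hypothetical conventions chosen for the example, and only one lane per service is handled.

```python
from typing import Dict, List


def lane_routing_manifests(service: str, namespace: str, lane: str,
                           header: str = "x-lane") -> List[Dict]:
    """Build a DestinationRule and VirtualService routing one lane by header."""
    host = f"{service}.{namespace}.svc.cluster.local"
    destination_rule = {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "host": host,
            # Subsets select pods by an assumed `version` label.
            "subsets": [
                {"name": "base", "labels": {"version": "base"}},
                {"name": lane, "labels": {"version": lane}},
            ],
        },
    }
    virtual_service = {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "hosts": [host],
            "http": [
                # Requests carrying the lane marker go to the lane's subset.
                {
                    "match": [{"headers": {header: {"exact": lane}}}],
                    "route": [{"destination": {"host": host, "subset": lane}}],
                },
                # Everything else falls through to the baseline subset.
                {"route": [{"destination": {"host": host, "subset": "base"}}]},
            ],
        },
    }
    return [destination_rule, virtual_service]
```

Each manifest can be serialized with `json.dumps` and passed to `kubectl apply -f -`. Istio matches only on headers actually present on each outbound request, so the services must still forward the marker on every hop, for example by copying it out of baggage into the matched header.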



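As software engineers, we have always written software to make every industry more efficient, yet software engineering itself remains highly inefficient. Why is there no tool on the market that lets R&D teams work without exhausting themselves while meeting large customers' delivery needs better and faster? Could we build a delivery platform for developers? We built Zadig as open source precisely to fulfill this wish.

Landy, founder of Zadig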