Guest post originally published on DoiT International’s blog by Stepan Stipl, Senior Cloud Architect at DoiT International

Understand the components of GCP Load Balancing and learn how to set up a globally available GKE multi-cluster load balancer, step by step.


One of the features I like most about GCP is external HTTP(S) Load Balancing. It is a global load balancer that gives you a single anycast IP address (no DNS load balancing needed, yay!). Requests enter Google’s global network at one of the edge points of presence (POPs) close to the user,¹ and are proxied to the closest region with available capacity. The result is a highly available, globally distributed, scalable, and fully managed load balancing setup. It can be further augmented with Cloud Armor for DDoS and WAF protection, Cloud CDN, or Identity-Aware Proxy (IAP) to secure access to your web applications.

GKE Multi-Cluster Foo Bar

With this, multi-cluster load balancing with GKE immediately comes to mind, and it is a frequent topic of interest among our customers. While there’s no native support for it in GKE/Kubernetes at the moment,² GCP provides all the building blocks needed to set it up yourself.

In the first part, we’ll get familiar with the GCP Load Balancing components: we will follow the journey of a request as it enters the system and explain what each of the load balancing building blocks represents. In the second part, we will set up load balancing across two GKE clusters step by step.

GCP Load Balancing Overview

Fig. 1: GCP Load Balancing Overview

Let’s start with a high-level overview of the Load Balancing flow. The client’s HTTP(S) connection is terminated at an edge location by Google Front Ends (GFEs),³ based on the Forwarding Rule and HTTP(S) Target Proxy configuration. The Target Proxy consults the associated URL Map and Backend Service definitions to determine how to route traffic. From the GFEs, a new connection is established, and traffic flows over Google’s network to the closest healthy Backend with available capacity. Within the region, traffic is then distributed across individual Backend Endpoints according to their capacity.

GCP Load Balancing Components

Fig. 2: GCP Load Balancing Components


We will set up multi-cluster load balancing for two services, Foo and Bar, deployed across two clusters (fig. 3). We’ll use simple path-based rules and route any request for /foo/* to service Foo and any request for /bar/* to service Bar.

Fig. 3: GKE Multi-Cluster Foo Bar


git clone gke-multi-cluster-native 

Deploy Applications and Services to GKE clusters

Fig. 4: K8s Demo App

Let’s start by deploying simple demo applications to each of the clusters. The application displays details about the serving cluster and region; its source code is available at stepanstipl/k8s-demo-app.

Repeat the following steps for each of your clusters.

Get Credentials for kubectl

gcloud container clusters get-credentials [cluster] \
--region [cluster-region]

Deploy Both Foo & Bar Applications

kubectl apply -f deploy-foo.yaml
kubectl apply -f deploy-bar.yaml

You can verify that the Pods for both services are up and running with kubectl get pods.

Create K8s Services for Both Applications

kubectl apply -f svc-foo.yaml
kubectl apply -f svc-bar.yaml

Note the cloud.google.com/neg: '{"exposed_ports": {"80":{}}}' annotation on the Services, which tells GKE to create a NEG for each Service.
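For reference, a Service carrying this annotation might look like the sketch below. It is not the repository’s svc-foo.yaml: the hypothetical render_svc helper, the app=<name> selector, and the default target port 8080 are assumptions, so adjust them to match your Deployments.

```shell
# Hypothetical helper: print a Service manifest with the standalone NEG annotation.
# Assumes the Pods are labeled app=<name>; the target port defaults to 8080 (assumption).
render_svc() {
  local name="$1" target_port="${2:-8080}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  annotations:
    # Ask GKE to create a standalone NEG for Service port 80.
    cloud.google.com/neg: '{"exposed_ports": {"80":{}}}'
spec:
  type: ClusterIP
  selector:
    app: ${name}
  ports:
  - port: 80
    targetPort: ${target_port}
    protocol: TCP
EOF
}
```

For example, render_svc foo | kubectl apply -f - would create a comparable Foo Service in the current kubectl context.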

You can verify that the Services are set up correctly by forwarding a local port with kubectl port-forward service/foo 8888:80 and opening http://localhost:8888/.

Now don’t forget to repeat the above for all your clusters.
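With more than a couple of clusters, the per-cluster steps can be scripted. A minimal sketch, assuming the manifests are in the current directory; deploy_all is a hypothetical helper, and the cluster names and regions in CLUSTERS are placeholders for your own.

```shell
# Hypothetical cluster list as "name:region" pairs; replace with your own clusters.
CLUSTERS=("cluster-1:europe-west1" "cluster-2:us-central1")
MANIFESTS=(deploy-foo.yaml deploy-bar.yaml svc-foo.yaml svc-bar.yaml)

deploy_all() {
  local entry name region manifest
  for entry in "${CLUSTERS[@]}"; do
    name="${entry%%:*}"    # part before the colon
    region="${entry#*:}"   # part after the colon
    # get-credentials also switches the current kubectl context to this cluster.
    gcloud container clusters get-credentials "$name" --region "$region" || return 1
    for manifest in "${MANIFESTS[@]}"; do
      kubectl apply -f "$manifest" || return 1
    done
  done
}
```

Because get-credentials switches the kubectl context, each kubectl apply targets the cluster fetched just before it.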

Set Up Load Balancing (GCLB) Components

Create a Health Check

gcloud compute health-checks create http health-check-foobar \
--use-serving-port

Create Backend Services

Create backend service for each of the services, plus one more to serve as default backend for traffic that doesn’t match the path-based rules.

gcloud compute backend-services create backend-service-default \
--global

gcloud compute backend-services create backend-service-foo \
--global \
--health-checks health-check-foobar

gcloud compute backend-services create backend-service-bar \
--global \
--health-checks health-check-foobar

Create URL Map

gcloud compute url-maps create foobar-url-map \
--global \
--default-service backend-service-default

Add Path Rules to URL Map

gcloud compute url-maps add-path-matcher foobar-url-map \
--global \
--path-matcher-name=foo-bar-matcher \
--default-service=backend-service-default \
--path-rules="/foo/*=backend-service-foo,/bar/*=backend-service-bar"

Reserve Static IP Address

gcloud compute addresses create foobar-ipv4 \
--ip-version=IPV4 \
--global

Set Up DNS

Point your DNS at the previously reserved static IP address. First, note the IP address you have reserved:

gcloud compute addresses list --global

Create an A record foobar.[your_domain_name] pointing to this IP. You can use Cloud DNS or any other service of your choice to manage the record. Complete this step before moving on.¹⁰
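Since the next steps depend on DNS, it can help to wait until the record actually resolves to the reserved address. A minimal sketch, assuming dig is installed; wait_for_dns is a hypothetical helper, and its retry defaults are arbitrary.

```shell
# Hypothetical helper: poll DNS until <host> resolves to the foobar-ipv4 address.
# Args: host, seconds between tries (default 30), max tries (default 20).
wait_for_dns() {
  local host="$1" interval="${2:-30}" max_tries="${3:-20}" ip i
  ip="$(gcloud compute addresses describe foobar-ipv4 --global --format='value(address)')"
  [[ -n "$ip" ]] || return 1
  for ((i = 1; i <= max_tries; i++)); do
    # dig +short may also print CNAME targets, so match the IP as a whole fixed line.
    if dig +short A "$host" | grep -qxF "$ip"; then
      echo "$host -> $ip"
      return 0
    fi
    sleep "$interval"
  done
  echo "$host does not resolve to $ip yet" >&2
  return 1
}
```

Usage: wait_for_dns "foobar.[your_domain_name]".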

Create Managed SSL Certificate

gcloud beta compute ssl-certificates create foobar-cert \
--domains "foobar.[your_domain_name]"

Create Target HTTPS Proxy

gcloud compute target-https-proxies create foobar-https-proxy \
--ssl-certificates=foobar-cert \
--url-map=foobar-url-map

Create Forwarding Rule

gcloud compute forwarding-rules create foobar-fw-rule \
--target-https-proxy=foobar-https-proxy \
--global \
--ports=443 \
--address=foobar-ipv4

Verify TLS Certificate

Certificate provisioning can take a while. You can check its status with:

gcloud beta compute ssl-certificates describe foobar-cert

The managed.status should become ACTIVE within the next 60 minutes or so, usually sooner, if everything was set up correctly.
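Rather than re-running the describe command by hand, you can poll it. A sketch with a hypothetical wait_for_cert helper; it assumes the GA ssl-certificates describe command with --global reports the same managed.status as the beta command above, and by default it checks once a minute for up to 60 minutes, matching the estimate above.

```shell
# Hypothetical helper: poll a managed certificate until managed.status is ACTIVE.
# Args: certificate name, seconds between tries (default 60), max tries (default 60).
wait_for_cert() {
  local cert="${1:-foobar-cert}" interval="${2:-60}" max_tries="${3:-60}" status i
  for ((i = 1; i <= max_tries; i++)); do
    status="$(gcloud compute ssl-certificates describe "$cert" --global \
      --format='value(managed.status)')"
    echo "attempt $i: managed.status=${status:-unknown}"
    if [[ "$status" == "ACTIVE" ]]; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```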

Connect K8s Services to the Load Balancer

GKE has provisioned NEGs for each of the K8s Services deployed with the annotation. Now we need to add these NEGs as backends to the corresponding backend services.

Retrieve Names of Provisioned NEGs

kubectl get svc \
-o custom-columns='NAME:.metadata.name,NEG:.metadata.annotations.cloud\.google\.com/neg-status'

Note down the NEG name and zones for each service.

Repeat for all your GKE clusters.

Add NEGs to Backend Services

Repeat the following for every NEG and zone from both clusters, making sure to use only the NEGs belonging to the Foo service.

gcloud compute backend-services add-backend backend-service-foo \
--global \
--network-endpoint-group [neg_name] \
--network-endpoint-group-zone=[neg_zone] \
--balancing-mode=RATE \
--max-rate-per-endpoint=[max_rate]

Do the same for the Bar service, again for every NEG and zone in both clusters:

gcloud compute backend-services add-backend backend-service-bar \
--global \
--network-endpoint-group [neg_name] \
--network-endpoint-group-zone=[neg_zone] \
--balancing-mode=RATE \
--max-rate-per-endpoint=[max_rate]
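Adding NEGs by hand gets tedious with several zones and clusters. The sketch below reads the neg-status annotation and calls add-backend once per zone. It assumes jq is installed, that the NEG was exposed on Service port 80 as in the annotation above, and that the annotation JSON has the network_endpoint_groups and zones fields seen in the previous step; neg_pairs and add_service_negs are hypothetical helpers.

```shell
# Hypothetical helper: read neg-status JSON on stdin, print "<neg_name> <zone>" per zone.
# Assumes the NEG was exposed on Service port 80.
neg_pairs() {
  jq -r '.network_endpoint_groups["80"] as $neg | .zones[] | "\($neg) \(.)"'
}

# Hypothetical helper: add every NEG/zone of Service $1 (current kubectl context)
# to backend service $2, using max rate per endpoint $3.
add_service_negs() {
  local svc="$1" backend="$2" max_rate="$3" neg zone
  kubectl get svc "$svc" \
    -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}' |
    neg_pairs |
    while read -r neg zone; do
      # </dev/null keeps gcloud from consuming the remaining loop input.
      gcloud compute backend-services add-backend "$backend" \
        --global \
        --network-endpoint-group="$neg" \
        --network-endpoint-group-zone="$zone" \
        --balancing-mode=RATE \
        --max-rate-per-endpoint="$max_rate" </dev/null
    done
}
```

Run it once per cluster after get-credentials, for example add_service_negs foo backend-service-foo [max_rate] and add_service_negs bar backend-service-bar [max_rate].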

Allow GCLB Traffic

gcloud compute firewall-rules create fw-allow-gclb \
--network=[vpc_name] \
--action=allow \
--direction=ingress \
--source-ranges=130.211.0.0/22,35.191.0.0/16 \
--rules=tcp:[container_port]

Verify Backends Are Healthy

gcloud compute backend-services get-health \
--global backend-service-foo

gcloud compute backend-services get-health \
--global backend-service-bar

You should typically see 6 backends (3 per cluster,¹¹ one per zone) for each backend service, all with healthState: HEALTHY. After adding the firewall rule, it might take a while before all the backends become healthy.
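To see at a glance which NEGs are healthy, you can summarize the get-health output. A sketch, assuming jq is installed and that the JSON output lists each backend with its URL and a status.healthStatus array of per-endpoint healthState values; health_summary is a hypothetical helper.

```shell
# Hypothetical helper: print "<zone>/<neg> <healthy>/<total>" for each NEG backend.
health_summary() {
  local backend="$1"
  gcloud compute backend-services get-health "$backend" --global --format=json |
    jq -r '.[]
      | (.backend | split("/")) as $p
      | [.status.healthStatus[]?] as $all
      | "\($p[-3])/\($p[-1]) \([$all[] | select(.healthState == "HEALTHY")] | length)/\($all | length)"'
}
```

Usage: health_summary backend-service-foo, then health_summary backend-service-bar.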

Test Everything’s Working

Curl your DNS name https://foobar.[your-domain] (or open it in a browser). You should get a 502 for the root path, as we didn’t add any backends to the default backend service.

curl -v "https://foobar.[your-domain]"

Now curl the paths for the individual services, https://foobar.[your-domain]/foo/ and https://foobar.[your-domain]/bar/, and you should receive a 200 and content from the corresponding service.

curl -v "https://foobar.[your-domain]/foo/"
curl -v "https://foobar.[your-domain]/bar/"
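These checks can also be scripted, which is handy for re-testing after you change the setup. A sketch with hypothetical helpers expect_status and check_routes; the expected codes (502 for the root, 200 for /foo/ and /bar/) are the ones described above.

```shell
# Hypothetical helper: compare a URL's HTTP status code with the expected one.
expect_status() {
  local url="$1" want="$2" got
  got="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  if [[ "$got" == "$want" ]]; then
    echo "OK   $url -> $got"
  else
    echo "FAIL $url -> $got (expected $want)"
    return 1
  fi
}

# Hypothetical helper: check all three routes for https://foobar.<domain>.
check_routes() {
  local base="https://foobar.$1" rc=0
  expect_status "$base/" 502 || rc=1      # no backends on the default service
  expect_status "$base/foo/" 200 || rc=1  # served by backend-service-foo
  expect_status "$base/bar/" 200 || rc=1  # served by backend-service-bar
  return "$rc"
}
```

Pass your domain, for example check_routes example.com to test https://foobar.example.com.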

If you retry a few times, you should see traffic served by different Pods and Clusters.¹²

If you simulate some traffic, for example with vegeta, one of my favorite CLI tools, you can nicely observe the traffic distribution across backends in the GCP Console. Go to Network services -> Load balancing -> select your load balancer -> Monitoring tab, and select the corresponding backend. You should see a dashboard similar to fig. 5.

Fig. 5: GCP Console, Load Balancing (both clusters were in the same region, so traffic is load-balanced equally across all backends)
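To generate the traffic mentioned above, vegeta reads targets on stdin and pipes its results into its report command. A minimal sketch; load_test is a hypothetical wrapper, and the default rate and duration are arbitrary.

```shell
# Hypothetical wrapper: send GET requests to one URL at a fixed rate, then print a report.
# Args: url, requests per second (default 50), duration in Go syntax (default 60s).
load_test() {
  local url="$1" rate="${2:-50}" duration="${3:-60s}"
  echo "GET ${url}" |
    vegeta attack -rate="$rate" -duration="$duration" |
    vegeta report
}
```

Usage: load_test "https://foobar.[your-domain]/foo/" 100 2m.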

Now is a good time to experiment a bit. See what happens when your clusters are in the same region, and what happens when they’re in different regions. Increase the load and watch the traffic overflow to another region (hint: remember the --max-rate-per-endpoint flag used earlier?). See what happens if you take one of the clusters down. And can you add a third cluster to the mix?

(optional) gke-autoneg-controller

Notice the additional gke-autoneg-controller annotation on the K8s Services. It is not needed for our setup, but optionally you can deploy gke-autoneg-controller¹³ to your cluster and use it to automatically associate the NEGs created by GKE with the corresponding backend services. This will save you some tedious manual work.

Good Job!

And that’s it. We have explained the purpose of the individual GCLB components and demonstrated how to set up multi-cluster load balancing between services deployed in two or more GKE clusters in different regions. For real-life use, I would recommend automating this setup with a configuration management tool such as Terraform.

This setup increases your service availability, as several independent GKE clusters serve the traffic, and it also lowers latency. With HTTPS, the time to first byte is shorter, as the initial TLS negotiation happens at a GFE server close to the user. And with multiple clusters, each request is served by the cluster closest to the user.

Please let me know if you find this useful, and reach out with any other questions you might have, either here or at @stepanstipl. 🚀🚀🚀 Serve fast and prosper!
