End user post by David Blaisonneau, Software/Cloud Expert for Network services at Orange, and Sylvain Desbureaux, DevSecOps engineer at Orange

Context

Orange is a worldwide telecom operator serving both consumer and enterprise customers. In Europe alone, that means nearly 70 million subscribers, ~800 Pb/month of mobile data traffic, 40k sites, and a 2.3x growth in monthly traffic over the last three years.

Over time technology has become a game changer allowing businesses to reinvent and differentiate themselves on the market. We are not immune to this trend and Orange’s ambition is to become the leading telecom reference for network agility, resilience, and performance. We are undergoing a major transformation at the moment with the goal of becoming a software-based telco that harnesses the full potential of automation, AI & Data, and drives efficiencies in power consumption and cost with the right operating models. Software has already started to transform our activities, shaping how we build, deploy, and operate our future networks.

One example is the implementation of 5G, where each of our countries will need deployments with their own parameters, tied to country specificities and commercial offers. Two years ago, we decided to launch a full 5G SA cloud native stack and a new operating model with the Pikeo project. We wanted to evaluate the automated deployment of a 5G Stand Alone (SA) network and to experiment with several use cases around the concept of “as a service” (e.g. 5GC as a service, Slice as a service), in partnership with Hewlett Packard Enterprise and Casa Systems for the 5G workloads. We deployed the 5G workloads on Orange’s internal CaaS (powered by SUSE/Rancher, in line with the Linux Foundation Europe Sylva project) over Dell bare metal servers and over AWS infrastructure, under dedicated partnerships (read https://newsroom.orange.com/orange-unveils-results-of-europes-first-experimental-end-to-end-5g-stand-alone-sa-network-pikeo/ for more). In this project our team integrated 5G SA network services into a Kubernetes cloud native infrastructure and adopted GitOps.

And here is how we did it!

Challenges

As just described, Orange’s network services team mainly focuses on integration: we buy network applications from network function vendors and we bind them together to propose a full network service to our customers. Each country has unique needs, so each deployment must be customized.

In our previous deployment model, our team used Ansible to deploy telco applications. It lacked real idempotency, relied on a push model, was complex to maintain for big deployments, and encouraged a “script-like” approach; this was acceptable only because we had no real alternative at the time.

When GitOps, a term coined by Weaveworks CEO Alexis Richardson, appeared, we saw some promising opportunities: the full usage of Git leveraging good development practices, declarative models, the pull model, continuous reconciliation, and lastly the security benefits. The ultimate benefit that convinced us, though, was the time saved across all deployments, whether in a development or a production environment. And believe us, the level of complexity is really high, so deployment automation is a true priority and game changer for us.

We are using the CNCF project FluxCD, created and donated by Weaveworks, because it provides a robust and scalable solution based entirely on native Kubernetes APIs. FluxCD is a simple, efficient, and extensible way to manage Kubernetes resources. Nice value-adds such as variable substitution, layered overloading of Helm values, and merged repositories are really useful for deploying applications in a homogeneous way, even with some customization. Last but not least, it’s a fully supported CNCF project, and recent open source extensions such as Weaveworks’ Flamingo or TF-Controller are gradually bringing team skills together and converging toward an operating model based on GitOps.
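As an illustration of the Helm-values overloading we just mentioned, here is a minimal sketch of a Flux HelmRelease that merges a common values ConfigMap with an optional per-country override (the release, chart, and ConfigMap names are hypothetical, not taken from our actual repositories):

---

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vendor-5g-core            # hypothetical release name
  namespace: flux-system
spec:
  interval: 10m0s
  chart:
    spec:
      chart: 5g-core              # hypothetical chart
      sourceRef:
        kind: HelmRepository
        name: vendor-charts       # hypothetical Helm repository
  # values are merged in order: later sources override earlier ones
  valuesFrom:
    - kind: ConfigMap
      name: 5g-core-common-values     # shared defaults for all countries
    - kind: ConfigMap
      name: 5g-core-country-values    # per-country customization
      optional: true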

To create a 5G StandAlone service, we deploy CNFs (Cloud native Network Functions). CNFs are container deployments that allow the creation of network services. They’re usually divided into two types:

With the 5G StandAlone mobile network came a new feature: the possibility to create slices of the network. Slicing allows having dedicated spaces and applications for specific customers (state services, police, large businesses, etc.). Those slices must be seen as an extension of the network over time, depending on commercial offers, and can be compared to the “VirtualHost” feature of a web server.

Therefore we need a solution to deploy a common but customized 5G service to a large set of targets, and to extend and specialize it over time. In the past, we would have chosen Ansible, but we now have a better solution.

Setup and implementation

The first step (one-time per platform) is to deploy the underlying infrastructure (Kubernetes clusters, networks, …) using Terraform or Ansible. The last step of this deployment triggers the bootstrap of FluxCD. This bootstrap installs FluxCD, prepares some external services (for example AWS Route 53 and Secrets Manager), and creates the early secrets. From then on, all deployments flow from the first FluxCD application, which lists all the other FluxCD applications to be launched.
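For illustration, here is a minimal sketch of the two objects such a bootstrap typically leaves on the cluster: a GitRepository pointing at the main repository, and the first Kustomization that pulls everything else (the URL and path are hypothetical):

---

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  url: ssh://git@git.example.com/5g/fleet.git   # hypothetical URL
  secretRef:
    name: flux-system          # early secret created at bootstrap
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/pikeo       # hypothetical path listing all other applications
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system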

The way we deploy the different applications and configurations follows a “Russian doll” model: a main Git repository hosts the description of the applications / CNF definitions for each cluster, as well as a reference to the dedicated Git configuration repository.

Here is an example of a slice:

---

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: vendor-5g-slice-1-def922
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./5gc-vendor-apps/slices_templates/0001/slices
  prune: true
  dependsOn:
    - name: vendor-5g-apps-network-attachment
  sourceRef:
    kind: GitRepository
    name: 5gc-vendor
  targetNamespace: vendor
  healthChecks:
    - apiVersion: vendorapp.vendor.io/v1alpha2
      kind: Slice
      name: slice-1-def922
      namespace: vendor
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: slice-vars-1-def922
        optional: true
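
The Kustomization above pulls its manifests from the 5gc-vendor GitRepository, which is itself declared in the main repository, closing the “Russian doll”. A sketch of what that source could look like (the URL, tag, and secret name are hypothetical):

---

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: 5gc-vendor
  namespace: flux-system
spec:
  interval: 1m0s
  url: https://git.example.com/5g/5gc-vendor.git   # hypothetical URL
  ref:
    tag: v1.4.2                                    # hypothetical release tag
  secretRef:
    name: 5gc-vendor-credentials                   # hypothetical pull credentials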

Some lessons worth sharing

Speedy reconciliation is finally here!

We moved from 3h+ deployments/upgrades/tests to a maximum of 15 minutes of reconciliation (and usually less than 5 minutes) thanks to GitOps and Kubernetes. Developer feedback loops are greatly improved, and we now deploy several releases a day in production, compared to a few a month before.

Adding genericity thanks to environment substitution

Environment substitution is a powerful tool to create generic models (a.k.a. a catalog). These models can be instantiated multiple times with specific values, retrieved from ConfigMaps.

generic Slice model

---

apiVersion: vendorapp.vendor/v1alpha2
kind: Slice
metadata:
  name: slice-${SST}-${SD}
spec:
  nsiID: '"${NSIID}"'
  subnets:
    - apiVersion: vendorapp.vendor/v1alpha2
      kind: SliceSubnet
      name: subnet-${SST}-${SD}-amf1
  policies:
    - snssais:
        - sst: ${SST}
          sd: "${SD}"
          dnns:
            - apiVersion: vendorapp.vendor/v1alpha2
              kind: DataNetwork
              name: ${DNN}

values of a specific deployment

---

apiVersion: v1
kind: ConfigMap
metadata:
  name: slice-vars-1-def922
data:
  info_slice_model: "0001"
  info_slice_type: "slices"
  SST: "1"
  SD: def922
  NSIID: "1"
  MCC: "922"
  DNN: orange
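
Putting the two together: after Flux’s post-build substitution, the generic model above would render roughly as follows (a mechanical reconstruction for illustration, not a capture from a live cluster):

---

apiVersion: vendorapp.vendor/v1alpha2
kind: Slice
metadata:
  name: slice-1-def922
spec:
  nsiID: '"1"'
  subnets:
    - apiVersion: vendorapp.vendor/v1alpha2
      kind: SliceSubnet
      name: subnet-1-def922-amf1
  policies:
    - snssais:
        - sst: 1
          sd: "def922"
          dnns:
            - apiVersion: vendorapp.vendor/v1alpha2
              kind: DataNetwork
              name: orange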

Using tags and “continuous” version updates for a seamless production deployment

We want to drive continuous deployment as far as possible, and using tags instead of branches allows an easier rollback if something bad happens, especially when we use a “cascade” of Git repositories. We also want to follow the latest versions of the different components (more than 40 today) we are using as closely as possible. To achieve that, we use semantic-release to tag every new commit on the main branch and Renovate to find new versions of components.
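Flux can track those semantic-release tags directly. Here is a sketch of a GitRepository following a semver range instead of a branch (the URL and range are hypothetical); rolling back then simply means pinning the range to an older version:

---

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: 5gc-vendor
  namespace: flux-system
spec:
  interval: 1m0s
  url: https://git.example.com/5g/5gc-vendor.git   # hypothetical URL
  ref:
    semver: ">=1.0.0 <2.0.0"   # follows the latest tag produced by semantic-release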

Use the Git way

Finally, as network engineers are about to become software engineers too, Git is the real cornerstone of the solution, and thus we must use it with good practices.

Git is certainly already mainstream in a lot of other industries, but we need to make it a basic everyday tool for network engineers as well, and shift from imperative to declarative approaches to extend our automation reach, as we do for 5G SA deployments.

Next Steps

Now that the deployment of components works well with FluxCD, we also want to leverage the GitOps approach for the bootstrap of the platform itself. On our Pikeo project, the use of Crossplane or the Flux Terraform controller (TF-Controller) is currently being tested as a way to extend the scope of GitOps and capitalize on the benefits highlighted above for other parts of the infrastructure.
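As a hint of what that could look like, here is a minimal sketch of a Terraform object handled by TF-Controller, reconciling infrastructure code from Git the same way Kustomizations reconcile manifests (the names and path are hypothetical):

---

apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: pikeo-infra              # hypothetical name
  namespace: flux-system
spec:
  interval: 10m0s
  approvePlan: auto              # apply plans automatically once reconciled
  path: ./terraform/platform     # hypothetical path in the repository
  sourceRef:
    kind: GitRepository
    name: infra-repo             # hypothetical source repository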

Moreover, on the Orange Network Integration Factory tooling zone, Kubernetes installation is now performed using Project Sylva, a Linux Foundation Europe project releasing a cloud native infrastructure stack to host telco and edge use cases. Project Sylva uses FluxCD and Cluster API for Kubernetes and infrastructure deployments. Ongoing work is merging the two bootstrapping processes into one, so we keep on learning and extending our automation potential!

Outro

We learned a lot, and we pushed the boundaries of automation for the next generation of software-defined networks to a level that brought us a lot of excitement: days became hours, and now even minutes, for deploying services, all fully automated. This is truly a game changer in our telco industry that will lead us toward a new operational model where resiliency and reliability are being redefined.

The GitOps approach is also a game changer for our relationships with our traditional vendors, especially in the 5G area. Telcos across the board are keen to leverage the automation, operational, and cost efficiencies that cloud native brings, and therefore want to undergo a cloud native transformation. The prerequisite is true cloud native artifacts published into cloud native repositories, so that they are 100% compatible with GitOps. With that in place, we can, together with our traditional vendors, rethink the procedures to validate and deploy new releases and redefine the Responsibility Assignment Matrix.