Guest blog post by Brad Ascar, Sr. Solutions Architect, Carbon Relay
Market data shows just how rapidly adoption of containerization in general and Kubernetes in particular has grown among enterprises. For example, the most recent survey by the Cloud Native Computing Foundation showed that 84% of the enterprise IT respondents said they are now running containers in production, with a large majority of them (78%) citing Kubernetes as their orchestration system.
In other words, containers and Kubernetes have gone mainstream.
That means that many enterprise IT and DevOps teams are now using this fairly new technology (it’s just six years old) as a key part of their efforts to rebuild their legacy IT environments for the cloud-native world.
That’s a lot of new IT ‘buildings’ being constructed by relatively inexperienced crews using new materials and techniques.
As every construction supervisor and tradesperson knows, however, working with new "stuff" naturally comes with a certain amount of problems and challenges.
What are some of the common problems these Kubernetes construction crews are running into? Here are some of the biggest ones we’re seeing in the field, or hearing about around the industry.
New Technology, Scarce Experience
The newness of Kubernetes, coupled with its skyrocketing adoption, has created a skills gap. People with solid Kubernetes experience are hard to find and recruit, and expensive to hire: according to The Enterprisers Project, the national average salary for a Kubernetes-related job is nearly $145K.
As a result, some enterprises anxious to move ahead put their otherwise accomplished people on it, assuming they'll figure it out. That's like asking an apprentice carpenter to frame an entire house. It's a bad way to start, and even if acceptable results are eventually achieved, there will surely be problems along the way.
An Unfamiliar Landscape
Kubernetes holds the promise of greater business agility and responsiveness with less infrastructure and, hopefully, lower costs. Often starting with a single application, enterprise teams face the daunting task of converting it into something entirely different. In many cases, they take an app with a monolithic design (a unified stack running in a virtual machine on dedicated hardware in an on-premises data center) and break it up into collections of microservices that are provisioned via the cloud from an array of different sources.
While the new microservices and containerization approach is complex, most enterprise teams have the skill to stand up a Kubernetes cluster and get an app running on it. It’s getting that app to run reliably – and then optimally – that’s the real challenge.
Overprovisioning – A Simplistic, Costly Answer
Here's something that's happening today in lots of companies. Their teams have gotten going with Kubernetes: they've stood up clusters and broken their big apps into lots of smaller pieces gathered from varied sources in the cloud… and voila! Their first K8s app is up and running. Then they try to tune it a bit by changing a configuration setting, and boom! It crashes. Or they leave the defaults alone, and a larger load or some other stress on the system makes it malfunction.
They don't know why the app crashed while running at one gigabyte of memory, but they notice it crashes less often at 1.5 gigabytes. So they try two gigabytes, and it seems to run okay most of the time. But "okay" doesn't cut it. To 'de-risk' the application, and avoid outages and 3 AM emergency calls, they ratchet the provisioning up to four gigabytes.
The results? An application that, when properly configured, might have run reliably with a top end of 250 MB is now overprovisioned sixteen-fold. When the same overprovisioning happens as the second, third, fourth, or 100th application gets containerized, problems ensue. At some point the system falls down, applications crash, and the risks turn into real operational and reputational damage. Insult gets added to injury when the ballooning consumption of cloud resources shows up in the cloud service provider's monthly bill.
While these teams may have decreased the amount of infrastructure they have to buy and manage, and boosted their business agility some, it’s often at a cost that is many multiples what their old on-prem hardware and VMs cost them.
Playing ‘Whack-a-Mole’ with Configuration Settings
In Kubernetes manifests, there are two major settings that IT teams can manipulate, and only for two resource types: resource requests and resource limits, applied to CPU and memory. Add in concepts like ReplicaSets and autoscaling options, and there are many moving parts. Does the app do better scaling up or scaling out, and which approach delivers cost-effectively?
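As a concrete illustration, here is a minimal Deployment fragment showing where those requests and limits live. The app name, image, and values are placeholders for illustration, not recommendations:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: example/app:1.0  # placeholder image
        resources:
          requests:             # what the scheduler reserves for the pod
            cpu: "250m"
            memory: "256Mi"
          limits:               # hard ceiling; exceeding the memory limit gets the container OOM-killed
            cpu: "500m"
            memory: "512Mi"
```

Requests drive scheduling decisions; limits cap consumption. Guess these numbers too low and you get the crashes described above; guess too high and you get exactly the overprovisioning problem.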
Initial settings may work fine for a time, even though they started in the wrong place and are now measuring either the wrong thing or the right thing the wrong way. But as the deployment scales horizontally, new clusters come online, and new applications with very different behaviors get thrown into the mix, things can go awry. With the original settings no longer working well, IT teams look to adjust them.
That’s when the ‘whack-a-mole’ game begins. The lack of visibility into the effects of configuration setting changes turns this process into risky guesswork. And perhaps more importantly, it uses up costly developer time.
Application Parameters Further Complicate Things
Despite being redesigned for Kubernetes deployment, applications still have tunable parameters that teams can tinker with and change. With a simple database, for example, teams can set memory and page-cache resource levels, how long data lives in memory before being written down to disk, and how many replicas are allowed.
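Redis is one familiar example of the kinds of knobs meant here. A sketch of a few such directives (the values are illustrative only, not tuning advice) might look like:

```
maxmemory 256mb              # memory ceiling for cached data
maxmemory-policy allkeys-lru # eviction strategy once the ceiling is hit
save 300 10                  # snapshot to disk if >= 10 writes occur in 300 seconds
appendfsync everysec         # how often the append-only log is flushed to disk
```

Each of these interacts with the Kubernetes memory requests and limits wrapped around the container, which is part of what makes tuning the combined system so tricky.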
Another example is Java applications, which have lots of JVM settings to configure and tune, such as heap sizes and garbage-collection parameters, that have a significant impact on performance.
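For instance, a container's startup command might pin the heap and select a garbage collector with flags like the following. The jar name is a placeholder, and the values are illustrative rather than recommended:

```shell
# Fix the heap at 512 MB and use the G1 garbage collector
java -Xms512m -Xmx512m \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -jar app.jar
```

Note that a heap fixed this way must also fit inside the container's Kubernetes memory limit, with headroom for non-heap memory; getting that relationship wrong is a classic source of the OOM crashes described earlier.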
Applications can become ‘noisy neighbors’ and start impacting the performance of other applications. When deployed in multi-tiered environments, tuning parameters at the first layer is often relatively easy, and low-risk, but the difficulties and risks increase significantly at the second and third layers.
In short, the minimal configuration settings Kubernetes allows, overlaid with the varying tunable parameters at different layers of an application's 'stack,' make it exceedingly difficult to get Kubernetes application performance and costs just right.
It's not that IT teams can't eventually arrive at the answers they need; it's that the work is hard, tedious, and risky, and what they're solving for changes rapidly. Just like construction crews of carpenters, plumbers, and electricians, these enterprise teams are doing the IT equivalent of building a new structure: moving and prepping materials, roughing out the new construction, and doing the finish work.
There are, however, new and smart ways to make sure that your team of IT construction workers avoids the pitfalls outlined above. Using these new approaches will make them smile when they look at what they’ve succeeded in building. We’ll explore those new approaches in our next post. So, stay ‘tuned’.
The How-To Webinar on “Getting Started with AI Driven K8s Optimization (Free)”: https://carbonrelay.z