Originally published on OverOps blog by Alex Zhitnitsky

The move to innovate at speed and scale is stressing software quality and exposing the limitations of testing.

Testing is not enough

Don’t get me wrong – testing in all its forms is inseparable from the software delivery supply chain. Tests and static analysis are essential to software development pipelines, and this holds true for both traditional and cloud-native applications.

But the problem is…they’re not sufficient.

We are in the midst of unprecedented times, an era of disruption in which long-established financial institutions face competition from 7-year-old startups. Companies are under pressure to move fast to remain viable, even those that have been around for over 100 years, and they need to move twice as fast to defend their position and stay competitive.

Over a decade ago, when Test-Driven Development (TDD) was introduced, it promised to improve productivity and quality, even for teams that don’t follow it religiously but treat it more as a north star. Since then, release cycles have shortened, CI/CD is no longer a buzzword, and new companies that develop pipeline automation products – I’m looking at you, GitLab – are mature enough to IPO.

To reiterate the point: testing is perhaps more relevant than ever, but when moving fast is table stakes, relying on traditional tests alone to prevent errors is no longer enough. Even though tests are irreplaceable, critical errors still reach production while customers wait for the seamless experience they were promised (right before the application breaks).

The bottom line? Testing is no longer enough. And in this post, we’ll outline how we got here and what the experts have to say on that.

The Red Queen Hypothesis (and What it Means for Testing)

In the context of digital business, standing still is the equivalent of losing. Lewis Carroll said it best in “Through the Looking-Glass,” which became the basis for what’s called the “Red Queen Hypothesis”:

“Now, here you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” – Lewis Carroll’s Red Queen in Through the Looking-Glass

Apart from being a nice anecdote from a great book, the Red Queen Hypothesis originates from an evolutionary concept, proposing that organisms must constantly adapt in order to survive, while pitted against other organisms in a constantly changing environment.

The same idea can be applied to a business setting, demonstrating why every company is rushing to automate their CI/CD workflows. The second-order effect here is how it applies to testing. When you’re releasing code faster and faster, increasing your development velocity, how can you prevent errors from being promoted? And how can you fix them fast if they do occur?

The safety measures for a road car going 70 mph are very different from those of an F1 car going over 200 mph, so as you automate your CI/CD workflows, your testing needs to evolve as well.

What do the experts have to say about that?

According to Google Cloud’s DevOps Research and Assessment (DORA) State of DevOps report, change failure rates (i.e. the percentage of changes in production that result in degraded service and require remediation) can reach 60% of all new code releases. Elite performers keep the failure rate under 15%, but since this metric covers only known errors and is based on self-reporting by engineers (not customers), the actual rates might be even higher.

The gap between low performers and elite performers is large, giving the elite an “unfair advantage” in their ability to innovate and move fast. This explains the results we’ve seen from big tech companies like Facebook, Apple, Netflix, Microsoft and Google, although even they are not free from errors.

A key driver behind this is the ability to release smaller changes at a higher frequency – but that’s not the only difference. While all companies test in one form or another, how they do it and what they look for varies greatly.
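To make that metric concrete, here is a rough, purely illustrative sketch of how a team might compute its own change failure rate from a deployment log. The data structure and field names are assumptions made for this example; DORA’s published figures come from practitioner surveys, not from code like this.

```python
# Illustrative sketch (assumed field names): change failure rate is the share
# of production changes that led to degraded service and needed remediation.
from dataclasses import dataclass

@dataclass
class Deployment:
    version: str
    caused_incident: bool  # degraded service that required a fix or rollback

def change_failure_rate(deployments):
    """Fraction of production changes that resulted in degraded service."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.caused_incident)
    return failures / len(deployments)

releases = [
    Deployment("1.0.0", False),
    Deployment("1.0.1", True),   # needed a hotfix after release
    Deployment("1.1.0", False),
    Deployment("1.2.0", True),   # rolled back
]
print(f"Change failure rate: {change_failure_rate(releases):.0%}")  # 50%
```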

During a recent live panel, we discussed this issue with the co-hosts of DevOps Paradox – Darin Pope, a DevOps consultant known for making the complex simple, and Viktor Farcic, a Principal Software Delivery Strategist, Developer Advocate and published author – together with Eric Mizell, OverOps’ VP of Solution Engineering.

An on-demand recording of that conversation, including takeaways for balancing speed and quality with Continuous Reliability, is available here.

Below are three key takeaways from that conversation that illustrate why traditional testing is not sufficient:

1. 100% code coverage ≠ 100% of errors identified

Viktor Farcic:

“Code coverage means nothing. I don’t understand why people are still obsessed with code coverage. It’s not really about how many tests you have, but the quality of those tests. I’ve seen companies that have close to 100% code coverage but have meaningless tests that don’t prove anything.

“Those types of metrics to me are very misleading. It’s easy to have 95% test code coverage, but having tests that really matter, having them drive your design, and detect things that are hard to detect, that’s what’s hard. Code coverage doesn’t serve much of a purpose if the tests themselves are not really well done.”
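To make that point concrete, here is a minimal, hypothetical sketch (the function and tests are invented for illustration). Both tests execute every line of the function, so a coverage tool reports 100%, yet only the second test would ever catch the bug.

```python
def apply_discount(price, percent):
    # Bug: the discount is added instead of subtracted.
    return price + price * (percent / 100)

def test_apply_discount_runs():
    # Executes every line, so coverage reports 100%,
    # but the assertion is too weak to notice the wrong sign.
    assert apply_discount(100, 10) > 0

def test_apply_discount_is_correct():
    # Same coverage, but this test pins the expected behavior
    # and fails against the buggy implementation above.
    assert apply_discount(100, 10) == 90
```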

2. High-quality testing takes time and doesn’t guarantee success

Darin Pope:

“This is day-to-day life for most people. I would rather have an error happen in production and go to market fast than build something for 5 years to get it out the door when all market opportunity is lost. I’d rather see people move fast and be comfortable with failing.”

Viktor Farcic: “If you do develop for five years and then put it in production, it’s still going to fail. Until you put it in production you don’t really know how it will behave with real users. We’re just trying to get as close to being stable before we put it in production, but once it’s there then we can figure out what’s going on and react quickly to solve problems. But the whole idea that you will be testing, testing, testing and then you say, yes now it’s going to work 100%, that never happens. If you find me a person who accomplished that I would be grateful to meet them.”

Eric Mizell: “I don’t want to drop in production and fail miserably. But I do want to be able to have the people and processes in place to fix it, because you want to adopt a move fast/fix fast mindset. I want to get it out there fast, because I need to move quicker in today’s world. In a cloud world a new company can spend a day and they could compete with me with something I’ve been working on for five years. I need to figure out how to move fast and continue to move forward without falling on my face.”

3. GA is effectively a beta and users are the new QA

Viktor Farcic:

“I would say that this was true a long time ago. It’s just that now we are admitting it. And everybody knows that we never deploy stable code in production that never fails or doesn’t have any problems. Every single company in the world 20 years ago was receiving notifications from their users, this or that doesn’t work, and then you need to fix it. We knew all of this, now we’re just admitting the reality.”

Eric Mizell: “I’ve actually heard a company say that they try to do risk assessment on how risky it is to lose customers and revenue versus getting the software out there. They need new features. It’s a balance. I don’t think anyone has the perfect balance there but you see it happening a ton and it goes back to the notion of I want to be able to move fast/fix fast. How do I come back from a failure is the true challenge.”

Final Thoughts

As developers, we are great at writing code, but inherently limited in our ability to foresee where it will break down later. The task of detecting software issues, whether early in testing or later in production, and gathering information on them should be automated if we ever want to achieve the move fast/fix fast vision.
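As a closing illustration – and not a description of how OverOps itself works – here is one minimal, generic sketch of that idea: registering a process-wide hook so that any uncaught exception is captured automatically with its full stack trace, instead of waiting for a user to report it.

```python
# Generic sketch: automatically capture any uncaught exception with its
# stack trace so the team can react, rather than relying on user reports.
import logging
import sys
import traceback

logging.basicConfig(level=logging.ERROR, format="%(asctime)s %(levelname)s %(message)s")

def report_uncaught(exc_type, exc_value, exc_tb):
    # Gather what a responder needs to "fix fast": type, message, stack trace.
    details = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
    logging.error("Uncaught exception:\n%s", details)
    # A real setup would also ship this event to an error-tracking backend.

sys.excepthook = report_uncaught

# Any uncaught error from here on is recorded automatically:
raise RuntimeError("simulated production failure")
```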