Recently, Anthropic announced that its new model, Mythos, had autonomously found and exploited zero-day vulnerabilities in every major operating system and web browser – including a 27-year-old bug that had survived decades of human review and millions of automated tests. The model required no specialized training and no human researchers guiding its work.
If an AI model can autonomously chain vulnerabilities to achieve kernel privilege escalation on Linux, what does that say about an infrastructure model where thousands of workloads share a single kernel with no structural isolation between them? Mythos didn’t introduce a new threat. It made the consequences of an old design decision much harder to defer.
Dashboards of doom
Look at the major security products on the market today. With few exceptions, they are glorified log generators and dashboards of doom. Runtime detection agents, vulnerability scanners, admission controllers, the list goes on and on, and they all operate on the same assumption: prevent the breach, or detect it fast enough, and you win.
What they don’t do is make the systems any more secure. A scanner finds a critical CVE, generates a ticket, and tosses it over the wall to a development team that has its own priorities. The architecture doesn’t self-heal. It doesn’t contain the blast. It watches itself burn and takes very thorough notes.
Imagine if Kubernetes worked this way. Your pod crashes, and instead of rescheduling it, the kubelet opens a Jira ticket: “Pod unhealthy. Recommend restarting. Assigned to: platform team.” That would be absurd. But that’s exactly how production security works in most organizations today.
Pre-fail controls also require an impossible amount of knowledge to configure correctly. Every network policy, every RBAC rule, every seccomp profile has to be tuned to the specific behavior of the workload it protects. In a multi-tenant Kubernetes cluster running thousands of containers, that means someone needs to know exactly which APIs each service calls, which ports it needs, what filesystem paths it accesses, and what constitutes “normal” behavior. For every single workload.
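To make that configuration burden concrete, here is a minimal sketch of a per-workload, deny-by-default syscall allowlist in the spirit of a seccomp profile. The service names and syscall sets are hypothetical, and this is not a real profile format – the point is only that every entry encodes knowledge someone has to get exactly right:

```python
# Hypothetical per-workload syscall allowlists, deny-by-default,
# in the spirit of a seccomp profile (names and sets are made up).
ALLOWLISTS = {
    "payments-api": {"read", "write", "openat", "sendto", "recvfrom"},
    "batch-reindex": {"read", "write", "openat", "clone", "execve"},
}

def check(service: str, syscall: str) -> str:
    """Deny-by-default policy check: anything not listed is blocked."""
    allowed = ALLOWLISTS.get(service, set())
    return "allow" if syscall in allowed else "deny"

# Forget one legitimate syscall and the workload breaks at runtime...
print(check("payments-api", "futex"))    # deny
# ...over-grant "just in case" and you widen the attack surface.
print(check("batch-reindex", "execve"))  # allow
```

Multiply this by every workload, and by every other control (network policy, RBAC, filesystem paths), and the omniscience requirement becomes visible.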
This isn’t a tooling problem, it’s an information problem. The knowledge required to correctly configure pre-fail controls is distributed across teams and never consolidated in any single place. Perfect configuration requires omniscience, and omniscience isn’t a feature you can ship.
So the industry plays an infinite game of incremental hardening – patch this CVE, tighten that network policy, add another detection rule. Every improvement puts the burden on the defender, forever. The attacker needs to find one viable chain – initial access, privilege escalation, lateral movement. The defender has to hold every configuration correct simultaneously across thousands of workloads. The math doesn’t work.
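A back-of-the-envelope calculation shows why the math doesn’t work. The numbers below are made up and purely illustrative, and the independence assumption is generous to the defender:

```python
# Assume each workload's full control set (network policy, RBAC,
# seccomp, ...) is configured correctly with probability p,
# independently across n workloads. Numbers are illustrative.
p = 0.999   # 99.9% correct per workload -- a generous assumption
n = 2000    # workloads in the cluster

p_all_correct = p ** n            # ~0.135
p_at_least_one_gap = 1 - p_all_correct

print(f"P(every config correct): {p_all_correct:.3f}")
print(f"P(at least one gap):     {p_at_least_one_gap:.3f}")
```

Even at 99.9% per-workload correctness, a cluster of 2,000 workloads almost certainly has a gap somewhere – and the attacker only needs one.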
The design question
There’s a question most security architectures can’t answer:
How would you architect your systems if you assumed a workload was already compromised, the way you assume a pod can crash at any time?
This is how SRE thinks about reliability. You don’t design a distributed system assuming every node stays healthy. You assume nodes fail unpredictably, and you engineer so that individual failures don’t cascade. Circuit breakers halt propagation. Failure domains contain blast radius. You don’t need to keep every node alive for your app to serve traffic, because the architecture was built to survive failure.
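The circuit-breaker pattern mentioned above can be sketched in a few lines. This is a minimal illustration of the idea – halt propagation by failing fast – not a production implementation:

```python
class CircuitBreaker:
    """Minimal circuit breaker: after max_failures consecutive
    failures, stop calling the dependency so its failure cannot
    cascade into the caller."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a success resets the count
        return result

breaker = CircuitBreaker(max_failures=3)

def flaky():
    raise OSError("downstream node is down")

for _ in range(5):
    try:
        breaker.call(flaky)
    except Exception as e:
        print(type(e).__name__, e)
# After three failures the breaker opens: later calls fail fast
# without ever touching the unhealthy dependency.
```

The design choice worth noticing: the breaker does not try to predict or prevent the dependency’s failure – it assumes failure will happen and bounds the damage.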
What if we applied the same thinking to security? What if a single compromised workload was treated the same way Kubernetes treats a crashed pod: an expected failure that the system routes around automatically? Not a catastrophe. Not a dashboard alert. Not a war room. Just another Tuesday.
The Kubernetes irony
The irony is sharpest in the Kubernetes ecosystem.
Kubernetes is the SRE moment for infrastructure – the most successful embodiment of “design for failure” ever built. Pods crash and get rescheduled. Nodes die and workloads migrate. The entire system assumes any individual component can fail, and the platform handles it automatically.
And yet the security model running on this same platform is a catastrophic single point of failure.
Most Kubernetes clusters run all their containers on a shared Linux kernel. Every workload on a node – every microservice, every sidecar, every batch job – from every team shares the same kernel address space. A kernel vulnerability doesn’t just compromise one container; it compromises every container on the node. Worse, the security controls you deployed to detect compromise – eBPF-based agents, LSM modules, seccomp-bpf filters – run on that same kernel. A single kernel exploit not only breaches every container, it simultaneously blinds every monitor watching it. Your detection layer and your blast radius are the same thing.
We operate a platform that automatically handles the failure of any pod, any node, any infrastructure component – and then we run security on it with zero isolation, zero failure domains, and zero plan for what happens when the kernel, the single piece of shared infrastructure, is the thing that fails.
The structural fix
If the shared kernel is why a single exploit cascades to every workload on a node, the architectural fix is the same one distributed systems engineering solved decades ago: eliminate the single point of failure.
Stop sharing one kernel across all workloads. Distribute the failure domain across independent kernel instances, the same way you’d distribute a monolithic database across multiple replicas. A compromise of one kernel instance is contained to one workload, not because of a policy someone remembered to configure, but because the failure domain boundary is structural.
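A toy model makes the difference in blast radius concrete. The pod count is hypothetical; the structural point is not: with a shared kernel, one kernel exploit reaches every workload on the node, while with per-workload kernel instances it reaches exactly one.

```python
def blast_radius(workloads_per_node: int, shared_kernel: bool) -> int:
    """Workloads reachable from a single kernel compromise on one node.

    shared_kernel=True models the default: every container on the
    node shares one kernel address space. shared_kernel=False models
    structural isolation: one kernel instance per workload.
    """
    return workloads_per_node if shared_kernel else 1

pods = 60  # hypothetical pods on one node
print("shared kernel:      ", blast_radius(pods, shared_kernel=True))   # 60
print("per-workload kernel:", blast_radius(pods, shared_kernel=False))  # 1
```

Note what is absent from the isolated case: no policy variable. The blast radius is a property of the architecture, not of anyone’s configuration.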
This approach doesn’t eliminate the need for security policy. You still want network segmentation, least-privilege IAM, and supply chain security. What changes is the consequence of getting those policies wrong. With structural isolation, a policy failure is contained to the workload it affects. Pre-fail controls become best-effort hardening with a safety net underneath – not the last line of defense.
The AI agents proof
Here’s what makes this moment different: the AI industry just ran the experiment for us.
Every major AI lab shipping autonomous agents arrived at the same architectural decision independently – containment first, hard boundaries, sandboxed execution environments where policy failures can’t cascade beyond the sandbox wall. They still use policy, but they treat policy as a layer inside the sandbox, not as the boundary itself.
Why? Because you can’t write a complete security policy for something when you don’t know what it’s going to do next. An AI agent might legitimately need to install packages, write to arbitrary paths, make network calls. It might also do something catastrophic. The behavior space is too wide for policy alone to cover. So they built walls and put the rules inside them.
The AI industry rediscovered something the security industry should have built decades ago. The question is why we’re still running production workloads – the ones handling customer data, financial transactions, and critical infrastructure – on shared kernels with less isolation than a browser tab. Chrome figured out over a decade ago that a crashed or compromised tab shouldn’t take down the browser. Your Kubernetes cluster running payment processing has weaker isolation guarantees than browsing Reddit.
The shift
I started my career as a systems administrator who thought keeping a server alive was the job. I learned at Google that the real job was building systems that didn’t need me to keep them alive. That insight transformed infrastructure engineering. It gave us SRE, Kubernetes, and every self-healing distributed system we depend on today.
Security is still waiting for the same transformation. We’re still building systems that need heroes, that need someone to notice the breach, interpret the dashboard, triage the alert, and scramble the response team. We’re still treating compromise as something that shouldn’t happen rather than something to engineer around.

At Edera, we believe security needs the same paradigm shift that turned operations into reliability engineering – a discipline rooted in the reality that failure is inevitable, measured by blast radius rather than breach count, and engineered so that no single compromise can cascade beyond its failure domain. We’ve spent two years building the isolation layer that makes this real for Kubernetes. Not another dashboard, not another detection tool, but an architectural default that makes compromise a non-event, the way Kubernetes makes a crashed pod a non-event.