AI agents are moving out of developer laptops and into production environments, where they need the same operational guarantees as any other cloud-native workload. This session walks through emerging architectural patterns for running long-horizon, autonomous agents reliably on Kubernetes, covering the “harness” pattern, common failure modes, and durable-execution primitives. We then deploy a working agent to a Kubernetes cluster and show how it handles real-world failures: pod restarts, mid-execution recovery, and asynchronous human-in-the-loop checkpoints that don’t hold a pod open for hours.