We had an exciting and busy start to KubeCon + CloudNativeCon Europe 2024. Here’s a look at some of the key moments of the day (and, spoiler alert: AI was the subject of a lot of conversation). 

The opening keynote

Priyanka Sharma, executive director of the Cloud Native Computing Foundation, opened the event with the announcement that this is the largest KubeCon ever, with over 12,000 attendees. And it’s a big year for cloud native as well – Kubernetes is turning 10 years old! As of 2024, some of the largest brands in the world rely on Kubernetes and cloud native. Describing this as a time of “irrational exuberance” about AI and the future, Sharma reminded the crowd that six years ago at KubeCon, OpenAI said the future of AI would be powered by cloud native. They were right!

In this AI era, cloud native is building the future of technology. Many people in the audience confirmed via a show of hands that they were building AI-enabled features. But prototyping is “easy” while production is much harder, especially without standardization. Sharma stressed the need for AI standards and explained how the cloud native community can help in that journey, providing the guardrails needed for platform engineers to be successful.

To demonstrate going from prototype to production at scale, Sharma demoed loading a Kubernetes cluster, taking a picture of the audience, and immediately getting an AI-generated description of the photo. She also noted that a new cloud native whitepaper on AI was just released, further proof that the cloud native community is hard at work solving infrastructure problems for AI.

Sharma brought Paige Bailey, lead product manager (Generative and Models) at Google DeepMind; Timothée Lacroix, co-founder of Mistral AI; and Jeffrey Morgan, founder of Ollama, onstage to discuss their experiences and hopes for the open source future of AI, why open source matters for AI models, and how open source will make AI safer.

Keynote: Accelerating AI workloads with GPUs in Kubernetes

Kevin Klues and Sanjay Chatterjee of NVIDIA spoke about accelerating AI workloads using GPUs in Kubernetes. Klues stressed the importance of AI and cloud native, echoing Sharma’s “irrational exuberance” remark and offering that AI will power our next industrial revolution, with Kubernetes as the platform. He described his experiences enabling GPU support in Kubernetes today, including techniques to share GPUs across multiple workloads and how he is using Dynamic Resource Allocation (DRA), an API for requesting and sharing resources between pods and containers inside a pod, to take GPU support in Kubernetes to the next level.
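For a sense of the mechanics, here is a minimal sketch of how a GPU is commonly requested in Kubernetes today, via the extended nvidia.com/gpu resource exposed by the NVIDIA device plugin, using the Python Kubernetes client. DRA generalizes this model with richer, claim-based requests; the pod name and image tag below are illustrative, not from the talk:

```python
# Minimal sketch: requesting one GPU for a pod with the Python Kubernetes
# client. This shows the conventional device-plugin route (nvidia.com/gpu);
# DRA replaces this fixed counter with more expressive ResourceClaims.
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-inference-demo"),  # illustrative name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="nvcr.io/nvidia/pytorch:24.02-py3",  # illustrative tag
                command=["python", "-c",
                         "import torch; print(torch.cuda.is_available())"],
                resources=client.V1ResourceRequirements(
                    # The extended resource exposed by the NVIDIA device plugin.
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```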

Chatterjee joined the stage excited about how the world is falling in love with generative AI and shared how NVIDIA Picasso, an AI foundry for building and deploying generative AI models for visual design, solves some of the challenges of scaling Kubernetes. Specifically, he discussed the top three challenges: topology-aware placement, fault tolerance, and multi-dimensional optimization. Chatterjee ended with a call to action: this is a great time to solve challenging problems with GenAI, GPUs, and Kubernetes, and this is the Linux moment for Kubernetes, so let’s make it happen.

Keynote panel discussion: Optimizing performance and sustainability for AI

This keynote panel discussed how to enhance the efficiency and sustainability of AI workloads on Kubernetes for improved business value, and how to simplify Kubernetes for optimal performance. The panel also touched on innovative data management approaches, economic considerations, and more.

Overall, Kubernetes is great for LLMs, but a GPU-only approach may not be sustainable. There is work to do: Kubernetes is becoming the standard for AI platforms, and accelerated workloads must run better on it. Finally, resource allocation decisions must match actual usage patterns.

Speeding up data loading and pre-processing by attaching CPUs to GPU clusters, and choosing the right specialized compute for each AI model, will make it easier for research scientists to iterate faster. And finally, it’s critical that everyone work as a community to make accelerated workloads run better.
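As an illustration of that CPU-feeds-GPU pattern, here is a minimal PyTorch sketch (our example, not from the panel) in which CPU worker processes handle loading and pre-processing in parallel so the GPU stays busy; the dataset, batch size, and worker count are placeholders:

```python
# Illustrative sketch: CPU worker processes keep a GPU fed by doing data
# loading and pre-processing in parallel with GPU compute.
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    dataset = TensorDataset(torch.randn(10_000, 128))  # stand-in for real data
    loader = DataLoader(
        dataset,
        batch_size=256,
        num_workers=8,    # CPU processes do loading/pre-processing in parallel
        pin_memory=True,  # page-locked host memory speeds host-to-GPU copies
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for (batch,) in loader:
        batch = batch.to(device, non_blocking=True)  # overlap copy with compute
        # ... model forward/backward would run here ...

if __name__ == "__main__":
    main()
```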

Keynote: Platform building blocks: How to build ML infrastructure with CNCF Projects

Yuzhui Liu, team lead, and Leon Zhou, software engineer, at Bloomberg described a blueprint for crafting efficient, scalable platforms using the cloud native ecosystem, with Bloomberg as the use case. Bloomberg handles massive amounts of data in real time and has been applying AI across its products.

They also provided examples of how they are enhancing Bloomberg Terminal functions with AI, including Key News Themes, Company Analysis, and AI-Powered Earnings Summaries.

Kubernetes pattern evolution

Bilgin Ibryam of Diagrid and Roland Huß of Red Hat co-hosted a presentation on “10 Years of Kubernetes Pattern Evolution.” Interest in the topic was so strong that the room filled to capacity, with a line of roughly 75 more people waiting to see if they could get in. The co-presenters wrote the book “Kubernetes Patterns – Reusable Elements for Designing Cloud Native Applications” and walked through some of its patterns. The good news: thanks to a Red Hat sponsorship, the entire book can be downloaded for free.

AI Hub is back!

The very popular AI Hub kicked off with a keynote presentation, “Engineering foundations for AI innovation,” from Rajarajan Pudupatti, vice president of platforms-as-a-service at Fidelity. Although Pudupatti cautioned that Fidelity’s work in preparing to leverage AI is still in the “not released” stage, he offered practical advice to those trying to create internal developer platforms that can get the most out of artificial intelligence. He stressed the importance of creating a self-service IDP that abstracts away as much as possible so developers are free to focus on their key responsibilities. Using Kubernetes as a platform, Fidelity is working toward leveraging retrieval-augmented generation (RAG) for an improved documentation UX. He acknowledged that RAG can get complicated in production, but discussed techniques to manage that complexity, including query rewriting, re-ranking, and chunking.
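To make those three techniques concrete, here is a toy, self-contained Python sketch (our illustration, not Fidelity’s implementation): chunking splits documents for indexing, query rewriting normalizes the question before retrieval, and re-ranking keeps only the best candidates. Simple word overlap stands in for real embedding similarity, and all helper names are hypothetical:

```python
# Toy RAG pipeline sketch: chunking, query rewriting, and re-ranking.
# All helpers are illustrative stand-ins for production components.

def chunk(text: str, size: int = 50) -> list[str]:
    """Chunking: split a document into fixed-size word chunks for indexing."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def rewrite_query(query: str) -> str:
    """Query rewriting: normalize/expand the question before retrieval.
    Real systems often use an LLM here; we just lowercase and strip."""
    return query.lower().strip("?! ")

def score(query: str, chunk_text: str) -> float:
    """Stand-in relevance score: fraction of query words found in the chunk."""
    q = set(query.split())
    return len(q & set(chunk_text.lower().split())) / max(len(q), 1)

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """First-pass retrieval plus re-ranking: sort all candidates by score
    and keep only the top-k chunks to stitch into the LLM prompt."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]

docs = "Kubernetes schedules containers across a cluster of nodes ..."  # placeholder corpus
top_chunks = retrieve(rewrite_query("How does Kubernetes schedule pods?"), chunk(docs))
# top_chunks would then be passed to the LLM as grounding context.
```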

Other announcements