Ambassador post originally published on Medium by Dotan Horovits

It’s about time open observability had its own industry-wide, vendor-neutral event. This year, the Cloud Native Computing Foundation (CNCF) finally made it happen with the inaugural Open Observability Summit, bringing together contributors, practitioners, and end users for a packed day of learning and collaboration.

I was honored to deliver a keynote at this important event, and to have my podcast, OpenObservability Talks, as an event partner. I dedicated my latest episode to the summit and invited two fellow observability veterans who also spoke there: Alok Bhide, member of the summit’s content committee and Head of Product Innovation at Chronosphere, and Henrik Rexed, Developer Advocate at Dynatrace, CNCF Ambassador, and host of the Is It Observable podcast. Here’s a recap of the highlights we unpacked; you can find the full recording on all the podcast apps, with links below.

The first foundation-led conference on open observability

Where’s the community’s meeting point to discuss open source observability? There are quite a few conferences out there on observability, but they are largely owned by vendors in this space. And with Monitorama currently on pause, we have been left with no good place to come together.

The CNCF has done a great job of bringing everyone together, dovetailing with Open Source Summit North America. Alok, who served on the program committee and got to review and select talks from the call for papers, described the balance of the talks and the breadth of projects covered: “combination of foundational talks and the shining new things… and across the different projects: OTel, FluentBit, Jaeger, Prometheus, everything.”

This resulted in some great talks, which are already available online on the CNCF’s YouTube channel. Let’s dive into the highlights that caught our attention.

Fluent Bit vs. OpenTelemetry Collector: A benchmark face-off

At the event, Henrik delivered an interesting benchmark analysis comparing Fluent Bit and the OpenTelemetry Collector. Performance discussions in observability pipelines are never just academic: when it comes to high-throughput data processing, small efficiencies scale fast.

“People often ask, ‘Which collector should I use?’ and now we have some data to help answer that,” said Henrik. The analysis offered detailed findings, including one that Henrik ended up refuting right after delivering his talk. While Fluent Bit came out ahead in some CPU and memory usage metrics, the OTel Collector offered unmatched extensibility, leaving the answer, as always, at “it depends.”

This talk stood out not just for the results but for modeling how open benchmarking should be done: transparent methodology, reproducible results, and a clear view into the trade-offs. The full talk is available here.

Real-world scale: Observability at eBay

Enterprise-scale observability often means working with staggering volumes of telemetry, and eBay’s session gave a rare inside look into how they manage it. “They’re handling levels of scale that few companies can relate to, and they’re doing it with open source,” said Henrik.

From innovative pipeline designs to data reduction strategies, eBay’s Vijay Samuel and Xin Wei Tang shared not just architectural diagrams but hard-won lessons on scaling open observability in production. I actually hosted Vijay on a previous OpenObservability Talks episode, in which he shared more on eBay’s planet-scale observability.

I also found eBay’s case study to be a great example of a hyperscaler that uses open source observability and, wherever it doesn’t exactly fit its needs, doesn’t give up but rather goes in, makes modifications, and engages the community to enhance the open source project. The full talk is available here.

Bringing OpenTelemetry to Android with Kotlin SDK

Mobile observability remains a notoriously underserved area, especially for Android developers. That’s why the talk by Hanson Ho introducing the Kotlin SDK for OpenTelemetry caught our attention.

“It’s not just a wrapper; it’s a full SDK implementation,” Henrik noted. The session addressed the unique constraints of mobile environments (battery, network, storage) and how OTel can be adapted accordingly. It’s important to note that while at the time of the conference the Kotlin SDK was hosted under Embrace’s GitHub organization, it is in the process of being donated to OpenTelemetry. The full talk is available here.

I hosted Hanson on OpenObservability Talks earlier this year, where we discussed mobile observability more broadly and how to achieve it with OpenTelemetry.

Deep-dive on tuning the OpenTelemetry Collector

If you’re running the OpenTelemetry Collector in production and haven’t tuned it yet, you’re probably leaving performance on the table.

Two of the technical talks at the summit focused on exactly this: how to fine-tune the Collector to handle high-throughput pipelines without bottlenecks. Queue sizes, batch processors, memory management — the session by Yuri Oliveira and Alex Boten and the one by Denton Krietz explored how each component affects stability and efficiency.
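
To make the batching trade-off concrete, here is a minimal Python sketch of the general idea behind a batch stage: flush when the batch is full, or when a partial batch has waited too long. It is an illustration only and not the Collector’s actual implementation; the `Batcher` class and its `max_size` and `timeout_s` parameters are hypothetical names, loosely analogous to the `send_batch_size` and `timeout` settings of the Collector’s batch processor.

```python
from typing import Callable, List


class Batcher:
    """Toy batcher (hypothetical): flush on a full batch or on a timeout."""

    def __init__(self, max_size: int, timeout_s: float,
                 export: Callable[[List[str]], None]) -> None:
        self.max_size = max_size      # flush as soon as this many items are buffered
        self.timeout_s = timeout_s    # or once the oldest item has waited this long
        self.export = export          # callback that ships a batch downstream
        self._items: List[str] = []
        self._first_at = 0.0          # arrival time of the oldest buffered item

    def add(self, item: str, now: float) -> None:
        """Buffer one item; flush immediately if the batch is now full."""
        if not self._items:
            self._first_at = now
        self._items.append(item)
        if len(self._items) >= self.max_size:
            self.flush()

    def tick(self, now: float) -> None:
        """Call periodically; flushes a partial batch that has waited too long."""
        if self._items and now - self._first_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        """Export whatever is buffered and start a fresh batch."""
        if self._items:
            self.export(self._items)
            self._items = []
```

A larger `max_size` means fewer export calls but more memory held in flight and more latency for each item, which is the kind of trade-off that tuning has to balance.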

“The great thing is, these are not abstract tips — they’re coming from people who’ve broken and fixed it in prod,” I noted on the show. It’s talks like this that help bridge the gap between just deploying open observability tools and running them well at scale. The full talks are available here and here.

Broadcom’s end-to-end observability: From mobile to mainframe

Another standout was Broadcom’s use case, covering observability from mobile to mainframe. That’s right — not just Kubernetes and microservices, but also COBOL and legacy systems.

Broadcom’s Vashistha Kumar Singh and Martin Tali showcased how they built a cohesive observability strategy that spans decades of technology, using OpenTelemetry as a unifying layer, and running it at an impressive scale: they scaled OpenTelemetry to handle 3 million metrics/sec and 0.5 million spans/sec. They achieved that scale with an Apache Kafka-based ingestion pipeline with backpressure handling.
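
For readers less familiar with backpressure, the following is a minimal Python sketch of the general pattern: a bounded buffer that refuses new records when full, so the sender knows to slow down and retry instead of overwhelming the pipeline. It is not Broadcom’s implementation and does not use Kafka; the in-memory queue, the `offer` helper, and the tiny capacity are all assumptions made purely for illustration.

```python
import queue


def offer(buffer: queue.Queue, record: str) -> bool:
    """Try to enqueue a record; False tells the sender to slow down and retry."""
    try:
        buffer.put_nowait(record)
        return True
    except queue.Full:
        return False


# Hypothetical, deliberately tiny capacity so the effect is easy to see.
buffer = queue.Queue(maxsize=2)

# Three records arrive faster than they are consumed: the third is refused.
accepted = [offer(buffer, f"metric-{i}") for i in range(3)]

# Once a consumer drains one record, there is room for new data again.
buffer.get_nowait()
accepted.append(offer(buffer, "metric-3"))
```

In a real pipeline a durable log typically plays the role of the buffer, absorbing bursts so that slow consumers do not cause data loss upstream.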

For organizations with hybrid environments — and let’s face it, that’s most enterprises — this was a powerful case study.

Spotify’s metric migration: From in-house TSDB to VictoriaMetrics and Prometheus

Spotify presented one of the most compelling migration stories of the day: moving from their internal time-series database to VictoriaMetrics and Prometheus.

Lauren Roshore of Spotify detailed the reasons behind the shift and the two-year migration journey to replace the in-house time-series database (TSDB). She shared the architecture of the new system and how they achieved a smooth switchover with minimal disruption. She also showed how they achieved 10x faster query speeds and significant cost savings.

It’s a great example of how open standards and cloud-native tooling can replace aging custom systems, even at one of the largest streaming platforms in the world. The full talk is available here.

Introducing Rotel: A Rust-based alternative to the OpenTelemetry Collector

One of the newer projects to emerge on the scene is Rotel, a Rust-based rewrite of the OpenTelemetry Collector, which was presented by the maintainers Mike Heffner and Ray Jenkins of Streamfold.

“This is something the community’s been asking for — lighter weight, better performance, and built with modern safety guarantees,” said Henrik.

Rotel aims to deliver the same extensibility as the Go-based Collector, but with a smaller memory footprint and more predictable performance characteristics — thanks to Rust’s strict memory model. It’s early days, but the project received a lot of interest at the summit and could become a viable alternative for production environments where resources are tight. The full talk is available here.

AI is a hot topic in observability

Artificial Intelligence (AI) was, not surprisingly, in the spotlight at this conference (as well as at the preceding Open Source Summit NA). Alok gave some good practical examples of where he sees AI aiding engineers in their day-to-day observability, such as log summarization in natural language, log correlation, as well as streaming together various tools and asking cross-cutting questions: “people have things like Cursor… I want to stream Sentry, I want to stream Slack… I want to stream all of that together and ask questions… give me a sense of what’s going on.”

Besides AI helping us with our observability, there’s another challenge in the age of AI: how to monitor the AI/ML applications themselves. This can be especially challenging with the constantly evolving landscape of GenAI. This is the problem the Monocle project aims to solve. The project, governed by the LF AI & Data Foundation as a Sandbox project, helps GenAI developers trace their applications. Monocle supports tracing across GenAI technology components, application frameworks, and LLM hosting services. Prasad Mujumdar gave a good introductory talk about Monocle at the conference. The talk is available here.

Want to learn more? Check out the OpenObservability Talks episode: Highlights from CNCF’s First Open Observability Summit.