Project announcement by OpenTelemetry maintainers
In 2023, OpenTelemetry announced that it had achieved stability for logs, metrics, and traces. While this was our initial goal at the formation of the project, fulfilling our vision of enabling built-in observability for cloud native applications requires us to continue evolving with the community. Exactly two years after the Profiling SIG was created at KubeCon + CloudNativeCon Europe 2022 in Valencia, we’re proud to announce that we’re taking a big step towards this goal by merging a profiling data model into our specification and working towards a stable implementation this year!
What is profiling?
Profiling is a method to dynamically inspect the behavior and performance of application code at runtime. Continuous profiling gives insights into resource utilization at the code level and allows this profiling data to be stored, queried, and analyzed over time and across different attributes. It’s an important technique for developers and performance engineers to understand exactly what’s happening in their code. OpenTelemetry’s profiling signal expands upon the work that has been done in this space and, as a first for the industry, connects profiles with other telemetry signals from applications and infrastructure. This allows developers and operators to correlate resource exhaustion or poor user experience across their services with not just the specific service or pod being impacted, but the function or line of code most responsible for it.
We’re thrilled to see the industry embrace this vision, with many organizations coming together to help define the profiling signal. More specifically, two donations to the project will accelerate the delivery and implementation of OpenTelemetry profiling:
- Elastic has pledged to donate their proprietary eBPF-based profiling agent
- Splunk has begun the process of donating their .NET-based profiler
What does this mean for users?
Profiles will support bi-directional links between themselves and other signals, such as logs, metrics, and traces. You’ll be able to easily jump from resource telemetry to a corresponding profile. For example:
- Metrics to profiles: You will be able to go from a spike in CPU usage or memory usage to the specific pieces of the code which are consuming that resource
- Traces to profiles: You will be able to see not just where latency occurs across your services but, when that latency is caused by specific pieces of code, which code is responsible, through a profile attached to the trace or span
- Logs to profiles: Logs often give the context that something is wrong, but profiling will allow you to go from just tracking a problem (e.g. Out Of Memory errors) to seeing exactly which parts of the code are using up memory resources
These are just a few examples, and the links work in the opposite direction as well. More generally, profiling helps deliver on the promise of observability by making it easier for users to query and understand an entirely new dimension of their applications with minimal additional code or effort.
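To make the idea of bi-directional links concrete, here is a minimal Python sketch. It is not the OpenTelemetry profiling data model or any OpenTelemetry API: the `Span`, `ProfileSample`, and `ProfileIndex` types, their fields, and the sample data are all hypothetical, and the sketch simply assumes that each profile sample records the trace and span that were active when it was captured.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    name: str


@dataclass(frozen=True)
class ProfileSample:
    # Hypothetical shape: call stack (leaf frame last), CPU time in ms,
    # and the IDs of the span that was active when the sample was taken.
    stack: tuple
    cpu_ms: int
    trace_id: str
    span_id: str


class ProfileIndex:
    """Indexes samples by (trace_id, span_id) so lookups work in both directions."""

    def __init__(self, samples):
        self._by_span = defaultdict(list)
        for sample in samples:
            self._by_span[(sample.trace_id, sample.span_id)].append(sample)

    def hottest_functions(self, span, top=3):
        # Trace -> profile: total CPU time per leaf function within one span.
        totals = defaultdict(int)
        for sample in self._by_span.get((span.trace_id, span.span_id), []):
            totals[sample.stack[-1]] += sample.cpu_ms
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

    def spans_for_function(self, function):
        # Profile -> trace: which spans had this function on the stack?
        return sorted(
            key
            for key, samples in self._by_span.items()
            if any(function in sample.stack for sample in samples)
        )


# Illustrative data only.
samples = [
    ProfileSample(("handle_checkout", "serialize_cart"), 40, "t1", "s1"),
    ProfileSample(("handle_checkout", "serialize_cart"), 35, "t1", "s1"),
    ProfileSample(("handle_checkout", "charge_card"), 10, "t1", "s1"),
    ProfileSample(("handle_search", "rank_results"), 20, "t2", "s7"),
]
index = ProfileIndex(samples)
slow_span = Span("t1", "s1", "POST /checkout")

print(index.hottest_functions(slow_span))
print(index.spans_for_function("rank_results"))
```

Starting from a slow span, `hottest_functions` points at the code consuming the most CPU inside it; starting from a function seen in a profile, `spans_for_function` returns the requests that exercised it. The same keyed lookup could in principle be built from metric or log attributes instead of span IDs.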
A community in motion
This work would not be possible without the dedicated contributors who work on OpenTelemetry each day. We’ve recently passed a new milestone, with over 1,000 unique developers contributing to the project each month, representing over 180 companies. Across our most popular repositories, OpenTelemetry sees over 30 million downloads a month, and new open source projects are adopting our standards at a regular pace, including Apache Kafka and dozens more. We’re also deepening our integrations with other open source projects both inside and outside the CNCF, such as OpenFeature and OpenSearch, in addition to our existing integrations with Kubernetes, Thanos, Knative, and many more.
2024 promises to be another big year for OpenTelemetry as we continue to implement and stabilize our existing tracing, metrics, and log signals while adding support for profiling, client-side RUM, and more. It’s a great time to get involved – check out our website to learn more!