Helping Go teams implement OpenTelemetry: A new approach

By Ran Nozik February 22, 2023

Guest post originally published on the Helios blog by Ran Nozik

Developers can instrument their Go applications quickly and easily using Helios

OpenTelemetry (OTel), the emerging industry standard for application observability and distributed tracing across cloud-native and distributed architectures, is becoming an essential tool for Go developers. However, implementing OTel with Go to send data to observability platforms is hardly a straightforward process. At Helios, we’re on a mission to help as many teams as possible adopt distributed tracing. That’s why creating a frictionless experience of code instrumentation is something we’ve put a lot of focus into, as it might be the difference between adopting OpenTelemetry, or abandoning it. We’ve invested in thinking of a new approach to OTel Go instrumentation that is easy to implement and maintain, and at the same time is non-intrusive and easy to understand from the end user’s perspective. Below, we take a look at this new approach, which anyone can use by getting started for free with Helios.

How does OpenTelemetry work in different languages?

In two of my previous blog posts I wrote about how to deploy OTel in Java and Node. Go, an open source programming language that was developed by Google and has gained significant popularity in recent years, is different: While dynamic languages such as Node or Python support replacing function implementations at runtime (i.e., monkey-patching), and Java enables bytecode manipulation using the Java Agent mechanism, these options are not available for Go – at least not in a straightforward, maintainable, and safe way. Go is compiled to native machine code, making these runtime changes risky and intrusive (and they may potentially be detected as malicious by security scanners).

How is tracing done in Go today?

Based on instructions from the official OpenTelemetry documentation, distributed tracing in Go requires several manual steps. Even the most basic instrumentation of HTTP clients and servers requires adding code manually for each HTTP server endpoint and each client initialization.

package main
import (
    "context"
    "flag"
    "fmt"
    "io"
    "log"
    "net/http"
    "time"
    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/baggage"
    stdout "go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
    "go.opentelemetry.io/otel/propagation"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.12.0"
    "go.opentelemetry.io/otel/trace"
)
func initTracer() (*sdktrace.TracerProvider, error) {
    // Create stdout exporter to be able to retrieve
    // the collected spans.
    exporter, err := stdout.New(stdout.WithPrettyPrint())
    if err != nil {
        return nil, err
    }
    // For the demonstration, use sdktrace.AlwaysSample sampler to sample all traces.
    // In a production application, use sdktrace.TraceIDRatioBased with a desired sampling ratio.
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithSampler(sdktrace.AlwaysSample()),
        sdktrace.WithBatcher(exporter),
    )
    otel.SetTracerProvider(tp)
    otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
    return tp, err
}
func main() {
    tp, err := initTracer()
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := tp.Shutdown(context.Background()); err != nil {
            log.Printf("Error shutting down tracer provider: %v", err)
        }
    }()
    url := flag.String("server", "http://localhost:7777/hello", "server url")
    flag.Parse()
    client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
    bag, _ := baggage.Parse("username=donuts")
    ctx := baggage.ContextWithBaggage(context.Background(), bag)
    var body []byte
    tr := otel.Tracer("example/client")
    err = func(ctx context.Context) error {
        ctx, span := tr.Start(ctx, "say hello", trace.WithAttributes(semconv.PeerServiceKey.String("ExampleService")))
        defer span.End()
        req, _ := http.NewRequestWithContext(ctx, "GET", *url, nil)
        fmt.Printf("Sending request...\n")
        res, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        body, err = io.ReadAll(res.Body)
        _ = res.Body.Close()
        return err
    }(ctx)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Response Received: %s\n\n\n", body)
    fmt.Printf("Waiting for few seconds to export spans ...\n\n")
    time.Sleep(10 * time.Second)
    fmt.Printf("Inspect traces on stdout\n")
}

Source

Instrumenting a simple HTTP call requires configuring the client’s transport and making several separate calls into the native net/http package. There are also many pitfalls here, because there are multiple ways of initializing an HTTP client, and we’ve seen some of our customers make these mistakes.
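One such pitfall can be sketched with the standard library alone. In the example below, countingTransport is a hypothetical stand-in for an instrumented transport such as the one returned by otelhttp.NewTransport (it only counts requests, it does not create spans), and we assume the mistake in question is a client created without the wrapped transport. A client initialized as a bare http.Client falls back to http.DefaultTransport and silently bypasses the instrumentation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countingTransport is a hypothetical stand-in for an instrumented
// transport: it wraps another RoundTripper and observes every request
// that passes through it.
type countingTransport struct {
	base  http.RoundTripper
	count int64
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	atomic.AddInt64(&t.count, 1)
	return t.base.RoundTrip(req)
}

// observedRequests sends one request through a client that uses the
// wrapper and one through a plain client, then reports how many of the
// two requests the wrapper actually saw.
func observedRequests() (int64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	}))
	defer srv.Close()

	wrapped := &countingTransport{base: http.DefaultTransport}
	instrumented := &http.Client{Transport: wrapped}
	plain := &http.Client{} // nil Transport: uses http.DefaultTransport, not the wrapper

	for _, client := range []*http.Client{instrumented, plain} {
		res, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		res.Body.Close()
	}
	return atomic.LoadInt64(&wrapped.count), nil
}

func main() {
	seen, err := observedRequests()
	if err != nil {
		panic(err)
	}
	fmt.Printf("requests sent: 2, requests seen by the wrapper: %d\n", seen)
}
```

Only the request sent through the explicitly configured client is observed. Every other client in the codebase has to be found and wrapped by hand, which is the kind of manual work the rest of this post tries to remove.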

Possible approaches for implementing OTel in Go

When we confronted the challenge of finding an alternative Go instrumentation approach that requires minimal to no manual work, we knew we had to think outside the box. We understood that instrumenting at runtime was probably not going to be our choice, and that it would be better to do it at compile time. This is due to two characteristics of Go: (1) it compiles to native machine code; and (2) dynamically replacing function implementations isn’t as easy or safe as it is in JavaScript or Python. We considered several different approaches – each naturally had its own set of advantages and drawbacks.

We always kept in mind what our guiding principles are:

Safety – the approach must never put its host application at risk of memory corruption, segmentation faults, or similar issues.
Maintainability – it has to be relatively easily written and maintained without deep knowledge of the Go internals. Writing instrumentation in assembly or creating different instrumentation library versions per CPU architecture were out of the question.
Understandability – the end user, i.e. the developer, should be able to understand what’s happening, and debug if necessary. What seem to be magic tricks may not be trusted, and we want our audience to feel comfortable.

A brief overview of some of the approaches we considered:

Dynamic hooking – wrap function calls (the ones we’d like to instrument) in the compiled Go binary, during its execution. Our concerns with this approach were safety (manipulating compiled code), maintainability (instrumentation is written in assembly) and portability (the implementation is CPU architecture specific).
Static hooking – hook function calls by editing the compiled binary. Similar to dynamic hooking, but done after compilation rather than at run time. The concerns are the same as for dynamic hooking.
AST manipulation – parse the source code using Go’s AST and replace the relevant function calls with instrumented code. The main drawback of this approach is the complexity of adding support for new libraries, as each would require implementing a unique AST transformation.
Proxy libraries – create a library that wraps the original library and exposes the same API, with instrumentation added. Developers then manually replace imports of the original library with the proxy. The main drawbacks are the slight runtime overhead of the extra layer and the effort of maintaining the proxy libraries.

The Helios approach

After exploring all the options above, we decided to combine the AST manipulation and proxy library approaches (the last two options above) into a single mechanism: we create proxy libraries, and use AST manipulation to replace the imports during or before compilation.

The process would be as follows:

1. Create a proxy library that has the exact same interface as the original library. Only instrumented functions (and sometimes structs, which have their own complexity and could be a topic for another blog post) will be manipulated.

func Handle(pattern string, handler Handler) {
    handler = otelhttp.NewHandler(handler, pattern)
    realHttp.Handle(pattern, handler)
}

Source
(Unlike the existing approach of requiring the user to manually wrap the http.Handler using the otelhttp module, the proxy library exposes a Handle of its own, taking care of that automatically.)

2. Create a compile-time instrumentor executable that replaces imports of instrumented libraries with the proxy libraries. For example, import "net/http" is transformed into import http "github.com/helios/go-sdk/proxy-libs/helioshttp". The go.mod file is also updated accordingly, adding the proxy library.
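To give a feel for the import replacement in step 2, here is a minimal sketch built only on the standard library's go/parser, go/ast and go/format packages. It is not the Helios instrumentor: rewriteImport is a hypothetical helper, it handles a single source string rather than a whole module, and it leaves go.mod untouched. When the import had no explicit name, the sketch pins the original package name so that existing references such as http.Handle keep resolving.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"strconv"
)

// rewriteImport (hypothetical helper) parses src, points every import of
// oldPath at newPath, and returns the re-printed source together with the
// number of imports it changed.
func rewriteImport(src, oldPath, newPath, pkgName string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return "", 0, err
	}
	rewritten := 0
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil || path != oldPath {
			continue
		}
		imp.EndPos = imp.End() // pin the end position before the path length changes
		imp.Path.Value = strconv.Quote(newPath)
		if imp.Name == nil {
			// Keep the identifier that the rest of the file already uses.
			imp.Name = &ast.Ident{Name: pkgName, NamePos: imp.Path.Pos()}
		}
		rewritten++
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", 0, err
	}
	return buf.String(), rewritten, nil
}

func main() {
	src := `package main

import "net/http"

func main() {
	http.Handle("/hello", http.NotFoundHandler())
}
`
	out, n, err := rewriteImport(src, "net/http",
		"github.com/helios/go-sdk/proxy-libs/helioshttp", "http")
	if err != nil {
		panic(err)
	}
	fmt.Printf("rewrote %d import(s):\n%s", n, out)
}
```

Because only the import line changes, the body of the user's code stays exactly as written, which is what keeps the result easy to read and debug.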

Advantages of this approach

Writing a proxy library is relatively easy. It’s essentially similar to writing an instrumentation in any other language, besides the fact that non-instrumented functions need to be copied as well (but we’re working on removing the need for that as well, keeping the proxy library as compact as possible).
The solution is platform agnostic – as it happens at compile time.
It’s completely safe – in the worst-case scenario (and we made a point of avoiding it altogether, but still), the compilation process fails, and fixing it is straightforward.
It’s easy to understand – developers can see for themselves what the instrumentation does, and the source code of the proxy libraries is open for all and very easy to understand and debug.

A simple example to get a better understanding of the proxy lib behavior is our net/http proxy lib. In the example provided by OpenTelemetry, every HTTP endpoint should be manually wrapped with an OpenTelemetry wrapper handler. This is not very convenient, of course.

import (
    "net/http"
)
...
    helloHandler := func(w http.ResponseWriter, req *http.Request) {
        ctx := req.Context()
        span := trace.SpanFromContext(ctx)
        bag := baggage.FromContext(ctx)
        span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))
        _, _ = io.WriteString(w, "Hello, world!\n")
    }
    otelHandler := otelhttp.NewHandler(http.HandlerFunc(helloHandler), "Hello")
    http.Handle("/hello", otelHandler)
    err = http.ListenAndServe(":7777", nil)

Source

Helios’s proxy lib takes care of that, so that whenever calling http.Handle our implementation adds the wrapper handler automatically, and the user doesn’t have to worry about it at all.

import (
    http "github.com/helios/go-sdk/proxy-libs/helioshttp"
)
...
    helloHandler := func(w http.ResponseWriter, req *http.Request) {
        ctx := req.Context()
        span := trace.SpanFromContext(ctx)
        bag := baggage.FromContext(ctx)
        span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))
        _, _ = io.WriteString(w, "Hello, world!\n")
    }
    http.Handle("/hello", http.HandlerFunc(helloHandler))
    err = http.ListenAndServe(":7777", nil)
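The wrapping behavior can be tried out without any OpenTelemetry dependency. The sketch below is an illustration under stated assumptions, not the Helios proxy library: tracingHandler is a hypothetical stand-in for otelhttp.NewHandler that merely records the operation name, and the proxy type takes an explicit ServeMux (instead of the default one) so the example can be run repeatedly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tracingHandler is a hypothetical stand-in for otelhttp.NewHandler: it
// records the operation name of each request before delegating to next.
func tracingHandler(next http.Handler, operation string, record func(string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		record(operation)
		next.ServeHTTP(w, r)
	})
}

// proxy mimics the surface of a proxy library around a ServeMux.
type proxy struct {
	mux    *http.ServeMux
	record func(string)
}

// Handle has the same shape as http.Handle, but wraps the handler with
// the tracing stand-in before registering it.
func (p *proxy) Handle(pattern string, handler http.Handler) {
	p.mux.Handle(pattern, tracingHandler(handler, pattern, p.record))
}

// demo registers a handler through the proxy, serves one in-memory
// request, and returns the response body plus the recorded operations.
func demo() (string, []string) {
	var ops []string
	p := &proxy{
		mux:    http.NewServeMux(),
		record: func(op string) { ops = append(ops, op) },
	}
	// Application code: no manual wrapping of the handler.
	p.Handle("/hello", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello, world!")
	}))

	rec := httptest.NewRecorder()
	p.mux.ServeHTTP(rec, httptest.NewRequest("GET", "/hello", nil))
	return rec.Body.String(), ops
}

func main() {
	body, ops := demo()
	fmt.Printf("body=%q recorded=%v\n", body, ops)
}
```

The handler itself never mentions tracing, yet every request that reaches it has passed through the wrapper first.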

A few questions remained:

How do we generate the proxy libraries? Go modules have potentially hundreds of exported symbols. The task of creating a library could become time consuming, tiresome and prone to errors.
How do we make sure each proxy library’s interface remains aligned with the library it instruments? Regressions are always a risk.
How do we handle module versions? Like any instrumentation, this must be taken into account.

Additional tools to support our compile-time instrumentation of Go libraries

To productize the creation and maintenance of the proxy libraries, we created a set of internal tools that are used by the Helios team to streamline the development and maintenance of proxy libraries:

1. Proxy lib generator – a proxy library must match the interface of the original library it instruments. Since Go packages aren’t classes in an object-oriented language, inheritance isn’t possible. This means that every exported member of the original library should be created in the lib – potentially several hundreds of consts, vars, types and functions. The process of creating a library from scratch is time consuming and error prone. For that reason, we created a tool that reads the source code of a Go package and generates the boilerplate code for the proxy lib, leaving us with implementing the “business logic” of instrumentation itself in the functions that should in fact be instrumented.
For an original module that looks like this:

package test_package
import (
    "context"
)
type ExportedType1 = string
type ExportedType2 = int64
var ExportedVar1 = 1234
var ExportedVar2 = "abcd"
const ExportedConst1 = 1234
const ExportedConst2 = "abcd"
func ExportedFunc1() {
    // Do something
}
func ExportedFunc2(input string, input2 []int, input3 *string, input4 <-chan string, input5 context.Context, input6 *context.Context, input7 ...string) (string, error) {
    // Do something
    return "success", nil
}

The output of our proxy library generator would be:

package heliostest_package
import (
    "context"
    origin_test_package "test_package"
)
type ExportedType1 = origin_test_package.ExportedType1
type ExportedType2 = origin_test_package.ExportedType2
var ExportedVar1 = origin_test_package.ExportedVar1
var ExportedVar2 = origin_test_package.ExportedVar2
const ExportedConst1 = origin_test_package.ExportedConst1
const ExportedConst2 = origin_test_package.ExportedConst2
func ExportedFunc1() {
    origin_test_package.ExportedFunc1()
}
func ExportedFunc2(input string, input2 []int, input3 *string, input4 <-chan string, input5 context.Context, input6 *context.Context, input7 ...string) (string, error) {
    return origin_test_package.ExportedFunc2(input, input2, input3, input4, input5, input6, input7...)
}
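A heavily simplified version of such a generator fits in a few dozen lines of standard-library Go. The sketch below is not our internal tool, and generateProxy is a hypothetical name. It assumes a single source file, skips methods and generics, does not carry over imports needed by the signatures, and does not handle unnamed or blank parameters or signatures that mention the origin package's own types.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"strings"
)

// generateProxy (hypothetical) emits a forwarding declaration for every
// exported top-level symbol in src. It returns the gofmt-formatted proxy
// source and the number of symbols it forwarded.
func generateProxy(src, originPath string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "origin.go", src, 0)
	if err != nil {
		return "", 0, err
	}
	var b strings.Builder
	n := 0
	fmt.Fprintf(&b, "package helios%s\n\nimport origin %q\n\n", file.Name.Name, originPath)
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.GenDecl: // const, var and type declarations
			for _, spec := range d.Specs {
				switch s := spec.(type) {
				case *ast.TypeSpec:
					if s.Name.IsExported() {
						fmt.Fprintf(&b, "type %s = origin.%s\n", s.Name.Name, s.Name.Name)
						n++
					}
				case *ast.ValueSpec:
					for _, name := range s.Names {
						if name.IsExported() {
							fmt.Fprintf(&b, "%s %s = origin.%s\n", d.Tok, name.Name, name.Name)
							n++
						}
					}
				}
			}
		case *ast.FuncDecl:
			if d.Recv != nil || d.Body == nil || !d.Name.IsExported() {
				continue // methods travel with their aliased types
			}
			// Reuse the original signature text verbatim, up to the body.
			sig := src[fset.Position(d.Pos()).Offset:fset.Position(d.Body.Lbrace).Offset]
			var args []string
			for _, field := range d.Type.Params.List {
				for _, name := range field.Names {
					arg := name.Name
					if _, variadic := field.Type.(*ast.Ellipsis); variadic {
						arg += "..."
					}
					args = append(args, arg)
				}
			}
			call := fmt.Sprintf("origin.%s(%s)", d.Name.Name, strings.Join(args, ", "))
			if d.Type.Results != nil {
				call = "return " + call
			}
			fmt.Fprintf(&b, "%s{\n%s\n}\n", sig, call)
			n++
		}
	}
	out, err := format.Source([]byte(b.String())) // also rejects invalid syntax
	return string(out), n, err
}

func main() {
	const origin = `package test_package

type ExportedType1 = string

var ExportedVar1 = 1234

const ExportedConst1 = "abcd"

func ExportedFunc1() {}

func ExportedFunc2(input string, rest ...string) (string, error) {
	return "success", nil
}

func unexported() {}
`
	code, n, err := generateProxy(origin, "test_package")
	if err != nil {
		panic(err)
	}
	fmt.Printf("// forwarded %d symbols\n%s", n, code)
}
```

The generated functions only forward their arguments. The instrumentation logic is then written by hand in the handful of functions that actually need it.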

2. Library interface validator – we’d like to make sure that the interface of the proxy library matches the original library. This is crucial to protect ourselves from breaking the interface in the future, and to shorten the process of checking if new releases of the instrumented package have introduced an interface change, which should be addressed.

3. Tagging of versions – as written above, the proxy library interface must match the interface of the original package, which could of course vary between different releases. For example, the net/http interface changed between Go 1.18 and 1.19 (some exported symbols, such as the MaxBytesError type, were added). To handle that, we had to tag different versions of the proxy libraries to match their corresponding instrumented libraries, and make sure that our instrumentor adds an import to the correct proxy library version at compile time.
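The interface check from item 2 can be approximated by comparing exported symbol names. The sketch below is a simplification and not our internal validator: exportedSymbols and missingFromProxy are hypothetical helpers, they work on a single source string per package, and they compare names only. A fuller check would presumably also compare types and signatures, for example with go/types.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// exportedSymbols (hypothetical) returns the set of exported top-level
// names declared in src: functions, types, vars and consts.
func exportedSymbols(src string) (map[string]bool, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "pkg.go", src, 0)
	if err != nil {
		return nil, err
	}
	names := make(map[string]bool)
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil && d.Name.IsExported() {
				names[d.Name.Name] = true
			}
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				switch s := spec.(type) {
				case *ast.TypeSpec:
					if s.Name.IsExported() {
						names[s.Name.Name] = true
					}
				case *ast.ValueSpec:
					for _, name := range s.Names {
						if name.IsExported() {
							names[name.Name] = true
						}
					}
				}
			}
		}
	}
	return names, nil
}

// missingFromProxy (hypothetical) lists, in sorted order, the exported
// names that the original package declares but the proxy does not.
func missingFromProxy(original, proxy string) ([]string, error) {
	want, err := exportedSymbols(original)
	if err != nil {
		return nil, err
	}
	got, err := exportedSymbols(proxy)
	if err != nil {
		return nil, err
	}
	var missing []string
	for name := range want {
		if !got[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	// A newer release of the original package added NewSymbol.
	original := "package lib\n\nfunc Handle() {}\n\ntype NewSymbol struct{}\n"
	proxy := "package helioslib\n\nfunc Handle() {}\n"
	missing, err := missingFromProxy(original, proxy)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing from proxy:", missing)
}
```

Run against each new release of an instrumented package, a check like this flags interface drift before users hit a compilation error.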

Wrap-up: There’s an easier way to implement OTel with Go

Go differs from dynamic and bytecode-based languages in ways that directly affect how OTel can be implemented for it. At Helios, we’ve developed a new approach that requires little to no manual work to instrument Go code. We made the solution much easier and safer so that dev teams can adopt distributed tracing more seamlessly.

Figure: An E2E trace visualization of the official OTel demo application with the Helios instrumentation, including its Go services. Check out the full visualization on the Helios OpenTelemetry Sandbox.

If you’re a Go developer, we invite you to start instrumenting your code for free with Helios.
