David Hope

Tailoring span names and enriching spans without changing code with OpenTelemetry - Part 1

The OpenTelemetry Collector offers powerful capabilities to enrich and refine telemetry data before it reaches your observability tools. In this blog post, we'll explore how to leverage the Collector to create more meaningful transaction names in Elastic Observability, significantly enhancing the value of your monitoring data.

Consider this scenario: You have a transaction labeled simply as "HTTP GET" with an average response time of 5ms. However, this generic label masks a variety of distinct operations – payment processing, user logins, and adding items to a cart. Does that 5ms average truly represent the performance of these diverse actions? Clearly not.

The other problem is that dissimilar spans all get mixed into the same bucket: login spans and image-serving spans end up grouped together, which makes analyses like latency correlation difficult in Elastic.

We'll focus on a specific technique using the collector's attributes and transform processors to extract meaningful information from HTTP URLs and use it to create more descriptive span names. This approach not only improves the accuracy of your metrics but also enhances your ability to quickly identify and troubleshoot performance issues across your microservices architecture.

By using these processors in combination, we can quickly address the issue of overly generic transaction names, creating more granular and informative identifiers that provide accurate visibility into your services' performance.

However, it's crucial to approach this technique with caution. While more detailed transaction names can significantly improve observability, they can also lead to an unexpected challenge: cardinality explosion. As we dive into the implementation details, we'll also discuss how to strike the right balance between granularity and manageability, ensuring that our solution enhances rather than overwhelms our observability stack.

In the following sections, we'll walk through the configuration step-by-step, explaining how each processor contributes to our goal, and highlighting best practices to avoid potential pitfalls like cardinality issues. Whether you're new to OpenTelemetry or looking to optimize your existing setup, this guide will help you unlock more meaningful insights from your telemetry data.

Prerequisites and configuration

If you plan on following this blog, here are some of the components and details we used to set up the configuration:

  • Ensure you have an account on Elastic Cloud and a deployed stack (see instructions here).
  • I am also using the OpenTelemetry demo in my environment; this is important to follow along with, as this demo has the specific issue I want to address. You should clone the repository and follow the instructions here to get it up and running. I recommend using Kubernetes, and I will be doing this in my AWS EKS (Elastic Kubernetes Service) environment.

The OpenTelemetry Demo

The OpenTelemetry Demo is a comprehensive, microservices-based application designed to showcase the capabilities and best practices of OpenTelemetry instrumentation. It simulates an e-commerce platform, incorporating various services such as frontend, cart, checkout, and payment processing. This demo serves as an excellent learning tool and reference implementation for developers and organizations looking to adopt OpenTelemetry.

The demo application generates traces, metrics, and logs across its interconnected services, demonstrating how OpenTelemetry can provide deep visibility into complex, distributed systems. It's particularly useful for experimenting with different collection, processing, and visualization techniques, making it an ideal playground for exploring observability concepts and tools like the OpenTelemetry Collector.

By using real-world scenarios and common architectural patterns, the OpenTelemetry Demo helps users understand how to effectively implement observability in their own applications and how to leverage the data for performance optimization and troubleshooting.

Once you have an Elastic Cloud instance and you fire up the OpenTelemetry demo, you should see something like this on the Elastic Service Map page:

Navigating to the traces page will give you the following view.

As you can see, there are some very broad transaction names here, like HTTP GET, and the averages shown will not be very accurate for the specific business functions within your services.

So let's fix that with the OpenTelemetry Collector.

The OpenTelemetry Collector

The OpenTelemetry Collector is a vital component in the OpenTelemetry ecosystem, serving as a vendor-agnostic way to receive, process, and export telemetry data. It acts as a centralized observability pipeline that can collect traces, metrics, and logs from various sources, then transform and route this data to multiple backend systems.

The collector's flexible architecture allows for easy configuration and extension through a wide range of receivers, processors, and exporters, which you can explore over here. I have personally found navigating the 'contrib' repository incredibly useful for finding techniques that I didn't know existed. This makes the OpenTelemetry Collector an invaluable tool for organizations looking to standardize their observability data pipeline, reduce overhead, and seamlessly integrate with different monitoring and analysis platforms.
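To make that pipeline model concrete, here is a minimal, illustrative collector configuration (the endpoint and component choices are generic placeholders, not taken from the demo): spans arrive through a receiver, pass through processors, and leave through an exporter.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  # The debug exporter simply prints telemetry to the collector's own logs.
  debug: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

Every pipeline we build in this post, including the one that renames spans, follows this same receiver, processor, exporter shape.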

Let's go back to our problem: how do we change the transaction names that Elastic is using to something more useful, so that our HTTP GET translates to something like payment-service/login? The first thing we do is take the full HTTP URL and consider which parts of it relate to our transaction. Looking at the span details, we see a URL:

my-otel-demo-frontendproxy:8080/api/recommendations?productIds=&sessionId=45a9f3a4-39d8-47ed-bf16-01e6e81c80bc&currencyCode=

Now, obviously we wouldn't want to create transaction names that map to every single session ID; that would lead to the cardinality explosion we talked about earlier. However, something like the first two path segments of the URL, 'api/recommendations', looks like exactly the kind of thing we need.

The attributes processor

The OpenTelemetry Collector gives us a useful tool here: the attributes processor can help us extract parts of the URL to use later in our observability pipeline. Doing this is very simple; we just build a regex like the one below. Now, I should mention that I did not write this regex myself but used an LLM to generate it for me; never fear regex again!

attributes:
  actions:
    - key: http.url
      action: extract
      pattern: '^(?P<short_url>https?://[^/]+(?:/[^/]+)*)(?:/(?P<url_truncated_path>[^/?]+/[^/?]+))(?:\?|/?$)'

This configuration is doing some heavy lifting for us, so let's break it down:

  • We're using the attributes processor, which is perfect for manipulating span attributes.
  • We're targeting the http.url attribute of incoming spans.
  • The extract action tells the processor to pull out specific parts of the URL using our regex pattern.

Now, about that regex - it's designed to extract two key pieces of information:

  1. short_url: This captures the protocol, domain, and optionally the first path segment. For example, in "https://example.com/api/users/profile", it would grab "https://example.com/api".
  2. url_truncated_path: This snags the next two path segments (if they exist). In our example, it would extract "users/profile".

Why is this useful? Well, it allows us to create more specific transaction names based on the URL structure, without including overly specific details that could lead to cardinality explosion. For instance, we avoid capturing unique IDs or query parameters that would create a new transaction name for every single request.

So, if we have a URL like "https://example.com/api/users/profile?id=123", our extracted url_truncated_path would be "users/profile". This gives us a nice balance - it's more specific than just "HTTP GET", but not so specific that we end up with thousands of unique transaction names.

Now, it's worth mentioning here that if you don't have an attribute you want to use for naming your transactions, it's worth looking at the options for your SDK or agent. As an example, the Java automatic instrumentation OTel agent has options for capturing request and response headers, sketched below. You can then use this data to name your transactions if the URL is insufficient!
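As a rough sketch of what that can look like (assuming the Java agent's standard capture-headers options, supplied here as the equivalent environment variables on a Kubernetes container; the X-Tenant-Id header is purely hypothetical), you could capture a header as a span attribute and then use it for naming just like we use url_truncated_path:

# Hypothetical env vars for a container running a service instrumented with the OTel Java agent.
# They correspond to the agent's capture-headers settings; check the exact names against your agent version.
env:
  - name: OTEL_INSTRUMENTATION_HTTP_SERVER_CAPTURE_REQUEST_HEADERS
    value: "X-Tenant-Id"       # captured onto server spans as an http.request.header.* attribute
  - name: OTEL_INSTRUMENTATION_HTTP_SERVER_CAPTURE_RESPONSE_HEADERS
    value: "Content-Type"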

In the next steps, we'll see how to use this extracted information to create more meaningful span names, providing better granularity in our observability data without overwhelming our system. Remember, the goal is to enhance our visibility, not to drown in a sea of overly specific metrics!

The transform processor

Now that we've extracted the relevant parts of our URLs, it's time to put that information to good use. Enter the transform processor - our next powerful tool in the OpenTelemetry Collector pipeline.

The transform processor allows us to modify various aspects of our telemetry data, including span names. Here's the configuration we'll use:

transform:
  trace_statements:
    - context: span
      statements:
        - set(name, attributes["url_truncated_path"])

Let's break this down:

  • We're using the transform processor, which gives us fine-grained control over our spans.
  • We're focusing on trace_statements, as we want to modify our trace spans.
  • The context: span tells the processor to apply these changes to each individual span.
  • Our statement is where the magic happens: we're setting the span's name to the value of the url_truncated_path attribute we extracted earlier.

What does this mean in practice? Remember our previous example URL "https://example.com/api/users/profile?id=123"? Instead of a generic span name like "HTTP GET", we'll now have a much more informative name: "users/profile".

This transformation brings several benefits:

  1. Improved Readability: At a glance, you can now see what part of your application is being accessed.
  2. Better Aggregation: You can easily group and analyze similar requests, like all operations on user profiles.
  3. Balanced Cardinality: We're specific enough to be useful, but not so specific that we create a new span name for every unique URL.

By combining the attribute extraction we did earlier with this transformation, we've created a powerful system for generating meaningful span names. This approach gives us deep insight into our application's behavior without the risk of cardinality explosion.
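One small refinement worth considering (a sketch only; it is not part of the demo configuration): not every span will carry a url_truncated_path attribute, since not every URL matches the regex. An OTTL where clause lets you make that condition explicit and rename only the spans where the extraction succeeded:

transform:
  trace_statements:
    - context: span
      statements:
        # Only rename spans where the attributes processor actually extracted a value.
        - set(name, attributes["url_truncated_path"]) where attributes["url_truncated_path"] != nil

The same where pattern is handy as you add further statements that should apply only to spans with a matching URL.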

Putting it All Together

The resulting config for the OpenTelemetry Collector is below. Remember, this goes into opentelemetry-demo/kubernetes/elastic-helm/configmap-deployment.yaml and is applied with kubectl apply -f configmap-deployment.yaml.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-otelcol-agent
  namespace: default
  labels:
    app.kubernetes.io/name: otelcol

data:
  relay: |
    connectors:
      spanmetrics: {}
    exporters:
      debug: {}
      otlp/elastic:
        endpoint: ${env:ELASTIC_APM_ENDPOINT}
        compression: none
        headers:
          Authorization: Bearer ${ELASTIC_APM_SECRET_TOKEN}
    extensions:
    processors:
      batch: {}
      resource:
        attributes:
          - key: deployment.environment
            value: "opentelemetry-demo"
            action: upsert
      attributes:
        actions:
          - key: http.url
            action: extract
            pattern: '^(?P<short_url>https?://[^/]+(?:/[^/]+)*)(?:/(?P<url_truncated_path>[^/?]+/[^/?]+))(?:\?|/?$)'
      transform:
        trace_statements:
          - context: span
            statements:
              - set(name, attributes["url_truncated_path"])
    receivers:
      httpcheck/frontendproxy:
        targets:
        - endpoint: http://example-frontendproxy:8080
      otlp:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:4317
          http:
            cors:
              allowed_origins:
              - http://*
              - https://*
            endpoint: ${env:MY_POD_IP}:4318
    service:
      extensions:
      pipelines:
        logs:
          exporters:
          - debug
          - otlp/elastic
          processors:
          - batch
          - resource
          - attributes
          - transform
          receivers:
          - otlp
        metrics:
          exporters:
          - otlp/elastic
          - debug
          processors:
          - batch
          - resource
          receivers:
          - httpcheck/frontendproxy
          - otlp
          - spanmetrics
        traces:
          exporters:
          - otlp/elastic
          - debug
          - spanmetrics
          processors:
          - batch
          - resource
          - attributes
          - transform
          receivers:
          - otlp
      telemetry:
        metrics:
          address: ${env:MY_POD_IP}:8888

You'll notice that we tie everything together by adding our enrichment and transformations to the traces section in pipelines at the bottom of the collector config. Note that order matters here: the attributes processor must run before the transform processor, since the transform statement relies on the url_truncated_path attribute created by the extraction. This is the definition of our observability pipeline, bringing together all the pieces we've discussed to create more meaningful and actionable telemetry data.

By implementing this configuration, you're taking a significant step towards more insightful observability. You're not just collecting data; you're refining it to provide clear, actionable insights into your application's performance. Check out the final result below!

Ready to Take Your Observability to the Next Level?

Implementing OpenTelemetry with Elastic Observability opens up a world of possibilities for understanding and optimizing your applications. But this is just the beginning! To further enhance your observability journey, check out these valuable resources:

  1. Infrastructure Monitoring with OpenTelemetry in Elastic Observability
  2. Explore More OpenTelemetry Content
  3. Using the OTel Operator for Injecting Java Agents
  4. What is OpenTelemetry?

We encourage you to dive deeper, experiment with these configurations, and see how they can transform your observability data. Remember, the key is to find the right balance between detail and manageability.

Have you implemented similar strategies in your observability pipeline? We'd love to hear about your experiences and insights. Share your thoughts in the comments below or reach out to us on our community forums.

Stay tuned for Part 2 of this series, where we will look at an advanced technique that can help you get even more granular by collecting span names, baggage, and data for metrics using a Java plugin, all without code.
