In the complex world of microservices and distributed systems, understanding the intricacies and inefficiencies of service interactions and request flows has become a paramount challenge. Distributed tracing is essential to that understanding. But distributed tracing, whether applied manually or through auto-instrumentation, is usually rather coarse-grained: it covers only a limited fraction of the system and can easily miss the parts that would be the most useful to trace.
Addressing this gap, Elastic developed the concept of inferred spans, a powerful enhancement to traditional instrumentation-based tracing, implemented as an extension for the OpenTelemetry Java SDK/Agent. We are in the process of contributing this feature back to OpenTelemetry; until then, our extension can be used seamlessly with the existing OpenTelemetry Java SDK (as described below).
Inferred spans are designed to augment the visibility provided by instrumentation-based traces, shedding light on latency sources within the application or libraries that were previously uninstrumented. This feature significantly expands the utility of distributed tracing, allowing for a more comprehensive understanding of system behavior and facilitating a deeper dive into performance optimization.
What are inferred spans?
Inferred spans are spans generated by an observability technique that combines distributed tracing with profiling to illuminate the darker, unobserved corners of your application — areas where standard instrumentation techniques fall short. The feature interweaves information derived from profiling stacktraces with instrumentation-based tracing data and creates new spans based on the insights drawn from the profiling data.
This feature proves invaluable when dealing with custom code or third-party libraries that significantly contribute to the request latency but lack built-in or external instrumentation support. Often, identifying or crafting specific instrumentation for these segments can range from challenging to outright unfeasible. Moreover, certain scenarios exist where implementing instrumentation is impractical due to the potential for substantial performance overhead. For instance, instrumenting application locking mechanisms, despite their critical role, is not viable because of their ubiquitous nature and the significant latency overhead the instrumentation can introduce to application requests. Still, ideally, such latency issues would be visible within your distributed traces.
Inferred spans provide deeper visibility into your application’s performance dynamics, including in scenarios like the ones described above.
Inferred spans in action
To demonstrate the inferred spans feature, we will use the Java implementation of the Elastiflix demo application. Elastiflix has an endpoint called favorites that makes some Redis calls and also includes an artificial delay. First, we use the plain OpenTelemetry Java Agent to instrument our application:
java -javaagent:/path/to/otel-javaagent-<version>.jar \
-Dotel.service.name=my-service-name \
-Dotel.exporter.otlp.endpoint=https://<our-elastic-apm-endpoint> \
"-Dotel.exporter.otlp.headers=Authorization=Bearer SECRETTOKENHERE" \
-jar my-service-name.jar
With the OpenTelemetry Java Agent we get out-of-the-box instrumentation for HTTP entry points and calls to Redis for our Elastiflix application. The resulting traces contain spans for the POST /favorites entrypoint, as well as a few short spans for the calls to Redis.
As you can see in the trace above, it’s not clear where most of the time is spent within the POST /favorites request.
Let’s see how inferred spans can shed light on these areas. You can use the inferred spans feature manually with your OpenTelemetry SDK (see the section below), package it as a drop-in extension for the upstream OpenTelemetry Java agent, or simply use Elastic’s distribution of the OpenTelemetry Java agent, which ships with the inferred spans feature.
For convenience, we just download the agent jar of the Elastic distribution and extend the configuration to enable the inferred spans feature:
java -javaagent:/path/to/elastic-otel-javaagent-<version>.jar \
-Dotel.service.name=my-service-name \
-Dotel.exporter.otlp.endpoint=https://XX.apm.europe-west3.gcp.cloud.es.io:443 \
"-Dotel.exporter.otlp.headers=Authorization=Bearer SECRETTOKENHERE" \
-Delastic.otel.inferred.spans.enabled=true \
-jar my-service-name.jar
The only non-standard option here is elastic.otel.inferred.spans.enabled: the inferred spans feature is currently opt-in and therefore needs to be enabled explicitly. Running the same application with the inferred spans feature enabled yields more comprehensive traces:
The inferred spans (colored blue in the above screenshot) follow the naming pattern Class#method. With that, the inferred spans feature helps us pinpoint the exact methods that contribute the most to the overall latency of the request. Note that the parent-child relationship between the HTTP entry span, the Redis spans, and the inferred spans is reconstructed correctly, resulting in a fully functional trace structure.
Examining the handleDelay method within the Elastiflix application reveals the use of a straightforward sleep statement. Although the sleep call is not CPU-bound, the full duration of this delay is captured as an inferred span. This stems from employing the async-profiler's wall-clock profiling rather than relying solely on CPU profiling. The ability of the inferred spans feature to reflect actual latency, including for I/O operations and other non-CPU-bound tasks, represents a significant advancement: it allows for diagnosing and resolving performance issues that extend beyond CPU limitations, offering a more nuanced view of system behavior.
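For illustration, here is a hypothetical sketch of what such a delay method might look like (the actual Elastiflix implementation may differ). Thread.sleep consumes wall-clock time but essentially no CPU, which is exactly the kind of latency that wall-clock profiling surfaces and pure CPU profiling misses:
// Hypothetical sketch, not the actual Elastiflix code: an artificial, non-CPU-bound delay.
// Wall-clock profiling attributes the full sleep duration to an inferred span for this method.
private void handleDelay(long delayMillis) throws InterruptedException {
    if (delayMillis > 0) {
        Thread.sleep(delayMillis);
    }
}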
Using inferred spans with your own OpenTelemetry SDK
OpenTelemetry is a highly extensible framework: Elastic embraces this extensibility by also publishing most extensions shipped with our OpenTelemetry Java Distro as standalone extensions to the OpenTelemetry Java SDK.
As a result, if you do not want to use our distro (e.g., because you don’t need or want bytecode instrumentation in your project), you can still use our extensions, such as the extension for the inferred spans feature. All you need to do is set up the OpenTelemetry SDK in your code and add the inferred spans extension as a dependency:
<dependency>
    <groupId>co.elastic.otel</groupId>
    <artifactId>inferred-spans</artifactId>
    <version>{latest version}</version>
</dependency>
During your SDK setup, you’ll have to initialize and register the extension:
InferredSpansProcessor inferredSpans = InferredSpansProcessor.builder()
    .samplingInterval(Duration.ofMillis(10)) // the builder offers all config options
    .build();

SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
    // register the inferred spans processor alongside the regular exporting pipeline
    .addSpanProcessor(inferredSpans)
    .addSpanProcessor(BatchSpanProcessor.builder(OtlpGrpcSpanExporter.builder()
        .setEndpoint("https://<your-elastic-apm-endpoint>")
        .addHeader("Authorization", "Bearer <secrettoken>")
        .build()).build())
    .build();

// the processor needs a reference to the tracer provider in order to create the inferred spans
inferredSpans.setTracerProvider(tracerProvider);
The inferred spans extension seamlessly integrates with the OpenTelemetry SDK Autoconfiguration mechanism. By incorporating the OpenTelemetry SDK and its extensions as dependencies within your application code — rather than through an external agent — you gain the flexibility to configure them using the same environment variables or JVM properties. Once the inferred spans extension is included in your classpath, activating it for autoconfigured SDKs becomes straightforward. Simply enable it using the elastic.otel.inferred.spans.enabled property, as previously described, to leverage the full capabilities of this feature with minimal setup.
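As a minimal sketch, assuming the SDK autoconfiguration artifact (io.opentelemetry:opentelemetry-sdk-extension-autoconfigure) and the inferred spans extension are both on the classpath, the setup can be reduced to something like the following; the property could equally be supplied as a JVM flag or the corresponding environment variable:
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public final class OtelBootstrap {
    public static OpenTelemetry init() {
        // Sketch only: the flag can also be provided via -Delastic.otel.inferred.spans.enabled=true
        // or an equivalent environment variable instead of being set programmatically.
        System.setProperty("elastic.otel.inferred.spans.enabled", "true");
        // With the extension on the classpath, the autoconfigured SDK picks it up automatically.
        return AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
    }
}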
How do inferred spans work?
The inferred spans feature leverages the wall-clock profiling capabilities of the widely used async-profiler, a popular, low-overhead production-time profiler in the Java ecosystem. It then transforms the profiling data into actionable spans as part of the distributed traces. But what mechanism allows for this transformation?
Essentially, the inferred spans extension engages with the lifecycle of span events, specifically when a span is either activated or deactivated across any thread via the OpenTelemetry context. Upon the activation of the initial span within a transaction, the extension commences a session of wall-clock profiling via the async-profiler, set to a predetermined duration. Concurrently, it logs the details of all span activations and deactivations, capturing their respective timestamps and the threads on which they occurred.
Following the completion of the profiling session, the extension processes the profiling data alongside the log of span events. By correlating the data, it reconstructs the inferred spans. It's important to note that, in certain complex scenarios, the correlation may assign an incorrect name to a span. To mitigate this and aid in accurate identification, the extension enriches the inferred spans with stacktrace segments under the code.stacktrace attribute, offering users clarity and insight into the precise methods implicated.
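To make the idea more concrete, the following is a heavily simplified, conceptual sketch of that correlation step. It is not the extension's actual implementation and only looks at the top frame of each sample, whereas the real feature works on full stacktraces and nested spans. Consecutive samples that fall within an active span and share the same method are merged into one inferred span whose duration is estimated from the first and last matching sample:
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only, not the extension's real algorithm: merge consecutive
// profiler samples that share the same top frame into an inferred span, bounded
// by the activation interval of the parent span on the sampled thread.
final class InferredSpanSketch {

    record StackSample(Instant timestamp, String topMethod) {}

    record InferredSpan(String name, Instant start, Instant end) {}

    static List<InferredSpan> infer(List<StackSample> samples, Instant spanStart, Instant spanEnd) {
        List<InferredSpan> result = new ArrayList<>();
        String currentMethod = null;
        Instant runStart = null;
        Instant runEnd = null;
        for (StackSample sample : samples) {
            // ignore samples taken while the parent span was not active
            if (sample.timestamp().isBefore(spanStart) || sample.timestamp().isAfter(spanEnd)) {
                continue;
            }
            if (sample.topMethod().equals(currentMethod)) {
                runEnd = sample.timestamp(); // extend the current run of identical frames
            } else {
                if (currentMethod != null) {
                    result.add(new InferredSpan(currentMethod, runStart, runEnd));
                }
                currentMethod = sample.topMethod();
                runStart = sample.timestamp();
                runEnd = sample.timestamp();
            }
        }
        if (currentMethod != null) {
            result.add(new InferredSpan(currentMethod, runStart, runEnd));
        }
        return result;
    }
}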
Inferred spans vs. correlation of traces with profiling data
In the wake of OpenTelemetry's recent announcement of the profiling signal, coupled with Elastic's commitment to donating the Universal Profiling Agent to OpenTelemetry, you might wonder how the inferred spans feature differs from merely correlating profiling data with distributed traces using span IDs and trace IDs. Rather than viewing these as competing functionalities, it's more accurate to consider them complementary.
The inferred spans feature and the correlation of tracing with profiling data both employ similar methodologies — melding tracing information with profiling data. However, they each shine in distinct areas. Inferred spans excel at identifying long-running methods that could escape notice with traditional CPU profiling, which is more adept at pinpointing CPU bottlenecks. A unique advantage of inferred spans is the ability to account for I/O time, capturing delays caused by operations like disk access that wouldn't typically be visible in CPU profiling flamegraphs.
However, the inferred spans feature has its limitations, notably in detecting latency issues arising from "death by a thousand cuts" — where a method, although not time-consuming per invocation, significantly impacts total latency due to being called numerous times across a request. While individual calls might not be captured as inferred spans due to their brevity, CPU-bound methods contributing to latency are unveiled through CPU profiling, as flamegraphs display the aggregate CPU time consumed by these methods.
An additional strength of the inferred spans feature lies in its data structure, offering a simplified tracing model that outlines typical parent-child relationships, execution order, and good latency estimates. This structure is achieved by integrating tracing data with span activation/deactivation events and profiling data, facilitating straightforward navigation and troubleshooting of latency issues within individual traces.
Correlating distributed tracing data with profiling data comes with a different set of advantages. Learn more about it in our related blog post, Beyond the trace: Pinpointing performance culprits with continuous profiling and distributed tracing correlation.
What about the performance overhead?
As mentioned before, the inferred spans functionality is based on the widely used async-profiler, known for its minimal impact on performance. However, the efficiency of profiling operations is not without its caveats, largely influenced by the specific configurations employed. A pivotal factor in this balancing act is the sampling interval — the longer the interval between samples, the lower the incurred overhead, albeit at the expense of potentially overlooking shorter methods that the inferred spans feature would otherwise discover.
Adjusting the probability-based trace sampling presents another avenue for optimization, directly influencing the overhead. For instance, setting trace sampling to 50% effectively halves the profiling load, making the inferred spans feature even more resource-efficient on average per request. This nuanced approach to tuning ensures that the inferred spans feature can be leveraged in real-world, production environments with a manageable performance footprint. When properly configured, this feature offers a potent, low-overhead solution for enhancing observability and diagnostic capabilities within production applications.
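As a hedged illustration of how both knobs could be combined when setting up the SDK in code, the snippet below adjusts the inferred spans sampling interval via the builder shown earlier and applies a standard parent-based ratio sampler; the values are arbitrary examples, not recommendations, and imports plus the exporter configuration from the earlier snippet are omitted:
// Sketch only: a coarser inferred-spans sampling interval reduces profiling overhead,
// and a parent-based 50% ratio sampler traces (and therefore profiles) only half of the requests.
InferredSpansProcessor inferredSpans = InferredSpansProcessor.builder()
    .samplingInterval(Duration.ofMillis(50)) // example value: fewer stack samples, lower overhead
    .build();

SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
    .setSampler(Sampler.parentBased(Sampler.traceIdRatioBased(0.5))) // example: 50% trace sampling
    .addSpanProcessor(inferredSpans)
    // ... exporter span processor as shown in the earlier snippet ...
    .build();

inferredSpans.setTracerProvider(tracerProvider);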
What’s next for inferred spans and OpenTelemetry?
This blog post introduced the inferred spans feature, available as an extension for the OpenTelemetry Java SDK and built into the newly introduced Elastic OpenTelemetry Java Distro. Inferred spans allow users to troubleshoot latency issues in areas of code that are not explicitly instrumented while still utilizing traditional tracing data.
The feature is currently merely a port of the existing feature from the proprietary Elastic APM Agent. With Elastic embracing OpenTelemetry, we plan on contributing this extension to the upstream OpenTelemetry project. For that, we also plan on migrating the extension to the latest async-profiler 3.x release. Try out inferred spans for yourself and see how it can help you diagnose performance problems in your applications.
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.