Performance Tuning
There are many options available to tune agent performance. Which option to adjust depends on whether you are optimizing for speed, memory usage, bandwidth, or storage.
Sampling
The first knob to reach for when tuning the performance of the agent is transactionSampleRate. Adjusting the sampling rate controls what percent of requests are traced. By default, the sample rate is set at 1.0, meaning all requests are traced.
The sample rate will impact all four performance categories, so simply turning down the sample rate is an easy way to improve performance.
Here’s an example of setting the sample rate to 20%:
require('elastic-apm-node').start({ transactionSampleRate: 0.2 })
APM Server communication
The agent uses a persistent outgoing HTTP request to stream data to the APM Server. To avoid issues with intermittent proxies and load balancers, the HTTP request is ended and a new one created at regular intervals or when the size of the request becomes too big.
There’s an overhead involved in each HTTP request: besides sending new HTTP headers, the agent needs to re-send certain metadata to the APM Server each time a new HTTP request is made. However, if allowed by the network, the TCP socket is reused between HTTP requests.
Max HTTP request duration
By default, an HTTP request to the APM Server is ended after a maximum of 10 seconds. This time limit can be modified using the apiRequestTime config option.
Lowering the time limit might be necessary when dealing with very aggressive proxies, but increasing it reduces the combined overhead of these HTTP requests, as headers and metadata don’t need to be re-sent as often.
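For example, a shorter request duration could be configured like this (the 5-second value is illustrative, not a recommendation):
require('elastic-apm-node').start({ apiRequestTime: '5s' })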
Max HTTP request size
By default, an HTTP request to the APM Server is ended after approximately 1 MiB of gzip-compressed data has been written to the body. This size limit can be modified using the apiRequestSize config option.
Lowering the size limit might be necessary when dealing with very aggressive proxies, but increasing it reduces the combined overhead of these HTTP requests, as headers and metadata don’t need to be re-sent as often.
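For example, a larger size limit could be configured like this (the 2 MiB value is illustrative):
require('elastic-apm-node').start({ apiRequestSize: '2mb' })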
APM Server Timeout
In the event that the APM Server, or the connection to it, is slow or unstable, the serverTimeout setting can be set to ensure the agent doesn’t wait too long for a response.
The agent only allows a single TCP socket to be opened to the APM Server at any given time. This is to avoid the overhead of opening too many sockets. If the agent is stuck waiting for a response from the previous HTTP request, it might start dropping new data in order to keep its memory footprint low. Keeping this timeout low helps alleviate that problem.
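For example, a shorter timeout could be configured like this (the 15-second value is illustrative):
require('elastic-apm-node').start({ serverTimeout: '15s' })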
Stack Traces
Stack traces can be a significant contributor to memory usage. There are several settings to adjust how they are used.
Span Stack Traces
In a complex application, a request may produce many spans. Capturing a stack trace for every span can result in significant memory usage. To disable capturing stack traces for spans, set captureSpanStackTraces to false.
This will mainly impact memory usage, but may have a noticeable impact on speed as well. The CPU time required to capture and convert a stack frame to something JavaScript can understand is not insignificant.
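For example:
require('elastic-apm-node').start({ captureSpanStackTraces: false })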
Source Lines
If you want to keep span stack traces enabled for context, the next thing to try is adjusting how many source lines are reported for each stack trace. When a stack trace is captured, the agent will also capture several lines of source code around each stack frame location in the stack trace.
There are four different settings to control this behaviour:
Source line settings are divided into app frames representing your app code and library frames representing the code of your dependencies. App and library categories are both split into error and span groups. Spans, by default, do not capture source lines. Errors, by default, will capture five lines of code around each stack frame.
Source lines are cached in-process. In memory-constrained environments, the source line cache may use more memory than desired. Turning the limits down will help prevent excessive memory use.
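These map to the sourceLinesErrorAppFrames, sourceLinesErrorLibraryFrames, sourceLinesSpanAppFrames, and sourceLinesSpanLibraryFrames options. A reduced configuration might look like this (the values shown are illustrative, not defaults):
require('elastic-apm-node').start({
  sourceLinesErrorAppFrames: 3,     // lines of your app’s code captured around each error stack frame
  sourceLinesErrorLibraryFrames: 0, // skip source lines for dependency code in errors
  sourceLinesSpanAppFrames: 0,      // spans capture no source lines
  sourceLinesSpanLibraryFrames: 0
})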
Stack Frame Limit
The stackTraceLimit setting controls how many stack frames should be captured when producing an Error instance of any kind.
This will mainly impact memory usage, but may have a noticeable impact on speed as well. The CPU time required to capture and convert a stack frame to something JavaScript can understand is not insignificant.
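For example, a lower frame limit could be set like this (10 is an illustrative value):
require('elastic-apm-node').start({ stackTraceLimit: 10 })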
Error Log Stack Traces
Most stack traces recorded by the agent will point to where the error was instantiated, not where it was identified and reported to the agent with captureError. For this reason, the agent also has the captureErrorLogStackTraces setting to enable capturing an additional stack trace pointing to the place an error was reported to the agent.
By default, it will only capture the stack trace to the reporting point when captureError is called with a string message. Setting this to always will increase memory and bandwidth usage, so it helps to consider how frequently the app may capture errors.
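For example, to always capture a stack trace at the reporting point:
require('elastic-apm-node').start({ captureErrorLogStackTraces: 'always' })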
Spans
The transactionMaxSpans setting limits the number of spans which may be recorded within a single transaction before remaining spans are dropped.
Spans may include many things such as a stack trace and context data. Limiting the number of spans that may be recorded will reduce memory usage.
If set too low, however, this limit could result in the loss of useful data about what occurred within a request.
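For example, a lower span limit could be configured like this (250 is an illustrative value):
require('elastic-apm-node').start({ transactionMaxSpans: 250 })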