Processing and performance

edit

APM Server performance depends on a number of factors: memory and CPU available, network latency, transaction sizes, workload patterns, agent and server settings, versions, and protocol.

We tested several scenarios to help you understand how to size the APM Server so that it can keep up with the load that your Elastic APM agents are sending:

  • Using the CPU Optimized hardware template on AWS, GCP and Azure on Elastic Cloud with the following instances (for more details see Hardware Profiles):

    • AWS: c6gd
    • Azure: fsv2
    • GCP: n2.68x32x45
  • For each hardware template, testing with several sizes: 1 GB, 4 GB, 8 GB, and 32 GB.
  • For each size, using a fixed number of APM agents: 10 agents for 1 GB, 30 agents for 4 GB, 60 agents for 8 GB, and 240 agents for 32 GB.
  • In all scenarios, using medium sized events. Events include transactions and spans.

You will also need to scale up Elasticsearch accordingly, potentially with an increased number of shards configured. For more details on scaling Elasticsearch, refer to the Elasticsearch documentation.

The results below include numbers for a synthetic workload. You can use the results of our tests to guide your sizing decisions, however, performance will vary based on factors unique to your use case like your specific setup, the size of APM event data, and the exact number of agents.

Profile / Cloud AWS Azure GCP

1 GB
(10 agents)

15,000
events/second

14,000
events/second

17,000
events/second

4 GB
(30 agents)

29,000
events/second

26,000
events/second

35,000
events/second

8 GB
(60 agents)

50,000
events/second

34,000
events/second

48,000
events/second

16 GB
(120 agents)

96,000
events/second

57,000
events/second

90,000
events/second

32 GB
(240 agents)

133,000
events/second

89,000
events/second

143,000
events/second

Don’t forget that the APM Server is stateless. Several instances running do not need to know about each other. This means that with a properly sized Elasticsearch instance, APM Server scales out linearly.

RUM deserves special consideration. The RUM agent runs in browsers, and there can be many thousands reporting to an APM Server with very variable network latency.

Alternatively or in addition to scaling the APM Server, consider decreasing the ingestion volume. Read more in Reduce storage.