Fleet Server scalability

This page summarizes the resources and Fleet Server configuration needed to scale your deployment of Elastic Agents. To scale Fleet Server, you modify settings in both your deployment and the Fleet Server agent policy.

First, modify your Fleet deployment settings in Elastic Cloud:

  1. Log in to Elastic Cloud and go to your deployment.
  2. Under Deployments > deployment name, click Edit.
  3. Under Integrations Server:

    • Modify the compute resources available to the server to accommodate a higher scale of Elastic Agents
    • Modify the availability zones to satisfy fault tolerance requirements

    For recommended settings, refer to Scaling recommendations (Elastic Cloud).

Next, modify the Fleet Server configuration by editing the agent policy:

  1. In Kibana, go to Management > Fleet > Agent Policies. Click the name of the Elastic Cloud agent policy to edit the policy.
  2. Open the Actions menu next to the Fleet Server integration and click Edit integration.

  3. Under Fleet Server, modify Max Connections and other advanced settings as described in Scaling recommendations (Elastic Cloud).

Advanced Fleet Server options

The following advanced settings are available to fine-tune your Fleet Server deployment.

cache
num_counters
Size of the hash table. Best practice is to have this set to 10 times the max connections.
max_cost
Total size of the cache.
server.timeouts
checkin_timestamp
How often Fleet Server updates the "last activity" field for each agent. Defaults to 30s. In a large-scale deployment, increasing this setting may improve performance. If this setting is higher than 2m, most agents will be shown as "offline" in the Fleet UI. For a typical setup, it’s recommended that you set this value to less than 2m.
checkin_long_poll
How long Fleet Server allows a long poll request from an agent before timing out. Defaults to 5m. In a large-scale deployment, increasing this setting may improve performance.
server.limits
policy_throttle
How often a new policy is rolled out to the agents.
checkin_limit.interval
How fast the agents can check in to the Fleet Server.
checkin_limit.burst
Burst of check-ins allowed before falling back to the rate defined by interval.
checkin_limit.max
Maximum number of agents that can check in concurrently.
artifact_limit.max
Maximum number of agents that can call the artifact API concurrently. This setting helps prevent artifact API calls from overloading Fleet Server.
artifact_limit.interval
How often artifacts are rolled out. Default of 100ms allows 10 artifacts to be rolled out per second.
artifact_limit.burst
Number of transactions allowed for a burst, controlling oversubscription on the outbound buffer.
ack_limit.max
Maximum number of agents that can call the Ack API concurrently. This setting helps prevent Ack API calls from overloading Fleet Server.
ack_limit.interval
How often an acknowledgment (ACK) is sent. Default value of 10ms enables 100 ACKs per second to be sent.
ack_limit.burst
Burst of ACKs to accommodate (default of 20) before falling back to the rate defined in interval.
enroll_limit.max
Maximum number of agents that can call the Enroll API concurrently. This setting helps prevent Enroll API calls from overloading Fleet Server.
enroll_limit.interval
Interval between processing enrollment requests. Enrollment is both CPU and RAM intensive, so the number of enrollment requests needs to be limited for overall system health. Default value of 100ms allows 10 enrollments per second.
enroll_limit.burst
Burst of enrollments to accept before falling back to the rate defined by interval.
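
The arithmetic behind the interval-based limits, and the two rules of thumb given above (num_counters at 10 times the max connections, checkin_timestamp below 2m), can be sketched in Python. This is only an illustration and not part of Fleet Server: the helper names are hypothetical, and the sketch assumes durations are written as a number followed by us, ms, s, or m.

```python
import re

# Seconds per unit for the duration suffixes used on this page.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0}


def parse_duration(text: str) -> float:
    """Convert a duration such as '100ms', '500 us', or '2m' to seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(us|ms|s|m)\s*", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def rate_per_second(interval: str) -> float:
    """Steady-state requests per second implied by an interval setting."""
    return 1.0 / parse_duration(interval)


def suggested_num_counters(max_connections: int) -> int:
    """Rule of thumb from this page: 10 times the max connections."""
    return 10 * max_connections


def checkin_timestamp_ok(value: str) -> bool:
    """True if the value stays below the 2m threshold described above."""
    return parse_duration(value) < parse_duration("2m")


print(round(rate_per_second("100ms")))  # default enroll and artifact interval
print(round(rate_per_second("10ms")))   # default ack interval
print(suggested_num_counters(7000))
print(checkin_timestamp_ok("30s"), checkin_timestamp_ok("5m"))
```

With the defaults quoted above, the first two calls reproduce the stated rates of 10 and 100 requests per second. The burst settings are not modeled here; they only bound how many requests are accepted before the steady-state rate applies.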

Scaling recommendations (Elastic Cloud)

The following tables provide resource requirements and scaling guidelines based on the number of agents required by your deployment:

Resource requirements by number of agents

Number of Agents   Memory   vCPU             Elasticsearch Cluster size
50                 1 GB     Up to 8.5 vCPU   480 GB disk | 16 GB RAM | up to 5 vCPU
5,000              2 GB     Up to 8.5 vCPU   960 GB disk | 32 GB RAM | 5 vCPU
7,500              4 GB     Up to 8.5 vCPU   1.88 TB disk | 64 GB RAM | 9.8 vCPU
10,000             8 GB     Up to 8.5 vCPU   3.75 TB disk | 128 GB RAM | 19.8 vCPU
15,000             16 GB    8.5 vCPU         7.5 TB disk | 256 GB RAM | 39.4 vCPU
25,000             16 GB    8.5 vCPU         7.5 TB disk | 256 GB RAM | 39.4 vCPU
50,000             32 GB    16.9 vCPU        11.25 TB disk | 384 GB RAM | 59.2 vCPU
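
One way to read the resource requirements table is to pick the smallest row that covers the planned agent count. The Python sketch below does that lookup. Rounding up to the next row is an assumption of this example, not guidance stated on this page, and the `Tier` type and `tier_for` function are hypothetical names.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    max_agents: int
    memory_gb: int
    vcpu: str
    elasticsearch: str


# Rows copied from the resource requirements table above.
TIERS = [
    Tier(50, 1, "Up to 8.5 vCPU", "480 GB disk | 16 GB RAM | up to 5 vCPU"),
    Tier(5_000, 2, "Up to 8.5 vCPU", "960 GB disk | 32 GB RAM | 5 vCPU"),
    Tier(7_500, 4, "Up to 8.5 vCPU", "1.88 TB disk | 64 GB RAM | 9.8 vCPU"),
    Tier(10_000, 8, "Up to 8.5 vCPU", "3.75 TB disk | 128 GB RAM | 19.8 vCPU"),
    Tier(15_000, 16, "8.5 vCPU", "7.5 TB disk | 256 GB RAM | 39.4 vCPU"),
    Tier(25_000, 16, "8.5 vCPU", "7.5 TB disk | 256 GB RAM | 39.4 vCPU"),
    Tier(50_000, 32, "16.9 vCPU", "11.25 TB disk | 384 GB RAM | 59.2 vCPU"),
]


def tier_for(agents: int) -> Tier:
    """Return the smallest table row that covers the given agent count."""
    for tier in TIERS:
        if agents <= tier.max_agents:
            return tier
    raise ValueError(f"{agents} agents exceeds the largest row in the table")


print(tier_for(6_000))
```

Agent counts above 50,000 raise an error rather than extrapolating, because the table gives no guidance beyond its last row.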

Recommended settings by number of deployed Elastic Agents

Setting (by number of agents)   50        5,000      7,500      10,000     12,500     30,000     50,000
Max Connections                 100       7,000      10,000     20,000     32,000     32,000     32,000
Cache settings
  num_counters                  2000      20000      40000      80000      160000     160000     320000
  max_cost                      2097152   20971520   50971520   104857600  209715200  209715200  209715200
Server limits
  policy_throttle               200 ms    50 ms      10 ms      5 ms       5 ms       2 ms       5 ms
  checkin_limit:
    interval                    50 ms     5 ms       2 ms       1 ms       500 us     500 us     500 us
    burst                       25        500        1000       2000       4000       4000       4000
    max                         100       5001       7501       10001      12501      15001      25001
  artifact_limit:
    interval                    100 ms    5 ms       2 ms       1 ms       500 us     500 us     500 us
    burst                       10        500        1000       2000       4000       4000       4000
    max                         10        1000       2000       4000       8000       8000       8000
  ack_limit:
    interval                    10 ms     4 ms       2 ms       1 ms       500 us     500 us     500 us
    burst                       20        500        1000       2000       4000       4000       4000
    max                         20        1000       2000       4000       8000       8000       8000
  enroll_limit:
    interval                    100 ms    20 ms      10 ms      10 ms      10 ms      10 ms      10 ms
    burst                       5         50         100        100        100        100        100
    max                         10        100        200        200        200        200        200
Server runtime settings
  gc_percent                    20        20         20         20         20         20         20
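
To show how a column of the recommended settings maps onto the dotted setting names used earlier on this page, here is a Python sketch that builds a nested settings structure for a given agent count. It covers only the cache and check-in rows, and it rests on assumptions: the nesting is inferred from the cache and server.limits group names, durations are written without a space, the first column that covers the agent count is chosen, and `recommended` is a hypothetical helper.

```python
import bisect
import json

# Column headers of the table above (number of deployed Elastic Agents).
AGENT_COLUMNS = [50, 5_000, 7_500, 10_000, 12_500, 30_000, 50_000]

# A subset of the table rows, keyed by dotted setting name.
# Durations are written without a space ("200ms"), which is an assumption.
ROWS = {
    "cache.num_counters": [2000, 20000, 40000, 80000, 160000, 160000, 320000],
    "cache.max_cost": [2097152, 20971520, 50971520, 104857600,
                       209715200, 209715200, 209715200],
    "server.limits.policy_throttle": ["200ms", "50ms", "10ms", "5ms",
                                      "5ms", "2ms", "5ms"],
    "server.limits.checkin_limit.interval": ["50ms", "5ms", "2ms", "1ms",
                                             "500us", "500us", "500us"],
    "server.limits.checkin_limit.burst": [25, 500, 1000, 2000, 4000, 4000, 4000],
    "server.limits.checkin_limit.max": [100, 5001, 7501, 10001,
                                        12501, 15001, 25001],
}


def recommended(agents: int) -> dict:
    """Build a nested settings dict from the first column that covers `agents`."""
    column = bisect.bisect_left(AGENT_COLUMNS, agents)
    if column == len(AGENT_COLUMNS):
        raise ValueError(f"no column covers {agents} agents")
    nested: dict = {}
    for dotted, values in ROWS.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for key in parents:
            # Create intermediate levels such as "server" and "limits" on demand.
            node = node.setdefault(key, {})
        node[leaf] = values[column]
    return nested


print(json.dumps(recommended(6_000), indent=2))
```

For 6,000 agents the sketch selects the 7,500 column. The remaining rows (artifact_limit, ack_limit, enroll_limit, gc_percent, and Max Connections) could be added to `ROWS` in the same way once their exact configuration paths are confirmed.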