Advanced Elasticsearch node scheduling
editAdvanced Elasticsearch node scheduling
editElastic Cloud on Kubernetes (ECK) offers full control over Elasticsearch cluster nodes scheduling by combining Elasticsearch configuration with Kubernetes scheduling options:
You can combine these features to deploy a production-grade Elasticsearch cluster.
Define Elasticsearch nodes roles
editYou can configure Elasticsearch nodes with one or multiple roles.
You can use YAML anchors to declare the configuration change once and reuse it across all the node sets.
This allows you to describe an Elasticsearch cluster with 3 dedicated master nodes, for example:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.16.1 nodeSets: # 3 dedicated master nodes - name: master count: 3 config: node.roles: ["master"] node.remote_cluster_client: false # 3 ingest-data nodes - name: ingest-data count: 3 config: node.roles: ["data", "ingest"]
Affinity options
editYou can set up various affinity and anti-affinity options through the podTemplate
section of the Elasticsearch resource specification.
A single Elasticsearch node per Kubernetes host (default)
editTo avoid scheduling several Elasticsearch nodes from the same cluster on the same host, use a podAntiAffinity
rule based on the hostname and the cluster name label:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.16.1 nodeSets: - name: default count: 3 podTemplate: spec: affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchLabels: elasticsearch.k8s.elastic.co/cluster-name: quickstart topologyKey: kubernetes.io/hostname
This is ECK default behaviour if you don’t specify any affinity
option. To explicitly disable the default behaviour, set an empty affinity object:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.16.1 nodeSets: - name: default count: 3 podTemplate: spec: affinity: {}
The default affinity is using preferredDuringSchedulingIgnoredDuringExecution
, which acts as best effort and won’t prevent an Elasticsearch node from being scheduled on a host if there are no other hosts available. Scheduling a 4-nodes Elasticsearch cluster on a 3-host Kubernetes cluster would then successfully schedule 2 Elasticsearch nodes on the same host. To enforce a strict single node per host, specify requiredDuringSchedulingIgnoredDuringExecution
instead:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.16.1 nodeSets: - name: default count: 3 podTemplate: spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: elasticsearch.k8s.elastic.co/cluster-name: quickstart topologyKey: kubernetes.io/hostname
Local Persistent Volume constraints
editBy default, volumes can be bound to a Pod before the Pod gets scheduled to a particular Kubernetes node. This can be a problem if the PersistentVolume can only be accessed from a particular host or set of hosts. Local persistent volumes are a good example: they are accessible from a single host. If the Pod gets scheduled to a different host based on any affinity or anti-affinity rule, the volume may not be available.
To solve this problem, you can set the Volume Binding Mode of the StorageClass
you are using. Make sure volumeBindingMode: WaitForFirstConsumer
is set, especially if you are using local persistent volumes.
kind: StorageClass apiVersion: storage.k8s.io/v1 metadata: name: local-storage provisioner: kubernetes.io/no-provisioner volumeBindingMode: WaitForFirstConsumer
Node affinity
editTo restrict the scheduling to a particular set of Kubernetes nodes based on labels, use a NodeSelector.
The following example schedules Elasticsearch Pods on Kubernetes nodes tagged with both labels diskType: ssd
and environment: production
.
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.16.1 nodeSets: - name: default count: 3 podTemplate: spec: nodeSelector: diskType: ssd environment: production
You can achieve the same (and more) with node affinity:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.16.1 nodeSets: - name: default count: 3 podTemplate: spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: environment operator: In values: - e2e - production preferredDuringSchedulingIgnoredDuringExecution: - weight: 1 preference: matchExpressions: - key: diskType operator: In values: - ssd
This example restricts Elasticsearch nodes so they are only scheduled on Kubernetes hosts tagged with environment: e2e
or environment: production
. It favors nodes tagged with diskType: ssd
.
Topology spread constraints and availability zone awareness
editStarting with ECK 2.0 the operator can make Kubernetes Node labels available as Pod annotations. It can be used to make information, such as logical failure domains, available in a running Pod. Combined with Elasticsearch shard allocation awareness and Kubernetes topology spread constraints, you can create an availability zone-aware Elasticsearch cluster.
Exposing Kubernetes node topology labels in Pods
edit-
First, ensure that the operator’s flag
exposed-node-labels
contains the list of the Kubernetes node labels that should be exposed in the Elasticsearch Pods. If you are using the provided installation manifest, or the Helm chart, then this flag is already preset with two wildcard patterns for well-known node labels that describe Kubernetes cluster topology, liketopology.kubernetes.io/.*
andfailure-domain.beta.kubernetes.io/.*
. -
On the Elasticsearch resources set the
eck.k8s.elastic.co/downward-node-labels
annotations with the list of the Kubernetes node labels that should be copied as Pod annotations. -
Use the Kubernetes downward API in the
podTemplate
to make those annotations available as environment variables in Elasticsearch Pods.
Refer to the next section or to the Elasticsearch sample resource in the ECK source repository for a complete example.
Using node topology labels, Kubernetes topology spread constraints, and Elasticsearch shard allocation awareness
editThe following example demonstrates how to use the topology.kubernetes.io/zone
node labels to spread a NodeSet across the availability zones of a Kubernetes cluster.
Note that by default ECK creates a k8s_node_name
attribute with the name of the Kubernetes node running the Pod, and configures Elasticsearch to use this attribute. This ensures that Elasticsearch allocates primary and replica shards to Pods running on different Kubernetes nodes and never to Pods that are scheduled onto the same Kubernetes node. To preserve this behavior while making Elasticsearch aware of the availability zone, include the k8s_node_name
attribute in the comma-separated cluster.routing.allocation.awareness.attributes
list.
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: annotations: eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone" name: quickstart spec: version: 8.16.1 nodeSets: - name: default count: 3 config: node.attr.zone: ${ZONE} cluster.routing.allocation.awareness.attributes: k8s_node_name,zone podTemplate: spec: # The initContainers section is only required if using secureSettings in conjunction with zone awareness. # initContainers: # - name: elastic-internal-init-keystore # env: # - name: ZONE # valueFrom: # fieldRef: # fieldPath: metadata.annotations['topology.kubernetes.io/zone'] containers: - name: elasticsearch env: - name: ZONE valueFrom: fieldRef: fieldPath: metadata.annotations['topology.kubernetes.io/zone'] topologySpreadConstraints: - maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule labelSelector: matchLabels: elasticsearch.k8s.elastic.co/cluster-name: quickstart elasticsearch.k8s.elastic.co/statefulset-name: quickstart-es-default
This example relies on:
-
Kubernetes nodes in each zone being labeled accordingly.
topology.kubernetes.io/zone
is standard, but any label can be used. - Pod topology spread constraints to spread the Pods across availability zones in the Kubernetes cluster.
-
Elasticsearch configured to allocate shards based on node attributes. Here we specified
node.attr.zone
, but any attribute name can be used.node.attr.rack_id
is another common example.
Hot-warm topologies
editBy combining Elasticsearch shard allocation awareness with Kubernetes node affinity, you can set up an Elasticsearch cluster with hot-warm topology:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.16.1 nodeSets: # hot nodes, with high CPU and fast IO - name: hot count: 3 config: node.attr.data: hot podTemplate: spec: containers: - name: elasticsearch resources: limits: memory: 16Gi cpu: 4 affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: beta.kubernetes.io/instance-type operator: In values: - highio volumeClaimTemplates: - metadata: name: elasticsearch-data spec: accessModes: - ReadWriteOnce resources: requests: storage: 1Ti storageClassName: local-storage # warm nodes, with high storage - name: warm count: 3 config: node.attr.data: warm podTemplate: spec: containers: - name: elasticsearch resources: limits: memory: 16Gi cpu: 2 affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: beta.kubernetes.io/instance-type operator: In values: - highstorage volumeClaimTemplates: - metadata: name: elasticsearch-data spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Ti storageClassName: local-storage
In this example, we configure two groups of Elasticsearch nodes:
-
The first group has the
data
attribute set tohot
. It is intended to run on hosts with high CPU resources and fast IO (SSD). Pods can only be scheduled on Kubernetes nodes labeled withbeta.kubernetes.io/instance-type: highio
(to adapt to the labels of your Kubernetes nodes). -
The second group has the
data
attribute set towarm
. It is intended to run on hosts with larger but maybe slower storage. Pods can only be scheduled on nodes labeled withbeta.kubernetes.io/instance-type: highstorage
.
This example uses Local Persistent Volumes for both groups, but can be adapted to use high-performance volumes for hot
Elasticsearch nodes and high-storage volumes for warm
Elasticsearch nodes.
Finally, set up Index Lifecycle Management policies on your indices, optimizing for hot-warm architectures.