Run Elastic Agent standalone on Kubernetes


Use Elastic Agent Docker images on Kubernetes to retrieve cluster metrics.

Running Elastic Cloud on Kubernetes? Refer to Run Elastic Agent on ECK.

Kubernetes deploy manifests

Deploy Elastic Agent as a DaemonSet to ensure that there is a running instance on each node of the cluster. These instances are used to retrieve most metrics from the host, such as system metrics, Docker stats, and metrics from all the services running on top of Kubernetes.

In addition, one of the Pods in the DaemonSet constantly holds a leader lock, which makes it responsible for cluster-wide monitoring. For the leader election configuration options, refer to leader election provider. This instance retrieves metrics that are unique to the whole cluster, such as Kubernetes events or kube-state-metrics. If kube-state-metrics is not already running, deploy it now (see the Kubernetes deployment docs).

Everything is deployed under the kube-system namespace by default. To change the namespace, modify the manifest file.

To download the manifest file, run:

curl -L -O https://raw.githubusercontent.com/elastic/elastic-agent/8.2/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml

This manifest includes the Kubernetes integration to collect Kubernetes metrics, the System integration to collect system-level metrics and logs from nodes, and Pod log collection using dynamic inputs and the Kubernetes provider.

Settings

Set the Elasticsearch settings before deploying the manifest:

- name: ES_USERNAME
  value: "elastic"
- name: ES_PASSWORD
  value: "passpassMyStr0ngP@ss"
- name: ES_HOST
  value: "https://somesuperhostiduuid.europe-west1.gcp.cloud.es.io:443"
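
The settings above keep the password in plain text inside the manifest. One alternative is to read it from a Kubernetes Secret. The following is a minimal sketch, assuming a Secret named elastic-agent-credentials with a key es-password already exists in the same namespace as the DaemonSet. Both names are hypothetical and are not part of the downloaded manifest.

```yaml
# Sketch only: the Secret name and key below are hypothetical.
- name: ES_USERNAME
  value: "elastic"
- name: ES_PASSWORD
  valueFrom:
    secretKeyRef:
      name: elastic-agent-credentials  # hypothetical Secret in the agent's namespace
      key: es-password                 # hypothetical key that holds the password
- name: ES_HOST
  value: "https://somesuperhostiduuid.europe-west1.gcp.cloud.es.io:443"
```
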
Configuration details

Run Elastic Agent on master nodes

Kubernetes master nodes can use taints to limit the workloads that can run on them. The manifest for standalone Elastic Agent defines tolerations so that it can run on master nodes. Agents running on master nodes collect metrics from the Kubernetes control plane components (scheduler, controller manager). To prevent Elastic Agent from running on master nodes, remove the following part of the DaemonSet spec:

spec:
  tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule

Deploy

To deploy to Kubernetes, run:

kubectl create -f elastic-agent-standalone-kubernetes.yaml

To check the status, run:

$ kubectl -n kube-system get pods -l app=elastic-agent
NAME                            READY   STATUS    RESTARTS   AGE
elastic-agent-4665d             1/1     Running   0          81m
elastic-agent-9f466c4b5-l8cm8   1/1     Running   0          81m
elastic-agent-fj2z9             1/1     Running   0          81m
elastic-agent-hs4pb             1/1     Running   0          81m

You might need to adjust resource limits of the elastic-agent container in the elastic-agent-standalone-kubernetes.yaml manifest. Container resource usage depends on the number of data streams and the environment size.
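
As a rough illustration of where these limits live, the fragment below shows the standard Kubernetes resources block on the elastic-agent container. The values are illustrative placeholders, not recommendations and not necessarily the defaults shipped in the manifest.

```yaml
# Sketch only: values are illustrative and should be sized for your environment.
containers:
  - name: elastic-agent
    resources:
      limits:
        memory: 1Gi      # upper bound before the container is OOM-killed
      requests:
        cpu: 100m        # CPU reserved for scheduling
        memory: 500Mi    # memory reserved for scheduling
```

If the agent Pods restart with an OOMKilled status, raising the memory limit is usually the first thing to try.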

Red Hat OpenShift configuration

If you are using Red Hat OpenShift, you need to specify additional settings in the manifest file and enable the container to run as privileged.

  1. In the manifest file, modify the agent-node-datastreams ConfigMap and adjust inputs:

    • kubernetes-cluster-metrics input:

      • If HTTPS is used to access kube-state-metrics, add the following settings to all kubernetes.state_* datasets:

          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          ssl.certificate_authorities:
            - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    • kubernetes-node-metrics input:

      • Change the kubernetes.controllermanager data stream condition to:

        condition: ${kubernetes.labels.app} == 'kube-controller-manager'
      • Change the kubernetes.scheduler data stream condition to:

        condition: ${kubernetes.labels.app} == 'openshift-kube-scheduler'
      • The kubernetes.proxy data stream configuration should look like:

        - data_stream:
            dataset: kubernetes.proxy
            type: metrics
          metricsets:
            - proxy
          hosts:
            - 'localhost:29101'
          period: 10s
      • Add the following settings to all data streams that connect to https://${env.NODE_NAME}:10250:

          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          ssl.certificate_authorities:
            - /path/to/ca-bundle.crt

        ca-bundle.crt can be any CA bundle that contains the issuer of the certificate used by the Kubelet API. Depending on the OpenShift installation, it can be found either in Secrets or in ConfigMaps. In some installations it is available as part of the service account secret, in /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt. When using the OpenShift installer for GCP, mount the following ConfigMap in the elastic-agent Pod and use ca-bundle.crt in ssl.certificate_authorities:

        Name:         kubelet-serving-ca
        Namespace:    openshift-kube-apiserver
        Labels:       <none>
        Annotations:  <none>
        
        Data
        ====
        ca-bundle.crt:

  2. Grant the elastic-agent service account access to the privileged SCC:

    oc adm policy add-scc-to-user privileged system:serviceaccount:kube-system:elastic-agent

    This command, run as an OpenShift administrator, allows the container to run as privileged.

  3. If the namespace where elastic-agent is running has the "openshift.io/node-selector" annotation set, elastic-agent might not run on all nodes. In this case, consider overriding the node selector for the namespace to allow scheduling on any node:

    oc patch namespace kube-system -p \
    '{"metadata": {"annotations": {"openshift.io/node-selector": ""}}}'

    This command sets the node selector for the project to an empty string.
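
For the CA bundle mentioned in step 1, mounting a ConfigMap into the elastic-agent Pod might look like the sketch below. It assumes a ConfigMap named kubelet-serving-ca is available in the namespace where the agent runs. A Pod can only mount ConfigMaps from its own namespace, so the ConfigMap from openshift-kube-apiserver would first need to be copied there. The volume name and mount path are hypothetical.

```yaml
# Sketch only: volume name and mount path are hypothetical.
spec:
  template:
    spec:
      containers:
        - name: elastic-agent
          volumeMounts:
            - name: kubelet-serving-ca
              mountPath: /etc/kubelet-serving-ca   # hypothetical path
              readOnly: true
      volumes:
        - name: kubelet-serving-ca
          configMap:
            name: kubelet-serving-ca   # must exist in the agent's namespace
```

With this mount in place, ssl.certificate_authorities in the affected data streams would point to /etc/kubelet-serving-ca/ca-bundle.crt.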

Autodiscover targeted Pods

You can define autodiscover conditions so that Elastic Agent automatically identifies Pods and starts collecting from them using predefined integrations. For example, to automatically identify a Redis Pod and monitor it with the Redis integration, add the following configuration as an extra input in the DaemonSet manifest:

- name: redis
  type: redis/metrics
  use_output: default
  meta:
    package:
      name: redis
      version: 0.3.6
  data_stream:
    namespace: default
  streams:
    - data_stream:
        dataset: redis.info
        type: metrics
      metricsets:
        - info
      hosts:
        - '${kubernetes.pod.ip}:6379'
      idle_timeout: 20s
      maxconn: 10
      network: tcp
      period: 10s
      condition: ${kubernetes.pod.labels.app} == 'redis'

Refer to dynamic inputs and Kubernetes provider for more information about shaping dynamic inputs for autodiscovery.
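
To make the condition in the Redis input concrete, here is a minimal sketch of a Pod it would match. The Pod name and image tag are hypothetical. What matters is the app: redis label, which the condition checks, and port 6379, which the hosts setting targets.

```yaml
# Sketch only: Pod name and image tag are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: redis
  labels:
    app: redis   # matched by ${kubernetes.pod.labels.app} == 'redis'
spec:
  containers:
    - name: redis
      image: redis:7
      ports:
        - containerPort: 6379   # port used in the input's hosts setting
```

A Pod without the app: redis label would not satisfy the condition, so the Redis input would not be started for it.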

Deploying Elastic Agent to collect cluster-level metrics in large clusters

A Kubernetes cluster can have a large number of nodes, and in such cases the Pod that collects cluster-level metrics might face performance issues due to resource limitations. In this case, consider avoiding the leader election strategy and instead running a dedicated, standalone Elastic Agent instance using a Deployment in addition to the DaemonSet.
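
A dedicated instance of this kind could be outlined roughly as follows. This is an incomplete sketch under several assumptions: the Deployment name, labels, image tag, and memory limit are hypothetical, and the container's args, environment variables, and configuration volume would need to be copied from the DaemonSet in the downloaded manifest. The cluster-wide inputs would also need to move from the DaemonSet configuration into the configuration used by this instance.

```yaml
# Sketch only: names, labels, image tag, and limits are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic-agent-cluster        # hypothetical name
  namespace: kube-system
  labels:
    app: elastic-agent-cluster       # hypothetical label
spec:
  replicas: 1                        # a single instance collects cluster-wide metrics
  selector:
    matchLabels:
      app: elastic-agent-cluster
  template:
    metadata:
      labels:
        app: elastic-agent-cluster
    spec:
      serviceAccountName: elastic-agent
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.2.0   # assumed tag; match the DaemonSet image
          # Copy args, env (ES_USERNAME, ES_PASSWORD, ES_HOST), and the
          # configuration volume from the DaemonSet container spec.
          resources:
            limits:
              memory: 2Gi            # illustrative; size for the cluster
```

Because a Deployment is not tied to a node, its resources can be sized independently of the per-node agents.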

Deploying Elastic Agent to a managed Kubernetes environment

On managed Kubernetes solutions, such as AKS, GKE, or EKS, Elastic Agent cannot collect metrics from the Kubernetes control plane components, such as kube-scheduler and kube-controller-manager, because they are scheduled on Kubernetes master nodes that it has no access to.

Audit logs are also available only on Kubernetes master nodes, so Elastic Agent cannot collect them either.