Troubleshooting container engines
This article describes how to troubleshoot container engine services in Elastic Cloud Enterprise.
We refer to Docker by default, as it’s the most common container engine, but these steps are also valid for Podman. You can simply replace docker in the commands with podman as needed.
Do not restart the Docker daemon unless directly prescribed by Elastic Support upon reviewing an Elastic Cloud Enterprise diagnostic, as historically Docker can leave residual orphan processes. We also advise against running any variation of Docker’s prune commands, to avoid accidental data loss.
Use supported configuration
Make sure to use a combination of Linux operating system and container engine version that is supported, following our official Support matrix. Using unsupported combinations can cause a range of intermittent or potentially permanent issues with your Elastic Cloud Enterprise environment, such as failures to create system deployments or upgrade workload deployments, proxy timeouts, data loss, and more.
Troubleshoot unhealthy containers
While troubleshooting the stability of an Elastic Cloud Enterprise host, you may encounter unhealthy Docker containers as reported by docker ps.
System containers rarely report unhealthy; when they do, it is usually after an unexpected event or a problem during operating system maintenance. If operating system maintenance needs to be performed, refer to our perform host maintenance guide.
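As a starting point, the unhealthy containers on a host can be listed with the engine's health filter. This is a minimal sketch: the ENGINE variable and the list_unhealthy function name are illustrative assumptions, not part of Elastic Cloud Enterprise.

```shell
# Container engine CLI to call; set ENGINE=podman on Podman hosts.
# (ENGINE and list_unhealthy are hypothetical names used for illustration.)
ENGINE="${ENGINE:-docker}"

# Print the name and status of every container whose health check
# currently reports "unhealthy". This only reads state; it changes nothing.
list_unhealthy() {
  "$ENGINE" ps --filter "health=unhealthy" --format '{{.Names}}\t{{.Status}}'
}
```

Calling list_unhealthy on the affected host prints one line per unhealthy container. The name prefix (fac- or frc-) then tells you which of the sections below applies.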
Restart deployment instances
If the unhealthy Docker container is a deployment instance (name format fac-{cluster_id}-instance-{node_id}), we recommend restarting the instance from the Elastic Cloud Enterprise UI via its pause and resume mechanism rather than via Docker.
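To find the matching instance in the UI, it can help to split the container name into its parts. The sketch below assumes only the name format given above; the function name and the sample IDs in the usage note are hypothetical.

```shell
# Split a deployment instance container name of the form
# fac-{cluster_id}-instance-{node_id} into its two IDs.
# (parse_instance_name is a hypothetical helper, not an ECE tool.)
parse_instance_name() {
  case "$1" in
    fac-*-instance-*) ;;
    *) echo "not a deployment instance: $1" >&2; return 1 ;;
  esac
  rest="${1#fac-}"   # drop the leading "fac-"
  # Everything before "-instance-" is the cluster ID, everything after is the node ID.
  printf 'cluster_id=%s node_id=%s\n' "${rest%%-instance-*}" "${rest#*-instance-}"
}
```

For example, parse_instance_name fac-abc123-instance-0000000001 prints cluster_id=abc123 node_id=0000000001, which you can use to locate the deployment and instance to pause and resume.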
If the unhealthy status returns, we recommend investigating via our troubleshooting bootlooping guide. A recurring unhealthy status usually indicates an issue with the Elasticsearch configuration rather than any Docker-level problem. An isolated exception affecting air-gapped environments is when the expected Docker image does not yet exist on the Allocator, in which case its logs would report Unable to pull image.
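One way to check for this message is to search log text for the literal string. The filter below reads from standard input, so it makes no assumption about where the logs come from. The function name is hypothetical, and the container named in the usage note is an assumption about which logs to inspect.

```shell
# Print log lines from stdin that contain the image pull failure message.
# Exits non-zero when no such line is found.
# (pull_failures is a hypothetical helper name.)
pull_failures() {
  grep -F "Unable to pull image"
}
```

A possible invocation, assuming the allocator service container holds the relevant logs, is docker logs frc-allocators-allocator 2>&1 | pull_failures. The 2>&1 is needed because docker logs replays the container's stderr stream on stderr.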
Restart service containers
When troubleshooting unhealthy Elastic Cloud Enterprise system containers (name prefix frc-), note that some may be restarted while others should not.
Elastic Cloud Enterprise’s runners will automatically create or restart missing system containers. If you’re attempting to permanently remove a system container by removing its role from the host, you should instead update runner roles. If eligible system containers return to an unhealthy status after restart, we recommend reviewing their start-up logs with docker logs.
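A small wrapper can make it easier to look at the earliest log lines of a container. This is a sketch only: ENGINE and startup_logs are illustrative names, and the default of 50 lines is an arbitrary choice.

```shell
# Container engine CLI to call; set ENGINE=podman on Podman hosts.
ENGINE="${ENGINE:-docker}"

# Show the first N lines (default 50) that a container logged,
# with timestamps, merging stderr into stdout so nothing is missed.
# (startup_logs is a hypothetical helper name.)
startup_logs() {
  "$ENGINE" logs --timestamps "$1" 2>&1 | head -n "${2:-50}"
}
```

For example, startup_logs frc-directors-director 100 prints the first 100 retained log lines of that container. For a container that was removed and recreated, the log starts at the most recent creation.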
It is safe to restart the following containers by running docker stop followed by docker rm:
- frc-allocator-metricbeats-allocator-metricbeat
- frc-allocators-allocator
- frc-beats-runners-beats-runner
- frc-constructors-constructor
- frc-proxies-proxyv2
It is safe to restart the following containers via docker restart:
- frc-client-forwarders-client-forwarder
- frc-directors-director
- frc-services-forwarders-services-forwarder
It is not safe to restart the following without explicit steps from Elastic Support upon reviewing an Elastic Cloud Enterprise diagnostic:
- any container whose name begins with fac-
- frc-runners-runner
- frc-zookeeper-servers-zookeeper
For an unhealthy Zookeeper container, see verify Zookeeper sync status and resolving Zookeeper quorum instead.
For any Elastic Cloud Enterprise system container not listed, contact Elastic Support for guidance.
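The restart guidance in this section can be encoded as a lookup so that a script never restarts a container that requires Elastic Support involvement. The following is a minimal sketch under stated assumptions: ENGINE, restart_policy, and plan_restart are hypothetical names, and plan_restart only prints the command it would suggest rather than running it.

```shell
# Container engine CLI to reference; set ENGINE=podman on Podman hosts.
ENGINE="${ENGINE:-docker}"

# Map a container name to the restart approach listed in this section.
restart_policy() {
  case "$1" in
    frc-allocator-metricbeats-allocator-metricbeat|\
    frc-allocators-allocator|\
    frc-beats-runners-beats-runner|\
    frc-constructors-constructor|\
    frc-proxies-proxyv2)
      echo "stop-rm" ;;          # stop, then rm; the runner recreates it
    frc-client-forwarders-client-forwarder|\
    frc-directors-director|\
    frc-services-forwarders-services-forwarder)
      echo "restart" ;;          # plain restart is safe
    fac-*|frc-runners-runner|frc-zookeeper-servers-zookeeper)
      echo "support-only" ;;     # needs explicit steps from Elastic Support
    *)
      echo "ask-support" ;;      # not listed: ask Elastic Support first
  esac
}

# Dry run: print the suggested command, or a refusal, without executing anything.
plan_restart() {
  case "$(restart_policy "$1")" in
    stop-rm) printf '%s stop %s && %s rm %s\n' "$ENGINE" "$1" "$ENGINE" "$1" ;;
    restart) printf '%s restart %s\n' "$ENGINE" "$1" ;;
    *) printf '# do not restart %s; contact Elastic Support\n' "$1"; return 1 ;;
  esac
}
```

For example, plan_restart frc-proxies-proxyv2 prints the stop and rm commands, while plan_restart frc-runners-runner prints a refusal and returns a non-zero status. Keeping the helper as a dry run leaves the decision to run the command with the operator.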