Fix watermark errors

When a data node is critically low on disk space and has reached the flood-stage disk usage watermark, the following error is logged: Error: disk usage exceeded flood-stage watermark, index has read-only-allow-delete block.

To prevent a full disk, when a node reaches this watermark, Elasticsearch blocks writes to any index with a shard on the node. If the block affects related system indices, Kibana and other Elastic Stack features may become unavailable. For example, Kibana may display the "Kibana Server is not Ready yet" error message.

Elasticsearch will automatically remove the write block when the affected node’s disk usage falls below the high disk watermark. To achieve this, Elasticsearch attempts to rebalance some of the affected node’s shards to other nodes in the same data tier.
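The relationship between a node's disk usage and the watermarks described above can be sketched as follows. This is only an illustration, not Elasticsearch's own logic: the function name classify_disk_usage is hypothetical, the thresholds are used-disk percentages that default here to 85, 90, and 95 (an assumption that your cluster settings may override), and the max_headroom settings are ignored.

```python
def classify_disk_usage(used_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Return the highest watermark a node's disk usage has reached.

    Thresholds are used-disk percentages and are assumptions for this
    sketch; max_headroom settings are deliberately ignored.
    """
    if not low <= high <= flood_stage:
        raise ValueError("expected low <= high <= flood_stage")
    if used_percent >= flood_stage:
        return "flood_stage"  # writes are blocked on indices with shards here
    if used_percent >= high:
        return "high"  # Elasticsearch tries to move shards off the node
    if used_percent >= low:
        return "low"  # new shard allocation to the node is restricted
    return "ok"


print(classify_disk_usage(96.0))
print(classify_disk_usage(88.0))
```

In terms of this sketch, a node that has returned "flood_stage" keeps its write block until its usage drops far enough that the function would return "low" or "ok".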

Monitor rebalancing

To verify that shards are moving off the affected node until it falls below the high watermark, use the cat shards API and cat recovery API:

resp = client.cat.shards(
    v=True,
)
print(resp)

resp1 = client.cat.recovery(
    v=True,
    active_only=True,
)
print(resp1)

const response = await client.cat.shards({
  v: "true",
});
console.log(response);

const response1 = await client.cat.recovery({
  v: "true",
  active_only: "true",
});
console.log(response1);

GET _cat/shards?v=true

GET _cat/recovery?v=true&active_only=true
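To watch progress over time, it can help to reduce the cat shards output to a per-node count of shard states. The sketch below assumes the rows come from a call such as client.cat.shards(format="json"), which returns a list of dictionaries. The helper name shard_states_on_node and the sample rows are hypothetical, and the helper assumes node names contain no whitespace.

```python
from collections import Counter


def shard_states_on_node(rows, node_name):
    """Count shard states for shards currently held by node_name."""
    counts = Counter()
    for row in rows:
        node = row.get("node") or ""  # unassigned shards have no node
        # A relocating shard's node column reads "source -> target ...",
        # so compare only the first whitespace-separated token.
        tokens = node.split()
        if tokens and tokens[0] == node_name:
            counts[row["state"]] += 1
    return dict(counts)


# Hypothetical rows shaped like the JSON form of the cat shards output.
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "node-1"},
    {"index": "my-index", "shard": "1", "prirep": "r",
     "state": "RELOCATING", "node": "node-1 -> 10.0.0.2 abc123 node-2"},
    {"index": "other-index", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "node-2"},
    {"index": "other-index", "shard": "0", "prirep": "r",
     "state": "UNASSIGNED", "node": None},
]

print(shard_states_on_node(sample_rows, "node-1"))
```

A falling STARTED count and a non-zero RELOCATING count for the affected node suggest that rebalancing is under way.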

If shards remain on the node, keeping it above the high watermark, use the cluster allocation explanation API to get an explanation for their allocation status.

resp = client.cluster.allocation_explain(
    index="my-index",
    shard=0,
    primary=False,
)
print(resp)

const response = await client.cluster.allocationExplain({
  index: "my-index",
  shard: 0,
  primary: false,
});
console.log(response);

GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
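The allocation explanation response can be long, so a small filter that keeps only the deciders answering NO may make it easier to read. The sketch below assumes the response contains a node_allocation_decisions list whose entries carry node_name and deciders fields. The helper name blocking_deciders and the trimmed sample response are hypothetical, and the explanation text is a placeholder rather than real Elasticsearch output.

```python
def blocking_deciders(explanation):
    """Map each node name to the (decider, explanation) pairs that said NO."""
    blocked = {}
    for node in explanation.get("node_allocation_decisions", []):
        reasons = [
            (decider["decider"], decider.get("explanation", ""))
            for decider in node.get("deciders", [])
            if decider.get("decision") == "NO"
        ]
        if reasons:
            blocked[node.get("node_name", "unknown")] = reasons
    return blocked


# Hypothetical, heavily trimmed response for illustration only.
sample_explanation = {
    "index": "my-index",
    "shard": 0,
    "primary": False,
    "node_allocation_decisions": [
        {
            "node_name": "node-2",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "disk_threshold",
                    "decision": "NO",
                    "explanation": "placeholder explanation text",
                }
            ],
        },
        {"node_name": "node-3", "node_decision": "yes", "deciders": []},
    ],
}

for name, reasons in blocking_deciders(sample_explanation).items():
    print(name, reasons)
```

In practice, the dictionary passed in would be the response from the allocation explanation request shown above.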

Temporary relief

To immediately restore write operations, you can temporarily increase the disk watermarks and remove the write block.

resp = client.cluster.put_settings(
    persistent={
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
        "cluster.routing.allocation.disk.watermark.high": "95%",
        "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
        "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
        "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
    },
)
print(resp)

resp1 = client.indices.put_settings(
    index="*",
    expand_wildcards="all",
    settings={
        "index.blocks.read_only_allow_delete": None
    },
)
print(resp1)

response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.disk.watermark.low' => '90%',
      'cluster.routing.allocation.disk.watermark.low.max_headroom' => '100GB',
      'cluster.routing.allocation.disk.watermark.high' => '95%',
      'cluster.routing.allocation.disk.watermark.high.max_headroom' => '20GB',
      'cluster.routing.allocation.disk.watermark.flood_stage' => '97%',
      'cluster.routing.allocation.disk.watermark.flood_stage.max_headroom' => '5GB',
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen' => '97%',
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom' => '5GB'
    }
  }
)
puts response

response = client.indices.put_settings(
  index: '*',
  expand_wildcards: 'all',
  body: {
    'index.blocks.read_only_allow_delete' => nil
  }
)
puts response

const response = await client.cluster.putSettings({
  persistent: {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom":
      "5GB",
  },
});
console.log(response);

const response1 = await client.indices.putSettings({
  index: "*",
  expand_wildcards: "all",
  settings: {
    "index.blocks.read_only_allow_delete": null,
  },
});
console.log(response1);

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
  }
}

PUT */_settings?expand_wildcards=all
{
  "index.blocks.read_only_allow_delete": null
}

Once a long-term solution is in place, reset or reconfigure the disk watermarks:

resp = client.cluster.put_settings(
    persistent={
        "cluster.routing.allocation.disk.watermark.low": None,
        "cluster.routing.allocation.disk.watermark.low.max_headroom": None,
        "cluster.routing.allocation.disk.watermark.high": None,
        "cluster.routing.allocation.disk.watermark.high.max_headroom": None,
        "cluster.routing.allocation.disk.watermark.flood_stage": None,
        "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": None,
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen": None,
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": None
    },
)
print(resp)

response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.disk.watermark.low' => nil,
      'cluster.routing.allocation.disk.watermark.low.max_headroom' => nil,
      'cluster.routing.allocation.disk.watermark.high' => nil,
      'cluster.routing.allocation.disk.watermark.high.max_headroom' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage.max_headroom' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom' => nil
    }
  }
)
puts response

const response = await client.cluster.putSettings({
  persistent: {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom":
      null,
  },
});
console.log(response);

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": null
  }
}
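One way to avoid leaving a temporary override behind is to derive the reset request from the same dictionary that was used to raise the watermarks. The following is a minimal sketch under that assumption. The helper name reset_payload is hypothetical, and the sample dictionary lists only three of the settings shown above to keep it short.

```python
def reset_payload(overrides):
    """Build a settings body that resets every overridden key to its default."""
    # Setting a persistent cluster setting to null (None) removes the override.
    return {"persistent": {name: None for name in overrides}}


# Subset of the temporary overrides from the example above.
temporary_overrides = {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
}

body = reset_payload(temporary_overrides)
print(body)

# With a configured client, the body could then be sent as:
# client.cluster.put_settings(persistent=body["persistent"])
```

Keeping the override dictionary in one place means the reset always covers exactly the settings that were changed.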

Resolve

As a long-term solution, we recommend you do one of the following, whichever best suits your use case:

Add nodes to the affected data tiers. You should enable autoscaling for clusters deployed using our Elasticsearch Service, Elastic Cloud Enterprise, and Elastic Cloud on Kubernetes platforms.

Upgrade existing nodes to increase disk space. On Elasticsearch Service, Elastic Support intervention may become necessary if cluster health reaches status:red.

Delete unneeded indices using the delete index API.

Update the related ILM policy to push indices through to later data tiers.
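When deciding which indices to delete or move to a later tier, a list of the largest indices is a reasonable starting point. The sketch below assumes rows obtained from a call such as client.cat.indices(format="json", bytes="b"), where each row includes index and store.size fields and sizes are reported in bytes. The helper name largest_indices and the sample rows are hypothetical. The result is a list of candidates to review, not a list to delete automatically.

```python
def largest_indices(rows, limit=5):
    """Return up to `limit` (index, size_in_bytes) pairs, largest first."""

    def size(row):
        value = row.get("store.size")
        # Some indices (for example, closed ones) may report no size.
        return int(value) if value is not None else 0

    ranked = sorted(rows, key=size, reverse=True)
    return [(row["index"], size(row)) for row in ranked[:limit]]


# Hypothetical rows shaped like the JSON form of the cat indices output.
sample_indices = [
    {"index": "logs-2024.01", "store.size": "5368709120"},
    {"index": "logs-2024.02", "store.size": "1073741824"},
    {"index": "metrics-old", "store.size": "21474836480"},
    {"index": "closed-index", "store.size": None},
]

for name, size_bytes in largest_indices(sample_indices, limit=2):
    print(name, size_bytes)
```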
