Tutorial: Create a data stream with a lifecycle

To create a data stream with a built-in lifecycle, follow these steps:

Create an index template

A data stream requires a matching index template. To configure the data stream lifecycle, set the lifecycle field in the index template, just as you would for mappings and index settings. Define an index template that sets a lifecycle as follows:

  • Include the data_stream object to enable data streams.
  • Define the lifecycle in the template section or include a component template that defines the lifecycle.
  • Use a priority higher than 200 to avoid collisions with built-in templates. See Avoid index pattern collisions.

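Use the create index template API to create the template. In each example in this tutorial, the Ruby client call is shown first, followed by the equivalent console request.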

response = client.indices.put_index_template(
  name: 'my-index-template',
  body: {
    index_patterns: [
      'my-data-stream*'
    ],
    data_stream: {},
    priority: 500,
    template: {
      lifecycle: {
        data_retention: '7d'
      }
    },
    _meta: {
      description: 'Template with data stream lifecycle'
    }
  }
)
puts response

PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "priority": 500,
  "template": {
    "lifecycle": {
      "data_retention": "7d"
    }
  },
  "_meta": {
    "description": "Template with data stream lifecycle"
  }
}
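
The template requirements listed above can also be checked in code before the request is sent. The following is a minimal sketch, not part of the client library: `lifecycle_template_body` is a hypothetical helper name, and the default priority of 500 is simply the value used in this tutorial.

```ruby
# Hypothetical helper (not a client method): builds the request body for an
# index template that enables data streams and sets a lifecycle.
def lifecycle_template_body(pattern:, retention:, priority: 500)
  # The tutorial recommends a priority higher than 200 to avoid
  # collisions with built-in templates.
  raise ArgumentError, 'priority must be higher than 200' unless priority > 200

  {
    index_patterns: [pattern],
    data_stream: {}, # the empty object enables data streams
    priority: priority,
    template: {
      lifecycle: { data_retention: retention }
    },
    _meta: { description: 'Template with data stream lifecycle' }
  }
end

body = lifecycle_template_body(pattern: 'my-data-stream*', retention: '7d')
puts body

# With a configured client, the body could then be sent as in the example above:
# client.indices.put_index_template(name: 'my-index-template', body: body)
```

Keeping the body in a helper makes it easy to reuse the same lifecycle settings for several index patterns.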

Create a data stream

You can create a data stream in two ways:

  1. By manually creating the stream using the create data stream API. The stream’s name must still match one of your template’s index patterns.

    response = client.indices.create_data_stream(
      name: 'my-data-stream'
    )
    puts response

    PUT _data_stream/my-data-stream

  2. By submitting indexing requests that target the stream’s name. This name must match one of your index template’s index patterns.

    response = client.bulk(
      index: 'my-data-stream',
      body: [
        {
          create: {}
        },
        {
          "@timestamp": '2099-05-06T16:21:15.000Z',
          message: '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736'
        },
        {
          create: {}
        },
        {
          "@timestamp": '2099-05-06T16:25:42.000Z',
          message: '192.0.2.255 - - [06/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638'
        }
      ]
    )
    puts response

    PUT my-data-stream/_bulk
    { "create":{ } }
    { "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
    { "create":{ } }
    { "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
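
The console form of the bulk request is newline-delimited JSON: each document is preceded by a `create` action line. As a rough sketch of how that body could be assembled from a list of documents, the hypothetical helper `bulk_create_ndjson` below uses only Ruby's standard `json` library. The check for `@timestamp` reflects the requirement that every document indexed into a data stream carries that field.

```ruby
require 'json'

# Hypothetical helper (not a client method): turns documents into an
# NDJSON bulk body with one `create` action line per document.
def bulk_create_ndjson(docs)
  docs.each do |doc|
    raise ArgumentError, 'document is missing @timestamp' unless doc.key?('@timestamp')
  end

  # Interleave action lines and documents; the body must end with a newline.
  docs.flat_map { |doc| [{ 'create' => {} }, doc] }
      .map { |line| JSON.generate(line) }
      .join("\n") + "\n"
end

docs = [
  { '@timestamp' => '2099-05-06T16:21:15.000Z',
    'message' => '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736' },
  { '@timestamp' => '2099-05-06T16:25:42.000Z',
    'message' => '192.0.2.255 - - [06/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638' }
]

ndjson = bulk_create_ndjson(docs)
puts ndjson
```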

Retrieve lifecycle information

Use the get data stream lifecycle API to view the lifecycle configured for your data stream, and the explain data stream lifecycle API to view the exact state of each backing index.

response = client.indices.get_data_lifecycle(
  name: 'my-data-stream'
)
puts response

GET _data_stream/my-data-stream/_lifecycle

The result will look like this:

{
  "data_streams": [
    {
      "name": "my-data-stream",
      "lifecycle": {
        "enabled": true,       
        "data_retention": "7d" 
      }
    }
  ]
}

  • name: The name of your data stream.
  • lifecycle.enabled: Whether the data stream lifecycle is enabled for this data stream.
  • lifecycle.data_retention: The retention period for data indexed in this data stream. Data is kept for at least 7 days; after that, Elasticsearch can delete it at its own discretion.
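
To work with this response programmatically, it can help to turn the retention string into a number. The sketch below is illustrative only: `retention_seconds` is a hypothetical helper that handles just the `d`, `h`, `m`, and `s` units, and it assumes every stream in the response has a `data_retention` value. The sample response is copied from the result shown above.

```ruby
# Seconds per supported unit. Other time units are deliberately left out.
UNIT_SECONDS = { 'd' => 86_400, 'h' => 3_600, 'm' => 60, 's' => 1 }.freeze

# Hypothetical helper: converts a retention string such as "7d" to seconds.
def retention_seconds(value)
  match = /\A(\d+)([dhms])\z/.match(value)
  raise ArgumentError, "unsupported retention value: #{value}" unless match

  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

# Sample response, copied from the result shown above.
response = {
  'data_streams' => [
    {
      'name' => 'my-data-stream',
      'lifecycle' => { 'enabled' => true, 'data_retention' => '7d' }
    }
  ]
}

# One summary row per data stream.
summary = response['data_streams'].map do |stream|
  lifecycle = stream['lifecycle']
  {
    name: stream['name'],
    enabled: lifecycle['enabled'],
    retention_seconds: retention_seconds(lifecycle['data_retention'])
  }
end

puts summary.inspect
```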

To see how the data stream lifecycle is applied to individual backing indices, use the explain data stream lifecycle API:

response = client.indices.explain_data_lifecycle(
  index: '.ds-my-data-stream-*'
)
puts response

GET .ds-my-data-stream-*/_lifecycle/explain

The result will look like this:

{
  "indices": {
    ".ds-my-data-stream-2023.04.19-000001": {
      "index": ".ds-my-data-stream-2023.04.19-000001",      
      "managed_by_lifecycle": true,                               
      "index_creation_date_millis": 1681918009501,
      "time_since_index_creation": "1.6m",                  
      "lifecycle": {                                        
        "enabled": true,
        "data_retention": "7d"
      }
    }
  }
}

  • index: The name of the backing index.
  • managed_by_lifecycle: Whether the index is managed by the built-in data stream lifecycle.
  • time_since_index_creation: The time since the index was created.
  • lifecycle: The lifecycle configuration that is applied to this backing index.
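
When a data stream has many backing indices, the explain output can be condensed into one row per managed index. The following is a small sketch under stated assumptions: `managed_indices` is a hypothetical helper, and the input is the sample result from this tutorial rather than a live response.

```ruby
# Sample explain response, copied from the result shown in this tutorial.
explain = {
  'indices' => {
    '.ds-my-data-stream-2023.04.19-000001' => {
      'index' => '.ds-my-data-stream-2023.04.19-000001',
      'managed_by_lifecycle' => true,
      'index_creation_date_millis' => 1_681_918_009_501,
      'time_since_index_creation' => '1.6m',
      'lifecycle' => { 'enabled' => true, 'data_retention' => '7d' }
    }
  }
}

# Hypothetical helper: returns one summary row for each backing index
# that is managed by the data stream lifecycle.
def managed_indices(explain)
  explain.fetch('indices').values
         .select { |info| info['managed_by_lifecycle'] }
         .map do |info|
    {
      index: info['index'],
      # The creation date is reported in epoch milliseconds.
      created_at: Time.at(info['index_creation_date_millis'] / 1000.0).utc,
      retention: info.dig('lifecycle', 'data_retention')
    }
  end
end

rows = managed_indices(explain)
rows.each do |row|
  puts "#{row[:index]} created #{row[:created_at]} retention #{row[:retention]}"
end
```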