TSDS guidelines
This page describes how to enable TSDS functionality in your integration packages. Full details about TSDS can be found in Time series data stream in the Elasticsearch documentation.
In this document you can find:

- Background
- Steps for enabling TSDS for a metrics dataset
- Testing
- Best practices
- Troubleshooting
Background
A time series is a sequence of observations for a specific entity. TSDS enables column-oriented functionality in Elasticsearch by co-locating the data and optimizing storage and aggregations to take advantage of this co-location.
Integrations are one of the biggest sources of input data to Elasticsearch. Enabling TSDS on integration packages can be achieved with minimal changes to the `fields.yml` and `manifest.yml` files of a package.
Steps for enabling TSDS for a metrics dataset
Data streams of type `logs` are excluded from TSDS migration.
Step 1: Set the dimension fields
Each field belonging to the set of fields that uniquely identify a document is a dimension. For more details, refer to Dimensions.
To set a field as a dimension, add `dimension: true` to its mapping:

```
- name: ApiId
  type: keyword
  dimension: true
```
A field of type `flattened` cannot be selected as a dimension field. If the field that you are choosing as a dimension is too long or is of type `flattened`, consider hashing its value and using the result as a dimension. The Fingerprint processor can be used for this purpose.
You can find an example in the Oracle Integration TSDS Enablement Example.
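As a minimal sketch of this approach (the field names below are hypothetical), a data stream's ingest pipeline could hash the long value with the Fingerprint processor, and the resulting field would then be declared as the dimension in `fields.yml`:

```
# Hypothetical excerpt from a data stream's ingest pipeline
processors:
  - fingerprint:
      # Field whose values are too long to use directly as a dimension
      fields: ["oracle.long_identifier"]
      # Hashed result; declare this field with dimension: true in fields.yml
      target_field: "oracle.identifier_fingerprint"
      method: "SHA-256"
```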
Important considerations:
- There is a limit on how many dimension fields a data stream can have. By default, this value is `21`. You can adjust this restriction by altering the `index.mapping.dimension_fields.limit` setting:

```
elasticsearch:
  index_template:
    settings:
      index.mapping.dimension_fields.limit: 32 # Defaults to 21
```
- Dimension keys have a hard limit of 512 bytes. Documents are rejected if this limit is reached.
- Dimension values have a hard limit of 1024 bytes. Documents are rejected if this limit is reached.
ECS fields
There are fields that are part of every package, and they are potential candidates for becoming dimension fields:

- `host.name`
- `service.address`
- `agent.id`
- `container.id`
For products that are capable of running both on premises and in a public cloud environment (by being deployed on public cloud virtual machines), it is recommended to annotate the following ECS fields as dimension fields:

- `host.name`
- `service.address`
- `container.id`
- `cloud.account.id`
- `cloud.provider`
- `cloud.region`
- `cloud.availability_zone`
- `agent.id`
- `cloud.instance.id`
For products operating as managed services within cloud providers such as AWS, Azure, and GCP, it is advised to label the following fields as dimension fields:

- `cloud.account.id`
- `cloud.region`
- `cloud.availability_zone`
- `cloud.provider`
- `agent.id`
Note that for some packages, some of these fields do not hold any value, so make sure to use only the ones that are needed.
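Many packages declare ECS fields by reference in an `ecs.yml` file. Assuming a package follows that convention, annotating the ECS fields as dimensions could look like this:

```
# Sketch of ecs.yml entries annotated as dimensions
- name: cloud.account.id
  external: ecs
  dimension: true
- name: cloud.provider
  external: ecs
  dimension: true
```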
Integration-specific fields
The `fields.yml` file has the field mappings specific to a data stream of an integration. Some of these fields might need to be set as dimensions if the set of ECS dimension fields is not enough to create a unique `_tsid`.
Adding an inline comment prior to the dimension annotation is advised, detailing the rationale behind the choice of a particular field as a dimension field:
```
- name: wait_class
  type: keyword
  # Multiple events are generated based on the values of wait_class. Hence, it is a dimension.
  dimension: true
  description: Every wait event belongs to a class of wait events.
```
Step 2: Set type for metric fields
Metrics are fields that contain numeric measurements, as well as aggregations and/or downsampled values based on those measurements. Annotate each metric with the correct metric type. The currently supported values are `gauge`, `counter`, and `null`.
Example of adding a metric type to a field:
```
- name: compactions_failed
  type: double
  metric_type: counter
  description: |
    Counter of TSM compactions by level that have failed due to error.
```
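For comparison, a measurement that can rise and fall between observations would be annotated as a `gauge`; the field below is a hypothetical illustration:

```
- name: memory_usage_bytes
  type: long
  # The value can go up or down between observations, so it is a gauge, not a counter.
  metric_type: gauge
  description: |
    Current memory usage in bytes.
```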
Some aggregation functions are not supported for certain `metric_type` values. In such a scenario, revisit your selection to see whether the `metric_type` you chose is indeed correct for that field. If it is, please create an issue in elastic/elasticsearch explaining the use case.
Step 3: Update Kibana version
Modify the `kibana.version` to at least `8.8.0` in the `manifest.yml` file of the package:

```
conditions:
  kibana.version: "^8.8.0"
```
Step 4: Enable time_series index mode
Add the following to the `manifest.yml` file of the data stream to enable the `time_series` index mode:

```
elasticsearch:
  index_mode: "time_series"
```
Testing
- If the number of dimensions is insufficient, data will be lost. Consider testing this using the TSDS migration test kit.
- Verify that the dashboard renders the data properly. If certain visualisations do not work, consider migrating them to Lens. Remember that certain aggregation functions are not supported when a field has metric type `counter`, for example `avg()`. Replace such aggregation functions with a supported aggregation type, such as `max()` or `min()`.
Best practices
- Use Lens as the preferred visualisation type.
- Always assess the number of unique values the field that is selected to be a dimension would hold, especially if it is a numeric field. A field that holds millions of unique values may not be an ideal candidate for becoming a dimension field.
- If the dimension field value length is very long (the maximum limit is 1024 bytes), consider transforming the value to its hash representation. The Fingerprint processor can be used for this purpose.
- In the field mapping files, above each dimension field, add an in-line comment stating the reason for selecting the field as a dimension field.
- As part of TSDS migration testing, you may discover other errors which may be unrelated to TSDS migration. Keep the pull request for TSDS migration free from such changes. This helps in obtaining quick PR approval.
Troubleshooting
Dropped documents
If you notice that metrics data is being dropped from an index after enabling TSDS, the TSDS migration test kit can be used as a helpful debugging tool.
Conflicting field type
Fields having a conflicting field type will not be considered as dimensions. Resolve the field type ambiguity before defining a field as a dimension field.
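One way to spot a conflicting field type is the field capabilities API, which reports every type a field is mapped to across the matching indices (the index pattern and field name below are placeholders):

```
GET metrics-*/_field_caps?fields=wait_class
```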
Identification of write index
When mappings are modified for a data stream, an index rollover happens and a new index is created under the data stream. Even though a new index exists, the data continues to go to the old index until the timestamp matches the `index.time_series.start_time` of the newly created index.

An enhancement request has been filed for Kibana to indicate the write index. Until then, refer to the `index.time_series.start_time` of the indices and compare it with the current time to identify the write index.
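For example, you can fetch this setting for the backing indices of a data stream (the data stream name below is a placeholder) and compare the returned values with the current time:

```
GET metrics-oracle.performance-default/_settings?filter_path=*.settings.index.time_series.start_time
```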
If you find this error (for reference, see integrations issue #7345 and elasticsearch PR #98518):
```
... (status=400): {"type":"illegal_argument_exception","reason":"the document timestamp [2023-08-07T00:00:00.000Z] is outside of ranges of currently writable indices [[2023-08-07T08:55:38.000Z,2023-08-07T12:55:38.000Z]]"}, dropping event!
```
Consider:

- Defining the `look_ahead_time` or `look_back_time` for each data stream. For example:

```
elasticsearch:
  index_mode: "time_series"
  index_template:
    settings:
      index.look_ahead_time: "10h"
```

Updating the package with this change does not cause an automatic rollover on the data stream. You have to do that manually.

- Updating the `timestamp` of the document being rejected.
- Finding a fix to receive the document without a delay.