When working with observability and logging data, not all documents make it into Elasticsearch in pristine condition. Some may be dropped due to processing failures in ingest pipelines or mapping errors, while others may be partially ingested with ignored fields if a field's value is incompatible with the defined mappings. These issues can impact downstream analysis and dashboards. Streams data quality makes it easier to monitor the health of your ingested data, identify potential issues, and take corrective action right from the UI. With data quality, you can see how well your stream is performing and quickly understand whether your data has Good, Degraded, or Poor quality.
What's in data quality
At-a-glance summary
The summary card shows:
- Degraded documents - Documents that contain the _ignored field - see this for more info.
- Failed documents - Documents that were rejected at ingestion due to mapping conflicts or pipeline failures.
The overall quality score (Good, Degraded, Poor) is automatically calculated based on the percentage of degraded and failed documents.
Trends over time
The tab includes a time-series chart so you can track how degraded and failed documents are accumulating over time. Use the date picker to zoom into a specific range and understand when problems are spiking.
Quality issues table
A detailed table lists the types of issues affecting your stream. For each issue, you can:
- See which fields are causing problems.
- Review counts of affected documents.
- Filter by issues that have not been solved yet (Current issues only).
- Open a flyout to dive deeper into the cause of the issue and learn how to fix it.
Monitoring degraded documents
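A degraded document is one that contains the _ignored field, which Elasticsearch sets when a field's value is incompatible with the defined mappings and is ignored at index time.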
To help keep these issues under control, the Data quality tab provides visibility into the percentage of degraded documents in your stream.
Set up a rule to stay ahead of issues
You can use the Create rule button above the Degraded docs chart to define an alert that notifies you when the percentage of degraded documents crosses a certain threshold. This makes it easy to proactively monitor for mapping mismatches and ensure your data continues to meet quality expectations.
For more information on how to configure this rule, see Degraded docs rule conditions.
Handling failed documents with the failure store
Failure store is a special index that captures documents rejected during ingestion. Instead of losing this data, the failure store retains it in a dedicated
In the Data quality tab, failed documents are visible only if your stream has a failure store enabled. To check failure store documents, you are required to have at least
Once enabled, you can edit the failure store configuration or disable it at any time using the Edit button above the failed docs chart.
The failure store can also be configured in the Streams Retention tab - see this article for more information.
Technical implementation
Under the hood, the Data quality tab builds on the existing Dataset quality plugin - the same one that powers the Dataset quality page in Stack Management. However, instead of working in the context of datasets following the Data stream naming scheme, it’s now tailored specifically for streams.
To determine the quality of a stream, the UI sends three ES|QL queries to the server:
- All documents (including failures):
FROM myStream, myStream::failures | STATS doc_count = COUNT(*)
- Failed documents only:
FROM myStream::failures | STATS failed_doc_count = COUNT(*)
- Degraded documents:
FROM myStream METADATA _ignored | WHERE _ignored IS NOT NULL | STATS degraded_doc_count = COUNT(*)
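As a rough illustration of the three queries above, the following Python sketch builds them for a given stream name and reads a single count out of an ES|QL JSON response. The helper names build_quality_queries and read_count are hypothetical and are not part of Streams or Kibana. The sketch assumes the stream name needs no quoting in ES|QL and that the response uses the default shape with "columns" and "values" keys.

```python
from typing import Any, Dict


def build_quality_queries(stream: str) -> Dict[str, str]:
    """Return the three ES|QL queries for a stream, keyed by the stat they compute.

    Hypothetical helper: assumes `stream` is a plain name that does not
    need quoting or escaping inside an ES|QL FROM clause.
    """
    return {
        # All documents, including those in the failure store.
        "doc_count": f"FROM {stream}, {stream}::failures | STATS doc_count = COUNT(*)",
        # Documents that were rejected at ingestion.
        "failed_doc_count": f"FROM {stream}::failures | STATS failed_doc_count = COUNT(*)",
        # Documents that carry the _ignored metadata field.
        "degraded_doc_count": (
            f"FROM {stream} METADATA _ignored "
            "| WHERE _ignored IS NOT NULL "
            "| STATS degraded_doc_count = COUNT(*)"
        ),
    }


def read_count(response: Dict[str, Any], column: str) -> int:
    """Extract one COUNT(*) value from an ES|QL JSON response.

    Assumes the default response shape: a "columns" list of
    {"name", "type"} objects and a "values" list of rows.
    Returns 0 when the response contains no rows.
    """
    names = [col["name"] for col in response["columns"]]
    index = names.index(column)
    rows = response["values"]
    return int(rows[0][index]) if rows else 0
```

Each query string would be sent as the "query" field of an ES|QL request, and read_count would then be applied to the matching column of each response.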
The results of these queries are then used to calculate the percentages of failed and degraded documents. The overall data quality is determined using simple thresholds:
- Good: Both percentages are 0%
- Degraded: Any percentage is greater than 0% but less than 3%
- Poor: Any percentage is above 3%
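The thresholds above can be sketched as a small Python function. This is an illustration, not the plugin's actual code: the function names are hypothetical, both percentages are assumed to use the total document count (including failures) as the denominator, and a value of exactly 3% is treated as Degraded because the text does not say which side of the boundary it falls on.

```python
def percentage(part: int, total: int) -> float:
    """Return part/total as a percentage, or 0.0 for an empty stream."""
    return 0.0 if total == 0 else 100.0 * part / total


def stream_quality(
    doc_count: int,
    failed_doc_count: int,
    degraded_doc_count: int,
    poor_threshold: float = 3.0,
) -> str:
    """Classify a stream as Good, Degraded, or Poor.

    Assumptions (not confirmed by the article): both percentages are
    computed against doc_count, and exactly poor_threshold percent
    counts as Degraded rather than Poor.
    """
    failed_pct = percentage(failed_doc_count, doc_count)
    degraded_pct = percentage(degraded_doc_count, doc_count)
    worst = max(failed_pct, degraded_pct)

    if worst == 0:
        return "Good"      # both percentages are 0%
    if worst > poor_threshold:
        return "Poor"      # at least one percentage is above 3%
    return "Degraded"      # something is above 0%, nothing is above 3%
```

For example, a stream with 1,000 documents of which 10 failed (1%) and none are degraded would be classified as Degraded, while 50 degraded documents (5%) would make it Poor.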
For managing the failure store, Streams uses the Update data stream options API with the
Why you’ll love this
The new Data quality tab gives you:
- Visibility into ingestion problems without digging into logs
- A clear breakdown of degraded vs. failed documents
- Insights into which fields are ignored and why
- Tools to capture and troubleshoot failed docs with the failure store
By surfacing data quality issues directly in the Streams UI, we’re making it easier to keep your data flowing reliably and to ensure your analytics are built on a strong foundation.
Try it out today
The data quality feature is available in Elastic Observability on Serverless, and coming soon for self-managed and Elastic Cloud users.
Sign up for an Elastic trial at cloud.elastic.co and try Elastic's Serverless offering, which lets you explore all of the Streams functionality.
For more information on Streams:
Read about Reimagining streams
Look at the Streams website
Read the Streams documentation
