Elasticsearch's new index mode, logsdb, reduces log storage needs by up to 65%
Today, we announce the general availability of Elasticsearch's new index mode, logsdb, which reduces the storage footprint of log data by up to 65% compared to recent versions of Elasticsearch without logsdb. This dramatic improvement enables observability and security teams to expand visibility without exceeding their budget while keeping all data immediately accessible for analysis.
Logsdb optimizes the ordering of data, eliminates duplication by reconstructing non-stored field values on the fly with synthetic _source, and improves compression with advanced algorithms and codecs, leveraging columnar storage within Elasticsearch for efficient log storage and retrieval.
Enhance analytics and reduce costs by improving storage efficiency with logsdb index mode
Logs provide critical signals for detecting and remediating observability and security issues, and their utility is increasing as AI advancements make text-based data easier to analyze. As a result, efficient storage and performant access matter more than ever.
Unfortunately, the growing volume of logs generated by infrastructure and applications is driving up costs and forcing compromises that hamper analysis: limiting collection, shortening retention, or relegating fresh data to siloed archive tiers.
Logsdb directly addresses these challenges. With greater storage efficiency, you can collect more data and avoid the hassle of complicated data filtering. You can retain logs longer to support threat hunting, incident response, and compliance requirements. And because all data is always searchable, you can get fast insights, no matter how large your data set grows.
Technical innovation behind logsdb index mode
Logsdb index mode dramatically reduces the disk footprint of log data with smart index sorting, synthetic _source, and advanced compression. Implementing it can reduce log storage needs by up to 65%, compared to recent versions of Elasticsearch without logsdb. While logsdb currently uses more CPU during indexing, its efficient storage reduces overall costs for most customers. For customers who need long-term retention, we expect total cost of ownership (TCO) reductions of up to 50%.
Smart index sorting improves storage efficiency by up to 30% and reduces query latency on some logging data sets by storing similar data close together. By default, logsdb sorts indices by host.name and @timestamp. If other fields group your data more effectively, you can specify them instead.
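To see why sorting helps, here is a minimal Python sketch (an illustration only, not Elasticsearch code) that sorts a few hypothetical log events by host.name and then @timestamp, and counts how many runs of identical values the host.name column contains before and after. Fewer runs means run-length-style encodings have less to store. The sample events and the count_runs helper are assumptions made for this example.

```python
from itertools import groupby

# Hypothetical log events: (host.name, @timestamp in epoch ms, message).
# Arrival order interleaves hosts, as logs shipped from many hosts often do.
events = [
    ("web-2", 1_700_000_000_300, "GET /cart 200"),
    ("web-1", 1_700_000_000_100, "GET / 200"),
    ("db-1", 1_700_000_000_200, "slow query"),
    ("web-1", 1_700_000_000_400, "GET /login 200"),
    ("web-2", 1_700_000_000_000, "GET / 200"),
    ("db-1", 1_700_000_000_500, "checkpoint complete"),
]


def count_runs(values):
    """Count runs of identical consecutive values (what run-length encoding stores)."""
    return sum(1 for _ in groupby(values))


# Sort by host.name first, then @timestamp, mirroring the default sort fields.
sorted_events = sorted(events, key=lambda e: (e[0], e[1]))

unsorted_hosts = [e[0] for e in events]
sorted_hosts = [e[0] for e in sorted_events]

print(count_runs(unsorted_hosts))  # 6: every value differs from its neighbor
print(count_runs(sorted_hosts))    # 3: one run per distinct host
```

With six events the difference is small, but on real log volumes with a limited number of hosts the same effect groups long stretches of identical values together, which is what the compression codecs exploit.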
Advanced compression significantly reduces storage requirements for text-heavy data like logs through Zstandard (Zstd) compression, delta encoding, run-length encoding, and other codecs that are selected automatically. Doc values, which are stored in a columnar format optimized for compression and performance, enable efficient storage and retrieval of field values for sorting, aggregations, and scripting.
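Delta encoding and run-length encoding are easy to illustrate in isolation. The Python sketch below assumes a perfectly regular, already sorted @timestamp column (a hypothetical simplification): delta encoding turns the timestamps into one starting value plus identical differences, and run-length encoding then collapses those differences into a single pair. Elasticsearch's actual codecs work on blocks of doc values inside Lucene and combine several techniques; Zstd is not shown here.

```python
from itertools import accumulate, groupby


def delta_encode(values):
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def run_length_encode(values):
    """Collapse runs of identical consecutive values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the full sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical @timestamp column (epoch milliseconds) for one host emitting
# one event per second, as it might look after index sorting.
timestamps = [1_700_000_000_000 + 1000 * i for i in range(8)]

deltas = delta_encode(timestamps)  # first value, then seven deltas of 1000
encoded = run_length_encode(deltas)
print(encoded)  # [(1700000000000, 1), (1000, 7)]

# Decoding: expand the runs, then a running sum undoes the delta step.
restored = list(accumulate(run_length_decode(encoded)))
assert restored == timestamps
```

Eight timestamps shrink to two (value, count) pairs. Real data is rarely this regular, which is why the codecs that are actually applied are chosen automatically rather than fixed.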
Synthetic _source enables organizations to trim storage needs by another 20-40% by discarding the stored _source field and fully or partially reconstructing it on demand. Although the feature sometimes requires more compute for indexing and retrieval, testing shows that it delivers measurable net efficiency gains. Synthetic _source builds on nearly two years of production use with metrics and adds numerous enhancements for logs, including support for nearly all field types.
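Conceptually, reconstructing _source means rebuilding a JSON document from values that are already stored per field. The Python sketch below shows only that idea, under loudly stated assumptions: the flat doc_values dictionary and the synthesize_source helper are hypothetical and do not reflect Elasticsearch's internal storage format or its reconstruction code. The sketch does show why a reconstructed document may not be byte-for-byte identical to what was indexed. Here, for example, dotted field names come back as nested objects.

```python
import json

# Hypothetical per-field values for one document, keyed by flattened field
# path. This is NOT Elasticsearch's internal format; it only stands in for
# "values the index already keeps for each field".
doc_values = {
    "@timestamp": "2024-12-12T10:15:00.000Z",
    "host.name": "web-1",
    "http.response.status_code": 200,
    "message": "GET /login 200",
}


def synthesize_source(fields):
    """Rebuild a nested JSON object from dotted field paths and their values."""
    source = {}
    for path, value in fields.items():
        parts = path.split(".")
        node = source
        # Walk or create the intermediate objects for each path segment.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return source


print(json.dumps(synthesize_source(doc_values), indent=2))
```

Because the values are kept once in a form that search and aggregations already need, storing a second, verbatim copy of the whole document becomes unnecessary. That duplication is what synthetic _source removes.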
The resulting storage savings carry through every index lifecycle phase. A 65% storage reduction in the hot tier results in the same reduction in the warm, cold, and frozen tiers, and it also shrinks the footprint of snapshots in bucket storage.
No visibility compromises: Retain all logs for observability and security
Logs are the foundation of visibility into infrastructure and applications, providing the simplest and most essential signal for monitoring and troubleshooting. However, costs are rising as logging volumes grow. This challenge is forcing customers to implement complex filtering and management policies, delete data prematurely, and strand relevant logs in stores that require a day or longer to rehydrate before analysis. Without a complete, easily searchable, and accessible data set, finding and resolving issues is substantially more challenging.
Logsdb index mode builds on breakthrough Elasticsearch capabilities like searchable snapshots and Automatic Import to address these pain points for operations and security teams:
Reduce costs: Logsdb reduces the storage footprint of logs by up to 65%, enabling organizations to reduce storage expenses while retaining more data. This translates to cost savings across all storage tiers — from hot to frozen — and higher productivity for the observability and security teams who use this data.
Preserve valuable data: Logsdb keeps all your log data and improves operational efficiency without relying on extra tools or complicated filters. With features like synthetic _source, you preserve the value of your data without storing the entire source document.
Expand visibility: Logsdb provides efficient access to all data on one platform, without separate silos for observability, security, and historical data. For site reliability engineers (SREs), it accelerates problem resolution by enabling analysis of logs alongside metrics, traces, and business data. Likewise, for security operations center (SOC) teams, it accelerates investigation and remediation by eliminating blind spots.
Streamline access to data: Logsdb lets SRE teams efficiently retain actionable data for troubleshooting, trending, and analysis. Similarly, SOC teams can swiftly search all of their data for investigation and threat hunting without incurring exorbitant costs.
Ready for your environment
Elasticsearch logsdb index mode is generally available for Elastic Cloud Hosted and Self-Managed customers starting in version 8.17 and is enabled by default for logs in Elastic Cloud Serverless.
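As a hedged illustration for Elastic Cloud Hosted and Self-Managed deployments, the Python snippet below builds the body of a composable index template that sets the index.mode setting to logsdb for matching data streams. The index pattern my-app-logs*, the template name my-app-logs, and the priority of 200 are placeholder assumptions. The pattern is chosen so that it does not overlap the built-in logs-*-* templates. Which logsdb capabilities apply still depends on your license level.

```python
import json

# Placeholder index pattern for one application's log data streams
# (hypothetical; adjust to your own naming scheme).
template_body = {
    "index_patterns": ["my-app-logs*"],
    "data_stream": {},
    "priority": 200,  # placeholder priority for this template
    "template": {
        "settings": {
            # Enables logsdb index mode for backing indices created from this template.
            "index.mode": "logsdb"
        }
    },
}

# Send this JSON as the body of: PUT _index_template/my-app-logs
print(json.dumps(template_body, indent=2))
```

Only new backing indices created from the template use logsdb. Existing indices keep their current mode until the data stream rolls over.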
Basic logsdb capabilities (including smart index sorting and advanced compression) are available to organizations with Standard, Gold, and Platinum licenses. Complete logsdb capabilities that further reduce storage requirements (including synthetic _source) are available to serverless customers and organizations with an Enterprise license.
Seeing is believing
Logsdb enables you to keep all your log data and improve operational efficiency without narrowing collection or discarding or siloing data. With capabilities like smart index sorting, advanced compression, and synthetic _source, keep and analyze the data you need within a budget that works for you.
Want to experience it for yourself? Try Elastic at no cost.
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.