Tencent helps to optimize Lucene caching: A solution to lock contention

August 15, 2024

Apache Lucene, a widely used open-source search library, provides fast and accurate search capabilities and underpins projects such as Elasticsearch. However, like any system under heavy use, Lucene can face performance challenges. One significant issue is lock contention in the cache system, which becomes more problematic as query volumes increase.

Tencent, a valued partner and community member, recently contributed to Lucene to address the lock contention in the cache system with read-write locks (RWLock). Contributions from partners like Tencent not only improve performance but also demonstrate the strength and collaborative spirit of the Elastic community.

Collaboration on products like Elasticsearch and Lucene by our community is something we greatly appreciate, with Tencent playing a significant role in our community in China. We’re really excited to see our community continue to journey with us as search enters the next era of AI.

Shay Banon, Chief Technology Officer at Elastic

Challenges of lock contention

In Lucene, caching is a critical feature used by almost all queries to enhance performance by storing the results of frequently executed queries, thus reducing the need to process the same data repeatedly. However, this caching mechanism can become a bottleneck under high query volumes due to the traditional approach of using an exclusive lock to manage cache access. While this ensures data integrity, it also introduces significant lock contention when multiple queries attempt to access the cache simultaneously. This contention can degrade performance, leading to slower query responses and a less efficient search experience.
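To make the bottleneck concrete, here is a minimal sketch of a cache guarded by a single exclusive lock. It is an illustration only, not Lucene's cache code: the class name, the key format, and the cached value are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cache guarded by a single exclusive lock.
 * Illustrative only; this is not Lucene's cache implementation.
 */
final class ExclusiveLockCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();

    // Readers and writers all take the same monitor, so two lookups
    // can never run at the same time, even though neither modifies the map.
    public synchronized V get(K key) {
        return entries.get(key);
    }

    public synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    public static void main(String[] args) {
        ExclusiveLockCache<String, Integer> cache = new ExclusiveLockCache<>();
        cache.put("age:30", 1200); // hypothetical query key and cached hit count
        System.out.println(cache.get("age:30"));
    }
}
```

Because every call to get waits for the same lock, adding more query threads adds more waiting rather than more throughput once the cache is hot.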

The details

Elasticsearch, built on top of Lucene, can automatically detect and map new fields for users with its dynamic mapping feature. When users index low-cardinality fields, such as height and age, they often use numbers to represent these values. Elasticsearch infers these fields as "long" data types and uses the BKD tree as the index for them. As the data volume grows, building the result set for low-cardinality fields can lead to high CPU usage and increased load.

This issue is compounded when the CPU is already heavily used by bulk data operations. Converting "long" fields to "keyword" fields through reindexing can significantly reduce cluster load and search latency, but the reindexing process itself is time-consuming. Elastic recommends using "keyword" for term/terms queries and "long" for range queries. However, users often don't realize the performance impact of using "long" for low-cardinality fields, because they rely on dynamic mapping to select the type automatically. Making BKD tree indexing more efficient for low- and medium-cardinality fields would therefore make a significant difference to both CPU load and search latency.

By addressing these underlying issues, particularly in the context of Elasticsearch's dynamic mapping and the use of "long" fields for low cardinality data, we can better understand the root causes of lock contention and develop more effective solutions.

Implementing read-write locks

To address this performance bottleneck, Tencent proposed a solution using read-write locks. Read-write locks provide a more flexible and efficient way to handle concurrent access to shared resources: they allow multiple threads to read from the cache simultaneously while ensuring that write operations remain mutually exclusive.
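The idea can be sketched with the JDK's ReentrantReadWriteLock. This is a minimal illustration of the technique under simplifying assumptions (a plain HashMap, no eviction), not the code merged into Lucene; the class name and keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical cache guarded by a read-write lock.
 * Illustrative only; this is not Lucene's cache implementation.
 */
final class ReadWriteLockCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Any number of threads may hold the read lock at the same time.
    public V get(K key) {
        lock.readLock().lock();
        try {
            return entries.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for readers to finish
    // and blocks new readers and writers until it is released.
    public void put(K key, V value) {
        lock.writeLock().lock();
        try {
            entries.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadWriteLockCache<String, Integer> cache = new ReadWriteLockCache<>();
        cache.put("age:30", 1200); // hypothetical query key and cached hit count
        System.out.println(cache.get("age:30"));
    }
}
```

One design caveat: a read lock is only safe if the read path does not modify shared state. A cache that updates recency information on every lookup writes on the read path, so that bookkeeping needs its own synchronization.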

Key advantages of read-write locks:

Concurrent read access: Multiple queries can read from the cache at the same time, reducing wait times and improving overall throughput.
Controlled write access: Write operations still require an exclusive lock, ensuring data integrity during updates.
Reduced lock contention: By allowing simultaneous reads, the frequency and severity of lock contention are significantly diminished.

This change not only resolves the lock contention issue but also improves performance by 50% to 200%, depending on the complexity of the queries. For simpler queries, the performance boost is closer to 50%, while more complex queries can see improvements of up to 200%.

Doc count	Field cardinality	Query point	Baseline QPS	Candidate QPS	Diff percentage	Code changes
30,000,000	10	1	2,481	4,408	78%	Uses LongAdder; uniqueQueries becomes a Collections.synchronizedMap; cache becomes an IdentityHashMap.
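The three JDK classes named in the benchmark row can be illustrated in isolation. The sketch below shows what each one does and is an assumption about why each might suit a cache's bookkeeping; it does not reproduce the merged change, and every class and method name in it is hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical demo of the JDK utilities named above; not Lucene code. */
final class CacheBookkeepingSketch {

    /** LongAdder lets many threads increment a counter without contending on one variable. */
    static long countHits(int threads, int hitsPerThread) {
        LongAdder hits = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    hits.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.sum();
    }

    /** An access-ordered map reorders entries on get, so lookups need synchronization too. */
    static String leastRecentlyUsedAfterTouchingA() {
        Map<String, Integer> recency =
                Collections.synchronizedMap(new LinkedHashMap<String, Integer>(16, 0.75f, true));
        recency.put("a", 1);
        recency.put("b", 2);
        recency.get("a"); // moves "a" to the most recently used position
        synchronized (recency) { // iterating a synchronizedMap requires holding its lock
            return recency.keySet().iterator().next();
        }
    }

    /** IdentityHashMap compares keys by reference (==) instead of equals(). */
    static int identityEntryCount() {
        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        String first = new String("segment");
        String second = new String("segment"); // equal content, different object
        byIdentity.put(first, 1);
        byIdentity.put(second, 2);
        return byIdentity.size();
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 1000));
        System.out.println(leastRecentlyUsedAfterTouchingA());
        System.out.println(identityEntryCount());
    }
}
```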

The proposed changes were accepted and merged into the main branch of Lucene. The implementation details can be found in Lucene PR #13306. This integration marks a significant milestone in enhancing Lucene's performance and scalability.

Future impact

The implementation of read-write locks in Lucene's cache system is expected to have a profound impact on query performance across various scenarios. Here are some key areas where this optimization will make a difference:

Vector retrieval: As a vector database, Elasticsearch relies on vector retrieval for tasks such as similarity search. The optimized cache will handle these complex operations more efficiently.
BM25 scoring: BM25, a popular ranking function in information retrieval, will benefit from reduced latency and improved throughput, enhancing the relevance and speed of search results.
Range queries: Range queries, which often involve scanning large portions of the index, will see substantial performance gains, making them more viable for real-time applications.

Given that most query conditions in online services remain unchanged with only a few conditions varying, the improved cache can significantly enhance query performance. By leveraging the optimized cache, services can handle a higher volume of complex queries without compromising speed or accuracy.
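As a rough illustration of why repeated conditions benefit from caching, the sketch below caches the matching documents for each clause and counts how often a clause is actually evaluated. It is a single-threaded toy with hypothetical names and clause strings, not how Lucene represents queries or cached results.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical per-clause result cache; illustrative only, not Lucene code. */
final class ClauseCacheSketch {
    private final Map<String, BitSet> cachedClauses = new HashMap<>();
    private int evaluations = 0;

    /** Returns the matching doc IDs for a clause, evaluating it only on first use. */
    BitSet matches(String clause, IntPredicate matcher, int docCount) {
        return cachedClauses.computeIfAbsent(clause, unused -> {
            evaluations++; // counts real evaluations, not cache hits
            BitSet bits = new BitSet(docCount);
            for (int doc = 0; doc < docCount; doc++) {
                if (matcher.test(doc)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        ClauseCacheSketch cache = new ClauseCacheSketch();
        // Three requests share the fixed clause; only the age clause varies.
        for (int age = 30; age < 33; age++) {
            final int wanted = age;
            cache.matches("status:active", doc -> doc % 2 == 0, 100);
            cache.matches("age:" + wanted, doc -> doc == wanted, 100);
        }
        // The fixed clause is evaluated once, each varying clause once.
        System.out.println(cache.evaluations());
    }
}
```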

Celebrating collaboration

We would like to celebrate Tencent Cloud, a valued partner and community member, for their invaluable contribution to this enhancement. Tencent's efforts in implementing the RWLock have significantly improved performance and showcased the strength and collaborative spirit of the Elastic community. Contributions from partners like Tencent are crucial in driving innovation and improving the tools we all rely on. Their dedication exemplifies the best of what our community can achieve together.

Tencent Cloud ES service believes that Elasticsearch is the preferred vector database for developers, particularly in generative AI scenarios, which is why it is committed to contributing to both Lucene and Elasticsearch. Tencent Cloud ES service has become a key participant in customizing the retrieval augmented generation (RAG) standard set by the China Academy of Information and Communications Technology (CAICT) and was the first product to pass this standard. This underscores how the Elasticsearch development ecosystem is becoming the preferred choice for Chinese developers in RAG application development.

Optimizing Lucene's cache system with read-write locks is a major improvement that addresses the critical issue of lock contention. By allowing multiple read operations to occur simultaneously and requiring exclusive locks only during write operations, the change significantly enhances Lucene's performance. This optimization, now part of the main Lucene branch, promises to deliver faster and more efficient search capabilities for a wide range of applications.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.