07 July 2018

This Week in Elasticsearch and Apache Lucene - 2018-07-07

Paul Sanwald

Elasticsearch

Highlights

We have added documentation for Painless script contexts. It covers every place in the Elasticsearch APIs where a script can be used, along with the variables available in each of those contexts.

As part of our ingest node work, we have added a "bytes" processor that converts human-readable byte sizes (e.g., 1kb) to raw byte counts (e.g., 1024). The new processor has been merged and is targeted for the next 6.x minor release.
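To make the conversion concrete, here is a minimal Python sketch of parsing a human-readable size into a byte count. It is not the processor's implementation: the `parse_byte_size` name and the accepted suffixes (b, kb, mb, gb, tb, pb, case-insensitive) are assumptions, and it uses binary (1024-based) multipliers to match the 1kb → 1024 example above.

```python
import re

# Assumed unit suffixes with binary (1024-based) multipliers, matching the
# "1kb -> 1024" example; not necessarily the exact set the processor accepts.
_UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
    "pb": 1024 ** 5,
}

# A number (optionally fractional) followed by an optional unit suffix.
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$")


def parse_byte_size(text: str) -> int:
    """Convert a human-readable size such as '1kb' or '1.5mb' to bytes."""
    match = _SIZE_RE.match(text.lower())
    if match is None:
        raise ValueError(f"unparseable byte size: {text!r}")
    number, unit = match.groups()
    unit = unit or "b"  # a bare number is treated as a byte count
    if unit not in _UNITS:
        raise ValueError(f"unknown unit {unit!r} in {text!r}")
    # Fractional results are truncated toward zero.
    return int(float(number) * _UNITS[unit])
```

For example, `parse_byte_size("1kb")` returns 1024 and `parse_byte_size("1.5mb")` returns 1572864; unknown suffixes raise `ValueError` rather than being silently ignored.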

We have opened a PR that allows more flexibility in how fields are selected for inclusion in “all” queries. It would remove the current limitation that plugins cannot control whether or not fields are searched in an “all” query.

We are working to improve the testability and test coverage of our cloud platform integrations. Examples include cleaning up some repository-s3 tests, merging AwsS3Service and InternalAwsS3Service into a single S3Service class, and merging AzureStorageService and AzureStorageServiceImpl while cleaning up their tests.

We recently extended our support for AWS session tokens to three-part credentials. With MFA-secured AWS access, you use your permanent two-part credentials (access key and secret key) plus an MFA code to obtain a separate set of temporary three-part credentials (access key, secret key and session token) that grant access to the desired resources. Today, Elasticsearch can obtain temporary credentials from the EC2 metadata service, but users cannot supply such credentials themselves, which is what running outside of EC2 requires. Starting in 6.4.0, Elasticsearch accepts user-supplied three-part temporary credentials, which means the repository-s3 plugin can snapshot to and restore from an MFA-secured S3 bucket outside of EC2.
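The difference between the two kinds of credentials can be captured in a few lines. The following Python sketch is purely illustrative: the `AwsCredentials` class and its field names are hypothetical, not the repository-s3 plugin's API. Permanent credentials carry an access key and a secret key; temporary credentials add a session token that must be presented together with them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AwsCredentials:
    """Hypothetical model of user-supplied AWS credentials (not the plugin's API)."""

    access_key: str
    secret_key: str
    session_token: Optional[str] = None  # present only for temporary credentials

    def __post_init__(self) -> None:
        # The access key and secret key are always required; a session token
        # on its own is meaningless.
        if not self.access_key or not self.secret_key:
            raise ValueError("access key and secret key are both required")

    @property
    def is_temporary(self) -> bool:
        """Three-part credentials (those with a session token) are temporary."""
        return self.session_token is not None

    def parts(self) -> int:
        """Return 2 for permanent credentials and 3 for temporary ones."""
        return 3 if self.is_temporary else 2
```

Making the token an optional third field, rather than a separate credential type, mirrors the description above: temporary credentials are the permanent shape plus one extra part.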

Changes in 5.6:

Propagate mapping.single_type setting on shrinked index #31811

Changes in 6.3:

SQL: Allow long literals #31777
JDBC: Fix stackoverflow on getObject and timestamp conversion #31735
SQL: Fix incorrect message for aliases #31792
Watcher: Fix check for currently executed watches #31137

Changes in 6.4:

REST high-level client: add get index API #31703
Fix handling of points_only with term strategy in geo_shape #31766
Watcher: Consolidate setting update registration #31762
Fix not waiting for Netty ThreadDeathWatcher in IT #31758
Fix not waiting for Netty ThreadDeathWatcher in IT (#31758) #31789
Add analyze API to high-level rest client #31577
REST high-level client: add cluster get settings API #31706
Implemented XContent serialisation for GetIndexResponse #31675
Fixture for Minio testing #31688
ingest: Introduction of a bytes processor #31733
Fix coerce validation_method in GeoBoundingBoxQueryBuilder #31747
Add support for AWS session tokens #30414
resolveHasher defaults to NOOP #31723
Split CircuitBreaker-related tests #31659
Add write*Blob option to replace existing blob #31729
Watcher: Fix chain input toXcontent serialization #31721
Extend allowed characters for grok field names (#21745) (#31653) #31722

Changes in 7.0:

Remove support for deprecated StoredScript contexts #31394
Account for XContent overhead in in-flight breaker #31613
has_parent builder: exception message/param fix #31182

Lucene

Reclaiming deletes through merges

Today, the default merge policy, TieredMergePolicy, exposes an opaque 'reclaimDeletesWeight' parameter that configures how aggressively deletes are reclaimed. Its value feeds into the function that scores candidate merges. Unfortunately, individual values of this parameter have no intuitive meaning: all you know is that larger values reclaim deleted documents more aggressively, at the cost of more I/O. There is a proposal to replace it with a new 'indexPctDeletedTarget' parameter that defines the maximum percentage of deleted documents the index may contain, which is much easier to reason about.
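To illustrate why a percentage target is easier to reason about than a weight, here is a simplified Python sketch. It is not TieredMergePolicy's algorithm: the `Segment` type, the `select_for_reclaim` function and the greedy "rewrite the segment with the most deletes first" rule are assumptions chosen for clarity. It computes the index-wide percentage of deleted documents and, while that exceeds the target, picks segments whose rewrite would drop their deletes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    max_doc: int    # documents in the segment, including deleted ones
    del_count: int  # documents marked as deleted


def pct_deleted(segments: List[Segment]) -> float:
    """Percentage of deleted documents across the whole index."""
    total = sum(s.max_doc for s in segments)
    deleted = sum(s.del_count for s in segments)
    return 100.0 * deleted / total if total else 0.0


def select_for_reclaim(segments: List[Segment], pct_target: float) -> List[str]:
    """Greedily choose segments to rewrite until the index is at or below target.

    Rewriting a segment is modeled as dropping its deleted documents.
    The input list is not modified.
    """
    remaining = [Segment(s.name, s.max_doc, s.del_count) for s in segments]
    chosen: List[str] = []
    # Visit segments with the most deletes first: they reclaim the most per merge.
    for seg in sorted(remaining, key=lambda s: s.del_count, reverse=True):
        if pct_deleted(remaining) <= pct_target or seg.del_count == 0:
            break
        chosen.append(seg.name)
        seg.max_doc -= seg.del_count
        seg.del_count = 0
    return chosen
```

For example, with three segments of 1,000 documents holding 300, 100 and 0 deletes (about 13.3% deleted overall), a 10% target selects only the first segment, which brings the index down to roughly 3.7% deleted. The target reads directly as "how much deleted data am I willing to carry", which a scoring weight cannot express.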

Other

Lucene 6.6.5 was released. It contains no changes; the release was cut only because Lucene and Solr must be released together, and Solr needed a bugfix release to address an XXE vulnerability.
Recent refactorings to TieredMergePolicy introduced some subtle bugs.
We are discussing a suggestion to remove the unused PostingsEnum#attributes API.
We are exploring how the matches API could be improved to allow for better highlighting by exposing information about matching terms.
We are discussing a proposal to clean up access to slices in IndexSearcher. A slice is a subset of an index's segments, and slices are searched concurrently; IndexSearcher merges the per-slice results at the end using TopDocs#merge.
We noticed that merges would include hard deletes when counting the number of soft deletes.
We are discussing an old issue about adding a much-needed expansion limit to SpanMultiTermQueryWrapper.
We added a new helper method that creates an iterator over a range of doc IDs. Such a range could typically represent the matches of a range query on a field that is used for index sorting, since index sorting places documents with neighboring values in that field next to each other.
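The last step described for IndexSearcher slices, combining each slice's best hits into one ranked list, is essentially a k-way merge of sorted lists. Below is a Python sketch of that idea using `heapq.merge`; the `(score, doc)` tuples, the `merge_top_hits` name and the tie-break on the lower doc ID are assumptions for illustration, not the exact semantics of Lucene's TopDocs#merge.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Hit = Tuple[float, int]  # (score, global doc id)


def merge_top_hits(per_slice: List[List[Hit]], top_n: int) -> List[Hit]:
    """Merge per-slice hit lists into the global top N.

    Each input list must already be sorted best-first under the same key:
    higher score first, ties broken by lower doc id (an assumed tie-break).
    """
    merged = heapq.merge(*per_slice, key=lambda hit: (-hit[0], hit[1]))
    # heapq.merge is lazy, so only about top_n hits are actually consumed.
    return list(islice(merged, top_n))
```

With slices `[(3.0, 5), (1.0, 2)]` and `[(2.5, 10), (1.0, 1)]` and `top_n=3`, the result is `[(3.0, 5), (2.5, 10), (1.0, 1)]`. Because every slice is already sorted, the merge never needs to re-sort all hits.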

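The doc-ID range helper is cheap because a contiguous range needs no per-document storage: two integers describe the whole match set. The Python sketch below mimics the iteration contract of Lucene's DocIdSetIterator (a current doc ID, "next doc", "advance to target", and a NO_MORE_DOCS sentinel equal to Java's Integer.MAX_VALUE); the class name, the snake_case method names and the exclusive upper bound are our own assumptions, not the actual Java helper.

```python
NO_MORE_DOCS = 2 ** 31 - 1  # mirrors Lucene's Integer.MAX_VALUE sentinel


class RangeDocIdIterator:
    """Hypothetical analogue of an iterator over doc ids in [min_doc, max_doc)."""

    def __init__(self, min_doc: int, max_doc: int) -> None:
        if min_doc < 0 or min_doc >= max_doc:
            raise ValueError("require 0 <= min_doc < max_doc")
        self._min = min_doc
        self._max = max_doc
        self._doc = -1  # unpositioned until the first next_doc() or advance()

    def doc_id(self) -> int:
        return self._doc

    def next_doc(self) -> int:
        return self.advance(self._doc + 1)

    def advance(self, target: int) -> int:
        """Move to the first doc >= target, or NO_MORE_DOCS if none remain."""
        if target >= self._max:
            self._doc = NO_MORE_DOCS
        elif target < self._min:
            self._doc = self._min
        else:
            self._doc = target
        return self._doc

    def cost(self) -> int:
        """Number of matching docs, known exactly without iterating."""
        return self._max - self._min
```

Iterating `RangeDocIdIterator(3, 6)` with repeated `next_doc()` calls yields 3, 4 and 5, then NO_MORE_DOCS; `advance` jumps straight to its target in constant time, which is what makes this representation attractive for range queries on an index-sorted field.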