This Week in Elasticsearch and Apache Lucene - Query Profiler and Geopoint Fields
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Elasticsearch 2.2.0 released with a query profiler and supercharged geopoint fields. https://t.co/CKjrpuLKYg Already available on Found
— elastic (@elastic) February 2, 2016
Elasticsearch Core
2.2:
- The SmbDirectoryWrapper in the Azure plugin is now in an elasticsearch package to avoid hiding bugs like calling ensureOpen on the wrong directory.
- The NotSerializableExceptionWrapper now includes the exception class name.
2.x:
- Upgraded to Lucene 5.5.0-snapshot-4de5f1d.
- RPM and deb signing is now tested during build.
- Requests that shouldn't be allowed according to CORS settings are now rejected before they are executed.
master:
- The plugin CLI has been refactored to reduce complexity and ambiguity, and to improve exception handling.
- The ingest pipeline adds processor tags to the ingest metadata on failure.
- Catch processor/pipeline exceptions and throw structured exceptions.
- Added the foreach processor to ingest for dealing with arrays.
- Prevent index/delete/flush requests from bouncing between two primary shard copies during relocation.
- Shard failure requests for shards that no longer exist now generate an exception.
- Clean handoff during primary relocation now ensures that no index/delete requests are lost. This fixes a long-standing issue: a delete might return false for `isFound()` while the primary is being relocated.
- Tasks can report their status.
- The settings filter to remove private settings is now immutable.
- Pluggable custom gateways are no longer supported.
- Shard version information is no longer used for shard routing now that we have allocation IDs.
- The bin/plugin script is now called bin/elasticsearch-plugin.
- The TermVector API no longer supports the DFS option as it was very heavy and added little value.
- The cat API now respects the Accept header instead of the Content-Type header when choosing the response format.
- The IndicesFieldDataCache has been simplified and no longer uses Guice.
- MessageDigest instances are no longer cloned (as some platforms don't support it) but return thread local instances instead.
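The MessageDigest change above can be pictured with a small sketch. This is not the Elasticsearch code: the class name ThreadLocalDigests and its methods are hypothetical, and the choice of SHA-1 is only an assumption for illustration. It shows the general idea of handing each thread its own reusable instance rather than cloning a shared one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not the Elasticsearch class): one MessageDigest per thread,
// reset before each use, so no instance ever has to be cloned.
final class ThreadLocalDigests {
    private static final ThreadLocal<MessageDigest> SHA1 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available on this JVM", e);
        }
    });

    private ThreadLocalDigests() {}

    // Returns the calling thread's digest, cleared of any earlier partial input.
    static MessageDigest sha1() {
        MessageDigest digest = SHA1.get();
        digest.reset();
        return digest;
    }

    // Convenience wrapper: hashes UTF-8 text and returns lowercase hex.
    static String sha1Hex(String text) {
        byte[] hash = sha1().digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because every thread keeps its own instance, callers never share digest state, at the cost of one extra object per thread that ever asks for a digest.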
Ongoing:
- The reindex API can be run in the background with the wait_for_completion parameter, which defaults to `true`. It also supports a progress indicator.
- Unify plugin packaging structure across projects.
- Index folders will now include the index UUID (and sanitise the index name to avoid problems with different file systems).
- Work continues on the monumental search refactoring.
Apache Lucene
- The current plan is to cut the 5.5.0 release branch in a few days, and once the 5.5.0 release is done we'll get the 6.0.0 release process underway!
- More progress on the challenging change to push retrying of file deletion down under the Directory abstraction, instead of making it the caller's job
- The new postings-based geo point queries are graduating from the experimental sandbox module into the spatial module, and the previous spatial module classes (with optional spatial4j dependency) are moving to a new spatial-extras module, as a precursor to nice geo point performance gains added in a backwards compatible way
- Our copyright headers now appear at the very top of all sources, and our IDE configs are now fixed to do so for new source files as well
- Randomized tests uncovered a missing try-with-resources in the new SimpleTextPointWriter
- We don't need to call Files.deleteIfExists when creating a new index file, since we already pass the TRUNCATE_EXISTING option
- IndexWriter now logs how long it took to flush each part of a new segment
- Now it's possible to fully wrap another MergePolicy
- More geo math tweaks to avoid exceeding the legal range for latitude and longitude
- A new expectThrows utility uses lambda expressions to compactly expect a test to throw a specific exception and fail otherwise, but we still need to somehow cut over numerous tests
- BaseMergePolicyTestCase is now used by more tests, but it caused a reproducible test failure that has since been fixed
- TieredMergePolicy had an extra = in an exception message
- The new TestSwappedIndexFiles, designed to ensure that copying the same file name from a different index is always detected as corruption, had a scary failure, but it was a simple test bug
- Some more small fixes for the new (coming in 6.0.0) point values:
- Point fields failed to detect some misuse
- 2D point values are also exercised in a few more tests
- We now test addIndexes with point values when the field numbers changed, and across different codecs
- The new BasePointFormatTestCase shares common code and makes it easy to test new point formats in the future
- Tests that assert two readers are equal now also verify the point values are the same
- Codec level encryption offers fine-grained control over which parts of the index need encryption
- Another corner case geo point test failure
- FastVectorHighlighter hits StringIndexOutOfBoundsException in some cases
- We should standardize on TimeUnit for time conversions
- The points-based and postings-based geo implementations use different encodings with different quantization errors
- MultiCollector might throw NullPointerException when one of its sub-collectors throws CollectionTerminatedException
- A new utility class runs a TokenFilter on a string and prints the results to help debugging
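For readers who have not seen the pattern, here is a rough sketch of what an expectThrows-style helper like the one mentioned above can look like. It is an illustration only: the class name ExpectThrows and the ThrowingRunnable interface are assumptions, and the real Lucene utility may differ in signature and behavior.

```java
// Hypothetical sketch of an expectThrows-style test helper; not Lucene's actual code.
final class ExpectThrows {
    // A block of test code that is allowed to throw anything.
    @FunctionalInterface
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    private ExpectThrows() {}

    // Runs the block and returns the thrown exception if it has the expected type.
    // Fails with an AssertionError if nothing, or something else, is thrown.
    static <T extends Throwable> T expectThrows(Class<T> expectedType, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t);
            }
            throw new AssertionError(
                "Expected " + expectedType.getSimpleName() + " but got " + t, t);
        }
        throw new AssertionError(
            "Expected " + expectedType.getSimpleName() + " but nothing was thrown");
    }
}
```

A test can then replace a try/fail/catch block with a single line such as `NumberFormatException e = ExpectThrows.expectThrows(NumberFormatException.class, () -> Integer.parseInt("not a number"));` and go on to make further assertions on the returned exception.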
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!