Completion Suggester

edit

In order to understand the format of suggestions, please read the Suggesters page first.

The completion suggester is a so-called prefix suggester. It does not do spell correction like the term or phrase suggesters but allows basic auto-complete functionality.

Why another suggester? Why not prefix queries?

edit

The first question which comes to mind when reading about a prefix suggestion is, why you should use it at all, if you have prefix queries already. The answer is simple: Prefix suggestions are fast.

The data structures are internally backed by Lucenes AnalyzingSuggester, which uses FSTs (finite state transducers) to execute suggestions. Usually these data structures are costly to create, stored in-memory and need to be rebuilt every now and then to reflect changes in your indexed documents. The completion suggester circumvents this by storing the FST (finite state transducer) as part of your index during index time. This allows for really fast loads and executions.

Mapping

edit

In order to use this feature, you have to specify a special mapping for this field, which enables the special storage of the field.

curl -X PUT localhost:9200/music
curl -X PUT localhost:9200/music/song/_mapping -d '{
  "song" : {
        "properties" : {
            "name" : { "type" : "string" },
            "suggest" : { "type" : "completion",
                          "analyzer" : "simple",
                          "search_analyzer" : "simple",
                          "payloads" : true
            }
        }
    }
}'

Mapping supports the following parameters:

analyzer
The index analyzer to use, defaults to simple. In case you are wondering why we did not opt for the standard analyzer: We try to have easy to understand behaviour here, and if you index the field content At the Drive-in, you will not get any suggestions for a, nor for d (the first non stopword).
search_analyzer
The search analyzer to use, defaults to value of analyzer.
payloads
Enables the storing of payloads, defaults to false
preserve_separators
Preserves the separators, defaults to true. If disabled, you could find a field starting with Foo Fighters, if you suggest for foof.
preserve_position_increments
Enables position increments, defaults to true. If disabled and using stopwords analyzer, you could get a field starting with The Beatles, if you suggest for b. Note: You could also achieve this by indexing two inputs, Beatles and The Beatles, no need to change a simple analyzer, if you are able to enrich your data.
max_input_length
Limits the length of a single input, defaults to 50 UTF-16 code points. This limit is only used at index time to reduce the total number of characters per input string in order to prevent massive inputs from bloating the underlying datastructure. The most usecases won’t be influenced by the default value since prefix completions hardly grow beyond prefixes longer than a handful of characters. (Old name "max_input_len" is deprecated)

Indexing

edit
curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
    "name" : "Nevermind",
    "suggest" : {
        "input": [ "Nevermind", "Nirvana" ],
        "output": "Nirvana - Nevermind",
        "payload" : { "artistId" : 2321 },
        "weight" : 34
    }
}'

The following parameters are supported:

input
The input to store, this can be a an array of strings or just a string. This field is mandatory.
output
The string to return, if a suggestion matches. This is very useful to normalize outputs (i.e. have them always in the format artist - songname). This is optional. Note: The result is de-duplicated if several documents have the same output, i.e. only one is returned as part of the suggest result.
payload
An arbitrary JSON object, which is simply returned in the suggest option. You could store data like the id of a document, in order to load it from elasticsearch without executing another search (which might not yield any results, if input and output differ strongly).
weight
A positive integer or a string containing a positive integer, which defines a weight and allows you to rank your suggestions. This field is optional.

Even though you will lose most of the features of the completion suggest, you can choose to use the following shorthand form. Keep in mind that you will not be able to use several inputs, an output, payloads or weights. This form does still work inside of multi fields.

{
  "suggest" : "Nirvana"
}

The suggest data structure might not reflect deletes on documents immediately. You may need to do an Optimize for that. You can call optimize with the only_expunge_deletes=true to only target deletions for merging. By default only_expunge_deletes=true will only select segments for merge where the percentage of deleted documents is greater than 10% of the number of document in that segment. To adjust this index.merge.policy.expunge_deletes_allowed can be updated to a value between [0..100]. Please remember even with this option set, optimize is considered a extremely heavy operation and should be called rarely.

Querying

edit

Suggesting works as usual, except that you have to specify the suggest type as completion.

curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest"
        }
    }
}'

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "n",
    "offset" : 0,
    "length" : 1,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 34.0, "payload" : {"artistId":2321}
    } ]
  } ]
}

As you can see, the payload is included in the response, if configured appropriately. If you configured a weight for a suggestion, this weight is used as score. Also the text field uses the output of your indexed suggestion, if configured, otherwise the matched part of the input field.

The basic completion suggester query supports the following two parameters:

field
The name of the field on which to run the query (required).
size
The number of suggestions to return (defaults to 5).

The completion suggester considers all documents in the index. See Context Suggester for an explanation of how to query a subset of documents instead.

Fuzzy queries

edit

The completion suggester also supports fuzzy queries - this means, you can actually have a typo in your search and still get results back.

curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest",
            "fuzzy" : {
                "fuzziness" : 2
            }
        }
    }
}'

The fuzzy query can take specific fuzzy parameters. The following parameters are supported:

fuzziness

The fuzziness factor, defaults to AUTO. See Fuzziness for allowed settings.

transpositions

if set to true, transpositions are counted as one change instead of two, defaults to true

min_length

Minimum length of the input before fuzzy suggestions are returned, defaults 3

prefix_length

Minimum length of the input, which is not checked for fuzzy alternatives, defaults to 1

unicode_aware

Sets all are measurements (like edit distance, transpositions and lengths) in unicode code points (actual letters) instead of bytes.

If you want to stick with the default values, but still use fuzzy, you can either use fuzzy: {} or fuzzy: true.