kNN search API

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

Performs a k-nearest neighbor (kNN) search and returns the matching documents.

GET my-index/_knn_search
{
  "knn": {
    "field": "image_vector",
    "query_vector": [0.3, 0.1, 1.2],
    "k": 10,
    "num_candidates": 100
  },
  "_source": ["name", "date"]
}

Request

GET <target>/_knn_search

POST <target>/_knn_search

Prerequisites

  • If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias.

Description

The kNN search API performs a k-nearest neighbor (kNN) search on a dense_vector field. Given a query vector, it finds the k closest vectors and returns those documents as search hits.

Elasticsearch uses the HNSW algorithm to support efficient kNN search. Like most kNN algorithms, HNSW is an approximate method that sacrifices result accuracy for improved search speed. This means the results returned are not always the true k closest neighbors.
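
For contrast with the approximate search described above, the following is a minimal sketch of an exact, brute-force kNN search in plain Python. It is not how Elasticsearch works internally. The function name `exact_knn` and the sample vectors are hypothetical, and Euclidean distance is an assumption; the metric actually used depends on the field's configured similarity.

```python
import math

def exact_knn(query_vector, vectors, k):
    """Return the indices of the k vectors closest to query_vector, nearest first."""
    if any(len(v) != len(query_vector) for v in vectors):
        raise ValueError("all vectors must have the same number of dimensions as the query")
    # Score every vector: exact, but the cost grows linearly with the collection size.
    # Euclidean distance is an assumption made for this illustration.
    distances = [(math.dist(query_vector, v), i) for i, v in enumerate(vectors)]
    distances.sort()
    return [i for _, i in distances[:k]]

# Hypothetical three-dimensional document vectors.
docs = [[0.0, 0.0, 1.0], [0.3, 0.1, 1.1], [5.0, 5.0, 5.0], [0.2, 0.2, 1.3]]
nearest = exact_knn([0.3, 0.1, 1.2], docs, k=2)
print(nearest)  # [1, 3]
```

An approximate method such as HNSW avoids scoring every vector, which is why its results can differ from the list an exhaustive scan like this one would return.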

Path parameters

<target>
(Optional, string) Comma-separated list of data streams, indices, and aliases to search. Supports wildcards (*). To search all data streams and indices, use * or _all.

kNN search does not yet work with filtered aliases. Running a kNN search against a filtered alias may incorrectly result in fewer than k hits.

Query parameters

routing
(Optional, string) Custom value used to route operations to a specific shard.
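
As an illustration of how the path and query parameters combine, here is a minimal sketch that builds, but does not send, a POST request using only the Python standard library. The helper name `knn_search_request`, the base URL `http://localhost:9200`, the `logs-*` target, and the routing value are assumptions made for the example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

def knn_search_request(base_url, targets, body, routing=None):
    """Build (but do not send) a POST request for <target>/_knn_search."""
    if not targets:
        # This sketch requires an explicit target; pass "*" or "_all" to search everything.
        raise ValueError("at least one target is required")
    # <target> is a comma-separated list of data streams, indices, and aliases.
    target = ",".join(targets)
    url = f"{base_url.rstrip('/')}/{quote(target, safe=',*')}/_knn_search"
    if routing is not None:
        url += "?" + urlencode({"routing": routing})
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

body = {
    "knn": {
        "field": "image_vector",
        "query_vector": [0.3, 0.1, 1.2],
        "k": 10,
        "num_candidates": 100,
    }
}
# Host, second target, and routing value are hypothetical.
req = knn_search_request("http://localhost:9200", ["my-index", "logs-*"], body, routing="user-1")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, assuming a cluster is reachable at that address and any required authentication is added.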

Request body

knn

(Required, object) Defines the kNN query to run.

Properties of knn object
field
(Required, string) The name of the vector field to search against. Must be a dense_vector field with indexing enabled.
query_vector
(Required, array of floats) Query vector. Must have the same number of dimensions as the vector field you are searching against.
k
(Required, integer) Number of nearest neighbors to return as top hits. This value must be less than num_candidates.
num_candidates
(Required, integer) The number of nearest neighbor candidates to consider per shard. Cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard, then merges them to find the top k results. Increasing num_candidates tends to improve the accuracy of the final k results.
docvalue_fields

(Optional, array of strings and objects) Array of field patterns. The request returns values for field names matching these patterns in the hits.fields property of the response.

You can specify items in the array as a string or object. See Doc value fields.

Properties of docvalue_fields objects
field
(Required, string) Wildcard pattern. The request returns doc values for field names matching this pattern.
format

(Optional, string) Format in which the doc values are returned.

For date fields, you can specify a date format. For numeric fields, you can specify a DecimalFormat pattern.

For other field data types, this parameter is not supported.

fields

(Optional, array of strings and objects) Array of field patterns. The request returns values for field names matching these patterns in the hits.fields property of the response.

You can specify items in the array as a string or object. See The fields option.

Properties of fields objects
field
(Required, string) Field to return. Supports wildcards (*).
format

(Optional, string) Format for date and geospatial fields. Other field data types do not support this parameter.

date and date_nanos fields accept a date format. geo_point and geo_shape fields accept:

geojson (default)
GeoJSON
wkt
Well Known Text
mvt(<zoom>/<x>/<y>@<extent>) or mvt(<zoom>/<x>/<y>)

Binary Mapbox vector tile. The API returns the tile as a base64-encoded string.

mvt parameters
<zoom>
(Required, integer) Zoom level for the tile. Accepts 0-29.
<x>
(Required, integer) X coordinate for the tile.
<y>
(Required, integer) Y coordinate for the tile.
<extent>
(Optional, integer) Size, in pixels, of a side of the tile. Vector tiles are square with equal sides. Defaults to 4096.
_source

(Optional) Indicates which source fields are returned for matching documents. These fields are returned in the hits._source property of the search response. Defaults to true. See The _source option.

Valid values for _source
true
(Boolean) The entire document source is returned.
false
(Boolean) The document source is not returned.
<wildcard_pattern>
(string or array of strings) Wildcard (*) pattern or array of patterns containing source fields to return.
<object>

(object) Object containing a list of source fields to include or exclude.

Properties for <object>
excludes

(string or array of strings) Wildcard (*) pattern or array of patterns containing source fields to exclude from the response.

You can also use this property to exclude fields from the subset specified in the includes property.

includes

(string or array of strings) Wildcard (*) pattern or array of patterns containing source fields to return.

If this property is specified, only these source fields are returned. You can exclude fields from this subset using the excludes property.

stored_fields

(Optional, string) A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. See Stored fields.

If this option is specified, the _source parameter defaults to false. You can pass _source: true to return both source fields and stored fields in the search response.
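
The request body properties above can be assembled client-side before sending. The following is a minimal sketch under stated assumptions: the helpers `build_knn_body` and `mvt_format` are hypothetical names, the `dims` argument is an optional caller-supplied dimension count (the API itself checks this against the mapping), and `location` is a hypothetical geo_point field. The checks mirror the constraints as worded in this reference.

```python
MAX_NUM_CANDIDATES = 10_000  # upper bound stated for num_candidates

def build_knn_body(field, query_vector, k, num_candidates,
                   dims=None, source=None, fields=None):
    """Assemble a _knn_search request body, checking the documented constraints."""
    if num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates cannot exceed 10,000")
    if k >= num_candidates:
        raise ValueError("k must be less than num_candidates")
    if dims is not None and len(query_vector) != dims:
        raise ValueError(
            f"query_vector has {len(query_vector)} dimensions, expected {dims}"
        )
    body = {
        "knn": {
            "field": field,
            "query_vector": [float(x) for x in query_vector],
            "k": k,
            "num_candidates": num_candidates,
        }
    }
    # Optional properties are only added when the caller supplies them.
    if source is not None:
        body["_source"] = source
    if fields is not None:
        body["fields"] = fields
    return body

def mvt_format(zoom, x, y, extent=None):
    """Build the mvt(<zoom>/<x>/<y>[@<extent>]) format string for geo fields."""
    if not 0 <= zoom <= 29:
        raise ValueError("zoom must be between 0 and 29")
    suffix = f"@{extent}" if extent is not None else ""
    return f"mvt({zoom}/{x}/{y}{suffix})"

body = build_knn_body(
    "image_vector", [0.3, 0.1, 1.2], k=10, num_candidates=100, dims=3,
    source=["name", "date"],
    # "location" is a hypothetical geo_point field used only for illustration.
    fields=["name", {"field": "location", "format": mvt_format(2, 1, 1)}],
)
```

Validating locally only gives earlier feedback; Elasticsearch remains the authority on which requests are accepted.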

Response body

A kNN search response has the same structure as a search API response. However, certain sections have a meaning specific to kNN search:

  • The document _score is determined by the similarity between the query and document vector. See similarity.
  • The hits.total object contains the total number of nearest neighbor candidates considered, which is num_candidates * num_shards. The hits.total.relation will always be eq, indicating an exact value.
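
To make the two points above concrete, here is a minimal sketch that reads a hand-written, abbreviated response. The response dictionary, its document IDs, scores, and the shard count of 3 are all hypothetical values chosen for illustration, not output from a real cluster, and the total follows the formula as stated above.

```python
def expected_candidate_total(num_candidates, num_shards):
    """hits.total as described above: candidates considered across all shards."""
    return num_candidates * num_shards

# Hypothetical, abbreviated response for num_candidates=100 on an index with 3 shards.
response = {
    "hits": {
        "total": {"value": 300, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 0.98, "_source": {"name": "first"}},
            {"_id": "7", "_score": 0.91, "_source": {"name": "second"}},
        ],
    }
}

total = response["hits"]["total"]
# The total counts candidates considered, not the number of hits returned.
matches = total["relation"] == "eq" and total["value"] == expected_candidate_total(100, 3)
# Each hit's _score reflects the similarity between the query and document vectors.
top = [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]
print(matches, top)
```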