NOTE: You are looking at documentation for an older release. For the latest information, see the current release documentation.
Using the annotated-text field
The annotated-text field tokenizes text content in the same way as the more common text field (see
"limitations" below) but also injects any marked-up annotation tokens directly into
the search index:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": {
          "type": "annotated_text"
        }
      }
    }
  }
}
Such a mapping allows marked-up text, e.g. Wikipedia articles, to be indexed as both text
and structured tokens. Annotations use a markdown-like syntax, [text](values), where the
values part holds one or more URL-encoded values separated by the & symbol.
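As a rough illustration of the markup described above, the following Python sketch builds an annotated string from plain text and a list of annotation values. The helper name `annotate` is hypothetical and not part of any Elasticsearch client. It assumes that `urllib.parse.quote_plus` produces an encoding the field accepts, which is consistent with the `Apple+Inc.` example on this page.

```python
from urllib.parse import quote_plus


def annotate(text, *values):
    """Wrap text in annotated-text markup: [text](value1&value2...).

    Hypothetical helper for illustration only.
    """
    if not values:
        raise ValueError("at least one annotation value is required")
    # URL-encode each value so that spaces, '&' and other reserved
    # characters cannot be confused with the markup itself.
    encoded = "&".join(quote_plus(v) for v in values)
    return f"[{text}]({encoded})"


print(annotate("Apple", "Apple Inc."))                  # [Apple](Apple+Inc.)
print(annotate("Jeff Beck", "Jeff Beck", "Guitarist"))  # [Jeff Beck](Jeff+Beck&Guitarist)
```

The sketch does not escape square brackets or parentheses in the visible text, so it is only suitable for simple inputs.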
We can use the _analyze API to test how an example annotation would be stored as tokens in the search index:
GET my_index/_analyze
{
  "field": "my_field",
  "text": "Investors in [Apple](Apple+Inc.) rejoiced."
}
Response:
{
  "tokens": [
    {
      "token": "investors",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "in",
      "start_offset": 10,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "Apple Inc.",
      "start_offset": 13,
      "end_offset": 18,
      "type": "annotation",
      "position": 2
    },
    {
      "token": "apple",
      "start_offset": 13,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "rejoiced",
      "start_offset": 19,
      "end_offset": 27,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
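Note the whole annotation token Apple Inc. is placed, unchanged, as a single token in the
token stream, at the same position and offsets as the text token (apple) it annotates.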
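To inspect this relationship programmatically, the _analyze response can be post-processed. The Python sketch below works on a trimmed copy of the response shown above and pairs each annotation token with the text tokens at the same position. The function name `annotated_pairs` is hypothetical, and fetching the response from a cluster is left out.

```python
import json

# Trimmed copy of the _analyze response shown above.
response = json.loads("""
{"tokens": [
  {"token": "in", "start_offset": 10, "end_offset": 12,
   "type": "<ALPHANUM>", "position": 1},
  {"token": "Apple Inc.", "start_offset": 13, "end_offset": 18,
   "type": "annotation", "position": 2},
  {"token": "apple", "start_offset": 13, "end_offset": 18,
   "type": "<ALPHANUM>", "position": 2}
]}
""")


def annotated_pairs(analyze_response):
    """Return (annotation token, text token) pairs that share a position."""
    tokens = analyze_response["tokens"]
    pairs = []
    for ann in tokens:
        if ann["type"] != "annotation":
            continue
        for tok in tokens:
            if tok["type"] != "annotation" and tok["position"] == ann["position"]:
                pairs.append((ann["token"], tok["token"]))
    return pairs


print(annotated_pairs(response))  # [('Apple Inc.', 'apple')]
```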
We can now search for annotations using regular term queries, which do not tokenize
the provided search values. Annotations are a more precise way of matching, as can be seen
in this example where a search for Beck will not match Jeff Beck:
# Example documents
PUT my_index/_doc/1
{
  "my_field": "[Beck](Beck) announced a new tour"
}
PUT my_index/_doc/2
{
  "my_field": "[Jeff Beck](Jeff+Beck&Guitarist) plays a strat"
}
# Example search
GET my_index/_search
{
  "query": {
    "term": {
      "my_field": "Beck"
    }
  }
}
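In document 1, as well as tokenising the plain text into single words, e.g. beck, the
annotation injects the single token Beck at the same position in the token stream.

Note annotations can inject multiple tokens at the same position - in document 2 we inject both
the very specific value Jeff Beck and the broader term Guitarist.

A benefit of searching with these carefully defined annotation tokens is that a query for
Beck will not match document 2, whose tokens include jeff, beck and Jeff Beck but not Beck.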
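The matching behaviour in this example can be reproduced with a toy model in Python. This is a sketch only: `toy_tokens` is a hypothetical stand-in that lowercases plain words and keeps annotation values verbatim, and the regular expression is an assumed approximation of the markup, not the plugin's actual parser or analyzer.

```python
import re
from urllib.parse import unquote_plus

# Assumed approximation of the markup pattern: [text](value1&value2)
ANNOTATION = re.compile(r"\[([^\]\[]*)\]\(([^)(]*)\)")


def toy_tokens(annotated):
    """Rough stand-in for the field's analysis (NOT the real analyzer)."""
    tokens = set()
    # Annotation values are decoded and kept verbatim, case included.
    for match in ANNOTATION.finditer(annotated):
        for value in match.group(2).split("&"):
            tokens.add(unquote_plus(value))
    # The visible text is split into lowercased single words.
    plain = ANNOTATION.sub(lambda m: m.group(1), annotated)
    tokens.update(word.lower() for word in re.findall(r"\w+", plain))
    return tokens


docs = {
    1: "[Beck](Beck) announced a new tour",
    2: "[Jeff Beck](Jeff+Beck&Guitarist) plays a strat",
}


def term_query(term):
    """Exact, untokenized match against each document's token set."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if term in toy_tokens(text))


print(term_query("Beck"))       # [1]
print(term_query("Guitarist"))  # [2]
print(term_query("beck"))       # [1, 2]
```

In this model the lowercase term beck still matches both documents through the plain-text tokens, while the annotation value Beck singles out document 1.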
Any use of = signs in annotation values, e.g. [Prince](person=Prince), will
cause the document to be rejected with a parse failure. In future we hope to have a use for
the equals sign, so we actively reject documents that contain it today.
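A client-side check can catch such values before a document is sent for indexing. The Python sketch below is a minimal illustration: the function name `values_with_equals` is hypothetical, and the regular expression is an assumed approximation of the annotation markup rather than the plugin's own parser.

```python
import re

# Assumed approximation of the markup pattern: [text](value1&value2)
ANNOTATION = re.compile(r"\[([^\]\[]*)\]\(([^)(]*)\)")


def values_with_equals(annotated):
    """Return the raw annotation values that contain an '=' sign."""
    bad = []
    for match in ANNOTATION.finditer(annotated):
        bad.extend(v for v in match.group(2).split("&") if "=" in v)
    return bad


print(values_with_equals("[Prince](person=Prince) played guitar"))  # ['person=Prince']
print(values_with_equals("[Beck](Beck) announced a new tour"))      # []
```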