Testing analyzers
When building your own analyzers, it’s useful to test that the analyzer does what we expect it to. This is where the Analyze API comes in.
Testing built-in analyzers
To get started with the Analyze API, we can test how a built-in analyzer analyzes a piece of text:

var analyzeResponse = client.Indices.Analyze(a => a
    .Analyzer("standard")
    .Text("F# is THE SUPERIOR language :)")
);
This returns the following response from Elasticsearch:

{
  "tokens": [
    { "token": "f", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0 },
    { "token": "is", "start_offset": 3, "end_offset": 5, "type": "<ALPHANUM>", "position": 1 },
    { "token": "the", "start_offset": 6, "end_offset": 9, "type": "<ALPHANUM>", "position": 2 },
    { "token": "superior", "start_offset": 10, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
    { "token": "language", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 4 }
  ]
}
which NEST deserializes to an instance of AnalyzeResponse that we can work with:

foreach (var analyzeToken in analyzeResponse.Tokens)
{
    Console.WriteLine($"{analyzeToken.Token}");
}
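Each AnalyzeToken also carries the offsets, type and position shown in the JSON response. The following is a minimal sketch, assuming NEST 7.x and a cluster reachable at http://localhost:9200 (both assumptions, adjust for your setup). It prints those properties and checks IsValid first, because NEST does not throw on a failed request unless the connection settings are configured to do so.

```csharp
using System;
using Nest;

// Assumption: NEST 7.x and a cluster reachable at http://localhost:9200.
var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

var analyzeResponse = client.Indices.Analyze(a => a
    .Analyzer("standard")
    .Text("F# is THE SUPERIOR language :)")
);

// By default NEST returns an invalid response rather than throwing, so check it first.
if (!analyzeResponse.IsValid)
{
    Console.WriteLine(analyzeResponse.DebugInformation);
    return;
}

foreach (var analyzeToken in analyzeResponse.Tokens)
{
    // Each AnalyzeToken corresponds to one object in the "tokens" array of the JSON response.
    Console.WriteLine(
        $"{analyzeToken.Position}: {analyzeToken.Token} " +
        $"[{analyzeToken.StartOffset}-{analyzeToken.EndOffset}] {analyzeToken.Type}");
}
```

The offsets point back into the original text, which is useful later when a character filter rewrites the input before tokenization.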
In testing the standard analyzer on our text, we’ve noticed that:

- F# is tokenized as "f"
- the stop word tokens "is" and "the" are included
- "superior" is included, but we’d also like to tokenize "great" as a synonym for "superior"
We’ll look at how we can test a combination of built-in analysis components next to build an analyzer to fit our needs.
Testing built-in analysis components
A transient analyzer can be composed from built-in analysis components to test an analysis configuration:

var analyzeResponse = client.Indices.Analyze(a => a
    .Tokenizer("standard")
    .Filter("lowercase", "stop")
    .Text("F# is THE SUPERIOR language :)")
);

This returns:

{
  "tokens": [
    { "token": "f", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0 },
    { "token": "superior", "start_offset": 10, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
    { "token": "language", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 4 }
  ]
}
Great! This has removed the stop words, but we still have F# tokenized as "f" and no "great" synonym for "superior".
Character filters and token filters are applied in the order in which they are specified.
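Because the filters run in sequence, swapping lowercase and stop can change the result. The sketch below, assuming NEST 7.x and a cluster at http://localhost:9200, analyzes the same text with both orders. Elasticsearch's stop filter matches case-sensitively by default (its ignore_case option defaults to false), so with stop first, "THE" would be expected to pass through the stop filter unchanged and only afterwards be lowercased to "the".

```csharp
using System;
using System.Linq;
using Nest;

// Assumption: NEST 7.x and a cluster reachable at http://localhost:9200.
var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

// The same two built-in token filters, applied in opposite orders.
string[][] filterOrders =
{
    new[] { "lowercase", "stop" },
    new[] { "stop", "lowercase" },
};

foreach (var filters in filterOrders)
{
    var response = client.Indices.Analyze(a => a
        .Tokenizer("standard")
        .Filter(filters)
        .Text("F# is THE SUPERIOR language :)")
    );

    // Print the resulting tokens, or the diagnostic details if the request failed.
    var tokens = response.IsValid
        ? string.Join(", ", response.Tokens.Select(t => t.Token))
        : response.DebugInformation;

    Console.WriteLine($"{string.Join(" -> ", filters)}: {tokens}");
}
```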
Let’s build a custom analyzer with additional components to solve this.
Testing a custom analyzer in an index
A custom analyzer can be created within an index, either when creating the index or by updating the settings on an existing index.
When adding to an existing index, it needs to be closed first.
In this example, we’ll add a custom analyzer to an existing index. First, we need to close the index:
client.Indices.Close("analysis-index");
Now, we can update the settings to add the analyzer:

client.Indices.UpdateSettings("analysis-index", i => i
    .IndexSettings(s => s
        .Analysis(a => a
            .CharFilters(cf => cf
                .Mapping("my_char_filter", m => m
                    .Mappings("F# => FSharp")
                )
            )
            .TokenFilters(tf => tf
                .Synonym("my_synonym", sf => sf
                    .Synonyms("superior, great")
                )
            )
            .Analyzers(an => an
                .Custom("my_analyzer", ca => ca
                    .Tokenizer("standard")
                    .CharFilters("my_char_filter")
                    .Filters("lowercase", "stop", "my_synonym")
                )
            )
        )
    )
);
And open the index again. Here, we also wait up to five seconds for the status of the index to become green:

client.Indices.Open("analysis-index");

client.Cluster.Health("analysis-index", h => h
    .WaitForStatus(WaitForStatus.Green)
    .Timeout(TimeSpan.FromSeconds(5))
);
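The steps above ignore the responses. A more defensive sketch, assuming NEST 7.x, a cluster at http://localhost:9200 and an existing "analysis-index", reopens the index in a finally block so a rejected settings update does not leave it closed. It then checks ClusterHealthResponse.TimedOut, which reports whether the index reached green within the timeout.

```csharp
using System;
using Nest;

// Assumptions: NEST 7.x, a cluster at http://localhost:9200 and an existing "analysis-index".
var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

var closeResponse = client.Indices.Close("analysis-index");
if (!closeResponse.IsValid)
    throw new InvalidOperationException(closeResponse.DebugInformation);

try
{
    var updateResponse = client.Indices.UpdateSettings("analysis-index", i => i
        .IndexSettings(s => s
            .Analysis(a => a
                .CharFilters(cf => cf
                    .Mapping("my_char_filter", m => m.Mappings("F# => FSharp")))
                .TokenFilters(tf => tf
                    .Synonym("my_synonym", sf => sf.Synonyms("superior, great")))
                .Analyzers(an => an
                    .Custom("my_analyzer", ca => ca
                        .Tokenizer("standard")
                        .CharFilters("my_char_filter")
                        .Filters("lowercase", "stop", "my_synonym")))
            )
        )
    );

    // A rejected settings update (for example, an invalid analyzer definition) surfaces here.
    if (!updateResponse.IsValid)
        throw new InvalidOperationException(updateResponse.DebugInformation);
}
finally
{
    // Reopen the index even if the update failed, so it is not left closed.
    client.Indices.Open("analysis-index");
}

var health = client.Cluster.Health("analysis-index", h => h
    .WaitForStatus(WaitForStatus.Green)
    .Timeout(TimeSpan.FromSeconds(5))
);

// TimedOut is true when the index did not reach green within the five-second timeout.
if (!health.IsValid || health.TimedOut)
    Console.WriteLine($"analysis-index is not green yet: {health.DebugInformation}");
```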
With the index open and ready, let’s test the analyzer:

var analyzeResponse = client.Indices.Analyze(a => a
    .Index("analysis-index")
    .Analyzer("my_analyzer")
    .Text("F# is THE SUPERIOR language :)")
);

Since we added the custom analyzer to the "analysis-index" index, we need to target this index to test it.
The output now looks like:

{
  "tokens": [
    { "token": "fsharp", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0 },
    { "token": "superior", "start_offset": 10, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
    { "token": "great", "start_offset": 10, "end_offset": 18, "type": "SYNONYM", "position": 3 },
    { "token": "language", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 4 }
  ]
}
Exactly what we were after!
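Since the point of this page is testing, the "exactly what we were after" check can be turned into an assertion. AnalyzerAssertions.DescribeMismatch below is a hypothetical helper, not part of NEST: it compares token strings in order and describes the first difference. Tokens that share a position, such as "superior" and its synonym "great", are compared in the order in which Elasticsearch returns them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test helper (not part of NEST) for checking Analyze API output.
public static class AnalyzerAssertions
{
    // Returns null when the actual tokens equal the expected tokens in order,
    // otherwise a message describing the first position where they differ.
    public static string? DescribeMismatch(IEnumerable<string> actual, params string[] expected)
    {
        var actualList = actual.ToList();
        var count = Math.Max(actualList.Count, expected.Length);

        for (var i = 0; i < count; i++)
        {
            // Pad the shorter sequence so missing or extra tokens are reported too.
            var got = i < actualList.Count ? actualList[i] : "<missing>";
            var want = i < expected.Length ? expected[i] : "<none>";

            if (got != want)
                return $"token {i}: expected '{want}' but got '{got}'";
        }

        return null;
    }
}
```

With the response from the custom analyzer, a check might read `AnalyzerAssertions.DescribeMismatch(analyzeResponse.Tokens.Select(t => t.Token), "fsharp", "superior", "great", "language")`, which returns null when the tokens match the output shown above.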
Testing an analyzer on a field
It’s also possible to test the analyzer configured on a field in an index mapping. Given an index created with the following settings and mappings:
client.Indices.Create("project-index", i => i
    .Settings(s => s
        .Analysis(a => a
            .CharFilters(cf => cf
                .Mapping("my_char_filter", m => m
                    .Mappings("F# => FSharp")
                )
            )
            .TokenFilters(tf => tf
                .Synonym("my_synonym", sf => sf
                    .Synonyms("superior, great")
                )
            )
            .Analyzers(an => an
                .Custom("my_analyzer", ca => ca
                    .Tokenizer("standard")
                    .CharFilters("my_char_filter")
                    .Filters("lowercase", "stop", "my_synonym")
                )
            )
        )
    )
    .Map<Project>(mm => mm
        .Properties(p => p
            .Text(t => t
                .Name(n => n.Name)
                .Analyzer("my_analyzer")
            )
        )
    )
);
The analyzer on the name field can be tested with:

var analyzeResponse = client.Indices.Analyze(a => a
    .Index("project-index")
    .Field<Project, string>(f => f.Name)
    .Text("F# is THE SUPERIOR language :)")
);
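The Project type is not defined on this page. A minimal sketch that makes the field-based test self-contained, assuming NEST 7.x, a cluster at http://localhost:9200, "project-index" created as shown, and a hypothetical Project class with only a Name property. By default NEST infers the field name "name" by camel-casing the property name, so the request is analyzed with the my_analyzer mapped on that field.

```csharp
using System;
using System.Linq;
using Nest;

// Assumptions: NEST 7.x, a cluster at http://localhost:9200, and "project-index" created as above.
var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

var analyzeResponse = client.Indices.Analyze(a => a
    .Index("project-index")
    .Field<Project, string>(f => f.Name)
    .Text("F# is THE SUPERIOR language :)")
);

// Print the tokens produced by the field's analyzer, or diagnostics if the request failed.
Console.WriteLine(analyzeResponse.IsValid
    ? string.Join(", ", analyzeResponse.Tokens.Select(t => t.Token))
    : analyzeResponse.DebugInformation);

// Hypothetical stand-in for the Project type used in the examples;
// only the Name property is needed to resolve the "name" field.
public class Project
{
    public string Name { get; set; }
}
```

Targeting the field rather than naming the analyzer means the test follows the mapping, so it keeps working if the field is later switched to a different analyzer.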