Tutorial: hybrid search with semantic_text
This tutorial demonstrates how to perform hybrid search, combining semantic search with traditional full-text search.
In hybrid search, semantic search retrieves results based on the meaning of the text, while full-text search focuses on exact word matches. By combining both methods, hybrid search delivers more relevant results, particularly in cases where relying on a single approach may not be sufficient.
The recommended way to use hybrid search in the Elastic Stack is following the semantic_text workflow. This tutorial uses the elser service for demonstration, but you can use any service and its supported models offered by the Inference API.
Create the inference endpoint
Create an inference endpoint by using the Create inference API:
resp = client.inference.put(
    task_type="sparse_embedding",
    inference_id="my-elser-endpoint",
    inference_config={
        "service": "elser",
        "service_settings": {
            "adaptive_allocations": {
                "enabled": True,
                "min_number_of_allocations": 3,
                "max_number_of_allocations": 10
            },
            "num_threads": 1
        }
    },
)
print(resp)
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 3,
      "max_number_of_allocations": 10
    },
    "num_threads": 1
  }
}
The task type is sparse_embedding in the path, because the elser service generates sparse vector embeddings.
The inference_id is the identifier of the inference endpoint, my-elser-endpoint in this example.
The adaptive_allocations object enables and configures adaptive allocations. Adaptive allocations make it possible for ELSER to automatically scale resources up or down based on the current load on the process.
You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually indicates a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI.
Create an index mapping for hybrid search
The destination index will contain both the embeddings for semantic search and the original text field for full-text search. This structure enables the combination of semantic search and full-text search.
resp = client.indices.create(
    index="semantic-embeddings",
    mappings={
        "properties": {
            "semantic_text": {
                "type": "semantic_text",
                "inference_id": "my-elser-endpoint"
            },
            "content": {
                "type": "text",
                "copy_to": "semantic_text"
            }
        }
    },
)
print(resp)
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "semantic_text": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"
      },
      "content": {
        "type": "text",
        "copy_to": "semantic_text"
      }
    }
  }
}
semantic_text is the name of the field that contains the generated embeddings for semantic search.
inference_id is the identifier of the inference endpoint that generates the embeddings based on the input text.
content is the name of the field that contains the original text for lexical search.
The textual data stored in the content field is copied into semantic_text by the copy_to parameter and processed by the inference endpoint.
If you want to run a search on indices that were populated by web crawlers or connectors, you have to update the index mappings for these indices to include the semantic_text field. Once the mapping is updated, you'll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling hybrid search on the updated data.
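If you are scripting this mapping update with the Python client, the body is the same "properties" fragment used when creating the semantic-embeddings index. The sketch below builds that fragment; build_semantic_mapping is a hypothetical helper name, not part of any library.

```python
# Hypothetical helper: builds the mapping fragment that adds a
# semantic_text field, fed via copy_to from the existing "content" field.
def build_semantic_mapping(inference_id):
    return {
        "properties": {
            "semantic_text": {
                "type": "semantic_text",
                "inference_id": inference_id,
            },
            "content": {
                "type": "text",
                "copy_to": "semantic_text",
            },
        }
    }

mapping = build_semantic_mapping("my-elser-endpoint")
```

You could then apply it to an existing index with something like client.indices.put_mapping(index="my-crawled-index", properties=mapping["properties"]), where "my-crawled-index" stands in for your own index name.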
Load data
In this step, you load the data from which you later create embeddings.
Use the msmarco-passagetest2019-top1000 data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a tsv file.
Download the file and upload it to your cluster using the Data Visualizer in the Machine Learning UI. After your data is analyzed, click Override settings. Under Edit field names, assign id to the first column and content to the second. Click Apply, then Import. Name the index test-data, and click Import. After the upload is complete, you will see an index named test-data with 182,469 documents.
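If you prefer to script the upload instead of using the Data Visualizer, the tsv rows can be turned into bulk-index actions. This is a minimal sketch; tsv_to_actions is a hypothetical helper, and the file name in the usage note is a placeholder for wherever you saved the download.

```python
import csv

def tsv_to_actions(lines, index="test-data"):
    """Hypothetical helper: turn id<TAB>content rows into bulk actions."""
    reader = csv.reader(lines, delimiter="\t")
    for doc_id, content in reader:
        yield {
            "_index": index,
            "_id": doc_id,
            "_source": {"id": doc_id, "content": content},
        }

# Small in-memory example row (id and passage separated by a tab):
sample = ["8408852\tWhat so many out there do not realize ..."]
actions = list(tsv_to_actions(sample))
```

With the Python client you could then run elasticsearch.helpers.bulk(client, tsv_to_actions(open("passages.tsv"))), assuming the downloaded tsv is saved as passages.tsv.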
Reindex the data for hybrid search
Reindex the data from the test-data index into the semantic-embeddings index. The data in the content field of the source index is copied into the content field of the destination index. The copy_to parameter set in the index mapping creation ensures that the content is copied into the semantic_text field. The data is processed by the inference endpoint at ingest time to generate embeddings.

This step uses the reindex API to simulate data ingestion. If you are working with data that has already been indexed, rather than using the test-data set, reindexing is still required to ensure that the data is processed by the inference endpoint and the necessary embeddings are generated.
resp = client.reindex(
    wait_for_completion=False,
    source={
        "index": "test-data",
        "size": 10
    },
    dest={
        "index": "semantic-embeddings"
    },
)
print(resp)
const response = await client.reindex({
  wait_for_completion: "false",
  source: {
    index: "test-data",
    size: 10,
  },
  dest: {
    index: "semantic-embeddings",
  },
});
console.log(response);
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 10
  },
  "dest": {
    "index": "semantic-embeddings"
  }
}
The default batch size for reindexing is 1000. Reducing size to a smaller number makes the reindexing process report progress more frequently, which enables you to follow it closely and detect errors early.
The call returns a task ID to monitor the progress:
resp = client.tasks.get(
    task_id="<task_id>",
)
print(resp)
const response = await client.tasks.get({
  task_id: "<task_id>",
});
console.log(response);
GET _tasks/<task_id>
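For long reindex runs it can be convenient to poll the task until it finishes. The sketch below takes an injected fetch callable so it is not tied to a live cluster; in practice fetch would wrap a call such as client.tasks.get(task_id="<task_id>"), whose response body includes a top-level "completed" flag.

```python
import time

def wait_for_task(fetch, interval=5.0, timeout=3600.0):
    """Poll fetch() until the task reports completed=True.

    fetch: callable returning the task-status dict
           (e.g. a wrapper around client.tasks.get).
    Raises TimeoutError if the task does not complete in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch()
        if status.get("completed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("reindex task did not complete in time")
        time.sleep(interval)

# Example with a stand-in fetch function that completes immediately:
result = wait_for_task(lambda: {"completed": True}, interval=0.0)
```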
Reindexing large datasets can take a long time. You can test this workflow using only a subset of the dataset.
To cancel the reindexing process and generate embeddings for the subset that was reindexed:
resp = client.tasks.cancel(
    task_id="<task_id>",
)
print(resp)
const response = await client.tasks.cancel({
  task_id: "<task_id>",
});
console.log(response);
POST _tasks/<task_id>/_cancel
Perform hybrid search
After reindexing the data into the semantic-embeddings index, you can perform hybrid search by using reciprocal rank fusion (RRF). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant.
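Elasticsearch performs the fusion server-side, but the underlying arithmetic is simple: each document scores 1/(k + rank) in every ranked list it appears in, and the scores are summed. The sketch below illustrates that formula in plain Python, using k=60, the default rank_constant of the RRF retriever.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion over several ranked lists of doc IDs:
    score(doc) = sum over lists of 1 / (k + rank), rank starting at 1.
    Returns doc IDs sorted by fused score, highest first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" and "c" appear high in both lists, so they outrank "a" and "d":
lexical = ["a", "b", "c"]
semantic = ["b", "c", "d"]
fused = rrf_fuse([lexical, semantic])
```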
resp = client.search(
    index="semantic-embeddings",
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "match": {
                                "content": "How to avoid muscle soreness while running?"
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "semantic": {
                                "field": "semantic_text",
                                "query": "How to avoid muscle soreness while running?"
                            }
                        }
                    }
                }
            ]
        }
    },
)
print(resp)
const response = await client.search({
  index: "semantic-embeddings",
  retriever: {
    rrf: {
      retrievers: [
        {
          standard: {
            query: {
              match: {
                content: "How to avoid muscle soreness while running?",
              },
            },
          },
        },
        {
          standard: {
            query: {
              semantic: {
                field: "semantic_text",
                query: "How to avoid muscle soreness while running?",
              },
            },
          },
        },
      ],
    },
  },
});
console.log(response);
GET semantic-embeddings/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": {
                "content": "How to avoid muscle soreness while running?"
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "semantic": {
                "field": "semantic_text",
                "query": "How to avoid muscle soreness while running?"
              }
            }
          }
        }
      ]
    }
  }
}
The first standard retriever runs a lexical match query.
Lexical search is performed on the content field, which holds the original text.
The second standard retriever runs a semantic query.
The semantic query is performed on the semantic_text field, using the embeddings generated at ingest time.
After performing the hybrid search, the query will return the top 10 documents that match both semantic and lexical search criteria. The results include detailed information about each document:
{
  "took": 107,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 473,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "semantic-embeddings",
        "_id": "wv65epIBEMBRnhfTsOFM",
        "_score": 0.032786883,
        "_rank": 1,
        "_source": {
          "semantic_text": {
            "inference": {
              "inference_id": "my-elser-endpoint",
              "model_settings": {
                "task_type": "sparse_embedding"
              },
              "chunks": [
                {
                  "text": "What so many out there do not realize is the importance of what you do after you work out. You may have done the majority of the work, but how you treat your body in the minutes and hours after you exercise has a direct effect on muscle soreness, muscle strength and growth, and staying hydrated. Cool Down. After your last exercise, your workout is not over. The first thing you need to do is cool down. Even if running was all that you did, you still should do light cardio for a few minutes. This brings your heart rate down at a slow and steady pace, which helps you avoid feeling sick after a workout.",
                  "embeddings": {
                    "exercise": 1.571044,
                    "after": 1.3603843,
                    "sick": 1.3281639,
                    "cool": 1.3227621,
                    "muscle": 1.2645415,
                    "sore": 1.2561599,
                    "cooling": 1.2335974,
                    "running": 1.1750668,
                    "hours": 1.1104802,
                    "out": 1.0991782,
                    "##io": 1.0794281,
                    "last": 1.0474665,
                    (...)
                  }
                }
              ]
            }
          },
          "id": 8408852,
          "content": "What so many out there do not realize is the importance of (...)"
        }
      }
    ]
  }
}
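To work with these results programmatically, you typically only need the document ID, score, and original text from each hit. This is a minimal sketch; summarize_hits is a hypothetical helper, and with the Python client the response object can be indexed like the dict shown here.

```python
def summarize_hits(response):
    """Extract (_id, _score, content) triples from a search response body."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("content", ""))
        for hit in response["hits"]["hits"]
    ]

# Trimmed stand-in for a real search response body:
sample = {
    "hits": {
        "hits": [
            {
                "_id": "wv65epIBEMBRnhfTsOFM",
                "_score": 0.032786883,
                "_source": {"content": "What so many out there do not realize ..."},
            }
        ]
    }
}
rows = summarize_hits(sample)
```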