This interactive notebook introduces some basic operations with Elasticsearch, using the official Elasticsearch Python client. You'll perform semantic search using Sentence Transformers for text embedding, and learn how to combine traditional text-based search with semantic search to build a hybrid search system.
Create Elastic Cloud deployment
If you don't have an Elastic Cloud deployment, sign up here for a free trial.
Once logged in to your Elastic Cloud account, go to the Create deployment page and select Create deployment. Leave all settings with their default values.
Install packages and import modules
To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.
First, install the elasticsearch Python client along with the sentence_transformers library, which we use to compute embeddings.
For this example, we're using all-MiniLM-L6-v2, a model available through the sentence_transformers library. You can read more about this model on Hugging Face.
Initialize the Elasticsearch client
Now we can instantiate the Elasticsearch Python client, providing the Cloud ID and password for your deployment.
If you're running Elasticsearch locally or self-managed, you can pass in the Elasticsearch host instead. Read more on how to connect to Elasticsearch locally.
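Here is a minimal sketch of choosing between the two connection styles. `connection_kwargs` is a hypothetical helper, not part of the client. It assumes the Cloud deployment uses the default `elastic` user, and that a self-managed cluster listens on `http://localhost:9200`. It only builds keyword arguments (`cloud_id`, `basic_auth`, `hosts`) for the `Elasticsearch` constructor.

```python
from typing import Optional


def connection_kwargs(
    cloud_id: Optional[str] = None,
    password: Optional[str] = None,
    host: str = "http://localhost:9200",  # assumed default for a local cluster
) -> dict:
    """Build keyword arguments for elasticsearch.Elasticsearch (hypothetical helper).

    With a Cloud ID, authenticate as the assumed default "elastic" user;
    otherwise fall back to a single self-managed host URL.
    """
    if cloud_id:
        if password is None:
            raise ValueError("A password is required when connecting with a Cloud ID")
        return {"cloud_id": cloud_id, "basic_auth": ("elastic", password)}
    return {"hosts": [host]}
```

With the real client, you would then call `Elasticsearch(**connection_kwargs(cloud_id=..., password=...))`.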
Enable Telemetry
Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. We would like to ask you to run the following code to let us gather anonymous usage statistics. See telemetry.py for details. Thank you!
Test the Client
Before you continue, confirm that the client has connected with this test.
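One way to run this check is a sketch that calls only `client.info()`. That call returns the cluster name and the Elasticsearch version number. The helper name `describe_cluster` is hypothetical. If the connection is misconfigured, the `info()` call itself raises an error.

```python
def describe_cluster(client) -> str:
    """Return a one-line summary of the connected cluster (hypothetical helper).

    Relies only on client.info(), whose response includes "cluster_name"
    and "version" -> "number".
    """
    info = client.info()
    return f"Connected to {info['cluster_name']} (Elasticsearch {info['version']['number']})"
```

In the notebook, `print(describe_cluster(client))` would print the summary for your deployment.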
Index some test data
Our client is set up and connected to our Elastic deployment. Now we need some data to test out the basics of Elasticsearch queries. We'll use a small index of books with the following fields:
titleauthorspublish_datenum_reviewspublisher
Create an index
First ensure that you do not have a previously created index with the name book_index.
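A minimal sketch of the cleanup step follows, assuming a connected `client`. `delete_book_index` is a hypothetical wrapper around `client.indices.delete`. Passing `ignore_unavailable=True` tells Elasticsearch not to return an error when the index is missing, so the cell is safe to re-run.

```python
INDEX_NAME = "book_index"


def delete_book_index(client, index: str = INDEX_NAME):
    """Delete the index if it exists (hypothetical wrapper).

    ignore_unavailable=True avoids an error when the index does not exist,
    so this can be run repeatedly.
    """
    return client.indices.delete(index=index, ignore_unavailable=True)
```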
ObjectApiResponse({'acknowledged': True})
NOTE: At any time, you can come back to this section and run the delete function above to remove your index and start from scratch.
Let's create an Elasticsearch index with the correct mappings for our test data.
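A sketch of such a mapping is below. It assumes the title embedding lives in a field named `title_vector`, which is an assumed name, and uses cosine similarity. The dimension must match the model: all-MiniLM-L6-v2 produces 384-dimensional vectors. Fields not listed here are left to dynamic mapping, which maps strings as `text` with a `keyword` sub-field.

```python
EMBEDDING_DIMS = 384  # all-MiniLM-L6-v2 outputs 384-dimensional embeddings

mappings = {
    "properties": {
        # Assumed field name for the embedding of each book's title.
        "title_vector": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "index": True,           # build the structure used for kNN search
            "similarity": "cosine",  # compare vectors by cosine similarity
        }
    }
}
```

With a connected client, this would be passed as `client.indices.create(index="book_index", mappings=mappings)`.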
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'book_index'})
Index test data
Run the following command to upload some test data, containing information about 10 popular programming books from this dataset.
model.encode will encode the text into a vector on the fly, using the model we initialized earlier.
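The bulk API expects an action line followed by each document. Here is a minimal sketch of building that list. `build_bulk_operations` and the `title_vector` field name are assumptions. `encode` stands in for the model and should return a plain list of floats, for example `lambda text: model.encode(text).tolist()`.

```python
from typing import Callable, Dict, Iterable, List


def build_bulk_operations(
    books: Iterable[Dict],
    encode: Callable[[str], List[float]],
    index: str = "book_index",
) -> List[Dict]:
    """Interleave bulk action lines with documents that carry a title embedding."""
    operations: List[Dict] = []
    for book in books:
        # Action line: index the next document into the target index.
        operations.append({"index": {"_index": index}})
        # Copy so the caller's dict is not mutated, then attach the vector.
        doc = dict(book)
        doc["title_vector"] = encode(doc["title"])
        operations.append(doc)
    return operations
```

The result would then be sent with `client.bulk(operations=ops, refresh=True)`. `refresh=True` makes the documents searchable immediately, which is why each item in the response reports `forced_refresh`.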
Aside: Pretty printing Elasticsearch responses
Your API calls will return hard-to-read nested JSON.
We'll create a little function called pretty_response to return nice, human-readable outputs from our examples.
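A sketch of such a function follows. It assumes each hit's `_source` contains the book fields listed earlier. It returns a string rather than printing, so the output is easy to reuse. The exact layout is our choice, not a fixed format.

```python
def pretty_response(response) -> str:
    """Format search hits as readable text (sketch; book field names assumed)."""
    hits = response["hits"]["hits"]
    if not hits:
        return "Your search returned no results."
    blocks = []
    for hit in hits:
        source = hit["_source"]
        blocks.append(
            "\n".join(
                [
                    f"ID: {hit['_id']}",
                    f"Title: {source.get('title')}",
                    f"Authors: {source.get('authors')}",
                    f"Publication date: {source.get('publish_date')}",
                    f"Publisher: {source.get('publisher')}",
                    f"Reviews: {source.get('num_reviews')}",
                    f"Score: {hit.get('_score')}",
                ]
            )
        )
    # Blank line between books.
    return "\n\n".join(blocks)
```

Usage: `print(pretty_response(response))` after any `client.search(...)` call.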
Making queries
Now that we have indexed the books, we want to perform a semantic search for books that are similar to a given query. We embed the query with the same model used for the titles, then search for the books whose title vectors are nearest to it.
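A sketch of the kNN search body follows. The field name `title_vector` and the values `k=10` and `num_candidates=100` are assumptions. `num_candidates` controls how many candidates each shard considers. Elasticsearch rejects a request where it is smaller than `k`, so the helper checks that up front.

```python
from typing import Dict, List


def build_knn_query(
    query_vector: List[float],
    field: str = "title_vector",  # assumed vector field name
    k: int = 10,                  # number of nearest neighbours to return
    num_candidates: int = 100,    # candidates considered per shard
) -> Dict:
    """Build the `knn` section of a search request (hypothetical helper)."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }
```

With the real client and model, this would be used as `client.search(index="book_index", knn=build_knn_query(model.encode("javascript books").tolist()))`.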
Filtering
Filter context is mostly used for filtering structured data. For example, use filter context to answer questions like:
- Does this timestamp fall into the range 2015 to 2016?
- Is the status field set to "published"?
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in a bool query.
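As an illustration, the two questions above can be expressed as a `bool` query whose clauses all sit under `filter`. The `timestamp` and `status` fields are hypothetical and not part of book_index. Filter clauses do not contribute to scoring, so a query with only filter clauses gives every match the same score of 0. Elasticsearch can also cache frequently used filters.

```python
def build_filter_query(start_year: int, end_year: int, status: str) -> dict:
    """Bool query with only filter-context clauses (hypothetical fields)."""
    return {
        "bool": {
            "filter": [
                # Covers the whole of start_year through the end of end_year.
                {
                    "range": {
                        "timestamp": {
                            "gte": f"{start_year}-01-01",
                            "lt": f"{end_year + 1}-01-01",
                        }
                    }
                },
                # Exact-value match; in practice this targets a keyword field.
                {"term": {"status": status}},
            ]
        }
    }
```

This would be passed as `client.search(index=..., query=build_filter_query(2015, 2016, "published"))`.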
Learn more about filter context in the Elasticsearch docs.
Example: Keyword Filtering
This is an example of adding a keyword filter to the query.
The example retrieves the top books whose title vectors are most similar to "javascript books" and that also have Addison-Wesley as their publisher.
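A sketch of a filtered kNN request body follows. It assumes the vector field is `title_vector` and that `publisher` was dynamically mapped with a `publisher.keyword` sub-field. A `term` filter on a keyword field is exact and case-sensitive, so the value must match the stored string exactly. The spelling `addison-wesley` in the usage below is an assumption about the data. Because the filter is applied during the approximate kNN search, the response still contains up to `k` documents that all satisfy it.

```python
from typing import Dict, List


def build_filtered_knn_query(
    query_vector: List[float],
    publisher: str,
    field: str = "title_vector",  # assumed vector field name
    k: int = 10,
    num_candidates: int = 100,
) -> Dict:
    """kNN section restricted to one publisher (hypothetical helper)."""
    return {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
        # Exact match on the keyword sub-field created by dynamic mapping.
        "filter": {"term": {"publisher.keyword": publisher}},
    }
```

Usage: `client.search(index="book_index", knn=build_filtered_knn_query(model.encode("javascript books").tolist(), "addison-wesley"))`.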