Semantic search quick start

This interactive notebook introduces some basic Elasticsearch operations, using the official Elasticsearch Python client. You'll perform semantic search using Sentence Transformers for text embedding, and learn how to integrate traditional text-based search with semantic search to build a hybrid search system.

Create Elastic Cloud deployment

If you don't have an Elastic Cloud deployment, sign up here for a free trial.

Once logged in to your Elastic Cloud account, go to the Create deployment page and select Create deployment. Leave all settings with their default values.

Install packages and import modules

To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.

First we need to install the elasticsearch Python client.

For this example, we're using all-MiniLM-L6-v2, a model available through the sentence_transformers library. You can read more about this model on Hugging Face.
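As a minimal sketch of loading the model, the helper below wraps the SentenceTransformer constructor. The names MODEL_NAME, EMBEDDING_DIMS and load_model are illustrative, not part of the original notebook; the import is done lazily so the helper can be defined before the package is installed.

```python
MODEL_NAME = "all-MiniLM-L6-v2"
EMBEDDING_DIMS = 384  # size of the vectors produced by all-MiniLM-L6-v2


def load_model(name: str = MODEL_NAME):
    """Load a Sentence Transformers model (downloaded on first use)."""
    # Lazy import: defining this helper does not require the package itself.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name)


# Usage (needs `sentence-transformers` installed and network access):
# model = load_model()
# vector = model.encode("javascript books")  # array of EMBEDDING_DIMS floats
```

The vector size matters later: the index mapping for the embedding field has to declare the same number of dimensions.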

Initialize the Elasticsearch client

Now we can instantiate the Elasticsearch Python client, providing the Cloud ID and password for your deployment.

If you're running Elasticsearch locally or self-managed, you can pass in the Elasticsearch host instead. Read more on how to connect to Elasticsearch locally.
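The two connection options above might look roughly like this with the 8.x Python client. The helpers connection_kwargs and make_client are hypothetical names introduced for illustration, and the default username "elastic" is an assumption; an API key can be used instead of a password if you prefer.

```python
from typing import Any, Dict, Optional


def connection_kwargs(
    cloud_id: Optional[str] = None,
    password: Optional[str] = None,
    host: Optional[str] = None,
    username: str = "elastic",  # assumed default superuser name
) -> Dict[str, Any]:
    """Build keyword arguments for the Elasticsearch client constructor."""
    if cloud_id and host:
        raise ValueError("Pass either cloud_id or host, not both")
    if cloud_id:
        kwargs: Dict[str, Any] = {"cloud_id": cloud_id}  # Elastic Cloud
    elif host:
        kwargs = {"hosts": [host]}  # local or self-managed cluster
    else:
        raise ValueError("Either cloud_id or host is required")
    if password:
        kwargs["basic_auth"] = (username, password)
    return kwargs


def make_client(**kwargs: Any):
    """Instantiate the client; imported lazily so this file loads without it."""
    from elasticsearch import Elasticsearch

    return Elasticsearch(**kwargs)


# Usage:
# client = make_client(**connection_kwargs(cloud_id="<cloud id>", password="<password>"))
# client = make_client(**connection_kwargs(host="http://localhost:9200"))
```

In a notebook, reading the Cloud ID and password with getpass keeps them out of the saved output.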

Enable Telemetry

Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. We ask that you run the following code to let us gather anonymous usage statistics. See telemetry.py for details. Thank you!

Test the Client

Before you continue, confirm that the client has connected with this test.
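A minimal connectivity check, assuming `client` is the Elasticsearch client created earlier. The helper name server_version is illustrative; client.info() is the call that produces the cluster summary.

```python
def server_version(client) -> str:
    """Call the cluster info endpoint and return the server version number."""
    info = client.info()  # raises an exception if the cluster is unreachable
    return info["version"]["number"]


# Usage:
# print(client.info())           # full cluster summary
# print(server_version(client))  # just the version string
```

If the call raises a connection or authentication error, recheck the Cloud ID and credentials before moving on.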

{'name': 'instance-0000000000', 'cluster_name': 'a72482be54904952ba46d53c3def7740', 'cluster_uuid': 'g8BE52TtT32pGBbRzP_oKA', 'version': {'number': '8.12.2', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '48a287ab9497e852de30327444b0809e55d46466', 'build_date': '2024-02-19T10:04:32.774273190Z', 'build_snapshot': False, 'lucene_version': '9.9.2', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}

Index some test data

Our client is set up and connected to our Elastic deployment. Now we need some data to test out the basics of Elasticsearch queries. We'll use a small index of books with the following fields:

title
authors
publish_date
num_reviews
publisher

Create an index

First ensure that you do not have a previously created index with the name book_index.
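One way to do this is to delete the index and ignore the case where it does not exist. This is a sketch; the helper name delete_index is an assumption, and `client` is the client created earlier.

```python
INDEX_NAME = "book_index"


def delete_index(client, index: str = INDEX_NAME):
    """Delete the index if it exists; do nothing if it is missing."""
    # ignore_unavailable=True avoids an error when the index is not there yet.
    return client.indices.delete(index=index, ignore_unavailable=True)


# Usage:
# delete_index(client)
```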

ObjectApiResponse({'acknowledged': True})

🔐 NOTE: at any time you can come back to this section and run the delete function above to remove your index and start from scratch.

Let's create an Elasticsearch index with the correct mappings for our test data.
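A sketch of what such a mapping could look like. It assumes the title embedding is stored in a field named title_vector, that cosine similarity is used, and that the remaining fields are left to Elasticsearch's dynamic mapping; the helper names are illustrative.

```python
from typing import Any, Dict


def book_mappings(dims: int = 384) -> Dict[str, Any]:
    """Mapping with an indexed dense_vector field for the title embedding."""
    return {
        "properties": {
            "title_vector": {
                "type": "dense_vector",
                "dims": dims,  # must match the embedding model (384 for all-MiniLM-L6-v2)
                "index": True,  # build the kNN index for this field
                "similarity": "cosine",
            }
        }
    }


def create_index(client, index: str = "book_index"):
    """Create the index with the mapping above."""
    return client.indices.create(index=index, mappings=book_mappings())


# Usage:
# create_index(client)
```

Fields that are not listed explicitly (title, authors, and so on) get their types from dynamic mapping when the first document is indexed.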

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'book_index'})

Index test data

Run the following command to upload some test data, containing information about 10 popular programming books from this dataset. model.encode will encode the text into a vector on the fly, using the model we initialized earlier.
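The upload step can be sketched as building a list of bulk operations, where each document is preceded by an action line and gets its title embedding attached. The helper name build_bulk_operations and the `encode` callable are assumptions; with the real model, `encode` could be `lambda text: model.encode(text).tolist()`.

```python
from typing import Any, Callable, Dict, Iterable, List


def build_bulk_operations(
    books: Iterable[Dict[str, Any]],
    encode: Callable[[str], List[float]],
    index: str = "book_index",
) -> List[Dict[str, Any]]:
    """Interleave bulk action lines with documents carrying a title_vector."""
    operations: List[Dict[str, Any]] = []
    for book in books:
        operations.append({"index": {"_index": index}})  # action line
        doc = dict(book)  # copy so the caller's data is not modified
        doc["title_vector"] = encode(book["title"])  # embed the title on the fly
        operations.append(doc)  # document line
    return operations


# Usage, with `books` loaded from the dataset and `model` loaded earlier:
# ops = build_bulk_operations(books, lambda text: model.encode(text).tolist())
# client.bulk(index="book_index", operations=ops, refresh=True)
```

Passing refresh=True makes the documents searchable as soon as the call returns, which is convenient in a tutorial but usually unnecessary for large production loads.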

ObjectApiResponse({'errors': False, 'took': 88, 'items': [
{'index': {'_index': 'book_index', '_id': 'caRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 0, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'cqRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 1, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'c6RpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 2, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'dKRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 3, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'daRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 4, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'dqRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 5, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'd6RpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 6, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'eKRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 7, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'eaRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 8, '_primary_term': 1, 'status': 201}},
{'index': {'_index': 'book_index', '_id': 'eqRpvY4BKY8PuI1qPluy', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 9, '_primary_term': 1, 'status': 201}}
]})

Aside: Pretty printing Elasticsearch responses

Your API calls will return hard-to-read nested JSON. We'll create a little function called pretty_response to return nice, human-readable outputs from our examples.
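A possible shape for pretty_response, sketched from the output format used in this notebook. The document field names (publish_date, title, summary, publisher, num_reviews, authors) are assumptions about the indexed data, and the split into format_response plus a printing wrapper is a design choice made here so the formatting is easy to check.

```python
from typing import Any, Dict


def format_response(response: Dict[str, Any]) -> str:
    """Turn a search response into readable text, one block per hit."""
    hits = response["hits"]["hits"]
    if not hits:
        return "Your search returned no results."
    blocks = []
    for hit in hits:
        source = hit["_source"]
        lines = [
            f"ID: {hit['_id']}",
            f"Publication date: {source.get('publish_date')}",
            f"Title: {source.get('title')}",
            f"Summary: {source.get('summary')}",
            f"Publisher: {source.get('publisher')}",
            f"Reviews: {source.get('num_reviews')}",
            f"Authors: {source.get('authors')}",
            f"Score: {hit['_score']}",
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)  # blank line between hits


def pretty_response(response: Dict[str, Any]) -> None:
    """Print the formatted response."""
    print(format_response(response))
```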

Making queries

Now that we have indexed the books, we want to perform a semantic search for books that are similar to a given query. We embed the query and perform a search.
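The embed-then-search step can be sketched with the kNN option of the search API (available in recent 8.x releases). The helper names, the title_vector field, and the k and num_candidates values are assumptions chosen for a ten-document index; `encode` is any callable that returns the query embedding as a list of floats.

```python
from typing import Any, Callable, Dict, List, Sequence


def build_knn(
    query_vector: Sequence[float],
    k: int = 10,
    num_candidates: int = 100,
    field: str = "title_vector",
) -> Dict[str, Any]:
    """Build the kNN clause for an approximate nearest-neighbour search."""
    return {
        "field": field,  # dense_vector field to search
        "query_vector": list(query_vector),  # embedding of the query text
        "k": k,  # number of nearest neighbours to return
        "num_candidates": num_candidates,  # candidates considered per shard
    }


def semantic_search(
    client,
    encode: Callable[[str], List[float]],
    query_text: str,
    index: str = "book_index",
):
    """Embed the query text and run a kNN search against the index."""
    return client.search(index=index, knn=build_knn(encode(query_text)))


# Usage, with `model` loaded earlier:
# response = semantic_search(client, lambda t: model.encode(t).tolist(), "javascript books")
```

Raising num_candidates tends to improve recall at the cost of slower queries; it must be at least as large as k.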

ID: eaRpvY4BKY8PuI1qPluy
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 0.8042828

ID: daRpvY4BKY8PuI1qPluy
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Publisher: oreilly
Reviews: 36
Authors: ['kyle simpson']
Score: 0.6989136

ID: dqRpvY4BKY8PuI1qPluy
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 0.6796988

ID: caRpvY4BKY8PuI1qPluy
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 0.6206549

ID: eqRpvY4BKY8PuI1qPluy
Publication date: 2012-06-27
Title: Introduction to the Theory of Computation
Summary: Introduction to the theory of computation and complexity theory
Publisher: cengage learning
Reviews: 33
Authors: ['michael sipser']
Score: 0.60087687

ID: eKRpvY4BKY8PuI1qPluy
Publication date: 2011-05-13
Title: The Clean Coder: A Code of Conduct for Professional Programmers
Summary: A guide to professional conduct in the field of software engineering
Publisher: prentice hall
Reviews: 20
Authors: ['robert c. martin']
Score: 0.571234

ID: d6RpvY4BKY8PuI1qPluy
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 0.56499225

ID: c6RpvY4BKY8PuI1qPluy
Publication date: 2020-04-06
Title: Artificial Intelligence: A Modern Approach
Summary: Comprehensive introduction to the theory and practice of artificial intelligence
Publisher: pearson
Reviews: 39
Authors: ['stuart russell', 'peter norvig']
Score: 0.5605484

ID: dKRpvY4BKY8PuI1qPluy
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 0.5422694

ID: cqRpvY4BKY8PuI1qPluy
Publication date: 2019-05-03
Title: Python Crash Course
Summary: A fast-paced, no-nonsense guide to programming in Python
Publisher: no starch press
Reviews: 42
Authors: ['eric matthes']
Score: 0.52540874

Filtering

Filter context is mostly used for filtering structured data. For example, use filter context to answer questions like:

Does this timestamp fall into the range 2015 to 2016?
Is the status field set to "published"?

Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in a bool query.

Learn more about filter context in the Elasticsearch docs.
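To make the two example questions concrete, here is a sketch of a bool query whose clauses all run in filter context. The field names timestamp and status come from the questions above; the exact date bounds are an assumption that reads "2015 to 2016" as including all of 2016.

```python
from typing import Any, Dict


def published_in_range_query(
    start: str = "2015-01-01",
    end: str = "2017-01-01",
) -> Dict[str, Any]:
    """Bool query that only filters: clauses match or not, with no scoring."""
    return {
        "bool": {
            "filter": [
                # Does this timestamp fall into the range 2015 to 2016?
                {"range": {"timestamp": {"gte": start, "lt": end}}},
                # Is the status field set to "published"?
                {"term": {"status": "published"}},
            ]
        }
    }


# Usage (index name is a placeholder):
# client.search(index="<your index>", query=published_in_range_query())
```

Because filter clauses do not contribute to the relevance score, Elasticsearch can cache them, which makes them a good fit for yes/no conditions like these.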

Example: Keyword Filtering

This is an example of adding a keyword filter to the query.

The example retrieves the top books whose title vectors are most similar to the query "javascript books" and whose publisher is Addison-Wesley.
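A sketch of the filtered search: the kNN clause carries a term filter, so only documents from the given publisher are considered as neighbours. The helper name and the default field publisher.keyword are assumptions; publisher.keyword is the keyword subfield that dynamic mapping creates for string fields, so use plain publisher if you mapped that field as keyword yourself.

```python
from typing import Any, Dict, Sequence


def build_filtered_knn(
    query_vector: Sequence[float],
    publisher: str,
    k: int = 10,
    num_candidates: int = 100,
    vector_field: str = "title_vector",
    publisher_field: str = "publisher.keyword",
) -> Dict[str, Any]:
    """kNN clause restricted to documents whose publisher matches exactly."""
    return {
        "field": vector_field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
        # Applied during the vector search, so up to k matching hits can still be returned.
        "filter": {"term": {publisher_field: publisher}},
    }


# Usage, with `model` and `client` from earlier steps:
# vector = model.encode("javascript books").tolist()
# response = client.search(index="book_index", knn=build_filtered_knn(vector, "addison-wesley"))
```

Term filters match the stored value exactly, so the publisher is passed in lowercase here to match how it appears in the indexed data.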

ID: caRpvY4BKY8PuI1qPluy
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 0.6206549

ID: d6RpvY4BKY8PuI1qPluy
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 0.56499225