Hybrid search with multiple embeddings: A fun and furry search for cats!

A walkthrough of how to implement different types of search - lexical, vector and hybrid - on multiple embeddings (text and image). It uses a simple and playful search application on cats.

Did you know that Elastic can be used as a powerful vector database? In this blog, we’ll explore how to generate, store, and query vector embeddings alongside traditional lexical search. Elastic’s strength lies in its flexibility and scalability, making it an excellent choice for modern search use cases. By integrating vector embeddings with Elastic, you can improve search relevance and extend search to various data types, including non-textual documents like images.

But it gets even better! Learning Elastic’s search features can be fun too. In this article, we’ll show you how to search for your favorite cats using Elastic to search both text descriptions and images of cats. Through a simple Python app that accompanies this article, you’ll learn how to implement both vector and keyword-based searches. We’ll guide you through generating your own vector embeddings, storing them in Elastic and running hybrid queries - all while searching for adorable feline friends.

Whether you're an experienced developer or new to Elasticsearch, this fun project is a great way to understand how modern search technologies work. Plus, if you love cats, you'll find it even more engaging. So let’s dive in and set up the Elasticats app while exploring Elasticsearch’s powerful capabilities.

Before we begin, let’s make sure that you have your Elastic cloud ID and API key ready. Make a copy of the .env-template file, save it as .env and plug in your Elastic cloud credentials.
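
As a rough sketch of how those credentials might be picked up, the snippet below reads two settings from the environment and fails early if either one is missing. The variable names ELASTIC_CLOUD_ID and ELASTIC_API_KEY and the helper load_elastic_credentials are assumptions made for illustration; use whatever names appear in your .env-template file.

```python
import os

# Hypothetical variable names; check .env-template for the real ones.
REQUIRED_VARS = ("ELASTIC_CLOUD_ID", "ELASTIC_API_KEY")


def load_elastic_credentials(env=None):
    """Return client keyword arguments built from environment settings."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    # cloud_id and api_key are keyword arguments accepted by the official
    # Python client, e.g. Elasticsearch(**load_elastic_credentials()).
    return {
        "cloud_id": env["ELASTIC_CLOUD_ID"],
        "api_key": env["ELASTIC_API_KEY"],
    }
```

Failing at startup with a clear message is usually easier to debug than a connection error the first time a search runs.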

Application Architecture

Here’s a high-level diagram that depicts our application architecture:

Generating and storing vector embeddings

Before we can perform any type of search, we first need to have data. Our data.json contains the list of cat documents that we will index in Elasticsearch. Each document describes a cat and has the following mappings:

mappings = {
    "properties": {
        "img_embedding": {
            "type": "dense_vector",
            "dims": 512,
            "index": True,
            "similarity": "cosine"
        },
        "photo": {
            "type": "keyword"
        },
        "cat_id": {
            "type": "keyword"
        },
        "name": {
            "type": "text"
        },
        "url": {
            "type": "keyword"
        },
        "summary": {
            "type": "text"
        },
        "summary_embedding": {
            "type": "dense_vector",
            "dims": 384
        },
        "age": {
            "type": "keyword"
        },
        "gender": {
            "type": "keyword"
        },
        "size": {
            "type": "keyword"
        },
        "coat": {
            "type": "keyword"
        },
        "breed": {
            "type": "keyword"
        }
    }
}
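
The dims value of each dense_vector field has to match the length of the vectors stored in it. A small sanity check before indexing could look like the sketch below. The helper check_vector_dims and the placeholder document are hypothetical and not part of the app.

```python
def check_vector_dims(mappings, document):
    """Return the names of dense_vector fields whose length is wrong."""
    mismatched = []
    for field, spec in mappings["properties"].items():
        if spec.get("type") != "dense_vector" or field not in document:
            continue
        if len(document[field]) != spec["dims"]:
            mismatched.append(field)
    return mismatched


# Trimmed copy of the mappings, kept to the fields this check needs.
sample_mappings = {
    "properties": {
        "img_embedding": {"type": "dense_vector", "dims": 512},
        "summary_embedding": {"type": "dense_vector", "dims": 384},
        "name": {"type": "text"},
    }
}

# Placeholder vectors; real ones come from the embedding models.
sample_doc = {
    "name": "Demo cat",
    "img_embedding": [0.0] * 512,
    "summary_embedding": [0.0] * 100,  # deliberately the wrong length
}

print(check_vector_dims(sample_mappings, sample_doc))  # ['summary_embedding']
```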

Each cat’s photo property points to the location of the cat’s image. When we call the reindex function in our application, it will generate two embeddings:

1. The first is a vector embedding for each cat’s image. We used the clip-ViT-B-32 model. CLIP models embed images and text into the same vector space, which lets you implement image search as either text-to-image or image-to-image search.

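   # Imports needed by this excerpt
   from PIL import Image
   from sentence_transformers import SentenceTransformer

       # Model attribute on the search class
       self.img_model = SentenceTransformer('clip-ViT-B-32')

   def get_img_embedding(self, text='', image_path=''):
       # CLIP can encode either a text prompt or an image
       if text:
           print(f'Encoding text: {text}')
           return self.img_model.encode(text)
       else:
           print(f'Encoding image: {image_path}')
           temp_image = Image.open(image_path)
           return self.img_model.encode(temp_image)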

2. The second embedding is for the summary text about each cat that is up for adoption. For this we used a different model, all-MiniLM-L6-v2.

       self.text_model = SentenceTransformer('all-MiniLM-L6-v2')
   def get_text_embedding(self, text):
       return self.text_model.encode(text)
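
The img_embedding field is mapped with "similarity": "cosine", so it helps to see what that measure does. The sketch below computes cosine similarity by hand on toy 3-dimensional vectors. The numbers are made up for illustration; real CLIP vectors have 512 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings in a shared text/image space.
text_black_cat = [0.9, 0.1, 0.0]
photo_black_cat = [0.8, 0.2, 0.1]
photo_white_cat = [0.1, 0.9, 0.2]

close = cosine_similarity(text_black_cat, photo_black_cat)
far = cosine_similarity(text_black_cat, photo_white_cat)
print(f"black cat photo: {close:.3f}, white cat photo: {far:.3f}")
```

Because text and images share one vector space, the same comparison works whether the query vector came from a sentence or from a picture.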

We then store the embeddings as part of our documents.

   def insert_documents(self, documents):
       operations = []
       for document in documents:
           operations.append({'index': {'_index': self.index}})
           operations.append({
               **document,
               'img_embedding': self.get_img_embedding(image_path="static/"+document['photo']),
               'summary_embedding': self.get_text_embedding(document['summary'])
           })
       return self.es.bulk(operations=operations)
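
To see the shape of the payload that the bulk call receives, here is a minimal sketch that builds the same alternating list of action and document entries without needing a cluster or a model. The stand-in encoders, the index name "cats", and the sample documents are all hypothetical.

```python
def build_bulk_operations(index, documents, embed_image, embed_text):
    """Build the alternating action/document list used by a bulk request."""
    operations = []
    for document in documents:
        # Action line: tells Elasticsearch which index receives the document
        operations.append({"index": {"_index": index}})
        # Source line: the original fields plus both embeddings
        operations.append({
            **document,
            "img_embedding": embed_image("static/" + document["photo"]),
            "summary_embedding": embed_text(document["summary"]),
        })
    return operations


# Stand-in encoders so the sketch runs without downloading any model.
def fake_image_encoder(image_path):
    return [0.1, 0.2]


def fake_text_encoder(text):
    return [0.3, 0.4]


sample_docs = [
    {"name": "Demo cat one", "photo": "demo1.jpg", "summary": "A playful kitten."},
    {"name": "Demo cat two", "photo": "demo2.jpg", "summary": "A calm senior cat."},
]

ops = build_bulk_operations("cats", sample_docs, fake_image_encoder, fake_text_encoder)
print(len(ops))  # 4
```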

We’re now ready to call the reindex function.

@app.cli.command()
def reindex():
   """Regenerate the Elasticsearch index."""
   response = es.reindex()
   print(response)
   print(f'Index with {len(response["items"])} documents created '
         f'in {response["took"]} milliseconds.')

   # The reindex method called above, defined on the search class
   def reindex(self):
       self.create_index()
       with open('data.json', 'rt') as f:
           documents = json.loads(f.read())
       return self.insert_documents(documents)

From the terminal, run the following command:

(.venv) $> flask reindex

We can now run our web application:

(.venv) $> flask run

Our initial form looks like this:

As you can see, we have exposed some of the keyword fields as filters (e.g. age, gender, and size) that we will use as part of our queries.

Executing different types of searches

The following workflow diagram shows the different search paths available in our web application. We’ll walk through each scenario.

The simplest scenario is a “match all” query which basically returns all cats in our index. We don’t use any of the filters nor enter a description or upload an image.

       search_query = {
           'must': {
               'match_all': {}
           }
       }

If any of the filters were supplied in the form, then we perform a boolean query. In this scenario, no description is entered so we’re applying the filters in our “match all” query.

def extract_filters(form_data):
   filters = []

   for key, val in form_data.items():
       # These fields carry the query inputs and paging offset, not filters
       if key in ("imageQuery", "inputQuery", "from_"):
           continue

       if key != "age" and key != "breed":
           # Single-value filter: only apply it if the value is not empty
           if val[0] != '':
               filters.append({
                   "term": {
                       key: {
                           "value": val[0]
                       }
                   },
               })
       else:
           # Multi-value filter: remove any empty values first
           cleaned_list = [item for item in val if item]

           # Only apply the filter if the list is not empty
           if len(cleaned_list) > 0:
               filters.append({
                   "terms": {
                       key: cleaned_list
                   },
               })

   return {'filter': filters}

   # Later, when building the search request:
   filters = extract_filters(form_data)
   if search_query:
       search_params['query'] = {
           'bool': {
               **search_query,
               **filters
           }
       }
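
To make the output concrete, here is a condensed, self-contained version of the same filter logic run against a sample form. It assumes form_data maps each field name to a list of values, as Flask's request.form.to_dict(flat=False) would produce. The sample values ("Female", "Baby", "Young") are hypothetical.

```python
SKIPPED_KEYS = {"imageQuery", "inputQuery", "from_"}
MULTI_VALUE_KEYS = {"age", "breed"}


def extract_filters(form_data):
    """Condensed equivalent of the filter extraction described above."""
    filters = []
    for key, values in form_data.items():
        if key in SKIPPED_KEYS:
            continue
        if key in MULTI_VALUE_KEYS:
            selected = [value for value in values if value]
            if selected:
                filters.append({"terms": {key: selected}})
        elif values[0] != "":
            filters.append({"term": {key: {"value": values[0]}}})
    return {"filter": filters}


form_data = {
    "inputQuery": [""],
    "gender": ["Female"],
    "size": [""],
    "age": ["Baby", "", "Young"],
    "breed": [""],
}

filters = extract_filters(form_data)
query = {"bool": {"must": {"match_all": {}}, **filters}}
print(query)
```

Only gender and age survive here: empty values are dropped, so an untouched form produces an empty filter list and the query behaves like a plain “match all”.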

In our web form, we can upload an image of a cat. The uploaded image is transformed into an embedding, and we then run a knn search against the image embeddings that were stored earlier.

First, we save the uploaded image in an uploads folder.

   if 'imageQuery' in request.files:
       file = request.files['imageQuery']

       if file:
           filename = file.filename
           filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
           # Process the image as needed
           file.save(filepath)
           imageSearch = True

We then create a knn query for the image embedding.

   elif imageSearch:
       search_query = None
       # add knn image if there's image
       knn_query.append({
           'field': 'img_embedding',
           'query_vector': es.get_img_embedding(image_path=filepath),
           'k': 5,
           'num_candidates': 15,
           **filters,
       })

   search_params = {
       'knn': knn_query,
       'from_': from_,
       'size': 5
   }

Notice that the vector search can be performed with or without the filters (from the boolean query). Also note that k=5, which means we return only the five most similar documents (cats).
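
To build intuition for what k and the filter do, here is an exact, brute-force nearest-neighbour sketch in plain Python. It is not how Elasticsearch runs the search: Elasticsearch uses an approximate index and considers num_candidates per shard, while this toy version simply scores every document. The cat names, breeds, and 2-dimensional vectors are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_search(query_vector, docs, k, keep=lambda doc: True):
    """Return the names of the k most similar docs that pass the filter."""
    candidates = [doc for doc in docs if keep(doc)]
    ranked = sorted(
        candidates,
        key=lambda doc: cosine(query_vector, doc["vector"]),
        reverse=True,
    )
    return [doc["name"] for doc in ranked[:k]]


cats = [
    {"name": "cat_a", "breed": "Persian", "vector": [1.0, 0.0]},
    {"name": "cat_b", "breed": "Abyssinian", "vector": [0.9, 0.1]},
    {"name": "cat_c", "breed": "Persian", "vector": [0.0, 1.0]},
    {"name": "cat_d", "breed": "Persian", "vector": [0.7, 0.7]},
]
query = [1.0, 0.0]

top_two = knn_search(query, cats, k=2)
persians = knn_search(query, cats, k=2, keep=lambda c: c["breed"] == "Persian")
print(top_two)   # ['cat_a', 'cat_b']
print(persians)  # ['cat_a', 'cat_d']
```

Applying the filter before ranking, as this sketch does, means the filtered search still returns k results when enough documents match.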

Try any of these images stored in the images/<breed> folder:

  1. Abyssinian
    1. Dahlia - 72245105_3.jpg
  2. American shorthair
    1. Uni - 64635658_2.jpg
    2. Sugarplum - 72157682_4.jpeg
  3. Persian
    1. Sugar - 72528240_2.jpeg

The most complex scenario in our application is when some text is entered into the description field. Here, we perform 3 different types of search and combine them into a hybrid search. First, we perform a lexical “match” query on the actual text input.

   # add text search
   if textQuery:
       search_query = {
           'must': {
               'match': {
                   'summary': textQuery
               }
           }
       }

We also create 2 knn queries:

  1. Using the model for the text embedding, we generate an embedding for the text input and perform a knn search on the summary embedding.
  2. Using the model for the image embedding, we generate another embedding for the text input and perform a knn search on the image embedding. As mentioned earlier, image models support not only image-to-image search, as in the vector search scenario above, but also text-to-image search. This means that typing “black cats” in the description searches for images that may contain or resemble black cats!

       # add knn text and image search if there's a description
       knn_query.append({
           'field': 'summary_embedding',
           'query_vector': es.get_text_embedding(textQuery),
           'k': 5,
           'num_candidates': 15,
           **filters,
       })

       knn_query.append({
           'field': 'img_embedding',
           'query_vector': es.get_img_embedding(textQuery),
           'k': 5,
           'num_candidates': 15,
           **filters,
       })

We then use Reciprocal Rank Fusion (RRF) to combine and rank the results from all three queries into a single, cohesive result set.

   rank = None
   if len(knn_query) > 0 and search_query:
       rank = {
           'rrf': {}
       }
   # Conditionally add the 'rank' parameter
   if rank:
       search_params['rank'] = rank
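
Putting the pieces together, the sketch below shows one way the search request could be assembled so that rank is only added when a lexical query and at least one knn query are both present. The function build_search_params is a hypothetical condensation of the snippets in this article, not the app's exact code.

```python
def build_search_params(search_query, knn_query, filters, from_=0, size=5):
    """Assemble search parameters from the lexical and knn parts."""
    params = {"knn": knn_query, "from_": from_, "size": size}
    if search_query:
        # Lexical part: wrap the query and the filters in a bool query
        params["query"] = {"bool": {**search_query, **filters}}
    if len(knn_query) > 0 and search_query:
        # Hybrid search: fuse the lexical and knn rankings with RRF
        params["rank"] = {"rrf": {}}
    return params


filters = {"filter": []}
lexical = {"must": {"match": {"summary": "sisters"}}}
# Placeholder query vector; a real one comes from the embedding model.
knn = [{"field": "summary_embedding", "query_vector": [0.1, 0.2],
        "k": 5, "num_candidates": 15, **filters}]

hybrid = build_search_params(lexical, knn, filters)
image_only = build_search_params(None, knn, filters)
print("rank" in hybrid, "rank" in image_only)  # True False
```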

RRF is a method designed to merge multiple result sets, each with potentially different relevance indicators, into one unified set. Unlike simply joining the result arrays, RRF applies a specific formula to rank documents based on their positions in the individual result sets. This approach ensures that documents appearing in multiple queries are given higher importance, leading to improved relevance and quality of the final results. By using RRF, we avoid the complexities of manually tuning weights for each query and achieve a balanced integration of diverse search strategies.

To further illustrate, the following is a table showing the ranking of the individual result sets when we search for “sisters”. Using the RRF formula (with the default ranking constant k=60), we can then derive the final score for each document. Sorting the final scores in descending order then gives us the final ranking of the documents. “Willow & Nova” is our top hit (cat)!

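| Cat (document) | Lexical ranking | knn (on img_embedding) ranking | knn (on summary_embedding) ranking | Final Score | Final Ranking |
|---|---|---|---|---|---|
| Sugarplum | 1 | - | 3 | 0.0322664585 | 2 |
| Willow & Nova | 2 | 1 | 1 | 0.0489159175 | 1 |
| Zoe & Zara | - | 2 | - | 0.01612903226 | 4 |
| Sage | - | 3 | 2 | 0.03200204813 | 3 |
| Primrose | - | 4 | - | 0.015625 | 5 |
| Dahlia | - | 5 | - | 0.01538461538 | 7 |
| Luke & Leia | - | - | 4 | 0.015625 | 6 |
| Sugar & Garth | - | - | 5 | 0.01538461538 | 8 |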
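
The scores in the table can be reproduced with a few lines of Python. In RRF, each document's score is the sum of 1 / (k + rank) over every result set in which it appears, with k=60 here. The sketch below takes the ranks straight from the table; the function name rrf_score is just an illustrative choice.

```python
def rrf_score(ranks, rank_constant=60):
    """Sum 1 / (rank_constant + rank) over the result sets a doc appears in."""
    return sum(1.0 / (rank_constant + rank) for rank in ranks)


# Ranks per cat, copied from the table for the "sisters" search.
ranks_by_cat = {
    "Sugarplum": [1, 3],
    "Willow & Nova": [2, 1, 1],
    "Zoe & Zara": [2],
    "Sage": [3, 2],
    "Primrose": [4],
    "Dahlia": [5],
    "Luke & Leia": [4],
    "Sugar & Garth": [5],
}

scores = {cat: rrf_score(ranks) for cat, ranks in ranks_by_cat.items()}
# sorted() is stable, so cats with equal scores keep their listed order.
final_ranking = sorted(scores, key=scores.get, reverse=True)

for position, cat in enumerate(final_ranking, start=1):
    print(f"{position}. {cat}: {scores[cat]:.10f}")
```

“Willow & Nova” comes out on top because it is the only document that appears in all three result sets, which is exactly the behavior RRF is designed to reward.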

Here are some other tests you can use for the description:

  1. “sisters” vs “siblings”
  2. “tuxedo”
  3. “black cats” with “American shorthair” breed filter
  4. “white”

Conclusion

Besides the obvious — **cats!** — Elasticats is a fantastic way to get to know Elasticsearch. It’s a fun and practical project that lets you explore search technologies while reminding us of the joy that technology can bring. As you dive deeper, you’ll also discover how Elasticsearch’s ability to handle vector embeddings can unlock new levels of search functionality. Whether it’s for cats, images, or other data types, Elastic makes search both powerful and enjoyable!

Feel free to contribute to the project or fork the repository to customize it further. Happy searching, and may you find the cat of your dreams! 😸
