What is hybrid search?
Hybrid search definition
Hybrid search is a powerful information retrieval strategy that combines two or more search techniques into a search algorithm.
Typically, hybrid search combines keyword search (also called full-text or lexical search) with semantic search, which relies on machine learning models. Semantic search retrieves results based on the meaning of the text, while keyword search focuses on exact word matches. Hybrid search is vital for conversational queries and those 'what was that called again?' moments where users don't or can't enter precise keywords.
Both keyword search and semantic search have unique strengths. Keyword search uses a ranking algorithm and specific terms to determine how relevant a document is to a search query. Semantic search interprets the meaning and context of the query rather than only its literal terms.
Hybrid search improves search precision by combining the strengths of semantic search and traditional search. Balancing semantic understanding and honoring exact query terms, hybrid search delivers results that improve the user search experience.
Components of hybrid search
Hybrid search combines keyword search (also called lexical search, and commonly scored with BM25, a ranking algorithm that determines relevance) with semantic search. Semantic search focuses on what you achieve with the search, while vector search focuses on how you achieve those results, mainly by retrieving data using vector representations.
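To make the keyword side concrete, here is a minimal sketch of BM25 scoring. It assumes whitespace tokenization, the commonly used defaults k1 = 1.2 and b = 0.75, and a Lucene-style IDF term. The function name and toy corpus are illustrative and not taken from any particular engine.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Longer-than-average documents are penalized via b.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Toy corpus (made up for illustration).
docs = [
    "red dress with pockets",
    "blue jeans with pockets",
    "office pet policy",
]
scores = bm25_scores("red dress", docs)
```

Only the first document contains the query terms, so it is the only one with a non-zero score. A document about "crimson gowns" would score zero as well, which is the gap semantic search is meant to close.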
Semantic search
Semantic search is all about understanding meaning and context. This type of search focuses on understanding the intent behind the words in a query rather than just matching keywords as BM25 search does. Semantic search bridges the gap between human query and actual meaning, accounting for variability and ambiguity in language. It leverages natural language processing (NLP), machine learning, knowledge graphs, and vectors to deliver results that are more relevant to the user's intent and incorporate context.
To establish context, semantic search may use known user data, location, or past search history. Searching "football" in the USA would return different results than the same search in other parts of the world, because semantic search can infer intent from the user's geographical location.
Vector search
Vector search is a search method that uses numeric representations, or vectors, to represent items like text, images, or audio. These vectors capture the underlying meaning or features of the items, and a vector search retrieves data by measuring how similar those vector representations are.
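Cosine similarity is one common way to measure how similar two vectors are. The sketch below uses hypothetical 3-dimensional embeddings with made-up values; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; the values are invented for illustration.
puppy = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
invoice = [0.1, 0.0, 0.9]

related = cosine_similarity(puppy, dog)
unrelated = cosine_similarity(puppy, invoice)
```

Here `related` comes out much closer to 1 than `unrelated`, reflecting that the first two vectors point in nearly the same direction.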
Combining approaches
Semantic search and vector search have a lot in common. In fact, semantic search is powered by vector search.
When a query is launched, the search engine transforms the query into vector embeddings. An algorithm, like the kNN algorithm (k-nearest neighbor algorithm), matches vectors of existing documents to the query vector. The algorithm then generates results based on conceptual relevance.
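The matching step can be sketched as an exact (brute-force) k-nearest-neighbor search. This assumes cosine similarity as the distance measure and uses invented document vectors; production engines typically use approximate indexes rather than scanning every vector.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn_search(query_vector, doc_vectors, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = ((cosine(query_vector, vec), doc_id) for doc_id, vec in doc_vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

# Hypothetical document embeddings (values invented for illustration).
doc_vectors = {
    "office-pet-policy": [0.9, 0.1, 0.2],
    "parking-rules": [0.1, 0.9, 0.1],
    "service-animal-faq": [0.8, 0.2, 0.3],
}
# Stand-in for the embedding of a query such as "dogs".
query_vector = [0.85, 0.15, 0.25]

top = knn_search(query_vector, doc_vectors, k=2)
```

The two pet-related documents are returned even though the query text shares no words with their ids, because the ranking is driven by vector proximity.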
When semantic and vector search work together, platforms can handle complex queries, including multi-language searches and searches over unstructured data.
How hybrid search works
Hybrid search blends keyword and vector search to deliver comprehensive search results. Vector embeddings convert data, like sentences or photos, into numbers that capture their meaning and relationships. The data is tokenized, indexed, and represented by numerical embeddings. Because vector search can capture meaning in unstructured data, it overcomes limitations in keyword search and lets users search by what they mean, even if they can't recall a precise description or exact keyword. Hybrid search can use both dense and sparse vectors to find the most relevant results.
Dense vectors
Dense vectors handle semantic understanding and contextual queries. They are commonly produced by machine learning models as embeddings, and most or all of their dimensions hold non-zero values.
Sparse vectors
Sparse vectors handle traditional keyword-based indexing. Most of their entries are zero, so only the few non-zero values need to be stored, which makes them practical for large data sets and large vocabularies.
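The difference in shape can be illustrated with a small sketch. It assumes a simple term-count weighting and a hypothetical six-term vocabulary; learned sparse models assign different weights, and the dense values shown are invented.

```python
from collections import Counter

def sparse_vector(text, vocabulary):
    """Map text to {vocabulary index: term count}, storing only non-zero entries."""
    counts = Counter(text.lower().split())
    return {vocabulary[term]: n for term, n in counts.items() if term in vocabulary}

def sparse_dot(a, b):
    """Dot product that only visits dimensions present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[idx] for idx, weight in a.items() if idx in b)

# Hypothetical vocabulary; a real index holds many thousands of terms.
vocabulary = {"red": 0, "dress": 1, "pockets": 2, "blue": 3, "jeans": 4, "pet": 5}

doc = sparse_vector("red dress with pockets", vocabulary)
query = sparse_vector("red dress", vocabulary)
overlap = sparse_dot(query, doc)

# A dense embedding, by contrast, carries a value in every dimension.
dense_doc = [0.12, -0.48, 0.33, 0.07]
```

The sparse document vector stores three entries out of six possible dimensions, and the overlap score counts the two shared terms.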
Query processing
Query processing in hybrid search uses sparse vectors for exact keyword matching and prioritization, and dense vectors for semantic understanding, capturing contextual meaning and intent. By combining these two types of vectors, hybrid search delivers comprehensive search results that balance specificity and relevance. To merge the two rankings, hybrid search uses reciprocal rank fusion (RRF), which combines multiple result sets (each with different relevance indicators) into a single result set.
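A minimal sketch of reciprocal rank fusion follows. Each document receives 1 / (rank_constant + rank) from every result list it appears in, and the contributions are summed. The rank constant of 60 is a commonly used default and is an assumption here, as are the document ids.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, rank_constant=60):
    """Fuse ranked lists of document ids into one list ordered by RRF score."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Higher-ranked documents contribute more; scores from each list add up.
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists, best match first.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# fused == ["doc_a", "doc_c", "doc_b", "doc_d"]
```

Because RRF works on ranks rather than raw scores, it can combine BM25 scores and vector similarities without normalizing them to a shared scale. Documents that appear in both lists, such as `doc_a` and `doc_c`, rise above those found by only one method.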
Benefits of hybrid search
Hybrid search delivers benefits to users over traditional search by utilizing the combined strengths of different search methods. Its primary benefit is delivering more accurate search results with less effort.
Across industries, internal and external search algorithms can use hybrid search to present relevant results. For example, e-commerce platforms can distinguish between searches for "red dress with pockets" vs "red dress for first dinner date at fancy restaurant that has room for keys and money."
As another example, searching for "dogs" in an internal benefits document at an enterprise company might produce a result for "office pet policy." The specific word might not appear in the document, but it's probably the answer the user was looking for.
Overall, hybrid search leads to an improved user search experience thanks to its flexibility with language. Hybrid search enhances search precision by balancing semantic understanding with exact query terms. As a result, conversational and complex queries can be processed efficiently, preventing dead ends and user frustration.
Hybrid search with RAG
Retrieval augmented generation (RAG) is a search technique that uses private or proprietary data sources to provide context that supplements your LLM's original knowledge base. RAG is valuable for queries because it enables generative AI systems to use external information sources to produce more relevant responses.
Using hybrid search with RAG — and bringing in additional sources of data — can improve the relevance of a search experience by adding context. Additional information sources can be anything that organizations or customers might need to answer a query, from new information on the internet to proprietary or confidential business documents.
RAG offers several benefits over language models that work in isolation. It’s cost-effective, requires less computing and storage, and ensures your model can access the most up-to-date information.
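The retrieve-then-generate flow can be sketched as a prompt-assembly step. This assumes the passages were already returned by a hybrid search; the function name, prompt wording, and passage text are all invented for illustration, and the call to the language model itself is left out.

```python
def build_rag_prompt(question, passages, max_passages=3):
    """Assemble a prompt that grounds the model in retrieved passages."""
    context = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages[:max_passages], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical passages, as if returned by a hybrid search over internal documents.
retrieved = [
    "Office pet policy: dogs are welcome on Fridays.",
    "Parking rules: visitors use level 2.",
]
prompt = build_rag_prompt("Can I bring my dog to work?", retrieved)
```

The resulting string would be sent to the LLM of your choice. The quality of the answer depends on the retrieval step placing the right passages first, which is where hybrid search helps.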
Hybrid search with Elastic
Elastic makes it easy to implement hybrid search by supporting semantic search out of the box. With Elastic, hybrid search can be performed on one platform, through one API, at speed and scale, with better relevance from the outset.
Using Elastic's playground, developers can explore grounding LLMs of their choice with their own private data in a low-code interface.
Elastic helps developers simplify query construction with newly introduced retrievers: standard, kNN, and RRF. Using these retrievers, Elastic understands the selected data and will automatically generate a unified query.
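As a rough sketch, a search request body that nests a standard retriever and a kNN retriever under an RRF retriever might be built like this. The field names `content` and `content_embedding`, the helper function, and the `num_candidates` heuristic are assumptions for illustration; check the retriever documentation for your Elasticsearch version before relying on the exact structure.

```python
import json

def build_hybrid_request(query_text, query_vector, text_field="content",
                         vector_field="content_embedding", k=10):
    """Build a search body that fuses a lexical and a kNN retriever with RRF."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Lexical side: a standard retriever wrapping a match query.
                    {"standard": {"query": {"match": {text_field: query_text}}}},
                    # Semantic side: a kNN retriever over a dense vector field.
                    {"knn": {
                        "field": vector_field,
                        "query_vector": query_vector,
                        "k": k,
                        "num_candidates": k * 10,  # assumed heuristic, tune as needed
                    }},
                ]
            }
        }
    }

# Hypothetical query text and a made-up 3-dimensional query vector.
body = build_hybrid_request("red dress with pockets", [0.12, -0.48, 0.33])
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request. In practice the query vector comes from the same embedding model that was used to index the documents.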
Hybrid search resources
- Roll up your sleeves with Elasticsearch AI Playground
- How to combine full-text and kNN results
- aNN vs kNN: Understanding their differences and roles in vector search
- Using hybrid search for gopher hunting with Elasticsearch and Go
- How to perform hybrid search with semantic text
- What is RAG?
- What is semantic search?