Improving the quality of search results is essential for providing an efficient user experience. One way to optimize searches is by automatically expanding the queried terms through synonyms. This allows queries to be interpreted more broadly, covering language variations and thus improving result matching.
This blog explores how large language models (LLMs) can be used to identify and generate synonyms automatically, allowing these terms to be programmatically loaded into Elasticsearch's synonym API.
When to use synonyms?
The use of synonyms can be a faster and more cost-effective solution compared to vector search. Its implementation is simpler as it does not require deep knowledge of embeddings or a complex vector ingestion process.
Additionally, resource consumption is lower since vector search demands greater storage capacity and memory for embedding indexing and retrieval.
Another important aspect is search regionalization. With synonyms, it is possible to adapt terms according to local language and customs. This is useful in situations where embeddings may fail to match regional expressions or country-specific terms. For example, some words or acronyms may have different meanings depending on the region, but are naturally treated as synonyms by local users. In Brazil, this is quite common. "Abacaxi" and "ananás" are the same fruit (pineapple), but the second term is more commonly used in some regions of the Northeast. Similarly, the well-known "pão francês" in the Southeast may be known as "pão careca" in the Northeast.
How to use LLMs to generate synonyms?
To obtain synonyms automatically, we can use LLMs, which analyze the context of a term and suggest appropriate variations. This approach allows for dynamically expanding synonyms, ensuring a broader and more accurate search without relying on a fixed dictionary.
In this demonstration, we will use an LLM to generate synonyms for e-commerce products. Many searches return few or no results due to variations in the queried terms. With synonyms, we can solve this issue. For example, a search for "smartphone" can encompass different models of mobile phones, ensuring users find the products they are looking for.
Prerequisites
Before getting started, we need to set up the environment and define the required dependencies. We will use the solution provided by Elastic to run Elasticsearch and Kibana locally in Docker. The code will be written in Python 3.9.6, with the following dependencies (python-slugify provides the slugify function used later to build the synonym rule IDs):
pip install openai==1.59.8 elasticsearch==8.15.1 python-slugify
Creating the product index
Initially, we will create an index of products without synonym support. This will allow us to validate queries and then compare them to an index that includes synonyms.
To create the index, we bulk load a product dataset using the following command in Kibana DevTools:
POST _bulk
{"index": {"_index": "products", "_id": 10001}}
{"category": "Electronics", "name": "iPhone 14 Pro"}
{"index": {"_index": "products", "_id": 10007}}
{"category": "Electronics", "name": "MacBook Pro 16-inch"}
{"index": {"_index": "products", "_id": 10013}}
{"category": "Electronics", "name": "Samsung Galaxy Tab S8"}
{"index": {"_index": "products", "_id": 10037}}
{"category": "Electronics", "name": "Apple Watch Series 8"}
{"index": {"_index": "products", "_id": 10049}}
{"category": "Electronics", "name": "Kindle Paperwhite"}
{"index": {"_index": "products", "_id": 10067}}
{"category": "Electronics", "name": "Samsung QLED 4K TV"}
{"index": {"_index": "products", "_id": 10073}}
{"category": "Electronics", "name": "HP Spectre x360 Laptop"}
{"index": {"_index": "products", "_id": 10079}}
{"category": "Electronics", "name": "Apple AirPods Pro"}
{"index": {"_index": "products", "_id": 10115}}
{"category": "Electronics", "name": "Amazon Echo Show 10"}
{"index": {"_index": "products", "_id": 10121}}
{"category": "Electronics", "name": "Apple iPad Air"}
{"index": {"_index": "products", "_id": 10127}}
{"category": "Electronics", "name": "Apple AirPods Max"}
{"index": {"_index": "products", "_id": 10151}}
{"category": "Electronics", "name": "Sony WH-1000XM4 Headphones"}
{"index": {"_index": "products", "_id": 10157}}
{"category": "Electronics", "name": "Google Pixel 6 Pro"}
{"index": {"_index": "products", "_id": 10163}}
{"category": "Electronics", "name": "Apple MacBook Air"}
{"index": {"_index": "products", "_id": 10181}}
{"category": "Electronics", "name": "Google Pixelbook Go"}
{"index": {"_index": "products", "_id": 10187}}
{"category": "Electronics", "name": "Sonos Beam Soundbar"}
{"index": {"_index": "products", "_id": 10199}}
{"category": "Electronics", "name": "Apple TV 4K"}
{"index": {"_index": "products", "_id": 10205}}
{"category": "Electronics", "name": "Samsung Galaxy Watch 4"}
{"index": {"_index": "products", "_id": 10211}}
{"category": "Electronics", "name": "Apple MacBook Pro 16-inch"}
{"index": {"_index": "products", "_id": 10223}}
{"category": "Electronics", "name": "Amazon Echo Dot (4th Gen)"}
Generating synonyms with LLM
In this step, we will use an LLM to dynamically generate synonyms. To achieve this, we will integrate the OpenAI API, defining an appropriate model and prompt. The LLM will receive the product category and name, ensuring that the synonyms are contextually relevant.
import json
import logging

from openai import OpenAI

# OpenAI client used by call_gpt; replace the placeholder with your API key.
client = OpenAI(api_key="your-key")


def call_gpt(prompt, model):
    # Send a single-turn prompt and return the raw text of the reply (None on failure).
    try:
        logging.info("generate synonyms by llm...")
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=1000
        )
        content = response.choices[0].message.content.strip()
        return content
    except Exception as e:
        logging.error(f"Failed to use model: {e}")
        return None


def generate_synonyms(category, products):
    # Map each product name to the comma-separated synonyms returned by the LLM.
    synonyms = {}

    for product in products:
        prompt = "You are an expert in generating synonyms for products. Based on the category and product name provided, generate synonyms or related terms. Follow these rules:\n"
        prompt += "1. **Format**: The first word should be the main item (part of the product name, excluding the brand), followed by up to 3 synonyms separated by commas.\n"
        prompt += "2. **Exclude the brand**: Do not include the brand name in the synonyms.\n"
        prompt += "3. **Maximum synonyms**: Generate a maximum of 3 synonyms per product.\n\n"
        prompt += f"The category is: **{category}**, and the product is: **{product}**. Return only the synonyms in the requested format, without additional explanations."

        response = call_gpt(prompt, "gpt-4o")
        synonyms[product] = response

    return synonyms
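From the created product index, we will retrieve all items in the "Electronics" category and send their names to the LLM. The expected output will be something like the following (shown here as lists for readability; the code above stores each value as the comma-separated string returned by the model):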
{
  "iPhone 14 Pro": ["iPhone", "smartphone", "mobile", "handset"],
  "MacBook Pro 16-inch": ["MacBook", "Laptop", "Notebook", "Ultrabook"],
  "Samsung Galaxy Tab S8": ["Tab", "Tablet", "Slate", "Pad"],
  "Bose QuietComfort 35 Headphones": ["Headphones", "earphones", "earbuds", "headset"]
}
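An LLM does not always follow the requested format exactly, so it can help to clean each reply before treating it as a synonym rule. The sketch below is one possible approach, not part of the original code: normalize_synonyms is a hypothetical helper that keeps only the first line of the reply, strips markdown emphasis and quotes, removes case-insensitive duplicates, and caps the rule at four terms (the main item plus up to three synonyms, as the prompt requests).

```python
from typing import Optional


def normalize_synonyms(raw: Optional[str], max_terms: int = 4) -> Optional[str]:
    """Turn a raw LLM reply into a clean 'a, b, c' synonym rule, or None."""
    # call_gpt returns None when the request fails; blank replies are useless too.
    if raw is None or not raw.strip():
        return None
    # Keep only the first line and drop markdown emphasis or quotes the model may add.
    first_line = raw.strip().splitlines()[0]
    cleaned = first_line.replace("*", "").replace('"', "")
    terms = []
    seen = set()
    for term in cleaned.split(","):
        term = term.strip()
        key = term.lower()  # compare case-insensitively, keep the original casing
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    # A rule with fewer than two terms has nothing to expand, so it is skipped.
    if len(terms) < 2:
        return None
    return ", ".join(terms[:max_terms])


print(normalize_synonyms("Tablet, tablet, Slate, Pad, Device"))  # Tablet, Slate, Pad, Device
print(normalize_synonyms(None))  # None
```

Applied to each value returned by generate_synonyms, a helper like this makes it possible to drop failed or malformed replies before they are sent to Elasticsearch.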
With the generated synonyms, we can register them in Elasticsearch using the Synonyms API.
Managing synonyms with the Synonyms API
The Synonyms API provides an efficient way to manage synonym sets directly within the system. Each synonym set consists of synonym rules, where a group of words is treated as equivalent in searches.
Example of creating a synonym set
PUT _synonyms/my-synonyms-set
{
  "synonyms_set": [
    {
      "id": "rule-1",
      "synonyms": "hello, hi"
    },
    {
      "synonyms": "bye, goodbye"
    }
  ]
}
This creates a set called "my-synonyms-set," where "hello" and "hi" are treated as equivalents, as well as "bye" and "goodbye."
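To make the idea of equivalent terms concrete, here is a small conceptual sketch in plain Python. It does not reproduce how Elasticsearch applies synonyms internally (which happens on the analyzed token stream and also handles multi-word terms); expand_term is a hypothetical function that only shows which single-word terms a rule treats as interchangeable.

```python
def expand_term(term, rules):
    """Return the set of terms that `term` is treated as equivalent to."""
    term = term.lower()
    expanded = {term}
    for rule in rules:
        # Each rule is a comma-separated group of equivalent terms.
        group = {t.strip().lower() for t in rule["synonyms"].split(",")}
        if term in group:
            expanded |= group
    return expanded


rules = [
    {"id": "rule-1", "synonyms": "hello, hi"},
    {"synonyms": "bye, goodbye"},
]

print(sorted(expand_term("hi", rules)))       # ['hello', 'hi']
print(sorted(expand_term("goodbye", rules)))  # ['bye', 'goodbye']
print(sorted(expand_term("thanks", rules)))   # ['thanks']
```

A term that appears in no rule, such as "thanks" here, is left unchanged.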
Implementing synonym creation for the product catalog
Below is the method responsible for building a synonym set and inserting it into Elasticsearch. The synonym rules are generated based on the mapping of synonyms suggested by the LLM. Each rule has an ID, corresponding to the product name in slug format, and the list of synonyms calculated by the LLM.
import json
import logging

from elasticsearch import Elasticsearch
from slugify import slugify

es = Elasticsearch(
    "http://localhost:9200",
    api_key="your_api_key"
)


def mount_synonyms(results):
    # One rule per product: the rule ID is the product name in slug format.
    synonyms_set = [{"id": slugify(product), "synonyms": synonyms} for product, synonyms in results.items()]

    try:
        response = es.synonyms.put_synonym(id="products-synonyms-set", synonyms_set=synonyms_set)

        logging.info(json.dumps(response.body, indent=4))
        return response.body
    except Exception as e:
        logging.error(f"Error creating synonyms: {str(e)}")
        return None
Below is the request payload to create the synonym set:
{
  "synonyms_set": [
    {
      "id": "iphone-14-pro",
      "synonyms": "iPhone, smartphone, mobile, handset"
    },
    {
      "id": "macbook-pro-16-inch",
      "synonyms": "MacBook, Laptop, Notebook, Computer"
    },
    {
      "id": "samsung-galaxy-tab-s8",
      "synonyms": "Tablet, Slate, Pad, Device"
    },
    {
      "id": "garmin-forerunner-945",
      "synonyms": "Forerunner, smartwatch, fitness watch, GPS watch"
    },
    {
      "id": "bose-quietcomfort-35-headphones",
      "synonyms": "Headphones, Earphones, Headset, Cans"
    }
  ]
}
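Because call_gpt returns None when a request fails, some products may reach this step without synonyms. The sketch below shows one way the payload could be built defensively. It is an illustration under assumptions: simple_slug is a hypothetical standard-library stand-in that only approximates slugify for simple product names, and build_synonyms_set is a hypothetical variant of the list comprehension in mount_synonyms that skips empty values and repeated IDs.

```python
import re
import unicodedata


def simple_slug(text):
    """Rough stand-in for slugify(): lowercase ASCII words joined by hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def build_synonyms_set(results):
    """Build the synonyms_set list, skipping empty values and duplicate IDs."""
    synonyms_set = []
    seen_ids = set()
    for product, synonyms in results.items():
        rule_id = simple_slug(product)
        if not synonyms or rule_id in seen_ids:
            continue  # no usable synonyms, or a rule with this ID already exists
        seen_ids.add(rule_id)
        synonyms_set.append({"id": rule_id, "synonyms": synonyms})
    return synonyms_set


results = {
    "iPhone 14 Pro": "iPhone, smartphone, mobile, handset",
    "Kindle Paperwhite": None,  # simulates a failed LLM call
}
print(build_synonyms_set(results))
# [{'id': 'iphone-14-pro', 'synonyms': 'iPhone, smartphone, mobile, handset'}]
```

The resulting list can be passed as the synonyms_set argument in the same way as in mount_synonyms.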
With the synonym set created in the cluster, we can move on to the next step, which is creating a new index with synonym support using the defined set.
The complete Python code with the synonyms generated by LLM and the synonym set creation defined by the Synonyms API is below:
import json
import logging

from elasticsearch import Elasticsearch
from openai import OpenAI
from slugify import slugify

logging.basicConfig(level=logging.INFO)

client = OpenAI(
    api_key="your-key",
)

es = Elasticsearch(
    "http://localhost:9200",
    api_key="your_api_key"
)


def call_gpt(prompt, model):
    # Send a single-turn prompt and return the raw text of the reply (None on failure).
    try:
        logging.info("generate synonyms by llm...")
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=1000
        )
        content = response.choices[0].message.content.strip()
        return content
    except Exception as e:
        logging.error(f"Failed to use model: {e}")
        return None


def generate_synonyms(category, products):
    # Map each product name to the comma-separated synonyms returned by the LLM.
    synonyms = {}

    for product in products:
        prompt = "You are an expert in generating synonyms for products. Based on the category and product name provided, generate synonyms or related terms. Follow these rules:\n"
        prompt += "1. **Format**: The first word should be the main item (part of the product name, excluding the brand), followed by up to 3 synonyms separated by commas.\n"
        prompt += "2. **Exclude the brand**: Do not include the brand name in the synonyms.\n"
        prompt += "3. **Maximum synonyms**: Generate a maximum of 3 synonyms per product.\n\n"
        prompt += f"The category is: **{category}**, and the product is: **{product}**. Return only the synonyms in the requested format, without additional explanations."

        response = call_gpt(prompt, "gpt-4o")
        synonyms[product] = response

    return synonyms


def get_products(category):
    # Fetch up to 50 product names that belong to the given category.
    query = {
        "size": 50,
        "_source": ["name"],
        "query": {
            "bool": {
                "filter": [
                    {
                        "term": {
                            "category.keyword": category
                        }
                    }
                ]
            }
        }
    }
    response = es.search(index="products", body=query)

    if response["hits"]["total"]["value"] > 0:
        product_names = [hit["_source"]["name"] for hit in response["hits"]["hits"]]
        return product_names
    else:
        return []


def mount_synonyms(results):
    # One rule per product: the rule ID is the product name in slug format.
    synonyms_set = [{"id": slugify(product), "synonyms": synonyms} for product, synonyms in results.items()]

    try:
        response = es.synonyms.put_synonym(id="products-synonyms-set", synonyms_set=synonyms_set)

        logging.info(json.dumps(response.body, indent=4))
        return response.body
    except Exception as e:
        logging.error(f"Error updating synonyms: {str(e)}")
        return None


if __name__ == '__main__':
    category = "Electronics"
    products = get_products(category)
    llm_synonyms = generate_synonyms(category, products)
    mount_synonyms(llm_synonyms)
Creating an index with synonym support
A new index will be created where all data from the products index will be reindexed. This index will use the synonyms_filter, which applies the products-synonyms-set created earlier.
Below is the index mapping configured to use synonyms:
PUT products_02
{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms_filter": {
          "type": "synonym",
          "synonyms_set": "products-synonyms-set",
          "updateable": true
        }
      },
      "analyzer": {
        "synonyms_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "synonyms_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "ID": {
        "type": "long"
      },
      "category": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "synonyms_analyzer"
      }
    }
  }
}
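Once the index exists, one way to check that the search analyzer actually expands a term is the Analyze API. The sketch below assumes the es client created earlier in the article is passed in; analyze_terms is a hypothetical helper name, and the tokens returned depend entirely on the synonym set the LLM produced.

```python
def analyze_terms(es, index, analyzer, text):
    """Return the tokens that `analyzer` on `index` produces for `text`."""
    response = es.indices.analyze(index=index, analyzer=analyzer, text=text)
    return [token["token"] for token in response["tokens"]]


# Example call, assuming `es` is the Elasticsearch client defined earlier:
# analyze_terms(es, "products_02", "synonyms_analyzer", "tablet")
```

If the returned list contains only the original term, no rule in products-synonyms-set matched it.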
Reindexing the products index
Now, we will use the Reindex API to migrate the data from the products index to the new products_02 index, which includes synonym support. The following code was executed in Kibana DevTools:
POST _reindex
{
  "source": {
    "index": "products"
  },
  "dest": {
    "index": "products_02"
  }
}
After the migration, the products_02 index will be populated and ready to validate searches using the configured synonym set.
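The same migration can also be driven from Python. The following is a minimal sketch, assuming the es client created earlier is passed in; reindex_and_count is a hypothetical helper that runs the reindex and then returns the document counts of both indexes so they can be compared.

```python
def reindex_and_count(es, source, dest):
    """Copy documents from `source` to `dest` and return both document counts."""
    es.reindex(
        source={"index": source},
        dest={"index": dest},
        wait_for_completion=True,
        refresh=True,  # make the copied documents searchable right away
    )
    source_count = es.count(index=source)["count"]
    dest_count = es.count(index=dest)["count"]
    return source_count, dest_count


# Example call, assuming `es` is the Elasticsearch client defined earlier:
# source_count, dest_count = reindex_and_count(es, "products", "products_02")
```

Matching counts are a quick sanity check that every product reached the new index.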
Validating search with synonyms
Let's compare the search results between the two indexes. We will execute the same query on both indexes and validate whether the synonyms are being used to retrieve results.
Search in the products index (without synonyms)
We will use Kibana to perform searches and analyze the results. In the Analytics > Discover menu, we will create a Data View to visualize the data from the indexes we created.
Within Discover, click on Data View and define a name and an index pattern. For the "products" index, we will use the "products" pattern. Then, we will repeat the process to create a new Data View for the "products_02" index, using the "products_02" pattern.

With the Data Views configured, we can return to Analytics > Discover and start the validations.

Here, after selecting the products Data View and searching for the term "tablet", we get no results, even though we know that there are products like "Kindle Paperwhite" and "Apple iPad Air".

Search in the products_02 index (with synonym support)
When performing the same query on the "products_02" Data View, which supports synonyms, the products were retrieved successfully. This demonstrates that the configured synonym set is working correctly, ensuring that different variations of the searched terms return the expected results.

We can achieve the same result by running the same query directly in Kibana DevTools. Simply search the products_02 index using the Elasticsearch Search API:
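The exact request used is not reproduced here. As a rough Python equivalent, the sketch below assumes a simple match query on the name field and the es client created earlier; search_product_names and compare_search are hypothetical helper names.

```python
def search_product_names(es, index, term, size=10):
    """Run a match query on the `name` field and return the matching names."""
    response = es.search(index=index, query={"match": {"name": term}}, size=size)
    return [hit["_source"]["name"] for hit in response["hits"]["hits"]]


def compare_search(es, term, indexes=("products", "products_02")):
    """Run the same search on each index and collect the results per index."""
    return {index: search_product_names(es, index, term) for index in indexes}


# Example call, assuming `es` is the Elasticsearch client defined earlier:
# compare_search(es, "tablet")
```

Running the same term against both indexes makes the effect of the synonym set easy to see side by side.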

Conclusion
Implementing synonyms in Elasticsearch improved the accuracy and coverage of product catalog searches. The key differentiator was the use of an LLM, which generated synonyms automatically and contextually, eliminating the need for predefined lists. The model analyzed product names and categories, ensuring relevant synonyms for e-commerce.
Additionally, the Synonyms API simplified dictionary management, allowing synonym sets to be modified dynamically. With this approach, search became more flexible and adaptable to different user query patterns.
This process can be continually improved with new data and model adjustments, ensuring an increasingly efficient search experience.
References
Run Elasticsearch locally
https://www.elastic.co/guide/en/elasticsearch/reference/current/run-elasticsearch-locally.html
Synonyms API
https://www.elastic.co/guide/en/elasticsearch/reference/current/synonyms-apis.html