Using Cohere embeddings with Elastic-built search experiences

Elasticsearch now supports Cohere embeddings! This blog explains how to use Cohere embeddings with Elastic-built search experiences.

Elasticsearch open inference API adds support for Cohere Embeddings

We're pleased to announce that Elasticsearch now supports Cohere embeddings! Releasing this capability has been a great journey of collaboration with the Cohere team, with more to come. Cohere is an exciting innovator in the generative AI space, and we're proud to enable developers to use Cohere's incredible text embeddings with Elasticsearch as the vector database to build semantic search use cases.

This blog goes over Elastic's approach to shipping and explains how to use Cohere embeddings with Elastic-built search experiences.

Elastic's approach to shipping: frequent, production-ready iterations

Before we dive in, if you're new to Elastic (welcome!): we've always believed in investing in our technology of choice, Apache Lucene, and in ensuring our contributions ship as production-grade capabilities in the fastest releases we can provide.

Let's dig into what we've built so far, and what we will be able to deliver soon:

  • In August 2023 we discussed our contribution to Lucene to enable maximum inner product similarity and make Cohere embeddings first-class citizens of the Elastic Stack.
  • This was contributed to Lucene first and released in Elasticsearch 8.11.
  • In that same release we also introduced the tech preview of our /_inference API endpoint, which supported embeddings from models managed in Elasticsearch; in the very next release we established a pattern of integration with third-party model providers such as Hugging Face and OpenAI.

Support for Cohere embeddings is already available to customers participating in the preview of our stateless offering on Elastic Cloud, and it will be available to everyone in an upcoming Elasticsearch release.

You'll need a Cohere account and some working knowledge of the Cohere Embed endpoint. You have a choice of models; if you're trying this out for the first time we recommend embed-english-v3.0, or, if you need a multilingual variant, embed-multilingual-v3.0. Both have a dimension size of 1024.

In Kibana, you have access to a console where you can run these next steps against Elasticsearch, even without an IDE set up.

PUT _inference/text_embedding/cohere_embeddings 
{
    "service": "cohere",
    "service_settings": {
        "api_key": <api-key>, 
        "model_id": "embed-english-v3.0", 
        "embedding_type": "byte"
    }
}

When you run this command in the console you should see a corresponding 200 response for the creation of your named Cohere inference service. In this configuration we've specified the embedding_type as byte, which is equivalent to asking Cohere to return signed int8 embeddings. This is only a valid configuration if you're using a v3 model.
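If you want to sanity-check the endpoint before going any further, you can call it directly and inspect the result. Below is a minimal sketch: the input text is arbitrary, and a successful response should contain a 1024-dimension embedding made up of signed int8 values.

POST _inference/text_embedding/cohere_embeddings
{
  "input": "Ice and snow on a mountain pass"
}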

Next, set up the mappings for the index that will store the embeddings you'll soon retrieve from Cohere.

Elasticsearch vector database for Cohere embeddings

PUT cohere-embeddings
{
  "mappings": {
    "properties": {
      "name_embedding": { 
        "type": "dense_vector", 
        "dims": 1024, 
        "element_type": "byte"
      },
      "name": { 
        "type": "text" 
      }
    }
  }
}

The mapping definition is an excellent example of another contribution the Elastic team made to Lucene: the ability to use scalar quantization.
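If you'd rather keep Cohere's default float embeddings and let Lucene quantize them at index time, a mapping along the following lines is one alternative. This is only a sketch under assumptions: the index name cohere-embeddings-quantized is hypothetical, and it presumes you created the inference endpoint without the byte embedding_type so that float vectors are returned.

PUT cohere-embeddings-quantized
{
  "mappings": {
    "properties": {
      "name_embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "element_type": "float",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      },
      "name": {
        "type": "text"
      }
    }
  }
}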

Just for fun, we've posted the command you would see in our Getting Started experience that ingests a simple book catalog.

POST _bulk?pretty
{ "index" : { "_index" : "books" } }
{"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470}
{ "index" : { "_index" : "books" } }
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585}
{ "index" : { "_index" : "books" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "books" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
{ "index" : { "_index" : "books" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "books" } }
{"name": "The Handmaid's Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}

At this point you have your book content in an Elasticsearch index, and now you need Cohere to generate embeddings for the documents!

To accomplish this step, you'll be setting up an ingest pipeline which utilizes our inference processor to make the call to the inference service you defined in the first PUT request.

PUT _ingest/pipeline/cohere_embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "cohere_embeddings", 
        "input_output": { 
          "input_field": "name",
          "output_field": "name_embedding"
        }
      }
    }
  ]
}
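Before wiring this pipeline into a reindex, you can sanity-check it with the simulate API. A minimal sketch follows; the sample document is made up, and a successful response should show a name_embedding field added next to name.

POST _ingest/pipeline/cohere_embeddings/_simulate
{
  "docs": [
    {
      "_source": {
        "name": "Snow Crash"
      }
    }
  ]
}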

If you weren't ingesting something as simple as this books catalog, you might be wondering how you'd handle token limits for the selected model.

If you needed to, you could amend the ingest pipeline you just created to chunk large documents, or use additional transformation tools to handle chunking before first ingest.
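As a rough illustration of the first option, here is a sketch of a pipeline that guards against overly long input with a script processor before handing the field to the inference processor. The pipeline name and the 1,000-character limit are hypothetical, and a real chunking strategy would split long text into multiple passages rather than simply truncating it.

PUT _ingest/pipeline/cohere_embeddings_truncated
{
  "processors": [
    {
      "script": {
        "description": "Hypothetical guard: truncate long text before inference",
        "source": "if (ctx.name != null && ctx.name.length() > params.max_chars) { ctx.name = ctx.name.substring(0, params.max_chars); }",
        "params": {
          "max_chars": 1000
        }
      }
    },
    {
      "inference": {
        "model_id": "cohere_embeddings",
        "input_output": {
          "input_field": "name",
          "output_field": "name_embedding"
        }
      }
    }
  ]
}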

If you're looking for additional tools to help figure out your chunking strategy, look no further than these notebooks in Search Labs.

Fun fact: in the near future, this step will be made completely optional for Elasticsearch developers. As mentioned at the beginning of this blog, the integration we're showing you today is a firm foundation for many more changes to come, one of which will be a drastic simplification of this step: you won't have to worry about chunking at all, nor about constructing and designing an ingest pipeline. Elastic will handle those steps for you with great defaults!

You're set up with your destination index and ingest pipeline; now it's time to reindex to push the documents through the pipeline.

POST _reindex
{
  "source": {
    "index": "books",
    "size": 50 
  },
  "dest": {
    "index": "cohere-embeddings",
    "pipeline": "cohere_embeddings"
  }
}
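Once the reindex completes, it's worth spot-checking that the embeddings actually landed. A quick sketch: pull back a single document and confirm it now carries a 1024-value name_embedding alongside the original name.

GET cohere-embeddings/_search
{
  "size": 1,
  "_source": ["name", "name_embedding"]
}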

Elastic kNN search for Cohere vector embeddings

Now you're ready to issue your first vector search with Cohere embeddings.

GET cohere-embeddings/_search
{
  "knn": {
    "field": "name_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "cohere_embeddings",
        "model_text": "Snow"
      }
    },
    "k": 10,
    "num_candidates": 100
  },
  "_source": [
    "name",
    "author"
  ]
}

It's as easy as that.

If you have already achieved a good level of understanding of vector search, we highly recommend you read this blog on running kNN as a query, which unlocks expert mode!
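For a taste of what that unlocks, here's a sketch of the same search expressed as a knn query inside the query DSL, where it can be combined with other clauses. Treat it as an assumption-laden example: depending on your Elasticsearch version, the knn query may only accept a literal query_vector rather than a query_vector_builder, so check the documentation for your release.

GET cohere-embeddings/_search
{
  "query": {
    "knn": {
      "field": "name_embedding",
      "query_vector_builder": {
        "text_embedding": {
          "model_id": "cohere_embeddings",
          "model_text": "Snow"
        }
      },
      "num_candidates": 100
    }
  },
  "_source": ["name", "author"]
}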

This integration with Cohere is offered in Serverless and in Elasticsearch 8.13.

Happy Searching, and big thanks again to the Cohere team for their collaboration on this project!

Ready to try this out on your own? Start a free trial.

Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our advanced semantic search webinar to build your next GenAI app!

Ready to build state-of-the-art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as you are. Let’s connect and work together to build the magical search experience that will get you the results you want.
