Elasticsearch open Inference API support for AlibabaCloud AI Search

Our latest addition to the Elasticsearch Open Inference API is the integration of AlibabaCloud AI Search. This work enables Elastic users to connect directly with the AlibabaCloud AI platform. Developers building RAG applications using the Elasticsearch vector database can store and use dense and sparse embeddings generated from models hosted on AlibabaCloud AI Search platform with semantic_text. In addition, Elastic users now have integrated access to reranking models for enhanced semantic reranking and the Qwen LLM family.

In this blog, we explore how to integrate AlibabaCloud's AI services with Elasticsearch. You'll learn how to set up and use Alibaba's completion, rerank, sparse embedding, and text embedding services within Elasticsearch. The broad set of supported models integrated into inference task types will enhance the relevance of many use cases including RAG.

We’re grateful to the Alibaba team for contributing support for these task types to Elasticsearch open inference API!

Let’s walk through examples of how to configure and use these services within an Elasticsearch environment. Note Alibaba uses the term service_id instead of model_id.

Using a base model in AlibabaCloud AI Search platform

This walkthrough assumes you already have an AlibabaCloud Account with access to AlibabaCloud AI Search platform. Next, you’ll need to create a workspace and API key for Inference creation.

Creating an inference API endpoint in Elasticsearch

In Elasticsearch, create your endpoint by providing the service as “alibabacloud-ai-search”, and the service settings including your workspace, the host, the service id and your api keys to access AlibabaCloud AI Search platform. In our example, we're creating a text embedding endpoint using "ops-text-embedding-001" as the service id.

You will receive a response from Elasticsearch with the endpoint that was created successfully:

Note that there are no additional settings for model creation. Elasticsearch will automatically connect to the AlibabaCloud AI Search platform to test your credentials and the service id, and fill in the number of dimensions and similarity measures for you.

Next, let’s test our endpoint to ensure everything is set up correctly. To do this, we’ll call the perform inference API:

The API call will return the generated embeddings for the provided input, which will look something like this:

You are now ready to start exploring. After you have tried these examples, have a look at some new exciting innovations in Elasticsearch for semantic search use cases:

The new semantic_text field simplifies storage and chunking of embeddings - just pick your model and Elastic does the rest!
Introduced in 8.14, retrievers allow you to setup multi-stage retrieval pipelines

But first, let’s dive into our examples!

I. Completion

To start, Alibaba Cloud provides several models for chat completion, with service IDs listed in their API documentation.

Step 1: Configure the Completion Service

First, set up the inference service for text completion:

Response

Step 2: Issue a Completion Request

Using the configured endpoint, send a POST request to generate a completion:

Returns

Uniquely, for this Elastic Inference API integration with Alibaba, chat history can be included in the inputs, in this example, we’ve included the previous response and added: “What fun things are there?”

The response clearly includes the history

In future updates, we plan to allow users to explicitly include chat history, improving the ease of usage.

II. Rerank

Moving on to our next task type, rerank. Reranking helps re-order search results for improved relevance, using Alibaba's powerful models. If you want to read more about this concept, have a look at this blog on Elastic Search Labs.

Step 1: Configure the Rerank Service

Configure the reranking inference service:

Step 2: Issue a Rerank Request

Send a POST request to rerank your search query results:

The rerank interface does not require a lot of configuration (task_settings), it returns the relevance scores ordered by the most relevant first and the index of the document in the input array.

III. Sparse Embedding

Alibaba provides a model specifically for sparse embeddings, we will use ops-text-sparse-embedding-001 for our example.

Step 1: Configure the Sparse Embedding Service

Step 2: Issue a Sparse Embedding query

Sparse has task_settings for:

input_type - either ingest or search
return_token - if true include the token text in the response, else it is a number

With return_token==false

IV. Text Embedding

Alibaba also offers text embedding models for different tasks.

Step 1: Configure the Text Embedding Service

Embeddings has one task_setting:

input_type - either ingest or search

Step 2: Issue a Text Embedding Request

Send a POST request to generate a text embedding:

AI search with Elastic and AlibabaCloud

Whether you're using Elasticsearch for implementing hybrid search, semantic reranking, or enhancing RAG use cases with summarization, the connection to AlibabaCloud's AI Services opens up a new world of possibilities for Elasticsearch developers. Thanks again, Alibaba team, for the contribution!

To dive deep, try this Jupyter notebook with an end-to-end example of using Inference API with the Alibaba Cloud AI Search.
Read Alibaba Cloud's announcement about AI-powered search innovations with Elasticsearch.

Users can start using this with Elasticsearch Serverless environments today and in an upcoming version of Elasticsearch.

Happy searching!

Elasticsearch has native integrations to industry leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or building prod-ready apps Elastic Vector Database.

To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.

Report an issue