Elasticsearch open Inference API adds support for AlibabaCloud AI Search

Discover how to use Elasticsearch vector database with AlibabaCloud AI Search, which offers inference, reranking, and embedding capabilities.

We are excited to announce our latest addition to the Elasticsearch Open Inference API: the integration of AlibabaCloud AI Search. This work enables Elastic users to connect directly with the AlibabaCloud AI platform. Developers building RAG applications using the Elasticsearch vector database can store and use dense and sparse embeddings generated from models hosted on AlibabaCloud AI Search platform with semantic_text. In addition, Elastic users now have integrated access to reranking models for enhanced semantic reranking and the Qwen LLM family.

In this blog, we explore how to integrate AlibabaCloud's AI services with Elasticsearch. You'll learn how to set up and use Alibaba's completion, rerank, sparse embedding, and text embedding services within Elasticsearch. The broad set of supported models integrated into inference task types will enhance the relevance of many use cases including RAG.

We’re grateful to the Alibaba team for contributing support for these task types to Elasticsearch open inference API!

Let’s walk through examples of how to configure and use these services within an Elasticsearch environment. Note Alibaba uses the term service_id instead of model_id.

Using a base model in AlibabaCloud AI Search platform

This walkthrough assumes you already have an AlibabaCloud Account with access to AlibabaCloud AI Search platform. Next, you’ll need to create a workspace and API key for Inference creation.

Creating an inference API endpoint in Elasticsearch

In Elasticsearch, create your endpoint by providing the service as “alibabacloud-ai-search”, and the service settings including your workspace, the host, the service id and your api keys to access AlibabaCloud AI Search platform. In our example, we're creating a text embedding endpoint using "ops-text-embedding-001" as the service id.

PUT _inference/text_embedding/ali_ai_embeddings
{
    "service": "alibabacloud-ai-search",
    "service_settings": {
        "api_key": "<api_key>",
        "service_id": "ops-text-embedding-001",
        "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
        "workspace": "default"
    }
}

You will receive a response from Elasticsearch with the endpoint that was created successfully:

{
  "inference_id": "ali_ai_embeddings",
  "task_type": "text_embedding",
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "similarity": "dot_product",
    "dimensions": 1536,
    "service_id": "ops-text-embedding-001",
    "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default",
    "rate_limit": {
      "requests_per_minute": 10000
    }
  },
  "task_settings": {}
}

Note that there are no additional settings for model creation. Elasticsearch will automatically connect to the AlibabaCloud AI Search platform to test your credentials and the service id, and fill in the number of dimensions and similarity measures for you.

Next, let’s test our endpoint to ensure everything is set up correctly. To do this, we’ll call the perform inference API:

POST _inference/text_embedding/ali_ai_embeddings
{
  "input": "What is Elastic?"
}

The API call will return the generated embeddings for the provided input, which will look something like this:

{
    "text_embedding": [
        {
            "embedding": [
                0.048400473,
                0.051464397,
                … (additional values) …
                0.033325635,
                -0.008986305
            ]
        }
    ]
}

You are now ready to start exploring. After you have tried these examples, have a look at some new exciting innovations in Elasticsearch for semantic search use cases:

  • The new semantic_text field simplifies storage and chunking of embeddings - just pick your model and Elastic does the rest!
  • Introduced in 8.14, retrievers allow you to setup multi-stage retrieval pipelines

But first, let’s dive into our examples!

I. Completion

To start, Alibaba Cloud provides several models for chat completion, with service IDs listed in their API documentation.

Step 1: Configure the Completion Service

First, set up the inference service for text completion:

PUT _inference/completion/ali-chat
{
 "service": "alibabacloud-ai-search",
 "service_settings": {
     "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
     "api_key": "xxxxxxxxxxxxxxxxxx",
     "service_id": "ops-qwen-turbo",
     "workspace" : "default"
 }
}

Response

{
 "inference_id": "ali-chat",
 "task_type": "completion",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-qwen-turbo",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   }
 },
 "task_settings": {}
}

Step 2: Issue a Completion Request

Using the configured endpoint, send a POST request to generate a completion:

POST _inference/completion/ali-chat
{
 "input":["Where is the capital of Henan?"]
}

Returns

{
 "completion": [
   {
     "result": "The capital of Henan is Zhengzhou."
   }
 ]
}

Uniquely, for this Elastic Inference API integration with Alibaba, chat history can be included in the inputs, in this example, we’ve included the previous response and added: “What fun things are there?”

POST _inference/completion/ali-chat
{
 "input":["Where is the capital of Henan?", "The capital of Henan is Zhengzhou.", "What fun things are there?" ]
}

The response clearly includes the history

{
 "completion": [
   {
     "result": "I'm sorry, I do not have enough information to provide a specific list of fun things to do in Zhengzhou, Henan. I can only tell you that Zhengzhou is the capital of Henan province. To find out about fun activities, attractions, or events in Zhengzhou, I would suggest researching local tourism websites, asking locals, or checking out travel guides for the area."
   }
 ]
}

In future updates, we plan to allow users to explicitly include chat history, improving the ease of usage.

II. Rerank

Moving on to our next task type, rerank. Reranking helps re-order search results for improved relevance, using Alibaba's powerful models. If you want to read more about this concept, have a look at this blog on Elastic Search Labs.

Step 1: Configure the Rerank Service

Configure the reranking inference service:

PUT _inference/rerank/ali-rank
{
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "api_key": "xxxxxxxxxxxxxxxxxx",
   "service_id": "ops-bge-reranker-larger",
   "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace" : "default"   
 }
}
{
 "inference_id": "ali-rank",
 "task_type": "rerank",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-bge-reranker-larger",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   }
 },
 "task_settings": {}
}

Step 2: Issue a Rerank Request

Send a POST request to rerank your search query results:

The rerank interface does not require a lot of configuration (task_settings), it returns the relevance scores ordered by the most relevant first and the index of the document in the input array.

POST _inference/rerank/ali-rank
{
 "query": "What is the capital of the USA?",
 "input": [
   "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",


   "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",


   "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",


   "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",


   "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",


   "North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck."
 ]
}
{
 "rerank": [
   {
     "index": 3,
     "relevance_score": 0.9998832
   },
   {
     "index": 4,
     "relevance_score": 0.008847355
   },
   {
     "index": 5,
     "relevance_score": 0.0026626128
   },
   {
     "index": 0,
     "relevance_score": 0.00068250194
   },
   {
     "index": 2,
     "relevance_score": 0.00019716943
   },
   {
     "index": 1,
     "relevance_score": 0.00011591934
   }
 ]
}

III. Sparse Embedding

Alibaba provides a model specifically for sparse embeddings, we will use ops-text-sparse-embedding-001 for our example.

Step 1: Configure the Sparse Embedding Service

PUT _inference/sparse_embedding/ali-sparse-embedding
{
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "api_key": "xxxxxxxxxxxxxxxxxx",
   "service_id": "ops-text-sparse-embedding-001",
   "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace" : "default"
 }
}
{
 "inference_id": "ali-sparse-embedding",
 "task_type": "sparse_embedding",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-text-sparse-embedding-001",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   }
 },
 "task_settings": {}
}

Step 2: Issue a Sparse Embedding query

Sparse has task_settings for:

  • input_type - either ingest or search
  • return_token - if true include the token text in the response, else it is a number
POST _inference/sparse_embedding/ali-sparse-embedding
{
 "input": "Hello world",
 "task_settings": {
   "input_type": "search",
   "return_token": true
 }
}
{
 "sparse_embedding": [
   {
     "is_truncated": false,
     "embedding": {
       "hello": 0.27783203,
       "world": 0.28222656
     }
   }
 ]
}

With return_token==false

{
 "sparse_embedding": [
   {
     "is_truncated": false,
     "embedding": {
       "8999": 0.28222656,
       "35378": 0.27783203
     }
   }
 ]
}

IV. Text Embedding

Alibaba also offers text embedding models for different tasks.

Step 1: Configure the Text Embedding Service

Embeddings has one task_setting:

  • input_type - either ingest or search
PUT _inference/text_embedding/ali-embeddings
{
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "api_key": "xxxxxxxxxxxxxxxxxx",
   "service_id": "ops-text-embedding-001",
   "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace" : "default"
 }
}
{
 "inference_id": "ali-embeddings",
 "task_type": "text_embedding",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-text-embedding-001",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   },
   "similarity": "dot_product",
   "dimensions": 1536
 },
 "task_settings": {}
}

Step 2: Issue a Text Embedding Request

Send a POST request to generate a text embedding:

POST _inference/text_embedding/ali-embeddings
{
 "input": "Hello world"
}
{
 "text_embedding": [
   {
     "embedding": [
       -0.017036675,
       0.07038724,
       0.044685286,
       0.0064531807,
       0.013290042,
       0.011183944,
       -0.0020014185,
       -0.009508779,
…

AI search with Elastic and AlibabaCloud

Whether you're using Elasticsearch for implementing hybrid search, semantic reranking, or enhancing RAG use cases with summarization, the connection to AlibabaCloud's AI Services opens up a new world of possibilities for Elasticsearch developers. Thanks again, Alibaba team, for the contribution!

  • To dive deep, try this Jupyter notebook with an end-to-end example of using Inference API with the Alibaba Cloud AI Search.
  • Read Alibaba Cloud's announcement about AI-powered search innovations with Elasticsearch.

Users can start using this with Elasticsearch Serverless environments today and in an upcoming version of Elasticsearch.

Happy searching!

Ready to try this out on your own? Start a free trial.

Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our advanced semantic search webinar to build your next GenAI app!

Ready to build state of the art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as your are. Let’s connect and work together to build the magical search experience that will get you the results you want.

Try it yourself