Create inference API

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

Creates a model to perform an inference task.

The inference APIs enable you to use certain services, such as ELSER, OpenAI, or Hugging Face, in your cluster. This is not the same feature that you can use on an ML node with custom machine learning models. If you want to train and use your own model, use the Machine learning trained model APIs.

Request

PUT /_inference/<task_type>/<model_id>

Prerequisites

Description

The create inference API enables you to create and configure an inference model to perform a specific inference task.

The following services are available through the inference API:

  • ELSER
  • OpenAI
  • Hugging Face

Path parameters

<model_id>
(Required, string) The unique identifier of the model.
<task_type>

(Required, string) The type of the inference task that the model will perform. Available task types:

  • sparse_embedding
  • text_embedding
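
As a minimal sketch of how the two path parameters combine into the request path, the following Python helper builds the path and rejects task types other than the two listed above. The name inference_path is hypothetical and not part of any Elastic client, and percent-encoding the model ID is an assumption made here for safety.

```python
from urllib.parse import quote

# Task types listed in this document; other versions may support more.
TASK_TYPES = ("sparse_embedding", "text_embedding")


def inference_path(task_type: str, model_id: str) -> str:
    """Return the request path for PUT /_inference/<task_type>/<model_id>."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not model_id:
        raise ValueError("model_id must be a non-empty string")
    # Percent-encode the identifier so it stays a single path segment.
    return f"/_inference/{task_type}/{quote(model_id, safe='')}"


print(inference_path("sparse_embedding", "my-elser-model"))
# prints: /_inference/sparse_embedding/my-elser-model
```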

Request body

service

(Required, string) The type of service supported for the specified task type. Available services:

  • elser: specify the sparse_embedding task type to use the ELSER service.
  • openai: specify the text_embedding task type to use the OpenAI service.
  • hugging_face: specify the text_embedding task type to use the Hugging Face service.
service_settings

(Required, object) Settings used to install the inference model. These settings are specific to the service you specified.

service_settings for elser
num_allocations
(Required, integer) The number of model allocations to create.
num_threads
(Required, integer) The number of threads used by each model allocation.
service_settings for openai
api_key
(Required, string) A valid API key for your OpenAI account. You can find your OpenAI API keys in your OpenAI account under the API keys section.

You need to provide the API key only once, during the inference model creation. The Get inference API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

organization_id
(Optional, string) The unique identifier of your organization. You can find the Organization ID in your OpenAI account under Settings > Organizations.
url
(Optional, string) The URL endpoint to use for the requests. Can be changed for testing purposes. Defaults to https://api.openai.com/v1/embeddings.
service_settings for hugging_face
api_key
(Required, string) A valid access token for your Hugging Face account. You can find your existing access tokens, or create a new one, on the settings page.

You need to provide the API key only once, during the inference model creation. The Get inference API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

url
(Required, string) The URL endpoint to use for the requests.
task_settings

(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.

task_settings for text_embedding
model
(Optional, string) The name of the model to use for the inference task. Refer to the OpenAI documentation for the list of available text embedding models.
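
The required settings above can be checked before a request is sent. The following is a minimal sketch under stated assumptions: the tables only mirror the services, task types, and required service_settings described in this section, and check_request is a hypothetical helper name, not an Elastic API.

```python
# Required service_settings keys per service, as listed in this document.
REQUIRED_SETTINGS = {
    "elser": {"num_allocations", "num_threads"},
    "openai": {"api_key"},
    "hugging_face": {"api_key", "url"},
}

# Task type each service is documented to support.
SERVICE_TASK_TYPE = {
    "elser": "sparse_embedding",
    "openai": "text_embedding",
    "hugging_face": "text_embedding",
}


def check_request(task_type: str, body: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    service = body.get("service")
    if service not in REQUIRED_SETTINGS:
        return [f"unknown or missing service: {service!r}"]
    problems = []
    expected = SERVICE_TASK_TYPE[service]
    if task_type != expected:
        problems.append(f"{service} expects task type {expected}, got {task_type}")
    settings = body.get("service_settings")
    if not isinstance(settings, dict):
        problems.append("service_settings must be an object")
        return problems
    # Report each required key that the caller left out.
    for key in sorted(REQUIRED_SETTINGS[service] - set(settings)):
        problems.append(f"missing service_settings.{key}")
    return problems


print(check_request("text_embedding", {"service": "openai", "service_settings": {}}))
# prints: ['missing service_settings.api_key']
```

This check is purely local; the cluster remains the authority on whether a request is valid.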

Examples

This section contains example API calls for every service type.

ELSER service

The following example shows how to create an inference model called my-elser-model to perform the sparse_embedding task.

PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  },
  "task_settings": {}
}

Example response:

{
  "model_id": "my-elser-model",
  "task_type": "sparse_embedding",
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  },
  "task_settings": {}
}
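
For readers calling the API from a script rather than the console, the ELSER request above can be prepared with Python's standard library. This is a sketch, not an official client: the cluster address http://localhost:9200 is an assumption, authentication is omitted, and build_create_request is a hypothetical helper name. The request is only constructed, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumption: a local cluster without TLS; replace with your own endpoint.
ES_URL = "http://localhost:9200"


def build_create_request(task_type: str, model_id: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that creates an inference model."""
    return urllib.request.Request(
        url=f"{ES_URL}/_inference/{task_type}/{model_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_request(
    "sparse_embedding",
    "my-elser-model",
    {
        "service": "elser",
        "service_settings": {"num_allocations": 1, "num_threads": 1},
        "task_settings": {},
    },
)
print(request.get_method(), request.full_url)

# Sending is left commented out so that the sketch runs without a cluster:
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```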

OpenAI service

The following example shows how to create an inference model called openai_embeddings to perform the text_embedding task.

PUT _inference/text_embedding/openai_embeddings
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api_key>"
    },
    "task_settings": {
        "model": "text-embedding-ada-002"
    }
}
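
Because the API key cannot be changed after the model is created, switching keys means deleting the model and creating it again under the same name. The sketch below only assembles the two requests in order and sends nothing. It assumes a delete inference API is available at DELETE /_inference/<task_type>/<model_id>, and rotation_plan is a hypothetical helper name.

```python
def rotation_plan(model_id: str, new_api_key: str, model: str = "text-embedding-ada-002") -> list:
    """Return the (method, path, body) steps needed to swap an OpenAI API key."""
    path = f"/_inference/text_embedding/{model_id}"
    recreate_body = {
        "service": "openai",
        "service_settings": {"api_key": new_api_key},
        "task_settings": {"model": model},
    }
    return [
        # Step 1: remove the existing model (assumed delete endpoint).
        ("DELETE", path, None),
        # Step 2: create it again under the same name with the new key.
        ("PUT", path, recreate_body),
    ]


for method, path, _body in rotation_plan("openai_embeddings", "<new_api_key>"):
    print(method, path)
# prints:
# DELETE /_inference/text_embedding/openai_embeddings
# PUT /_inference/text_embedding/openai_embeddings
```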

Hugging Face service

The following example shows how to create an inference model called hugging-face-embeddings to perform the text_embedding task.

PUT _inference/text_embedding/hugging-face-embeddings
{
  "service": "hugging_face",
  "service_settings": {
    "api_key": "<access_token>", 
    "url": "<url_endpoint>" 
  }
}

In this request, api_key is a valid Hugging Face access token, which you can find on the settings page of your account, and url is the inference endpoint URL you created on Hugging Face.

Create a new inference endpoint on the Hugging Face endpoint page to get an endpoint URL. On the new endpoint creation page, select the model you want to use (for example, intfloat/e5-small-v2), then select the Sentence Embeddings task under the Advanced configuration section. Create the endpoint, and copy the URL once the endpoint has finished initializing.
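
Once you have the access token and the endpoint URL, the request body for the hugging_face service can be assembled as in the sketch below. Reading the token from an environment variable keeps it out of source files. The variable name HF_ACCESS_TOKEN and the helper name hugging_face_body are assumptions for illustration, and the URL check is a basic sanity test rather than a rule enforced by the API.

```python
import os
from urllib.parse import urlparse


def hugging_face_body(url: str, token_env: str = "HF_ACCESS_TOKEN") -> dict:
    """Build the request body for the hugging_face service."""
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"set {token_env} to a Hugging Face access token")
    parsed = urlparse(url)
    # Catch placeholders such as "<url_endpoint>" that were never filled in.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a usable endpoint URL: {url!r}")
    return {
        "service": "hugging_face",
        "service_settings": {"api_key": token, "url": url},
    }
```

The returned dictionary can be serialized with json.dumps and sent as the body of PUT _inference/text_embedding/hugging-face-embeddings.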