AlibabaCloud AI Search inference integration
Creates an inference endpoint to perform an inference task with the alibabacloud-ai-search service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters
<inference_id>
  (Required, string) The unique identifier of the inference endpoint.
<task_type>
  (Required, string) The type of the inference task that the model will perform.
  Available task types:
  - completion
  - rerank
  - sparse_embedding
  - text_embedding
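As a rough illustration of how the two path parameters combine, the sketch below builds the request path and rejects task types that this page does not list for the service. The helper name `inference_path` and the constant `SUPPORTED_TASK_TYPES` are hypothetical and not part of any Elasticsearch client.

```python
# Task types listed on this page for the alibabacloud-ai-search service.
SUPPORTED_TASK_TYPES = ("completion", "rerank", "sparse_embedding", "text_embedding")


def inference_path(task_type: str, inference_id: str) -> str:
    """Hypothetical helper: build the /_inference/<task_type>/<inference_id> path."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id must be a non-empty string")
    return f"/_inference/{task_type}/{inference_id}"


if __name__ == "__main__":
    print(inference_path("rerank", "alibabacloud_ai_search_rerank"))
```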
Request body
chunking_settings
  (Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
  max_chunk_size
    (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
  overlap
    (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.
  sentence_overlap
    (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
  strategy
    (Optional, string) Specifies the chunking strategy. It can be either sentence or word.
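The limits on the chunking settings interact, so a client-side check before sending the request may be useful. The following is a minimal sketch of such a check, assuming only the constraints stated above. The function name `validate_chunking_settings` is hypothetical, and because this page does not state a default strategy, the sketch expects `strategy` to be set explicitly.

```python
def validate_chunking_settings(settings: dict) -> list[str]:
    """Hypothetical check: return a list of problems (empty if none were found)."""
    problems = []

    strategy = settings.get("strategy")
    if strategy not in ("sentence", "word"):
        # Assumption of this sketch: the caller always names the strategy.
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    # max_chunk_size defaults to 250; the lower bound depends on the strategy.
    max_chunk_size = settings.get("max_chunk_size", 250)
    lower_bound = 20 if strategy == "sentence" else 10
    if not lower_bound <= max_chunk_size <= 300:
        problems.append(
            f"max_chunk_size must be between {lower_bound} and 300 for the "
            f"{strategy} strategy, got {max_chunk_size}"
        )

    if strategy == "word":
        # overlap defaults to 100 and cannot exceed half of max_chunk_size.
        overlap = settings.get("overlap", 100)
        if overlap * 2 > max_chunk_size:
            problems.append(
                f"overlap ({overlap}) cannot be higher than half of "
                f"max_chunk_size ({max_chunk_size})"
            )
    else:
        # sentence_overlap defaults to 1 and can only be 0 or 1.
        sentence_overlap = settings.get("sentence_overlap", 1)
        if sentence_overlap not in (0, 1):
            problems.append(f"sentence_overlap must be 0 or 1, got {sentence_overlap}")

    return problems


if __name__ == "__main__":
    print(validate_chunking_settings({"strategy": "word", "max_chunk_size": 100, "overlap": 60}))
```

The server remains the authority on what it accepts; a check like this only catches the documented limits early.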
service
  (Required, string) The type of service supported for the specified task type. In this case, alibabacloud-ai-search.
service_settings
  (Required, object) Settings used to install the inference model. These settings are specific to the alibabacloud-ai-search service.
  api_key
    (Required, string) A valid API key for the AlibabaCloud AI Search API.
  service_id
    (Required, string) The name of the model service to use for the inference task.
    Available service_ids for the completion task:
    - ops-qwen-turbo
    - qwen-turbo
    - qwen-plus
    - qwen-max
    - qwen-max-longcontext
    For the supported completion service_ids, refer to the documentation.
    Available service_id for the rerank task:
    - ops-bge-reranker-larger
    For the supported rerank service_id, refer to the documentation.
    Available service_id for the sparse_embedding task:
    - ops-text-sparse-embedding-001
    For the supported sparse_embedding service_id, refer to the documentation.
    Available service_ids for the text_embedding task:
    - ops-text-embedding-001
    - ops-text-embedding-zh-001
    - ops-text-embedding-en-001
    - ops-text-embedding-002
    For the supported text_embedding service_ids, refer to the documentation.
  host
    (Required, string) The host address used for the inference task. You can find the host address in the API keys section of the documentation.
  workspace
    (Required, string) The name of the workspace used for the inference task.
  rate_limit
    (Optional, object) By default, the alibabacloud-ai-search service sets the number of requests allowed per minute to 1000. This helps to minimize the number of rate limit errors returned from AlibabaCloud AI Search. To modify this, set the requests_per_minute setting of this object in your service settings:
    "rate_limit": { "requests_per_minute": <<number_of_requests>> }
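To show where rate_limit sits relative to the required settings, here is a minimal sketch that assembles a request body as a plain dictionary. The function name `build_endpoint_config` is hypothetical, and the check that requests_per_minute is positive is an assumption of this sketch rather than a documented rule.

```python
import json
from typing import Optional


def build_endpoint_config(
    api_key: str,
    service_id: str,
    host: str,
    workspace: str,
    requests_per_minute: Optional[int] = None,
) -> dict:
    """Hypothetical helper: assemble the request body for this service."""
    service_settings = {
        "api_key": api_key,
        "service_id": service_id,
        "host": host,
        "workspace": workspace,
    }
    if requests_per_minute is not None:
        # Assumption: a non-positive limit is treated as a caller error.
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be a positive integer")
        # rate_limit is nested inside service_settings, not at the top level.
        service_settings["rate_limit"] = {"requests_per_minute": requests_per_minute}
    return {"service": "alibabacloud-ai-search", "service_settings": service_settings}


if __name__ == "__main__":
    config = build_endpoint_config(
        api_key="<api_key>",
        service_id="ops-text-embedding-001",
        host="default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
        workspace="default",
        requests_per_minute=500,
    )
    print(json.dumps(config, indent=2))
```

When requests_per_minute is omitted, the body carries no rate_limit object and the service default applies.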
task_settings
  (Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.
  task_settings for the text_embedding task type
    input_type
      (Optional, string) Specifies the type of input passed to the model. Valid values are:
      - ingest: for storing document embeddings in a vector database.
      - search: for storing embeddings of search queries run against a vector database to find relevant documents.
  task_settings for the sparse_embedding task type
    input_type
      (Optional, string) Specifies the type of input passed to the model. Valid values are:
      - ingest: for storing document embeddings in a vector database.
      - search: for storing embeddings of search queries run against a vector database to find relevant documents.
    return_token
      (Optional, boolean) If true, the token name will be returned in the response. Defaults to false, which means only the token ID will be returned in the response.
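Because the accepted task settings differ by task type, a small builder can make the difference explicit. The sketch below follows only the options listed above. The name `build_task_settings` is hypothetical, and rejecting return_token for text_embedding reflects that this page documents it only for sparse_embedding.

```python
from typing import Optional

VALID_INPUT_TYPES = ("ingest", "search")


def build_task_settings(
    task_type: str,
    input_type: Optional[str] = None,
    return_token: Optional[bool] = None,
) -> dict:
    """Hypothetical helper: build task_settings for an embedding task type."""
    if task_type not in ("text_embedding", "sparse_embedding"):
        raise ValueError(f"no task_settings are documented here for {task_type!r}")

    settings = {}
    if input_type is not None:
        if input_type not in VALID_INPUT_TYPES:
            raise ValueError(f"input_type must be one of {VALID_INPUT_TYPES}")
        settings["input_type"] = input_type

    if return_token is not None:
        # return_token is listed only under the sparse_embedding task type.
        if task_type != "sparse_embedding":
            raise ValueError("return_token applies only to sparse_embedding")
        settings["return_token"] = return_token

    return settings


if __name__ == "__main__":
    print(build_task_settings("sparse_embedding", input_type="ingest", return_token=True))
```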
AlibabaCloud AI Search service examples
The following example shows how to create an inference endpoint called alibabacloud_ai_search_completion to perform a completion task type.
resp = client.inference.put(
    task_type="completion",
    inference_id="alibabacloud_ai_search_completion",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "api_key": "{{API_KEY}}",
            "service_id": "ops-qwen-turbo",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "completion",
  inference_id: "alibabacloud_ai_search_completion",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      api_key: "{{API_KEY}}",
      service_id: "ops-qwen-turbo",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/completion/alibabacloud_ai_search_completion
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "api_key": "{{API_KEY}}",
    "service_id": "ops-qwen-turbo",
    "workspace": "default"
  }
}
The next example shows how to create an inference endpoint called alibabacloud_ai_search_rerank to perform a rerank task type.
resp = client.inference.put(
    task_type="rerank",
    inference_id="alibabacloud_ai_search_rerank",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": "<api_key>",
            "service_id": "ops-bge-reranker-larger",
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "rerank",
  inference_id: "alibabacloud_ai_search_rerank",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      api_key: "<api_key>",
      service_id: "ops-bge-reranker-larger",
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/rerank/alibabacloud_ai_search_rerank
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-bge-reranker-larger",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
The following example shows how to create an inference endpoint called alibabacloud_ai_search_sparse to perform a sparse_embedding task type.
resp = client.inference.put(
    task_type="sparse_embedding",
    inference_id="alibabacloud_ai_search_sparse",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": "<api_key>",
            "service_id": "ops-text-sparse-embedding-001",
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "sparse_embedding",
  inference_id: "alibabacloud_ai_search_sparse",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      api_key: "<api_key>",
      service_id: "ops-text-sparse-embedding-001",
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/sparse_embedding/alibabacloud_ai_search_sparse
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-text-sparse-embedding-001",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
The following example shows how to create an inference endpoint called alibabacloud_ai_search_embeddings to perform a text_embedding task type.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="alibabacloud_ai_search_embeddings",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": "<api_key>",
            "service_id": "ops-text-embedding-001",
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "alibabacloud_ai_search_embeddings",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      api_key: "<api_key>",
      service_id: "ops-text-embedding-001",
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/alibabacloud_ai_search_embeddings
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-text-embedding-001",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}