AlibabaCloud AI Search inference service
Creates an inference endpoint to perform an inference task with the alibabacloud-ai-search service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters
<inference_id>
  (Required, string) The unique identifier of the inference endpoint.
<task_type>
  (Required, string) The type of the inference task that the model will perform. Available task types:
  - completion
  - rerank
  - sparse_embedding
  - text_embedding
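As a minimal sketch of how the two path parameters combine into the request path, the helper below checks the task type against the list above and fills in the PUT path. The function name and the TASK_TYPES constant are hypothetical and used only for illustration; they are not part of any client library.

```python
from urllib.parse import quote

# Task types listed for the alibabacloud-ai-search service in this reference.
TASK_TYPES = ("completion", "rerank", "sparse_embedding", "text_embedding")


def inference_endpoint_path(task_type: str, inference_id: str) -> str:
    """Return the request path for PUT /_inference/<task_type>/<inference_id>."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id must be a non-empty string")
    # Percent-encode the identifier so it is safe to place in a URL path.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"


print(inference_endpoint_path("rerank", "alibabacloud_ai_search_rerank"))
# prints: /_inference/rerank/alibabacloud_ai_search_rerank
```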
Request body
chunking_settings
  (Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
  max_chunking_size
    (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
  overlap
    (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunking_size.
  sentence_overlap
    (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
  strategy
    (Optional, string) Specifies the chunking strategy. It can be either sentence or word.
service
  (Required, string) The type of service supported for the specified task type. In this case, alibabacloud-ai-search.
service_settings
  (Required, object) Settings used to install the inference model. These settings are specific to the alibabacloud-ai-search service.
  api_key
    (Required, string) A valid API key for the AlibabaCloud AI Search API.
  service_id
    (Required, string) The name of the model service to use for the inference task.
    Available service_ids for the completion task:
    - ops-qwen-turbo
    - qwen-turbo
    - qwen-plus
    - qwen-max
    - qwen-max-longcontext
    For the supported completion service_ids, refer to the documentation.
    Available service_id for the rerank task:
    - ops-bge-reranker-larger
    For the supported rerank service_id, refer to the documentation.
    Available service_id for the sparse_embedding task:
    - ops-text-sparse-embedding-001
    For the supported sparse_embedding service_id, refer to the documentation.
    Available service_ids for the text_embedding task:
    - ops-text-embedding-001
    - ops-text-embedding-zh-001
    - ops-text-embedding-en-001
    - ops-text-embedding-002
    For the supported text_embedding service_ids, refer to the documentation.
  host
    (Required, string) The name of the host address used for the inference task. You can find the host address in the API keys section of the documentation.
  workspace
    (Required, string) The name of the workspace used for the inference task.
  rate_limit
    (Optional, object) By default, the alibabacloud-ai-search service sets the number of requests allowed per minute to 1000. This helps to minimize the number of rate limit errors returned from AlibabaCloud AI Search. To modify this, set the requests_per_minute setting of this object in your service settings:
    "rate_limit": { "requests_per_minute": <<number_of_requests>> }
task_settings
  (Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.
  task_settings for the text_embedding task type
    input_type
      (Optional, string) Specifies the type of input passed to the model. Valid values are:
      - ingest: for storing document embeddings in a vector database.
      - search: for storing embeddings of search queries run against a vector database to find relevant documents.
  task_settings for the sparse_embedding task type
    input_type
      (Optional, string) Specifies the type of input passed to the model. Valid values are:
      - ingest: for storing document embeddings in a vector database.
      - search: for storing embeddings of search queries run against a vector database to find relevant documents.
    return_token
      (Optional, boolean) If true, the token name will be returned in the response. Defaults to false, which means only the token ID will be returned in the response.
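The limits on the chunking settings interact: the lower bound on the maximum chunk size depends on the strategy, and the word overlap is capped at half of the maximum chunk size. The sketch below restates those rules as a client-side check. The function check_chunking and its argument names are hypothetical and are not request field names. Treating a setting supplied for the wrong strategy as a violation is an assumption, and the service may validate differently.

```python
from typing import List, Optional


def check_chunking(strategy: str,
                   max_size: int = 250,
                   overlap: Optional[int] = None,
                   sentence_overlap: Optional[int] = None) -> List[str]:
    """Return rule violations; an empty list means the values pass."""
    if strategy not in ("sentence", "word"):
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    problems = []
    # The lower bound depends on the strategy; the upper bound is always 300.
    lower = 20 if strategy == "sentence" else 10
    if not lower <= max_size <= 300:
        problems.append(f"maximum chunk size must be between {lower} and 300")

    if strategy == "word":
        if sentence_overlap is not None:
            problems.append("sentence_overlap applies only to the sentence strategy")
        # Assumption: the documented default of 100 applies when overlap is omitted.
        words = 100 if overlap is None else overlap
        if words > max_size / 2:
            problems.append(f"overlap {words} exceeds half of the maximum chunk size")
    else:
        if overlap is not None:
            problems.append("overlap applies only to the word strategy")
        if sentence_overlap not in (None, 0, 1):
            problems.append("sentence_overlap must be 0 or 1")
    return problems


print(check_chunking("sentence", max_size=250, sentence_overlap=1))  # no violations
print(check_chunking("word", max_size=120, overlap=80))  # overlap is above 60
```

Note that with the word strategy, lowering the maximum chunk size below 200 while leaving the overlap at its default of 100 would break the half-size rule, so the two values are best set together.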
AlibabaCloud AI Search service examples
The following example shows how to create an inference endpoint called alibabacloud_ai_search_completion to perform a completion task type.
resp = client.inference.put(
    task_type="completion",
    inference_id="alibabacloud_ai_search_completion",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "api_key": "{{API_KEY}}",
            "service_id": "ops-qwen-turbo",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "completion",
  inference_id: "alibabacloud_ai_search_completion",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      api_key: "{{API_KEY}}",
      service_id: "ops-qwen-turbo",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/completion/alibabacloud_ai_search_completion
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "api_key": "{{API_KEY}}",
    "service_id": "ops-qwen-turbo",
    "workspace": "default"
  }
}
The next example shows how to create an inference endpoint called alibabacloud_ai_search_rerank to perform a rerank task type.
resp = client.inference.put(
    task_type="rerank",
    inference_id="alibabacloud_ai_search_rerank",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": "<api_key>",
            "service_id": "ops-bge-reranker-larger",
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "rerank",
  inference_id: "alibabacloud_ai_search_rerank",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      api_key: "<api_key>",
      service_id: "ops-bge-reranker-larger",
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/rerank/alibabacloud_ai_search_rerank
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-bge-reranker-larger",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
The following example shows how to create an inference endpoint called alibabacloud_ai_search_sparse to perform a sparse_embedding task type.
resp = client.inference.put(
    task_type="sparse_embedding",
    inference_id="alibabacloud_ai_search_sparse",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": "<api_key>",
            "service_id": "ops-text-sparse-embedding-001",
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "sparse_embedding",
  inference_id: "alibabacloud_ai_search_sparse",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      api_key: "<api_key>",
      service_id: "ops-text-sparse-embedding-001",
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/sparse_embedding/alibabacloud_ai_search_sparse
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-text-sparse-embedding-001",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
The following example shows how to create an inference endpoint called alibabacloud_ai_search_embeddings to perform a text_embedding task type.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="alibabacloud_ai_search_embeddings",
    inference_config={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": "<api_key>",
            "service_id": "ops-text-embedding-001",
            "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
            "workspace": "default"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "alibabacloud_ai_search_embeddings",
  inference_config: {
    service: "alibabacloud-ai-search",
    service_settings: {
      api_key: "<api_key>",
      service_id: "ops-text-embedding-001",
      host: "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      workspace: "default",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/alibabacloud_ai_search_embeddings
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-text-embedding-001",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
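The four examples share the same body shape and differ only in the task type and service_id. The sketch below builds that body in one place, checks the service_id against the values listed in this reference, and optionally adds a rate_limit object. The function build_inference_config, the LISTED_SERVICE_IDS table, and the ALIBABACLOUD_AI_SEARCH_API_KEY environment variable are hypothetical names chosen for illustration. Rejecting unlisted service_ids is stricter than the service itself may be, since its own documentation may list others.

```python
import os

# service_id values listed in this reference, keyed by task type.
# Assumption: this list is treated as exhaustive for the check below.
LISTED_SERVICE_IDS = {
    "completion": {"ops-qwen-turbo", "qwen-turbo", "qwen-plus",
                   "qwen-max", "qwen-max-longcontext"},
    "rerank": {"ops-bge-reranker-larger"},
    "sparse_embedding": {"ops-text-sparse-embedding-001"},
    "text_embedding": {"ops-text-embedding-001", "ops-text-embedding-zh-001",
                       "ops-text-embedding-en-001", "ops-text-embedding-002"},
}


def build_inference_config(task_type, service_id, host, workspace, api_key,
                           requests_per_minute=None):
    """Build the request body for an alibabacloud-ai-search endpoint."""
    if task_type not in LISTED_SERVICE_IDS:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if service_id not in LISTED_SERVICE_IDS[task_type]:
        raise ValueError(f"{service_id!r} is not listed for {task_type}")

    service_settings = {
        "api_key": api_key,
        "service_id": service_id,
        "host": host,
        "workspace": workspace,
    }
    # Only override the default rate limit when a value is given.
    if requests_per_minute is not None:
        service_settings["rate_limit"] = {"requests_per_minute": requests_per_minute}

    return {"service": "alibabacloud-ai-search", "service_settings": service_settings}


config = build_inference_config(
    task_type="text_embedding",
    service_id="ops-text-embedding-001",
    host="default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    workspace="default",
    # Hypothetical variable name; falls back to the placeholder used above.
    api_key=os.environ.get("ALIBABACLOUD_AI_SEARCH_API_KEY", "<api_key>"),
    requests_per_minute=500,
)
print(config["service_settings"]["rate_limit"])
# prints: {'requests_per_minute': 500}
```

The returned dictionary has the same shape as the inference_config argument passed to client.inference.put in the Python examples above. Reading the key from the environment keeps it out of source files.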