Batch API
The Batch API is composed of two endpoints:
1. /api/v0/deploy/create-batch-inference
2. /api/v0/deploy/check-batch-inference
The first endpoint mirrors the regular inference endpoint /api/v0/deploy/inference in how the model_id and deployment_version are specified, but instead of a single params dict it accepts a batches argument: an array containing the params for each request in the batch.
The response from /api/v0/deploy/create-batch-inference includes a batch_id that can be used to poll the job's status with the second endpoint. Completion time depends on OpenAI's load: a batch can finish in a few minutes, but in some cases it takes several hours.
Note that batch inference is currently supported only for OpenAI chat models; other providers and model types are not supported.
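For orientation, here is a minimal Python sketch of submitting a batch job. It is illustrative only: it assumes the requests library, the sandbox host and x-api-key header used in the examples below, and a placeholder model UUID; the payload mirrors the example request shown for /api/v0/deploy/create-batch-inference.

import requests

API_KEY = "YOUR_API_KEY"                     # your GradientJ API key
BASE_URL = "https://api.sandbox.gradientj.com"
MODEL_ID = "fd2ecd75-2e7a-4758-9613-b39a274e4f10"  # placeholder model UUID

headers = {
    "x-api-key": API_KEY,
    "Content-Type": "application/json",
}

# One entry per request in the batch: each entry carries its own params
# (and, for chat models, a chat_history).
payload = {
    "model": MODEL_ID,
    "batches": [
        {
            "params": {"example_variable_1": "value1"},
            "chat_history": [{"speaker": "user", "text": "HI HI HI"}],
        },
        {
            "params": {"example_variable_1": "value2"},
            "chat_history": [{"speaker": "user", "text": "Hello again"}],
        },
    ],
}

response = requests.post(
    f"{BASE_URL}/api/v0/deploy/create-batch-inference",
    headers=headers,
    json=payload,
)
response.raise_for_status()
batch_id = response.json()["result"]["batch_id"]
print("Queued batch:", batch_id)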
This endpoint creates a batch inference job with the specified model and batches.
The response contains a batch_id which can be used to track the status of the batch job.
The UUID of the model to be used for the inference job.
fd2ecd75-2e7a-4758-9613-b39a274e4f10
Successful response with batch job details.
Unauthorized request.
Server error during inference job creation.
POST /api/v0/deploy/create-batch-inference HTTP/1.1
Host: api.sandbox.gradientj.com
x-api-key: YOUR_API_KEY
Content-Type: application/json
Accept: */*
Content-Length: 157
{
"model": "fd2ecd75-2e7a-4758-9613-b39a274e4f10",
"batches": [
{
"params": {
"example_variable_1": "value1"
},
"chat_history": [
{
"speaker": "user",
"text": "HI HI HI"
}
]
}
]
}

{
"result": {
"batch_id": "71123c09-adca-4d33-b93d-b36780e62bfb",
"batch_status": "QUEUED",
"provider_batch_id": "batch_66f60045516c8190a21d71d9c27b0fa4",
"other_provider_data": {},
"results": [
{
"output": "HI HI HI",
"metadata": {
"inference_uuid": "9572ef17-5767-4c43-8c89-ec484a97644c"
}
}
]
}
}

This endpoint checks the status of a specific batch inference job using the provided batch_id.
It returns the current status and, if completed, the results of the inference job.
The UUID of the batch job to check.
71123c09-adca-4d33-b93d-b36780e62bfb
Successful response with batch job status and results if completed.
Invalid request or missing batch_id.
Server error during batch status check.
POST /api/v0/deploy/check-batch-inference HTTP/1.1
Host: api.sandbox.gradientj.com
x-api-key: YOUR_API_KEY
Content-Type: application/json
Accept: */*
Content-Length: 51
{
"batch_id": "71123c09-adca-4d33-b93d-b36780e62bfb"
}

{
"result": {
"batch_id": "71123c09-adca-4d33-b93d-b36780e62bfb",
"batch_status": "SUCCEEDED",
"provider_batch_id": "batch_66f60045516c8190a21d71d9c27b0fa4",
"other_provider_data": {},
"results": [
{
"output": "HI HI HI",
"metadata": {
"inference_uuid": "9572ef17-5767-4c43-8c89-ec484a97644c"
}
}
]
}
}
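Putting it together, a simple polling loop against /api/v0/deploy/check-batch-inference might look like the sketch below. It is a sketch, not a definitive client: it assumes the requests library, the same host and x-api-key header as above, and that QUEUED and SUCCEEDED are the relevant batch_status values (only those two appear in the examples). Since jobs can run for hours, choose a polling interval accordingly.

import time
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.sandbox.gradientj.com"
headers = {"x-api-key": API_KEY, "Content-Type": "application/json"}

batch_id = "71123c09-adca-4d33-b93d-b36780e62bfb"  # returned by create-batch-inference

# Poll until the job leaves the QUEUED state. Batches can take minutes to
# hours, so a generous interval (e.g. 60 s) is reasonable.
while True:
    response = requests.post(
        f"{BASE_URL}/api/v0/deploy/check-batch-inference",
        headers=headers,
        json={"batch_id": batch_id},
    )
    response.raise_for_status()
    result = response.json()["result"]
    if result["batch_status"] != "QUEUED":
        break
    time.sleep(60)

if result["batch_status"] == "SUCCEEDED":
    # Each entry in results carries the model output and an inference_uuid.
    for item in result["results"]:
        print(item["metadata"]["inference_uuid"], "->", item["output"])
else:
    print("Batch ended with status:", result["batch_status"])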