Batch API
The Batch API is composed of two endpoints:
1. /api/v0/deploy/create-batch-inference
2. /api/v0/deploy/check-batch-inference
The first endpoint mimics the regular inference endpoint /api/v0/deploy/inference in how the model_id and deployment_version are specified, except that instead of a single params dict it accepts a batches argument: an array containing one params object for each request in the batch.
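As a rough sketch of what such a request might look like from Python with the requests library: the base URL, the bearer-token auth header, the deployment_version value, and the contents of each params entry are assumptions for illustration, not part of this reference.

```python
import requests

# Assumed base URL and auth scheme; adjust for your environment.
BASE_URL = "https://api.example.com"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model_id": "fd2ecd75-2e7a-4758-9613-b39a274e4f10",  # UUID of the model to use
    "deployment_version": "v1",                          # assumed example version label
    # One params object per request in the batch, mirroring the params you
    # would send to /api/v0/deploy/inference (chat-style contents assumed).
    "batches": [
        {"messages": [{"role": "user", "content": "Summarize document A."}]},
        {"messages": [{"role": "user", "content": "Summarize document B."}]},
    ],
}

response = requests.post(
    f"{BASE_URL}/api/v0/deploy/create-batch-inference",
    json=payload,
    headers=HEADERS,
)
response.raise_for_status()
batch_id = response.json()["batch_id"]  # used later to poll the job status
print("Created batch job:", batch_id)
```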
The response from /api/v0/deploy/create-batch-inference includes a batch_id that can be used to poll the job's status with the second endpoint. Completion time depends on OpenAI's load: a job can finish in a few minutes, but in some cases it takes several hours.
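A minimal polling sketch against the second endpoint, given those completion times; the HTTP method, the placement of batch_id in the request body, the status field and value names, and the auth details are assumptions.

```python
import time

import requests

BASE_URL = "https://api.example.com"                  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}    # assumed auth scheme


def wait_for_batch(batch_id: str, poll_interval_s: float = 300.0) -> dict:
    """Poll /api/v0/deploy/check-batch-inference until the job finishes.

    Because jobs can take anywhere from minutes to hours, a generous
    poll interval (five minutes here) keeps request volume low.
    """
    while True:
        response = requests.post(                     # HTTP method assumed
            f"{BASE_URL}/api/v0/deploy/check-batch-inference",
            json={"batch_id": batch_id},              # batch_id placement assumed
            headers=HEADERS,
        )
        response.raise_for_status()
        body = response.json()
        if body.get("status") == "completed":         # status field/value assumed
            return body
        time.sleep(poll_interval_s)


job = wait_for_batch("71123c09-adca-4d33-b93d-b36780e62bfb")
```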
Note that batch inference is supported only for OpenAI chat models; no other providers are supported.
/api/v0/deploy/create-batch-inference
This endpoint creates a batch inference job with the specified model and batches. The response contains a batch_id that can be used to track the status of the batch job.
Parameters:
- model_id: The UUID of the model to be used for the inference job. Example: fd2ecd75-2e7a-4758-9613-b39a274e4f10
Responses:
- Successful response with batch job details.
- Unauthorized request.
- Server error during inference job creation.
/api/v0/deploy/check-batch-inference
This endpoint checks the status of a specific batch inference job using the provided batch_id. It returns the current status and, if completed, the results of the inference job.
Parameters:
- batch_id: The UUID of the batch job to check. Example: 71123c09-adca-4d33-b93d-b36780e62bfb
Responses:
- Successful response with batch job status and results if completed.
- Invalid request or missing batch_id.
- Server error during batch status check.
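A hedged sketch of handling these outcomes for a single status check; the specific HTTP status codes and the status and results field names are inferred from the descriptions above rather than stated by the API reference.

```python
import requests

response = requests.post(
    "https://api.example.com/api/v0/deploy/check-batch-inference",  # assumed base URL and method
    json={"batch_id": "71123c09-adca-4d33-b93d-b36780e62bfb"},      # example batch_id
    headers={"Authorization": "Bearer YOUR_API_KEY"},               # assumed auth scheme
)

if response.ok:
    body = response.json()
    if body.get("status") == "completed":             # field names assumed
        for result in body.get("results", []):        # one result per batch entry (assumed shape)
            print(result)
    else:
        print("Job still in progress:", body.get("status"))
elif response.status_code == 400:                     # invalid request or missing batch_id (code assumed)
    print("Bad request:", response.text)
else:
    # Covers server errors during the batch status check, among others.
    response.raise_for_status()
```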