Batch API

The Batch API is composed of two endpoints:

1. /api/v0/deploy/create-batch-inference
2. /api/v0/deploy/check-batch-inference

The first endpoint mimics the API of the regular inference endpoint /api/v0/deploy/inference in how the model_id and deployment_version are specified, except that instead of a single params dict it accepts a batches argument: an array of params, one per request in the batch.
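
For instance, creating a batch job might look like the minimal Python sketch below. The base URL, the contents of each params dict, and the deployment_version value are placeholders and assumptions; the model and batches fields follow the description above and the schema documented below.

```python
import requests

API_BASE = "https://example.com"  # assumption: your deployment's base URL

payload = {
    # UUID of the model (see the Body schema below).
    "model": "fd2ecd75-2e7a-4758-9613-b39a274e4f10",
    # Assumption: a deployment_version value, as referenced above.
    "deployment_version": "v1",
    # One params dict per request in the batch, mirroring the params
    # of the regular /api/v0/deploy/inference endpoint. The chat-style
    # contents here are assumptions.
    "batches": [
        {"messages": [{"role": "user", "content": "Hello"}]},
        {"messages": [{"role": "user", "content": "Summarize the report."}]},
    ],
}

resp = requests.post(
    f"{API_BASE}/api/v0/deploy/create-batch-inference",
    headers={"x-api-key": "YOUR_API_KEY"},
    json=payload,
)
resp.raise_for_status()
batch_id = resp.json()["batch_id"]  # used to poll the job's status
```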

The response from /api/v0/deploy/create-batch-inference includes a batch_id that can be used to poll the job's status with the second endpoint. How long a job takes depends on OpenAI's load: it can be as quick as a few minutes, but in some cases it can take several hours.
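
Continuing the sketch above, polling might look like the following. The exact field names and values for the job's status and results are assumptions, since this page only describes them informally.

```python
import time

# Poll until the batch job finishes. Jobs can take anywhere from
# minutes to hours, so poll at a generous interval.
while True:
    status_resp = requests.post(
        f"{API_BASE}/api/v0/deploy/check-batch-inference",
        headers={"x-api-key": "YOUR_API_KEY"},
        json={"batch_id": batch_id},
    )
    status_resp.raise_for_status()
    body = status_resp.json()
    if body.get("status") == "completed":  # assumption: exact status values
        results = body.get("results")      # assumption: exact results field
        break
    time.sleep(60)
```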

Note that batch inference is supported only for OpenAI, and only for chat models.

Create a new batch inference job

POST /api/v0/deploy/create-batch-inference

This endpoint creates a batch inference job with the specified model and batches. The response contains a batch_id which can be used to track the status of the batch job.

Authorizations
x-api-key · string · Required
Body
model · string (uuid) · Required

The UUID of the model to be used for the inference job.

Example: fd2ecd75-2e7a-4758-9613-b39a274e4f10
Responses

200 · application/json

Successful response with batch job details.

Check the status of an existing batch inference job

POST /api/v0/deploy/check-batch-inference

This endpoint checks the status of a specific batch inference job using the provided batch_id. It returns the current status and, if completed, the results of the inference job.

Authorizations
x-api-key · string · Required
Body
batch_id · string (uuid) · Required

The UUID of the batch job to check.

Example: 71123c09-adca-4d33-b93d-b36780e62bfb
Responses

200 · application/json

Successful response with batch job status and results if completed.
