Batch API
The Batch API is composed of two endpoints:
1. /api/v0/deploy/create-batch-inference
2. /api/v0/deploy/check-batch-inference
The first endpoint mimics the regular inference endpoint /api/v0/deploy/inference in how the model_id and deployment_version are specified, except that instead of a single params dict it accepts a batches argument: an array containing one params object for each request in the batch.
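As a rough sketch of what such a request might look like from Python with the requests library: the base URL, the bearer-token auth header, the deployment_version value, and the contents of each params entry are assumptions for illustration, not part of this reference.

```python
import requests

# Assumed base URL and auth scheme; adjust for your environment.
BASE_URL = "https://api.example.com"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model_id": "fd2ecd75-2e7a-4758-9613-b39a274e4f10",  # UUID of the model to use
    "deployment_version": "v1",                          # assumed example version label
    # One params object per request in the batch, mirroring the params you
    # would send to /api/v0/deploy/inference (chat-style contents assumed).
    "batches": [
        {"messages": [{"role": "user", "content": "Summarize document A."}]},
        {"messages": [{"role": "user", "content": "Summarize document B."}]},
    ],
}

response = requests.post(
    f"{BASE_URL}/api/v0/deploy/create-batch-inference",
    json=payload,
    headers=HEADERS,
)
response.raise_for_status()
batch_id = response.json()["batch_id"]  # used later to poll the job status
print("Created batch job:", batch_id)
```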
The response from /api/v0/deploy/create-batch-inference includes a batch_id that can be used to poll the job's status with the second endpoint. Completion time depends on OpenAI's load: a job can finish in a few minutes, but in some cases it takes several hours.
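A minimal polling sketch against the second endpoint, given those completion times; the HTTP method, the placement of batch_id in the request body, the status field and value names, and the auth details are assumptions.

```python
import time

import requests

BASE_URL = "https://api.example.com"                  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}    # assumed auth scheme


def wait_for_batch(batch_id: str, poll_interval_s: float = 300.0) -> dict:
    """Poll /api/v0/deploy/check-batch-inference until the job finishes.

    Because jobs can take anywhere from minutes to hours, a generous
    poll interval (five minutes here) keeps request volume low.
    """
    while True:
        response = requests.post(                     # HTTP method assumed
            f"{BASE_URL}/api/v0/deploy/check-batch-inference",
            json={"batch_id": batch_id},              # batch_id placement assumed
            headers=HEADERS,
        )
        response.raise_for_status()
        body = response.json()
        if body.get("status") == "completed":         # status field/value assumed
            return body
        time.sleep(poll_interval_s)


job = wait_for_batch("71123c09-adca-4d33-b93d-b36780e62bfb")
```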
Note that batch inference is supported only for OpenAI chat models; no other providers are supported.
/api/v0/deploy/create-batch-inference
This endpoint creates a batch inference job with the specified model and batches. The response contains a batch_id that can be used to track the status of the batch job.
Parameters:
- model_id: The UUID of the model to be used for the inference job. Example: fd2ecd75-2e7a-4758-9613-b39a274e4f10
Responses:
- Successful response with batch job details.
- Unauthorized request.
- Server error during inference job creation.
/api/v0/deploy/check-batch-inference
This endpoint checks the status of a specific batch inference job using the provided batch_id. It returns the current status and, if completed, the results of the inference job.
Parameters:
- batch_id: The UUID of the batch job to check. Example: 71123c09-adca-4d33-b93d-b36780e62bfb
Responses:
- Successful response with batch job status and results if completed.
- Invalid request or missing batch_id.
- Server error during batch status check.
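A hedged sketch of handling these outcomes for a single status check; the specific HTTP status codes and the status and results field names are inferred from the descriptions above rather than stated by the API reference.

```python
import requests

response = requests.post(
    "https://api.example.com/api/v0/deploy/check-batch-inference",  # assumed base URL and method
    json={"batch_id": "71123c09-adca-4d33-b93d-b36780e62bfb"},      # example batch_id
    headers={"Authorization": "Bearer YOUR_API_KEY"},               # assumed auth scheme
)

if response.ok:
    body = response.json()
    if body.get("status") == "completed":             # field names assumed
        for result in body.get("results", []):        # one result per batch entry (assumed shape)
            print(result)
    else:
        print("Job still in progress:", body.get("status"))
elif response.status_code == 400:                     # invalid request or missing batch_id (code assumed)
    print("Bad request:", response.text)
else:
    # Covers server errors during the batch status check, among others.
    response.raise_for_status()
```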