Batches
/v1/batches — asynchronous batch jobs for non-critical work
Run many inference requests asynchronously in exchange for more flexible capacity and routing that favors cheaper providers. The Batches API is OpenAI-compatible.
Batches don't run in real time — the /v1/batches endpoints create and monitor jobs, but the actual work happens in the background and completes within the completion_window (typically 24 hours). For the end-to-end flow, see Batch workflow.
Base path: /v1/batches
Auth: Bearer token or x-api-key. Create requires the scope of the target endpoint (inference.chat, inference.embeddings, …). List/get/cancel require any inference scope.
Create a batch
POST /v1/batches

Request
| Parameter | Type | Required | Description |
|---|---|---|---|
| input_file_id | string | yes | ID of a file uploaded via POST /v1/files with purpose: "batch". |
| endpoint | string | yes | Target endpoint: /v1/chat/completions, /v1/embeddings, /v1/images/generations, or /v1/audio/speech. |
| completion_window | string | no (default "24h") | How long the system has to complete the job. |
| metadata | object | no | Up to 16 custom key–value pairs for your own tracking. |
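The input file has to exist before the batch is created. A minimal upload sketch, assuming the Files endpoint accepts the usual multipart form with file and purpose fields (see Files for the exact parameters):

```bash
# Upload the JSONL input with purpose "batch"; the returned file id
# becomes input_file_id in the create call below.
curl https://api.melious.ai/v1/files \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -F purpose="batch" \
  -F file="@requests.jsonl"
```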
Example:
```bash
curl https://api.melious.ai/v1/batches \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file_abc123",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
    "metadata": {"run": "nightly-classifier-2026-04-22"}
  }'
```

Response
```json
{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file_abc123",
  "completion_window": "24h",
  "status": "queued",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1699999999,
  "in_progress_at": null,
  "expires_at": 1700086399,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "request_counts": { "processing": 0, "succeeded": 0, "errored": 0 },
  "metadata": {"run": "nightly-classifier-2026-04-22"}
}
```

Status values
- queued — accepted, not yet started.
- in_progress — currently processing.
- finalizing — all requests done, writing output file.
- succeeded — complete, results in output_file_id.
- failed — terminal failure at batch level.
- expired — window elapsed before completion.
- cancelling / cancelled — you called cancel.
List batches
GET /v1/batches?limit=20&after=<cursor>

Paginated list of your batches.
| Query param | Type | Default | Description |
|---|---|---|---|
| limit | integer | 20 | Max 100. |
| after | string | none | Cursor returned by a previous call. |
Response:
```json
{
  "object": "list",
  "data": [ /* batch objects */ ],
  "first_id": "batch_...",
  "last_id": "batch_...",
  "has_more": true
}
```
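To walk the full list, feed last_id back in as after while has_more is true. A sketch, assuming jq is installed:

```bash
# Page through every batch, printing ids; the cursor comes from last_id.
after=""
while :; do
  page=$(curl -s "https://api.melious.ai/v1/batches?limit=100${after:+&after=$after}" \
    -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>")
  echo "$page" | jq -r '.data[].id'
  [ "$(echo "$page" | jq -r '.has_more')" = "true" ] || break
  after=$(echo "$page" | jq -r '.last_id')
done
```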
Retrieve a batch

GET /v1/batches/{batch_id}

Returns the batch object. Poll this for status; request_counts updates as work progresses.
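A minimal polling loop, assuming jq; pick an interval that suits a job that can run for hours:

```bash
# Poll until the batch reaches a terminal status.
batch_id="batch_abc123"
while :; do
  status=$(curl -s "https://api.melious.ai/v1/batches/$batch_id" \
    -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" | jq -r '.status')
  case "$status" in
    succeeded|failed|expired|cancelled) echo "terminal status: $status"; break ;;
    *) sleep 60 ;;
  esac
done
```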
Cancel a batch
POST /v1/batches/{batch_id}/cancel

Cancels a queued or in-progress batch. Returns the batch object with status cancelling (or cancelled if the cancellation was synchronous). Partially completed work is kept — you'll still get an output_file_id with whatever finished.
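For example:

```bash
curl -X POST https://api.melious.ai/v1/batches/batch_abc123/cancel \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>"
```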
Input file format (JSONL)
One request per line. Each line needs custom_id, method, url, and body:
{"custom_id": "q1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "glm-4.7", "messages": [{"role": "user", "content": "..."}]}}
{"custom_id": "q2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "glm-4.7", "messages": [{"role": "user", "content": "..."}]}}custom_id must be unique within the file and is echoed into the output so you can match responses back.
Model IDs in a batch file can include the :batch flavor suffix to explicitly pick the cheapest routing tier.
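For example, the first row above with the suffix added:

```jsonl
{"custom_id": "q1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "glm-4.7:batch", "messages": [{"role": "user", "content": "..."}]}}
```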
Output file format
One line per request in the input file, each in the same shape as the API's normal response, wrapped with your custom_id:
{"custom_id": "q1", "response": {"status_code": 200, "request_id": "...", "body": {"choices": [...], ...}}, "error": null}
{"custom_id": "q2", "response": null, "error": {"code": "INFERENCE_3207", "message": "Context window exceeded"}}Errors are per-row, not per-batch — a single bad request doesn't fail the whole job.
Pricing and rate limits
Batches run against the same per-token rates — there is no discount for running asynchronously. The savings come from the router favoring cheaper providers under the :batch flavor, and from batch traffic sitting outside the per-minute rate limits that apply to the real-time endpoints.
See Pricing and Rate limits.
Errors
- INFERENCE_3302 — input_file_id doesn't exist or belongs to a different user.
- INFERENCE_3303 — input file isn't purpose: "batch" or fails JSONL validation.
- INFERENCE_3304 — file exceeds the 105 MB cap.
- INFERENCE_3305 — too many requests in one file.
- INFERENCE_3306 — a JSONL line doesn't parse.
- VALIDATION_4007 — unsupported endpoint value.
- AUTH_1015 — key lacks the scope for the target endpoint.
Related
Files for uploading input and downloading output • Batch workflow for the full lifecycle • Routing for the :batch flavor.