Batch workflow
Upload a JSONL file, kick off a batch, download the results
For non-critical work, batching is usually the right move — cheaper routing, outside the per-minute rate limit, and no need to hold open thousands of concurrent requests.
The pattern is: write a JSONL file of requests, upload it, create a batch, poll for completion, download the output.
Full example
This Python script takes a list of prompts, submits them, and returns the results indexed by your custom ID. One file, end to end.
```python
import json
import time

import httpx

API = "https://api.melious.ai/v1"
KEY = "sk-mel-<YOUR_API_KEY>"
HEAD = {"Authorization": f"Bearer {KEY}"}


def run_batch(prompts: dict[str, str], model: str = "glm-4.7:batch") -> dict[str, str]:
    # 1. Build JSONL — one request per line, with a custom_id we'll use to match results
    lines = [
        json.dumps({
            "custom_id": cid,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {"model": model, "messages": [{"role": "user", "content": prompt}]},
        })
        for cid, prompt in prompts.items()
    ]
    jsonl = "\n".join(lines).encode()

    # 2. Upload as a file with purpose=batch
    upload = httpx.post(
        f"{API}/files",
        headers=HEAD,
        files={"file": ("requests.jsonl", jsonl, "application/jsonl")},
        data={"purpose": "batch"},
    ).json()
    input_file_id = upload["id"]

    # 3. Create the batch
    batch = httpx.post(
        f"{API}/batches",
        headers={**HEAD, "Content-Type": "application/json"},
        json={
            "input_file_id": input_file_id,
            "endpoint": "/v1/chat/completions",
            "completion_window": "24h",
        },
    ).json()
    batch_id = batch["id"]
    print(f"batch {batch_id} queued")

    # 4. Poll until done
    while True:
        time.sleep(30)
        status = httpx.get(f"{API}/batches/{batch_id}", headers=HEAD).json()
        counts = status["request_counts"]
        print(f"  status: {status['status']} ({counts['succeeded']}/{sum(counts.values())})")
        if status["status"] in ("succeeded", "failed", "expired", "cancelled"):
            break
    if status["status"] != "succeeded":
        raise RuntimeError(f"batch ended in status {status['status']}")

    # 5. Download the output JSONL and parse
    output_file_id = status["output_file_id"]
    body = httpx.get(f"{API}/files/{output_file_id}/content", headers=HEAD).text
    results = {}
    for line in body.strip().splitlines():
        entry = json.loads(line)
        if entry["error"]:
            results[entry["custom_id"]] = f"ERROR: {entry['error']['message']}"
        else:
            choice = entry["response"]["body"]["choices"][0]
            results[entry["custom_id"]] = choice["message"]["content"]
    return results


if __name__ == "__main__":
    prompts = {
        f"q{i}": f"In one sentence, why did Hanseatic city #{i} matter?"
        for i in range(10)
    }
    answers = run_batch(prompts)
    for cid, text in answers.items():
        print(f"{cid}: {text[:80]}")
```

Five steps, each mapping to one endpoint:
- JSONL build — each line is a full request shaped like the real endpoint's body, wrapped with a `custom_id` you choose.
- `POST /v1/files` — upload the JSONL with `purpose=batch`.
- `POST /v1/batches` — create the job pointing at the file and the target endpoint.
- `GET /v1/batches/{id}` — poll until `status == "succeeded"`.
- `GET /v1/files/{id}/content` — download the output, match by `custom_id`.
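For reference, this is roughly what step 5 sees in an output row. The shape below is illustrative, reduced to just the fields the parser reads; real rows carry more metadata:

```python
import json

# Illustrative output row, trimmed to the fields the parsing step reads.
# Real rows carry more metadata (IDs, status codes, usage); the exact shape
# here is an assumption, not a schema reference.
sample = json.dumps({
    "custom_id": "q3",
    "response": {"body": {"choices": [{"message": {"content": "It ran the herring trade."}}]}},
    "error": None,
})

entry = json.loads(sample)
text = entry["response"]["body"]["choices"][0]["message"]["content"]
```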
When to pick batch
Good fits:
- Nightly classification, summarization, or extraction over a large set.
- Backfills and reprocessing of historical data.
- Evaluation runs.
- Embedding a whole corpus (embeddings work over batch too).
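The embeddings case uses the same file shape, just pointed at a different endpoint. A sketch, assuming an OpenAI-style `/v1/embeddings` body with `model` and `input` fields; the model name is a placeholder:

```python
import json

def embedding_batch_lines(texts: dict[str, str], model: str = "embed-1:batch") -> str:
    # Same JSONL shape as the chat example, but targeting /v1/embeddings.
    # The body fields and the model name are assumptions for illustration.
    return "\n".join(
        json.dumps({
            "custom_id": cid,
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        })
        for cid, text in texts.items()
    )

lines = embedding_batch_lines({"doc-1": "Lübeck led the Hanse."})
```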
Bad fits:
- Anything user-facing in real time.
- Workloads with hard latency SLAs shorter than the `completion_window`.
- Small runs (say, under 50 requests) — the per-minute rate limit is fine for those and you skip the upload/download dance.
Handling partial failures
Batches don't fail atomically — a single bad request returns an error for that row, and the rest still run. The output JSONL has `error` set for failed rows and `response` set for successes, both keyed by your `custom_id`.
Our example above coerces errors to a string prefix; in production you'd want something structured for retry logic:

```python
if entry["error"]:
    retriable = entry["error"]["code"] in {"INFERENCE_3103", "INFERENCE_3107", "INFERENCE_3108"}
    results[entry["custom_id"]] = {"ok": False, "error": entry["error"], "retry": retriable}
else:
    text = entry["response"]["body"]["choices"][0]["message"]["content"]
    results[entry["custom_id"]] = {"ok": True, "text": text}
```

For retry, build a new JSONL of just the retriable rows and run another batch.
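Building that follow-up batch can be a small function over the structures already in play. A sketch, where `prompts` is the original request dict and `results` follows the `ok`/`retry` shape from the snippet above:

```python
import json

def retry_jsonl(prompts: dict[str, str], results: dict[str, dict], model: str = "glm-4.7:batch") -> str:
    # Keep only rows that failed with a retriable error, and re-emit their
    # original requests as a fresh JSONL payload for a second batch.
    retriable = [cid for cid, r in results.items() if not r["ok"] and r["retry"]]
    return "\n".join(
        json.dumps({
            "custom_id": cid,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {"model": model, "messages": [{"role": "user", "content": prompts[cid]}]},
        })
        for cid in retriable
    )

# Hypothetical first-run outcome: one success, one retriable failure, one permanent failure.
prompts = {"q0": "why?", "q1": "how?", "q2": "when?"}
results = {
    "q0": {"ok": True, "text": "Trade."},
    "q1": {"ok": False, "error": {"code": "INFERENCE_3103"}, "retry": True},
    "q2": {"ok": False, "error": {"code": "INVALID_REQUEST"}, "retry": False},
}
payload = retry_jsonl(prompts, results)
```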
Cost
Batch isn't cheaper per token today — there's no discount applied to batched requests. The savings come from the `:batch` flavor suffix routing to the cheapest providers, and from being outside the per-minute realtime rate limit. See Pricing.
Gotchas
- 105 MB input cap. If you're running more than that per file, split into multiple batches.
- `custom_id` must be unique within a file and is echoed into the output. Use something you can match back reliably — UUIDs are fine, sequential IDs are fine.
- The `endpoint` you pick constrains the file. A `/v1/chat/completions` batch can't mix in embedding requests. One endpoint per batch.
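Splitting for the size cap can operate on the already-serialized lines. A minimal sketch; the 100 MB default is just headroom under the 105 MB cap, not an API constant:

```python
def chunk_jsonl(lines: list[str], cap_bytes: int = 100 * 1024 * 1024) -> list[bytes]:
    # Greedily pack serialized JSONL lines into files that stay under the cap.
    chunks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode()) + 1  # +1 for the newline separator
        if current and size + n > cap_bytes:
            chunks.append("\n".join(current).encode())
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append("\n".join(current).encode())
    return chunks

# Each returned chunk is ready to upload as its own purpose=batch file.
parts = chunk_jsonl(["a" * 10, "b" * 10, "c" * 10], cap_bytes=25)
```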
Related
Batches reference • Files reference • Routing for the `:batch` flavor.