Structured outputs
Get JSON back from the model, validated and repaired if needed
Sometimes you want the model to return JSON, not prose. Two shapes — JSON mode (valid JSON, any shape) and JSON Schema (valid JSON matching a schema).
JSON mode
```python
from openai import OpenAI
import json

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "Reply with JSON only."},
        {"role": "user", "content": "Pick three Hanseatic cities. Include name and modern country."},
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)
```

The system prompt matters — in JSON mode, the model needs to be told to produce JSON. Otherwise it may return an empty object or an unexpected shape. "Reply with JSON only" is the minimum viable instruction.
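Because `json_object` guarantees valid JSON but not any particular shape, it can be worth a small defensive check before using the result. A minimal sketch — the expected keys (`cities`, `name`, `country`) come from our prompt above and are not enforced by the API:

```python
import json

def parse_cities(raw: str) -> list[dict]:
    """Parse a JSON-mode response and verify the shape we asked for.

    The expected structure (a top-level "cities" list of objects with
    "name" and "country") is an assumption from our own prompt; in
    json_object mode the API only guarantees the JSON is valid.
    """
    data = json.loads(raw)  # raises ValueError on invalid JSON
    cities = data.get("cities")
    if not isinstance(cities, list):
        raise ValueError(f"expected a 'cities' list, got: {data!r}")
    for city in cities:
        if not isinstance(city, dict):
            raise ValueError(f"expected an object, got: {city!r}")
        missing = {"name", "country"} - city.keys()
        if missing:
            raise ValueError(f"city missing fields {missing}: {city!r}")
    return cities

raw = '{"cities": [{"name": "Lübeck", "country": "Germany"}]}'
print(parse_cities(raw))
```

If the shape check fails, you can re-prompt with a more explicit instruction rather than crash downstream.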
JSON Schema
Stricter — the response is constrained to match your schema:
```python
schema = {
    "type": "object",
    "properties": {
        "cities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "country": {"type": "string"},
                    "founded_year": {"type": "integer"},
                },
                "required": ["name", "country", "founded_year"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["cities"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Pick three Hanseatic cities with founding years."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "cities_list",
            "schema": schema,
            "strict": True,
        },
    },
)
```

Models that support `json_schema` honor it natively — the provider constrains generation to match the schema. Non-supporting models get a fallback: we inject the schema into the system prompt and repair the output afterwards (see below). That fallback isn't as reliable — prefer `strict: true` on models that actually support the feature.
What happens when the model produces not-quite-valid JSON
Open-weight models sometimes emit trailing commas, unquoted keys, smart-quotes, or truncated JSON when they hit `max_tokens` mid-object. Rather than return garbage, Melious runs the output through a tolerant JSON repair pass before sending it back. You get valid JSON — or a clear error if the output is truly unrecoverable — never a string your parser throws on.
This is an explicit design choice, not a fallback we're embarrassed about. Clients shouldn't have to write their own "maybe fix the JSON" layer. If repair was needed, the output is still labelled `json_object` and parses cleanly — you won't see a flag. If you'd prefer to see the raw output (or want guaranteed-strict behavior), pick a model that supports `json_schema` natively.
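To make those failure modes concrete, here is a minimal sketch of the kind of fixes a tolerant repair pass applies: smart-quotes and trailing commas. This is purely illustrative, not Melious's actual implementation, which handles many more cases (unquoted keys, truncation, fenced JSON, and so on):

```python
import json
import re

def repair_json(raw: str) -> dict:
    """Best-effort repair of two common open-weight-model JSON glitches.

    Illustrative only: a production repair layer covers far more cases.
    """
    text = raw.strip()
    # Curly "smart" quotes -> plain ASCII double quotes
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    # Trailing commas before a closing brace or bracket
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)

broken = '{“name”: “Hamburg”, “tags”: ["port", "hanse",], }'
print(repair_json(broken))
```

Note the quote replacement is deliberately naive — it would also rewrite curly quotes *inside* string values, which is one reason real repair logic is more involved than a pair of substitutions.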
Which models support what
- `json_object` mode — almost every chat model. Check `_meta.capabilities.structured_output`.
- `json_schema` (strict) — subset of models. Check `_meta.capabilities.json_schema`.
Asking for a feature a model doesn't support returns `INFERENCE_3203` or `INFERENCE_3204`.
Anthropic shape
The Messages API doesn't have a direct `response_format` field — the canonical pattern is tool-call-shaped extraction:
```python
response = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=512,
    tools=[{
        "name": "record_cities",
        "description": "Record a list of Hanseatic cities",
        "input_schema": schema,
    }],
    tool_choice={"type": "tool", "name": "record_cities"},
    messages=[{"role": "user", "content": "Pick three Hanseatic cities..."}],
)
```

Force the model to call `record_cities` — the `input` field on the resulting `tool_use` block is your structured output.
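Pulling the structured output back out is then a matter of finding the forced `tool_use` block. A sketch over plain dicts, since the exact attribute access depends on your SDK client (with a typed client you'd read `block.type`, `block.name`, and `block.input` instead):

```python
def extract_tool_input(content_blocks: list[dict], tool_name: str) -> dict:
    """Return the `input` of the first tool_use block matching tool_name."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"model did not call {tool_name}")

# Simulated response content: a text block followed by the forced tool call.
blocks = [
    {"type": "text", "text": "Recording the cities now."},
    {"type": "tool_use", "name": "record_cities",
     "input": {"cities": [{"name": "Bremen", "country": "Germany"}]}},
]
print(extract_tool_input(blocks, "record_cities"))
```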
Patterns that work
A few things we've seen:
- Use structured output for extraction, not generation. "Pull fields X/Y/Z from this text" works well; "generate creative JSON with arbitrary keys" works less well.
- `additionalProperties: false` matters. Without it, models sometimes invent extra keys.
- Keep required fields tight. Every required field the model skips forces a regeneration; optional fields are free.
- When a schema includes enums, list the options in the description. The model reads descriptions more attentively than raw schema enum lists.
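The enum advice looks like this in practice — the options appear twice, once as the machine constraint and once in the description where the model actually reads them (the `league_status` field is a made-up example):

```python
status_schema = {
    "type": "object",
    "properties": {
        "league_status": {
            "type": "string",
            "enum": ["founding_member", "later_member", "kontor"],
            # Repeat the options in prose: models weight descriptions
            # more heavily than the raw enum list.
            "description": (
                "One of: 'founding_member', 'later_member', or 'kontor'."
            ),
        },
    },
    "required": ["league_status"],
    "additionalProperties": False,
}
```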
Gotchas
- `max_tokens` truncation is the most common failure mode. If the model produces complete-but-truncated JSON that the repair layer can patch, you'll get back a trimmed object. If truncation lands mid-number or mid-string with no clue how to recover, you'll get an error. Either way, check `finish_reason == "length"` and raise `max_tokens` if needed.
- Nested arrays of arrays confuse some models. Flatten to keyed objects where you can.
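One way to act on the truncation gotcha is a small retry loop that bumps the budget whenever `finish_reason == "length"`. A sketch with the API call abstracted behind a callable so the retry logic stands out (`complete_with_retry` and the fake backend are illustrative, not SDK functions):

```python
def complete_with_retry(call, max_tokens=256, attempts=3, factor=2):
    """Retry a completion with a larger max_tokens budget on truncation.

    `call` is any function (max_tokens) -> (content, finish_reason);
    in real use it would wrap client.chat.completions.create and read
    choices[0].finish_reason off the response.
    """
    for _ in range(attempts):
        content, finish_reason = call(max_tokens)
        if finish_reason != "length":
            return content
        max_tokens *= factor  # truncated: grow the budget and retry
    raise RuntimeError(f"still truncated after {attempts} attempts")

# Fake backend: succeeds once the budget reaches 512 tokens.
def fake_call(max_tokens):
    if max_tokens >= 512:
        return '{"ok": true}', "stop"
    return '{"ok": tr', "length"

print(complete_with_retry(fake_call))
```

Doubling is arbitrary; the point is to cap the retries so a pathological prompt can't loop forever.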
Related
Chat completions for the full `response_format` field • Tool calling for the tool-based extraction pattern on Messages.