Structured outputs
Get JSON back from the model, validated and repaired if needed
Sometimes you want the model to return JSON, not prose. Two shapes — JSON mode (valid JSON, any shape) and JSON Schema (valid JSON matching a schema).
JSON mode
```python
from openai import OpenAI
import json

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "Reply with JSON only."},
        {"role": "user", "content": "Pick three Hanseatic cities. Include name and modern country."},
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)
```

The system prompt matters — in JSON mode, the model needs to be told to produce JSON. Otherwise it may return an empty object or an unexpected shape. "Reply with JSON only" is the minimum viable instruction.
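Because `json_object` guarantees valid JSON but not any particular shape, it can be worth a small defensive check before using the result. A minimal sketch — the expected keys (`cities`, `name`, `country`) come from our prompt above and are not enforced by the API:

```python
import json

def parse_cities(raw: str) -> list[dict]:
    """Parse a JSON-mode response and verify the shape we asked for.

    The expected structure (a top-level "cities" list of objects with
    "name" and "country") is an assumption from our own prompt; in
    json_object mode the API only guarantees the JSON is valid.
    """
    data = json.loads(raw)  # raises ValueError on invalid JSON
    cities = data.get("cities")
    if not isinstance(cities, list):
        raise ValueError(f"expected a 'cities' list, got: {data!r}")
    for city in cities:
        if not isinstance(city, dict):
            raise ValueError(f"expected an object, got: {city!r}")
        missing = {"name", "country"} - city.keys()
        if missing:
            raise ValueError(f"city missing fields {missing}: {city!r}")
    return cities

raw = '{"cities": [{"name": "Lübeck", "country": "Germany"}]}'
print(parse_cities(raw))
```

If the shape check fails, you can re-prompt with a more explicit instruction rather than crash downstream.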
JSON Schema
Stricter — the response is constrained to match your schema:
```python
schema = {
    "type": "object",
    "properties": {
        "cities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "country": {"type": "string"},
                    "founded_year": {"type": "integer"},
                },
                "required": ["name", "country", "founded_year"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["cities"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Pick three Hanseatic cities with founding years."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "cities_list",
            "schema": schema,
            "strict": True,
        },
    },
)
```

Models that support `json_schema` honor it natively — the provider constrains generation to match the schema. Non-supporting models get a fallback: we inject the schema into the system prompt and repair the output afterwards (see below). That fallback isn't as reliable — prefer `strict: true` on models that actually support the feature.
What happens when the model produces not-quite-valid JSON
Open-weight models sometimes emit trailing commas, unquoted keys, smart-quotes, or truncated JSON when they hit `max_tokens` mid-object. Rather than return garbage, Melious runs the output through a tolerant JSON repair pass before sending it back. You get valid JSON — or a clear error if the output is truly unrecoverable — never a string your parser throws on.
This is an explicit design choice, not a fallback we're embarrassed about. Clients shouldn't have to write their own "maybe fix the JSON" layer. If repair was needed, the output is still labelled `json_object` and parses cleanly — you won't see a flag. If you'd prefer to see the raw output (or want guaranteed-strict behavior), pick a model that supports `json_schema` natively.
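To make those failure modes concrete, here is a minimal sketch of the kind of fixes a tolerant repair pass applies: smart-quotes and trailing commas. This is purely illustrative, not Melious's actual implementation, which handles many more cases (unquoted keys, truncation, fenced JSON, and so on):

```python
import json
import re

def repair_json(raw: str) -> dict:
    """Best-effort repair of two common open-weight-model JSON glitches.

    Illustrative only: a production repair layer covers far more cases.
    """
    text = raw.strip()
    # Curly "smart" quotes -> plain ASCII double quotes
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    # Trailing commas before a closing brace or bracket
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)

broken = '{“name”: “Hamburg”, “tags”: ["port", "hanse",], }'
print(repair_json(broken))
```

Note the quote replacement is deliberately naive — it would also rewrite curly quotes *inside* string values, which is one reason real repair logic is more involved than a pair of substitutions.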
Which models support what
- `json_object` mode — almost every chat model. Check `_meta.capabilities.structured_output`.
- `json_schema` (strict) — subset of models. Check `_meta.capabilities.json_schema`.
Asking for a feature a model doesn't support returns `INFERENCE_3203` or `INFERENCE_3204`.
Anthropic shape
The Messages API doesn't have a direct `response_format` field — the canonical pattern is tool-call-shaped extraction:
```python
response = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=512,
    tools=[{
        "name": "record_cities",
        "description": "Record a list of Hanseatic cities",
        "input_schema": schema,
    }],
    tool_choice={"type": "tool", "name": "record_cities"},
    messages=[{"role": "user", "content": "Pick three Hanseatic cities..."}],
)
```

Force the model to call `record_cities` — the `input` field on the resulting `tool_use` block is your structured output.
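Pulling the structured output back out is then a matter of finding the forced `tool_use` block. A sketch over plain dicts, since the exact attribute access depends on your SDK client (with a typed client you'd read `block.type`, `block.name`, and `block.input` instead):

```python
def extract_tool_input(content_blocks: list[dict], tool_name: str) -> dict:
    """Return the `input` of the first tool_use block matching tool_name."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"model did not call {tool_name}")

# Simulated response content: a text block followed by the forced tool call.
blocks = [
    {"type": "text", "text": "Recording the cities now."},
    {"type": "tool_use", "name": "record_cities",
     "input": {"cities": [{"name": "Bremen", "country": "Germany"}]}},
]
print(extract_tool_input(blocks, "record_cities"))
```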
Patterns that work
A few things we've seen:
- Use structured output for extraction, not generation. "Pull fields X/Y/Z from this text" works well; "generate creative JSON with arbitrary keys" works less well.
- `additionalProperties: false` matters. Without it, models sometimes invent extra keys.
- Keep required fields tight. Every required field the model skips forces a regeneration; optional fields are free.
- When a schema includes enums, list the options in the description. The model reads descriptions more attentively than raw schema enum lists.
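The enum advice looks like this in practice — the options appear twice, once as the machine constraint and once in the description where the model actually reads them (the `league_status` field is a made-up example):

```python
status_schema = {
    "type": "object",
    "properties": {
        "league_status": {
            "type": "string",
            "enum": ["founding_member", "later_member", "kontor"],
            # Repeat the options in prose: models weight descriptions
            # more heavily than the raw enum list.
            "description": (
                "One of: 'founding_member', 'later_member', or 'kontor'."
            ),
        },
    },
    "required": ["league_status"],
    "additionalProperties": False,
}
```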
Gotchas
- `max_tokens` truncation is the most common failure mode. If the model produces complete-but-truncated JSON that the repair layer can patch, you'll get back a trimmed object. If truncation lands mid-number or mid-string with no clue how to recover, you'll get an error. Either way, check `finish_reason == "length"` and raise `max_tokens` if needed.
- Nested arrays of arrays confuse some models. Flatten to keyed objects where you can.
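One way to act on the truncation gotcha is a small retry loop that bumps the budget whenever `finish_reason == "length"`. A sketch with the API call abstracted behind a callable so the retry logic stands out (`complete_with_retry` and the fake backend are illustrative, not SDK functions):

```python
def complete_with_retry(call, max_tokens=256, attempts=3, factor=2):
    """Retry a completion with a larger max_tokens budget on truncation.

    `call` is any function (max_tokens) -> (content, finish_reason);
    in real use it would wrap client.chat.completions.create and read
    choices[0].finish_reason off the response.
    """
    for _ in range(attempts):
        content, finish_reason = call(max_tokens)
        if finish_reason != "length":
            return content
        max_tokens *= factor  # truncated: grow the budget and retry
    raise RuntimeError(f"still truncated after {attempts} attempts")

# Fake backend: succeeds once the budget reaches 512 tokens.
def fake_call(max_tokens):
    if max_tokens >= 512:
        return '{"ok": true}', "stop"
    return '{"ok": tr', "length"

print(complete_with_retry(fake_call))
```

Doubling is arbitrary; the point is to cap the retries so a pathological prompt can't loop forever.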
Related
Chat completions for the full `response_format` field • Tool calling for the tool-based extraction pattern on Messages.