Chat completions
POST /v1/chat/completions — OpenAI-compatible chat endpoint
Generate a chat response. OpenAI-compatible in shape, plus Melious-specific routing and environment fields.
Endpoint:
POST /v1/chat/completions

Auth: Bearer token or x-api-key. Requires scope inference.chat.
Example

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Name three Hanseatic cities."},
    ],
)
print(response.choices[0].message.content)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai/v1",
});

const response = await client.chat.completions.create({
  model: "glm-4.7",
  messages: [
    { role: "system", content: "You are concise." },
    { role: "user", content: "Name three Hanseatic cities." },
  ],
});
console.log(response.choices[0].message.content);
```

```bash
curl https://api.melious.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Name three Hanseatic cities."}
    ]
  }'
```

Request
Core parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | — | Model ID, optionally with a flavor suffix like :eco. See Routing. |
| messages | array | — | The conversation, oldest first. Each entry has a role and content. |
| max_tokens | integer | model max | Caps the completion length. |
| temperature | number | model default | Sampling temperature, [0, 2]. Lower is closer to deterministic. Set this or top_p, not both. |
| top_p | number | 1 | Nucleus sampling cutoff, [0, 1]. |
| top_k | integer | unset | Restrict sampling to the top-K tokens (provider-specific). |
| min_p | number | unset | Minimum-probability tail cut (provider-specific). |
| frequency_penalty | number | 0 | Penalize repeated tokens, [-2, 2]. |
| presence_penalty | number | 0 | Penalize tokens already present, [-2, 2]. |
| stop | string \| array | null | Stop sequences. |
| seed | integer | none | Deterministic sampling (best-effort; not all providers honor it). |
| user | string | none | End-user identifier for abuse monitoring. |
| n | integer | 1 | Number of completions, [1, 10]. |
Streaming
| Parameter | Type | Default | Description |
|---|---|---|---|
| stream | boolean | false | Enable SSE streaming. |
| stream_options | object | null | E.g. {"include_usage": true} to get the final usage on the last chunk. |
See Streaming for the full shape.
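With stream=True the response arrives as Server-Sent Events: each `data:` line carries a chunk whose `choices[].delta` holds the incremental content, terminated by `data: [DONE]`. A minimal parsing sketch (the helper names are ours, not part of the API):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line into a chunk dict; returns None for blanks and [DONE]."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data.strip() == "[DONE]":
        return None
    return json.loads(data)

def delta_text(chunk: dict) -> str:
    """Pull the incremental content out of a streamed chunk, if any."""
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("delta") or {}).get("content") or ""
```

Accumulate `delta_text(chunk)` across chunks to rebuild the full message; with `{"include_usage": true}`, expect the final chunk to carry `usage` instead of a delta.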
Tools
| Parameter | Type | Default | Description |
|---|---|---|---|
| tools | array | null | Function definitions the model may call. |
| tool_choice | string \| object | "auto" | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}. |
Tool flow and examples: Tool calling.
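A sketch of the request shape, using a made-up get_weather tool (the function schema follows the OpenAI function format; the helper for the follow-up tool message is ours):

```python
import json

# Hypothetical tool definition for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Weather in Lübeck?"}],
    "tools": tools,
    "tool_choice": "auto",
}

def tool_result_message(tool_call_id: str, result: dict) -> dict:
    """Build the role=tool message to append after executing a tool call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }
```

When finish_reason is "tool_calls", execute the requested function, append a tool_result_message for each call, and send the extended messages array back.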
Structured output
| Parameter | Type | Default | Description |
|---|---|---|---|
| response_format | object | null | {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}}. |
Walkthrough: Structured outputs.
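A sketch of a json_schema request (the "cities" schema is a made-up example; the json_schema object follows the OpenAI structured-output shape). The message content still comes back as a JSON string, so parse it before use:

```python
import json

payload = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Name three Hanseatic cities."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "cities",
            "schema": {
                "type": "object",
                "properties": {
                    "cities": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["cities"],
            },
        },
    },
}

def parse_structured(content: str) -> dict:
    """choices[0].message.content is a JSON string under response_format."""
    return json.loads(content)
```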
Log probabilities
| Parameter | Type | Default | Description |
|---|---|---|---|
| logprobs | boolean | false | Return log probabilities for output tokens. Off by default; enabling it roughly doubles the response payload. |
| top_logprobs | integer | unset | Number of top alternatives per position, [0, 20]. |
Reasoning
| Parameter | Type | Default | Description |
|---|---|---|---|
| reasoning_effort | string | model default | "low", "medium", or "high" for reasoning models. Ignored by non-reasoning models. |
Melious-specific
| Parameter | Type | Default | Description |
|---|---|---|---|
| preset | string | none | "reasoning" biases routing toward reasoning models; "non_reasoning" biases toward speed. Overridden by a :flavor suffix on model. |
| request_id | string | auto | Client-provided correlation ID. If omitted, we generate one. |
| blueprint_id | UUID | none | Load a vault-stored blueprint (requires the X-Vault-Key header). |
| blueprint_config | object | none | Inline blueprint configuration. Takes precedence over blueprint_id. |
| variables | object | none | Override blueprint variables. |
| skill_ids | array | none | Load vault skills by ID into the request. Requires X-Vault-Key. |
| skill_configs | array | none | Inline skill configurations. |
Blueprints and skills are part of the vault-encrypted composition system — most API callers don't need them. They exist so Studio's runtime can round-trip through the same endpoint.
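The Melious-specific fields ride in the same JSON body as the standard ones; with the OpenAI SDKs you would pass them via an extra-body mechanism, and over raw HTTP they are simply additional keys. A sketch of a payload with a routing preset and a client-generated correlation ID:

```python
import uuid

payload = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Name three Hanseatic cities."}],
    "preset": "non_reasoning",        # bias routing toward speed
    "request_id": str(uuid.uuid4()),  # correlation ID for your own logs
}
```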
messages
Each entry is an object:
- role: "system", "user", "assistant", or "tool".
- content: a string, or an array of content parts (for vision).
Vision content parts:

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
  ]
}
```

image_url.url accepts public URLs or base64 data URIs (data:image/jpeg;base64,...). URLs are fetched and re-encoded server-side before inference, so the provider never sees your URL. The model must support vision; check _meta.capabilities.vision on GET /v1/models/{id}?include_meta=true.
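For local files, a data URI is the simplest route. A sketch of building one and assembling the content-parts message (helper names are ours):

```python
import base64

def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI for image_url.url."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def image_message(text: str, url: str) -> dict:
    """A user message pairing a text part with an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }
```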
Tool messages:

```json
{
  "role": "tool",
  "tool_call_id": "call_abc",
  "content": "{\"temperature\": 22}"
}
```

Response
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "glm-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hamburg, Lübeck, Bremen.",
        "tool_calls": null
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 7,
    "total_tokens": 19
  },
  "system_fingerprint": "...",
  "environment_impact": {
    "energy_kwh": 0.00015,
    "carbon_g_co2": 0.06,
    "water_liters": 0.0002,
    "renewable_percent": 85,
    "pue": 1.18,
    "provider_id": "ovhcloud",
    "location": "FR"
  },
  "billing_cost": { "energy": "0.0008", "credits": "0.0", "paid_with": "energy" }
}
```

choices[].finish_reason
- stop: model stopped naturally or hit a stop sequence.
- length: max_tokens hit.
- tool_calls: model wants to call a tool.
- content_filter: blocked by the provider's safety filter.
- blocked: blocked by a Melious blueprint railguard.
- blueprint_override: blueprint replied directly; no model call was made.
environment_impact
See Environmental impact for field definitions.
billing_cost
See Pricing. Decimals are strings to avoid float drift.
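Since the decimals arrive as strings, parse them with a decimal type rather than float (a minimal sketch using the billing_cost object from the response above):

```python
from decimal import Decimal

billing_cost = {"energy": "0.0008", "credits": "0.0", "paid_with": "energy"}

energy_spent = Decimal(billing_cost["energy"])  # exact, no float drift
```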
Errors
Every error returns the standard {"error": {"code", "message", "details"}} shape. Common codes on this endpoint:
- VALIDATION_4002: messages or model missing.
- INFERENCE_3001: unknown model ID.
- INFERENCE_3201: vision requested on a non-vision model.
- INFERENCE_3202: tools passed to a model that doesn't support them.
- INFERENCE_3207: input exceeds the model's context window.
- INFERENCE_3208: rejected by the provider's content filter.
- INFERENCE_3103: all providers failed (transient; retry).
- BILLING_2001 / BILLING_2003: out of energy / credits.
- AUTH_1015: key is missing the inference.chat scope.
Full list and retry guidance: Errors.
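A sketch of client-side retry logic against the standard error shape. The retryable set below covers only the transient code named above; treat it as an assumption, not an exhaustive policy:

```python
# Codes worth an automatic retry; INFERENCE_3103 is the transient
# "all providers failed" case from the list above.
RETRYABLE = {"INFERENCE_3103"}

def should_retry(error_body: dict) -> bool:
    """Inspect the standard {"error": {"code", ...}} shape for a retryable code."""
    return (error_body.get("error") or {}).get("code") in RETRYABLE

def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: 0.5s, 1s, 2s, ... capped at 8s."""
    return min(cap, base * (2 ** attempt))
```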
Related
Streaming • Tool calling • Vision • Structured outputs • Routing.