Chat completions
POST /v1/chat/completions — OpenAI-compatible chat endpoint
Generate a chat response. OpenAI-compatible in shape, plus Melious-specific routing and environment fields.
Endpoint:
POST /v1/chat/completions

Auth: Bearer token or x-api-key. Requires scope inference.chat.
Example

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Name three Hanseatic cities."},
    ],
)
print(response.choices[0].message.content)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai/v1",
});

const response = await client.chat.completions.create({
  model: "glm-4.7",
  messages: [
    { role: "system", content: "You are concise." },
    { role: "user", content: "Name three Hanseatic cities." },
  ],
});
console.log(response.choices[0].message.content);
```

```bash
curl https://api.melious.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Name three Hanseatic cities."}
    ]
  }'
```

Request
Core parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | — | Model ID, optionally with a flavor suffix like :eco. See Routing. |
| messages | array | — | The conversation, oldest first. Each entry has a role and content. |
| max_tokens | integer | model max | Caps the completion length. |
| temperature | number | model default | Sampling temperature, [0, 2]. Lower is closer to deterministic. Set this or top_p, not both. |
| top_p | number | 1 | Nucleus sampling cutoff, [0, 1]. |
| top_k | integer | unset | Restrict sampling to the top-K tokens (provider-specific). |
| min_p | number | unset | Minimum-probability tail cut (provider-specific). |
| frequency_penalty | number | 0 | Penalize repeated tokens, [-2, 2]. |
| presence_penalty | number | 0 | Penalize tokens already present, [-2, 2]. |
| stop | string \| array | null | Stop sequences. |
| seed | integer | none | Deterministic sampling (best-effort; not all providers honor it). |
| user | string | none | End-user identifier for abuse monitoring. |
| n | integer | 1 | Number of completions, [1, 10]. |
Streaming
| Parameter | Type | Default | Description |
|---|---|---|---|
| stream | boolean | false | Enable SSE streaming. |
| stream_options | object | null | E.g. {"include_usage": true} to get the final usage on the last chunk. |
See Streaming for the full shape.
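With stream=True the response arrives as Server-Sent Events: each `data:` line carries a chunk whose `choices[].delta` holds the incremental content, terminated by `data: [DONE]`. A minimal parsing sketch (the helper names are ours, not part of the API):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line into a chunk dict; returns None for blanks and [DONE]."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data.strip() == "[DONE]":
        return None
    return json.loads(data)

def delta_text(chunk: dict) -> str:
    """Pull the incremental content out of a streamed chunk, if any."""
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("delta") or {}).get("content") or ""
```

Accumulate `delta_text(chunk)` across chunks to rebuild the full message; with `{"include_usage": true}`, expect the final chunk to carry `usage` instead of a delta.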
Tools
| Parameter | Type | Default | Description |
|---|---|---|---|
| tools | array | null | Function definitions the model may call. |
| tool_choice | string \| object | "auto" | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}. |
Tool flow and examples: Tool calling.
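A sketch of the request shape, using a made-up get_weather tool (the function schema follows the OpenAI function format; the helper for the follow-up tool message is ours):

```python
import json

# Hypothetical tool definition for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Weather in Lübeck?"}],
    "tools": tools,
    "tool_choice": "auto",
}

def tool_result_message(tool_call_id: str, result: dict) -> dict:
    """Build the role=tool message to append after executing a tool call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }
```

When finish_reason is "tool_calls", execute the requested function, append a tool_result_message for each call, and send the extended messages array back.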
Structured output
| Parameter | Type | Default | Description |
|---|---|---|---|
| response_format | object | null | {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}}. |
Walkthrough: Structured outputs.
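A sketch of a json_schema request (the "cities" schema is a made-up example; the json_schema object follows the OpenAI structured-output shape). The message content still comes back as a JSON string, so parse it before use:

```python
import json

payload = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Name three Hanseatic cities."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "cities",
            "schema": {
                "type": "object",
                "properties": {
                    "cities": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["cities"],
            },
        },
    },
}

def parse_structured(content: str) -> dict:
    """choices[0].message.content is a JSON string under response_format."""
    return json.loads(content)
```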
Log probabilities
| Parameter | Type | Default | Description |
|---|---|---|---|
| logprobs | boolean | false | Return log probabilities for output tokens. Off by default; enabling it roughly doubles the response payload. |
| top_logprobs | integer | unset | Number of top alternatives per position, [0, 20]. |
Reasoning
| Parameter | Type | Default | Description |
|---|---|---|---|
| reasoning_effort | string | model default | "low", "medium", or "high" for reasoning models. Ignored by non-reasoning models. |
Melious-specific
| Parameter | Type | Default | Description |
|---|---|---|---|
| preset | string | none | "reasoning" biases routing toward reasoning models; "non_reasoning" biases toward speed. Overridden by a :flavor suffix on model. |
| request_id | string | auto | Client-provided correlation ID. If omitted, we generate one. |
| blueprint_id | UUID | none | Load a vault-stored blueprint (requires the X-Vault-Key header). |
| blueprint_config | object | none | Inline blueprint configuration. Takes precedence over blueprint_id. |
| variables | object | none | Override blueprint variables. |
| skill_ids | array | none | Load vault skills by ID into the request. Requires X-Vault-Key. |
| skill_configs | array | none | Inline skill configurations. |
Blueprints and skills are part of the vault-encrypted composition system — most API callers don't need them. They exist so Studio's runtime can round-trip through the same endpoint.
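The Melious-specific fields ride in the same JSON body as the standard ones; with the OpenAI SDKs you would pass them via an extra-body mechanism, and over raw HTTP they are simply additional keys. A sketch of a payload with a routing preset and a client-generated correlation ID:

```python
import uuid

payload = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Name three Hanseatic cities."}],
    "preset": "non_reasoning",        # bias routing toward speed
    "request_id": str(uuid.uuid4()),  # correlation ID for your own logs
}
```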
messages
Each entry is an object:
- role: "system", "user", "assistant", or "tool".
- content: a string, or an array of content parts (for vision).
Vision content parts:

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
  ]
}
```

image_url.url accepts public URLs or base64 data URIs (data:image/jpeg;base64,...). URLs are fetched and re-encoded server-side before inference, so the provider never sees your URL. The model must support vision; check _meta.capabilities.vision on GET /v1/models/{id}?include_meta=true.
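For local files, a data URI is the simplest route. A sketch of building one and assembling the content-parts message (helper names are ours):

```python
import base64

def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI for image_url.url."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def image_message(text: str, url: str) -> dict:
    """A user message pairing a text part with an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }
```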
Tool messages:

```json
{
  "role": "tool",
  "tool_call_id": "call_abc",
  "content": "{\"temperature\": 22}"
}
```

Response
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "glm-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hamburg, Lübeck, Bremen.",
        "tool_calls": null
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 7,
    "total_tokens": 19
  },
  "system_fingerprint": "...",
  "environment_impact": {
    "energy_kwh": 0.00015,
    "carbon_g_co2": 0.06,
    "water_liters": 0.0002,
    "renewable_percent": 85,
    "pue": 1.18,
    "provider_id": "ovhcloud",
    "location": "FR"
  },
  "billing_cost": { "energy": "0.0008", "credits": "0.0", "paid_with": "energy" }
}
```

choices[].finish_reason
- stop: model stopped naturally or hit a stop sequence.
- length: max_tokens hit.
- tool_calls: model wants to call a tool.
- content_filter: blocked by the provider's safety filter.
- blocked: blocked by a Melious blueprint railguard.
- blueprint_override: blueprint replied directly; no model call was made.
environment_impact
See Environmental impact for field definitions.
billing_cost
See Pricing. Decimals are strings to avoid float drift.
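Since the decimals arrive as strings, parse them with a decimal type rather than float (a minimal sketch using the billing_cost object from the response above):

```python
from decimal import Decimal

billing_cost = {"energy": "0.0008", "credits": "0.0", "paid_with": "energy"}

energy_spent = Decimal(billing_cost["energy"])  # exact, no float drift
```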
Errors
Every error returns the standard {"error": {"code", "message", "details"}} shape. Common codes on this endpoint:
- VALIDATION_4002: messages or model missing.
- INFERENCE_3001: unknown model ID.
- INFERENCE_3201: vision requested on a non-vision model.
- INFERENCE_3202: tools passed to a model that doesn't support them.
- INFERENCE_3207: input exceeds the model's context window.
- INFERENCE_3208: rejected by the provider's content filter.
- INFERENCE_3103: all providers failed (transient; retry).
- BILLING_2001 / BILLING_2003: out of energy / credits.
- AUTH_1015: key is missing the inference.chat scope.
Full list and retry guidance: Errors.
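A sketch of client-side retry logic against the standard error shape. The retryable set below covers only the transient code named above; treat it as an assumption, not an exhaustive policy:

```python
# Codes worth an automatic retry; INFERENCE_3103 is the transient
# "all providers failed" case from the list above.
RETRYABLE = {"INFERENCE_3103"}

def should_retry(error_body: dict) -> bool:
    """Inspect the standard {"error": {"code", ...}} shape for a retryable code."""
    return (error_body.get("error") or {}).get("code") in RETRYABLE

def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: 0.5s, 1s, 2s, ... capped at 8s."""
    return min(cap, base * (2 ** attempt))
```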
Related
Streaming • Tool calling • Vision • Structured outputs • Routing.