Messages
POST /v1/messages — Anthropic Messages API shape, open-weight inference underneath
Anthropic-shaped chat endpoint. Claude Code, the Anthropic SDK, and anything else that speaks this protocol works unmodified against Melious.
Endpoint:
POST /v1/messages

Auth: Bearer token or x-api-key. Requires scope inference.chat.
Required header: anthropic-version — any recent version string (e.g. 2023-06-01). The Anthropic SDK sets it automatically.
Example
```python
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai",
)
response = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=256,
    messages=[{"role": "user", "content": "Name three Hanseatic cities."}],
)
print(response.content[0].text)
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai",
});
const response = await client.messages.create({
  model: "claude-sonnet-4",
  max_tokens: 256,
  messages: [{ role: "user", content: "Name three Hanseatic cities." }],
});
console.log(response.content[0].text);
```

```shell
curl https://api.melious.ai/v1/messages \
  -H "x-api-key: sk-mel-<YOUR_API_KEY>" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Name three Hanseatic cities."}]
  }'
```

Request
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | — | Anthropic model name or an open-weight model ID. See Model mapping below. |
| messages | array | — | Conversation history. Each message has role ("user" or "assistant") and content (string or content blocks). |
| max_tokens | integer | — | Required. Maximum tokens to generate. |
| system | string or array | none | System prompt — a string, or an array of {"type": "text", "text": "..."} blocks. |
| tools | array | none | Anthropic-shape tool definitions with name, description, and input_schema. |
| tool_choice | object | auto | {"type": "auto"}, {"type": "any"}, {"type": "tool", "name": "..."}, or {"type": "none"}. |
| stop_sequences | array | none | Custom stop sequences. |
| temperature | number | 1 | Sampling temperature, [0, 1]. |
| top_p | number | 1 | Nucleus sampling, [0, 1]. |
| top_k | integer | unset | Restrict sampling to the top-K tokens. |
| stream | boolean | false | Enable SSE streaming. |
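As a sketch of the tool parameters above, a request body with one tool and a forced tool_choice could look like this (the get_weather tool and its schema are illustrative, not part of the API):

```python
# Illustrative tool definition in the Anthropic shape: name, description,
# and a JSON Schema under input_schema.
request_body = {
    "model": "claude-sonnet-4",
    "max_tokens": 256,
    "tools": [
        {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    # Force the model to call this specific tool.
    "tool_choice": {"type": "tool", "name": "get_weather"},
    "messages": [{"role": "user", "content": "Weather in Lübeck?"}],
}
```

The same dict can be passed to client.messages.create(**request_body) or serialized as the JSON body of a raw POST.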
messages[].content blocks
- {"type": "text", "text": "..."}
- {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}} or {"type": "image", "source": {"type": "url", "url": "..."}}
- {"type": "tool_use", "id": "...", "name": "...", "input": {...}} (assistant only)
- {"type": "tool_result", "tool_use_id": "...", "content": "..."} (user only)
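For instance, returning a tool result pairs the assistant's tool_use block with a user-role tool_result block whose tool_use_id matches (IDs and values here are illustrative):

```python
# Assistant turn as returned by the API: the model requested a tool call.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Bremen"}},
    ],
}

# User turn carrying the result back; tool_use_id must match the id above.
tool_result_turn = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "12 °C, overcast"},
    ],
}

messages = [
    {"role": "user", "content": "Weather in Bremen?"},
    assistant_turn,
    tool_result_turn,
]
```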
Model mapping
Anthropic-proprietary names are mapped to open-weight models before inference runs. You can also send open-weight IDs directly.
| You send (contains, case-insensitive) | We run (default) |
|---|---|
| opus | glm-4.7 |
| sonnet | deepseek-v3.2 |
| haiku | gpt-oss-120b |
| anything else | the string as-is |
Admins can override these per-instance via config keys anthropic.model_mapping.opus, anthropic.model_mapping.sonnet, anthropic.model_mapping.haiku.
The response echoes the original model name back. If you sent claude-sonnet-4, the response model field is claude-sonnet-4 — so clients that hard-check the model name (Claude Code, some middleware) don't break. The actual model that ran shows up in internal usage tracking, not in the response body.
We map Anthropic names because Claude Code hardcodes them. For direct control, send an open-weight ID: "model": "glm-4.7" routes to exactly that model with no mapping.
Response
```json
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hamburg, Lübeck, Bremen."}
  ],
  "model": "claude-sonnet-4",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 7
  },
  "environment_impact": { "energy_kwh": 0.00015, "carbon_g_co2": 0.06, "water_liters": 0.0002, "renewable_percent": 85, "pue": 1.18, "provider_id": "deepseek-fi", "location": "FI" },
  "billing_cost": { "energy": "0.0008", "credits": "0.0", "paid_with": "energy" }
}
```

stop_reason
- end_turn — model finished naturally.
- max_tokens — hit the cap.
- stop_sequence — matched a custom stop sequence.
- tool_use — model wants to call a tool.
content blocks
Same shapes as input: text, tool_use. tool_use blocks include id, name, and parsed input.
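A minimal way to pull tool calls out of a response's content list (block values here are illustrative):

```python
# Content list as it might arrive in a tool_use response.
content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather",
     "input": {"city": "Hamburg"}},
]

# Filter for tool_use blocks; each carries an id to echo back in tool_result.
tool_calls = [block for block in content if block["type"] == "tool_use"]
for call in tool_calls:
    print(call["name"], call["input"])
```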
Streaming
stream: true switches the response to Server-Sent Events with Anthropic event types:
| Event | Meaning |
|---|---|
| message_start | Beginning of the message — includes initial usage. |
| content_block_start | New content block (text or tool_use) begins. |
| content_block_delta | Token(s) appended to the current block. For tool use, arguments stream as input_json_delta. |
| content_block_stop | Current block ended. |
| message_delta | Final metadata (stop_reason, usage deltas). |
| message_stop | Stream ended. |
| error | Mid-stream error; the connection closes after. |
Environment-impact and billing fields appear only on the non-streaming path; for streams, totals are tracked on the Melious side and surfaced through usage reporting rather than on the wire.
See Streaming for consumer code.
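As a sketch of the event order in the table above (synthetic events, not a live stream), accumulating text from content_block_delta events looks like:

```python
# Synthetic event sequence mirroring the documented event types.
events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 12}}},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hamburg, "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Lübeck, Bremen."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]

# Concatenate only the text deltas; tool-call arguments would instead
# arrive as input_json_delta fragments to be joined and JSON-parsed.
text = ""
for event in events:
    if event["type"] == "content_block_delta":
        if event["delta"]["type"] == "text_delta":
            text += event["delta"]["text"]

print(text)  # Hamburg, Lübeck, Bremen.
```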
What's supported vs not
Supported: system prompts, multi-turn conversation, tool use with tool_use/tool_result blocks, streaming, vision (if the underlying model supports it), stop sequences, temperature/top_p/top_k, max_tokens.
Not supported today:
- Anthropic prompt caching (cache_control blocks) — silently dropped. Open-weight providers don't surface an equivalent cache handle.
- Extended thinking blocks — reasoning models produce thinking inline, but the separated thinking block shape isn't preserved as its own event type.
- Fine-grained tool-streaming signals beyond content_block_delta / input_json_delta.
If your client depends on one of these, open an issue — we'll scope the work.
Errors
Standard {"error": {"code", "message", "details"}} shape. Common codes here:
- 400 — malformed request, missing anthropic-version header, bad content block.
- VALIDATION_4002 — missing required field (messages, max_tokens).
- INFERENCE_3001 — model not found (after mapping).
- INFERENCE_3103 — all providers failed.
- INFERENCE_3207 — context window exceeded.
- AUTH_1015 — insufficient scope.
Mid-stream errors arrive as an error SSE event with {"type": "error", "error": {...}}.
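For instance, an error SSE event can be detected in the same loop that handles content deltas (the raw frame below is illustrative; SSE data lines carry a "data: " prefix before the JSON payload):

```python
import json

raw = ('data: {"type": "error", '
       '"error": {"code": "INFERENCE_3103", "message": "all providers failed"}}')

# Strip the SSE "data: " framing, then parse the JSON event payload.
payload = json.loads(raw.removeprefix("data: "))
if payload["type"] == "error":
    code = payload["error"]["code"]
    print(f"stream failed: {code}")  # stream failed: INFERENCE_3103
```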
Related
Count tokens for preflight sizing • From Anthropic for setup • Tool calling for the loop.