Messages
POST /v1/messages — Anthropic Messages API shape, open-weight inference underneath
Anthropic-shaped chat endpoint. Claude Code, the Anthropic SDK, and anything else that speaks this protocol works unmodified against Melious.
Endpoint:
POST /v1/messages

Auth: Bearer token or x-api-key. Requires scope inference.chat.
Required header: anthropic-version — any recent version string (e.g. 2023-06-01). The Anthropic SDK sets it automatically.
Example
```python
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai",
)
response = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=256,
    messages=[{"role": "user", "content": "Name three Hanseatic cities."}],
)
print(response.content[0].text)
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai",
});
const response = await client.messages.create({
  model: "claude-sonnet-4",
  max_tokens: 256,
  messages: [{ role: "user", content: "Name three Hanseatic cities." }],
});
console.log(response.content[0].text);
```

```shell
curl https://api.melious.ai/v1/messages \
  -H "x-api-key: sk-mel-<YOUR_API_KEY>" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Name three Hanseatic cities."}]
  }'
```

Request
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | — | Anthropic model name or an open-weight model ID. See Model mapping below. |
| messages | array | — | Conversation history. Each message has role ("user" or "assistant") and content (string or content blocks). |
| max_tokens | integer | — | Required. Maximum tokens to generate. |
| system | string or array | none | System prompt — a string, or an array of {"type": "text", "text": "..."} blocks. |
| tools | array | none | Anthropic-shape tool definitions with name, description, and input_schema. |
| tool_choice | object | auto | {"type": "auto"}, {"type": "any"}, {"type": "tool", "name": "..."}, or {"type": "none"}. |
| stop_sequences | array | none | Custom stop sequences. |
| temperature | number | 1 | Sampling temperature, [0, 1]. |
| top_p | number | 1 | Nucleus sampling, [0, 1]. |
| top_k | integer | unset | Restrict sampling to the top-K tokens. |
| stream | boolean | false | Enable SSE streaming. |
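As a sketch of the tool parameters above, a request body with one tool and a forced tool_choice could look like this (the get_weather tool and its schema are illustrative, not part of the API):

```python
# Illustrative tool definition in the Anthropic shape: name, description,
# and a JSON Schema under input_schema.
request_body = {
    "model": "claude-sonnet-4",
    "max_tokens": 256,
    "tools": [
        {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    # Force the model to call this specific tool.
    "tool_choice": {"type": "tool", "name": "get_weather"},
    "messages": [{"role": "user", "content": "Weather in Lübeck?"}],
}
```

The same dict can be passed to client.messages.create(**request_body) or serialized as the JSON body of a raw POST.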
messages[].content blocks
- {"type": "text", "text": "..."}
- {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}} or {"type": "image", "source": {"type": "url", "url": "..."}}
- {"type": "tool_use", "id": "...", "name": "...", "input": {...}} (assistant only)
- {"type": "tool_result", "tool_use_id": "...", "content": "..."} (user only)
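For instance, returning a tool result pairs the assistant's tool_use block with a user-role tool_result block whose tool_use_id matches (IDs and values here are illustrative):

```python
# Assistant turn as returned by the API: the model requested a tool call.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Bremen"}},
    ],
}

# User turn carrying the result back; tool_use_id must match the id above.
tool_result_turn = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "12 °C, overcast"},
    ],
}

messages = [
    {"role": "user", "content": "Weather in Bremen?"},
    assistant_turn,
    tool_result_turn,
]
```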
Model mapping
Anthropic-proprietary names are mapped to open-weight models before inference runs. You can also send open-weight IDs directly.
| You send (contains, case-insensitive) | We run (default) |
|---|---|
| opus | glm-4.7 |
| sonnet | deepseek-v3.2 |
| haiku | gpt-oss-120b |
| anything else | the string as-is |
Admins can override these per-instance via config keys anthropic.model_mapping.opus, anthropic.model_mapping.sonnet, anthropic.model_mapping.haiku.
The response echoes the original model name back. If you sent claude-sonnet-4, the response model field is claude-sonnet-4 — so clients that hard-check the model name (Claude Code, some middleware) don't break. The actual model that ran shows up in internal usage tracking, not in the response body.
We map Anthropic names because Claude Code hardcodes them. For direct control, send an open-weight ID: "model": "glm-4.7" routes to exactly that model with no mapping.
Response
```json
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hamburg, Lübeck, Bremen."}
  ],
  "model": "claude-sonnet-4",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 7
  },
  "environment_impact": { "energy_kwh": 0.00015, "carbon_g_co2": 0.06, "water_liters": 0.0002, "renewable_percent": 85, "pue": 1.18, "provider_id": "deepseek-fi", "location": "FI" },
  "billing_cost": { "energy": "0.0008", "credits": "0.0", "paid_with": "energy" }
}
```

stop_reason
- end_turn — model finished naturally.
- max_tokens — hit the cap.
- stop_sequence — matched a custom stop sequence.
- tool_use — model wants to call a tool.
content blocks
Same shapes as input: text, tool_use. tool_use blocks include id, name, and parsed input.
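A minimal way to pull tool calls out of a response's content list (block values here are illustrative):

```python
# Content list as it might arrive in a tool_use response.
content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather",
     "input": {"city": "Hamburg"}},
]

# Filter for tool_use blocks; each carries an id to echo back in tool_result.
tool_calls = [block for block in content if block["type"] == "tool_use"]
for call in tool_calls:
    print(call["name"], call["input"])
```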
Streaming
stream: true switches the response to Server-Sent Events with Anthropic event types:
| Event | Meaning |
|---|---|
| message_start | Beginning of the message — includes initial usage. |
| content_block_start | New content block (text or tool_use) begins. |
| content_block_delta | Token(s) appended to the current block. For tool use, arguments stream as input_json_delta. |
| content_block_stop | Current block ended. |
| message_delta | Final metadata (stop_reason, usage deltas). |
| message_stop | Stream ended. |
| error | Mid-stream error; the connection closes after. |
Environment-impact and billing fields appear only on the non-streaming path; for streams, totals are tracked on the Melious side and surfaced through usage reporting rather than on the wire.
See Streaming for consumer code.
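As a sketch of the event order in the table above (synthetic events, not a live stream), accumulating text from content_block_delta events looks like:

```python
# Synthetic event sequence mirroring the documented event types.
events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 12}}},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hamburg, "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Lübeck, Bremen."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]

# Concatenate only the text deltas; tool-call arguments would instead
# arrive as input_json_delta fragments to be joined and JSON-parsed.
text = ""
for event in events:
    if event["type"] == "content_block_delta":
        if event["delta"]["type"] == "text_delta":
            text += event["delta"]["text"]

print(text)  # Hamburg, Lübeck, Bremen.
```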
What's supported vs not
Supported: system prompts, multi-turn conversation, tool use with tool_use/tool_result blocks, streaming, vision (if the underlying model supports it), stop sequences, temperature/top_p/top_k, max_tokens.
Not supported today:
- Anthropic prompt caching (cache_control blocks) — silently dropped. Open-weight providers don't surface an equivalent cache handle.
- Extended thinking blocks — reasoning models produce thinking inline, but the separated thinking block shape isn't preserved as its own event type.
- Fine-grained tool-streaming signals beyond content_block_delta / input_json_delta.
If your client depends on one of these, open an issue — we'll scope the work.
Errors
Standard {"error": {"code", "message", "details"}} shape. Common codes here:
- 400 — malformed request, missing anthropic-version header, bad content block.
- VALIDATION_4002 — missing required field (messages, max_tokens).
- INFERENCE_3001 — model not found (after mapping).
- INFERENCE_3103 — all providers failed.
- INFERENCE_3207 — context window exceeded.
- AUTH_1015 — insufficient scope.
Mid-stream errors arrive as an error SSE event with {"type": "error", "error": {...}}.
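For instance, an error SSE event can be detected in the same loop that handles content deltas (the raw frame below is illustrative; SSE data lines carry a "data: " prefix before the JSON payload):

```python
import json

raw = ('data: {"type": "error", '
       '"error": {"code": "INFERENCE_3103", "message": "all providers failed"}}')

# Strip the SSE "data: " framing, then parse the JSON event payload.
payload = json.loads(raw.removeprefix("data: "))
if payload["type"] == "error":
    code = payload["error"]["code"]
    print(f"stream failed: {code}")  # stream failed: INFERENCE_3103
```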
Related
Count tokens for preflight sizing • From Anthropic for setup • Tool calling for the loop.