Melious
API Reference

Messages

POST /v1/messages — Anthropic Messages API shape, open-weight inference underneath

Anthropic-shaped chat endpoint. Claude Code, the Anthropic SDK, and anything else that speaks this protocol works unmodified against Melious.

Endpoint:

POST /v1/messages

Auth: Bearer token or x-api-key; requires the inference.chat scope.
Required header: anthropic-version, with any recent version string (e.g. 2023-06-01). The Anthropic SDK sets it automatically.

Example

Python:

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai",
)

response = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=256,
    messages=[{"role": "user", "content": "Name three Hanseatic cities."}],
)
print(response.content[0].text)

TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai",
});

const response = await client.messages.create({
  model: "claude-sonnet-4",
  max_tokens: 256,
  messages: [{ role: "user", content: "Name three Hanseatic cities." }],
});
console.log(response.content[0].text);

curl:

curl https://api.melious.ai/v1/messages \
  -H "x-api-key: sk-mel-<YOUR_API_KEY>" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Name three Hanseatic cities."}]
  }'

Request

Parameter | Type | Default | Description
model | string | required | Anthropic model name or an open-weight model ID. See Model mapping below.
messages | array | required | Conversation history. Each message has role ("user" or "assistant") and content (string or content blocks).
max_tokens | integer | required | Maximum tokens to generate.
system | string or array | none | System prompt: a string, or an array of {"type": "text", "text": "..."} blocks.
tools | array | none | Anthropic-shape tool definitions with name, description, and input_schema.
tool_choice | object | auto | {"type": "auto"}, {"type": "any"}, {"type": "tool", "name": "..."}, or {"type": "none"}.
stop_sequences | array | none | Custom stop sequences.
temperature | number | 1 | Sampling temperature, [0, 1].
top_p | number | 1 | Nucleus sampling, [0, 1].
top_k | integer | unset | Restrict sampling to the top K tokens.
stream | boolean | false | Enable SSE streaming.

messages[].content blocks

  • {"type": "text", "text": "..."}
  • {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}} or {"type": "image", "source": {"type": "url", "url": "..."}}
  • {"type": "tool_use", "id": "...", "name": "...", "input": {...}} (assistant only)
  • {"type": "tool_result", "tool_use_id": "...", "content": "..."} (user only)
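Blocks can be mixed within one turn. A sketch of a vision request's user turn; the URL is a placeholder, not a real image:

```python
# A user turn combining a text block and an image block.
# The URL here is a placeholder for illustration only.
vision_turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {
            "type": "image",
            "source": {"type": "url", "url": "https://example.com/harbor.jpg"},
        },
    ],
}
```

Pass this dict inside `messages` exactly as you would a plain string turn.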

Model mapping

Anthropic-proprietary names are mapped to open-weight models before inference runs. You can also send open-weight IDs directly.

You send (contains, case-insensitive) | We run (default)
opus | glm-4.7
sonnet | deepseek-v3.2
haiku | gpt-oss-120b
anything else | the string as-is

Admins can override these per-instance via config keys anthropic.model_mapping.opus, anthropic.model_mapping.sonnet, anthropic.model_mapping.haiku.

The response echoes the original model name back. If you sent claude-sonnet-4, the response model field is claude-sonnet-4 — so clients that hard-check the model name (Claude Code, some middleware) don't break. The actual model that ran shows up in internal usage tracking, not in the response body.

We map Anthropic names because Claude Code hardcodes them. For direct control, send an open-weight ID: "model": "glm-4.7" routes to exactly that model with no mapping.
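The default rule can be pictured as a case-insensitive substring match. This is an illustrative sketch of the documented behavior, not Melious source:

```python
# Default mapping table from the docs above; admins can override per-instance.
DEFAULT_MAPPING = {
    "opus": "glm-4.7",
    "sonnet": "deepseek-v3.2",
    "haiku": "gpt-oss-120b",
}

def resolve_model(requested: str) -> str:
    """Map an Anthropic-style name to an open-weight model ID.

    Case-insensitive substring match; unknown names pass through as-is.
    """
    lowered = requested.lower()
    for needle, target in DEFAULT_MAPPING.items():
        if needle in lowered:
            return target
    return requested  # open-weight IDs route directly, no mapping
```

So `resolve_model("claude-sonnet-4")` yields `deepseek-v3.2`, while `resolve_model("glm-4.7")` passes through unchanged.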

Response

{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hamburg, Lübeck, Bremen."}
  ],
  "model": "claude-sonnet-4",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 7
  },
  "environment_impact": {
    "energy_kwh": 0.00015,
    "carbon_g_co2": 0.06,
    "water_liters": 0.0002,
    "renewable_percent": 85,
    "pue": 1.18,
    "provider_id": "deepseek-fi",
    "location": "FI"
  },
  "billing_cost": {
    "energy": "0.0008",
    "credits": "0.0",
    "paid_with": "energy"
  }
}
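environment_impact and billing_cost are Melious extensions to the Anthropic shape. When calling the endpoint directly with a plain HTTP client, they are ordinary JSON fields; a sketch using a trimmed version of the sample response above:

```python
import json

# Sample trimmed to usage plus the Melious extension fields.
body = json.loads("""{
  "usage": {"input_tokens": 12, "output_tokens": 7},
  "environment_impact": {"energy_kwh": 0.00015, "carbon_g_co2": 0.06},
  "billing_cost": {"energy": "0.0008", "credits": "0.0", "paid_with": "energy"}
}""")

# .get() keeps the code safe if the extension fields are absent.
impact = body.get("environment_impact", {})
print(f'{impact.get("carbon_g_co2")} g CO2 for {body["usage"]["output_tokens"]} output tokens')
```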

stop_reason

  • end_turn — model finished naturally.
  • max_tokens — hit the cap.
  • stop_sequence — matched a custom stop sequence.
  • tool_use — model wants to call a tool.

content blocks

Same shapes as input: text, tool_use. tool_use blocks include id, name, and parsed input.
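Picking tool_use blocks out of a response and answering them might look like this; a sketch, where execute_tool is a stand-in for your own dispatcher:

```python
import json

def answer_tool_calls(content_blocks, execute_tool):
    """Turn each tool_use block into a tool_result block for the next user turn.

    execute_tool(name, input) is a hypothetical dispatcher you supply.
    """
    results = []
    for block in content_blocks:
        if block["type"] == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": json.dumps(execute_tool(block["name"], block["input"])),
            })
    return {"role": "user", "content": results}
```

Append the returned turn to `messages` and call the endpoint again; the full loop is covered on the Tool calling page.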

Streaming

stream: true switches the response to Server-Sent Events with Anthropic event types:

Event | Meaning
message_start | Beginning of the message; includes initial usage.
content_block_start | A new content block (text or tool_use) begins.
content_block_delta | Token(s) appended to the current block. For tool use, arguments stream as input_json_delta.
content_block_stop | The current block has ended.
message_delta | Final metadata (stop_reason, usage deltas).
message_stop | Stream ended.
error | Mid-stream error; the connection closes afterward.
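If you consume the stream without an SDK, each event arrives as an event: line plus one or more data: lines of JSON. A minimal parser sketch with an illustrative two-event sample:

```python
import json

def parse_sse(raw: str):
    """Split a raw SSE body into (event_name, payload) pairs."""
    events = []
    for chunk in raw.strip().split("\n\n"):
        name, data_lines = None, []
        for line in chunk.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if name:
            # Multiple data: lines in one event are joined with newlines.
            events.append((name, json.loads("\n".join(data_lines))))
    return events

# Illustrative two-event excerpt, not a captured wire trace.
sample = (
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "index": 0,'
    ' "delta": {"type": "text_delta", "text": "Hamburg"}}\n\n'
    "event: message_stop\n"
    'data: {"type": "message_stop"}\n'
)

# Reassemble the text by concatenating text_delta payloads.
text = "".join(
    payload["delta"]["text"]
    for name, payload in parse_sse(sample)
    if name == "content_block_delta" and payload["delta"]["type"] == "text_delta"
)
```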

Environment-impact and billing fields are returned only on the non-streaming path; for streams, totals are tracked on the Melious side and surfaced through usage reporting rather than on the wire.

See Streaming for consumer code.

What's supported vs not

Supported: system prompts, multi-turn conversation, tool use with tool_use/tool_result blocks, streaming, vision (if the underlying model supports it), stop sequences, temperature/top_p/top_k, max_tokens.

Not supported today:

  • Anthropic prompt caching (cache_control blocks) — silently dropped. Open-weight providers don't surface an equivalent cache handle.
  • Extended thinking blocks — reasoning models produce thinking inline, but the separated thinking block shape isn't preserved as its own event type.
  • Fine-grained tool-streaming signals beyond content_block_delta / input_json_delta.

If your client depends on one of these, open an issue — we'll scope the work.

Errors

Errors use the standard {"error": {"code", "message", "details"}} envelope. Common codes on this endpoint:

  • 400 — malformed request, missing anthropic-version header, bad content block.
  • VALIDATION_4002 — missing required field (messages, max_tokens).
  • INFERENCE_3001 — model not found (after mapping).
  • INFERENCE_3103 — all providers failed.
  • INFERENCE_3207 — context window exceeded.
  • AUTH_1015 — insufficient scope.

Mid-stream errors arrive as an error SSE event with {"type": "error", "error": {...}}.
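When calling the endpoint without the SDK, the envelope can be flattened for logging like so; a sketch, with an illustrative sample body:

```python
import json

def describe_error(body: str) -> str:
    """Flatten the standard error envelope into a single log line."""
    err = json.loads(body)["error"]
    return f'{err["code"]}: {err["message"]}'

# Illustrative sample body using the envelope shape documented above.
sample = '{"error": {"code": "INFERENCE_3207", "message": "Context window exceeded", "details": {}}}'
print(describe_error(sample))  # INFERENCE_3207: Context window exceeded
```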

Count tokens for preflight sizing • From Anthropic for setup • Tool calling for the loop.
