Streaming

Consume responses token-by-token over SSE, for both Chat completions and Messages

Both Chat completions and Messages stream token-by-token when you pass stream: true. Streaming works on every chat model — we don't think of it as a feature so much as the default way to consume inference; the non-streaming shape exists for batch jobs and scripts that need the whole response atomically.

Pick the shape that matches your client. They're different on the wire but conceptually the same.

Chat completions (OpenAI shape)

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

stream = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Write a haiku about Hamburg."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n[{chunk.usage.total_tokens} tokens]")

TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "glm-4.7",
  messages: [{ role: "user", content: "Write a haiku about Hamburg." }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
  if (chunk.usage) console.log(`\n[${chunk.usage.total_tokens} tokens]`);
}

TypeScript (raw fetch)

const response = await fetch("https://api.melious.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-mel-<YOUR_API_KEY>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "glm-4.7",
    messages: [{ role: "user", content: "Write a haiku about Hamburg." }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

// Label the loop so we can break out cleanly on [DONE];
// a bare `return` is invalid at the top level of a module.
read: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // SSE messages are separated by \n\n
  const messages = buffer.split("\n\n");
  buffer = messages.pop()!;  // keep the incomplete last piece

  for (const msg of messages) {
    if (!msg.startsWith("data: ")) continue;
    const data = msg.slice(6);
    if (data === "[DONE]") break read;
    const parsed = JSON.parse(data);
    process.stdout.write(parsed.choices[0]?.delta?.content ?? "");
  }
}

Chunk shape

Each SSE event carries a JSON chunk:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1699999999,"model":"glm-4.7","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

The last non-[DONE] chunk carries a finish_reason ("stop" under normal completion) and — if you set stream_options.include_usage: true — a usage field with totals. The stream terminates with a literal data: [DONE]\n\n.
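
For illustration, a final chunk with usage attached might look like this (the token counts are invented):

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1699999999,"model":"glm-4.7","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}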

environment_impact and billing_cost don't appear on the wire during streaming; they're tracked internally and charged when the stream completes. Use the non-streaming shape if you need those fields inline.
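
A hedged sketch of reading those fields from a non-streaming response. Where exactly environment_impact and billing_cost sit on the response object is an assumption here; check the API reference for the exact shape.

// Non-streaming request: omit `stream` to get the whole response atomically.
const res = await fetch("https://api.melious.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-mel-<YOUR_API_KEY>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "glm-4.7",
    messages: [{ role: "user", content: "Write a haiku about Hamburg." }],
  }),
});
const json = await res.json();
// Field placement assumed; consult the API reference for the exact shape.
console.log(json.usage, json.environment_impact, json.billing_cost);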

Messages (Anthropic shape)

Anthropic's shape uses typed SSE events instead of a flat chunk:

event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hamburg"}}

...

event: message_stop
data: {"type":"message_stop"}

Python

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai",
)

with client.messages.stream(
    model="claude-sonnet-4",
    max_tokens=256,
    messages=[{"role": "user", "content": "Write a haiku about Hamburg."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai",
});

const stream = client.messages.stream({
  model: "claude-sonnet-4",
  max_tokens: 256,
  messages: [{ role: "user", content: "Write a haiku about Hamburg." }],
});

for await (const text of stream.textStream) {
  process.stdout.write(text);
}

The SDKs expose a high-level text_stream / textStream that skips past the event-type dispatch. If you're building a client by hand, dispatch on the event: field — that's what the SDK does internally.
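
A minimal hand-rolled dispatcher, reusing the \n\n buffering from the fetch example above. The event names come from the event listing; handleSseMessage is a name invented for this sketch, and only the text deltas are handled.

// Dispatch on the `event:` field of each buffered SSE message (Anthropic shape).
function handleSseMessage(msg: string): void {
  let event = "";
  let data = "";
  for (const line of msg.split("\n")) {
    if (line.startsWith("event: ")) event = line.slice(7);
    if (line.startsWith("data: ")) data += line.slice(6);
  }
  if (!data) return;
  const parsed = JSON.parse(data);
  switch (event) {
    case "content_block_delta":
      // Text arrives as text_delta payloads inside content_block_delta events.
      if (parsed.delta?.type === "text_delta") process.stdout.write(parsed.delta.text);
      break;
    case "message_stop":
      process.stdout.write("\n");
      break;
    // message_start, content_block_start, etc. carry metadata you may not need.
    default:
      break;
  }
}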

Browser integration

A minimal React hook for streaming chat completions:

import { useState } from "react";

export function useChat() {
  const [text, setText] = useState("");
  const [loading, setLoading] = useState(false);

  async function send(prompt: string) {
    setText("");
    setLoading(true);
    try {
      const response = await fetch("/api/chat", {  // proxy through your backend
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt }),
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n\n");
        buffer = lines.pop()!;
        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const data = line.slice(6);
          if (data === "[DONE]") return;
          const chunk = JSON.parse(data);
          const delta = chunk.choices[0]?.delta?.content ?? "";
          setText((t) => t + delta);
        }
      }
    } finally {
      setLoading(false);
    }
  }

  return { text, loading, send };
}
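
Using the hook from a component might look like this (purely illustrative):

function Chat() {
  const { text, loading, send } = useChat();

  return (
    <div>
      <button disabled={loading} onClick={() => send("Write a haiku about Hamburg.")}>
        Ask
      </button>
      <pre>{text}</pre>
    </div>
  );
}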

Never call Melious directly from the browser — your API key would be exposed. Proxy through your backend and forward the stream. The key stays server-side; the tokens flow to the browser.
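
A minimal sketch of that proxy, written as a web-standard route handler. The /api/chat path, the MELIOUS_API_KEY env var, and the runtime are assumptions; any framework that can stream a Response body works the same way.

// Hypothetical POST /api/chat handler (Next.js route handler / Bun / Deno style).
export async function POST(req: Request): Promise<Response> {
  const { prompt } = await req.json();

  const upstream = await fetch("https://api.melious.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.MELIOUS_API_KEY}`, // key stays server-side
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "glm-4.7",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  // Forward the SSE body unchanged; the browser-side parser above handles it.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { "Content-Type": "text/event-stream" },
  });
}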

Gotchas

  • Line buffering. SSE chunks aren't guaranteed to arrive one complete message at a time — you must buffer and split on \n\n, as shown above. Don't parse each value as JSON directly.
  • stream_options.include_usage. Without it, OpenAI-shape streams don't include the final usage field. If you want token counts, set it.
  • Mid-stream errors. You can receive an error chunk or event mid-stream if a provider fails after the first byte. The connection closes afterward — treat it like a 5xx and retry.
  • Nginx buffering. If you're running your own reverse proxy, make sure it isn't buffering the stream — we send X-Accel-Buffering: no to signal intent. See the sketch after this list.
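
If nginx fronts your backend, a config along these lines keeps the stream flowing (nginx honors X-Accel-Buffering: no by default, so this is mostly belt and braces; the upstream name is hypothetical):

location /api/chat {
    proxy_pass http://your_backend;   # hypothetical upstream
    proxy_buffering off;              # don't buffer the SSE stream
    proxy_read_timeout 300s;          # allow long-lived streams
}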

Tool calling covers streaming tool-use events specifically; Errors covers the retry pattern for mid-stream failures.
