
From OpenAI

Point the OpenAI SDK at Melious — swap the base URL, the key, and the model name

If you're already using the OpenAI SDK, switching to Melious takes three changes: api_key, base_url, and the model name. The first two are config; the model name has to change because we don't run OpenAI's proprietary models.

The diff

Python

from openai import OpenAI

client = OpenAI(
-   api_key=os.environ["OPENAI_API_KEY"],
+   api_key=os.environ["MELIOUS_API_KEY"],
+   base_url="https://api.melious.ai/v1",
)

response = client.chat.completions.create(
-   model="gpt-4o",
+   model="glm-4.7",
    messages=[{"role": "user", "content": "Hello"}],
)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
-   apiKey: process.env.OPENAI_API_KEY,
+   apiKey: process.env.MELIOUS_API_KEY,
+   baseURL: "https://api.melious.ai/v1",
});

const response = await client.chat.completions.create({
-   model: "gpt-4o",
+   model: "glm-4.7",
  messages: [{ role: "user", content: "Hello" }],
});

That's the whole migration. Streaming, tool calling, vision, structured outputs, embeddings, audio — all work against the same shapes.
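The same swap is easy to see at the raw-HTTP level: only the URL, the auth header, and the model change, while the request body keeps the OpenAI-compatible shape. A minimal sketch — `build_chat_request` is a hypothetical helper, not part of any SDK:

```python
import json
import os

def build_chat_request(model: str, messages: list) -> tuple:
    """Build (url, headers, body) for a Melious chat completion.

    Sketch only: endpoint and bearer-auth shape follow the
    OpenAI-compatible convention shown in the diff above.
    """
    url = "https://api.melious.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('MELIOUS_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "glm-4.7", [{"role": "user", "content": "Hello"}]
)
```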

Picking a model

We don't run gpt-4o, gpt-4o-mini, o1, or any other OpenAI-proprietary model. We run open-weight models instead, and there's no automatic substitution — you pick one.

Rough mapping as a starting point:

You had                   Try
gpt-4o                    glm-4.7, qwen3-235b-a22b-instruct, or deepseek-v3.2
gpt-4o-mini               gpt-oss-20b or a smaller Qwen/Llama variant
o1 / reasoning            deepseek-r1-0528, kimi-k2-thinking, or another thinking model
gpt-4o vision             qwen3-vl-235b-a22b-instruct or mistral-small-3.2-24b-instruct
text-embedding-3-small    bge-m3 or qwen3-embedding-8b
whisper-1                 whisper-large-v3-turbo
dall-e-3                  flux-schnell or flux-dev

These are starting points, not claims of equivalence. Browse melious.ai/hub to compare context windows, pricing, and capabilities, or hit GET /v1/models?include_meta=true to filter programmatically. See Models for the broader stance.
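If you're migrating several call sites, it can help to codify the rough mapping above as a lookup table. A sketch — `MODEL_STARTING_POINTS` and `suggest` are hypothetical names, and the entries are the table's starting points, not claims of equivalence:

```python
# Starting-point suggestions, keyed by the OpenAI model ID you had.
# Vision and reasoning rows from the table above are keyed loosely here.
MODEL_STARTING_POINTS = {
    "gpt-4o": ["glm-4.7", "qwen3-235b-a22b-instruct", "deepseek-v3.2"],
    "gpt-4o-mini": ["gpt-oss-20b"],
    "o1": ["deepseek-r1-0528", "kimi-k2-thinking"],
    "text-embedding-3-small": ["bge-m3", "qwen3-embedding-8b"],
    "whisper-1": ["whisper-large-v3-turbo"],
    "dall-e-3": ["flux-schnell", "flux-dev"],
}

def suggest(openai_model: str) -> list:
    """Return candidate Melious model IDs, or [] if unmapped."""
    return MODEL_STARTING_POINTS.get(openai_model, [])
```

Anything not in the table (or not in your own copy of it) should send you to melious.ai/hub or GET /v1/models rather than a guess.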

Steering routing

The OpenAI SDK passes unknown fields through to us — so the Melious-specific extensions work without wrapping the client. Append a flavor suffix to bias routing toward speed, price, or a lower-carbon provider:

response = client.chat.completions.create(
    model="glm-4.7:eco",   # prefer providers running on greener power
    messages=[{"role": "user", "content": "Hello"}],
)

Full list of suffixes and what they do: Routing.
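Since the suffix is just part of the model string, you can keep it out of your call sites with a tiny helper. A sketch — `with_flavor` is a hypothetical name, and "eco" is the only suffix this page shows; see Routing for the rest:

```python
def with_flavor(model: str, flavor: str) -> str:
    """Return "model:flavor", replacing any flavor already present.

    Assumes the Melious convention shown above: a single ":"
    separates the base model ID from the routing flavor.
    """
    base = model.split(":", 1)[0]
    return f"{base}:{flavor}"
```

For example, `with_flavor("glm-4.7", "eco")` gives `"glm-4.7:eco"`, and calling it on an already-suffixed ID swaps the flavor instead of stacking them.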

What's different at runtime

Three practical differences to know about:

Responses include extra fields. environment_impact and billing_cost ride on every response. The OpenAI SDK ignores unknown fields, so your code won't break — but they're there when you want them. See Environmental impact.

Rate limits are per plan, not per key. A quickstart key and your production service's key draw from the same bucket, so a burst on one counts against the other. See Rate limits.

API-only accounts bill in credits. If your plan doesn't include API energy benefits, requests are paid in credits rather than from the energy pool that web sessions use. Pricing details in Pricing.

Prompt caching is not exposed. OpenAI's automatic caching isn't a user-facing feature on their side either, and we haven't added explicit cache control. If you want it, tell us.
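Because the SDK ignores the extra fields rather than modeling them, the simplest way to read them is off the raw response dict. A sketch — the top-level field names come from this page, but the inner keys (`energy_wh`, `credits`) and the `extract_extras` helper are assumptions for illustration:

```python
def extract_extras(response: dict) -> dict:
    """Pull the Melious-specific fields off a response dict, if present."""
    return {
        "environment_impact": response.get("environment_impact"),
        "billing_cost": response.get("billing_cost"),
    }

# Assumed response shape for illustration; the inner keys are not
# documented on this page.
sample = {
    "choices": [{"message": {"role": "assistant", "content": "Hi"}}],
    "environment_impact": {"energy_wh": 0.4},
    "billing_cost": {"credits": 2},
}
```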

When it breaks

A migration that "just worked" in three lines and then fails in production is usually one of these:

  • Model ID typo — our errors are explicit. INFERENCE_3001 means we don't have that ID. Check GET /v1/models.
  • Scope on the key — if you created a narrow API key (e.g. inference.chat only) and then tried embeddings, you'll see AUTH_1015. Add the scope in the dashboard. See Authentication.
  • Advanced models on a basic plan — a few models are gated. The error says which plan unlocks them.

Full error-code list and retry patterns: Errors.
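The two codes named above are specific enough to triage in code. A sketch — `REMEDIES` and `triage` are hypothetical names, and only the codes this page documents are mapped; everything else defers to the Errors page:

```python
# Remediation hints for the failure modes listed above.
REMEDIES = {
    "INFERENCE_3001": "Unknown model ID - check GET /v1/models for the exact name.",
    "AUTH_1015": "Key is missing a scope - add it in the dashboard.",
}

def triage(error_code: str) -> str:
    """Map a Melious error code to a remediation hint."""
    return REMEDIES.get(error_code, "See the Errors page for this code.")
```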
