From OpenAI
Point the OpenAI SDK at Melious — one line of config, one model-name swap
If you're already using the OpenAI SDK, switching to Melious takes two config changes, `base_url` and `api_key`, plus a model-name swap, because we don't run OpenAI's proprietary models.
The diff
```diff
  import os

  from openai import OpenAI

  client = OpenAI(
-     api_key=os.environ["OPENAI_API_KEY"],
+     api_key=os.environ["MELIOUS_API_KEY"],
+     base_url="https://api.melious.ai/v1",
  )

  response = client.chat.completions.create(
-     model="gpt-4o",
+     model="glm-4.7",
      messages=[{"role": "user", "content": "Hello"}],
  )
```

```diff
  import OpenAI from "openai";

  const client = new OpenAI({
-   apiKey: process.env.OPENAI_API_KEY,
+   apiKey: process.env.MELIOUS_API_KEY,
+   baseURL: "https://api.melious.ai/v1",
  });

  const response = await client.chat.completions.create({
-   model: "gpt-4o",
+   model: "glm-4.7",
    messages: [{ role: "user", content: "Hello" }],
  });
```

That's the whole migration. Streaming, tool calling, vision, structured outputs, embeddings, audio: all work against the same request and response shapes.
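For example, streaming needs no changes beyond the ones above. A minimal check against the reconfigured Python client:

```python
# Standard OpenAI SDK streaming; only the model name differs from an
# OpenAI-hosted setup.
stream = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```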
Picking a model
We don't run gpt-4o, gpt-4o-mini, o1, or any other OpenAI-proprietary model. We run open-weight models instead, and there's no automatic substitution — you pick one.
Rough mapping as a starting point:
| You had | Try |
|---|---|
| `gpt-4o` | `glm-4.7`, `qwen3-235b-a22b-instruct`, or `deepseek-v3.2` |
| `gpt-4o-mini` | `gpt-oss-20b` or a smaller Qwen/Llama variant |
| `o1` / reasoning | `deepseek-r1-0528`, `kimi-k2-thinking`, or another thinking model |
| `gpt-4o` vision | `qwen3-vl-235b-a22b-instruct` or `mistral-small-3.2-24b-instruct` |
| `text-embedding-3-small` | `bge-m3` or `qwen3-embedding-8b` |
| `whisper-1` | `whisper-large-v3-turbo` |
| `dall-e-3` | `flux-schnell` or `flux-dev` |
These are starting points, not claims of equivalence. Browse melious.ai/hub to compare context windows, pricing, and capabilities, or hit `GET /v1/models?include_meta=true` to filter programmatically. See Models for the broader stance.
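If you'd rather filter in code, here is a sketch of the programmatic route. It assumes the endpoint accepts standard Bearer auth and returns the usual OpenAI-style `data` array; which metadata fields you can filter on depends on the actual response shape.

```python
import os

import requests

# Assumed: Bearer auth and an OpenAI-style {"object": "list", "data": [...]}
# payload. Inspect one response before relying on specific metadata fields.
resp = requests.get(
    "https://api.melious.ai/v1/models",
    params={"include_meta": "true"},
    headers={"Authorization": f"Bearer {os.environ['MELIOUS_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```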
Steering routing
The OpenAI SDK passes unknown fields through to us — so the Melious-specific extensions work without wrapping the client. Append a flavor suffix to bias routing toward speed, price, or a lower-carbon provider:
```python
response = client.chat.completions.create(
    model="glm-4.7:eco",  # prefer providers running on greener power
    messages=[{"role": "user", "content": "Hello"}],
)
```

Full list of suffixes and what they do: Routing.
What's different at runtime
Four practical differences to know about:

- **Responses include extra fields.** `environment_impact` and `billing_cost` ride on every response. The OpenAI SDK ignores unknown fields, so your code won't break, but they're there when you want them (a sketch of reading them follows this list). See Environmental impact.
- **Rate limits are per plan, not per key.** Your quickstart key and your production service's key draw from the same bucket; minting more keys doesn't buy more throughput. See Rate limits.
- **API-only accounts bill in credits.** If your plan doesn't include API energy benefits, requests are paid in credits, not the energy pool that web sessions use. Pricing details in Pricing.
- **Prompt caching is not exposed.** OpenAI's automatic caching isn't a user-facing feature on their side either, and we haven't added explicit cache control. If you want it, tell us.
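Reading the extra fields from Python, assuming they arrive as top-level keys on the response (the SDK's pydantic models tolerate unknown fields, and `model_dump()` surfaces them; see Environmental impact for the actual shape):

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Hello"}],
)

# Dump the parsed response back to a plain dict so the Melious-specific
# extras (assumed to be top-level keys) are visible alongside the
# standard OpenAI fields.
data = response.model_dump()
print(data.get("environment_impact"))
print(data.get("billing_cost"))
```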
When it breaks
A migration that "just worked" in three lines and then fails in production is usually one of these:
- **Model ID typo.** Our errors are explicit: `INFERENCE_3001` means we don't have that ID. Check `GET /v1/models`.
- **Scope on the key.** If you created a narrow API key (e.g. `inference.chat` only) and then tried embeddings, you'll see `AUTH_1015`. Add the scope in the dashboard. See Authentication.
- **Advanced models on a basic plan.** A few models are gated. The error says which plan unlocks them.
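To handle these in code, a minimal sketch using the OpenAI SDK's standard exception types; which Melious error codes map to which HTTP statuses is an assumption here:

```python
import openai

try:
    response = client.chat.completions.create(
        model="glm-4.7",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.NotFoundError as err:
    # Assumed mapping: INFERENCE_3001 (unknown model ID) surfaces as a 404.
    # List valid IDs with GET /v1/models before retrying.
    print(f"Unknown model: {err}")
except openai.PermissionDeniedError as err:
    # Assumed mapping: AUTH_1015 (missing key scope) surfaces as a 403.
    # Add the scope in the dashboard; retrying won't help.
    print(f"Key lacks a scope: {err}")
```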
Full error-code list and retry patterns: Errors.