Routing

How Melious picks which provider answers your request, and how to steer that choice

Every inference request is served by one of our European providers. Routing is the step that picks which one. You choose a flavor (balanced, speed, price, or eco, plus batch for async work) and Melious handles the rest.

The five flavors

Each flavor is a weighting across three dimensions: price, speed, and environment.

| Flavor   | Price | Speed | Environment | What it's for |
|----------|-------|-------|-------------|---------------|
| balanced | 40%   | 40%   | 20%         | Default for chat. Reasonable on all three. |
| speed    | 20%   | 70%   | 10%         | Latency-sensitive apps: chat UIs, agents on the critical path. |
| price    | 70%   | 20%   | 10%         | Bulk work where milliseconds don't matter. Default for embeddings. |
| eco      | 20%   | 20%   | 60%         | Greener power first, even if it's slower. |
| batch    | 80%   | 5%    | 15%         | Deep discount for async work. Use via Batches. |

We default to balanced on chat because picking between fast and cheap is a call we didn't want to make for you. Embeddings default to price because the latency difference there is usually invisible and the cost difference isn't.
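To make the weighting concrete, here is a minimal sketch of how a flavor could turn per-provider scores into a ranking. The weights come from the table above. Everything else is an assumption for illustration: the provider names, the idea that each dimension is a normalized score in [0, 1] where higher is better, and the rank_providers helper itself. The real router may compute and combine these differently.

```python
# Flavor weights copied from the table above (as fractions of 1.0).
FLAVOR_WEIGHTS = {
    "balanced": {"price": 0.40, "speed": 0.40, "environment": 0.20},
    "speed":    {"price": 0.20, "speed": 0.70, "environment": 0.10},
    "price":    {"price": 0.70, "speed": 0.20, "environment": 0.10},
    "eco":      {"price": 0.20, "speed": 0.20, "environment": 0.60},
    "batch":    {"price": 0.80, "speed": 0.05, "environment": 0.15},
}


def rank_providers(providers, flavor):
    """Return provider names sorted best-first for the given flavor.

    ASSUMPTION: every provider carries a score in [0, 1] per dimension,
    where higher is better (cheaper, faster, greener).
    """
    weights = FLAVOR_WEIGHTS[flavor]

    def weighted_score(item):
        _name, scores = item
        return sum(weights[dim] * scores[dim] for dim in weights)

    ranked = sorted(providers.items(), key=weighted_score, reverse=True)
    return [name for name, _scores in ranked]


# Hypothetical providers with made-up scores, for illustration only.
PROVIDERS = {
    "fast-co":  {"price": 0.3, "speed": 0.9, "environment": 0.4},
    "cheap-co": {"price": 0.9, "speed": 0.3, "environment": 0.5},
    "green-co": {"price": 0.4, "speed": 0.4, "environment": 0.95},
}

if __name__ == "__main__":
    for flavor in FLAVOR_WEIGHTS:
        print(flavor, rank_providers(PROVIDERS, flavor))
```

With these made-up scores, the same three providers come out in a different order depending on the flavor, which is the whole point of the weighting.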

Two ways to choose a flavor

Suffix the model ID. Shortest, works in any client without wrapping.

client.chat.completions.create(
    model="glm-4.7:eco",
    messages=[...],
)

Valid suffixes are exactly the five above. Any other colon in a model ID is preserved as-is — we only strip the suffix when the last segment matches the list.
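The suffix rule can be sketched in a few lines. This is an illustration of the behavior described above, not the router's actual code: split_flavor is a hypothetical helper, the "org:model-x" ID is invented, and exact, case-sensitive matching is an assumption.

```python
# The five flavor names listed on this page.
FLAVORS = {"balanced", "speed", "price", "eco", "batch"}


def split_flavor(model_id):
    """Split 'model:flavor' into (model, flavor).

    Only the last colon-separated segment is examined. If it is not a
    known flavor, the ID is returned untouched with flavor None.
    """
    base, sep, last = model_id.rpartition(":")
    if sep and last in FLAVORS:
        return base, last
    return model_id, None


if __name__ == "__main__":
    print(split_flavor("glm-4.7:eco"))        # suffix stripped
    print(split_flavor("glm-4.7"))            # no suffix
    print(split_flavor("org:model-x"))        # other colons preserved
    print(split_flavor("org:model-x:speed"))  # only the last segment is stripped
```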

Use the preset field on chat. Semantically different from the suffix — preset takes "reasoning" or "non_reasoning" and maps to internal routing modes tuned for those shapes of work.

{
  "model": "glm-4.7",
  "messages": [...],
  "preset": "reasoning"
}

  • preset: "reasoning" is shorthand used by reasoning models (deepseek-r1, kimi-k2-thinking) to pick a quality-leaning route.
  • preset: "non_reasoning" is equivalent to :speed.

For embeddings, preset: "quality" picks a quality-leaning route; otherwise embeddings stay on the price default.

If a flavor suffix and a preset field are both set, the suffix wins.
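As a rough sketch of that precedence for a chat request: the suffix is checked first, then the preset, then the chat default. The resolve_routing helper and the shape of its return value are hypothetical, and what a preset maps to internally is not shown here because it isn't documented on this page.

```python
FLAVORS = {"balanced", "speed", "price", "eco", "batch"}
CHAT_DEFAULT_FLAVOR = "balanced"  # chat default, per this page


def resolve_routing(model_id, preset=None):
    """Decide which routing hint applies to a chat request.

    Precedence (as described above): suffix > preset > default.
    The returned dict is an illustrative shape, not an API contract.
    """
    base, sep, last = model_id.rpartition(":")
    if sep and last in FLAVORS:
        # A valid suffix wins even when a preset is also supplied.
        return {"model": base, "source": "suffix", "mode": last}
    if preset is not None:
        return {"model": model_id, "source": "preset", "mode": preset}
    return {"model": model_id, "source": "default", "mode": CHAT_DEFAULT_FLAVOR}


if __name__ == "__main__":
    print(resolve_routing("glm-4.7:eco", preset="reasoning"))
    print(resolve_routing("glm-4.7", preset="reasoning"))
    print(resolve_routing("glm-4.7"))
```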

What about country or carbon-intensity filters?

You'll see references to provider filters (countries, max_carbon_intensity, min_speed_tps) elsewhere in the ecosystem — these exist internally in the router but aren't exposed on the public HTTP request yet. The :eco flavor is the current path to "route to greener providers." If you need strict country constraints, tell us — it's on the list.

What happens when a provider fails

If you like feeling you've got one provider by the throat, you won't like how routing works.

A request can touch more than one provider. If the first pick errors or times out, we try the next-best provider in the ranking. Your response arrives from whichever one finishes successfully — the environment_impact.provider_id field tells you which one actually served it.

If every eligible provider fails, you get INFERENCE_3103 (all providers failed). Retry logic should treat it as transient. See Errors for patterns.
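On the client side, treating INFERENCE_3103 as transient might look like the sketch below. It is deliberately independent of any SDK: InferenceError, call_with_retry, and served_by are hypothetical names, the response is assumed to be a plain dict containing environment_impact.provider_id, and the backoff values are arbitrary. Adapt the exception handling to however your client surfaces error codes.

```python
import time

# Error codes this sketch treats as safe to retry.
TRANSIENT_CODES = {"INFERENCE_3103"}


class InferenceError(Exception):
    """Hypothetical error type carrying the platform error code."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry with exponential backoff on transient codes.

    Non-transient errors, and the final failed attempt, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except InferenceError as err:
            if err.code not in TRANSIENT_CODES or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def served_by(response):
    """Read which provider served the response (assumes a dict payload)."""
    return response.get("environment_impact", {}).get("provider_id")
```

Because failover between providers already happens inside a single request, a client-side retry like this only matters once every eligible provider has failed, so a small attempt count with backoff is usually enough.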

How weights are configured

Weights for each flavor are defaults compiled into the router, but an operator can override them via config keys inference.routing.weights.balanced, inference.routing.weights.speed, and so on. Each override is a JSON object with price, speed, and environment keys summing to 1.0. Out-of-range overrides fall back to the defaults — we'd rather route than crash.
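A sketch of the "fall back rather than crash" behavior, for operators who want to sanity-check an override before deploying it. The load_weights helper is hypothetical, and so is the exact definition of "out-of-range" used here (each value within [0, 1], exactly the three expected keys, sum within a small tolerance of 1.0). The router's real validation may differ.

```python
import json
import math

KEYS = ("price", "speed", "environment")
DEFAULT_BALANCED = {"price": 0.40, "speed": 0.40, "environment": 0.20}


def load_weights(raw_json, default):
    """Parse a weight override, returning a copy of default if it is invalid."""
    try:
        parsed = json.loads(raw_json)
    except (TypeError, ValueError):
        return dict(default)  # not JSON at all
    if not isinstance(parsed, dict) or set(parsed) != set(KEYS):
        return dict(default)  # wrong shape or wrong keys
    values = [parsed[key] for key in KEYS]
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return dict(default)  # non-numeric weight
    if any(v < 0 or v > 1 for v in values):
        return dict(default)  # a weight outside [0, 1]
    if not math.isclose(sum(values), 1.0, abs_tol=1e-6):
        return dict(default)  # weights do not sum to 1.0
    return {key: float(parsed[key]) for key in KEYS}


if __name__ == "__main__":
    good = '{"price": 0.5, "speed": 0.3, "environment": 0.2}'
    bad = '{"price": 0.9, "speed": 0.3, "environment": 0.2}'
    print(load_weights(good, DEFAULT_BALANCED))  # override accepted
    print(load_weights(bad, DEFAULT_BALANCED))   # falls back to the default
```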

See which countries host which providers in Providers. See how we compute the environment dimension in Environmental impact. Endpoint-level fields: Chat completions, Messages, Embeddings.
