Errors

Error-code structure, the common ones, and retry patterns that work

An error response always includes an HTTP status, a structured body, and enough information to decide whether to retry or fix your request.

Shape

{
  "error": {
    "code": "INFERENCE_3207",
    "message": "Input exceeds the model's context window",
    "details": { "context_length": 131072, "input_tokens": 164228 }
  }
}

code is the stable identifier — match on it, not on the message. Messages are for humans and may be edited. details is optional and varies per error.
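
As a minimal sketch of matching on code rather than on the message, the snippet below parses the body shown above and branches on the identifier. The helper name extract_error is hypothetical and not part of any SDK; it only assumes the envelope shape documented here.

```python
import json
from typing import Any, Optional, Tuple


def extract_error(raw_body: str) -> Tuple[Optional[str], str, dict]:
    """Return (code, message, details) from an error response body.

    Hypothetical helper: tolerates a missing or malformed envelope and
    always returns a dict for details, because details is optional.
    """
    try:
        payload: Any = json.loads(raw_body)
    except json.JSONDecodeError:
        return None, raw_body, {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None, raw_body, {}
    details = error.get("details")
    if not isinstance(details, dict):
        details = {}
    return error.get("code"), error.get("message", ""), details


body = json.dumps({
    "error": {
        "code": "INFERENCE_3207",
        "message": "Input exceeds the model's context window",
        "details": {"context_length": 131072, "input_tokens": 164228},
    }
})
code, message, details = extract_error(body)

# Branch on the stable code; keep the message for logs and humans only.
if code == "INFERENCE_3207":
    print("context window exceeded:", message)
```

Returning an empty dict for absent details keeps callers from having to check for None before reading a key.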

Code ranges

Codes are prefixed by service with reserved number ranges:

| Prefix | Range | What it covers |
|---|---|---|
| AUTH_ | 1xxx | Authentication, API keys, sessions, scopes, vault |
| BILLING_ | 2xxx | Quota, subscriptions, credits, energy, withdrawal |
| INFERENCE_ | 3xxx | Model capability, provider failures, routing, context |
| VALIDATION_ | 4xxx | Request shape, formats, ranges, missing fields |
| SYSTEM_ | 9xxx | Internal; usually temporary |

A few others exist (resource, kit, engine, hub) but most API callers only see the five above.
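
If you want to route errors by service, the prefix and number can be split apart. This is a sketch under two assumptions that the page does not state outright: codes are always an uppercase prefix, an underscore, and four digits, and the less common prefixes follow the same pattern. The names parse_code and ParsedCode are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Service prefixes and the leading digit of their range, from the list above.
RANGES = {"AUTH": 1, "BILLING": 2, "INFERENCE": 3, "VALIDATION": 4, "SYSTEM": 9}

# Assumption: PREFIX_NNNN with exactly four digits.
_CODE_RE = re.compile(r"^([A-Z]+)_(\d{4})$")


class ParsedCode(NamedTuple):
    prefix: str
    number: int
    in_documented_range: bool  # False for prefixes not listed above


def parse_code(code: str) -> Optional[ParsedCode]:
    """Split a code such as INFERENCE_3207 into prefix and number."""
    match = _CODE_RE.match(code)
    if match is None:
        return None
    prefix, number = match.group(1), int(match.group(2))
    expected = RANGES.get(prefix)
    in_range = expected is not None and number // 1000 == expected
    return ParsedCode(prefix, number, in_range)


print(parse_code("INFERENCE_3207"))
```

Codes with an unlisted prefix still parse; they simply come back with in_documented_range set to False, so a caller can log them instead of failing.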

The common ones

| Code | Status | Meaning | Action |
|---|---|---|---|
| AUTH_1007 | 401 | No API key | Add Authorization: Bearer <key> or x-api-key. |
| AUTH_1008 | 401 | Invalid API key | Key is malformed, expired, or revoked. Create a new one. |
| AUTH_1015 | 403 | Insufficient scope | The key exists but can't call this endpoint. details.required_scope tells you which. |
| AUTH_1028 | 429 | Rate-limited | Back off per Retry-After. See Rate limits. |
| BILLING_2001 | 402 | Out of energy | Top up, or switch to a plan with api_usage. |
| BILLING_2003 | 402 | Out of credits | Same as above, in credits. |
| INFERENCE_3001 | 404 | Unknown model ID | Check GET /v1/models. Suffixes like :speed must match VALID_FLAVORS. |
| INFERENCE_3103 | 502 | All providers failed | Transient. Retry. |
| INFERENCE_3104 | 503 | No providers available | Your filters excluded every provider. Loosen them. |
| INFERENCE_3107 | 504 | Upstream timeout | Transient. Retry. |
| INFERENCE_3108 | 429 | Provider rate-limited | Transient. Retry with backoff. |
| INFERENCE_3201 | 400 | Model has no vision | Pick a vision-capable model; check _meta.capabilities.vision on GET /v1/models. |
| INFERENCE_3202 | 400 | Model has no tools | Same; check _meta.capabilities.function_calling. |
| INFERENCE_3207 | 400 | Context window exceeded | Trim the prompt or pick a model with more headroom. |
| INFERENCE_3208 | 400 | Content rejected by safety filter | Cannot be retried verbatim. |
| VALIDATION_4002 | 400 | Missing required field | details.field names it. |
| VALIDATION_4005 | 400 | Invalid format | details.field names it. |
| VALIDATION_4008 | 400 | Quota exceeded | Plan limit hit; see details. |

This isn't the whole enum. The full list is in services/shared/exceptions.py — we don't paste it here because it grows faster than we can update docs.
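
Several of the actions above depend on what details carries. The sketch below turns a code and its details into a one-line hint, using only the detail keys this page mentions (context_length, input_tokens, required_scope, field). The function name describe_fix is hypothetical, and because details is optional every branch has a fallback.

```python
def describe_fix(code: str, details: dict) -> str:
    """Turn an error code plus its optional details into a one-line hint."""
    if code == "INFERENCE_3207":
        limit = details.get("context_length")
        used = details.get("input_tokens")
        if isinstance(limit, int) and isinstance(used, int):
            return (
                f"Trim at least {used - limit} tokens "
                "or pick a model with a larger context window."
            )
        return "Trim the prompt or pick a model with a larger context window."
    if code == "AUTH_1015":
        scope = details.get("required_scope")
        if scope:
            return f"Use a key that has the {scope} scope."
        return "Use a key with a broader scope."
    if code in ("VALIDATION_4002", "VALIDATION_4005"):
        field = details.get("field")
        if field:
            return f"Fix the request field {field!r}."
        return "Fix the request body."
    return "No specific hint; see the table of common codes."


# Values taken from the example body in the Shape section.
print(describe_fix("INFERENCE_3207", {"context_length": 131072, "input_tokens": 164228}))
```

With the example body from the Shape section, the overflow works out to 164228 - 131072 = 33156 tokens.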

Transient vs terminal

A rough rule that holds in practice:

  • Retry INFERENCE_3103 (all providers failed), INFERENCE_3105 (provider error), INFERENCE_3107 (timeout), INFERENCE_3108 (provider rate limit), AUTH_1028 (our rate limit), and any 9xxx system error. Use exponential backoff with a cap.
  • Don't retry auth errors other than AUTH_1028, billing errors, validation errors, capability errors (INFERENCE_3201 through INFERENCE_3207), or content-policy rejections (INFERENCE_3208). Retrying won't change the outcome; fix the request.
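
The rule above can be written as a small pure function, which is easy to unit-test and does not depend on any SDK. This is a sketch: is_retryable is a hypothetical name, and it assumes system errors always carry the SYSTEM_9 prefix shown in the code ranges.

```python
from typing import Optional

# Codes this page names as transient.
RETRYABLE = frozenset({
    "INFERENCE_3103",  # all providers failed
    "INFERENCE_3105",  # provider error
    "INFERENCE_3107",  # upstream timeout
    "INFERENCE_3108",  # provider rate limit
    "AUTH_1028",       # platform rate limit
})


def is_retryable(code: Optional[str]) -> bool:
    """Return True when the code falls on the transient side of the split."""
    if not code:
        return False  # no code to match on: treat as terminal here
    if code in RETRYABLE:
        return True
    # Assumption: every 9xxx system error is spelled SYSTEM_9xxx.
    return code.startswith("SYSTEM_9")


for sample in ("INFERENCE_3107", "SYSTEM_9001", "INFERENCE_3208", "AUTH_1008"):
    print(sample, is_retryable(sample))
```

Treating a missing code as terminal is a conservative choice made for this sketch; a client may reasonably prefer to retry unknown 5xx responses instead.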

We're still working on surfacing some upstream provider errors more cleanly — today a provider-side timeout can come through as INFERENCE_3105 with the provider's raw message in details. If that's rough, tell us.

A retry pattern

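import random
import time

from openai import APIStatusError, OpenAI

TRANSIENT = {"INFERENCE_3103", "INFERENCE_3105", "INFERENCE_3107", "INFERENCE_3108", "AUTH_1028"}
MAX_ATTEMPTS = 6
MAX_DELAY = 30.0  # seconds; upper bound for any single sleep


def error_code(e: APIStatusError):
    # The OpenAI SDK may expose either the full body or only the inner
    # "error" object as e.body, so accept both shapes.
    body = e.body if isinstance(e.body, dict) else {}
    inner = body.get("error")
    return (inner if isinstance(inner, dict) else body).get("code")


def call_with_retry(client: OpenAI, **kwargs):
    delay = 0.5
    for attempt in range(MAX_ATTEMPTS):
        try:
            return client.chat.completions.create(**kwargs)
        except APIStatusError as e:  # RateLimitError is a subclass
            code = error_code(e)
            # Terminal: a known code outside the transient set and the 9xxx system range.
            if code and code not in TRANSIENT and not str(code).startswith("SYSTEM_9"):
                raise
            # Out of attempts: fail now instead of sleeping one more time.
            if attempt == MAX_ATTEMPTS - 1:
                raise RuntimeError("transient retries exhausted") from e

            retry_after = e.response.headers.get("Retry-After")
            try:
                sleep_for = float(retry_after)  # seconds form of Retry-After
            except (TypeError, ValueError):
                # Header missing or not numeric: exponential backoff with jitter.
                sleep_for = delay * (1.5 ** attempt) + random.uniform(0, 0.25)
            time.sleep(min(sleep_for, MAX_DELAY))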

Three things this gets right: it honors Retry-After, it splits terminal from transient errors by code, and it caps the delay so a long outage doesn't eat your process.
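
One detail worth guarding against: HTTP allows Retry-After to be either a number of seconds or an HTTP date. This page does not say whether the API ever sends the date form, so the following is a defensive sketch rather than a description of the API. The helper name retry_after_seconds and the 30-second default cap are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    value: Optional[str],
    now: Optional[datetime] = None,
    cap: float = 30.0,
) -> Optional[float]:
    """Convert a Retry-After header value to a capped delay in seconds.

    Returns None when the header is absent or unparseable, so the caller
    can fall back to its own exponential backoff.
    """
    if not value:
        return None
    try:
        seconds = float(value)  # delay-seconds form, e.g. "2"
    except ValueError:
        try:
            target = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return None
        if target.tzinfo is None:
            target = target.replace(tzinfo=timezone.utc)
        current = now or datetime.now(timezone.utc)
        seconds = (target - current).total_seconds()
    # Never sleep a negative amount, and never longer than the cap.
    return max(0.0, min(seconds, cap))


print(retry_after_seconds("2"))
```

Capping the server-supplied value as well as the computed backoff keeps a single oversized header from stalling the process.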

See Rate limits for response-level backoff headers, Authentication for authentication-specific errors and scope setup, and Models for picking a model with the capability your request needs.
