
Inference API

OpenAI-compatible API for chat completions, embeddings, images, and audio


The Melious Inference API provides OpenAI-compatible endpoints for AI model inference. Use the same code you'd write for OpenAI, with added benefits like multi-provider routing and environmental tracking.


Quick Start

from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)

Available Endpoints


Endpoint Summary

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions with streaming, vision, and tools |
| /v1/embeddings | POST | Generate vector embeddings |
| /v1/rerank | POST | Rerank documents by semantic relevance |
| /v1/images/generations | POST | Generate images from text |
| /v1/audio/transcriptions | POST | Speech-to-text transcription |
| /v1/models | GET | List available models |
| /v1/models/{id} | GET | Get specific model details |

Authentication

All inference endpoints require an API key:

Authorization: Bearer sk-mel-your-api-key-here

Create API keys at melious.ai/account/api/keys.
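With the `openai` client the key is passed once at construction, as in the Quick Start. Without the SDK, the same header can be set by hand. A standard-library-only sketch (`auth_headers` and `post_json` are illustrative helper names, not part of any SDK):

```python
import json
import urllib.request

API_BASE = "https://api.melious.ai/v1"

def auth_headers(api_key: str) -> dict:
    """Headers every inference request needs."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

def post_json(path: str, api_key: str, payload: dict) -> dict:
    """Send an authenticated POST using only the standard library."""
    req = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=auth_headers(api_key),
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# reply = post_json("/chat/completions", "sk-mel-your-api-key-here",
#                   {"model": "gpt-oss-120b",
#                    "messages": [{"role": "user", "content": "Hello!"}]})
```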


OpenAI Compatibility

Melious is a drop-in replacement for the OpenAI API. All standard OpenAI parameters are supported:

{
  "model": "gpt-oss-120b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stream": false
}

Melious Extensions

In addition to standard OpenAI parameters, Melious adds:

Routing Presets

Optimize for different priorities:

{
  "model": "gpt-oss-120b",
  "messages": [...],
  "preset": "environment"
}

| Preset | Description |
|---|---|
| balanced | Default: balance all metrics |
| speed | Lowest latency |
| price | Lowest cost |
| quality | Highest quality |
| environment | Lowest carbon footprint |

Provider Filters

Restrict to specific regions or constraints:

{
  "model": "gpt-oss-120b",
  "messages": [...],
  "filters": {
    "countries": ["NL", "FR", "DE"],
    "max_carbon_intensity": 200,
    "min_speed_tps": 500
  }
}

Environmental Impact

Every response includes environmental metrics:

{
  "id": "chatcmpl-abc123",
  "choices": [...],
  "usage": {...},
  "environment_impact": {
    "carbon_g_co2": 0.06,
    "water_liters": 0.0002,
    "energy_kwh": 0.00015,
    "renewable_percent": 85,
    "pue": 1.18,
    "provider_id": "nebius",
    "location": "NL"
  }
}
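Because the metrics are plain numbers, tracking cumulative footprint is a matter of summing them across responses. A sketch using the field names from the sample above (`total_impact` is our helper, not an API feature):

```python
def total_impact(responses: list[dict]) -> dict:
    """Aggregate per-call environment_impact metrics across many
    responses, e.g. for a sustainability dashboard. Responses
    missing the block simply contribute zero."""
    totals = {"carbon_g_co2": 0.0, "water_liters": 0.0, "energy_kwh": 0.0}
    for response in responses:
        impact = response.get("environment_impact", {})
        for metric in totals:
            totals[metric] += impact.get(metric, 0.0)
    return totals

# Three calls with the sample impact from above:
calls = [{"environment_impact": {"carbon_g_co2": 0.06,
                                 "water_liters": 0.0002,
                                 "energy_kwh": 0.00015}}] * 3
print(total_impact(calls))
```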

Available Models

Chat Models (27)

| Model | Brand | Context | Vision | Tools | Reasoning |
|---|---|---|---|---|---|
| gpt-oss-120b | Meta | 131K | No | Yes | No |
| llama-3.1-8b-instruct | Meta | 131K | No | No | No |
| qwen3-235b-a22b-instruct | Qwen | 268K | No | Yes | No |
| qwen3-235b-a22b-thinking | Qwen | 262K | No | Yes | Yes |
| qwen3-coder-480b-a35b-instruct | Qwen | 262K | No | Yes | No |
| qwen3-coder-30b-a3b-instruct | Qwen | 262K | No | Yes | No |
| qwen3-30b-a3b-instruct | Qwen | 262K | No | Yes | No |
| qwen3-30b-a3b-thinking | Qwen | 262K | No | Yes | Yes |
| qwen3-32b | Qwen | 40K | No | Yes | Hybrid |
| qwen3-4b-instruct | Qwen | 262K | No | Yes | No |
| qwen3-4b-thinking | Qwen | 262K | No | Yes | Yes |
| deepseek-r1-0528 | DeepSeek | 131K | No | Yes | Yes |
| kimi-k2-instruct | Moonshot | 134K | No | Yes | No |
| kimi-k2-thinking | Moonshot | 134K | No | Yes | Yes |
| hermes-4-405b | NousResearch | 131K | No | Yes | Hybrid |
| hermes-4-70b | NousResearch | 131K | No | Yes | Hybrid |
| gpt-oss-120b | OpenAI (OSS) | 128K | No | Yes | Yes |
| gpt-oss-20b | OpenAI (OSS) | 128K | No | Yes | Yes |
| gemma-3-27b | Google | 131K | Yes | Yes | No |
| mistral-small-3.2-24b-instruct | Mistral | 131K | Yes | Yes | No |
| devstral-small-2505 | Mistral | 131K | No | Yes | No |
| glm-4.5 | ZAI | 131K | No | Yes | Hybrid |
| glm-4.5-air | ZAI | 131K | No | Yes | Hybrid |
| glm-4.6 | ZAI | 131K | No | Yes | Hybrid |
| deepseek-ocr | DeepSeek | 32K | Yes | Yes | No |
| nemotron-nano-v2-12b | NVIDIA | 131K | No | Yes | No |
| intellect-3 | PrimeIntellect | 131K | No | Yes | No |

Audio-Input Model (1)

| Model | Brand | Context | Description |
|---|---|---|---|
| voxtral-small-24b-2507 | Mistral | 131K | Accepts audio input for chat |

Embedding Models (8)

| Model | Brand | Context |
|---|---|---|
| bge-m3 | BAAI | 8K |
| bge-multilingual-gemma2 | BAAI | 8K |
| bge-en-icl | BAAI | 32K |
| bge-large-en-v1.5 | BAAI | 512 |
| bge-base-en-v1.5 | BAAI | 512 |
| qwen3-embedding-8b | Qwen | 32K |
| e5-mistral-7b-instruct | intfloat | 32K |
| paraphrase-multilingual-mpnet | Sentence Transformers | 512 |

Reranking Models (1)

| Model | Brand | Context |
|---|---|---|
| bge-reranker-v2-m3 | BAAI | 8K |

Image Models (6)

| Model | Brand | Description |
|---|---|---|
| flux-dev | Black Forest Labs | High-quality generation |
| flux-schnell | Black Forest Labs | Fast generation |
| sdxl-base-1.0 | Stability AI | SDXL base model |
| sdxl-base-v10 | Stability AI | SDXL base v10 |
| sdxl-lightning-4step | ByteDance | Fast 4-step generation |
| sdxl-lightning-8step | ByteDance | Quality 8-step generation |

Audio Models (2)

| Model | Brand | Type |
|---|---|---|
| whisper-large-v3 | OpenAI | Speech-to-Text |
| whisper-large-v3-turbo | OpenAI | Speech-to-Text (fast) |

All 46 models are open-source and hosted on European infrastructure. See Models for detailed capabilities.


Error Handling

All inference errors follow OpenAI's error format:

{
  "error": {
    "message": "Insufficient energy balance",
    "type": "insufficient_quota",
    "code": "BILLING_INSUFFICIENT_ENERGY"
  }
}

Common Error Codes

| Code | HTTP Status | Description |
|---|---|---|
| AUTH_INVALID_API_KEY | 401 | Invalid or missing API key |
| AUTH_INSUFFICIENT_PERMISSIONS | 403 | API key lacks required scope |
| BILLING_INSUFFICIENT_ENERGY | 403 | Not enough energy balance |
| VALIDATION_INVALID_VALUE | 400 | Invalid parameter value |
| VALIDATION_REQUIRED_FIELD | 400 | Missing required field |
| INFERENCE_PROVIDER_ERROR | 502 | Provider request failed |
| INFERENCE_NO_PROVIDERS_AVAILABLE | 503 | No providers match filters |
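The `code` field is the most reliable thing to branch on. A minimal retry classifier over the codes above (which codes count as transient is our judgment, not part of the API contract):

```python
# One reasonable policy (a sketch): retry only codes that signal a
# transient provider problem; auth, billing, and validation errors
# will fail the same way on every attempt.
RETRYABLE_CODES = {
    "INFERENCE_PROVIDER_ERROR",           # 502: provider request failed
    "INFERENCE_NO_PROVIDERS_AVAILABLE",   # 503: no providers match filters
}

def should_retry(error_body: dict) -> bool:
    """Inspect the OpenAI-style error envelope and decide whether to retry."""
    code = error_body.get("error", {}).get("code", "")
    return code in RETRYABLE_CODES
```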

Rate Limits

| Tier | Requests/Minute | Requests/Day |
|---|---|---|
| Free | 60 | 5,000 |
| Pro | 300 | 50,000 |
| Enterprise | Custom | Custom |

Rate limit headers are included in all responses:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1699999999

Best Practices

  1. Use streaming for long responses to improve perceived latency
  2. Set max_tokens to avoid unexpectedly long responses
  3. Handle rate limits with exponential backoff
  4. Cache embeddings for frequently used texts
  5. Use routing presets to optimize for your priorities
  6. Monitor environmental impact to track sustainability
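Item 4 above is the most mechanical to automate. A minimal in-memory embedding cache, where `embed_fn` stands in for a real `/v1/embeddings` call (the class and names are illustrative, not an SDK feature):

```python
import hashlib

class EmbeddingCache:
    """Embed each distinct text only once; repeats hit the cache."""

    def __init__(self, embed_fn):
        self._embed = embed_fn          # e.g. a wrapper around /v1/embeddings
        self._store: dict[str, list] = {}

    def get(self, text: str) -> list:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed(text)
        return self._store[key]

calls = []
def fake_embed(text):                   # stand-in for the real API call
    calls.append(text)
    return [float(len(text))]

cache = EmbeddingCache(fake_embed)
cache.get("hello")
cache.get("hello")
# fake_embed ran only once
```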
