
Chat Completions

OpenAI-compatible chat completions with vision, function calling, and streaming


Generate AI responses using the OpenAI-compatible chat completions endpoint. Supports text, vision, function calling, structured output, and streaming.

POST /v1/chat/completions

Quick Example

Python:

from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

TypeScript:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-mel-your-api-key-here',
  baseURL: 'https://api.melious.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'gpt-oss-120b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' }
  ]
});

console.log(response.choices[0].message.content);

cURL:

curl https://api.melious.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-mel-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Request Parameters

Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| model | string | Model ID (e.g., gpt-oss-120b, qwen3-235b-a22b-instruct) |
| messages | array | Array of message objects |

Optional Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| temperature | number | 1.0 | Sampling temperature (0-2) |
| max_tokens | integer | Model max | Maximum tokens to generate |
| top_p | number | 1.0 | Nucleus sampling parameter |
| frequency_penalty | number | 0 | Frequency penalty (-2 to 2) |
| presence_penalty | number | 0 | Presence penalty (-2 to 2) |
| stop | string/array | null | Stop sequences |
| stream | boolean | false | Enable streaming |
| n | integer | 1 | Number of completions |
| seed | integer | null | Random seed for reproducibility |
| user | string | null | End-user identifier |
| tools | array | null | Available function tools |
| tool_choice | string/object | auto | Tool selection mode |
| response_format | object | null | Structured output format |
| logprobs | boolean | false | Return log probabilities |
| top_logprobs | integer | null | Number of top logprobs |

Melious Extensions

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| preset | string | balanced | Routing preset (speed/price/quality/environment) |
| filters | object | null | Provider constraints |

Message Format

Message Roles

| Role | Description |
|------|-------------|
| system | System instructions (first message) |
| user | User input |
| assistant | AI responses (for conversation history) |
| tool | Tool/function call results |

Text Message

{
  "role": "user",
  "content": "Hello, how are you?"
}

Multi-turn Conversation

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "And 3+3?"}
  ]
}
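
To continue a conversation, append the assistant's reply and the next user turn to the same list before the follow-up request. A minimal sketch, reusing the client from the Quick Example:

messages = [{"role": "system", "content": "You are a helpful assistant."}]

for user_input in ["What is 2+2?", "And 3+3?"]:
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=messages
    )
    reply = response.choices[0].message.content
    print(reply)
    # Feed the answer back in so the next turn has full context.
    messages.append({"role": "assistant", "content": reply})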

Vision (Image Input)

Send images for visual understanding with vision-capable models.

Supported Models

Vision-capable models that can process images:

  • mistral-small-3.2-24b-instruct (Mistral)

Image URL

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {
      "type": "image_url",
      "image_url": {"url": "https://example.com/image.jpg"}
    }
  ]
}

Base64 Image

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image."},
    {
      "type": "image_url",
      "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}
    }
  ]
}
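
For local files, build the data URL yourself. A minimal sketch (the file path is illustrative):

import base64

# Read a local image and wrap it in an image_url content part.
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

image_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{b64}"}
}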

Multiple Images

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Compare these two images."},
    {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
    {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}}
  ]
}

Python Example

response = client.chat.completions.create(
    model="mistral-small-3.2-24b-instruct",  # Vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ],
    max_tokens=300
)

Function Calling

Let models call your functions to perform actions or retrieve data.

Define Tools

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name, e.g., San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "Temperature unit"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Using Tool Slugs

Instead of defining tools inline, pass tool slugs from your collection:

{
  "model": "auto",
  "messages": [{"role": "user", "content": "Search for AI news"}],
  "tools": ["web-search", "scrape-url"]
}

Get available tools with GET /v1/tools/openai. Add tools to your collection first via POST /v1/user/tools/{slug}.
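
Since tool slugs are a Melious extension, one straightforward way to send them from Python is a plain HTTP request. A sketch (assumes the requests package and that both slugs are already in your collection):

import requests

resp = requests.post(
    "https://api.melious.ai/v1/chat/completions",
    headers={"Authorization": "Bearer sk-mel-your-api-key-here"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Search for AI news"}],
        "tools": ["web-search", "scrape-url"]
    }
)
print(resp.json()["choices"][0]["message"])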

Mixed formats: Combine slugs with full definitions:

{
  "tools": [
    "web-search",
    {"type": "function", "function": {"name": "custom_tool", "description": "...", "parameters": {}}}
  ]
}

Tool Choice

| Value | Description |
|-------|-------------|
| "auto" | Model decides whether to call tools |
| "none" | Never call tools |
| "required" | Must call at least one tool |
| {"type": "function", "function": {"name": "get_weather"}} | Force a specific function |

Complete Example

Python:

import json
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

# Step 1: Initial request
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

# Step 2: Check for tool calls
message = response.choices[0].message
if message.tool_calls:
    # Execute your function
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Your function implementation
    weather_result = {"temperature": 22, "condition": "sunny"}

    # Step 3: Send result back
    response = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[
            {"role": "user", "content": "What's the weather in Paris?"},
            message,
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(weather_result)
            }
        ],
        tools=tools
    )

print(response.choices[0].message.content)

TypeScript:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-mel-your-api-key-here',
  baseURL: 'https://api.melious.ai/v1'
});

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        },
        required: ['location']
      }
    }
  }
];

// Step 1: Initial request
let response = await client.chat.completions.create({
  model: 'gpt-oss-120b',
  messages: [{ role: 'user', content: "What's the weather in Paris?" }],
  tools
});

// Step 2: Check for tool calls
const message = response.choices[0].message;
if (message.tool_calls) {
  const toolCall = message.tool_calls[0];
  const args = JSON.parse(toolCall.function.arguments);

  // Your function implementation
  const weatherResult = { temperature: 22, condition: 'sunny' };

  // Step 3: Send result back
  response = await client.chat.completions.create({
    model: 'gpt-oss-120b',
    messages: [
      { role: 'user', content: "What's the weather in Paris?" },
      message,
      {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(weatherResult)
      }
    ],
    tools
  });
}

console.log(response.choices[0].message.content);

Structured Output

Get responses in a specific JSON format.

JSON Object Mode

{
  "model": "gpt-oss-120b",
  "messages": [...],
  "response_format": {"type": "json_object"}
}

When using JSON mode, instruct the model in your prompt to produce JSON output.
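
A minimal sketch of JSON object mode, with the prompt explicitly requesting JSON:

import json

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        # JSON mode still needs the prompt itself to ask for JSON.
        {"role": "user", "content": "Return a JSON object with keys 'city' and 'country' for Paris."}
    ],
    response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)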

JSON Schema Mode

{
  "model": "gpt-oss-120b",
  "messages": [
    {"role": "user", "content": "Extract the person's name and age."}
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person_info",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age": {"type": "integer"}
        },
        "required": ["name", "age"]
      }
    }
  }
}

Python Example

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "List 3 famous scientists with their fields."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "scientists",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "scientists": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "field": {"type": "string"}
                            },
                            "required": ["name", "field"]
                        }
                    }
                },
                "required": ["scientists"]
            }
        }
    }
)

import json
data = json.loads(response.choices[0].message.content)

Automatic JSON Repair

Melious Feature: When using json_object or json_schema modes, malformed JSON responses are automatically repaired.

Melious automatically fixes common JSON issues from LLM outputs:

| Issue | Example | Auto-Fixed |
|-------|---------|------------|
| Markdown wrapping | ```json {...}``` | {...} |
| Missing brackets | {"key": "value" | {"key": "value"} |
| Trailing commas | {"a": 1, "b": 2,} | {"a": 1, "b": 2} |
| Single quotes | {'key': 'value'} | {"key": "value"} |
| Unquoted keys | {key: "value"} | {"key": "value"} |
| Comments | {"key": "value" // comment} | {"key": "value"} |

This means you get valid JSON even when models occasionally produce malformed output; no manual parsing or error handling is required.


Streaming

Get responses in real-time as they're generated.

Python:

stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Write a short story."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

TypeScript:

const stream = await client.chat.completions.create({
  model: 'gpt-oss-120b',
  messages: [{ role: 'user', content: 'Write a short story.' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

cURL:

curl https://api.melious.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-mel-your-api-key-here" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Write a short story."}],
    "stream": true
  }'

See Streaming for more details.


Response Format

Non-Streaming Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  },
  "environment_impact": {
    "carbon_g_co2": 0.06,
    "water_liters": 0.0002,
    "energy_kwh": 0.00015,
    "renewable_percent": 85,
    "pue": 1.18,
    "provider_id": "nebius",
    "location": "NL"
  }
}
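
The usage block is standard OpenAI token accounting; environment_impact is a Melious extension. A hedged sketch of reading both with the Python SDK (assumption: the SDK exposes unknown response fields as extra attributes):

print(response.usage.total_tokens)  # standard usage accounting

# environment_impact is a Melious extension field (assumption: the SDK
# passes this extra field through on the parsed response object).
impact = getattr(response, "environment_impact", None)
if impact:
    print(impact["carbon_g_co2"], impact["provider_id"])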

Finish Reasons

| Reason | Description |
|--------|-------------|
| stop | Natural end or stop sequence |
| length | Hit max_tokens limit |
| tool_calls | Model wants to call tools |
| content_filter | Content filtered |
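
For example, you can check for truncation or pending tool calls before using the reply; a minimal sketch:

choice = response.choices[0]
if choice.finish_reason == "length":
    # Reply was cut off at max_tokens: raise the limit or ask to continue.
    pass
elif choice.finish_reason == "tool_calls":
    # Execute the requested tools (see Function Calling above).
    pass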

Routing Presets

Optimize routing based on your priorities:

{
  "model": "gpt-oss-120b",
  "messages": [...],
  "preset": "environment"
}

| Preset | Optimizes For |
|--------|---------------|
| balanced | All metrics (default) |
| speed | Lowest latency |
| price | Lowest cost |
| quality | Highest quality |
| environment | Lowest carbon footprint |
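
Because preset is a Melious extension, the OpenAI Python SDK doesn't accept it as a named argument; passing it via extra_body works. A sketch:

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    # preset is not a standard OpenAI parameter, so send it in extra_body.
    extra_body={"preset": "environment"}
)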

Provider Filters

Restrict to specific regions or constraints:

{
  "model": "gpt-oss-120b",
  "messages": [...],
  "filters": {
    "countries": ["NL", "FR", "DE"],
    "max_carbon_intensity": 200,
    "max_input_cost": 1.0,
    "min_speed_tps": 500
  }
}

| Filter | Type | Description |
|--------|------|-------------|
| countries | string[] | Allowed country codes (e.g., ["NL", "FR", "DE"]) |
| max_carbon_intensity | number | Max g CO2/kWh grid intensity |
| max_input_cost | number | Max EUR per million input tokens |
| min_speed_tps | number | Min tokens per second |
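
Filters can be sent the same way; a sketch:

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    # filters is a Melious extension, so send it in extra_body.
    extra_body={
        "filters": {
            "countries": ["NL", "FR", "DE"],
            "max_carbon_intensity": 200
        }
    }
)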

Error Handling

Common Errors

| Error Code | Description | Solution |
|------------|-------------|----------|
| AUTH_INVALID_API_KEY | Invalid API key | Check your API key |
| BILLING_INSUFFICIENT_ENERGY | Not enough energy | Top up or wait for regeneration |
| VALIDATION_REQUIRED_FIELD | Missing field | Add required model or messages |
| INFERENCE_PROVIDER_ERROR | Provider failed | Retry or use a different preset |

Error Response

{
  "error": {
    "message": "Model field is required",
    "type": "invalid_request_error",
    "code": "VALIDATION_REQUIRED_FIELD"
  }
}
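
With the OpenAI Python SDK these arrive as exceptions, so a simple pattern is to treat auth failures as fatal and retry provider errors with backoff; a minimal sketch:

import time
import openai

for attempt in range(3):
    try:
        response = client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        break
    except openai.AuthenticationError:
        raise  # AUTH_INVALID_API_KEY: fix the key, retrying won't help
    except openai.APIStatusError:
        time.sleep(2 ** attempt)  # e.g. INFERENCE_PROVIDER_ERROR: back off, retry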

Best Practices

Set Temperature Appropriately

  • 0.0-0.3: Factual, deterministic responses
  • 0.7-1.0: Creative, varied responses
  • 1.0-2.0: Very creative, may be incoherent
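
For example, a low temperature suits factual Q&A and extraction:

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "What year did the Berlin Wall fall?"}],
    temperature=0.2  # low temperature for factual, deterministic output
)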

Use System Messages

Guide model behavior with clear system instructions:

messages=[
    {"role": "system", "content": "You are a helpful coding assistant. Always include code examples."},
    {"role": "user", "content": "..."}
]

Set max_tokens

Prevent unexpectedly long responses and control costs:

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[...],
    max_tokens=500
)

Enable Streaming for Long Responses

Improve perceived latency with streaming:

stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[...],
    stream=True
)
