Embeddings
POST /v1/embeddings — vector embeddings from text
Generate vector embeddings from text. OpenAI-compatible request and response.
Endpoint:
POST /v1/embeddings

Auth: Bearer token or `x-api-key`. Requires scope `inference.embeddings`.
Example
Python

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

response = client.embeddings.create(
    model="bge-m3",
    input=["A Hanseatic city is a city that...", "Hamburg is a port city..."],
)
print(len(response.data), "vectors,", len(response.data[0].embedding), "dims")
```

TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-mel-<YOUR_API_KEY>",
  baseURL: "https://api.melious.ai/v1",
});

const response = await client.embeddings.create({
  model: "bge-m3",
  input: ["A Hanseatic city is a city that...", "Hamburg is a port city..."],
});
console.log(response.data.length, "vectors,", response.data[0].embedding.length, "dims");
```

curl

```shell
curl https://api.melious.ai/v1/embeddings \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": ["A Hanseatic city is a city that...", "Hamburg is a port city..."]
  }'
```

Request
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | — | Model ID, optionally with a flavor suffix. See Routing. |
| `input` | string \| array | — | Text to embed. Arrays run as a single batched call. |
| `encoding_format` | string | `"float"` | `"float"` for `number[]`, `"base64"` for base64-encoded float32. |
| `dimensions` | integer | model default | Request a specific dimensionality (model must support truncation). |
| `user` | string | none | End-user identifier. |
| `preset` | string | none | `"quality"` biases routing toward higher-quality providers. |
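Because an array `input` runs as a single batched call, a large corpus is usually embedded in fixed-size chunks. A minimal sketch; the batch size of 64 is an illustrative choice, not a documented limit:

```python
def batched(texts: list[str], size: int = 64):
    """Yield fixed-size chunks of a list of texts."""
    for i in range(0, len(texts), size):
        yield texts[i : i + size]

# Usage with the client from the example above (network call, shown commented):
#   vectors = []
#   for chunk in batched(corpus):
#       resp = client.embeddings.create(model="bge-m3", input=chunk)
#       vectors.extend(d.embedding for d in resp.data)

print([len(c) for c in batched(["doc"] * 130)])  # → [64, 64, 2]
```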
Embeddings default to price routing: latency differences between providers are usually invisible for embeddings, while price differences are not. Override with the `:speed` flavor or `preset: "quality"` if that is wrong for your workload.
Which models support embeddings? Filter at melious.ai/hub by type, or call `GET /v1/models?include_meta=true` and check `_meta.type == "embeddings"`.
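That filter is easy to script. A sketch of the client-side filtering, with the live request shown commented out (it assumes an HTTP client such as `requests` and a `MELIOUS_API_KEY` environment variable — both illustrative choices):

```python
# Sketch: narrow GET /v1/models?include_meta=true output down to embedding models.
def embedding_model_ids(models: list[dict]) -> list[str]:
    """Return ids of models whose _meta.type is 'embeddings'."""
    return [m["id"] for m in models if m.get("_meta", {}).get("type") == "embeddings"]

# Live call (requires an HTTP client and an API key; shown commented):
#   import os, requests
#   resp = requests.get(
#       "https://api.melious.ai/v1/models",
#       params={"include_meta": "true"},
#       headers={"Authorization": f"Bearer {os.environ['MELIOUS_API_KEY']}"},
#   )
#   print(embedding_model_ids(resp.json()["data"]))

# Shape of the filter on a stub payload:
stub = [
    {"id": "bge-m3", "_meta": {"type": "embeddings"}},
    {"id": "some-chat-model", "_meta": {"type": "chat"}},
]
print(embedding_model_ids(stub))  # → ['bge-m3']
```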
Response
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.012, -0.034, ...]
    },
    { "object": "embedding", "index": 1, "embedding": [...] }
  ],
  "model": "bge-m3",
  "usage": { "prompt_tokens": 24, "total_tokens": 24 },
  "environment_impact": { "energy_kwh": 0.00001, "carbon_g_co2": 0.004, "water_liters": 0.00002, "renewable_percent": 92, "pue": 1.15, "provider_id": "ionos", "location": "DE" },
  "billing_cost": { "energy": "0.0000", "credits": "0.0", "paid_with": "energy" }
}
```

data[].embedding
Array of floats when `encoding_format: "float"`. Base64 of little-endian float32 bytes when `"base64"` — decode with `numpy.frombuffer(base64.b64decode(s), dtype="<f4")` or equivalent.
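A round-trip sketch of that decode. The base64 string here is constructed locally from synthetic bytes, standing in for a real `data[].embedding` returned with `encoding_format: "base64"`:

```python
import base64
import numpy as np

# Synthetic stand-in for an API-returned base64 embedding.
original = np.array([0.012, -0.034, 0.56], dtype="<f4")
b64 = base64.b64encode(original.tobytes()).decode()

# The decode recommended above: little-endian float32 bytes.
vec = np.frombuffer(base64.b64decode(b64), dtype="<f4")
print(np.allclose(vec, original))  # → True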
Embeddings from this endpoint are generally not normalized to unit length. For cosine similarity, normalize on your side:
```python
import numpy as np

v = np.array(response.data[0].embedding)
v /= np.linalg.norm(v)
```

usage.total_tokens

Same as `prompt_tokens` — there are no output tokens on embeddings. Kept for OpenAI shape compatibility.
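The normalization shown above is usually in service of cosine similarity between two embeddings. A self-contained sketch:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity; normalizes both vectors first, since embeddings
    from this endpoint are generally not unit-length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # → 0.7071
```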
Errors
- `VALIDATION_4002` — `model` or `input` missing.
- `VALIDATION_4005` — empty string or array.
- `INFERENCE_3001` — unknown model.
- `INFERENCE_3207` — any single input exceeds the model's max context.
- `AUTH_1015` — missing `inference.embeddings` scope.
Related
Two-stage retrieval with Rerank • Models for capability discovery • Routing for flavors.