Melious
API Reference

Transcription

POST /v1/audio/transcriptions — turn speech into text, OpenAI-compatible

Turn an audio file into text. OpenAI-compatible request and response, Whisper-class models underneath.

Endpoint:

POST /v1/audio/transcriptions

Auth: Bearer token or x-api-key. Requires scope inference.audio. Content-Type: multipart/form-data. Max file size: 25 MB.

Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="<STT_MODEL_ID>",   # a transcription model from the hub
        file=f,
        language="de",
    )
print(result.text)
curl https://api.melious.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -F model="<STT_MODEL_ID>" \
  -F language="de" \
  -F file=@meeting.mp3

Pick a current transcription model ID from melious.ai/hub/models (filter by audio), or call GET /v1/models?include_meta=true and look for _meta.type == "audio".

Request (multipart fields)

FieldTypeDefaultDescription
filefileAudio file. Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm.
modelstringTranscription model ID.
languagestringauto-detectISO-639-1 code ("de", "fr", "en", …).
response_formatstring"json""json", "text", "srt", "vtt", "verbose_json".
temperaturenumber0Sampling temperature, [0, 1].

Whisper-class models handle 50+ languages and auto-detect by default. If you know the language, set it — detection adds a few hundred milliseconds and occasionally picks wrong for short clips.

Response

response_format: "json" (default):

{
  "text": "Hamburg, Lübeck, Bremen.",
  "language": "de",
  "duration": 2.1
}

verbose_json adds a segments array with per-segment timestamps and confidence. text returns a plain-text body; srt and vtt return subtitle files.

What about translations?

OpenAI's /v1/audio/translations endpoint (transcribe and translate to English in one step) isn't implemented. Workaround: transcribe in the source language, then pipe the text through Chat completions with a translation prompt.

Audio as chat input

A different path from transcription: some chat models accept audio directly as message content, then reason about it rather than just transcribing. That's not this endpoint — it's Chat completions with audio content blocks. Check _meta.capabilities.audio_input on GET /v1/models/{id}?include_meta=true to find those models.

Errors

  • VALIDATION_4016 — file exceeds 25 MB.
  • VALIDATION_4005 — unsupported audio format.
  • INFERENCE_3001 — unknown model.
  • AUTH_1015 — missing the inference.audio scope.

Models for transcription-model discovery • Routing for cost savings on bulk transcription.

On this page