Audio

POST /v1/audio/speech and /v1/audio/transcriptions — TTS and STT

Two endpoints: text-to-speech and speech-to-text. Both use OpenAI-compatible request and response shapes.

Speech (TTS)

Generate audio from text.

Endpoint:

POST /v1/audio/speech

Auth: Bearer token or x-api-key. Requires scope inference.audio.

Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

audio = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hamburg, Lübeck, Bremen.",
)
audio.stream_to_file("out.mp3")

Or with curl:

curl https://api.melious.ai/v1/audio/speech \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -o out.mp3 \
  -d '{
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hamburg, Lübeck, Bremen."
  }'

Request

Parameter         Type     Default      Description
input             string   (required)   Text to speak.
model             string   "tts-1"      TTS model ID.
voice             string   (required)   One of "alloy", "echo", "fable", "onyx", "nova", "shimmer".
response_format   string   "mp3"        "mp3", "opus", "aac", "flac", "pcm", "wav".
speed             number   1.0          Playback speed, [0.25, 4.0].
user              string   none         End-user identifier.

Response

Binary audio data with Content-Type: audio/<format> — no JSON wrapper. Save directly to disk.
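Outside the SDK, you can hit the endpoint directly and write the bytes to disk. A minimal sketch with the requests library; the non-default response_format and speed values are included only to illustrate the parameters above:

import requests

resp = requests.post(
    "https://api.melious.ai/v1/audio/speech",
    headers={"Authorization": "Bearer sk-mel-<YOUR_API_KEY>"},
    json={
        "model": "tts-1",
        "voice": "alloy",
        "input": "Hamburg, Lübeck, Bremen.",
        "response_format": "wav",  # body arrives as audio/wav
        "speed": 1.25,             # must stay within [0.25, 4.0]
    },
)
resp.raise_for_status()

# The body is the audio itself, not JSON: write it straight to disk.
with open("out.wav", "wb") as f:
    f.write(resp.content)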

Transcriptions (STT)

Turn audio into text.

Endpoint:

POST /v1/audio/transcriptions

Auth: Bearer token or x-api-key. Requires scope inference.audio. Content-Type: multipart/form-data. Max file size: 25 MB.

Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-<YOUR_API_KEY>",
    base_url="https://api.melious.ai/v1",
)

with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="de",
    )
print(result.text)

Or with curl:

curl https://api.melious.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-mel-<YOUR_API_KEY>" \
  -F model="whisper-large-v3-turbo" \
  -F language="de" \
  -F file=@meeting.mp3

Request (multipart fields)

Field             Type     Default                    Description
file              file     (required)                 Audio file. Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm.
model             string   "whisper-large-v3-turbo"   STT model ID.
language          string   auto-detect                ISO-639-1 code ("de", "fr", "en", …).
response_format   string   "json"                     "json", "text", "srt", "vtt", "verbose_json".
temperature       number   0                          Sampling temperature, [0, 1].

Whisper auto-detects the language among the 50+ it supports. If you know the language, set it explicitly: detection adds a few hundred milliseconds and occasionally guesses wrong on short clips.

Response

response_format: "json" (default):

{
  "text": "Hamburg, Lübeck, Bremen.",
  "language": "de",
  "duration": 2.1
}

response_format: "verbose_json" adds segments with per-segment timestamps and confidence.

response_format: "text" returns a plain-text body. "srt" and "vtt" return subtitle files.

What about translations?

OpenAI's /v1/audio/translations endpoint (translate audio to English) isn't implemented. Workaround: transcribe with whisper-large-v3-turbo (it detects the input language automatically), then pipe the text through Chat completions with a translation prompt.
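
A minimal sketch of that two-step pipeline, reusing the client from above. The chat model ID is a placeholder, not a real model name; pick one from GET /v1/models:

with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
    )

translation = client.chat.completions.create(
    model="<CHAT_MODEL_ID>",  # placeholder: any chat model from GET /v1/models
    messages=[
        {"role": "system", "content": "Translate the user's text into English."},
        {"role": "user", "content": transcript.text},
    ],
)
print(translation.choices[0].message.content)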

Audio-in chat

A different path from STT: some chat models accept audio as message content directly (e.g. Voxtral variants). That's not this endpoint — it's Chat completions with audio content blocks. Check _meta.capabilities.audio_input on GET /v1/models/{id}?include_meta=true to find them.
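
A sketch of that discovery step with requests. It assumes the list endpoint returns the usual OpenAI-style data array; the _meta path is taken from the description above:

import requests

headers = {"Authorization": "Bearer sk-mel-<YOUR_API_KEY>"}
base = "https://api.melious.ai/v1"

models = requests.get(f"{base}/models", headers=headers).json()["data"]
for m in models:
    detail = requests.get(
        f"{base}/models/{m['id']}?include_meta=true", headers=headers
    ).json()
    # _meta.capabilities.audio_input flags models that accept audio content blocks.
    if detail.get("_meta", {}).get("capabilities", {}).get("audio_input"):
        print(m["id"])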

Errors

  • VALIDATION_4016 — file exceeds 25 MB; a local pre-flight check is sketched after this list.
  • VALIDATION_4005 — unsupported audio format.
  • INFERENCE_3001 — unknown model.
  • AUTH_1015 — missing inference.audio scope.
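
The size limit is the one worth catching before you upload. A minimal pre-flight sketch, assuming a local meeting.mp3:

import os

MAX_BYTES = 25 * 1024 * 1024  # 25 MB transcription limit

path = "meeting.mp3"
if os.path.getsize(path) > MAX_BYTES:
    raise ValueError(f"{path} exceeds the 25 MB limit; split or re-encode it first.")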

See also: Models for STT/TTS model discovery • Routing for bulk-transcription cost savings.
