# Audio API
Speech-to-text transcription with Whisper models
Transcribe audio to text using open-source Whisper models hosted on European infrastructure.
Privacy-first: All audio processing uses open-source models self-hosted on European infrastructure. Your audio files are never sent to external APIs.
## Speech-to-Text (STT)
Transcribe audio files to text with high accuracy.
`POST /v1/audio/transcriptions`

### Quick Example

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="en"
    )

print(response.text)
```

```javascript
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'sk-mel-your-api-key-here',
  baseURL: 'https://api.melious.ai/v1'
});

const response = await client.audio.transcriptions.create({
  model: 'whisper-large-v3',
  file: fs.createReadStream('audio.mp3'),
  language: 'en'
});

console.log(response.text);
```

```bash
curl https://api.melious.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-mel-your-api-key-here" \
  -F model="whisper-large-v3" \
  -F file="@audio.mp3" \
  -F language="en"
```

## Available Models
| Model | Brand | Description | Best For |
|---|---|---|---|
| whisper-large-v3 | OpenAI | High-accuracy transcription | Maximum quality |
| whisper-large-v3-turbo | OpenAI | Fast transcription | Speed priority |
Use whisper-large-v3-turbo for faster processing when speed is more important than slight accuracy improvements.
## STT Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | whisper-large-v3 or whisper-large-v3-turbo |
| file | file | Yes | - | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| language | string | No | auto | ISO-639-1 language code |
| response_format | string | No | json | json, text, srt, vtt |
| temperature | number | No | 0 | Sampling temperature (0-1) |
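
To illustrate the optional parameters together, here is a minimal sketch that sets an explicit language, response format, and temperature while using the turbo model; the file name is a placeholder, and the parameter values are only examples.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

# "meeting.mp3" is a placeholder; any supported format under 25 MB works.
with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",  # faster variant, slightly lower accuracy
        file=audio_file,
        language="de",                   # ISO-639-1 code; omit to auto-detect
        response_format="json",          # json, text, srt, or vtt
        temperature=0,                   # 0 = most deterministic output
    )

print(response.text)
```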
## Supported Languages
Whisper supports 50+ languages including:
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | ja | Japanese |
| es | Spanish | ko | Korean |
| fr | French | pt | Portuguese |
| de | German | ru | Russian |
| it | Italian | zh | Chinese |
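
If you don't know the language in advance, omit the language parameter and let Whisper detect it. A minimal sketch, assuming the detected language is reported alongside the text as in the JSON response shown below (only the text field is guaranteed by the SDK, so the language field is read defensively); the file name is a placeholder.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

with open("interview.wav", "rb") as audio_file:  # placeholder file name
    response = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        # no language specified: Whisper auto-detects it
    )

print(response.text)
# The detected language may appear in the response body; access it defensively.
print(getattr(response, "language", "unknown"))
```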
## Response Formats
| Format | Description |
|---|---|
| json | JSON with text field |
| text | Plain text only |
| srt | SubRip subtitle format |
| vtt | WebVTT subtitle format |
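
For subtitle workflows, requesting srt or vtt returns the subtitle document itself rather than a JSON object. A minimal sketch (file names are placeholders; depending on the SDK version the return value may be a plain string or an object with a text field, so both cases are handled):

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

with open("lecture.mp3", "rb") as audio_file:  # placeholder file name
    subtitles = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="srt",  # or "vtt" for WebVTT
    )

# Write the subtitle text to disk.
with open("lecture.srt", "w", encoding="utf-8") as out:
    out.write(subtitles if isinstance(subtitles, str) else subtitles.text)
```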
### JSON Response

```json
{
  "text": "Hello, this is a transcription of the audio file.",
  "language": "en",
  "duration": 5.2
}
```

## Supported Audio Formats
| Format | Extension | Max Size |
|---|---|---|
| MP3 | .mp3 | 25 MB |
| MP4 | .mp4 | 25 MB |
| MPEG | .mpeg | 25 MB |
| MPGA | .mpga | 25 MB |
| M4A | .m4a | 25 MB |
| WAV | .wav | 25 MB |
| WebM | .webm | 25 MB |
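
Since every format shares the same 25 MB limit, it can be worth checking the file size client-side before uploading. A minimal sketch; the constant and helper name are hypothetical, not part of the API.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB limit from the table above

def check_audio_size(path: str) -> None:
    """Raise a helpful error before attempting an upload that would be rejected."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size / 1024 / 1024:.1f} MB; "
            "compress it or split it into chunks under 25 MB."
        )

check_audio_size("audio.mp3")
```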
## Audio Input Chat
For conversational AI with audio input, use the voxtral-small-24b-2507 model with the Chat Completions API:

```python
from openai import OpenAI
import base64

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

# Read audio file and encode to base64
with open("question.mp3", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="voxtral-small-24b-2507",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please respond to this audio:"},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": audio_data,
                        "format": "mp3"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

voxtral-small-24b-2507 is an audio-input chat model from Mistral that can understand and respond to audio queries conversationally.
## Error Handling
| Error Code | Description | Solution |
|---|---|---|
| BILLING_INSUFFICIENT_ENERGY | Not enough balance | Top up credits |
| VALIDATION_INVALID_VALUE | Invalid format or parameter | Check allowed values |
| VALIDATION_FILE_TOO_LARGE | File exceeds 25 MB | Compress or split audio |
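
With the OpenAI SDK these failures surface as exceptions. A minimal sketch of defensive handling, assuming the error code above is included in the response body (the exact body layout is an assumption, so the error is logged rather than parsed):

```python
import openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

try:
    with open("audio.mp3", "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-large-v3",
            file=audio_file,
        )
    print(response.text)
except openai.APIStatusError as err:
    # err.status_code is the HTTP status; the error code
    # (e.g. VALIDATION_FILE_TOO_LARGE) appears in the response body.
    print(f"Request failed with HTTP {err.status_code}: {err}")
except openai.APIConnectionError as err:
    print(f"Could not reach the API: {err}")
```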
## Best Practices
- Specify language when known for better accuracy
- Use turbo model for real-time applications where speed matters
- Keep files under 25 MB; split longer audio if needed (see the splitting sketch after this list)
- Use MP3 format for best compatibility and smaller file sizes
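
As referenced in the list above, a longer recording can be split client-side before transcription. A minimal sketch using the third-party pydub library (not part of this API); the chunk length and file names are arbitrary placeholders, chosen so each exported piece stays under the 25 MB limit.

```python
from pydub import AudioSegment  # third-party: pip install pydub (requires ffmpeg)

CHUNK_MS = 10 * 60 * 1000  # 10-minute chunks; tune so each exported file stays under 25 MB

audio = AudioSegment.from_file("long_recording.mp3")  # placeholder file name

chunk_paths = []
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk = audio[start:start + CHUNK_MS]
    path = f"long_recording_part{i}.mp3"
    chunk.export(path, format="mp3")
    chunk_paths.append(path)

# Each chunk can then be sent to /v1/audio/transcriptions individually
# and the resulting texts concatenated in order.
```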
## See Also
- Chat Completions - Text generation
- Models - Available models