Audio API

Speech-to-text transcription with Whisper models

Transcribe audio to text using open-source Whisper models hosted on European infrastructure.

Privacy-first: All audio processing uses open-source models self-hosted on European infrastructure. Your audio files are never sent to external APIs.


Speech-to-Text (STT)

Transcribe audio files to text with high accuracy.

POST /v1/audio/transcriptions

Quick Example

Python:

from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="en"
    )

print(response.text)

JavaScript:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'sk-mel-your-api-key-here',
  baseURL: 'https://api.melious.ai/v1'
});

const response = await client.audio.transcriptions.create({
  model: 'whisper-large-v3',
  file: fs.createReadStream('audio.mp3'),
  language: 'en'
});

console.log(response.text);

cURL:

curl https://api.melious.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-mel-your-api-key-here" \
  -F model="whisper-large-v3" \
  -F file="@audio.mp3" \
  -F language="en"

Available Models

| Model | Brand | Description | Best For |
|---|---|---|---|
| whisper-large-v3 | OpenAI | High-accuracy transcription | Maximum quality |
| whisper-large-v3-turbo | OpenAI | Fast transcription | Speed priority |

Use whisper-large-v3-turbo when processing speed matters more than the slightly higher accuracy of whisper-large-v3.


STT Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | whisper-large-v3 or whisper-large-v3-turbo |
| file | file | Yes | - | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| language | string | No | auto | ISO-639-1 language code |
| response_format | string | No | json | json, text, srt, vtt |
| temperature | number | No | 0 | Sampling temperature (0-1) |
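
As a rough sketch of how the parameters above fit together, the helper below checks the values and builds the keyword arguments for a request. `build_transcription_params` is a hypothetical name, not part of the Melious API or the OpenAI SDK. The allowed values are copied from the table, and omitting `language` to get automatic detection is an assumption.

```python
ALLOWED_MODELS = {"whisper-large-v3", "whisper-large-v3-turbo"}
ALLOWED_FORMATS = {"json", "text", "srt", "vtt"}


def build_transcription_params(model, language=None, response_format="json", temperature=0.0):
    """Return keyword arguments for a transcription request (hypothetical helper)."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    params = {
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
    }
    # Assumption: leaving `language` out lets the service auto-detect it.
    if language is not None:
        params["language"] = language
    return params
```

The result could then be passed alongside the file, for example `client.audio.transcriptions.create(file=audio_file, **params)`.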

Supported Languages

Whisper supports 50+ languages, including:

| Code | Language | Code | Language |
|---|---|---|---|
| en | English | ja | Japanese |
| es | Spanish | ko | Korean |
| fr | French | pt | Portuguese |
| de | German | ru | Russian |
| it | Italian | zh | Chinese |

Response Formats

| Format | Description |
|---|---|
| json | JSON with text field |
| text | Plain text only |
| srt | SubRip subtitle format |
| vtt | WebVTT subtitle format |

JSON Response

{
  "text": "Hello, this is a transcription of the audio file.",
  "language": "en",
  "duration": 5.2
}
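
When the endpoint is called over raw HTTP rather than through an SDK, the JSON body has to be parsed by hand. The following is a minimal sketch under the assumption that the body matches the sample above. `Transcription` and `parse_transcription` are hypothetical names, and `language` and `duration` are treated as optional because the format table only promises a `text` field.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON transcription response body into a Transcription."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],               # required: raises KeyError if absent
        language=data.get("language"),   # optional in this sketch
        duration=data.get("duration"),   # optional in this sketch
    )
```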

Supported Audio Formats

| Format | Extension | Max Size |
|---|---|---|
| MP3 | .mp3 | 25 MB |
| MP4 | .mp4 | 25 MB |
| MPEG | .mpeg | 25 MB |
| MPGA | .mpga | 25 MB |
| M4A | .m4a | 25 MB |
| WAV | .wav | 25 MB |
| WebM | .webm | 25 MB |
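
A file can be checked against the limits in this table before it is uploaded. The sketch below is one possible pre-flight check. `check_audio_file` is a hypothetical helper, and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption, since the exact byte limit is not stated here.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size > MAX_BYTES:
        size = p.stat().st_size
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

This check looks only at the file name and size; it does not inspect the audio data itself.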

Audio Input Chat

For conversational AI with audio input, use the voxtral-small-24b-2507 model with the Chat Completions API:

from openai import OpenAI
import base64

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

# Read audio file and encode to base64
with open("question.mp3", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="voxtral-small-24b-2507",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please respond to this audio:"},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": audio_data,
                        "format": "mp3"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

voxtral-small-24b-2507 is an audio-input chat model from Mistral that can understand and respond to audio queries conversationally.
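
If audio messages are built in several places, the message structure from the example above can be wrapped in a small function. This is only a sketch: `audio_message` is a hypothetical helper, and it assumes the `input_audio` layout shown in the example is what the model expects.

```python
import base64


def audio_message(prompt: str, audio_bytes: bytes, audio_format: str = "mp3") -> dict:
    """Build a user message with a text part and a base64-encoded audio part."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "input_audio",
                "input_audio": {"data": encoded, "format": audio_format},
            },
        ],
    }
```

The returned dictionary could be used as a single entry in the `messages` list passed to `client.chat.completions.create`.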


Error Handling

| Error Code | Description | Solution |
|---|---|---|
| BILLING_INSUFFICIENT_ENERGY | Not enough balance | Top up credits |
| VALIDATION_INVALID_VALUE | Invalid format or parameter | Check allowed values |
| VALIDATION_FILE_TOO_LARGE | File exceeds 25 MB | Compress or split audio |
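
One way to use these codes in client code is a simple lookup from error code to suggested action, sketched below. `suggest_action` is a hypothetical helper. How the error code appears in the response body is not specified here, so extracting it is left to the caller.

```python
# Codes and suggestions are copied from the table above.
SUGGESTED_ACTIONS = {
    "BILLING_INSUFFICIENT_ENERGY": "Top up credits",
    "VALIDATION_INVALID_VALUE": "Check allowed values",
    "VALIDATION_FILE_TOO_LARGE": "Compress or split audio",
}


def suggest_action(error_code: str) -> str:
    """Return the documented solution for a known error code, or a generic hint."""
    return SUGGESTED_ACTIONS.get(
        error_code,
        "Unrecognized error code; inspect the full API response",
    )
```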

Best Practices

  1. Specify the language when it is known, for better accuracy
  2. Use the turbo model (whisper-large-v3-turbo) for real-time applications where speed matters
  3. Keep files under 25 MB; split longer audio if needed
  4. Use MP3 format for best compatibility and smaller file sizes
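
For the advice on splitting long audio, the arithmetic can be sketched as follows. `plan_segments` is a hypothetical helper that assumes a roughly constant bitrate, reads 25 MB as 25 * 1024 * 1024 bytes, and keeps a 10% safety margin. It only computes time ranges; cutting the file needs a separate audio tool, which is not shown.

```python
import math


def plan_segments(size_bytes, duration_s, max_bytes=25 * 1024 * 1024, safety=0.9):
    """Split [0, duration_s] into equal time ranges expected to fit under max_bytes.

    Assumes file size grows roughly linearly with duration (constant bitrate).
    """
    budget = max_bytes * safety                      # leave headroom below the limit
    count = max(1, math.ceil(size_bytes / budget))   # number of segments needed
    step = duration_s / count                        # seconds per segment
    return [(i * step, (i + 1) * step) for i in range(count)]
```

Each `(start, end)` pair is in seconds. Variable-bitrate files may need a larger safety margin than the one assumed here.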
