# Audio API
Speech-to-text transcription with Whisper models
Transcribe audio to text using open-source Whisper models hosted on European infrastructure.
Privacy-first: All audio processing uses open-source models self-hosted on European infrastructure. Your audio files are never sent to external APIs.
## Speech-to-Text (STT)
Transcribe audio files to text with high accuracy.
`POST /v1/audio/transcriptions`

### Quick Example

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="en"
    )

print(response.text)
```

```javascript
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'sk-mel-your-api-key-here',
  baseURL: 'https://api.melious.ai/v1'
});

const response = await client.audio.transcriptions.create({
  model: 'whisper-large-v3',
  file: fs.createReadStream('audio.mp3'),
  language: 'en'
});

console.log(response.text);
```

```bash
curl https://api.melious.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-mel-your-api-key-here" \
  -F model="whisper-large-v3" \
  -F file="@audio.mp3" \
  -F language="en"
```

## Available Models
| Model | Brand | Description | Best For |
|---|---|---|---|
| whisper-large-v3 | OpenAI | High-accuracy transcription | Maximum quality |
| whisper-large-v3-turbo | OpenAI | Fast transcription | Speed priority |
Use whisper-large-v3-turbo for faster processing when speed is more important than slight accuracy improvements.
## STT Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | whisper-large-v3 or whisper-large-v3-turbo |
| file | file | Yes | - | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| language | string | No | auto | ISO-639-1 language code |
| response_format | string | No | json | json, text, srt, vtt |
| temperature | number | No | 0 | Sampling temperature (0-1) |
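
To illustrate the optional parameters together, here is a minimal sketch that sets an explicit language, response format, and temperature while using the turbo model; the file name is a placeholder, and the parameter values are only examples.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

# "meeting.mp3" is a placeholder; any supported format under 25 MB works.
with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",  # faster variant, slightly lower accuracy
        file=audio_file,
        language="de",                   # ISO-639-1 code; omit to auto-detect
        response_format="json",          # json, text, srt, or vtt
        temperature=0,                   # 0 = most deterministic output
    )

print(response.text)
```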
## Supported Languages
Whisper supports 50+ languages including:
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | ja | Japanese |
| es | Spanish | ko | Korean |
| fr | French | pt | Portuguese |
| de | German | ru | Russian |
| it | Italian | zh | Chinese |
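
If you don't know the language in advance, omit the language parameter and let Whisper detect it. A minimal sketch, assuming the detected language is reported alongside the text as in the JSON response shown below (only the text field is guaranteed by the SDK, so the language field is read defensively); the file name is a placeholder.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

with open("interview.wav", "rb") as audio_file:  # placeholder file name
    response = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        # no language specified: Whisper auto-detects it
    )

print(response.text)
# The detected language may appear in the response body; access it defensively.
print(getattr(response, "language", "unknown"))
```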
## Response Formats
| Format | Description |
|---|---|
| json | JSON with text field |
| text | Plain text only |
| srt | SubRip subtitle format |
| vtt | WebVTT subtitle format |
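
For subtitle workflows, requesting srt or vtt returns the subtitle document itself rather than a JSON object. A minimal sketch (file names are placeholders; depending on the SDK version the return value may be a plain string or an object with a text field, so both cases are handled):

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

with open("lecture.mp3", "rb") as audio_file:  # placeholder file name
    subtitles = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="srt",  # or "vtt" for WebVTT
    )

# Write the subtitle text to disk.
with open("lecture.srt", "w", encoding="utf-8") as out:
    out.write(subtitles if isinstance(subtitles, str) else subtitles.text)
```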
### JSON Response

```json
{
  "text": "Hello, this is a transcription of the audio file.",
  "language": "en",
  "duration": 5.2
}
```

## Supported Audio Formats
| Format | Extension | Max Size |
|---|---|---|
| MP3 | .mp3 | 25 MB |
| MP4 | .mp4 | 25 MB |
| MPEG | .mpeg | 25 MB |
| MPGA | .mpga | 25 MB |
| M4A | .m4a | 25 MB |
| WAV | .wav | 25 MB |
| WebM | .webm | 25 MB |
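
Since every format shares the same 25 MB limit, it can be worth checking the file size client-side before uploading. A minimal sketch; the constant and helper name are hypothetical, not part of the API.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB limit from the table above

def check_audio_size(path: str) -> None:
    """Raise a helpful error before attempting an upload that would be rejected."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size / 1024 / 1024:.1f} MB; "
            "compress it or split it into chunks under 25 MB."
        )

check_audio_size("audio.mp3")
```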
## Audio Input Chat
For conversational AI with audio input, use the voxtral-small-24b-2507 model with the Chat Completions API:

```python
from openai import OpenAI
import base64

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

# Read audio file and encode to base64
with open("question.mp3", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="voxtral-small-24b-2507",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please respond to this audio:"},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": audio_data,
                        "format": "mp3"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

voxtral-small-24b-2507 is an audio-input chat model from Mistral that can understand and respond to audio queries conversationally.
## Error Handling
| Error Code | Description | Solution |
|---|---|---|
| BILLING_INSUFFICIENT_ENERGY | Not enough balance | Top up credits |
| VALIDATION_INVALID_VALUE | Invalid format or parameter | Check allowed values |
| VALIDATION_FILE_TOO_LARGE | File exceeds 25 MB | Compress or split audio |
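
With the OpenAI SDK these failures surface as exceptions. A minimal sketch of defensive handling, assuming the error code above is included in the response body (the exact body layout is an assumption, so the error is logged rather than parsed):

```python
import openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-mel-your-api-key-here",
    base_url="https://api.melious.ai/v1"
)

try:
    with open("audio.mp3", "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-large-v3",
            file=audio_file,
        )
    print(response.text)
except openai.APIStatusError as err:
    # err.status_code is the HTTP status; the error code
    # (e.g. VALIDATION_FILE_TOO_LARGE) appears in the response body.
    print(f"Request failed with HTTP {err.status_code}: {err}")
except openai.APIConnectionError as err:
    print(f"Could not reach the API: {err}")
```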
## Best Practices
- Specify language when known for better accuracy
- Use turbo model for real-time applications where speed matters
- Keep files under 25 MB; split longer audio if needed (see the splitting sketch after this list)
- Use MP3 format for best compatibility and smaller file sizes
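
As referenced in the list above, a longer recording can be split client-side before transcription. A minimal sketch using the third-party pydub library (not part of this API); the chunk length and file names are arbitrary placeholders, chosen so each exported piece stays under the 25 MB limit.

```python
from pydub import AudioSegment  # third-party: pip install pydub (requires ffmpeg)

CHUNK_MS = 10 * 60 * 1000  # 10-minute chunks; tune so each exported file stays under 25 MB

audio = AudioSegment.from_file("long_recording.mp3")  # placeholder file name

chunk_paths = []
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk = audio[start:start + CHUNK_MS]
    path = f"long_recording_part{i}.mp3"
    chunk.export(path, format="mp3")
    chunk_paths.append(path)

# Each chunk can then be sent to /v1/audio/transcriptions individually
# and the resulting texts concatenated in order.
```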
## See Also
- Chat Completions - Text generation
- Models - Available models