Reranking
Rerank documents by semantic relevance using Cohere-compatible API
Overview
Rerank documents by semantic relevance to a query using state-of-the-art reranking models. Perfect for improving search results, RAG (Retrieval-Augmented Generation) context selection, and document similarity ranking.
Key Features:
- Cohere-compatible API for easy migration
- Multi-provider routing for best price/performance
- Environment impact tracking (CO2, energy, water)
- Supports string or object document formats
- Automatic failover and retry logic
Reranking models score documents by semantic similarity to a query, returning them sorted by relevance. Use reranking after initial retrieval (e.g., from embeddings search) to improve precision.
Authentication
Required: API Key
All requests must include your Melious API key in the Authorization header:
```
Authorization: Bearer {your_api_key}
```

Permissions: `inference.rerank` scope required.
Endpoints
Create Rerank
POST /v1/rerank

Rerank documents by semantic relevance to a query.
Request Body:
```json
{
  "model": "bge-reranker-v2-m3",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of AI",
    "The weather is nice today",
    "Neural networks process data"
  ],
  "top_n": 2,
  "return_documents": true
}
```

Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., bge-reranker-v2-m3) |
| query | string | Yes | The query to rank documents against |
| documents | string[] \| object[] | Yes | Documents to rerank (strings or `{"text": "..."}` objects) |
| top_n | integer | No | Return only top N results (default: all documents) |
| return_documents | boolean | No | Include document text in response (default: true) |
| max_chunks_per_doc | integer | No | Max chunks for long documents |
| user | string | No | End-user identifier for abuse monitoring |
| **Melious Extensions** | | | |
| mode | string | No | Routing mode: "balanced", "speed", "price", "quality", "environment" |
| custom_weights | object | No | Custom routing weights (mutually exclusive with mode) |
| filters | object | No | Hard constraints for provider selection |
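As a sketch, the Melious extension fields slot into the normal request body alongside the Cohere-compatible ones. The mode and filters values below follow the routing example elsewhere on this page; since the custom_weights key names are not specified here, the sketch uses a preset mode instead.

```python
# A rerank payload using the Melious routing extensions. mode and
# custom_weights are mutually exclusive -- set one or the other.
payload = {
    "model": "bge-reranker-v2-m3",
    "query": "What is deep learning?",
    "documents": ["doc1", "doc2", "doc3"],
    "top_n": 2,
    "mode": "environment",                          # preset routing mode
    "filters": {"countries": ["SE", "NL", "DE"]},   # hard provider constraints
}

# Guard against accidentally sending both routing styles at once
assert not ({"mode", "custom_weights"} <= payload.keys())
```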
Document Formats:
```json
// String format (simple)
{
  "documents": ["doc 1", "doc 2", "doc 3"]
}

// Object format (for additional metadata)
{
  "documents": [
    {"text": "doc 1"},
    {"text": "doc 2"},
    {"text": "doc 3"}
  ]
}
```

Response (200 OK):
```json
{
  "id": "rerank-abc123",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.95,
      "document": {"text": "Machine learning is a subset of AI"}
    },
    {
      "index": 2,
      "relevance_score": 0.82,
      "document": {"text": "Neural networks process data"}
    }
  ],
  "meta": {
    "api_version": {"version": "1"}
  },
  "usage": {
    "prompt_tokens": 45,
    "total_tokens": 45
  },
  "environment_impact": {
    "energy_kwh": 0.0001,
    "carbon_g_co2": 0.04,
    "water_liters": 0.0001,
    "renewable_percent": 95,
    "pue": 1.15,
    "provider_id": "berget",
    "location": "SE"
  }
}
```

Response Fields:
| Field | Type | Description |
|---|---|---|
| id | string | Unique request identifier |
| results | array | Reranked documents sorted by relevance |
| results[].index | integer | Original index in the input documents array |
| results[].relevance_score | float | Relevance score (0-1, higher is more relevant) |
| results[].document | object | Document text (if return_documents: true) |
| meta | object | API metadata |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Input tokens processed |
| usage.total_tokens | integer | Total tokens (same as prompt_tokens for reranking) |
| environment_impact | object | Environmental metrics (Melious extension) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad request - invalid parameters |
| 401 | Unauthorized - missing/invalid API key |
| 403 | Forbidden - insufficient permissions or energy |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
Code Examples
```python
import httpx
import asyncio

async def rerank_documents():
    """Rerank documents by relevance to a query."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.melious.ai/v1/rerank",
            headers={"Authorization": "Bearer your_api_key"},
            json={
                "model": "bge-reranker-v2-m3",
                "query": "What is machine learning?",
                "documents": [
                    "Machine learning is a subset of AI",
                    "The weather is nice today",
                    "Neural networks process data"
                ],
                "top_n": 2
            }
        )
    data = response.json()
    print("Reranked results:")
    for result in data["results"]:
        print(f"  [{result['index']}] Score: {result['relevance_score']:.3f}")
        print(f"    {result['document']['text'][:50]}...")
    print(f"\nCO2 emissions: {data['environment_impact']['carbon_g_co2']:.2f}g")
    return data["results"]

# Example usage
asyncio.run(rerank_documents())
```
```javascript
// Rerank documents by relevance to a query
const rerankDocuments = async () => {
  const response = await fetch(
    'https://api.melious.ai/v1/rerank',
    {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer your_api_key',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'bge-reranker-v2-m3',
        query: 'What is machine learning?',
        documents: [
          'Machine learning is a subset of AI',
          'The weather is nice today',
          'Neural networks process data'
        ],
        top_n: 2
      })
    }
  );
  const data = await response.json();
  console.log('Reranked results:');
  for (const result of data.results) {
    console.log(`  [${result.index}] Score: ${result.relevance_score.toFixed(3)}`);
    console.log(`    ${result.document.text.slice(0, 50)}...`);
  }
  console.log(`\nCO2 emissions: ${data.environment_impact.carbon_g_co2.toFixed(2)}g`);
  return data.results;
};

// Example usage
rerankDocuments();
```
```bash
# Rerank documents
curl -X POST "https://api.melious.ai/v1/rerank" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of AI",
      "The weather is nice today",
      "Neural networks process data"
    ],
    "top_n": 2,
    "return_documents": true
  }'

# With routing optimization
curl -X POST "https://api.melious.ai/v1/rerank" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is deep learning?",
    "documents": ["doc1", "doc2", "doc3"],
    "mode": "environment",
    "filters": {
      "countries": ["SE", "NL", "DE"]
    }
  }'
```

Error Handling
Handle errors gracefully by checking status codes and error messages. Implement exponential backoff for transient errors (5xx, 429).
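The retry advice above can be sketched as a small wrapper. The `(status_code, body)` interface is an assumption made to keep the sketch transport-agnostic; in practice, `send` would wrap the httpx call from the examples above and return `(response.status_code, response.json())`.

```python
import asyncio
import random

# Transient statuses worth retrying; 4xx validation/auth errors are not.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with jitter: ~0.5s, 1s, 2s, ... capped at `cap`."""
    return min(cap, base * 2 ** attempt) * (0.5 + random.random() / 2)

async def with_retries(send, max_attempts: int = 4, base: float = 0.5):
    """Call `send()` (a coroutine returning (status_code, body)) until it
    succeeds, hits a non-retryable status, or exhausts the attempt budget."""
    for attempt in range(max_attempts):
        status, body = await send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            raise RuntimeError(f"rerank failed with HTTP {status}")
        await asyncio.sleep(backoff_delay(attempt, base=base))
```

Jitter spreads retries out so that many clients failing at once do not hammer the API in lockstep.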
Common Errors:
| Error Code | Description | Solution |
|---|---|---|
| AUTH_INVALID_API_KEY | Invalid API key | Verify API key is correct and active |
| VALIDATION_REQUIRED_FIELD | Missing required field | Ensure model, query, and documents are provided |
| VALIDATION_INVALID_VALUE | Invalid parameter | Check request body matches documentation |
| INFERENCE_PROVIDER_ERROR | Provider request failed | Retry with exponential backoff or change routing mode |
| BILLING_INSUFFICIENT_ENERGY | Not enough energy | Top up balance or upgrade plan |
Error Response Format:
```json
{
  "status": "error",
  "code": "VALIDATION_REQUIRED_FIELD",
  "message": "Query field is required",
  "details": {
    "field": "query"
  }
}
```

Best Practices
Use Two-Stage Retrieval
Combine embeddings search with reranking for optimal results:
```python
# Stage 1: Fast retrieval with embeddings (get top 100)
candidates = await vector_search(query, top_k=100)

# Stage 2: Precise reranking (select top 10)
reranked = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": query,
    "documents": [c["text"] for c in candidates],
    "top_n": 10
})
```

Limit Documents with top_n
Only return the documents you need to reduce response size and cost:
```python
# Only get top 5 most relevant
response = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": "your query",
    "documents": documents,
    "top_n": 5
})
```

Skip Document Text When Not Needed
If you only need indices and scores, disable document return:
```python
response = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": "your query",
    "documents": documents,
    "return_documents": False  # Smaller response
})

# Results contain index and score only
for result in response.json()["results"]:
    original_doc = documents[result["index"]]
```

Choose Appropriate Models
| Model | Brand | Best For |
|---|---|---|
| bge-reranker-v2-m3 | BAAI | Multilingual, general purpose |
Use Cases
RAG Context Selection
```python
async def rag_with_reranking(query: str, knowledge_base: list[str]):
    """Improve RAG quality with reranking."""
    # Step 1: Get candidate documents (embeddings or keyword search)
    candidates = await search_knowledge_base(query, top_k=50)

    # Step 2: Rerank for precise relevance
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": query,
        "documents": candidates,
        "top_n": 5
    })

    # Step 3: Use top results as context
    context = "\n\n".join([
        r["document"]["text"]
        for r in reranked.json()["results"]
    ])

    # Step 4: Generate response with context
    response = await client.post("/v1/chat/completions", json={
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": query}
        ]
    })
    return response.json()["choices"][0]["message"]["content"]
```

Search Result Reordering
```python
async def reorder_search_results(query: str, search_results: list[dict]):
    """Reorder search results by semantic relevance."""
    texts = [r["title"] + " " + r["snippet"] for r in search_results]
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": query,
        "documents": texts,
        "return_documents": False
    })
    # Reorder original results by relevance
    results = reranked.json()["results"]
    reordered = [search_results[r["index"]] for r in results]
    return reordered
```

Document Similarity Ranking
```python
async def find_similar_documents(reference_doc: str, candidates: list[str]):
    """Find documents most similar to a reference document."""
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": reference_doc,  # Use doc as query
        "documents": candidates,
        "top_n": 10
    })
    return [
        {
            "text": r["document"]["text"],
            "similarity": r["relevance_score"]
        }
        for r in reranked.json()["results"]
    ]
```

Performance
Typical Latencies:
| Request Type | Latency (p50) | Latency (p95) |
|---|---|---|
| 10 documents | 100-200ms | 300-500ms |
| 50 documents | 200-400ms | 500-800ms |
| 100 documents | 300-600ms | 800-1200ms |
Optimization Tips:
- Limit document count - Pre-filter with embeddings first
- Use top_n - Only retrieve results you need
- Skip document text - Set `return_documents: false` if you have originals
- Use routing - Set `mode: "speed"` for lowest latency
- Batch strategically - Rerank in chunks of 50-100 documents
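The batching tip can be sketched as follows. Here `rerank` stands for any coroutine wrapping POST /v1/rerank that returns Cohere-style results with batch-local indices; that interface is an assumption made so the sketch stays transport-agnostic. Merging across batches assumes scores are comparable, which holds for pointwise rerankers that score each (query, document) pair independently.

```python
import asyncio

async def rerank_in_batches(query, documents, rerank, batch_size=50, top_n=10):
    """Rerank a large document set in chunks and merge the scores.

    `rerank(query, docs)` is any coroutine returning Cohere-style results
    ([{"index": ..., "relevance_score": ...}, ...]) with indices local to
    the batch it was given.
    """
    merged = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        for r in await rerank(query, batch):
            merged.append({
                # Re-map batch-local indices to the full document list
                "index": start + r["index"],
                "relevance_score": r["relevance_score"],
            })
    merged.sort(key=lambda r: r["relevance_score"], reverse=True)
    return merged[:top_n]
```

The batches could also be sent concurrently with `asyncio.gather`, at the cost of a larger burst against your rate limit.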
See Also
- Embeddings - Generate vector embeddings for initial retrieval
- Chat Completions - Use reranked context with LLMs
- Models - List available models