Reranking
Rerank documents by semantic relevance using Cohere-compatible API
Overview
Rerank documents by semantic relevance to a query using state-of-the-art reranking models. Perfect for improving search results, RAG (Retrieval-Augmented Generation) context selection, and document similarity ranking.
Key Features:
- Cohere-compatible API for easy migration
- Multi-provider routing for best price/performance
- Environment impact tracking (CO2, energy, water)
- Supports string or object document formats
- Automatic failover and retry logic
Reranking models score documents by semantic similarity to a query, returning them sorted by relevance. Use reranking after initial retrieval (e.g., from embeddings search) to improve precision.
Authentication
Required: API Key
All requests must include your Melious API key in the Authorization header:
```
Authorization: Bearer {your_api_key}
```

Permissions: `inference.rerank` scope required.
Endpoints
Create Rerank
POST /v1/rerank

Rerank documents by semantic relevance to a query.
Request Body:
```json
{
  "model": "bge-reranker-v2-m3",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of AI",
    "The weather is nice today",
    "Neural networks process data"
  ],
  "top_n": 2,
  "return_documents": true
}
```

Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., bge-reranker-v2-m3) |
| query | string | Yes | The query to rank documents against |
| documents | string[] \| object[] | Yes | Documents to rerank (strings or `{"text": "..."}` objects) |
| top_n | integer | No | Return only top N results (default: all documents) |
| return_documents | boolean | No | Include document text in response (default: true) |
| max_chunks_per_doc | integer | No | Max chunks for long documents |
| user | string | No | End-user identifier for abuse monitoring |
| **Melious Extensions** | | | |
| mode | string | No | Routing mode: "balanced", "speed", "price", "quality", "environment" |
| custom_weights | object | No | Custom routing weights (mutually exclusive with mode) |
| filters | object | No | Hard constraints for provider selection |
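As a sketch, the Melious extension fields slot into the normal request body alongside the Cohere-compatible ones. The mode and filters values below follow the routing example elsewhere on this page; since the custom_weights key names are not specified here, the sketch uses a preset mode instead.

```python
# A rerank payload using the Melious routing extensions. mode and
# custom_weights are mutually exclusive -- set one or the other.
payload = {
    "model": "bge-reranker-v2-m3",
    "query": "What is deep learning?",
    "documents": ["doc1", "doc2", "doc3"],
    "top_n": 2,
    "mode": "environment",                          # preset routing mode
    "filters": {"countries": ["SE", "NL", "DE"]},   # hard provider constraints
}

# Guard against accidentally sending both routing styles at once
assert not ({"mode", "custom_weights"} <= payload.keys())
```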
Document Formats:
```json
// String format (simple)
{
  "documents": ["doc 1", "doc 2", "doc 3"]
}

// Object format (for additional metadata)
{
  "documents": [
    {"text": "doc 1"},
    {"text": "doc 2"},
    {"text": "doc 3"}
  ]
}
```

Response (200 OK):
```json
{
  "id": "rerank-abc123",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.95,
      "document": {"text": "Machine learning is a subset of AI"}
    },
    {
      "index": 2,
      "relevance_score": 0.82,
      "document": {"text": "Neural networks process data"}
    }
  ],
  "meta": {
    "api_version": {"version": "1"}
  },
  "usage": {
    "prompt_tokens": 45,
    "total_tokens": 45
  },
  "environment_impact": {
    "energy_kwh": 0.0001,
    "carbon_g_co2": 0.04,
    "water_liters": 0.0001,
    "renewable_percent": 95,
    "pue": 1.15,
    "provider_id": "berget",
    "location": "SE"
  }
}
```

Response Fields:
| Field | Type | Description |
|---|---|---|
| id | string | Unique request identifier |
| results | array | Reranked documents sorted by relevance |
| results[].index | integer | Original index in the input documents array |
| results[].relevance_score | float | Relevance score (0-1, higher is more relevant) |
| results[].document | object | Document text (if return_documents: true) |
| meta | object | API metadata |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Input tokens processed |
| usage.total_tokens | integer | Total tokens (same as prompt_tokens for reranking) |
| environment_impact | object | Environmental metrics (Melious extension) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad request - invalid parameters |
| 401 | Unauthorized - missing/invalid API key |
| 403 | Forbidden - insufficient permissions or energy |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
Code Examples
```python
import httpx
import asyncio

async def rerank_documents():
    """Rerank documents by relevance to a query."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.melious.ai/v1/rerank",
            headers={"Authorization": "Bearer your_api_key"},
            json={
                "model": "bge-reranker-v2-m3",
                "query": "What is machine learning?",
                "documents": [
                    "Machine learning is a subset of AI",
                    "The weather is nice today",
                    "Neural networks process data"
                ],
                "top_n": 2
            }
        )
    data = response.json()
    print("Reranked results:")
    for result in data["results"]:
        print(f"  [{result['index']}] Score: {result['relevance_score']:.3f}")
        print(f"    {result['document']['text'][:50]}...")
    print(f"\nCO2 emissions: {data['environment_impact']['carbon_g_co2']:.2f}g")
    return data["results"]

# Example usage
asyncio.run(rerank_documents())
```
```javascript
// Rerank documents by relevance to a query
const rerankDocuments = async () => {
  const response = await fetch(
    'https://api.melious.ai/v1/rerank',
    {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer your_api_key',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'bge-reranker-v2-m3',
        query: 'What is machine learning?',
        documents: [
          'Machine learning is a subset of AI',
          'The weather is nice today',
          'Neural networks process data'
        ],
        top_n: 2
      })
    }
  );
  const data = await response.json();
  console.log('Reranked results:');
  for (const result of data.results) {
    console.log(`  [${result.index}] Score: ${result.relevance_score.toFixed(3)}`);
    console.log(`    ${result.document.text.slice(0, 50)}...`);
  }
  console.log(`\nCO2 emissions: ${data.environment_impact.carbon_g_co2.toFixed(2)}g`);
  return data.results;
};

// Example usage
rerankDocuments();
```
```bash
# Rerank documents
curl -X POST "https://api.melious.ai/v1/rerank" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of AI",
      "The weather is nice today",
      "Neural networks process data"
    ],
    "top_n": 2,
    "return_documents": true
  }'

# With routing optimization
curl -X POST "https://api.melious.ai/v1/rerank" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is deep learning?",
    "documents": ["doc1", "doc2", "doc3"],
    "mode": "environment",
    "filters": {
      "countries": ["SE", "NL", "DE"]
    }
  }'
```

Error Handling
Handle errors gracefully by checking status codes and error messages. Implement exponential backoff for transient errors (5xx, 429).
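The retry advice above can be sketched as a small wrapper. The `(status_code, body)` interface is an assumption made to keep the sketch transport-agnostic; in practice, `send` would wrap the httpx call from the examples above and return `(response.status_code, response.json())`.

```python
import asyncio
import random

# Transient statuses worth retrying; 4xx validation/auth errors are not.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with jitter: ~0.5s, 1s, 2s, ... capped at `cap`."""
    return min(cap, base * 2 ** attempt) * (0.5 + random.random() / 2)

async def with_retries(send, max_attempts: int = 4, base: float = 0.5):
    """Call `send()` (a coroutine returning (status_code, body)) until it
    succeeds, hits a non-retryable status, or exhausts the attempt budget."""
    for attempt in range(max_attempts):
        status, body = await send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            raise RuntimeError(f"rerank failed with HTTP {status}")
        await asyncio.sleep(backoff_delay(attempt, base=base))
```

Jitter spreads retries out so that many clients failing at once do not hammer the API in lockstep.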
Common Errors:
| Error Code | Description | Solution |
|---|---|---|
| AUTH_INVALID_API_KEY | Invalid API key | Verify API key is correct and active |
| VALIDATION_REQUIRED_FIELD | Missing required field | Ensure model, query, and documents are provided |
| VALIDATION_INVALID_VALUE | Invalid parameter | Check request body matches documentation |
| INFERENCE_PROVIDER_ERROR | Provider request failed | Retry with exponential backoff or change routing mode |
| BILLING_INSUFFICIENT_ENERGY | Not enough energy | Top up balance or upgrade plan |
Error Response Format:
```json
{
  "status": "error",
  "code": "VALIDATION_REQUIRED_FIELD",
  "message": "Query field is required",
  "details": {
    "field": "query"
  }
}
```

Best Practices
Use Two-Stage Retrieval
Combine embeddings search with reranking for optimal results:
```python
# Stage 1: Fast retrieval with embeddings (get top 100)
candidates = await vector_search(query, top_k=100)

# Stage 2: Precise reranking (select top 10)
reranked = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": query,
    "documents": [c["text"] for c in candidates],
    "top_n": 10
})
```

Limit Documents with top_n
Only return the documents you need to reduce response size and cost:
```python
# Only get top 5 most relevant
response = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": "your query",
    "documents": documents,
    "top_n": 5
})
```

Skip Document Text When Not Needed
If you only need indices and scores, disable document return:
```python
response = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": "your query",
    "documents": documents,
    "return_documents": False  # Smaller response
})

# Results contain index and score only
for result in response.json()["results"]:
    original_doc = documents[result["index"]]
```

Choose Appropriate Models
| Model | Brand | Best For |
|---|---|---|
| bge-reranker-v2-m3 | BAAI | Multilingual, general purpose |
Use Cases
RAG Context Selection
```python
async def rag_with_reranking(query: str, knowledge_base: list[str]):
    """Improve RAG quality with reranking."""
    # Step 1: Get candidate documents (embeddings or keyword search)
    candidates = await search_knowledge_base(query, top_k=50)

    # Step 2: Rerank for precise relevance
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": query,
        "documents": candidates,
        "top_n": 5
    })

    # Step 3: Use top results as context
    context = "\n\n".join([
        r["document"]["text"]
        for r in reranked.json()["results"]
    ])

    # Step 4: Generate response with context
    response = await client.post("/v1/chat/completions", json={
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": query}
        ]
    })
    return response.json()["choices"][0]["message"]["content"]
```

Search Result Reordering
```python
async def reorder_search_results(query: str, search_results: list[dict]):
    """Reorder search results by semantic relevance."""
    texts = [r["title"] + " " + r["snippet"] for r in search_results]
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": query,
        "documents": texts,
        "return_documents": False
    })
    # Reorder original results by relevance
    results = reranked.json()["results"]
    reordered = [search_results[r["index"]] for r in results]
    return reordered
```

Document Similarity Ranking
```python
async def find_similar_documents(reference_doc: str, candidates: list[str]):
    """Find documents most similar to a reference document."""
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": reference_doc,  # Use doc as query
        "documents": candidates,
        "top_n": 10
    })
    return [
        {
            "text": r["document"]["text"],
            "similarity": r["relevance_score"]
        }
        for r in reranked.json()["results"]
    ]
```

Performance
Typical Latencies:
| Request Type | Latency (p50) | Latency (p95) |
|---|---|---|
| 10 documents | 100-200ms | 300-500ms |
| 50 documents | 200-400ms | 500-800ms |
| 100 documents | 300-600ms | 800-1200ms |
Optimization Tips:
- Limit document count - Pre-filter with embeddings first
- Use top_n - Only retrieve results you need
- Skip document text - Set `return_documents: false` if you have originals
- Use routing - Set `mode: "speed"` for lowest latency
- Batch strategically - Rerank in chunks of 50-100 documents
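The batching tip can be sketched as follows. Here `rerank` stands for any coroutine wrapping POST /v1/rerank that returns Cohere-style results with batch-local indices; that interface is an assumption made so the sketch stays transport-agnostic. Merging across batches assumes scores are comparable, which holds for pointwise rerankers that score each (query, document) pair independently.

```python
import asyncio

async def rerank_in_batches(query, documents, rerank, batch_size=50, top_n=10):
    """Rerank a large document set in chunks and merge the scores.

    `rerank(query, docs)` is any coroutine returning Cohere-style results
    ([{"index": ..., "relevance_score": ...}, ...]) with indices local to
    the batch it was given.
    """
    merged = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        for r in await rerank(query, batch):
            merged.append({
                # Re-map batch-local indices to the full document list
                "index": start + r["index"],
                "relevance_score": r["relevance_score"],
            })
    merged.sort(key=lambda r: r["relevance_score"], reverse=True)
    return merged[:top_n]
```

The batches could also be sent concurrently with `asyncio.gather`, at the cost of a larger burst against your rate limit.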
See Also
- Embeddings - Generate vector embeddings for initial retrieval
- Chat Completions - Use reranked context with LLMs
- Models - List available models