Reranking

Rerank documents by semantic relevance using a Cohere-compatible API

Overview

Rerank documents by semantic relevance to a query. Typical uses are improving search results, selecting context for RAG (Retrieval-Augmented Generation), and ranking documents by similarity.

Key Features:

  • Cohere-compatible API for easy migration
  • Multi-provider routing for best price/performance
  • Environmental impact tracking (CO2, energy, water)
  • Supports string or object document formats
  • Automatic failover and retry logic

Reranking models score documents by semantic similarity to a query, returning them sorted by relevance. Use reranking after initial retrieval (e.g., from embeddings search) to improve precision.


Authentication

Required: API Key

All requests must include your Melious API key in the Authorization header:

Authorization: Bearer {your_api_key}

Permissions: inference.rerank scope required.
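
A minimal sketch of building the request headers. The helper name build_headers and the environment variable MELIOUS_API_KEY are assumptions made for this example, not names defined by the API:

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict[str, str]:
    """Build the headers for a Melious request.

    Falls back to the MELIOUS_API_KEY environment variable, a
    hypothetical name chosen for this sketch.
    """
    key = api_key or os.environ.get("MELIOUS_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.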


Endpoints

Create Rerank

POST /v1/rerank

Rerank documents by semantic relevance to a query.

Request Body:

{
  "model": "bge-reranker-v2-m3",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of AI",
    "The weather is nice today",
    "Neural networks process data"
  ],
  "top_n": 2,
  "return_documents": true
}

Request Fields:

Field               Type                 Required  Description
model               string               Yes       Model ID (e.g., bge-reranker-v2-m3)
query               string               Yes       The query to rank documents against
documents           string[] | object[]  Yes       Documents to rerank (strings or {text: "..."} objects)
top_n               integer              No        Return only top N results (default: all documents)
return_documents    boolean              No        Include document text in response (default: true)
max_chunks_per_doc  integer              No        Max chunks for long documents
user                string               No        End-user identifier for abuse monitoring

Melious Extensions:

mode                string               No        Routing mode: "balanced", "speed", "price", "quality", "environment"
custom_weights      object               No        Custom routing weights (mutually exclusive with mode)
filters             object               No        Hard constraints for provider selection
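
The constraints in the field list can be checked client-side before a request is sent. The following is a sketch only: build_rerank_payload is a hypothetical helper, and the rule that top_n must be at least 1 is an assumption rather than documented behavior.

```python
from typing import Any, Optional

# Routing modes listed in the request fields.
ROUTING_MODES = {"balanced", "speed", "price", "quality", "environment"}


def build_rerank_payload(
    model: str,
    query: str,
    documents: list,
    top_n: Optional[int] = None,
    mode: Optional[str] = None,
    custom_weights: Optional[dict] = None,
    filters: Optional[dict] = None,
) -> dict[str, Any]:
    """Assemble a /v1/rerank body and reject obviously invalid combinations."""
    if not model or not query or not documents:
        raise ValueError("model, query, and documents are required")
    if mode is not None and custom_weights is not None:
        raise ValueError("mode and custom_weights are mutually exclusive")
    if mode is not None and mode not in ROUTING_MODES:
        raise ValueError(f"unknown routing mode: {mode!r}")
    # Assumption: a top_n below 1 is never meaningful.
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be a positive integer")

    payload: dict[str, Any] = {"model": model, "query": query, "documents": documents}
    optional = {
        "top_n": top_n,
        "mode": mode,
        "custom_weights": custom_weights,
        "filters": filters,
    }
    # Leave out unset fields so the server applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Failing early on the client avoids spending a round trip on a request that would come back as a 400.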

Document Formats:

// String format (simple)
{
  "documents": ["doc 1", "doc 2", "doc 3"]
}

// Object format (for additional metadata)
{
  "documents": [
    {"text": "doc 1"},
    {"text": "doc 2"},
    {"text": "doc 3"}
  ]
}
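
If your pipeline produces a mix of plain strings and objects, one option is to normalize everything to the object format before sending. This is a sketch under that assumption; normalize_documents is a hypothetical helper, and whether the API itself accepts mixed lists is not stated here.

```python
def normalize_documents(documents: list) -> list[dict]:
    """Convert strings and {"text": ...} objects to a uniform object format."""
    normalized = []
    for position, doc in enumerate(documents):
        if isinstance(doc, str):
            normalized.append({"text": doc})
        elif isinstance(doc, dict) and isinstance(doc.get("text"), str):
            normalized.append(doc)
        else:
            raise TypeError(
                f"documents[{position}] must be a string or an object with a 'text' string"
            )
    return normalized
```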

Response (200 OK):

{
  "id": "rerank-abc123",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.95,
      "document": {"text": "Machine learning is a subset of AI"}
    },
    {
      "index": 2,
      "relevance_score": 0.82,
      "document": {"text": "Neural networks process data"}
    }
  ],
  "meta": {
    "api_version": {"version": "1"}
  },
  "usage": {
    "prompt_tokens": 45,
    "total_tokens": 45
  },
  "environment_impact": {
    "energy_kwh": 0.0001,
    "carbon_g_co2": 0.04,
    "water_liters": 0.0001,
    "renewable_percent": 95,
    "pue": 1.15,
    "provider_id": "berget",
    "location": "SE"
  }
}

Response Fields:

Field                      Type     Description
id                         string   Unique request identifier
results                    array    Reranked documents sorted by relevance
results[].index            integer  Original index in input documents array
results[].relevance_score  float    Relevance score (0-1, higher is more relevant)
results[].document         object   Document text (if return_documents: true)
meta                       object   API metadata
usage                      object   Token usage statistics
usage.prompt_tokens        integer  Input tokens processed
usage.total_tokens         integer  Total tokens (same as prompt_tokens for reranking)
environment_impact         object   Environmental metrics (Melious extension)
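
The environment_impact object appears to be reported per request, so a running total has to be kept on the client. A minimal sketch, assuming every response carries the fields shown in the example response; total_environment_impact is a hypothetical helper. Only the additive metrics are summed, since renewable_percent and pue are ratios.

```python
def total_environment_impact(responses: list[dict]) -> dict[str, float]:
    """Sum the additive environment_impact metrics over several responses."""
    totals = {"energy_kwh": 0.0, "carbon_g_co2": 0.0, "water_liters": 0.0}
    for response in responses:
        # Tolerate responses that lack the Melious extension block.
        impact = response.get("environment_impact") or {}
        for key in totals:
            totals[key] += float(impact.get(key, 0.0))
    return totals
```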

Status Codes:

Code  Description
200   Success
400   Bad request - invalid parameters
401   Unauthorized - missing/invalid API key
403   Forbidden - insufficient permissions or energy
429   Rate limit exceeded
500   Internal server error

Code Examples

Python:

import httpx
import asyncio

async def rerank_documents():
    """Rerank documents by relevance to a query."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.melious.ai/v1/rerank",
            headers={"Authorization": "Bearer your_api_key"},
            json={
                "model": "bge-reranker-v2-m3",
                "query": "What is machine learning?",
                "documents": [
                    "Machine learning is a subset of AI",
                    "The weather is nice today",
                    "Neural networks process data"
                ],
                "top_n": 2
            }
        )
        response.raise_for_status()  # Surface 4xx/5xx errors before parsing
        data = response.json()

        print("Reranked results:")
        for result in data["results"]:
            print(f"  [{result['index']}] Score: {result['relevance_score']:.3f}")
            print(f"      {result['document']['text'][:50]}...")

        print(f"\nCO2 emissions: {data['environment_impact']['carbon_g_co2']:.2f}g")
        return data["results"]

# Example usage
asyncio.run(rerank_documents())

JavaScript:

// Rerank documents by relevance to a query
const rerankDocuments = async () => {
  const response = await fetch(
    'https://api.melious.ai/v1/rerank',
    {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer your_api_key',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'bge-reranker-v2-m3',
        query: 'What is machine learning?',
        documents: [
          'Machine learning is a subset of AI',
          'The weather is nice today',
          'Neural networks process data'
        ],
        top_n: 2
      })
    }
  );

  const data = await response.json();

  console.log('Reranked results:');
  for (const result of data.results) {
    console.log(`  [${result.index}] Score: ${result.relevance_score.toFixed(3)}`);
    console.log(`      ${result.document.text.slice(0, 50)}...`);
  }

  console.log(`\nCO2 emissions: ${data.environment_impact.carbon_g_co2.toFixed(2)}g`);
  return data.results;
};

// Example usage
rerankDocuments();

cURL:

# Rerank documents
curl -X POST "https://api.melious.ai/v1/rerank" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of AI",
      "The weather is nice today",
      "Neural networks process data"
    ],
    "top_n": 2,
    "return_documents": true
  }'

# With routing optimization
curl -X POST "https://api.melious.ai/v1/rerank" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is deep learning?",
    "documents": ["doc1", "doc2", "doc3"],
    "mode": "environment",
    "filters": {
      "countries": ["SE", "NL", "DE"]
    }
  }'

Error Handling

Handle errors gracefully by checking status codes and error messages. Implement exponential backoff for transient errors (5xx, 429).
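
One way to implement the backoff described above is sketched here. It assumes the HTTP call is wrapped in a zero-argument callable (send) that returns a status code and a parsed body; post_with_backoff, its parameters, and the delay values are illustrative choices, not part of the API.

```python
import random
import time
from typing import Callable


def is_retryable(status: int) -> bool:
    """Transient errors per the guidance above: 429 and any 5xx."""
    return status == 429 or 500 <= status < 600


def post_with_backoff(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body
        # Exponential delay capped at max_delay, with full jitter so that
        # many clients do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the function easy to test without real waiting. With httpx, send could be a small closure that posts the request and returns (response.status_code, response.json()).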

Common Errors:

Error Code                   Description              Solution
AUTH_INVALID_API_KEY         Invalid API key          Verify API key is correct and active
VALIDATION_REQUIRED_FIELD    Missing required field   Ensure model, query, and documents are provided
VALIDATION_INVALID_VALUE     Invalid parameter        Check request body matches documentation
INFERENCE_PROVIDER_ERROR     Provider request failed  Retry with exponential backoff or change routing mode
BILLING_INSUFFICIENT_ENERGY  Not enough energy        Top up balance or upgrade plan

Error Response Format:

{
  "status": "error",
  "code": "VALIDATION_REQUIRED_FIELD",
  "message": "Query field is required",
  "details": {
    "field": "query"
  }
}
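
The error body can be turned into a typed exception so that calling code can branch on the error code. A minimal sketch, assuming error responses always follow the format above; MeliousAPIError and raise_for_error are hypothetical names, not part of any official SDK.

```python
class MeliousAPIError(Exception):
    """Hypothetical exception wrapping the documented error body."""

    def __init__(self, status_code: int, code: str, message: str, details: dict):
        super().__init__(f"{status_code} {code}: {message}")
        self.status_code = status_code
        self.code = code
        self.message = message
        self.details = details


def raise_for_error(status_code: int, body: dict) -> None:
    """Raise MeliousAPIError for HTTP errors or bodies marked status=error."""
    if status_code < 400 and body.get("status") != "error":
        return
    raise MeliousAPIError(
        status_code,
        body.get("code", "UNKNOWN"),  # Fallback if the body is not in the expected format
        body.get("message", ""),
        body.get("details") or {},
    )
```

A caller could then catch MeliousAPIError and, for example, retry only when the code is INFERENCE_PROVIDER_ERROR.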

Best Practices

Use Two-Stage Retrieval

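Combine embedding search with reranking: retrieve a broad candidate set with a fast vector search, then rerank it for precision. The snippets in this section assume client is an httpx.AsyncClient created with base_url set to https://api.melious.ai and the Authorization header configured, and that vector_search is your own retrieval function: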

# Stage 1: Fast retrieval with embeddings (get top 100)
candidates = await vector_search(query, top_k=100)

# Stage 2: Precise reranking (select top 10)
reranked = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": query,
    "documents": [c["text"] for c in candidates],
    "top_n": 10
})

Limit Documents with top_n

Only return the documents you need to reduce response size and cost:

# Only get top 5 most relevant
response = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": "your query",
    "documents": documents,
    "top_n": 5
})

Skip Document Text When Not Needed

If you only need indices and scores, disable document return:

response = await client.post("/v1/rerank", json={
    "model": "bge-reranker-v2-m3",
    "query": "your query",
    "documents": documents,
    "return_documents": False  # Smaller response
})

# Results contain index and score only; look up the text by index
for result in response.json()["results"]:
    original_doc = documents[result["index"]]
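
The index lookup can be wrapped in a small helper that pairs each score with the original item, which also works when the originals are richer records than plain strings. This is a sketch; attach_originals is a hypothetical name.

```python
def attach_originals(results: list[dict], originals: list) -> list[tuple[float, object]]:
    """Pair each result's relevance_score with the original item at its index."""
    return [(r["relevance_score"], originals[r["index"]]) for r in results]
```

Because results arrive sorted by relevance, the returned list keeps that order.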

Choose Appropriate Models

Model               Brand  Best For
bge-reranker-v2-m3  BAAI   Multilingual, general purpose

Use Cases

RAG Context Selection

async def rag_with_reranking(query: str, knowledge_base: list[str]):
    """Improve RAG quality with reranking."""
    # Step 1: Get candidate documents (embeddings or keyword search).
    # search_knowledge_base is a placeholder for your own retrieval function.
    candidates = await search_knowledge_base(query, knowledge_base, top_k=50)

    # Step 2: Rerank for precise relevance
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": query,
        "documents": candidates,
        "top_n": 5
    })

    # Step 3: Use top results as context
    context = "\n\n".join([
        r["document"]["text"]
        for r in reranked.json()["results"]
    ])

    # Step 4: Generate response with context
    response = await client.post("/v1/chat/completions", json={
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": query}
        ]
    })

    return response.json()["choices"][0]["message"]["content"]

Search Result Reordering

async def reorder_search_results(query: str, search_results: list[dict]):
    """Reorder search results by semantic relevance."""
    texts = [r["title"] + " " + r["snippet"] for r in search_results]

    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": query,
        "documents": texts,
        "return_documents": False
    })

    # Reorder original results by relevance
    results = reranked.json()["results"]
    reordered = [search_results[r["index"]] for r in results]

    return reordered

Document Similarity Ranking

async def find_similar_documents(reference_doc: str, candidates: list[str]):
    """Find documents most similar to a reference document."""
    reranked = await client.post("/v1/rerank", json={
        "model": "bge-reranker-v2-m3",
        "query": reference_doc,  # Use doc as query
        "documents": candidates,
        "top_n": 10
    })

    return [
        {
            "text": r["document"]["text"],
            "similarity": r["relevance_score"]
        }
        for r in reranked.json()["results"]
    ]

Performance

Typical Latencies:

Request Type   Latency (p50)  Latency (p95)
10 documents   100-200ms      300-500ms
50 documents   200-400ms      500-800ms
100 documents  300-600ms      800-1200ms

Optimization Tips:

  1. Limit document count - Pre-filter with embeddings first
  2. Use top_n - Only retrieve results you need
  3. Skip document text - Set return_documents: false if you have originals
  4. Use routing - mode: "speed" for lowest latency
  5. Batch strategically - Rerank in chunks of 50-100 documents
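
Tip 5 can be sketched as follows. The rerank argument is a hypothetical callable that sends one chunk to /v1/rerank and returns its results list; the sketch also assumes that scores from separate requests with the same model and query are comparable, which should be verified before relying on the merged order.

```python
from typing import Callable, Iterator, Optional


def chunked(items: list, size: int) -> Iterator[tuple[int, list]]:
    """Yield (offset, chunk) pairs covering items in order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for offset in range(0, len(items), size):
        yield offset, items[offset:offset + size]


def rerank_in_chunks(
    query: str,
    documents: list,
    rerank: Callable[[str, list], list[dict]],
    chunk_size: int = 50,
    top_n: Optional[int] = None,
) -> list[dict]:
    """Rerank documents chunk by chunk and merge the results by score."""
    merged = []
    for offset, chunk in chunked(documents, chunk_size):
        for result in rerank(query, chunk):
            # Shift the chunk-local index back to the position in documents.
            merged.append({**result, "index": result["index"] + offset})
    merged.sort(key=lambda r: r["relevance_score"], reverse=True)
    return merged if top_n is None else merged[:top_n]
```

Do not pass top_n to the per-chunk requests: a chunk's lower-ranked results may still belong in the global top N.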
