Continuum Router

A high-performance, production-ready LLM API router that provides a single OpenAI-compatible interface for multiple LLM backends with intelligent routing, health monitoring, and enterprise-grade reliability features.

Key Features

  • OpenAI-Compatible API
    Full support for chat completions, responses, embeddings, reranking, sparse embeddings, image generation, files, and models endpoints

  • Hot Reload Configuration
    Runtime configuration updates without restart - supports logging, backends, health checks, rate limiting, circuit breakers, and timeouts

  • Files API with File Resolution
    Upload files and reference them in chat completions with automatic content injection

  • Multi-Backend Routing
    Intelligent routing across OpenAI, Anthropic, Gemini, Ollama, vLLM, LocalAI, LM Studio, llama.cpp, MLxcel, and other Continuum Router instances

  • Anthropic Native API
    Direct Anthropic Messages API support with prompt caching, Claude Code compatibility, and tiered token counting (see the examples after this list)

  • Advanced Load Balancing
    Multiple strategies, including Round-Robin, Weighted, Least-Latency, and Consistent-Hash

  • Model Fallback
    Automatic failover to fallback models, with cross-provider support

  • High Performance
    < 5ms routing overhead, 1000+ concurrent requests

  • API Key Authentication
    Configurable authentication modes (permissive/blocking) for API endpoints

  • Enterprise Ready
    Health checks, circuit breakers, advanced rate limiting, metrics, and distributed tracing

  • CORS Support
    Configurable cross-origin resource sharing for embedding in Tauri, Electron, or web frontends

  • Unix Socket Support
    Bind to Unix domain sockets alongside TCP for secure local communication and container deployments (see the examples after this list)

  • Reasoning Effort Control
    Unified reasoning_effort parameter across providers with automatic format normalization (low/medium/high/xhigh) - see the examples after this list

  • Embedded WebUI
    Browser-based administration interface compiled into the binary - manage backends, API keys, and configuration without CLI tools

  • Embeddable Library Crate
    Use as a Rust library with a builder API - embed LLM routing in your Axum application or configure programmatically without YAML files

  • Agent Communication Protocol (ACP)
    JSON-RPC 2.0 stdio transport for IDE and tool integrations - session management, LLM inference pipeline, tool call reporting, and an MCP-over-ACP bridge

  • Router-Managed Web Search
    Transparent web_search tool injection for self-hosted backends (vLLM, Ollama, llama.cpp), so agentic workflows run without per-client search wiring
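
The snippets below sketch a few of these features in use. They are illustrative rather than canonical: the port matches the Quick Start default, while the model name, API key, Unix socket path, and the Anthropic-style /v1/messages path are assumptions that depend on your configuration.

# Chat completion with the unified reasoning effort parameter
# ("my-model" and $API_KEY are placeholders)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Summarize consistent hashing."}],
    "reasoning_effort": "high"
  }'

# Anthropic Messages request (assumes the router mirrors Anthropic's /v1/messages path)
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "my-model", "max_tokens": 256, "messages": [{"role": "user", "content": "Hello"}]}'

# Health check over a Unix domain socket (socket path is illustrative)
curl --unix-socket /var/run/continuum-router.sock http://localhost/health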

Architecture Overview

                    Client Applications
                            |
            +---------------v---------------+
            |       Continuum Router        |
            |   +------------------------+  |
            |   |   Load Balancer        |  |
            |   |   Health Monitor       |  |
            |   |   Circuit Breaker      |  |
            |   |   Metrics & Tracing    |  |
            |   +------------------------+  |
            +---------------+---------------+
                            |
        +--------+----------+----------+---------+
        |        |          |          |         |
        v        v          v          v         v
    +------+ +-------+ +--------+ +------+ +-------+
    |OpenAI| |Anthro-| |Gemini  | |Ollama| | vLLM  |
    |      | |pic    | |        | |      | |       |
    +------+ +-------+ +--------+ +------+ +-------+
    +--------+ +---------+ +--------+ +----------+
    |llama.  | |LM       | |MLxcel  | |Continuum |
    |cpp     | |Studio   | |        | |Router    |
    +--------+ +---------+ +--------+ +----------+

Quick Start

1. Install

# Linux (x86_64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# Linux (aarch64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-aarch64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# macOS (Apple Silicon)
curl -LO https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-aarch64.zip
unzip continuum-router-macos-aarch64.zip
sudo mv continuum-router /usr/local/bin/

# Build from source
git clone https://github.com/lablup/continuum-router.git
cd continuum-router
cargo build --release
sudo mv target/release/continuum-router /usr/local/bin/

2. Configure

# Generate configuration
continuum-router --generate-config > config.yaml

# Edit for your backends
nano config.yaml

3. Run

# Start the router
continuum-router --config config.yaml

# Test it
curl http://localhost:8080/health
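
Once the router is up, you can exercise the OpenAI-compatible endpoints directly. The commands below are a sketch: the model name and API key are placeholders whose actual values depend on your config.yaml.

# List the models the router exposes
curl http://localhost:8080/v1/models

# Send a first chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'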

Use Cases

  • Unified LLM Gateway - Single endpoint for multiple LLM providers
  • Load Distribution - Distribute requests across multiple backend instances
  • High Availability - Automatic failover and health monitoring
  • Cost Optimization - Route requests to the most cost-effective backends
  • Development - Switch between local and cloud models with a single endpoint

Performance

Metric        Value
Latency       < 5ms routing overhead
Throughput    1500+ requests/second per instance
Memory        ~50 MB base usage
Scalability   50+ backends, 1000+ models

See Performance Guide for benchmarks and tuning.

Getting Help

For questions, bug reports, or feature requests, open an issue on the GitHub repository (https://github.com/lablup/continuum-router).

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.


Built by Lablup, Inc.