Continuum Router

A high-performance, production-ready LLM API router that provides a single OpenAI-compatible interface for multiple LLM backends with intelligent routing, health monitoring, and enterprise-grade reliability features.

Key Features

  • OpenAI-Compatible API - Full support for the chat completions, completions, models, and files endpoints
  • Hot Reload Configuration - Runtime configuration updates without a restart; covers logging, backends, health checks, rate limiting, circuit breakers, and timeouts
  • Files API with File Resolution - Upload files and reference them in chat completions with automatic content injection (see the sketch after this list)
  • Multi-Backend Routing - Intelligent routing across OpenAI, Anthropic, Gemini, Ollama, vLLM, LocalAI, and LM Studio
  • Advanced Load Balancing - Multiple strategies, including Round-Robin, Weighted, Least-Latency, and Consistent-Hash
  • Model Fallback - Automatic failover to fallback models, with cross-provider support
  • High Performance - < 5ms routing overhead, 1000+ concurrent requests
  • API Key Authentication - Configurable authentication modes (permissive/blocking) for API endpoints
  • Enterprise Ready - Health checks, circuit breakers, advanced rate limiting, metrics, and distributed tracing
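
As a sketch of the file-resolution flow, the two requests below upload a file through an OpenAI-style Files endpoint and then reference it from a chat completion. The request shapes follow the OpenAI Files API, but the reference syntax inside the message, the model name, and the file id file-abc123 are illustrative placeholders; consult the Files API documentation for the exact format the router expects.

# Upload a file (multipart form, OpenAI Files API shape)
curl http://localhost:8080/v1/files \
  -H "Authorization: Bearer $API_KEY" \
  -F purpose="assistants" \
  -F file="@demo.txt"

# Reference the returned file id in a chat completion; the router
# resolves the id and injects the file content (reference syntax
# here is illustrative)
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Summarize the uploaded file: file-abc123"}
    ]
  }'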

Architecture Overview

                    Client Applications
                            |
            +---------------v---------------+
            |       Continuum Router        |
            |   +------------------------+  |
            |   |   Load Balancer        |  |
            |   |   Health Monitor       |  |
            |   |   Circuit Breaker      |  |
            |   |   Metrics & Tracing    |  |
            |   +------------------------+  |
            +---------------+---------------+
                            |
        +--------+----------+----------+---------+
        |        |          |          |         |
        v        v          v          v         v
    +------+ +-------+ +--------+ +------+ +-------+
    |OpenAI| |Anthro-| |Gemini  | |Ollama| | vLLM  |
    |      | |pic    | |        | |      | |       |
    +------+ +-------+ +--------+ +------+ +-------+

Quick Start

1. Install

# Linux (x86_64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# Linux (ARM64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-aarch64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# macOS (Apple Silicon)
curl -LO https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-aarch64.zip
unzip continuum-router-macos-aarch64.zip
sudo mv continuum-router /usr/local/bin/

# Build from source
git clone https://github.com/lablup/continuum-router.git
cd continuum-router
cargo build --release
sudo mv target/release/continuum-router /usr/local/bin/
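
To confirm the binary is on your PATH, you can run the quick check below; the --version flag is an assumption about the CLI and may differ in your build.

# Verify the install
which continuum-router
continuum-router --version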

2. Configure

# Generate configuration
continuum-router --generate-config > config.yaml

# Edit for your backends
nano config.yaml
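
For orientation, a minimal config might resemble the sketch below. All field names and values here are illustrative assumptions, not the router's actual schema; treat the output of --generate-config as authoritative.

# Hypothetical config.yaml sketch (field names are assumptions)
backends:
  - name: openai
    url: https://api.openai.com/v1
    api_key: ${OPENAI_API_KEY}
  - name: ollama
    url: http://localhost:11434
load_balancing:
  strategy: round-robin   # e.g. weighted, least-latency, consistent-hash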

3. Run

# Start the router
continuum-router --config config.yaml

# Test it
curl http://localhost:8080/health
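
Once the health check passes, you can send an OpenAI-compatible chat completion through the router. The model name below is a placeholder (use one that your config.yaml maps to a backend), and add an Authorization header if API key authentication is enabled.

# Send a test chat completion (model name is a placeholder)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'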

Use Cases

  • Unified LLM Gateway - Single endpoint for multiple LLM providers
  • Load Distribution - Distribute requests across multiple backend instances
  • High Availability - Automatic failover and health monitoring
  • Cost Optimization - Route requests to the most cost-effective backends
  • Development - Switch between local and cloud models seamlessly

Performance

Metric        Value
-----------   ----------------------------------
Latency       < 5ms routing overhead
Throughput    1500+ requests/second per instance
Memory        ~50MB base usage
Scalability   50+ backends, 1000+ models

See the Performance Guide for benchmarks and tuning.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.


Made with care by the Lablup Backend.AI Team