# Continuum Router
A high-performance, production-ready LLM API router that provides a single OpenAI-compatible interface for multiple LLM backends with intelligent routing, health monitoring, and enterprise-grade reliability features.
## Key Features

- **OpenAI-Compatible API**: Full support for chat completions, responses, embeddings, reranking, sparse embeddings, image generation, files, and models endpoints (see the request sketch after this list)
- **Hot Reload Configuration**: Runtime configuration updates without restart; covers logging, backends, health checks, rate limiting, circuit breakers, and timeouts
- **Files API with File Resolution**: Upload files and reference them in chat completions with automatic content injection
- **Multi-Backend Routing**: Intelligent routing across OpenAI, Anthropic, Gemini, Ollama, vLLM, LocalAI, LM Studio, and llama.cpp
- **Anthropic Native API**: Direct Anthropic Messages API support with prompt caching, Claude Code compatibility, and tiered token counting
- **Advanced Load Balancing**: Multiple strategies including Round-Robin, Weighted, Least-Latency, and Consistent-Hash
- **Model Fallback**: Automatic failover to fallback models with cross-provider support
- **High Performance**: < 5ms routing overhead, 1000+ concurrent requests
- **API Key Authentication**: Configurable authentication modes (permissive/blocking) for API endpoints
- **Enterprise Ready**: Health checks, circuit breakers, advanced rate limiting, metrics, distributed tracing
- **CORS Support**: Configurable cross-origin resource sharing for embedding in Tauri, Electron, or web frontends
- **Unix Socket Support**: Bind to Unix domain sockets alongside TCP for secure local communication and container deployments
- **Reasoning Effort Control**: Unified `reasoning_effort` parameter across providers with automatic format normalization (low/medium/high/xhigh)
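Because the surface is OpenAI-compatible, an existing client only needs to change its base URL. The request below is a minimal sketch, assuming the router listens on port 8080 (as in the Quick Start) and exposes the standard `/v1/chat/completions` path; the model name and API key are placeholders:

```bash
# Hypothetical request through the router; model name, API key, and port are assumptions.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ROUTER_API_KEY" \
  -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hello from Continuum Router."}],
        "reasoning_effort": "low"
      }'
```

The same request shape works regardless of which backend the router selects, and `reasoning_effort` is normalized to each provider's native format as described above.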
## Architecture Overview
## Quick Start
### 1. Install
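How you install the binary depends on how the project is distributed in your environment; the command below is only a sketch, assuming `continuum-router` is published as a Rust crate of the same name.

```bash
# Assumption: continuum-router is available as an installable Rust crate.
cargo install continuum-router
```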
### 2. Configure
```bash
# Generate configuration
continuum-router --generate-config > config.yaml

# Edit for your backends
nano config.yaml
```
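The areas the configuration covers (logging, backends, health checks, rate limiting, circuit breakers, and timeouts) are listed under Key Features. The snippet below is only an illustrative sketch of how such a file might be laid out; every key name and backend entry in it is an assumption rather than the router's actual schema, so start from the generated template instead.

```bash
# Illustrative sketch only: the section and key names below are assumptions
# mirroring the configurable areas listed under Key Features, not the real schema.
# Start from the file produced by --generate-config and adapt that instead.
cat > config.example.yaml <<'EOF'
logging:
  level: info
backends:
  - name: local-ollama          # assumed shape: one local Ollama backend
    type: ollama
    url: http://localhost:11434
health_checks:
  interval: 10s
rate_limiting: {}
circuit_breakers: {}
timeouts:
  request: 60s
EOF
```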
### 3. Run
```bash
# Start the router
continuum-router --config config.yaml

# Test it
curl http://localhost:8080/health
```
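Once the health endpoint responds, you can check which models the configured backends expose through the OpenAI-compatible surface. A small sketch, assuming the conventional `/v1/models` listing path and the default port used above:

```bash
# Assumes the OpenAI-compatible /v1/models path; adjust host/port to your config.
curl http://localhost:8080/v1/models
```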
## Use Cases

- **Unified LLM Gateway**: Single endpoint for multiple LLM providers
- **Load Distribution**: Distribute requests across multiple backend instances
- **High Availability**: Automatic failover and health monitoring
- **Cost Optimization**: Route to the most cost-effective backends
- **Development**: Switch between local and cloud models seamlessly
## Performance
| Metric | Value |
|---|---|
| Latency | < 5ms routing overhead |
| Throughput | 1500+ requests/second per instance |
| Memory | ~50MB base usage |
| Scalability | 50+ backends, 1000+ models |
See Performance Guide for benchmarks and tuning.
## Getting Help
- Documentation: Browse this site for comprehensive guides
- Issues: GitHub Issues
- Discussions: GitHub Discussions
## License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Made with care by the Lablup Backend.AI Team