# Continuum Router
A high-performance, production-ready LLM API router that provides a single OpenAI-compatible interface for multiple LLM backends with intelligent routing, health monitoring, and enterprise-grade reliability features.
## Key Features

- **OpenAI-Compatible API**: Full support for the chat completions, completions, models, and files endpoints (a request sketch follows this list)
- **Hot Reload Configuration**: Runtime configuration updates without a restart, covering logging, backends, health checks, rate limiting, circuit breakers, and timeouts
- **Files API with File Resolution**: Upload files and reference them in chat completions with automatic content injection (see the upload sketch after this list)
- **Multi-Backend Routing**: Intelligent routing across OpenAI, Anthropic, Gemini, Ollama, vLLM, LocalAI, and LM Studio
- **Advanced Load Balancing**: Multiple strategies, including Round-Robin, Weighted, Least-Latency, and Consistent-Hash
- **Model Fallback**: Automatic failover to fallback models with cross-provider support
- **High Performance**: < 5ms routing overhead and 1000+ concurrent requests
- **API Key Authentication**: Configurable authentication modes (permissive/blocking) for API endpoints
- **Enterprise Ready**: Health checks, circuit breakers, advanced rate limiting, metrics, and distributed tracing
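
To make the OpenAI-compatible and authentication items concrete, here is a minimal request sketch. The `/v1/chat/completions` path and `Authorization: Bearer` header follow the usual OpenAI conventions the router is compatible with; the model name and `$ROUTER_API_KEY` are placeholders for your own deployment.

```bash
# Sketch of a chat completion routed through Continuum Router.
# Host, port, model name, and API key are placeholders.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ROUTER_API_KEY" \
  -d '{
        "model": "gpt-4o-mini",
        "messages": [
          {"role": "user", "content": "Say hello from Continuum Router."}
        ]
      }'
```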
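
The Files API item works in two steps: upload a file, then reference it from a chat completion so the router injects its content. The upload below assumes the standard OpenAI-style multipart request; the `purpose` value is a placeholder, and the exact way the returned file id is referenced in a later chat completion is specific to the router and not shown here.

```bash
# Sketch of a file upload via the OpenAI-compatible Files API.
# The "purpose" value is a placeholder; check your configuration for accepted values.
curl http://localhost:8080/v1/files \
  -H "Authorization: Bearer $ROUTER_API_KEY" \
  -F purpose="assistants" \
  -F file="@notes.txt"
```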
## Architecture Overview

```text
                  Client Applications
                           |
           +---------------v---------------+
           |       Continuum Router        |
           |   +-----------------------+   |
           |   |     Load Balancer     |   |
           |   |    Health Monitor     |   |
           |   |    Circuit Breaker    |   |
           |   |   Metrics & Tracing   |   |
           |   +-----------------------+   |
           +---------------+---------------+
                           |
    +-----------+----------+----------+---------+
    |           |          |          |         |
    v           v          v          v         v
+--------+ +---------+ +--------+ +--------+ +------+
| OpenAI | |Anthropic| | Gemini | | Ollama | | vLLM |
+--------+ +---------+ +--------+ +--------+ +------+
```
## Quick Start

### 1. Install

### 2. Configure

```bash
# Generate configuration
continuum-router --generate-config > config.yaml

# Edit for your backends
nano config.yaml
```
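
The generated file is the authoritative template; the sketch below only illustrates the general shape such a configuration might take, and every key name in it is an assumption rather than the actual schema.

```yaml
# Illustrative sketch only -- key names are assumptions, not the real schema.
# Always start from the output of `continuum-router --generate-config`.
server:
  listen: "0.0.0.0:8080"

backends:
  - name: openai-primary
    type: openai
    api_key: ${OPENAI_API_KEY}
  - name: local-ollama
    type: ollama
    url: http://localhost:11434

load_balancing:
  strategy: least-latency          # e.g. round-robin, weighted, consistent-hash

fallback:
  gpt-4o: [claude-sonnet, llama3]  # cross-provider failover order

health_check:
  interval: 10s

rate_limit:
  requests_per_minute: 600

circuit_breaker:
  failure_threshold: 5
```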
### 3. Run

```bash
# Start the router
continuum-router --config config.yaml

# Test it
curl http://localhost:8080/health
```
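
Beyond the health check, you can confirm that the configured backends are reachable by listing the models the router exposes; assuming the standard OpenAI-compatible path, that looks like the following.

```bash
# List models aggregated from all configured backends
# (add an Authorization header if API key authentication is enabled).
curl http://localhost:8080/v1/models
```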
## Use Cases

- Unified LLM Gateway - Single endpoint for multiple LLM providers
- Load Distribution - Distribute requests across multiple backend instances
- High Availability - Automatic failover and health monitoring
- Cost Optimization - Route to most cost-effective backends
- Development - Switch between local and cloud models seamlessly
## Performance

| Metric | Value |
|---|---|
| Latency | < 5ms routing overhead |
| Throughput | 1500+ requests/second per instance |
| Memory | ~50MB base usage |
| Scalability | 50+ backends, 1000+ models |
See Performance Guide for benchmarks and tuning.
## Getting Help

- Documentation: Browse this site for comprehensive guides
- Issues: GitHub Issues
- Discussions: GitHub Discussions
## License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Made with care by the Lablup Backend.AI Team