# Continuum Router
A high-performance, production-ready LLM API router that provides a single OpenAI-compatible interface for multiple LLM backends with intelligent routing, health monitoring, and enterprise-grade reliability features.
## Key Features

- **OpenAI-Compatible API**: Full support for chat completions, responses, embeddings, reranking, sparse embeddings, image generation, files, and models endpoints (see the request sketch after this list)
- **Hot Reload Configuration**: Runtime configuration updates without restart; covers logging, backends, health checks, rate limiting, circuit breakers, and timeouts
- **Files API with File Resolution**: Upload files and reference them in chat completions with automatic content injection
- **Multi-Backend Routing**: Intelligent routing across OpenAI, Anthropic, Gemini, Ollama, vLLM, LocalAI, LM Studio, and llama.cpp
- **Anthropic Native API**: Direct Anthropic Messages API support with prompt caching, Claude Code compatibility, and tiered token counting
- **Advanced Load Balancing**: Multiple strategies including Round-Robin, Weighted, Least-Latency, and Consistent-Hash
- **Model Fallback**: Automatic failover to fallback models with cross-provider support
- **High Performance**: < 5ms routing overhead, 1000+ concurrent requests
- **API Key Authentication**: Configurable authentication modes (permissive/blocking) for API endpoints
- **Enterprise Ready**: Health checks, circuit breakers, advanced rate limiting, metrics, distributed tracing
- **CORS Support**: Configurable cross-origin resource sharing for embedding in Tauri, Electron, or web frontends
- **Unix Socket Support**: Bind to Unix domain sockets alongside TCP for secure local communication and container deployments
- **Reasoning Effort Control**: Unified `reasoning_effort` parameter across providers with automatic format normalization (low/medium/high/xhigh)
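Because the surface is OpenAI-compatible, an existing client only needs to change its base URL. The request below is a minimal sketch, assuming the router listens on port 8080 (as in the Quick Start) and exposes the standard `/v1/chat/completions` path; the model name and API key are placeholders:

```bash
# Hypothetical request through the router; model name, API key, and port are assumptions.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ROUTER_API_KEY" \
  -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hello from Continuum Router."}],
        "reasoning_effort": "low"
      }'
```

The same request shape works regardless of which backend the router selects, and `reasoning_effort` is normalized to each provider's native format as described above.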
## Architecture Overview
## Quick Start
### 1. Install
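How you install the binary depends on how the project is distributed in your environment; the command below is only a sketch, assuming `continuum-router` is published as a Rust crate of the same name.

```bash
# Assumption: continuum-router is available as an installable Rust crate.
cargo install continuum-router
```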
### 2. Configure
```bash
# Generate configuration
continuum-router --generate-config > config.yaml

# Edit for your backends
nano config.yaml
```
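The areas the configuration covers (logging, backends, health checks, rate limiting, circuit breakers, and timeouts) are listed under Key Features. The snippet below is only an illustrative sketch of how such a file might be laid out; every key name and backend entry in it is an assumption rather than the router's actual schema, so start from the generated template instead.

```bash
# Illustrative sketch only: the section and key names below are assumptions
# mirroring the configurable areas listed under Key Features, not the real schema.
# Start from the file produced by --generate-config and adapt that instead.
cat > config.example.yaml <<'EOF'
logging:
  level: info
backends:
  - name: local-ollama          # assumed shape: one local Ollama backend
    type: ollama
    url: http://localhost:11434
health_checks:
  interval: 10s
rate_limiting: {}
circuit_breakers: {}
timeouts:
  request: 60s
EOF
```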
### 3. Run
```bash
# Start the router
continuum-router --config config.yaml

# Test it
curl http://localhost:8080/health
```
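Once the health endpoint responds, you can check which models the configured backends expose through the OpenAI-compatible surface. A small sketch, assuming the conventional `/v1/models` listing path and the default port used above:

```bash
# Assumes the OpenAI-compatible /v1/models path; adjust host/port to your config.
curl http://localhost:8080/v1/models
```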
## Use Cases

- **Unified LLM Gateway**: Single endpoint for multiple LLM providers
- **Load Distribution**: Distribute requests across multiple backend instances
- **High Availability**: Automatic failover and health monitoring
- **Cost Optimization**: Route to the most cost-effective backends
- **Development**: Switch between local and cloud models seamlessly
## Performance
| Metric | Value |
|---|---|
| Latency | < 5ms routing overhead |
| Throughput | 1500+ requests/second per instance |
| Memory | ~50MB base usage |
| Scalability | 50+ backends, 1000+ models |
See Performance Guide for benchmarks and tuning.
## Getting Help
- Documentation: Browse this site for comprehensive guides
- Issues: GitHub Issues
- Discussions: GitHub Discussions
## License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Made with care by the Lablup Backend.AI Team