
Quick Start

This guide will help you get Continuum Router up and running in minutes.

Prerequisites

  • A running LLM backend (OpenAI API, Ollama, vLLM, LocalAI, LM Studio, etc.)
  • Network access to your backend endpoints

Installation

Download Binary

# Linux (x86_64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# Linux (ARM64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-arm64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# macOS (Apple Silicon)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-arm64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# macOS (Intel)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

Build from Source

# Clone repository
git clone https://github.com/lablup/continuum-router.git
cd continuum-router

# Build and install
cargo build --release
sudo mv target/release/continuum-router /usr/local/bin/
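
Whichever installation method you used, you can confirm the binary is on your PATH before moving on. The --help flag shown here is a common CLI convention and an assumption about this binary, not something documented above; the --generate-config and --config flags used later in this guide are the documented entry points.

# Confirm the binary is installed and reachable
which continuum-router
continuum-router --help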

Configuration

Generate Default Configuration

continuum-router --generate-config > config.yaml

Basic Configuration Example

backends:
  - url: http://localhost:11434
    name: ollama
    models: ["llama3.2", "qwen3"]

  - url: http://localhost:1234
    name: lm-studio
    models: ["gpt-4", "claude-3"]

selection_strategy: LeastLatency

health_checks:
  enabled: true
  interval: 30s

Configuration with Rate Limiting

backends:
  - url: http://localhost:11434
    name: ollama
    models: ["llama3.2", "qwen3"]

selection_strategy: LeastLatency

health_checks:
  enabled: true
  interval: 30s

rate_limiting:
  enabled: true
  storage: memory

  limits:
    per_client:
      requests_per_second: 10
      burst_capacity: 20
    per_backend:
      requests_per_second: 100
      burst_capacity: 200
    global:
      requests_per_second: 1000
      burst_capacity: 2000

  whitelist:
    - "192.168.1.0/24"
    - "10.0.0.1"

  bypass_keys:
    - "admin-key-123"

Running the Router

Start the Server

continuum-router --config config.yaml
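
Assuming the router does not daemonize itself, this command keeps running in the foreground of your terminal. To keep it alive after you close the session, one lightweight option is nohup; for anything long-lived, a proper process manager such as systemd is a better fit.

# Run in the background and capture logs to a file
nohup continuum-router --config config.yaml > router.log 2>&1 &

# Follow the logs
tail -f router.log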

Verify It's Running

# Check health endpoint
curl http://localhost:8080/health

# List available models
curl http://localhost:8080/v1/models

Using the API

Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
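
Since the router exposes an OpenAI-compatible endpoint, the reply text should be found at choices[0].message.content in the response. The pipeline below assumes that standard response shape and that jq is installed.

# Extract only the assistant's reply from the JSON response (requires jq)
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq -r '.choices[0].message.content'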

Streaming Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
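
Streamed responses arrive as server-sent events, one "data: {...}" chunk at a time, normally terminated by "data: [DONE]" (OpenAI-style framing; treat the exact format as an assumption about the router). To print just the generated text as it arrives, strip the SSE framing and pull out the content deltas. This sketch uses GNU grep/sed line-buffering options and jq.

# Print only the streamed text deltas as they arrive (requires GNU grep/sed and jq)
curl -sN -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Tell me a story"}], "stream": true}' \
  | grep --line-buffered '^data: ' \
  | sed -u 's/^data: //' \
  | grep --line-buffered -v '^\[DONE\]' \
  | jq -rj '.choices[0].delta.content // empty'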

List Models

curl http://localhost:8080/v1/models
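
The models endpoint returns an OpenAI-style list object; assuming that shape and jq, you can print just the model IDs:

# Print only the model IDs (requires jq)
curl -s http://localhost:8080/v1/models | jq -r '.data[].id'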

Troubleshooting

Common Issues

Router fails to start

Check that:

  1. The configuration file exists and is valid YAML
  2. Backend URLs are accessible
  3. No other service is already listening on port 8080 (see the checks below)
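
Quick ways to check points 1 and 3 (port 8080 is assumed to be the default listen port, as used elsewhere in this guide):

# Confirm the config file is valid YAML (requires Python with PyYAML)
python3 -c 'import yaml; yaml.safe_load(open("config.yaml"))' && echo "YAML OK"

# See whether another process is already listening on port 8080
ss -tlnp | grep 8080            # Linux
lsof -iTCP:8080 -sTCP:LISTEN    # macOS, or Linux with lsof installed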

No backends available

Verify that:

  1. Your backends are running and healthy
  2. The backend URLs in your configuration are correct
  3. Health checks can reach your backends (you can test this directly, as shown below)
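
You can test reachability from the machine running the router by calling a backend directly. For example, with the Ollama backend from the configuration above (localhost:11434 is the URL used in the examples; Ollama also exposes an OpenAI-compatible /v1/models route):

# Confirm the backend from the example configuration answers
curl -s http://localhost:11434/v1/models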

Connection refused errors

Ensure:

  1. Your firewall allows connections to the backends
  2. The backends are listening on the configured ports (see the connectivity check below)
  3. Network routes are properly configured
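
A raw TCP check from the router host can rule out firewall and routing problems. The host and port below match the example Ollama backend; substitute your own, and note that nc may not be installed on every system.

# Test raw TCP connectivity to a backend (replace host and port with yours)
nc -zv localhost 11434

# Or let curl show exactly where the connection fails
curl -v http://localhost:11434/ -o /dev/null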

For more help, see the Error Handling guide or open an issue on GitHub.