
Quick Start

This guide will help you get Continuum Router up and running in minutes.

Prerequisites

  • A running LLM backend (OpenAI API, Ollama, vLLM, LocalAI, LM Studio, etc.)
  • Network access to your backend endpoints

Installation

Download Binary

# Linux (x86_64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# Linux (ARM64)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-arm64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# macOS (Apple Silicon)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-arm64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

# macOS (Intel)
curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

Build from Source

# Clone repository
git clone https://github.com/lablup/continuum-router.git
cd continuum-router

# Build and install
cargo build --release
sudo mv target/release/continuum-router /usr/local/bin/
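
Whichever installation method you used, you can confirm the binary is on your PATH before moving on. The --help flag shown here is a common CLI convention and an assumption about this binary, not something documented above; the --generate-config and --config flags used later in this guide are the documented entry points.

# Confirm the binary is installed and reachable
which continuum-router
continuum-router --help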

Configuration

Generate Default Configuration

continuum-router --generate-config > config.yaml

Basic Configuration Example

backends:
  - url: http://localhost:11434
    name: ollama
    models: ["llama3.2", "qwen3"]

  - url: http://localhost:1234
    name: lm-studio
    models: ["gpt-4", "claude-3"]

selection_strategy: LeastLatency

health_checks:
  enabled: true
  interval: 30s

Configuration with Rate Limiting

backends:
  - url: http://localhost:11434
    name: ollama
    models: ["llama3.2", "qwen3"]

selection_strategy: LeastLatency

health_checks:
  enabled: true
  interval: 30s

rate_limiting:
  enabled: true
  storage: memory

  limits:
    per_client:
      requests_per_second: 10
      burst_capacity: 20
    per_backend:
      requests_per_second: 100
      burst_capacity: 200
    global:
      requests_per_second: 1000
      burst_capacity: 2000

  whitelist:
    - "192.168.1.0/24"
    - "10.0.0.1"

  bypass_keys:
    - "admin-key-123"

Running the Router

Start the Server

continuum-router --config config.yaml
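
Assuming the router does not daemonize itself, this command keeps running in the foreground of your terminal. To keep it alive after you close the session, one lightweight option is nohup; for anything long-lived, a proper process manager such as systemd is a better fit.

# Run in the background and capture logs to a file
nohup continuum-router --config config.yaml > router.log 2>&1 &

# Follow the logs
tail -f router.log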

Verify It's Running

# Check health endpoint
curl http://localhost:8080/health

# List available models
curl http://localhost:8080/v1/models

Using the API

Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
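
Since the router exposes an OpenAI-compatible endpoint, the reply text should be found at choices[0].message.content in the response. The pipeline below assumes that standard response shape and that jq is installed.

# Extract only the assistant's reply from the JSON response (requires jq)
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq -r '.choices[0].message.content'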

Streaming Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
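
Streamed responses arrive as server-sent events, one "data: {...}" chunk at a time, normally terminated by "data: [DONE]" (OpenAI-style framing; treat the exact format as an assumption about the router). To print just the generated text as it arrives, strip the SSE framing and pull out the content deltas. This sketch uses GNU grep/sed line-buffering options and jq.

# Print only the streamed text deltas as they arrive (requires GNU grep/sed and jq)
curl -sN -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Tell me a story"}], "stream": true}' \
  | grep --line-buffered '^data: ' \
  | sed -u 's/^data: //' \
  | grep --line-buffered -v '^\[DONE\]' \
  | jq -rj '.choices[0].delta.content // empty'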

List Models

curl http://localhost:8080/v1/models
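
The models endpoint returns an OpenAI-style list object; assuming that shape and jq, you can print just the model IDs:

# Print only the model IDs (requires jq)
curl -s http://localhost:8080/v1/models | jq -r '.data[].id'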

Troubleshooting

Common Issues

Router fails to start

Check that:

  1. The configuration file exists and is valid YAML
  2. Backend URLs are accessible
  3. No other service is already listening on port 8080 (see the checks below)
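
Quick ways to check points 1 and 3 (port 8080 is assumed to be the default listen port, as used elsewhere in this guide):

# Confirm the config file is valid YAML (requires Python with PyYAML)
python3 -c 'import yaml; yaml.safe_load(open("config.yaml"))' && echo "YAML OK"

# See whether another process is already listening on port 8080
ss -tlnp | grep 8080            # Linux
lsof -iTCP:8080 -sTCP:LISTEN    # macOS, or Linux with lsof installed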

No backends available

Verify that:

  1. Your backends are running and healthy
  2. The backend URLs in your configuration are correct
  3. Health checks can reach your backends (you can test this directly, as shown below)
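
You can test reachability from the machine running the router by calling a backend directly. For example, with the Ollama backend from the configuration above (localhost:11434 is the URL used in the examples; Ollama also exposes an OpenAI-compatible /v1/models route):

# Confirm the backend from the example configuration answers
curl -s http://localhost:11434/v1/models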

Connection refused errors

Ensure:

  1. Your firewall allows connections to the backends
  2. The backends are listening on the configured ports (see the connectivity check below)
  3. Network routes are properly configured
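
A raw TCP check from the router host can rule out firewall and routing problems. The host and port below match the example Ollama backend; substitute your own, and note that nc may not be installed on every system.

# Test raw TCP connectivity to a backend (replace host and port with yours)
nc -zv localhost 11434

# Or let curl show exactly where the connection fails
curl -v http://localhost:11434/ -o /dev/null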

For more help, see the Error Handling guide or open an issue on GitHub.