Configuration Guide

This guide documents how to configure Continuum Router. The router supports multiple configuration methods with a clear priority system for different deployment scenarios.

Configuration sections:

  • Server & Backends — Server settings, backend providers, and connection options
  • Health & Caching — Health checks, request settings, retry, caching, and logging
  • Security & Admin — API keys, authentication, WebUI, admin endpoints, and ACP
  • Advanced — Global prompts, model metadata, hot reload, tracing, load balancing, rate limiting
  • Examples & Migration — Configuration examples, migration guide, and Rust Builder API

Configuration Methods

Continuum Router supports four configuration methods:

  1. Configuration File (YAML) - Recommended for production
  2. Environment Variables - Ideal for containerized deployments
  3. Command Line Arguments - Useful for testing and overrides
  4. Rust Builder API - Type-safe programmatic configuration for library usage

Configuration Discovery

The router automatically searches for configuration files in these locations (in order). At each location, the .yaml extension is tried first, then .toml:

  1. Path specified by --config flag
  2. ./config.yaml or ./config.toml (current directory)
  3. ~/.config/continuum-router/config.yaml or config.toml (user config directory)
  4. /etc/continuum-router/config.yaml or config.toml (system config directory)
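The search order above can be sketched in a few lines of Python. This is an illustrative model only, not the router's implementation: the function names `candidate_paths` and `discover_config` are hypothetical, and the sketch assumes that a path given with --config is used exactly as written.

```python
from pathlib import Path
from typing import List, Optional

# Extensions in the order the guide says they are tried at each location.
EXTENSIONS = (".yaml", ".toml")


def candidate_paths(cwd: Path, home: Path) -> List[Path]:
    """List the default config candidates in the documented search order."""
    bases = [
        cwd / "config",                                    # current directory
        home / ".config" / "continuum-router" / "config",  # user config directory
        Path("/etc/continuum-router/config"),              # system config directory
    ]
    return [base.with_suffix(ext) for base in bases for ext in EXTENSIONS]


def discover_config(explicit: Optional[Path], cwd: Path, home: Path) -> Optional[Path]:
    """Return the first config file found, or None if no candidate exists."""
    if explicit is not None:
        # Assumption: an explicit --config path is taken as given.
        return explicit
    for path in candidate_paths(cwd, home):
        if path.is_file():
            return path
    return None
```

Calling `discover_config(None, Path.cwd(), Path.home())` returns the file that would win under this model, which can help when two config files exist and it is unclear which one takes effect.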

Configuration Priority

Configuration is applied in the following priority order (highest to lowest):

  1. Command-line arguments (highest priority)
  2. Environment variables
  3. Configuration file
  4. Default values (lowest priority)

This allows you to:

  • Set base configuration in a file
  • Override specific settings via environment variables in containers
  • Make temporary adjustments using command-line arguments
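The precedence order can be modelled as a lookup through four layers, from highest to lowest priority. The sketch below uses Python's `collections.ChainMap` purely as an illustration. The flat key names and the sample values are hypothetical and do not reflect the router's internal representation or its real defaults.

```python
from collections import ChainMap
from typing import Any, Dict


def effective_settings(
    cli: Dict[str, Any],
    env: Dict[str, Any],
    file: Dict[str, Any],
    defaults: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge the four layers; the first layer that defines a key wins."""
    # ChainMap searches its maps left to right, so the order here is the
    # documented priority: CLI > environment > config file > defaults.
    return dict(ChainMap(cli, env, file, defaults))


# Illustrative values only.
settings = effective_settings(
    cli={"bind_address": "0.0.0.0:9000"},
    env={"log_level": "debug"},
    file={"bind_address": "0.0.0.0:8080", "log_level": "warn"},
    defaults={"bind_address": "0.0.0.0:8080", "log_level": "info"},
)
```

Here `settings["bind_address"]` comes from the command line and `settings["log_level"]` from the environment, while any key set only in the file or the defaults passes through unchanged.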

Configuration File Format

Complete Configuration Example

# Continuum Router Configuration
# Generate the full annotated sample with: continuum-router --generate-config

# Server configuration
server:
  # bind_address accepts a single string or an array of addresses
  # TCP format: "host:port", Unix socket format: "unix:/path/to/socket"
  bind_address: "0.0.0.0:8080"          # Single address (backward compatible)
  # bind_address:                        # Or multiple addresses:
  #   - "0.0.0.0:8080"                   #   TCP on all interfaces
  #   - "unix:/var/run/router.sock"      #   Unix socket (Linux/macOS, Windows 10 1809+)
  # socket_mode: 0o660                   # Optional: Unix socket file permissions
  workers: 4                             # Number of worker threads (0 = auto-detect)
  connection_pool_size: 100              # Max idle connections per backend

# Model metadata configuration (optional)
model_metadata_file: "model-metadata.yaml"  # Path to external model metadata file

# Load balancing strategy: RoundRobin (default), WeightedRoundRobin,
# LeastLatency, Random, ConsistentHash, PrefixAwareHash
selection_strategy: RoundRobin

# Backend configuration
backends:
  # Native OpenAI API with built-in configuration
  - name: "openai"
    type: openai                         # Use native OpenAI backend
    api_key: "${CONTINUUM_OPENAI_API_KEY}"  # Loaded from environment
    org_id: "${CONTINUUM_OPENAI_ORG_ID}"    # Optional organization ID
    weight: 3
    models:                              # Specify which models to use
      - gpt-4o
      - gpt-4o-mini
      - o3-mini
      - text-embedding-3-large
    retry_override:                      # Backend-specific retry settings (optional)
      max_attempts: 5
      initial_delay: "200ms"
      max_delay: "30s"
      backoff_multiplier: 2.0
      jitter: true

  # Generic OpenAI-compatible backend with custom metadata
  - name: "openai-compatible"
    url: "https://custom-llm.example.com"
    weight: 1
    models:
      - "gpt-4"
      - "gpt-3.5-turbo"
    model_configs:                       # Enhanced model configuration with metadata
      - id: "gpt-4"
        aliases:                         # Alternative IDs that share this metadata (optional)
          - "gpt-4-0125-preview"
          - "gpt-4-turbo-preview"
        metadata:
          display_name: "GPT-4"
          summary: "Most capable GPT-4 model for complex tasks"
          capabilities: ["text", "image", "function_calling"]
          knowledge_cutoff: "2024-04"
          pricing:
            input_tokens: 0.03
            output_tokens: 0.06
          limits:
            context_window: 128000
            max_output: 4096

  # Ollama local server with automatic URL detection
  - name: "local-ollama"
    type: ollama                         # Defaults to http://localhost:11434
    weight: 2
    models:
      - "llama2"
      - "mistral"
      - "codellama"

  # vLLM server
  - name: "vllm-server"
    type: vllm
    url: "http://localhost:8000"
    weight: 1
    # Models will be discovered automatically if not specified
    # Models with namespace prefixes (e.g., "custom/gpt-4") will automatically
    # match metadata for base names (e.g., "gpt-4")

  # Google Gemini API (native backend)
  - name: "gemini"
    type: gemini                           # Use native Gemini backend
    api_key: "${CONTINUUM_GEMINI_API_KEY}" # Loaded from environment
    weight: 2
    models:
      - gemini-3.1-pro-preview
      - gemini-3-flash-preview
      - gemini-2.5-pro
      - gemini-2.5-flash

# Health monitoring configuration
health_checks:
  interval: "30s"                        # How often to check backend health
  timeout: "5s"                          # Timeout for health check requests
  unhealthy_threshold: 3                 # Failures before marking unhealthy
  healthy_threshold: 2                   # Successes before marking healthy
  endpoint: "/health"                    # Endpoint used for health checks
  warmup_check_interval: "1s"            # Accelerated interval while a backend warms up (HTTP 503)
  max_warmup_duration: "300s"            # Max time in accelerated warmup mode

# Request handling and timeout configuration
timeouts:
  connection: "10s"                      # TCP connection establishment timeout
  request:
    standard:                            # Non-streaming requests
      first_byte: "30s"                  # Time to receive first byte
      total: "180s"                      # Total request timeout (3 minutes)
    streaming:                           # Streaming (SSE) requests
      first_byte: "60s"                  # Time to first SSE chunk
      chunk_interval: "30s"              # Max time between chunks
      total: "600s"                      # Total streaming timeout (10 minutes)
    image_generation:                    # Image generation requests (DALL-E, etc.)
      first_byte: "60s"                  # Time to receive first byte
      total: "180s"                      # Total timeout (3 minutes default)
    model_overrides:                     # Model-specific timeout overrides
      gpt-5-latest:
        streaming:
          total: "1200s"                 # 20 minutes for GPT-5
      gpt-4o:
        streaming:
          total: "900s"                  # 15 minutes for GPT-4o
  health_check:
    timeout: "5s"                        # Health check timeout
    interval: "30s"                      # Health check interval

# Global retry and resilience configuration
retry:
  max_attempts: 3                        # Maximum retry attempts
  initial_delay: "100ms"                 # Initial delay between retries
  max_delay: "10s"                       # Maximum delay between retries
  backoff_multiplier: 2.0                # Exponential backoff multiplier
  jitter: true                           # Add random jitter to delays
  retryable_status_codes: [429, 502, 503, 504]

# Logging configuration
logging:
  level: "info"                         # Log level: trace, debug, info, warn, error
  format: "json"                        # Log format: json, pretty

# Files API configuration
files:
  enabled: true                         # Enable/disable Files API endpoints
  max_file_size: 536870912              # Maximum file size in bytes (default: 512MB)
  storage_path: "./data/files"          # Storage path for uploaded files (supports ~)
  retention_days: 0                     # File retention in days (0 = keep forever)
  metadata_storage: persistent          # Metadata backend: "memory" or "persistent" (default)
  cleanup_orphans_on_startup: false     # Auto-cleanup orphaned files on startup

  # Authentication and authorization
  auth:
    method: api_key                     # "none" or "api_key" (default)
    required_scope: files               # API key scope required for access
    enforce_ownership: true             # Users can only access their own files
    admin_can_access_all: true          # Admin scope grants access to all files

# Distributed tracing configuration
tracing:
  enabled: true                         # Enable/disable distributed tracing
  w3c_trace_context: true               # Support W3C Trace Context (traceparent header)
  headers:
    trace_id: "X-Trace-ID"              # Header name for trace ID
    request_id: "X-Request-ID"          # Header name for request ID
    correlation_id: "X-Correlation-ID"  # Header name for correlation ID

# Circuit breaker configuration
circuit_breaker:
  enabled: true                         # Enable circuit breaker
  failure_threshold: 5                  # Consecutive failures to open the circuit
  failure_rate_threshold: 0.5           # Failure rate (0.0-1.0) to open the circuit
  minimum_requests: 10                  # Minimum requests before rate evaluation
  timeout: "60s"                        # Time before attempting recovery (half-open)
  half_open_max_requests: 3             # Trial requests allowed in half-open state
  half_open_success_threshold: 2        # Successes required to close the circuit

# Rate limiting configuration
rate_limiting:
  enabled: true                         # Enable rate limiting
  storage: memory                       # "memory" or "redis"
  limits:
    per_client:
      requests_per_second: 10
      burst_capacity: 20
    global:
      requests_per_second: 1000
      burst_capacity: 2000

# Admin API configuration
admin:
  auth:
    method: bearer                         # Auth method: none, bearer, basic, ip_whitelist, api_key
    bearer_token: "${ADMIN_TOKEN}"         # Admin authentication token
  stats:
    enabled: true                          # Enable/disable stats collection
    retention_window: 24h                  # Ring-buffer retention for windowed queries
    token_tracking: true                   # Parse response bodies for token usage
    persistence:
      enabled: true                        # Enable stats persistence across restarts
      path: ./data/stats.json              # File path for the snapshot
      snapshot_interval: 5m                # How often to write periodic snapshots
      max_age: 7d                          # Discard snapshots older than this on startup

# Metrics and monitoring configuration
metrics:
  enabled: true                         # Enable Prometheus metrics collection
  path: "/metrics"                      # Metrics endpoint path
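The retry settings in the example above (initial_delay, max_delay, backoff_multiplier) describe exponential backoff. As a rough sketch of how such a schedule grows, the Python function below computes capped delays in milliseconds. It is an assumption-laden illustration: the function name `backoff_delays_ms` is hypothetical, jitter is left out because the guide does not specify its distribution, and whether max_attempts counts the initial request is not stated here.

```python
from typing import List


def backoff_delays_ms(
    initial_delay_ms: float,
    max_delay_ms: float,
    multiplier: float,
    retries: int,
) -> List[float]:
    """Delay before each retry: initial * multiplier**n, capped at max_delay."""
    return [
        min(initial_delay_ms * multiplier ** n, max_delay_ms)
        for n in range(retries)
    ]


# Values taken from the global retry block: 100ms initial, 10s cap, 2.0 multiplier.
schedule = backoff_delays_ms(100, 10_000, 2.0, 8)
# schedule == [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0, 6400.0, 10000.0]
```

Under this model the cap matters only from the eighth retry onward, so with a small max_attempts the max_delay setting has no visible effect.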

Minimal Configuration

# Minimal configuration - other settings will use defaults
server:
  bind_address: "0.0.0.0:8080"

backends:
  - name: "ollama"
    url: "http://localhost:11434"
  - name: "lm-studio"
    url: "http://localhost:1234"

Environment Variables

Two mechanisms bring environment variables into the configuration:

  1. Direct overrides: a small set of CONTINUUM_* variables that override their config-file counterparts at startup.
  2. ${VAR} substitution: any string value in the YAML/TOML file may reference an environment variable (e.g. api_key: "${MY_PROVIDER_KEY}"); the reference is resolved at load time.
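To make the ${VAR} mechanism concrete, here is a minimal Python sketch of load-time substitution on a single string value. It is not the router's code: the function name `substitute` is hypothetical, and raising an error for an unset variable is an assumption, since this guide does not say how missing variables are handled.

```python
import os
import re
from typing import Mapping

# Matches ${NAME} where NAME looks like a typical environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace every ${NAME} reference in value with env[NAME]."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: treat an unset variable as a configuration error.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_PATTERN.sub(replace, value)
```

For example, `substitute("${MY_PROVIDER_KEY}", {"MY_PROVIDER_KEY": "sk-test"})` returns `"sk-test"`, and a string without any reference is returned unchanged.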

Direct Overrides

Variable                Type    Description
CONTINUUM_BIND_ADDRESS  string  Overrides server.bind_address
CONTINUUM_BACKEND_URLS  string  Comma-separated backend URLs; overrides the backends list
CONTINUUM_LOG_LEVEL     string  Overrides logging.level (trace, debug, info, warn, error)
RUST_LOG                string  Rust-specific logging filter configuration
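As a small illustration of the comma-separated format used by CONTINUUM_BACKEND_URLS, the Python sketch below splits such a value into individual URLs. The function name `parse_backend_urls` is hypothetical, and trimming whitespace and skipping empty entries are assumptions rather than documented behavior.

```python
from typing import List


def parse_backend_urls(raw: str) -> List[str]:
    """Split a comma-separated URL list, trimming whitespace around entries."""
    # Assumption: empty entries (for example from a trailing comma) are ignored.
    return [part.strip() for part in raw.split(",") if part.strip()]


urls = parse_backend_urls("http://localhost:11434,http://localhost:1234")
# urls == ["http://localhost:11434", "http://localhost:1234"]
```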

Native Backend API Keys

Native backends fall back to these variables when no api_key is set in the config:

Variable                     Used by
CONTINUUM_OPENAI_API_KEY     type: openai backends
CONTINUUM_OPENAI_ORG_ID      type: openai backends (optional organization ID)
CONTINUUM_ANTHROPIC_API_KEY  type: anthropic backends
CONTINUUM_GEMINI_API_KEY     type: gemini backends
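The fallback rule for native backends can be pictured as follows. This Python sketch is a simplified model, not the router's logic: the names `FALLBACK_ENV` and `resolve_api_key` are hypothetical, and only the variable names come from the table above.

```python
from typing import Mapping, Optional

# Backend type -> environment variable consulted when api_key is not configured.
FALLBACK_ENV = {
    "openai": "CONTINUUM_OPENAI_API_KEY",
    "anthropic": "CONTINUUM_ANTHROPIC_API_KEY",
    "gemini": "CONTINUUM_GEMINI_API_KEY",
}


def resolve_api_key(
    backend_type: str,
    configured_key: Optional[str],
    env: Mapping[str, str],
) -> Optional[str]:
    """Prefer the key from the config file, else the backend's fallback variable."""
    if configured_key:
        return configured_key
    variable = FALLBACK_ENV.get(backend_type)
    if variable is None:
        return None  # no documented fallback for this backend type
    return env.get(variable)
```

With this model, a `type: gemini` backend without api_key picks up CONTINUUM_GEMINI_API_KEY from the environment, while an explicitly configured key always takes precedence.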

API Key Management

Variable                   Type     Default       Description
CONTINUUM_API_KEY          string   -             Single API key for simple deployments
CONTINUUM_API_KEY_SCOPES   string   "read,write"  Comma-separated scopes for the API key
CONTINUUM_API_KEY_USER_ID  string   "admin"       User ID associated with the API key
CONTINUUM_API_KEY_ORG_ID   string   "default"     Organization ID associated with the API key
CONTINUUM_DEV_MODE         boolean  false         Enable development API keys (DO NOT use in production)

Example Environment Configuration

# Direct overrides
export CONTINUUM_BIND_ADDRESS="0.0.0.0:9000"
export CONTINUUM_BACKEND_URLS="http://localhost:11434,http://localhost:1234"
export CONTINUUM_LOG_LEVEL="debug"

# Referenced from config.yaml via ${...} substitution
export CONTINUUM_OPENAI_API_KEY="sk-..."
export ADMIN_TOKEN="my-admin-token"

# Start the router
continuum-router

All other settings (health checks, timeouts, retry, caching, Files API, and so on) are configured through the configuration file.

Command Line Arguments

Command-line arguments take precedence over every other configuration method, which makes them useful for testing and temporary overrides.

Core Options

continuum-router --help

Argument                   Type  Description
--mode <MODE>              enum  Server mode: http (default) or stdio (ACP JSON-RPC 2.0 transport)
-c, --config <FILE>        path  Configuration file path
--generate-config          flag  Generate sample YAML config and exit
--generate-example-config  flag  Generate example configuration documentation (env vars, routing, validation rules, precedence) and exit
--generate-toml-config     flag  Generate TOML-format sample config and exit
--model-metadata <FILE>    path  Path to model metadata YAML file (overrides config)

Backend Configuration

Argument             Type    Description
--backends <URLs>    string  Comma-separated backend URLs
--backend-url <URL>  string  Single backend URL (deprecated)

Server Configuration

Argument                       Type     Description
--bind <ADDRESS>               string   Server bind address
--connection-pool-size <SIZE>  integer  HTTP connection pool size

Load Balancing

Argument                         Type    Description
--selection-strategy <STRATEGY>  string  Load balancing strategy: RoundRobin (default), WeightedRoundRobin, LeastLatency, Random, ConsistentHash

Health Check Configuration

Argument                           Type     Description
--disable-health-checks            flag     Disable health monitoring
--health-check-interval <SECONDS>  integer  Health check interval
--health-check-timeout <SECONDS>   integer  Health check timeout
--unhealthy-threshold <COUNT>      integer  Failures before unhealthy
--healthy-threshold <COUNT>        integer  Successes before healthy

Configuration Utilities

Argument                      Type  Description
--migrate-config-file <FILE>  path  Migrate and fix configuration file issues without starting the router; creates a backup and reports all changes (YAML and TOML)
--dry-run                     flag  With --migrate-config-file, preview migration changes without applying them

Container Health Check

Argument                  Type    Description
--health-check            flag    Check whether a running server is healthy and exit (0 = healthy, 1 = unhealthy); intended for Docker HEALTHCHECK
--health-check-url <URL>  string  Health endpoint to probe (default: http://localhost:8080/health)
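For readers who want to see what a probe with this exit-code contract looks like, the following Python sketch mirrors it using only the standard library. It is an approximation under stated assumptions, not the router's implementation: the function name `probe` is hypothetical, a 2xx response is assumed to mean healthy, and the 5-second timeout is an arbitrary choice.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> int:
    """Return 0 if the endpoint answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL:
        # all count as unhealthy in this sketch.
        return 1
```

A wrapper script could pass the result to `sys.exit(probe())` so that a container runtime sees exit code 0 for healthy and 1 for unhealthy, matching the table above.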

Subcommands

Command                      Description
auth login --backend <NAME>  Run the OAuth device authorization flow for the named backend (its auth.type must be oauth) and persist tokens to auth.oauth.token_store. See Server & Backends for the OAuth backend configuration.

Example CLI Usage

# Use config file with overrides
continuum-router --config config.yaml --bind "0.0.0.0:9000"

# Override backends temporarily
continuum-router --config config.yaml --backends "http://localhost:11434"

# Use custom model metadata file
continuum-router --config config.yaml --model-metadata /path/to/custom-metadata.yaml

# Use model metadata with tilde expansion
continuum-router --model-metadata ~/configs/model-metadata.yaml

# Adjust health check settings for testing
continuum-router --config config.yaml --health-check-interval 10

# Generate sample configuration
continuum-router --generate-config > my-config.yaml

# Migrate an outdated config file (preview first, then apply)
continuum-router --migrate-config-file config.yaml --dry-run
continuum-router --migrate-config-file config.yaml

# Docker HEALTHCHECK probe
continuum-router --health-check --health-check-url http://localhost:8080/health

# OAuth device-flow login for a ChatGPT/Codex backend
continuum-router --config config.yaml auth login --backend chatgpt