Configuration Guide

This guide provides comprehensive documentation for configuring Continuum Router. The router supports multiple configuration methods with a clear priority system to provide maximum flexibility for different deployment scenarios.

Configuration sections:

  • Server & Backends — Server settings, backend providers, and connection options
  • Health & Caching — Health checks, request settings, retry, caching, and logging
  • Security & Admin — API keys, authentication, WebUI, admin endpoints, and ACP
  • Advanced — Global prompts, model metadata, hot reload, tracing, load balancing, rate limiting
  • Examples & Migration — Configuration examples, migration guide, and Rust Builder API

Configuration Methods

Continuum Router supports four configuration methods:

  1. Configuration File (YAML) - Recommended for production
  2. Environment Variables - Ideal for containerized deployments
  3. Command Line Arguments - Useful for testing and overrides
  4. Rust Builder API - Type-safe programmatic configuration for library usage

Configuration Discovery

The router automatically searches for configuration files in these locations (in order):

  1. Path specified by --config flag
  2. ./config.yaml (current directory)
  3. ./config.yml
  4. /etc/continuum-router/config.yaml
  5. /etc/continuum-router/config.yml
  6. ~/.config/continuum-router/config.yaml
  7. ~/.config/continuum-router/config.yml
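
For example, to rely on discovery rather than an explicit flag, a config file can be placed in the user-level location from the list above (shell sketch; the router itself is not invoked here, and the bind address is illustrative):

```shell
# Create the user-level config directory and a minimal config file;
# on the next start the router would discover it without --config.
mkdir -p "$HOME/.config/continuum-router"
cat > "$HOME/.config/continuum-router/config.yaml" <<'EOF'
server:
  bind_address: "127.0.0.1:8080"
EOF
```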

Configuration Priority

Configuration is applied in the following priority order (highest to lowest):

  1. Command-line arguments (highest priority)
  2. Environment variables
  3. Configuration file
  4. Default values (lowest priority)

This allows you to:

  • Set base configuration in a file
  • Override specific settings via environment variables in containers
  • Make temporary adjustments using command-line arguments
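
A concrete sketch of this layering (values are illustrative; the final router invocation is shown commented out):

```shell
# 1. The config file sets the baseline bind address.
cat > /tmp/router-config.yaml <<'EOF'
server:
  bind_address: "0.0.0.0:8080"
EOF

# 2. An environment variable overrides the file...
export CONTINUUM_BIND_ADDRESS="0.0.0.0:9000"

# 3. ...and a command-line flag would override both:
# continuum-router --config /tmp/router-config.yaml --bind "0.0.0.0:9090"
# The effective bind address would then be 0.0.0.0:9090.
```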

Configuration File Format

Complete Configuration Example

# Continuum Router Configuration
# This example shows all available configuration options with their default values

# Server configuration
server:
  # bind_address accepts a single string or an array of addresses
  # TCP format: "host:port", Unix socket format: "unix:/path/to/socket"
  bind_address: "0.0.0.0:8080"          # Single address (backward compatible)
  # bind_address:                        # Or multiple addresses:
  #   - "0.0.0.0:8080"                   #   TCP on all interfaces
  #   - "unix:/var/run/router.sock"     #   Unix socket (Unix/Linux/macOS only)
  # socket_mode: 0o660                   # Optional: Unix socket file permissions
  workers: 4                             # Number of worker threads (0 = auto-detect)
  connection_pool_size: 100              # Max idle connections per backend

# Model metadata configuration (optional)
model_metadata_file: "model-metadata.yaml"  # Path to external model metadata file

# Backend configuration
backends:
  # Native OpenAI API with built-in configuration
  - name: "openai"
    type: openai                         # Use native OpenAI backend
    api_key: "${CONTINUUM_OPENAI_API_KEY}"  # Loaded from environment
    org_id: "${CONTINUUM_OPENAI_ORG_ID}"    # Optional organization ID
    weight: 3
    models:                              # Specify which models to use
      - gpt-4o
      - gpt-4o-mini
      - o3-mini
      - text-embedding-3-large
    retry_override:                      # Backend-specific retry settings (optional)
      max_attempts: 5
      base_delay: "200ms"
      max_delay: "30s"
      exponential_backoff: true
      jitter: true

  # Generic OpenAI-compatible backend with custom metadata
  - name: "openai-compatible"
    url: "https://custom-llm.example.com"
    weight: 1
    models:
      - "gpt-4"
      - "gpt-3.5-turbo"
    model_configs:                       # Enhanced model configuration with metadata
      - id: "gpt-4"
        aliases:                         # Alternative IDs that share this metadata (optional)
          - "gpt-4-0125-preview"
          - "gpt-4-turbo-preview"
        metadata:
          display_name: "GPT-4"
          summary: "Most capable GPT-4 model for complex tasks"
          capabilities: ["text", "image", "function_calling"]
          knowledge_cutoff: "2024-04"
          pricing:
            input_tokens: 0.03
            output_tokens: 0.06
          limits:
            context_window: 128000
            max_output: 4096

  # Ollama local server with automatic URL detection
  - name: "local-ollama"
    type: ollama                         # Defaults to http://localhost:11434
    weight: 2
    models:
      - "llama2"
      - "mistral"
      - "codellama"

  # vLLM server
  - name: "vllm-server"
    type: vllm
    url: "http://localhost:8000"
    weight: 1
    # Models will be discovered automatically if not specified
    # Models with namespace prefixes (e.g., "custom/gpt-4") will automatically
    # match metadata for base names (e.g., "gpt-4")

  # Google Gemini API (native backend)
  - name: "gemini"
    type: gemini                           # Use native Gemini backend
    api_key: "${CONTINUUM_GEMINI_API_KEY}" # Loaded from environment
    weight: 2
    models:
      - gemini-3.1-pro-preview
      - gemini-3-flash-preview
      - gemini-2.5-pro
      - gemini-2.5-flash

# Health monitoring configuration
health_checks:
  enabled: true                          # Enable/disable health checks
  interval: "30s"                        # How often to check backend health
  timeout: "10s"                         # Timeout for health check requests
  unhealthy_threshold: 3                 # Failures before marking unhealthy
  healthy_threshold: 2                   # Successes before marking healthy
  endpoint: "/v1/models"                 # Endpoint used for health checks

# Request handling and timeout configuration
timeouts:
  connection: "10s"                      # TCP connection establishment timeout
  request:
    standard:                            # Non-streaming requests
      first_byte: "30s"                  # Time to receive first byte
      total: "180s"                      # Total request timeout (3 minutes)
    streaming:                           # Streaming (SSE) requests
      first_byte: "60s"                  # Time to first SSE chunk
      chunk_interval: "30s"              # Max time between chunks
      total: "600s"                      # Total streaming timeout (10 minutes)
    image_generation:                    # Image generation requests (DALL-E, etc.)
      first_byte: "60s"                  # Time to receive first byte
      total: "180s"                      # Total timeout (3 minutes default)
    model_overrides:                     # Model-specific timeout overrides
      gpt-5-latest:
        streaming:
          total: "1200s"                 # 20 minutes for GPT-5
      gpt-4o:
        streaming:
          total: "900s"                  # 15 minutes for GPT-4o
  health_check:
    timeout: "5s"                        # Health check timeout
    interval: "30s"                      # Health check interval

request:
  max_retries: 3                         # Maximum retry attempts for requests
  retry_delay: "1s"                      # Initial delay between retries

# Global retry and resilience configuration
retry:
  max_attempts: 3                        # Maximum retry attempts
  base_delay: "100ms"                    # Base delay between retries
  max_delay: "30s"                       # Maximum delay between retries
  exponential_backoff: true              # Use exponential backoff
  jitter: true                          # Add random jitter to delays

# Caching and optimization configuration
cache:
  model_cache_ttl: "300s"               # Cache model lists for 5 minutes
  deduplication_ttl: "60s"              # Deduplicate requests for 1 minute
  enable_deduplication: true            # Enable request deduplication

# Logging configuration
logging:
  level: "info"                         # Log level: trace, debug, info, warn, error
  format: "json"                        # Log format: json, pretty
  enable_colors: false                  # Enable colored output (for pretty format)

# Files API configuration
files:
  enabled: true                         # Enable/disable Files API endpoints
  max_file_size: 536870912              # Maximum file size in bytes (default: 512MB)
  storage_path: "./data/files"          # Storage path for uploaded files (supports ~)
  retention_days: 0                     # File retention in days (0 = keep forever)
  metadata_storage: persistent          # Metadata backend: "memory" or "persistent" (default)
  cleanup_orphans_on_startup: false     # Auto-cleanup orphaned files on startup

  # Authentication and authorization
  auth:
    method: api_key                     # "none" or "api_key" (default)
    required_scope: files               # API key scope required for access
    enforce_ownership: true             # Users can only access their own files
    admin_can_access_all: true          # Admin scope grants access to all files

# Load balancing configuration
load_balancer:
  strategy: "round_robin"               # Strategy: round_robin, weighted, random
  health_aware: true                    # Only route to healthy backends

# Distributed tracing configuration
tracing:
  enabled: true                         # Enable/disable distributed tracing
  w3c_trace_context: true               # Support W3C Trace Context (traceparent header)
  headers:
    trace_id: "X-Trace-ID"              # Header name for trace ID
    request_id: "X-Request-ID"          # Header name for request ID
    correlation_id: "X-Correlation-ID"  # Header name for correlation ID

# Circuit breaker configuration (future feature)
circuit_breaker:
  enabled: false                        # Enable circuit breaker
  failure_threshold: 5                  # Failures to open circuit
  recovery_timeout: "60s"               # Time before attempting recovery
  half_open_retries: 3                  # Retries in half-open state

# Rate limiting configuration (future feature)
rate_limiting:
  enabled: false                        # Enable rate limiting
  requests_per_second: 100              # Global requests per second
  burst_size: 200                       # Burst capacity

# Admin API configuration
admin:
  auth:
    method: bearer_token                   # Auth method: none, bearer_token, basic, api_key
    token: "${ADMIN_TOKEN}"                # Admin authentication token
  stats:
    enabled: true                          # Enable/disable stats collection
    retention_window: 24h                  # Ring-buffer retention for windowed queries
    token_tracking: true                   # Parse response bodies for token usage
    persistence:
      enabled: true                        # Enable stats persistence across restarts
      path: ./data/stats.json              # File path for the snapshot
      snapshot_interval: 5m                # How often to write periodic snapshots
      max_age: 7d                          # Discard snapshots older than this on startup

# Metrics and monitoring configuration (future feature)
metrics:
  enabled: false                        # Enable metrics collection
  endpoint: "/metrics"                  # Metrics endpoint path
  include_labels: true                  # Include detailed labels
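
Because the example above enables W3C Trace Context support, a client can propagate its own trace by sending a traceparent header in the format version-trace_id-parent_id-flags (lowercase hex fields). A shell sketch of constructing one follows; the curl call against a local router is shown commented out since it assumes a running instance:

```shell
# Generate a 16-byte trace-id and an 8-byte parent-id as lowercase hex.
trace_id=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
parent_id=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')
traceparent="00-${trace_id}-${parent_id}-01"
echo "$traceparent"
# curl -H "traceparent: $traceparent" http://localhost:8080/v1/models
```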

Minimal Configuration

# Minimal configuration - other settings will use defaults
server:
  bind_address: "0.0.0.0:8080"

backends:
  - name: "ollama"
    url: "http://localhost:11434"
  - name: "lm-studio"
    url: "http://localhost:1234"

Environment Variables

All configuration options can be overridden using environment variables with the CONTINUUM_ prefix:

Server Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_BIND_ADDRESS | string | "0.0.0.0:8080" | Server bind address |
| CONTINUUM_WORKERS | integer | 4 | Number of worker threads |
| CONTINUUM_CONNECTION_POOL_SIZE | integer | 100 | HTTP connection pool size |

Backend Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_BACKEND_URLS | string | - | Comma-separated backend URLs |
| CONTINUUM_BACKEND_WEIGHTS | string | - | Comma-separated weights (must match URLs) |

Health Check Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_HEALTH_CHECKS_ENABLED | boolean | true | Enable health checks |
| CONTINUUM_HEALTH_CHECK_INTERVAL | string | "30s" | Health check interval |
| CONTINUUM_HEALTH_CHECK_TIMEOUT | string | "10s" | Health check timeout |
| CONTINUUM_UNHEALTHY_THRESHOLD | integer | 3 | Failures before unhealthy |
| CONTINUUM_HEALTHY_THRESHOLD | integer | 2 | Successes before healthy |
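
With the defaults above, a backend that starts failing is only marked unhealthy after roughly interval × unhealthy_threshold = 30s × 3 = 90s. A faster-failover profile via environment variables (values are illustrative, not recommendations):

```shell
# Check every 10s and mark unhealthy after 2 consecutive failures,
# so worst-case detection takes roughly 10s * 2 = 20s.
export CONTINUUM_HEALTH_CHECK_INTERVAL="10s"
export CONTINUUM_UNHEALTHY_THRESHOLD="2"
export CONTINUUM_HEALTHY_THRESHOLD="1"
echo "worst-case detection: $(( 10 * 2 ))s"
```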

Request Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_REQUEST_TIMEOUT | string | "300s" | Maximum request timeout |
| CONTINUUM_MAX_RETRIES | integer | 3 | Maximum retry attempts |
| CONTINUUM_RETRY_DELAY | string | "1s" | Initial retry delay |
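
When exponential backoff is enabled (see the retry section of the configuration file), the delay grows between attempts. Assuming the common formula delay = initial_delay × 2^(attempt−1), capped at max_delay — a sketch, as the router's exact schedule may differ — the first retries with a 1s initial delay would wait:

```shell
# Illustrative backoff schedule for CONTINUUM_RETRY_DELAY=1s and 3 retries,
# assuming the delay doubles on each attempt.
base_s=1
for attempt in 1 2 3; do
  echo "retry $attempt: wait $(( base_s * (1 << (attempt - 1)) ))s"
done
```

which prints waits of 1s, 2s, and 4s; jitter, when enabled, adds a random offset on top of each delay.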

Logging Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_LOG_LEVEL | string | "info" | Log level |
| CONTINUUM_LOG_FORMAT | string | "json" | Log format |
| CONTINUUM_LOG_COLORS | boolean | false | Enable colored output |
| RUST_LOG | string | - | Rust-specific logging configuration |

Cache Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_MODEL_CACHE_TTL | string | "300s" | Model cache TTL |
| CONTINUUM_DEDUPLICATION_TTL | string | "60s" | Deduplication TTL |
| CONTINUUM_ENABLE_DEDUPLICATION | boolean | true | Enable deduplication |

Files API Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_FILES_ENABLED | boolean | true | Enable/disable Files API |
| CONTINUUM_FILES_MAX_SIZE | integer | 536870912 | Maximum file size in bytes (512MB) |
| CONTINUUM_FILES_STORAGE_PATH | string | "./data/files" | Storage path for uploaded files |
| CONTINUUM_FILES_RETENTION_DAYS | integer | 0 | File retention in days (0 = forever) |
| CONTINUUM_FILES_METADATA_STORAGE | string | "persistent" | Metadata backend: "memory" or "persistent" |
| CONTINUUM_FILES_CLEANUP_ORPHANS | boolean | false | Auto-cleanup orphaned files on startup |
| CONTINUUM_FILES_AUTH_METHOD | string | "api_key" | Authentication method: "none" or "api_key" |
| CONTINUUM_FILES_AUTH_SCOPE | string | "files" | Required API key scope for Files API access |
| CONTINUUM_FILES_ENFORCE_OWNERSHIP | boolean | true | Users can only access their own files |
| CONTINUUM_FILES_ADMIN_ACCESS_ALL | boolean | true | Admin scope grants access to all files |
| CONTINUUM_DEV_MODE | boolean | false | Enable development API keys (DO NOT use in production) |

API Key Management Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| CONTINUUM_API_KEY | string | - | Single API key for simple deployments |
| CONTINUUM_API_KEY_SCOPES | string | "read,write" | Comma-separated scopes for the API key |
| CONTINUUM_API_KEY_USER_ID | string | "admin" | User ID associated with the API key |
| CONTINUUM_API_KEY_ORG_ID | string | "default" | Organization ID associated with the API key |
| CONTINUUM_DEV_MODE | boolean | false | Enable development API keys (DO NOT use in production) |

Example Environment Configuration

# Basic configuration
export CONTINUUM_BIND_ADDRESS="0.0.0.0:9000"
export CONTINUUM_BACKEND_URLS="http://localhost:11434,http://localhost:1234"
export CONTINUUM_LOG_LEVEL="debug"

# Advanced configuration
export CONTINUUM_CONNECTION_POOL_SIZE="200"
export CONTINUUM_HEALTH_CHECK_INTERVAL="60s"
export CONTINUUM_MODEL_CACHE_TTL="600s"
export CONTINUUM_ENABLE_DEDUPLICATION="true"

# Start the router
continuum-router

Command Line Arguments

Command-line arguments provide the highest priority configuration method and are useful for testing and temporary overrides.

Core Options

continuum-router --help

| Argument | Type | Description |
|----------|------|-------------|
| --mode <MODE> | enum | Server mode: http (default) or stdio (ACP JSON-RPC 2.0 transport) |
| -c, --config <FILE> | path | Configuration file path |
| --generate-config | flag | Generate sample config and exit |
| --model-metadata <FILE> | path | Path to model metadata YAML file (overrides config) |

Backend Configuration

| Argument | Type | Description |
|----------|------|-------------|
| --backends <URLs> | string | Comma-separated backend URLs |
| --backend-url <URL> | string | Single backend URL (deprecated) |

Server Configuration

| Argument | Type | Description |
|----------|------|-------------|
| --bind <ADDRESS> | string | Server bind address |
| --connection-pool-size <SIZE> | integer | HTTP connection pool size |

Health Check Configuration

| Argument | Type | Description |
|----------|------|-------------|
| --disable-health-checks | flag | Disable health monitoring |
| --health-check-interval <SECONDS> | integer | Health check interval |
| --health-check-timeout <SECONDS> | integer | Health check timeout |
| --unhealthy-threshold <COUNT> | integer | Failures before unhealthy |
| --healthy-threshold <COUNT> | integer | Successes before healthy |

Example CLI Usage

# Use config file with overrides
continuum-router --config config.yaml --bind "0.0.0.0:9000"

# Override backends temporarily
continuum-router --config config.yaml --backends "http://localhost:11434"

# Use custom model metadata file
continuum-router --config config.yaml --model-metadata /path/to/custom-metadata.yaml

# Use model metadata with tilde expansion
continuum-router --model-metadata ~/configs/model-metadata.yaml

# Adjust health check settings for testing
continuum-router --config config.yaml --health-check-interval 10

# Generate sample configuration
continuum-router --generate-config > my-config.yaml