Configuration Guide¶

This guide documents how to configure Continuum Router. The router supports multiple configuration methods with a clear priority system for different deployment scenarios.

Configuration sections:

Server & Backends — Server settings, backend providers, and connection options
Health & Caching — Health checks, request settings, retry, caching, and logging
Security & Admin — API keys, authentication, WebUI, admin endpoints, and ACP
Advanced — Global prompts, model metadata, hot reload, tracing, load balancing, rate limiting
Examples & Migration — Configuration examples, migration guide, and Rust Builder API

Configuration Methods¶

Continuum Router supports four configuration methods:

Configuration File (YAML) - Recommended for production
Environment Variables - Ideal for containerized deployments
Command Line Arguments - Useful for testing and overrides
Rust Builder API - Type-safe programmatic configuration for library usage

Configuration Discovery¶

The router automatically searches for configuration files in these locations (in order). At each location, the .yaml extension is tried first, then .toml:

Path specified by --config flag
./config.yaml or ./config.toml (current directory)
~/.config/continuum-router/config.yaml or config.toml (user config directory)
/etc/continuum-router/config.yaml or config.toml (system config directory)
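
The search order above can be sketched in a few lines of Python. This is an illustration only, not the router's implementation: the names `candidate_paths` and `discover` are hypothetical, and the rule that the first existing file wins (including falling through when the `--config` path does not exist) is an assumption.

```python
from pathlib import Path
from typing import List, Optional

APP_DIR = "continuum-router"


def candidate_paths(config_flag: Optional[str] = None,
                    cwd: Optional[Path] = None,
                    home: Optional[Path] = None) -> List[Path]:
    """Return candidate config paths in the documented search order."""
    cwd = Path(".") if cwd is None else cwd
    home = Path.home() if home is None else home

    paths: List[Path] = []
    if config_flag:
        paths.append(Path(config_flag))           # 1. path given by --config
    bases = [
        cwd / "config",                           # 2. current directory
        home / ".config" / APP_DIR / "config",    # 3. user config directory
        Path("/etc") / APP_DIR / "config",        # 4. system config directory
    ]
    for base in bases:
        # At each location the .yaml extension is tried before .toml.
        paths.append(base.with_suffix(".yaml"))
        paths.append(base.with_suffix(".toml"))
    return paths


def discover(config_flag: Optional[str] = None, **kwargs) -> Optional[Path]:
    """Assumption: the first candidate that exists on disk is used."""
    for path in candidate_paths(config_flag, **kwargs):
        if path.is_file():
            return path
    return None
```

Calling `candidate_paths()` with no arguments lists the six default locations, which can help when debugging which file a deployment would pick up.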

Configuration Priority¶

Configuration is applied in the following priority order (highest to lowest):

Command-line arguments (highest priority)
Environment variables
Configuration file
Default values (lowest priority)

This allows you to:

Set base configuration in a file
Override specific settings via environment variables in containers
Make temporary adjustments using command-line arguments
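
As a rough model of this layering, the Python sketch below resolves each setting by checking the sources from highest to lowest priority. It is a simplification under stated assumptions: the flat key names and the sample values are illustrative, and the router's real configuration is nested.

```python
from collections import ChainMap

# Illustrative values only; the flat keys are a simplification.
defaults = {"bind_address": "0.0.0.0:8080", "log_level": "info"}
file_cfg = {"log_level": "warn"}                 # from config.yaml
env_cfg = {"log_level": "debug"}                 # e.g. CONTINUUM_LOG_LEVEL
cli_cfg = {"bind_address": "0.0.0.0:9000"}       # e.g. --bind


def resolve(cli: dict, env: dict, file: dict, defaults: dict) -> dict:
    """Merge the four layers; earlier arguments take priority."""
    # ChainMap looks each key up left to right and returns the first hit.
    return dict(ChainMap(cli, env, file, defaults))


effective = resolve(cli_cfg, env_cfg, file_cfg, defaults)
```

Here `effective["bind_address"]` comes from the command line and `effective["log_level"]` from the environment, because those layers outrank the file and the defaults.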

Configuration File Format¶

Complete Configuration Example¶

# Continuum Router Configuration
# Generate the full annotated sample with: continuum-router --generate-config

# Server configuration
server:
  # bind_address accepts a single string or an array of addresses
  # TCP format: "host:port", Unix socket format: "unix:/path/to/socket"
  bind_address: "0.0.0.0:8080"          # Single address (backward compatible)
  # bind_address:                        # Or multiple addresses:
  #   - "0.0.0.0:8080"                   #   TCP on all interfaces
  #   - "unix:/var/run/router.sock"      #   Unix socket (Linux/macOS, Windows 10 1809+)
  # socket_mode: 0o660                   # Optional: Unix socket file permissions
  workers: 4                             # Number of worker threads (0 = auto-detect)
  connection_pool_size: 100              # Max idle connections per backend

# Model metadata configuration (optional)
model_metadata_file: "model-metadata.yaml"  # Path to external model metadata file

# Load balancing strategy: RoundRobin (default), WeightedRoundRobin,
# LeastLatency, Random, ConsistentHash, PrefixAwareHash
selection_strategy: RoundRobin

# Backend configuration
backends:
  # Native OpenAI API with built-in configuration
  - name: "openai"
    type: openai                         # Use native OpenAI backend
    api_key: "${CONTINUUM_OPENAI_API_KEY}"  # Loaded from environment
    org_id: "${CONTINUUM_OPENAI_ORG_ID}"    # Optional organization ID
    weight: 3
    models:                              # Specify which models to use
      - gpt-4o
      - gpt-4o-mini
      - o3-mini
      - text-embedding-3-large
    retry_override:                      # Backend-specific retry settings (optional)
      max_attempts: 5
      initial_delay: "200ms"
      max_delay: "30s"
      backoff_multiplier: 2.0
      jitter: true

  # Generic OpenAI-compatible backend with custom metadata
  - name: "openai-compatible"
    url: "https://custom-llm.example.com"
    weight: 1
    models:
      - "gpt-4"
      - "gpt-3.5-turbo"
    model_configs:                       # Enhanced model configuration with metadata
      - id: "gpt-4"
        aliases:                         # Alternative IDs that share this metadata (optional)
          - "gpt-4-0125-preview"
          - "gpt-4-turbo-preview"
        metadata:
          display_name: "GPT-4"
          summary: "Most capable GPT-4 model for complex tasks"
          capabilities: ["text", "image", "function_calling"]
          knowledge_cutoff: "2024-04"
          pricing:
            input_tokens: 0.03
            output_tokens: 0.06
          limits:
            context_window: 128000
            max_output: 4096

  # Ollama local server with automatic URL detection
  - name: "local-ollama"
    type: ollama                         # Defaults to http://localhost:11434
    weight: 2
    models:
      - "llama2"
      - "mistral"
      - "codellama"

  # vLLM server
  - name: "vllm-server"
    type: vllm
    url: "http://localhost:8000"
    weight: 1
    # Models will be discovered automatically if not specified
    # Models with namespace prefixes (e.g., "custom/gpt-4") will automatically
    # match metadata for base names (e.g., "gpt-4")

  # Google Gemini API (native backend)
  - name: "gemini"
    type: gemini                           # Use native Gemini backend
    api_key: "${CONTINUUM_GEMINI_API_KEY}" # Loaded from environment
    weight: 2
    models:
      - gemini-3.1-pro-preview
      - gemini-3-flash-preview
      - gemini-2.5-pro
      - gemini-2.5-flash

# Health monitoring configuration
health_checks:
  interval: "30s"                        # How often to check backend health
  timeout: "5s"                          # Timeout for health check requests
  unhealthy_threshold: 3                 # Failures before marking unhealthy
  healthy_threshold: 2                   # Successes before marking healthy
  endpoint: "/health"                    # Endpoint used for health checks
  warmup_check_interval: "1s"            # Accelerated interval while a backend warms up (HTTP 503)
  max_warmup_duration: "300s"            # Max time in accelerated warmup mode

# Request handling and timeout configuration
timeouts:
  connection: "10s"                      # TCP connection establishment timeout
  request:
    standard:                            # Non-streaming requests
      first_byte: "30s"                  # Time to receive first byte
      total: "180s"                      # Total request timeout (3 minutes)
    streaming:                           # Streaming (SSE) requests
      first_byte: "60s"                  # Time to first SSE chunk
      chunk_interval: "30s"              # Max time between chunks
      total: "600s"                      # Total streaming timeout (10 minutes)
    image_generation:                    # Image generation requests (DALL-E, etc.)
      first_byte: "60s"                  # Time to receive first byte
      total: "180s"                      # Total timeout (3 minutes default)
    model_overrides:                     # Model-specific timeout overrides
      gpt-5-latest:
        streaming:
          total: "1200s"                 # 20 minutes for GPT-5
      gpt-4o:
        streaming:
          total: "900s"                  # 15 minutes for GPT-4o
  health_check:
    timeout: "5s"                        # Health check timeout
    interval: "30s"                      # Health check interval

# Global retry and resilience configuration
retry:
  max_attempts: 3                        # Maximum retry attempts
  initial_delay: "100ms"                 # Initial delay between retries
  max_delay: "10s"                       # Maximum delay between retries
  backoff_multiplier: 2.0                # Exponential backoff multiplier
  jitter: true                           # Add random jitter to delays
  retryable_status_codes: [429, 502, 503, 504]

# Logging configuration
logging:
  level: "info"                         # Log level: trace, debug, info, warn, error
  format: "json"                        # Log format: json, pretty

# Files API configuration
files:
  enabled: true                         # Enable/disable Files API endpoints
  max_file_size: 536870912              # Maximum file size in bytes (default: 512MB)
  storage_path: "./data/files"          # Storage path for uploaded files (supports ~)
  retention_days: 0                     # File retention in days (0 = keep forever)
  metadata_storage: persistent          # Metadata backend: "memory" or "persistent" (default)
  cleanup_orphans_on_startup: false     # Auto-cleanup orphaned files on startup

  # Authentication and authorization
  auth:
    method: api_key                     # "none" or "api_key" (default)
    required_scope: files               # API key scope required for access
    enforce_ownership: true             # Users can only access their own files
    admin_can_access_all: true          # Admin scope grants access to all files

# Distributed tracing configuration
tracing:
  enabled: true                         # Enable/disable distributed tracing
  w3c_trace_context: true               # Support W3C Trace Context (traceparent header)
  headers:
    trace_id: "X-Trace-ID"              # Header name for trace ID
    request_id: "X-Request-ID"          # Header name for request ID
    correlation_id: "X-Correlation-ID"  # Header name for correlation ID

# Circuit breaker configuration
circuit_breaker:
  enabled: true                         # Enable circuit breaker
  failure_threshold: 5                  # Consecutive failures to open the circuit
  failure_rate_threshold: 0.5           # Failure rate (0.0-1.0) to open the circuit
  minimum_requests: 10                  # Minimum requests before rate evaluation
  timeout: "60s"                        # Time before attempting recovery (half-open)
  half_open_max_requests: 3             # Trial requests allowed in half-open state
  half_open_success_threshold: 2        # Successes required to close the circuit

# Rate limiting configuration
rate_limiting:
  enabled: true                         # Enable rate limiting
  storage: memory                       # "memory" or "redis"
  limits:
    per_client:
      requests_per_second: 10
      burst_capacity: 20
    global:
      requests_per_second: 1000
      burst_capacity: 2000

# Admin API configuration
admin:
  auth:
    method: bearer                         # Auth method: none, bearer, basic, ip_whitelist, api_key
    bearer_token: "${ADMIN_TOKEN}"         # Admin authentication token
  stats:
    enabled: true                          # Enable/disable stats collection
    retention_window: 24h                  # Ring-buffer retention for windowed queries
    token_tracking: true                   # Parse response bodies for token usage
    persistence:
      enabled: true                        # Enable stats persistence across restarts
      path: ./data/stats.json              # File path for the snapshot
      snapshot_interval: 5m                # How often to write periodic snapshots
      max_age: 7d                          # Discard snapshots older than this on startup

# Metrics and monitoring configuration
metrics:
  enabled: true                         # Enable Prometheus metrics collection
  path: "/metrics"                      # Metrics endpoint path
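
To make the `retry` settings above concrete, here is a small Python sketch of how an exponential backoff delay could be derived from `initial_delay`, `backoff_multiplier`, `max_delay`, and `jitter`. The function name `retry_delay` is hypothetical, and the jitter scheme (a uniform draw between zero and the capped delay) is an assumption; the router may randomize delays differently.

```python
import random
from typing import Optional


def retry_delay(retry: int,
                initial_delay: float = 0.1,      # "100ms"
                max_delay: float = 10.0,         # "10s"
                backoff_multiplier: float = 2.0,
                jitter: bool = False,
                rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before the given retry (0 = first retry)."""
    # Exponential growth, capped at max_delay.
    delay = min(initial_delay * backoff_multiplier ** retry, max_delay)
    if jitter:
        # Assumed scheme: pick uniformly in [0, delay] to spread out retries.
        rng = rng or random.Random()
        delay = rng.uniform(0.0, delay)
    return delay
```

With the global defaults and jitter disabled, the first three delays are 0.1 s, 0.2 s, and 0.4 s. Passing the `retry_override` values from the `openai` backend (`initial_delay=0.2`, `max_delay=30.0`) gives the longer schedule configured for that backend.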

Minimal Configuration¶

# Minimal configuration - other settings will use defaults
server:
  bind_address: "0.0.0.0:8080"

backends:
  - name: "ollama"
    url: "http://localhost:11434"
  - name: "lm-studio"
    url: "http://localhost:1234"

Environment Variables¶

Two mechanisms bring environment variables into the configuration:

Direct overrides: a small set of variables (mostly CONTINUUM_*) that override their config-file counterparts at startup.
${VAR} substitution: any string value in the YAML/TOML file may reference an environment variable (e.g. api_key: "${MY_PROVIDER_KEY}"); the reference is resolved at load time.
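
The `${VAR}` mechanism can be pictured with the following Python sketch. It is not the router's code: the function name `substitute` is hypothetical, and raising an error for an unset variable is an assumption, since the behavior for missing variables is not described here.

```python
import os
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace every ${NAME} in value with env[NAME].

    Assumption: an unset variable raises KeyError.
    """
    return _VAR.sub(lambda match: env[match.group(1)], value)
```

For example, `substitute("${MY_PROVIDER_KEY}", {"MY_PROVIDER_KEY": "sk-test"})` returns `"sk-test"`, while a string with no `${...}` reference is returned unchanged.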

Direct Overrides¶

Variable	Type	Description
`CONTINUUM_BIND_ADDRESS`	string	Overrides `server.bind_address`
`CONTINUUM_BACKEND_URLS`	string	Comma-separated backend URLs; overrides the `backends` list
`CONTINUUM_LOG_LEVEL`	string	Overrides `logging.level` (trace, debug, info, warn, error)
`RUST_LOG`	string	Rust-specific logging filter configuration

Native Backend API Keys¶

Native backends fall back to these variables when no api_key is set in the config:

Variable	Used by
`CONTINUUM_OPENAI_API_KEY`	`type: openai` backends
`CONTINUUM_OPENAI_ORG_ID`	`type: openai` backends (optional organization ID)
`CONTINUUM_ANTHROPIC_API_KEY`	`type: anthropic` backends
`CONTINUUM_GEMINI_API_KEY`	`type: gemini` backends

API Key Management¶

Variable	Type	Default	Description
`CONTINUUM_API_KEY`	string	-	Single API key for simple deployments
`CONTINUUM_API_KEY_SCOPES`	string	`"read,write"`	Comma-separated scopes for the API key
`CONTINUUM_API_KEY_USER_ID`	string	`"admin"`	User ID associated with the API key
`CONTINUUM_API_KEY_ORG_ID`	string	`"default"`	Organization ID associated with the API key
`CONTINUUM_DEV_MODE`	boolean	`false`	Enable development API keys (DO NOT use in production)
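
The defaults in this table can be read as follows. The Python sketch below is a hypothetical model (`EnvApiKey` and `api_key_from_env` are not router APIs), and trimming whitespace around each scope is an assumption.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class EnvApiKey:
    key: str
    scopes: List[str]
    user_id: str
    org_id: str


def api_key_from_env(env: Mapping[str, str]) -> Optional[EnvApiKey]:
    """Build the single API key described by the CONTINUUM_API_KEY* variables."""
    key = env.get("CONTINUUM_API_KEY")
    if not key:
        return None  # no single-key deployment configured
    raw_scopes = env.get("CONTINUUM_API_KEY_SCOPES", "read,write")
    # Assumption: scopes are comma-separated and surrounding spaces are ignored.
    scopes = [s.strip() for s in raw_scopes.split(",") if s.strip()]
    return EnvApiKey(
        key=key,
        scopes=scopes,
        user_id=env.get("CONTINUUM_API_KEY_USER_ID", "admin"),
        org_id=env.get("CONTINUUM_API_KEY_ORG_ID", "default"),
    )
```

Setting only `CONTINUUM_API_KEY` therefore yields a key with the `read` and `write` scopes, user ID `admin`, and organization ID `default`.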

Example Environment Configuration¶

# Direct overrides
export CONTINUUM_BIND_ADDRESS="0.0.0.0:9000"
export CONTINUUM_BACKEND_URLS="http://localhost:11434,http://localhost:1234"
export CONTINUUM_LOG_LEVEL="debug"

# Referenced from config.yaml via ${...} substitution
export CONTINUUM_OPENAI_API_KEY="sk-..."
export ADMIN_TOKEN="my-admin-token"

# Start the router
continuum-router

All other settings (health checks, timeouts, retry, caching, Files API, and so on) have no dedicated environment variable. Set them in the configuration file or, where one exists, with a command-line argument.

Command Line Arguments¶

Command-line arguments take the highest priority of all configuration methods, which makes them useful for testing and temporary overrides.

Core Options¶

continuum-router --help

Argument	Type	Description
`--mode <MODE>`	enum	Server mode: `http` (default) or `stdio` (ACP JSON-RPC 2.0 transport)
`-c, --config <FILE>`	path	Configuration file path
`--generate-config`	flag	Generate sample YAML config and exit
`--generate-example-config`	flag	Generate example configuration documentation (env vars, routing, validation rules, precedence) and exit
`--generate-toml-config`	flag	Generate TOML-format sample config and exit
`--model-metadata <FILE>`	path	Path to model metadata YAML file (overrides config)

Backend Configuration¶

Argument	Type	Description
`--backends <URLs>`	string	Comma-separated backend URLs
`--backend-url <URL>`	string	Single backend URL (deprecated)

Server Configuration¶

Argument	Type	Description
`--bind <ADDRESS>`	string	Server bind address
`--connection-pool-size <SIZE>`	integer	HTTP connection pool size

Load Balancing¶

Argument	Type	Description
`--selection-strategy <STRATEGY>`	string	Load balancing strategy: `RoundRobin` (default), `WeightedRoundRobin`, `LeastLatency`, `Random`, `ConsistentHash`

Health Check Configuration¶

Argument	Type	Description
`--disable-health-checks`	flag	Disable health monitoring
`--health-check-interval <SECONDS>`	integer	Health check interval
`--health-check-timeout <SECONDS>`	integer	Health check timeout
`--unhealthy-threshold <COUNT>`	integer	Failures before unhealthy
`--healthy-threshold <COUNT>`	integer	Successes before healthy
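
The two thresholds interact as a small state machine. The Python sketch below is one plausible reading, not the router's implementation: the class name `BackendHealth` is hypothetical, and it assumes that the counts are consecutive and that a backend starts out healthy.

```python
class BackendHealth:
    """Tracks one backend's health from a stream of check results."""

    def __init__(self, unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True   # assumption: backends start healthy
        self._streak = 0      # consecutive results that contradict the state

    def record(self, success: bool) -> bool:
        """Record one health check result and return the new state."""
        if success == self.healthy:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        limit = self.healthy_threshold if success else self.unhealthy_threshold
        if self._streak >= limit:
            self.healthy = success
            self._streak = 0
        return self.healthy
```

With the defaults, a backend is marked unhealthy on its third consecutive failed check and marked healthy again on its second consecutive successful check.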

Configuration Utilities¶

Argument	Type	Description
`--migrate-config-file <FILE>`	path	Migrate and fix configuration file issues without starting the router; creates a backup and reports all changes (YAML and TOML)
`--dry-run`	flag	With `--migrate-config-file`, preview migration changes without applying them

Container Health Check¶

Argument	Type	Description
`--health-check`	flag	Check whether a running server is healthy and exit (0 = healthy, 1 = unhealthy); intended for Docker `HEALTHCHECK`
`--health-check-url <URL>`	string	Health endpoint to probe (default: `http://localhost:8080/health`)
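
For readers who want to see what such a probe amounts to, the Python sketch below mimics the exit-code contract (0 = healthy, 1 = unhealthy). It is an illustration, not the router's code: the function name `probe`, the 5-second timeout, and treating any 2xx response as healthy are assumptions.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> int:
    """Return 0 if the health endpoint answers with a 2xx status, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return 1
```

A script would pass the result to `sys.exit(...)` so that Docker's `HEALTHCHECK` sees the exit code.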

Subcommands¶

Command	Description
`auth login --backend <NAME>`	Run the OAuth device authorization flow for the named backend (its `auth.type` must be `oauth`) and persist tokens to `auth.oauth.token_store`. See Server & Backends for the OAuth backend configuration.

Example CLI Usage¶

# Use config file with overrides
continuum-router --config config.yaml --bind "0.0.0.0:9000"

# Override backends temporarily
continuum-router --config config.yaml --backends "http://localhost:11434"

# Use custom model metadata file
continuum-router --config config.yaml --model-metadata /path/to/custom-metadata.yaml

# Use model metadata with tilde expansion
continuum-router --model-metadata ~/configs/model-metadata.yaml

# Adjust health check settings for testing
continuum-router --config config.yaml --health-check-interval 10

# Generate sample configuration
continuum-router --generate-config > my-config.yaml

# Migrate an outdated config file (preview first, then apply)
continuum-router --migrate-config-file config.yaml --dry-run
continuum-router --migrate-config-file config.yaml

# Docker HEALTHCHECK probe
continuum-router --health-check --health-check-url http://localhost:8080/health

# OAuth device-flow login for a ChatGPT/Codex backend
continuum-router --config config.yaml auth login --backend chatgpt