Skip to content

Model Fallback System

The router implements a configurable model fallback system that automatically routes requests to alternative models when the primary model is unavailable.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│              Layered Fallback Architecture                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Circuit Breaker (Infrastructure Layer)                        │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ • Monitors backend health                                │   │
│   │ • Tracks failure count/rate                              │   │
│   │ • Opens circuit on threshold breach                      │   │
│   │ • Provides FallbackStrategy enum                         │   │
│   └─────────────────────────────────────────────────────────┘   │
│                        │                                         │
│                triggers when                                     │
│              circuit opens or                                    │
│             request fails                                        │
│                        ▼                                         │
│   Fallback Service (Policy Layer)                               │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ • Defines model fallback chains                          │   │
│   │ • gpt-4o → gpt-4-turbo → gpt-3.5-turbo                  │   │
│   │ • Cross-provider fallback support                        │   │
│   │ • Parameter translation between providers                │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Core Components

Location: src/core/fallback/

// Fallback chain management
pub struct FallbackChain {
    primary_model: String,
    fallback_models: Vec<String>,
    is_cross_provider: bool,
}

// Fallback policy configuration
pub struct FallbackPolicy {
    trigger_conditions: TriggerConditions,
    max_attempts: u32,
    timeout_multiplier: f64,
    preserve_parameters: bool,
}

// Trigger reasons
pub enum TriggerReason {
    ErrorCode(u16),
    Timeout,
    ConnectionError,
    CircuitBreakerOpen,
    ModelNotFound,
}

// Fallback service orchestrator
pub struct FallbackService {
    chains: Arc<RwLock<HashMap<String, FallbackChain>>>,
    policy: FallbackPolicy,
    translator: ParameterTranslator,
    metrics: FallbackMetrics,
}

Parameter Translation

The ParameterTranslator handles cross-provider parameter mapping:

pub struct ParameterTranslator {
    mappings: HashMap<(Provider, Provider), ProviderMapping>,
}

pub enum Provider {
    OpenAI,
    Anthropic,
    Google,
    Mistral,
    Meta,
    Unknown,
}

impl ParameterTranslator {
    pub fn translate(
        &self,
        request: &ChatRequest,
        from: Provider,
        to: Provider,
    ) -> Result<ChatRequest, FallbackError> {
        // Apply parameter mappings
        // Remove provider-specific parameters
        // Add required parameters for target provider
    }
}

Fallback Flow

sequenceDiagram
    participant Client
    participant Router
    participant FallbackService
    participant CircuitBreaker
    participant Backend1
    participant Backend2

    Client->>Router: Request (model: gpt-4o)
    Router->>CircuitBreaker: Check circuit state
    CircuitBreaker-->>Router: Open (backend1 unavailable)
    Router->>FallbackService: Get fallback chain
    FallbackService-->>Router: [gpt-4-turbo, gpt-3.5-turbo]
    Router->>FallbackService: Translate parameters
    FallbackService-->>Router: Translated request
    Router->>Backend2: Forward to gpt-4-turbo
    Backend2-->>Router: Response
    Router->>Router: Add X-Fallback-* headers
    Router-->>Client: Response with fallback info

Configuration

fallback:
  enabled: true
  max_attempts: 3
  preserve_parameters: true

  chains:
    # OpenAI model fallback chain
        - primary: gpt-4o
      fallbacks:
        - gpt-4-turbo
        - gpt-3.5-turbo

    # Cross-provider fallback
        - primary: claude-3-opus
      fallbacks:
        - gpt-4o
        - gemini-pro
      cross_provider: true

  # Trigger conditions
  triggers:
        - error_codes: [500, 502, 503, 504]
        - timeout: true
        - circuit_breaker_open: true

Metrics

The fallback system exposes Prometheus metrics:

# Fallback attempt counter
fallback_attempts_total{original_model="gpt-4o", fallback_model="gpt-4-turbo", reason="circuit_breaker_open"} 15

# Successful fallback counter
fallback_success_total{original_model="gpt-4o", fallback_model="gpt-4-turbo"} 14

# Exhausted fallback chains
fallback_exhausted_total{original_model="gpt-4o"} 1

# Cross-provider fallbacks
fallback_cross_provider_total{from_provider="openai", to_provider="anthropic"} 5

# Fallback duration histogram
fallback_duration_seconds_bucket{le="0.1"} 10
fallback_duration_seconds_bucket{le="0.5"} 45
fallback_duration_seconds_bucket{le="1.0"} 50

Response Headers

When a fallback occurs, the response includes informational headers:

X-Fallback-Used: true
X-Original-Model: gpt-4o
X-Fallback-Model: gpt-4-turbo
X-Fallback-Reason: circuit_breaker_open