Model Fallback System¶
The router implements a configurable model fallback system that automatically routes requests to alternative models when the primary model is unavailable.
Architecture Overview¶
┌─────────────────────────────────────────────────────────────────┐
│ Layered Fallback Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Circuit Breaker (Infrastructure Layer) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ • Monitors backend health │ │
│ │ • Tracks failure count/rate │ │
│ │ • Opens circuit on threshold breach │ │
│ │ • Provides FallbackStrategy enum │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ triggers when │
│ circuit opens or │
│ request fails │
│ ▼ │
│ Fallback Service (Policy Layer) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ • Defines model fallback chains │ │
│ │ • gpt-4o → gpt-4-turbo → gpt-3.5-turbo │ │
│ │ • Cross-provider fallback support │ │
│ │ • Parameter translation between providers │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Core Components¶
Location: src/core/fallback/
// Fallback chain management
pub struct FallbackChain {
primary_model: String,
fallback_models: Vec<String>,
is_cross_provider: bool,
}
// Fallback policy configuration
pub struct FallbackPolicy {
trigger_conditions: TriggerConditions,
max_attempts: u32,
timeout_multiplier: f64,
preserve_parameters: bool,
}
// Trigger reasons
pub enum TriggerReason {
ErrorCode(u16),
Timeout,
ConnectionError,
CircuitBreakerOpen,
ModelNotFound,
}
// Fallback service orchestrator
pub struct FallbackService {
chains: Arc<RwLock<HashMap<String, FallbackChain>>>,
policy: FallbackPolicy,
translator: ParameterTranslator,
metrics: FallbackMetrics,
}
Parameter Translation¶
The ParameterTranslator handles cross-provider parameter mapping:
pub struct ParameterTranslator {
mappings: HashMap<(Provider, Provider), ProviderMapping>,
}
pub enum Provider {
OpenAI,
Anthropic,
Google,
Mistral,
Meta,
Unknown,
}
impl ParameterTranslator {
pub fn translate(
&self,
request: &ChatRequest,
from: Provider,
to: Provider,
) -> Result<ChatRequest, FallbackError> {
// Apply parameter mappings
// Remove provider-specific parameters
// Add required parameters for target provider
}
}
Fallback Flow¶
sequenceDiagram
participant Client
participant Router
participant FallbackService
participant CircuitBreaker
participant Backend1
participant Backend2
Client->>Router: Request (model: gpt-4o)
Router->>CircuitBreaker: Check circuit state
CircuitBreaker-->>Router: Open (backend1 unavailable)
Router->>FallbackService: Get fallback chain
FallbackService-->>Router: [gpt-4-turbo, gpt-3.5-turbo]
Router->>FallbackService: Translate parameters
FallbackService-->>Router: Translated request
Router->>Backend2: Forward to gpt-4-turbo
Backend2-->>Router: Response
Router->>Router: Add X-Fallback-* headers
Router-->>Client: Response with fallback info Configuration¶
fallback:
enabled: true
max_attempts: 3
preserve_parameters: true
chains:
# OpenAI model fallback chain
- primary: gpt-4o
fallbacks:
- gpt-4-turbo
- gpt-3.5-turbo
# Cross-provider fallback
- primary: claude-3-opus
fallbacks:
- gpt-4o
- gemini-pro
cross_provider: true
# Trigger conditions
triggers:
- error_codes: [500, 502, 503, 504]
- timeout: true
- circuit_breaker_open: true
Metrics¶
The fallback system exposes Prometheus metrics:
# Fallback attempt counter
fallback_attempts_total{original_model="gpt-4o", fallback_model="gpt-4-turbo", reason="circuit_breaker_open"} 15
# Successful fallback counter
fallback_success_total{original_model="gpt-4o", fallback_model="gpt-4-turbo"} 14
# Exhausted fallback chains
fallback_exhausted_total{original_model="gpt-4o"} 1
# Cross-provider fallbacks
fallback_cross_provider_total{from_provider="openai", to_provider="anthropic"} 5
# Fallback duration histogram
fallback_duration_seconds_bucket{le="0.1"} 10
fallback_duration_seconds_bucket{le="0.5"} 45
fallback_duration_seconds_bucket{le="1.0"} 50
Response Headers¶
When a fallback occurs, the response includes informational headers:
X-Fallback-Used: true
X-Original-Model: gpt-4o
X-Fallback-Model: gpt-4-turbo
X-Fallback-Reason: circuit_breaker_open
Related Documentation¶
- Circuit Breaker - Circuit breaker integration
- Rate Limiting - Request rate limiting
- Architecture Overview - Main architecture guide