Error Handling Guide¶
This guide covers error handling mechanisms, status codes, retry strategies, and troubleshooting for the Continuum Router.
Table of Contents¶
- Error Categories
- HTTP Status Codes
- Error Response Format
- Retry Strategies
- Circuit Breaker
- Error Recovery
- Custom Error Handling
- Debugging Errors
- Common Error Scenarios
Error Categories¶
Client Errors (4xx)¶
Errors caused by invalid client requests. Do not retry them without modifying the request; the exception is 429 (rate limiting), which can be retried after the indicated wait.
| Category | Description | Examples |
|---|---|---|
| Validation Errors | Invalid request format or parameters | Missing required fields, invalid JSON |
| Authentication Errors | Failed authentication or authorization | Invalid API key, expired token |
| Resource Errors | Requested resource not found | Model not available, endpoint not found |
| Rate Limit Errors | Too many requests | Quota exceeded, rate limit hit |
Server Errors (5xx)¶
Errors on the server side that may be transient and retryable.
| Category | Description | Examples |
|---|---|---|
| Backend Errors | Backend service issues | Connection refused, backend timeout |
| Router Errors | Internal router issues | Configuration error, panic |
| Infrastructure Errors | Infrastructure failures | Database down, network partition |
| Capacity Errors | Resource exhaustion | Memory limit, connection pool full |
HTTP Status Codes¶
Success Codes (2xx)¶
| Status | Name | Description | When Used |
|---|---|---|---|
| 200 | OK | Request completed successfully | Normal response |
| 201 | Created | Resource created successfully | Resource creation |
| 202 | Accepted | Request accepted for processing | Async operations |
| 204 | No Content | Success with no response body | Deletions |
Client Error Codes (4xx)¶
| Status | Name | Description | When Used |
|---|---|---|---|
| 400 | Bad Request | Invalid request format | Malformed JSON, missing fields |
| 401 | Unauthorized | Authentication required | Missing or invalid auth |
| 403 | Forbidden | Access denied | Insufficient permissions |
| 404 | Not Found | Resource not found | Model/endpoint not available |
| 405 | Method Not Allowed | HTTP method not supported | Wrong HTTP verb |
| 408 | Request Timeout | Client request timeout | Slow client |
| 413 | Payload Too Large | Request body too large | Exceeds size limit |
| 422 | Unprocessable Entity | Validation failed | Business logic errors |
| 429 | Too Many Requests | Rate limit exceeded | Rate limiting |
Server Error Codes (5xx)¶
| Status | Name | Description | When Used |
|---|---|---|---|
| 500 | Internal Server Error | Unexpected server error | Unhandled exceptions |
| 501 | Not Implemented | Feature not implemented | Unsupported operations |
| 502 | Bad Gateway | Invalid backend response | Backend errors |
| 503 | Service Unavailable | Service temporarily down | All backends unhealthy |
| 504 | Gateway Timeout | Backend timeout | Backend too slow |
| 507 | Insufficient Storage | Storage full | Disk/memory full |
Error Response Format¶
Standard Error Response¶
{
"error": {
"code": 404,
"type": "model_not_found",
"message": "Model 'gpt-5' not found on any healthy backend",
"details": {
"requested_model": "gpt-5",
"available_models": ["gpt-4", "gpt-3.5-turbo", "llama2"],
"backends_checked": 3,
"healthy_backends": 2
},
"request_id": "req_12345",
"timestamp": "2024-01-15T10:30:45Z"
}
}
Validation Error Response¶
{
"error": {
"code": 400,
"type": "validation_error",
"message": "Invalid request parameters",
"details": {
"validation_errors": [
{
"field": "messages",
"error": "Required field missing"
},
{
"field": "temperature",
"error": "Must be between 0 and 2",
"value": 3.5
}
]
},
"request_id": "req_12346"
}
}
Rate Limit Error Response¶
{
"error": {
"code": 429,
"type": "rate_limit_exceeded",
"message": "API rate limit exceeded",
"details": {
"limit": 100,
"window": "1m",
"retry_after": 45,
"reset_at": "2024-01-15T10:31:30Z"
},
"headers": {
"X-RateLimit-Limit": "100",
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": "1705316490",
"Retry-After": "45"
}
}
}
Backend Error Passthrough¶
When a backend returns a 4xx error, Continuum Router parses and forwards the original error details from the backend. This provides more actionable error information for debugging.
Supported Backend Formats:
- OpenAI API ({"error": {"message": "...", "type": "...", "param": "...", "code": "..."}})
- Anthropic Claude API ({"error": {"message": "...", "type": "..."}})
- Google Gemini API ({"error": {"message": "...", "status": "...", "code": ...}})
Example Response (with backend error passthrough):
{
"error": {
"code": 400,
"type": "invalid_request_error",
"message": "Invalid size '512x512'. Valid sizes for gpt-image-1 are: 1024x1024, 1536x1024, 1024x1536, auto",
"param": "size"
}
}
Behavior:
- When a backend returns a parseable error response, the original message, type, param, and code fields are preserved
- The param field is included when the backend provides it (useful for identifying which parameter caused the error)
- If the backend response cannot be parsed, a generic error message is returned
- All error responses remain OpenAI-compatible
Fallback Response (when backend error cannot be parsed):
Retry Strategies¶
Configuration¶
retry:
# Basic settings
max_attempts: 3
initial_delay: "100ms"
max_delay: "10s"
backoff_multiplier: 2.0
jitter: true
# Retryable conditions
retryable_status_codes:
- 429 # Too Many Requests
- 502 # Bad Gateway
- 503 # Service Unavailable
- 504 # Gateway Timeout
retryable_errors:
- ConnectionError
- TimeoutError
- TemporaryError
# Per-endpoint configuration
endpoints:
"/v1/chat/completions":
max_attempts: 5
timeout: "60s"
"/v1/completions":
max_attempts: 3
timeout: "30s"
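The retryable status codes and attempt limit above amount to a simple decision rule. The following Rust sketch is illustrative only: `RetryPolicy`, `from_example`, and `should_retry` are hypothetical names rather than the router's actual types, and it assumes `max_attempts` counts the total number of attempts, starting at 1.

```rust
/// Hypothetical mirror of the basic `retry` settings shown above.
struct RetryPolicy {
    max_attempts: u32,
    retryable_status_codes: Vec<u16>,
}

impl RetryPolicy {
    /// Values copied from the configuration example.
    fn from_example() -> Self {
        RetryPolicy {
            max_attempts: 3,
            retryable_status_codes: vec![429, 502, 503, 504],
        }
    }

    /// Returns true when a response with `status`, received on the given
    /// 1-based `attempt`, should be retried.
    fn should_retry(&self, status: u16, attempt: u32) -> bool {
        attempt < self.max_attempts && self.retryable_status_codes.contains(&status)
    }
}
```

With these values, a 503 on the first attempt is retried, while a 503 on the third attempt or a 400 on any attempt is returned to the client.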
Exponential Backoff¶
backoff:
type: exponential
base: 100ms
multiplier: 2
max: 10s
jitter: 0.1 # ±10% randomization
# Delay calculation (attempt is 1-based):
# delay = min(base * multiplier^(attempt - 1) + jitter, max)
#
# Delays before jitter:
# Attempt 1: 100ms
# Attempt 2: 200ms
# Attempt 3: 400ms
# Attempt 4: 800ms
# ...
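As a rough illustration, the schedule above can be computed as follows. This is a minimal sketch, not the router's implementation: `backoff_delay` and `with_jitter` are hypothetical helpers, attempts are assumed to be 1-based so that the first delay equals `base`, and the random value is passed in as `unit` (a number in [-1.0, 1.0]) so the calculation stays deterministic.

```rust
use std::time::Duration;

/// Delay before jitter for a 1-based `attempt`, capped at `max`.
fn backoff_delay(base: Duration, multiplier: f64, max: Duration, attempt: u32) -> Duration {
    let exponent = attempt.saturating_sub(1) as i32;
    let scaled = base.as_secs_f64() * multiplier.powi(exponent);
    // `min` also absorbs an infinite value produced by very large exponents.
    Duration::from_secs_f64(scaled.min(max.as_secs_f64()))
}

/// Applies proportional jitter: `jitter` is the fraction (0.1 for ±10%) and
/// `unit` is a caller-supplied random value in [-1.0, 1.0].
fn with_jitter(delay: Duration, jitter: f64, unit: f64, max: Duration) -> Duration {
    let factor = 1.0 + jitter * unit.clamp(-1.0, 1.0);
    delay.mul_f64(factor).min(max)
}
```

For `base = 100ms`, `multiplier = 2`, and `max = 10s`, `backoff_delay` yields 100ms, 200ms, 400ms, and 800ms for attempts 1 through 4, and stays at 10s once the cap is reached.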
Smart Retry Logic¶
smart_retry:
# Retry with different backend
try_different_backend: true
# Reduce request on retry
reduce_on_retry:
max_tokens: 0.8 # Reduce by 20%
temperature: 0.9 # Lower temperature
# Skip retry for specific errors
non_retryable:
- AuthenticationError
- ValidationError
- PaymentRequired
Circuit Breaker¶
Architecture Details
For detailed implementation architecture including state machine diagrams, admin endpoints, and Prometheus metrics, see Circuit Breaker Architecture.
Configuration¶
circuit_breaker:
enabled: true
# Failure detection
failure_threshold: 5 # Failures to open circuit
success_threshold: 2 # Successes to close circuit
# Timing
timeout: "30s" # Request timeout
half_open_timeout: "15s" # Half-open state duration
reset_timeout: "60s" # Time the circuit stays open before moving to half-open
# Monitoring window
window_size: "60s"
min_requests: 10 # Min requests for statistics
Circuit States¶
stateDiagram-v2
[*] --> Closed
Closed --> Open: Failure threshold reached
Open --> HalfOpen: Reset timeout expired
HalfOpen --> Closed: Success threshold reached
HalfOpen --> Open: Failure detected
Per-Backend Circuit Breaker¶
backends:
- name: primary
url: http://primary:8000
circuit_breaker:
failure_threshold: 3
reset_timeout: "30s"
- name: secondary
url: http://secondary:8000
circuit_breaker:
failure_threshold: 5
reset_timeout: "60s"
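The Closed, Open, and HalfOpen transitions described under Circuit States can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the router's implementation: the `CircuitBreaker` type and its methods are hypothetical, failures are counted consecutively, and the `window_size`, `min_requests`, and `half_open_timeout` settings are ignored.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    state: CircuitState,
    failure_threshold: u32,
    success_threshold: u32,
    reset_timeout: Duration,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, success_threshold: u32, reset_timeout: Duration) -> Self {
        CircuitBreaker {
            state: CircuitState::Closed,
            failure_threshold,
            success_threshold,
            reset_timeout,
            failures: 0,
            successes: 0,
            opened_at: None,
        }
    }

    /// Returns true if a request may be sent at `now`.
    /// Open --> HalfOpen once the reset timeout has expired.
    fn allow_request(&mut self, now: Instant) -> bool {
        if self.state == CircuitState::Open {
            let expired = self
                .opened_at
                .map_or(false, |t| now.saturating_duration_since(t) >= self.reset_timeout);
            if expired {
                self.state = CircuitState::HalfOpen;
                self.successes = 0;
            }
        }
        self.state != CircuitState::Open
    }

    /// HalfOpen --> Closed once the success threshold is reached.
    fn record_success(&mut self) {
        match self.state {
            CircuitState::HalfOpen => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = CircuitState::Closed;
                    self.failures = 0;
                }
            }
            CircuitState::Closed => self.failures = 0,
            CircuitState::Open => {}
        }
    }

    /// Closed --> Open at the failure threshold; HalfOpen --> Open on any failure.
    fn record_failure(&mut self, now: Instant) {
        match self.state {
            CircuitState::Closed => {
                self.failures += 1;
                if self.failures >= self.failure_threshold {
                    self.trip(now);
                }
            }
            CircuitState::HalfOpen => self.trip(now),
            CircuitState::Open => {}
        }
    }

    fn trip(&mut self, now: Instant) {
        self.state = CircuitState::Open;
        self.opened_at = Some(now);
        self.failures = 0;
        self.successes = 0;
    }
}
```

Passing `now` explicitly keeps the sketch easy to exercise without sleeping; a per-backend setup would hold one such value per entry in `backends`.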
Error Recovery¶
Automatic Recovery¶
recovery:
# Health check recovery
health_checks:
enabled: true
interval: "30s"
recovery_threshold: 2 # Consecutive successes
# Connection pool recovery
connection_pool:
validation_interval: "60s"
evict_invalid: true
replace_invalid: true
# Cache recovery
cache:
clear_on_error: false
partial_invalidation: true
Model Fallback¶
The router supports automatic model fallback when primary models are unavailable. This integrates with the circuit breaker for comprehensive error recovery.
# Model fallback configuration
fallback:
enabled: true
fallback_chains:
"gpt-4o":
- "gpt-4-turbo"
- "gpt-3.5-turbo"
"claude-opus-4-5-20251101":
- "claude-sonnet-4-5"
- "claude-haiku-4-5"
# Cross-provider fallback
"gemini-2.5-pro":
- "gemini-2.5-flash"
- "gpt-4o"
fallback_policy:
trigger_conditions:
error_codes: [429, 500, 502, 503, 504]
timeout: true
connection_error: true
circuit_breaker_open: true
max_fallback_attempts: 3
fallback_timeout_multiplier: 1.5
Fallback Trigger Conditions¶
| Condition | HTTP Status | Description |
|---|---|---|
| Rate Limit | 429 | Backend rate limit exceeded |
| Server Error | 500 | Internal backend error |
| Bad Gateway | 502 | Invalid response from backend |
| Service Unavailable | 503 | Backend temporarily unavailable |
| Gateway Timeout | 504 | Backend request timeout |
| Circuit Open | N/A | Circuit breaker is open |
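To make the trigger conditions concrete, the sketch below walks a fallback chain until a model succeeds or the chain is exhausted. It is a hedged illustration: `Failure`, `triggers_fallback`, and `run_with_fallback` are hypothetical names, the backend call is abstracted as a closure, and it assumes a failure that is not a trigger condition (for example a 400) stops the chain immediately.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Failure {
    Status(u16),
    Timeout,
    ConnectionError,
    CircuitOpen,
}

/// Mirrors `fallback_policy.trigger_conditions` from the configuration above.
fn triggers_fallback(failure: &Failure) -> bool {
    match failure {
        Failure::Status(code) => matches!(*code, 429 | 500 | 502 | 503 | 504),
        Failure::Timeout | Failure::ConnectionError | Failure::CircuitOpen => true,
    }
}

/// Tries `model`, then up to `max_fallback_attempts` models from its chain.
/// Returns the model that served the request, or every recorded failure.
fn run_with_fallback<F>(
    chains: &HashMap<String, Vec<String>>,
    model: &str,
    max_fallback_attempts: usize,
    mut call: F,
) -> Result<String, Vec<(String, Failure)>>
where
    F: FnMut(&str) -> Result<(), Failure>,
{
    let fallbacks: &[String] = chains.get(model).map(|c| c.as_slice()).unwrap_or(&[]);
    let candidates = std::iter::once(model)
        .chain(fallbacks.iter().map(|m| m.as_str()).take(max_fallback_attempts));

    let mut failures = Vec::new();
    for candidate in candidates {
        match call(candidate) {
            Ok(()) => return Ok(candidate.to_string()),
            Err(failure) => {
                let keep_going = triggers_fallback(&failure);
                failures.push((candidate.to_string(), failure));
                if !keep_going {
                    break; // not a trigger condition: do not fall back
                }
            }
        }
    }
    Err(failures)
}
```

The returned failure list corresponds loosely to the per-model reasons a caller would need in order to report that all fallbacks were exhausted.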
Fallback Response Headers¶
When fallback occurs, these headers are added:
X-Fallback-Used: true
X-Original-Model: gpt-4o
X-Fallback-Model: gpt-4-turbo
X-Fallback-Reason: error_code_429
X-Fallback-Attempts: 2
Fallback Error Response¶
When all fallbacks are exhausted:
{
"error": {
"code": 503,
"type": "all_fallbacks_exhausted",
"message": "All fallback models failed for 'gpt-4o'",
"details": {
"original_model": "gpt-4o",
"attempted_fallbacks": ["gpt-4-turbo", "gpt-3.5-turbo"],
"failure_reasons": [
{"model": "gpt-4-turbo", "reason": "error_code_503"},
{"model": "gpt-3.5-turbo", "reason": "timeout"}
]
},
"request_id": "req_12345"
}
}
Graceful Degradation¶
degradation:
# Fallback models (legacy - use fallback.fallback_chains instead)
model_fallbacks:
"gpt-4": ["gpt-3.5-turbo", "gpt-3"]
"claude-opus": ["claude-sonnet", "claude-haiku"]
# Feature degradation
features:
streaming:
fallback_to_non_streaming: true
functions:
disable_on_error: true
# Response degradation
response:
reduce_max_tokens: true
lower_temperature: true
simplify_prompts: true
Failover Strategy¶
failover:
strategy: priority # or round-robin, least-failures
backends:
- name: primary
priority: 1
weight: 100
- name: secondary
priority: 2
weight: 50
- name: tertiary
priority: 3
weight: 10
conditions:
- error_rate > 0.1
- latency_p99 > 5s
- health_score < 0.5
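One way to read the `priority` strategy together with the listed conditions is sketched below. The configuration does not say how the conditions combine, so this sketch assumes that any single condition is enough to skip a backend; `Backend`, `needs_failover`, and `select_backend` are hypothetical names, and `weight` is ignored because it is not needed for priority ordering.

```rust
use std::time::Duration;

struct Backend {
    name: &'static str,
    priority: u32,
    error_rate: f64,
    latency_p99: Duration,
    health_score: f64,
}

/// True when any failover condition from the configuration holds (assumed OR).
fn needs_failover(b: &Backend) -> bool {
    b.error_rate > 0.1 || b.latency_p99 > Duration::from_secs(5) || b.health_score < 0.5
}

/// Picks the backend with the lowest priority number among those that do not
/// currently meet a failover condition.
fn select_backend(backends: &[Backend]) -> Option<&Backend> {
    backends
        .iter()
        .filter(|b| !needs_failover(b))
        .min_by_key(|b| b.priority)
}
```

If `primary` exceeds the 10% error-rate threshold, the function returns `secondary`; if every backend meets a condition it returns `None`, which corresponds to the all-backends-down case.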
Custom Error Handling¶
Error Middleware¶
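// Custom error handler implementation.
// The imports below assume axum, chrono, and tracing; the original snippet does not
// show them. `Error`, `ErrorResponse`, `ErrorDetails`, and `request_id()` are the
// router's own items and are not defined here.
use axum::{
    http::StatusCode,
    response::{IntoResponse, Response},
    Json,
};
use chrono::Utc;
use tracing::error;

pub async fn error_handler(error: Error) -> Response {
    // Every arm yields (StatusCode, &'static str, String) so the match type-checks.
    let (status, error_type, message) = match error {
        Error::Validation(e) => (
            StatusCode::BAD_REQUEST,
            "validation_error",
            e.to_string(),
        ),
        Error::NotFound(e) => (
            StatusCode::NOT_FOUND,
            "not_found",
            e.to_string(),
        ),
        // Backend details are deliberately not exposed to the client.
        Error::Backend(_) => (
            StatusCode::BAD_GATEWAY,
            "backend_error",
            "Backend service error".to_string(),
        ),
        Error::Internal(e) => {
            // Log the full error server-side, return a generic message.
            error!("Internal error: {:?}", e);
            (
                StatusCode::INTERNAL_SERVER_ERROR,
                "internal_error",
                "An internal error occurred".to_string(),
            )
        }
    };
    let body = Json(ErrorResponse {
        error: ErrorDetails {
            code: status.as_u16(),
            error_type: error_type.to_string(),
            message,
            request_id: request_id(),
            timestamp: Utc::now(),
        },
    });
    // Pair the body with the status; `Json` alone would respond with 200 OK.
    (status, body).into_response()
}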
Error Transformation¶
error_transformation:
# Map backend errors to client-friendly messages
mappings:
- backend_error: "CUDA out of memory"
client_error: "Model temporarily unavailable, please retry"
status: 503
- backend_error: "Model not loaded"
client_error: "Model initialization in progress"
status: 503
retry_after: 30
# Hide sensitive information
sanitization:
remove_stack_traces: true
remove_internal_ips: true
remove_credentials: true
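The mapping rules above can be pictured as a lookup over backend error text. The sketch below is illustrative only: `ErrorMapping`, `MAPPINGS`, and `transform` are hypothetical names, and it assumes a mapping applies when its `backend_error` string appears anywhere in the backend message, which the configuration does not specify.

```rust
/// One entry of `error_transformation.mappings`.
struct ErrorMapping {
    backend_error: &'static str,
    client_error: &'static str,
    status: u16,
    retry_after: Option<u32>, // seconds, when configured
}

/// The two mappings from the configuration example.
const MAPPINGS: &[ErrorMapping] = &[
    ErrorMapping {
        backend_error: "CUDA out of memory",
        client_error: "Model temporarily unavailable, please retry",
        status: 503,
        retry_after: None,
    },
    ErrorMapping {
        backend_error: "Model not loaded",
        client_error: "Model initialization in progress",
        status: 503,
        retry_after: Some(30),
    },
];

/// Returns the first mapping whose `backend_error` occurs in `backend_message`
/// (substring match is an assumption), or `None` if no mapping applies.
fn transform(backend_message: &str) -> Option<&'static ErrorMapping> {
    MAPPINGS
        .iter()
        .find(|m| backend_message.contains(m.backend_error))
}
```

A caller would return `client_error` and `status` to the client when a mapping is found, and fall back to the regular error path when `transform` returns `None`.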
Error Hooks¶
error_hooks:
# Pre-error hooks
pre_error:
- log_error
- capture_metrics
- notify_monitoring
# Post-error hooks
post_error:
- cleanup_resources
- update_circuit_breaker
- trigger_failover
# Custom handlers
handlers:
- type: webhook
url: https://alerts.example.com/errors
events: [critical_error, repeated_error]
- type: email
to: oncall@example.com
events: [service_down]
Debugging Errors¶
Error Logging¶
logging:
errors:
level: debug # Log all errors in detail
include_request_body: true
include_response_body: true
include_headers: true
# Structured error logging
format:
type: json
fields:
- timestamp
- level
- error_type
- error_message
- request_id
- backend_id
- model
- latency
- stack_trace
Debug Endpoints¶
# Get recent errors
curl http://localhost:8080/admin/errors/recent
# Get error statistics
curl http://localhost:8080/admin/errors/stats
# Get specific error details
curl http://localhost:8080/admin/errors/req_12345
# Trigger error for testing
curl -X POST http://localhost:8080/admin/debug/error \
-d '{"type": "backend_timeout", "backend": "primary"}'
Error Tracing¶
tracing:
errors:
capture_stack_trace: true
capture_variables: true
capture_context: true
# Distributed tracing
propagation:
- tracecontext
- baggage
# Error sampling
sampling:
all_errors: true
error_rate_threshold: 0.01
Common Error Scenarios¶
Scenario 1: All Backends Down¶
# Detection
condition: all_backends.health_status == unhealthy
# Response
response:
status: 503
message: "Service temporarily unavailable"
retry_after: 30
# Recovery
recovery:
- increase_health_check_frequency
- notify_oncall
- attempt_backend_restart
- switch_to_backup_region
Scenario 2: Model Not Found¶
# Detection
condition: requested_model not in available_models
# Response
response:
status: 404
message: "Model '{model}' not found"
suggestions: similar_models
# Mitigation
mitigation:
- check_model_aliases
- refresh_model_cache
- try_alternative_backends
Scenario 3: Rate Limit Exceeded¶
# Detection
condition: request_count > rate_limit
# Response
response:
status: 429
retry_after: calculate_backoff()
headers:
X-RateLimit-Remaining: 0
# Handling
handling:
- queue_request_if_premium
- suggest_upgrade_plan
- apply_exponential_backoff
Scenario 4: Timeout¶
# Detection
condition: request_duration > timeout
# Response
response:
status: 504
message: "Request timeout"
# Mitigation
mitigation:
- try_with_reduced_max_tokens
- switch_to_faster_backend
- enable_streaming_if_possible
Scenario 5: Backend Mismatch¶
# Detection
condition: backend_response_format != expected_format
# Response
response:
status: 502
message: "Invalid backend response"
# Recovery
recovery:
- log_response_for_debugging
- mark_backend_as_degraded
- retry_with_different_backend
- update_backend_adapter
Scenario 6: File Resolution Failure¶
# Detection
condition: file_reference_not_found OR invalid_file_id
# Response
response:
status: 404 # 404 when the file is not found; 400 when the file ID format is invalid
message: "Failed to resolve file reference"
type: "invalid_request_error"
code: "file_resolution_failed"
# Details
details:
file_id: "file-abc123"
reason: "file not found" # or "invalid file ID format"
# Mitigation
mitigation:
- verify_file_was_uploaded
- check_file_id_format_starts_with_file_prefix
- ensure_file_not_deleted
Scenario 7: File Too Large¶
# Detection
condition: file_size > max_file_size
# Response
response:
status: 413
message: "File size exceeds maximum allowed"
type: "invalid_request_error"
# Details
details:
file_size: 600000000 # bytes
max_size: 536870912 # bytes (512 MiB default)
# Mitigation
mitigation:
- compress_file_before_upload
- use_smaller_file
- increase_max_file_size_in_config
Scenario 8: Too Many File References¶
# Detection
condition: file_references_count > 20
# Response
response:
status: 400
message: "Too many file references in request"
type: "invalid_request_error"
# Details
details:
count: 25
max_allowed: 20
# Mitigation
mitigation:
- split_request_into_multiple_calls
- reduce_number_of_files
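The limits in Scenarios 7 and 8 can be expressed as a pre-flight check on a request's file references. This is a minimal sketch under stated assumptions: `FileError` and `validate_files` are hypothetical names, the limits are the default values quoted above, and checking the reference count before the file sizes is an arbitrary choice.

```rust
const MAX_FILE_SIZE: u64 = 536_870_912; // default limit from Scenario 7, in bytes
const MAX_FILE_REFERENCES: usize = 20; // limit from Scenario 8

#[derive(Debug, PartialEq)]
enum FileError {
    TooManyReferences { count: usize, max_allowed: usize },
    FileTooLarge { file_size: u64, max_size: u64 },
}

impl FileError {
    /// HTTP status used for each scenario.
    fn status(&self) -> u16 {
        match self {
            FileError::TooManyReferences { .. } => 400,
            FileError::FileTooLarge { .. } => 413,
        }
    }
}

/// Validates the sizes (in bytes) of all files referenced by one request.
fn validate_files(file_sizes: &[u64]) -> Result<(), FileError> {
    if file_sizes.len() > MAX_FILE_REFERENCES {
        return Err(FileError::TooManyReferences {
            count: file_sizes.len(),
            max_allowed: MAX_FILE_REFERENCES,
        });
    }
    if let Some(&file_size) = file_sizes.iter().find(|&&size| size > MAX_FILE_SIZE) {
        return Err(FileError::FileTooLarge {
            file_size,
            max_size: MAX_FILE_SIZE,
        });
    }
    Ok(())
}
```

The error variants carry the same fields as the `details` blocks in the two scenarios, so they can be turned into a response without further lookups.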
Error Monitoring¶
Metrics¶
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
# Error rate by type
sum by (error_type) (rate(errors_total[5m]))
# Backend error rate
sum by (backend_id, error_type) (rate(backend_errors_total[5m]))
# Circuit breaker state
circuit_breaker_state{backend_id="primary"}
Alerts¶
alerts:
- name: HighErrorRate
condition: error_rate > 0.05
duration: 5m
severity: warning
- name: AllBackendsDown
condition: healthy_backends == 0
duration: 1m
severity: critical
- name: CircuitBreakerOpen
condition: circuit_breaker_state == "open"
duration: 5m
severity: warning
Best Practices¶
Error Design¶
- Use appropriate HTTP status codes
- Provide clear, actionable error messages
- Include request IDs for tracing
- Don't expose internal details to clients
- Log errors with sufficient context
Error Handling¶
- Implement retry logic with backoff
- Use circuit breakers to prevent cascading failures
- Set appropriate timeouts
- Handle errors at the appropriate layer
- Fail fast for non-recoverable errors
Error Recovery¶
- Implement health checks for automatic recovery
- Use fallback strategies for critical paths
- Monitor error rates and patterns
- Set up alerting for anomalies
- Document error scenarios and recovery procedures
See Also¶
- Circuit Breaker Architecture - Detailed circuit breaker implementation
- Model Fallback Architecture - Model fallback system design
- Configuration Guide
- Monitoring Guide
- API Reference
- Quick Start Guide