Error Handling Guide¶
This guide covers error handling mechanisms, status codes, retry strategies, and troubleshooting for the Continuum Router.
Table of Contents¶
- Error Categories
- HTTP Status Codes
- Error Response Format
- Retry Strategies
- Circuit Breaker
- Error Recovery
- Custom Error Handling
- Debugging Errors
- Common Error Scenarios
Error Categories¶
Client Errors (4xx)¶
Errors caused by the client's request. Do not retry them without modifying the request; the exception is 429 (rate limiting), which can be retried unchanged after waiting.
| Category | Description | Examples |
|---|---|---|
| Validation Errors | Invalid request format or parameters | Missing required fields, invalid JSON |
| Authentication Errors | Failed authentication or authorization | Invalid API key, expired token |
| Resource Errors | Requested resource not found | Model not available, endpoint not found |
| Rate Limit Errors | Too many requests | Quota exceeded, rate limit hit |
Server Errors (5xx)¶
Errors on the server side that may be transient and retryable.
| Category | Description | Examples |
|---|---|---|
| Backend Errors | Backend service issues | Connection refused, backend timeout |
| Router Errors | Internal router issues | Configuration error, panic |
| Infrastructure Errors | Infrastructure failures | Database down, network partition |
| Capacity Errors | Resource exhaustion | Memory limit, connection pool full |
HTTP Status Codes¶
Success Codes (2xx)¶
| Status | Name | Description | When Used |
|---|---|---|---|
| 200 | OK | Request completed successfully | Normal response |
| 201 | Created | Resource created successfully | Resource creation |
| 202 | Accepted | Request accepted for processing | Async operations |
| 204 | No Content | Success with no response body | Deletions |
Client Error Codes (4xx)¶
| Status | Name | Description | When Used |
|---|---|---|---|
| 400 | Bad Request | Invalid request format | Malformed JSON, missing fields |
| 401 | Unauthorized | Authentication required | Missing or invalid auth |
| 403 | Forbidden | Access denied | Insufficient permissions |
| 404 | Not Found | Resource not found | Model/endpoint not available |
| 405 | Method Not Allowed | HTTP method not supported | Wrong HTTP verb |
| 408 | Request Timeout | Client request timeout | Slow client |
| 413 | Payload Too Large | Request body too large | Exceeds size limit |
| 422 | Unprocessable Entity | Validation failed | Business logic errors |
| 429 | Too Many Requests | Rate limit exceeded | Rate limiting |
Server Error Codes (5xx)¶
| Status | Name | Description | When Used |
|---|---|---|---|
| 500 | Internal Server Error | Unexpected server error | Unhandled exceptions |
| 501 | Not Implemented | Feature not implemented | Unsupported operations |
| 502 | Bad Gateway | Invalid backend response | Backend errors |
| 503 | Service Unavailable | Service temporarily down | All backends unhealthy |
| 504 | Gateway Timeout | Backend timeout | Backend too slow |
| 507 | Insufficient Storage | Storage full | Disk/memory full |
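As a rough illustration of the categories and codes above, the sketch below classifies a status code and flags the codes a client might retry unchanged. The names `StatusClass`, `classify`, and `is_retryable` are hypothetical, and the retryable set simply mirrors the `retryable_status_codes` list in the retry configuration later in this guide.

```rust
/// Broad status classes used in this guide (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusClass {
    Success,
    ClientError,
    ServerError,
    Other,
}

/// Map an HTTP status code to its class.
pub fn classify(status: u16) -> StatusClass {
    match status {
        200..=299 => StatusClass::Success,
        400..=499 => StatusClass::ClientError,
        500..=599 => StatusClass::ServerError,
        _ => StatusClass::Other,
    }
}

/// Whether the same request may be retried unchanged.
/// Assumption: mirrors the default retryable_status_codes (429, 502, 503, 504).
pub fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 502 | 503 | 504)
}
```

Note that the class alone does not decide retryability: 429 is a client error but is retryable after waiting, while 500 is a server error that is not in the default retry list.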
Error Response Format¶
Standard Error Response¶
{
"error": {
"code": 404,
"type": "model_not_found",
"message": "Model 'gpt-5' not found on any healthy backend",
"details": {
"requested_model": "gpt-5",
"available_models": ["gpt-4", "gpt-3.5-turbo", "llama2"],
"backends_checked": 3,
"healthy_backends": 2
},
"request_id": "req_12345",
"timestamp": "2024-01-15T10:30:45Z"
}
}
Validation Error Response¶
{
"error": {
"code": 400,
"type": "validation_error",
"message": "Invalid request parameters",
"details": {
"validation_errors": [
{
"field": "messages",
"error": "Required field missing"
},
{
"field": "temperature",
"error": "Must be between 0 and 2",
"value": 3.5
}
]
},
"request_id": "req_12346"
}
}
Rate Limit Error Response¶
{
"error": {
"code": 429,
"type": "rate_limit_exceeded",
"message": "API rate limit exceeded",
"details": {
"limit": 100,
"window": "1m",
"retry_after": 45,
"reset_at": "2024-01-15T10:31:30Z"
},
"headers": {
"X-RateLimit-Limit": "100",
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": "1705314690",
"Retry-After": "45"
}
}
}
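A client can derive its wait time from the headers in this response. The sketch below is one possible approach, not the router's own logic: it prefers `Retry-After` (in seconds) and falls back to `X-RateLimit-Reset` (Unix seconds). The function name `rate_limit_wait` and the `now_epoch` parameter are hypothetical, and the HTTP-date form of `Retry-After` is not handled.

```rust
use std::time::Duration;

/// Decide how long to wait after a 429 response.
/// `retry_after`: raw Retry-After header value (seconds form only).
/// `reset_epoch`: raw X-RateLimit-Reset header value (Unix seconds).
/// `now_epoch`: the caller's current time in Unix seconds.
pub fn rate_limit_wait(
    retry_after: Option<&str>,
    reset_epoch: Option<&str>,
    now_epoch: u64,
) -> Option<Duration> {
    // Prefer the explicit Retry-After value when it parses as whole seconds.
    if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Some(Duration::from_secs(secs));
    }
    // Otherwise wait until the reset timestamp; a reset in the past means no wait.
    reset_epoch
        .and_then(|v| v.trim().parse::<u64>().ok())
        .map(|reset| Duration::from_secs(reset.saturating_sub(now_epoch)))
}
```

Returning `None` when neither header is usable lets the caller fall back to its normal backoff schedule.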
Backend Error Passthrough¶
When a backend returns a 4xx error, Continuum Router parses and forwards the original error details from the backend. This provides more actionable error information for debugging.
Supported Backend Formats:
- OpenAI API ({"error": {"message": "...", "type": "...", "param": "...", "code": "..."}})
- Anthropic Claude API ({"error": {"message": "...", "type": "..."}})
- Google Gemini API ({"error": {"message": "...", "status": "...", "code": ...}})
Example Response (with backend error passthrough):
{
"error": {
"code": 400,
"type": "invalid_request_error",
"message": "Invalid size '512x512'. Valid sizes for gpt-image-1 are: 1024x1024, 1536x1024, 1024x1536, auto",
"param": "size"
}
}
Behavior:
- When a backend returns a parseable error response, the original message, type, param, and code fields are preserved
- The param field is included when the backend provides it (useful for identifying which parameter caused the error)
- If the backend response cannot be parsed, a generic error message is returned
- All error responses remain OpenAI-compatible
Fallback Response (when backend error cannot be parsed):
Retry Strategies¶
Configuration¶
retry:
# Basic settings
max_attempts: 3
initial_delay: "100ms"
max_delay: "10s"
backoff_multiplier: 2.0
jitter: true
# Retryable conditions
retryable_status_codes:
- 429 # Too Many Requests
- 502 # Bad Gateway
- 503 # Service Unavailable
- 504 # Gateway Timeout
retryable_errors:
- ConnectionError
- TimeoutError
- TemporaryError
# Per-endpoint configuration
endpoints:
"/v1/chat/completions":
max_attempts: 5
timeout: "60s"
"/v1/completions":
max_attempts: 3
timeout: "30s"
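To make the interaction between `max_attempts` and the retryable conditions concrete, here is a minimal sketch of a retry loop. `RetryPolicy`, `Attempt`, and `run` are hypothetical names, connection errors and timeouts are assumed to be always retryable, and the backoff sleep is omitted so the example stays deterministic.

```rust
/// Outcome of one attempt: an HTTP status or a transport-level failure.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Attempt {
    Status(u16),
    ConnectionError,
    Timeout,
}

pub struct RetryPolicy {
    pub max_attempts: u32,
    pub retryable_status_codes: Vec<u16>,
}

impl RetryPolicy {
    /// True when the outcome matches a retryable condition.
    fn should_retry(&self, outcome: &Attempt) -> bool {
        match outcome {
            Attempt::Status(code) => self.retryable_status_codes.contains(code),
            Attempt::ConnectionError | Attempt::Timeout => true,
        }
    }

    /// Call `op` (given the 1-based attempt number) until the outcome is not
    /// retryable or the attempts run out. Returns the last outcome and the
    /// number of attempts made.
    pub fn run<F>(&self, mut op: F) -> (Attempt, u32)
    where
        F: FnMut(u32) -> Attempt,
    {
        let mut attempt = 1;
        loop {
            let outcome = op(attempt);
            if attempt >= self.max_attempts || !self.should_retry(&outcome) {
                return (outcome, attempt);
            }
            // A real client would sleep for the backoff delay here.
            attempt += 1;
        }
    }
}
```

With `max_attempts: 3`, a request that keeps timing out is tried three times in total, while a 400 response is returned after the first attempt.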
Exponential Backoff¶
backoff:
type: exponential
base: 100ms
multiplier: 2
max: 10s
jitter: 0.1 # ±10% randomization
# Delay calculation (attempts are numbered from 1; values below exclude jitter):
# delay = min(base * multiplier^(attempt - 1) + jitter, max)
#
# Attempt 1: 100ms
# Attempt 2: 200ms
# Attempt 3: 400ms
# Attempt 4: 800ms
# ...
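The schedule above can be reproduced with a small function. This is a sketch under stated assumptions: attempt 1 waits for `base`, jitter is a fraction applied symmetrically (0.1 means ±10%), and the random value `unit` in [0, 1) is supplied by the caller so the result is reproducible. The name `backoff_delay` is hypothetical.

```rust
use std::time::Duration;

/// Delay before the given 1-based attempt, capped at `max`.
/// `jitter` is a fraction (0.1 = ±10%); `unit` is a random value in [0, 1).
pub fn backoff_delay(
    attempt: u32,
    base: Duration,
    multiplier: f64,
    max: Duration,
    jitter: f64,
    unit: f64,
) -> Duration {
    // Attempt 1 uses exponent 0, so it waits exactly `base` before jitter.
    let exponent = i32::try_from(attempt.saturating_sub(1)).unwrap_or(i32::MAX);
    let raw = base.as_secs_f64() * multiplier.powi(exponent);
    // Scale by a factor in [1 - jitter, 1 + jitter).
    let jittered = raw * (1.0 + jitter * (2.0 * unit - 1.0));
    // Clamp before building the Duration so large exponents cannot overflow.
    Duration::from_secs_f64(jittered.min(max.as_secs_f64()))
}
```

Passing `unit = 0.5` disables the jitter effect, which is convenient for checking the schedule: attempts 1 through 4 give 100ms, 200ms, 400ms, and 800ms, and later attempts are capped at 10s.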
Smart Retry Logic¶
smart_retry:
# Retry with different backend
try_different_backend: true
# Reduce request on retry
reduce_on_retry:
max_tokens: 0.8 # Reduce by 20%
temperature: 0.9 # Lower temperature
# Skip retry for specific errors
non_retryable:
- AuthenticationError
- ValidationError
- PaymentRequired
Circuit Breaker¶
Architecture Details
For detailed implementation architecture including state machine diagrams, admin endpoints, and Prometheus metrics, see Circuit Breaker Architecture.
Configuration¶
circuit_breaker:
enabled: true
# Failure detection
failure_threshold: 5 # Failures to open circuit
success_threshold: 2 # Successes to close circuit
# Timing
timeout: "30s" # Request timeout
half_open_timeout: "15s" # Half-open state duration
reset_timeout: "60s" # Time the circuit stays open before moving to half-open
# Monitoring window
window_size: "60s"
min_requests: 10 # Min requests for statistics
Circuit States¶
stateDiagram-v2
[*] --> Closed
Closed --> Open: Failure threshold reached
Open --> HalfOpen: Reset timeout expired
HalfOpen --> Closed: Success threshold reached
HalfOpen --> Open: Failure detected
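The transitions in this diagram can be sketched as a small state machine. This is an illustrative model, not the router's implementation: `CircuitBreaker` and its methods are hypothetical names, failures are counted consecutively (the `window_size` and `min_requests` settings are ignored), and the current time is passed in so the behaviour can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

pub struct CircuitBreaker {
    failure_threshold: u32,
    success_threshold: u32,
    reset_timeout: Duration,
    state: State,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, success_threshold: u32, reset_timeout: Duration) -> Self {
        Self {
            failure_threshold,
            success_threshold,
            reset_timeout,
            state: State::Closed,
            failures: 0,
            successes: 0,
            opened_at: None,
        }
    }

    /// State at time `now`; Open becomes HalfOpen once reset_timeout has elapsed.
    pub fn state(&mut self, now: Instant) -> State {
        if self.state == State::Open {
            if let Some(opened) = self.opened_at {
                if now.saturating_duration_since(opened) >= self.reset_timeout {
                    self.state = State::HalfOpen;
                    self.successes = 0;
                }
            }
        }
        self.state
    }

    /// Requests are rejected only while the circuit is open.
    pub fn allow_request(&mut self, now: Instant) -> bool {
        self.state(now) != State::Open
    }

    pub fn record_success(&mut self, now: Instant) {
        match self.state(now) {
            // Assumption: a success resets the consecutive failure count.
            State::Closed => self.failures = 0,
            State::HalfOpen => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = State::Closed;
                    self.failures = 0;
                }
            }
            State::Open => {}
        }
    }

    pub fn record_failure(&mut self, now: Instant) {
        match self.state(now) {
            State::Closed => {
                self.failures += 1;
                if self.failures >= self.failure_threshold {
                    self.trip(now);
                }
            }
            // Any failure while half-open reopens the circuit.
            State::HalfOpen => self.trip(now),
            State::Open => {}
        }
    }

    fn trip(&mut self, now: Instant) {
        self.state = State::Open;
        self.opened_at = Some(now);
        self.successes = 0;
    }
}
```

A caller would check `allow_request` before contacting a backend and report the outcome with `record_success` or `record_failure` afterwards.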
Per-Backend Circuit Breaker¶
backends:
- name: primary
url: http://primary:8000
circuit_breaker:
failure_threshold: 3
reset_timeout: "30s"
- name: secondary
url: http://secondary:8000
circuit_breaker:
failure_threshold: 5
reset_timeout: "60s"
Error Recovery¶
Automatic Recovery¶
recovery:
# Health check recovery
health_checks:
enabled: true
interval: "30s"
recovery_threshold: 2 # Consecutive successes
# Connection pool recovery
connection_pool:
validation_interval: "60s"
evict_invalid: true
replace_invalid: true
# Cache recovery
cache:
clear_on_error: false
partial_invalidation: true
Model Fallback¶
The router supports automatic model fallback when primary models are unavailable. This integrates with the circuit breaker so failed requests cascade through alternate models before surfacing an error.
# Model fallback configuration
fallback:
enabled: true
fallback_chains:
"gpt-4o":
- "gpt-4-turbo"
- "gpt-3.5-turbo"
"claude-opus-4-5-20251101":
- "claude-sonnet-4-5"
- "claude-haiku-4-5"
# Cross-provider fallback
"gemini-2.5-pro":
- "gemini-2.5-flash"
- "gpt-4o"
fallback_policy:
trigger_conditions:
error_codes: [429, 500, 502, 503, 504]
timeout: true
connection_error: true
circuit_breaker_open: true
max_fallback_attempts: 3
fallback_timeout_multiplier: 1.5
Fallback Trigger Conditions¶
| Condition | HTTP Status | Description |
|---|---|---|
| Rate Limit | 429 | Backend rate limit exceeded |
| Server Error | 500 | Internal backend error |
| Bad Gateway | 502 | Invalid response from backend |
| Service Unavailable | 503 | Backend temporarily unavailable |
| Gateway Timeout | 504 | Backend request timeout |
| Circuit Open | N/A | Circuit breaker is open |
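One way to picture how a fallback chain and these trigger conditions combine is the sketch below. It is an assumption-laden model rather than the router's code: `Failure`, `triggers_fallback`, and `resolve_with_fallback` are hypothetical names, the backend call is represented by a closure, and a failure that is not a trigger condition (for example a 400) is assumed to stop the chain immediately.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Failure {
    Status(u16),
    Timeout,
    ConnectionError,
    CircuitOpen,
}

/// True when the failure matches one of the trigger conditions in the table.
pub fn triggers_fallback(failure: &Failure) -> bool {
    match failure {
        Failure::Status(code) => matches!(code, 429 | 500 | 502 | 503 | 504),
        Failure::Timeout | Failure::ConnectionError | Failure::CircuitOpen => true,
    }
}

/// Try `model`, then up to `max_fallback_attempts` models from its chain.
/// Returns the model that succeeded, or every (model, failure) pair recorded.
pub fn resolve_with_fallback<F>(
    chains: &HashMap<String, Vec<String>>,
    model: &str,
    max_fallback_attempts: usize,
    mut call: F,
) -> Result<String, Vec<(String, Failure)>>
where
    F: FnMut(&str) -> Result<(), Failure>,
{
    let fallbacks: &[String] = match chains.get(model) {
        Some(chain) => chain.as_slice(),
        None => &[],
    };
    // The requested model is always tried first.
    let mut candidates: Vec<&str> = vec![model];
    candidates.extend(fallbacks.iter().take(max_fallback_attempts).map(|s| s.as_str()));

    let mut failures = Vec::new();
    for candidate in candidates {
        match call(candidate) {
            Ok(()) => return Ok(candidate.to_string()),
            Err(failure) => {
                let stop = !triggers_fallback(&failure);
                failures.push((candidate.to_string(), failure));
                if stop {
                    break; // not a trigger condition, so surface the error as-is
                }
            }
        }
    }
    Err(failures)
}
```

The collected failure list corresponds loosely to the per-model reasons a caller would need in order to report that every fallback was exhausted.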
Fallback Response Headers¶
When fallback occurs, these headers are added:
X-Fallback-Used: true
X-Original-Model: gpt-4o
X-Fallback-Model: gpt-4-turbo
X-Fallback-Reason: error_code_429
X-Fallback-Attempts: 2
Fallback Error Response¶
When all fallbacks are exhausted:
{
"error": {
"code": 503,
"type": "all_fallbacks_exhausted",
"message": "All fallback models failed for 'gpt-4o'",
"details": {
"original_model": "gpt-4o",
"attempted_fallbacks": ["gpt-4-turbo", "gpt-3.5-turbo"],
"failure_reasons": [
{"model": "gpt-4-turbo", "reason": "error_code_503"},
{"model": "gpt-3.5-turbo", "reason": "timeout"}
]
},
"request_id": "req_12345"
}
}
Graceful Degradation¶
degradation:
# Fallback models (legacy - use fallback.fallback_chains instead)
model_fallbacks:
"gpt-4": ["gpt-3.5-turbo", "gpt-3"]
"claude-opus": ["claude-sonnet", "claude-haiku"]
# Feature degradation
features:
streaming:
fallback_to_non_streaming: true
functions:
disable_on_error: true
# Response degradation
response:
reduce_max_tokens: true
lower_temperature: true
simplify_prompts: true
Failover Strategy¶
failover:
strategy: priority # or round-robin, least-failures
backends:
- name: primary
priority: 1
weight: 100
- name: secondary
priority: 2
weight: 50
- name: tertiary
priority: 3
weight: 10
conditions:
- error_rate > 0.1
- latency_p99 > 5s
- health_score < 0.5
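The `priority` strategy with these conditions might be evaluated as in the sketch below. Several points are assumptions rather than documented behaviour: a lower `priority` number is preferred, the three conditions are combined with OR, and `weight` is ignored because it only matters for weighted strategies. `Backend` and `select_backend` are hypothetical names.

```rust
#[derive(Debug, Clone)]
pub struct Backend {
    pub name: &'static str,
    pub priority: u32, // assumption: lower value = preferred
    pub error_rate: f64,
    pub latency_p99_secs: f64,
    pub health_score: f64,
}

impl Backend {
    /// True when any failover condition from the configuration holds.
    pub fn should_fail_over(&self) -> bool {
        self.error_rate > 0.1 || self.latency_p99_secs > 5.0 || self.health_score < 0.5
    }
}

/// Pick the preferred backend that does not meet a failover condition.
pub fn select_backend(backends: &[Backend]) -> Option<&Backend> {
    backends
        .iter()
        .filter(|b| !b.should_fail_over())
        .min_by_key(|b| b.priority)
}
```

If every backend meets a failover condition, `select_backend` returns `None`, which corresponds to the "all backends down" case described under Common Error Scenarios.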
Custom Error Handling¶
Error Middleware¶
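// Custom error handler: maps a router error to an HTTP status and a JSON error body.
// The imports assume an axum-based handler with chrono and tracing; Error,
// ErrorResponse, ErrorDetails, and request_id() are defined elsewhere in the router.
use axum::{
    http::StatusCode,
    response::{IntoResponse, Response},
    Json,
};
use chrono::Utc;
use tracing::error;

pub async fn error_handler(error: Error) -> Response {
    // Every arm yields (StatusCode, &str, String) so the match arms share one type.
    let (status, error_type, message) = match error {
        Error::Validation(e) => (
            StatusCode::BAD_REQUEST,
            "validation_error",
            e.to_string(),
        ),
        Error::NotFound(e) => (
            StatusCode::NOT_FOUND,
            "not_found",
            e.to_string(),
        ),
        // Backend details are not forwarded to the client here.
        Error::Backend(_) => (
            StatusCode::BAD_GATEWAY,
            "backend_error",
            "Backend service error".to_string(),
        ),
        Error::Internal(e) => {
            // Log the cause server-side; return only a generic message.
            error!("Internal error: {:?}", e);
            (
                StatusCode::INTERNAL_SERVER_ERROR,
                "internal_error",
                "An internal error occurred".to_string(),
            )
        }
    };
    // Return the status with the body; Json alone would respond with 200 OK.
    (
        status,
        Json(ErrorResponse {
            error: ErrorDetails {
                code: status.as_u16(),
                error_type: error_type.to_string(),
                message,
                request_id: request_id(),
                timestamp: Utc::now(),
            },
        }),
    )
        .into_response()
}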
Error Transformation¶
error_transformation:
# Map backend errors to client-friendly messages
mappings:
- backend_error: "CUDA out of memory"
client_error: "Model temporarily unavailable, please retry"
status: 503
- backend_error: "Model not loaded"
client_error: "Model initialization in progress"
status: 503
retry_after: 30
# Hide sensitive information
sanitization:
remove_stack_traces: true
remove_internal_ips: true
remove_credentials: true
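The mapping rules above could be applied roughly as follows. This sketch assumes that `backend_error` is matched as a substring of the backend's message and that the first matching rule wins; neither detail is stated in the configuration. `Mapping`, `ClientError`, and `transform` are hypothetical names.

```rust
/// One entry from error_transformation.mappings.
pub struct Mapping {
    pub backend_error: &'static str,
    pub client_error: &'static str,
    pub status: u16,
    pub retry_after: Option<u32>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ClientError {
    pub status: u16,
    pub message: String,
    pub retry_after: Option<u32>,
}

/// Return the client-facing error for the first mapping whose pattern
/// occurs in the backend message, or None when no mapping applies.
pub fn transform(mappings: &[Mapping], backend_message: &str) -> Option<ClientError> {
    mappings
        .iter()
        .find(|m| backend_message.contains(m.backend_error))
        .map(|m| ClientError {
            status: m.status,
            message: m.client_error.to_string(),
            retry_after: m.retry_after,
        })
}
```

Because the original backend text never reaches the returned value, a matched mapping also keeps internal details such as device or memory information out of the client response.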
Error Hooks¶
error_hooks:
# Pre-error hooks
pre_error:
- log_error
- capture_metrics
- notify_monitoring
# Post-error hooks
post_error:
- cleanup_resources
- update_circuit_breaker
- trigger_failover
# Custom handlers
handlers:
- type: webhook
url: https://alerts.example.com/errors
events: [critical_error, repeated_error]
- type: email
to: oncall@example.com
events: [service_down]
Debugging Errors¶
Error Logging¶
logging:
errors:
level: debug # Log all errors in detail
include_request_body: true
include_response_body: true
include_headers: true
# Structured error logging
format:
type: json
fields:
- timestamp
- level
- error_type
- error_message
- request_id
- backend_id
- model
- latency
- stack_trace
Debug Endpoints¶
# Get recent errors
curl http://localhost:8080/admin/errors/recent
# Get error statistics
curl http://localhost:8080/admin/errors/stats
# Get specific error details
curl http://localhost:8080/admin/errors/req_12345
# Trigger error for testing
curl -X POST http://localhost:8080/admin/debug/error \
-d '{"type": "backend_timeout", "backend": "primary"}'
Error Tracing¶
tracing:
errors:
capture_stack_trace: true
capture_variables: true
capture_context: true
# Distributed tracing
propagation:
- tracecontext
- baggage
# Error sampling
sampling:
all_errors: true
error_rate_threshold: 0.01
Common Error Scenarios¶
Scenario 1: All Backends Down¶
# Detection
condition: all_backends.health_status == unhealthy
# Response
response:
status: 503
message: "Service temporarily unavailable"
retry_after: 30
# Recovery
recovery:
- increase_health_check_frequency
- notify_oncall
- attempt_backend_restart
- switch_to_backup_region
Scenario 2: Model Not Found¶
# Detection
condition: requested_model not in available_models
# Response
response:
status: 404
message: "Model '{model}' not found"
suggestions: similar_models
# Mitigation
mitigation:
- check_model_aliases
- refresh_model_cache
- try_alternative_backends
Scenario 3: Rate Limit Exceeded¶
# Detection
condition: request_count > rate_limit
# Response
response:
status: 429
retry_after: calculate_backoff()
headers:
X-RateLimit-Remaining: 0
# Handling
handling:
- queue_request_if_premium
- suggest_upgrade_plan
- apply_exponential_backoff
Scenario 4: Timeout¶
# Detection
condition: request_duration > timeout
# Response
response:
status: 504
message: "Request timeout"
# Mitigation
mitigation:
- try_with_reduced_max_tokens
- switch_to_faster_backend
- enable_streaming_if_possible
Scenario 5: Backend Mismatch¶
# Detection
condition: backend_response_format != expected_format
# Response
response:
status: 502
message: "Invalid backend response"
# Recovery
recovery:
- log_response_for_debugging
- mark_backend_as_degraded
- retry_with_different_backend
- update_backend_adapter
Scenario 6: File Resolution Failure¶
# Detection
condition: file_reference_not_found OR invalid_file_id
# Response
response:
status: 404 # 404 when the file is not found; 400 when the file ID format is invalid
message: "Failed to resolve file reference"
type: "invalid_request_error"
code: "file_resolution_failed"
# Details
details:
file_id: "file-abc123"
reason: "file not found" # or "invalid file ID format"
# Mitigation
mitigation:
- verify_file_was_uploaded
- check_file_id_format_starts_with_file_prefix
- ensure_file_not_deleted
Scenario 7: File Too Large¶
# Detection
condition: file_size > max_file_size
# Response
response:
status: 413
message: "File size exceeds maximum allowed"
type: "invalid_request_error"
# Details
details:
file_size: 600000000 # bytes
max_size: 536870912 # bytes (512 MiB default)
# Mitigation
mitigation:
- compress_file_before_upload
- use_smaller_file
- increase_max_file_size_in_config
Scenario 8: Too Many File References¶
# Detection
condition: file_references_count > 20
# Response
response:
status: 400
message: "Too many file references in request"
type: "invalid_request_error"
# Details
details:
count: 25
max_allowed: 20
# Mitigation
mitigation:
- split_request_into_multiple_calls
- reduce_number_of_files
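The checks behind Scenarios 6 to 8 can be summarised in one validation pass. The sketch below is illustrative only: `FileError` and `validate_files` are hypothetical names, the `file-` prefix is inferred from the example ID and the mitigation hint, the limits are the defaults quoted in the scenarios, and the "file not found" lookup is left out because it needs a file store.

```rust
pub const MAX_FILE_SIZE: u64 = 536_870_912; // 512 MiB default
pub const MAX_FILE_REFERENCES: usize = 20;

#[derive(Debug, PartialEq, Eq)]
pub enum FileError {
    InvalidId(String),
    TooLarge { size: u64, max: u64 },
    TooManyReferences { count: usize, max: usize },
}

impl FileError {
    /// HTTP status used for each failure in the scenarios.
    pub fn status(&self) -> u16 {
        match self {
            FileError::InvalidId(_) => 400,
            FileError::TooLarge { .. } => 413,
            FileError::TooManyReferences { .. } => 400,
        }
    }
}

/// Validate (file_id, size_in_bytes) pairs attached to one request.
pub fn validate_files(files: &[(&str, u64)]) -> Result<(), FileError> {
    if files.len() > MAX_FILE_REFERENCES {
        return Err(FileError::TooManyReferences {
            count: files.len(),
            max: MAX_FILE_REFERENCES,
        });
    }
    for (id, size) in files {
        // Assumption: valid IDs start with "file-", as in "file-abc123".
        if !id.starts_with("file-") {
            return Err(FileError::InvalidId((*id).to_string()));
        }
        if *size > MAX_FILE_SIZE {
            return Err(FileError::TooLarge { size: *size, max: MAX_FILE_SIZE });
        }
    }
    Ok(())
}
```

Checking the reference count first means a request with too many files is rejected before any per-file work is done.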
Error Monitoring¶
Metrics¶
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
# Error rate by type
sum by (error_type) (rate(errors_total[5m]))
# Backend error rate
sum by (backend_id, error_type) (rate(backend_errors_total[5m]))
# Circuit breaker state
circuit_breaker_state{backend_id="primary"}
Alerts¶
alerts:
- name: HighErrorRate
condition: error_rate > 0.05
duration: 5m
severity: warning
- name: AllBackendsDown
condition: healthy_backends == 0
duration: 1m
severity: critical
- name: CircuitBreakerOpen
condition: circuit_breaker_state == "open"
duration: 5m
severity: warning
Best Practices¶
Error Design¶
- Use appropriate HTTP status codes
- Provide clear, actionable error messages
- Include request IDs for tracing
- Don't expose internal details to clients
- Log errors with sufficient context
Error Handling¶
- Implement retry logic with backoff
- Use circuit breakers to prevent cascading failures
- Set appropriate timeouts
- Handle errors at the appropriate layer
- Fail fast for non-recoverable errors
Error Recovery¶
- Implement health checks for automatic recovery
- Use fallback strategies for critical paths
- Monitor error rates and patterns
- Set up alerting for anomalies
- Document error scenarios and recovery procedures
See Also¶
- Circuit Breaker Architecture - Detailed circuit breaker implementation
- Model Fallback Architecture - Model fallback system design
- Configuration Guide
- Monitoring Guide
- API Reference
- Quick Start Guide