Error Handling Guide¶

This guide covers error handling mechanisms, status codes, retry strategies, and troubleshooting for the Continuum Router.

Table of Contents¶

Error Categories
HTTP Status Codes
Error Response Format
Retry Strategies
Circuit Breaker
Error Recovery
Custom Error Handling
Debugging Errors
Common Error Scenarios
Error Monitoring
Best Practices

Error Categories¶

Client Errors (4xx)¶

Errors caused by invalid client requests that should not be retried without modification.

Category	Description	Examples
Validation Errors	Invalid request format or parameters	Missing required fields, invalid JSON
Authentication Errors	Failed authentication or authorization	Invalid API key, expired token
Resource Errors	Requested resource not found	Model not available, endpoint not found
Rate Limit Errors	Too many requests	Quota exceeded, rate limit hit

Server Errors (5xx)¶

Errors on the server side that may be transient and retryable.

Category	Description	Examples
Backend Errors	Backend service issues	Connection refused, backend timeout
Router Errors	Internal router issues	Configuration error, panic
Infrastructure Errors	Infrastructure failures	Database down, network partition
Capacity Errors	Resource exhaustion	Memory limit, connection pool full

HTTP Status Codes¶

Success Codes (2xx)¶

Status	Name	Description	When Used
200	OK	Request completed successfully	Normal response
201	Created	Resource created successfully	Resource creation
202	Accepted	Request accepted for processing	Async operations
204	No Content	Success with no response body	Deletions

Client Error Codes (4xx)¶

Status	Name	Description	When Used
400	Bad Request	Invalid request format	Malformed JSON, missing fields
401	Unauthorized	Authentication required	Missing or invalid auth
403	Forbidden	Access denied	Insufficient permissions
404	Not Found	Resource not found	Model/endpoint not available
405	Method Not Allowed	HTTP method not supported	Wrong HTTP verb
408	Request Timeout	Client request timeout	Slow client
413	Payload Too Large	Request body too large	Exceeds size limit
422	Unprocessable Entity	Validation failed	Business logic errors
429	Too Many Requests	Rate limit exceeded	Rate limiting

Server Error Codes (5xx)¶

Status	Name	Description	When Used
500	Internal Server Error	Unexpected server error	Unhandled exceptions
501	Not Implemented	Feature not implemented	Unsupported operations
502	Bad Gateway	Invalid backend response	Backend errors
503	Service Unavailable	Service temporarily down	All backends unhealthy
504	Gateway Timeout	Backend timeout	Backend too slow
507	Insufficient Storage	Storage full	Disk/memory full

Error Response Format¶

Standard Error Response¶

{
  "error": {
    "code": 404,
    "type": "model_not_found",
    "message": "Model 'gpt-5' not found on any healthy backend",
    "details": {
      "requested_model": "gpt-5",
      "available_models": ["gpt-4", "gpt-3.5-turbo", "llama2"],
      "backends_checked": 3,
      "healthy_backends": 2
    },
    "request_id": "req_12345",
    "timestamp": "2024-01-15T10:30:45Z"
  }
}

Validation Error Response¶

{
  "error": {
    "code": 400,
    "type": "validation_error",
    "message": "Invalid request parameters",
    "details": {
      "validation_errors": [
        {
          "field": "messages",
          "error": "Required field missing"
        },
        {
          "field": "temperature",
          "error": "Must be between 0 and 2",
          "value": 3.5
        }
      ]
    },
    "request_id": "req_12346"
  }
}
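The two rules visible in the example above (a required `messages` field and a `temperature` between 0 and 2) could be checked along these lines. This is a minimal sketch, not the router's actual validator: the `FieldError` type and the `validate` function are hypothetical names introduced for illustration.

```rust
/// One entry of `details.validation_errors` (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct FieldError {
    field: &'static str,
    error: &'static str,
}

/// Checks the two rules shown in the example response:
/// `messages` must be present and `temperature`, when given, must lie in [0, 2].
fn validate(has_messages: bool, temperature: Option<f64>) -> Vec<FieldError> {
    let mut errors = Vec::new();
    if !has_messages {
        errors.push(FieldError { field: "messages", error: "Required field missing" });
    }
    if let Some(t) = temperature {
        // A NaN temperature also fails this range check.
        if !(0.0..=2.0).contains(&t) {
            errors.push(FieldError { field: "temperature", error: "Must be between 0 and 2" });
        }
    }
    errors
}
```

Collecting every violation before responding, rather than stopping at the first, matches the array shape of `validation_errors` in the example.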

Rate Limit Error Response¶

{
  "error": {
    "code": 429,
    "type": "rate_limit_exceeded",
    "message": "API rate limit exceeded",
    "details": {
      "limit": 100,
      "window": "1m",
      "retry_after": 45,
      "reset_at": "2024-01-15T10:31:30Z"
    },
    "headers": {
      "X-RateLimit-Limit": "100",
      "X-RateLimit-Remaining": "0",
      "X-RateLimit-Reset": "1705316490",
      "Retry-After": "45"
    }
  }
}
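On the client side, the `Retry-After` value can drive the wait before the next attempt. The sketch below assumes the header carries whole seconds, as in the example above; the function name `retry_delay` and the fallback parameter are illustrative only.

```rust
use std::time::Duration;

/// Parses a `Retry-After` header value given in whole seconds.
/// Returns `fallback` when the header is absent or not a plain integer
/// (the HTTP-date form of the header is not handled in this sketch).
fn retry_delay(retry_after: Option<&str>, fallback: Duration) -> Duration {
    retry_after
        .and_then(|value| value.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(fallback)
}
```

For the response above, `retry_delay(Some("45"), Duration::from_secs(1))` yields a 45 second wait.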

Backend Error Passthrough¶

When a backend returns a 4xx error, Continuum Router parses and forwards the original error details from the backend. This provides more actionable error information for debugging.

Supported Backend Formats:

- OpenAI API ({"error": {"message": "...", "type": "...", "param": "...", "code": "..."}})
- Anthropic Claude API ({"error": {"message": "...", "type": "..."}})
- Google Gemini API ({"error": {"message": "...", "status": "...", "code": ...}})

Example Response (with backend error passthrough):

{
  "error": {
    "code": 400,
    "type": "invalid_request_error",
    "message": "Invalid size '512x512'. Valid sizes for gpt-image-1 are: 1024x1024, 1536x1024, 1024x1536, auto",
    "param": "size"
  }
}

Behavior:

- When a backend returns a parseable error response, the original message, type, param, and code fields are preserved.
- The param field is included when the backend provides it (useful for identifying which parameter caused the error).
- If the backend response cannot be parsed, a generic error message is returned.
- All error responses remain OpenAI-compatible.

Fallback Response (when backend error cannot be parsed):

{
  "error": {
    "code": 400,
    "type": "invalid_request_error",
    "message": "Client error: HTTP 400"
  }
}
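The passthrough and fallback behavior described above can be sketched as a single function. The `BackendError` and `ClientError` types and the `passthrough` function are hypothetical, and parsing of the backend body is assumed to have happened already. The fallback `type` of `invalid_request_error` is taken from the 400 example; whether other 4xx codes use the same type is an assumption.

```rust
/// Error fields extracted from a backend response body (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct BackendError {
    message: String,
    error_type: String,
    param: Option<String>,
}

/// Error body returned to the client (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct ClientError {
    code: u16,
    error_type: String,
    message: String,
    param: Option<String>,
}

/// Builds the client-facing error for a backend 4xx response.
/// `parsed` is `None` when the backend body could not be parsed.
fn passthrough(status: u16, parsed: Option<BackendError>) -> ClientError {
    match parsed {
        // Preserve the backend's own message, type, and param.
        Some(err) => ClientError {
            code: status,
            error_type: err.error_type,
            message: err.message,
            param: err.param,
        },
        // Generic fallback, mirroring the "Client error: HTTP 400" example.
        None => ClientError {
            code: status,
            error_type: "invalid_request_error".to_string(),
            message: format!("Client error: HTTP {}", status),
            param: None,
        },
    }
}
```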

Retry Strategies¶

Configuration¶

retry:
  # Basic settings
  max_attempts: 3
  initial_delay: "100ms"
  max_delay: "10s"
  backoff_multiplier: 2.0
  jitter: true

  # Retryable conditions
  retryable_status_codes:
    - 429  # Too Many Requests
    - 502  # Bad Gateway
    - 503  # Service Unavailable
    - 504  # Gateway Timeout

  retryable_errors:
    - ConnectionError
    - TimeoutError
    - TemporaryError

  # Per-endpoint configuration
  endpoints:
    "/v1/chat/completions":
      max_attempts: 5
      timeout: "60s"
    "/v1/completions":
      max_attempts: 3
      timeout: "30s"
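The retry decision implied by this configuration can be sketched as follows. The values are hard-coded from the configuration above for brevity. The function names are illustrative, and treating `max_attempts` as the total number of attempts (including the first) is an assumption.

```rust
/// Status codes listed under `retryable_status_codes` in the configuration above.
const RETRYABLE_STATUS_CODES: [u16; 4] = [429, 502, 503, 504];

/// Returns true when a response with this status may be retried.
fn is_retryable_status(status: u16) -> bool {
    RETRYABLE_STATUS_CODES.contains(&status)
}

/// Per-endpoint attempt limits, falling back to the global `max_attempts: 3`.
fn max_attempts_for(endpoint: &str) -> u32 {
    match endpoint {
        "/v1/chat/completions" => 5,
        "/v1/completions" => 3,
        _ => 3,
    }
}

/// Decides whether another attempt should be made.
/// `attempts_made` counts attempts already performed, including the first
/// (an assumption about how `max_attempts` is interpreted).
fn should_retry(endpoint: &str, status: u16, attempts_made: u32) -> bool {
    is_retryable_status(status) && attempts_made < max_attempts_for(endpoint)
}
```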

Exponential Backoff¶

backoff:
  type: exponential
  base: 100ms
  multiplier: 2
  max: 10s
  jitter: 0.1  # ±10% randomization

# Delay calculation (attempt is 1-based):
# delay = min(base * multiplier^(attempt - 1) + jitter, max)
#
# Delays before jitter:
# Attempt 1: 100ms
# Attempt 2: 200ms
# Attempt 3: 400ms
# Attempt 4: 800ms
# ...
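The delay calculation can be written out as a small function. This is a sketch under two assumptions: `attempt` is 1-based, so that attempt 1 waits `base`, and jitter is a fraction of the computed delay, applied before the cap. The caller supplies the random value `unit` in [-1.0, 1.0], so the sketch needs no random number crate and stays deterministic.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (1-based).
/// `jitter` is the fractional spread (0.1 means ±10%) and `unit` is a
/// caller-supplied random value in [-1.0, 1.0]; pass 0.0 for no jitter.
fn backoff_delay(
    attempt: u32,
    base: Duration,
    multiplier: f64,
    max: Duration,
    jitter: f64,
    unit: f64,
) -> Duration {
    // Attempt 1 uses exponent 0, so the first delay equals `base`.
    let exponent = i32::try_from(attempt.saturating_sub(1)).unwrap_or(i32::MAX);
    let raw_ms = base.as_millis() as f64 * multiplier.powi(exponent);
    let jittered_ms = raw_ms * (1.0 + jitter * unit.clamp(-1.0, 1.0));
    // Cap in floating point first so very large attempts cannot overflow.
    let capped_ms = jittered_ms.min(max.as_millis() as f64);
    Duration::from_millis(capped_ms.round() as u64)
}
```

With `base = 100ms`, `multiplier = 2.0`, and `max = 10s`, attempts 1 through 4 give 100ms, 200ms, 400ms, and 800ms before jitter, and the delay stops growing once it reaches 10s.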

Smart Retry Logic¶

smart_retry:
  # Retry with different backend
  try_different_backend: true

  # Reduce request on retry
  reduce_on_retry:
    max_tokens: 0.8  # Reduce by 20%
    temperature: 0.9  # Lower temperature

  # Skip retry for specific errors
  non_retryable:
    - AuthenticationError
    - ValidationError
    - PaymentRequired

Circuit Breaker¶

Architecture Details

For detailed implementation architecture including state machine diagrams, admin endpoints, and Prometheus metrics, see Circuit Breaker Architecture.

Configuration¶

circuit_breaker:
  enabled: true

  # Failure detection
  failure_threshold: 5        # Failures to open circuit
  success_threshold: 2        # Successes to close circuit

  # Timing
  timeout: "30s"              # Request timeout
  half_open_timeout: "15s"    # Half-open state duration
  reset_timeout: "60s"        # Time in open state before moving to half-open

  # Monitoring window
  window_size: "60s"
  min_requests: 10            # Min requests for statistics

Circuit States¶

stateDiagram-v2
    [*] --> Closed
    Closed --> Open: Failure threshold reached
    Open --> HalfOpen: Reset timeout expired
    HalfOpen --> Closed: Success threshold reached
    HalfOpen --> Open: Failure detected
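The transitions in the diagram can be sketched as a small state machine. This is an illustrative model rather than the router's implementation: it counts consecutive failures and ignores `window_size`, `min_requests`, and `half_open_timeout`, and all type and method names are hypothetical. Time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: State,
    failures: u32,
    successes: u32,
    failure_threshold: u32,
    success_threshold: u32,
    reset_timeout: Duration,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, success_threshold: u32, reset_timeout: Duration) -> Self {
        Self {
            state: State::Closed,
            failures: 0,
            successes: 0,
            failure_threshold,
            success_threshold,
            reset_timeout,
        }
    }

    /// Returns true if a request may be sent at time `now`.
    /// Open --> HalfOpen once the reset timeout has expired.
    fn allow_request(&mut self, now: Instant) -> bool {
        if let State::Open { since } = self.state {
            if now.saturating_duration_since(since) >= self.reset_timeout {
                self.state = State::HalfOpen;
                self.successes = 0;
            }
        }
        !matches!(self.state, State::Open { .. })
    }

    /// HalfOpen --> Closed once the success threshold is reached.
    fn record_success(&mut self) {
        match self.state {
            State::HalfOpen => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = State::Closed;
                    self.failures = 0;
                }
            }
            State::Closed => self.failures = 0,
            State::Open { .. } => {}
        }
    }

    /// Closed --> Open at the failure threshold; HalfOpen --> Open on any failure.
    fn record_failure(&mut self, now: Instant) {
        match self.state {
            State::HalfOpen => self.state = State::Open { since: now },
            State::Closed => {
                self.failures += 1;
                if self.failures >= self.failure_threshold {
                    self.state = State::Open { since: now };
                }
            }
            State::Open { .. } => {}
        }
    }
}
```

With the defaults from the configuration (`failure_threshold: 5`, `success_threshold: 2`, `reset_timeout: "60s"`), five consecutive failures open the circuit, requests are rejected for 60 seconds, and two successful probes in the half-open state close it again.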

Per-Backend Circuit Breaker¶

backends:
  - name: primary
    url: http://primary:8000
    circuit_breaker:
      failure_threshold: 3
      reset_timeout: "30s"

  - name: secondary
    url: http://secondary:8000
    circuit_breaker:
      failure_threshold: 5
      reset_timeout: "60s"

Error Recovery¶

Automatic Recovery¶

recovery:
  # Health check recovery
  health_checks:
    enabled: true
    interval: "30s"
    recovery_threshold: 2  # Consecutive successes

  # Connection pool recovery
  connection_pool:
    validation_interval: "60s"
    evict_invalid: true
    replace_invalid: true

  # Cache recovery
  cache:
    clear_on_error: false
    partial_invalidation: true
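The `recovery_threshold` setting suggests that a backend is marked healthy again only after a run of consecutive successful health checks. A minimal sketch of that bookkeeping follows. The `HealthTracker` type is hypothetical, and marking a backend unhealthy after a single failed check is an assumption made to keep the example short.

```rust
/// Tracks health check results for one backend (hypothetical type).
#[derive(Debug)]
struct HealthTracker {
    healthy: bool,
    consecutive_successes: u32,
    recovery_threshold: u32,
}

impl HealthTracker {
    fn new(recovery_threshold: u32) -> Self {
        Self { healthy: true, consecutive_successes: 0, recovery_threshold }
    }

    /// Records one health check result and returns the resulting health status.
    fn record(&mut self, check_passed: bool) -> bool {
        if check_passed {
            self.consecutive_successes = self.consecutive_successes.saturating_add(1);
            if self.consecutive_successes >= self.recovery_threshold {
                self.healthy = true;
            }
        } else {
            // Any failure resets the streak and marks the backend unhealthy.
            self.consecutive_successes = 0;
            self.healthy = false;
        }
        self.healthy
    }
}
```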

Model Fallback¶

The router supports automatic model fallback when primary models are unavailable. This integrates with the circuit breaker so failed requests cascade through alternate models before surfacing an error.

# Model fallback configuration
fallback:
  enabled: true

  fallback_chains:
    "gpt-4o":
      - "gpt-4-turbo"
      - "gpt-3.5-turbo"
    "claude-opus-4-5-20251101":
      - "claude-sonnet-4-5"
      - "claude-haiku-4-5"
    # Cross-provider fallback
    "gemini-2.5-pro":
      - "gemini-2.5-flash"
      - "gpt-4o"

  fallback_policy:
    trigger_conditions:
      error_codes: [429, 500, 502, 503, 504]
      timeout: true
      connection_error: true
      circuit_breaker_open: true
    max_fallback_attempts: 3
    fallback_timeout_multiplier: 1.5

Fallback Trigger Conditions¶

Condition	HTTP Status	Description
Rate Limit	429	Backend rate limit exceeded
Server Error	500	Internal backend error
Bad Gateway	502	Invalid response from backend
Service Unavailable	503	Backend temporarily unavailable
Gateway Timeout	504	Backend request timeout
Circuit Open	N/A	Circuit breaker is open
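Resolving which models to try next can be sketched with a plain map lookup. The function names are illustrative, and the sketch covers only the status-code triggers from the table (timeouts, connection errors, and an open circuit would need separate signals).

```rust
use std::collections::HashMap;

/// Status codes listed under `trigger_conditions.error_codes`.
fn triggers_fallback(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 504)
}

/// Returns the models to try after `model` fails, in chain order,
/// capped at `max_attempts` (see `max_fallback_attempts`).
fn fallback_candidates<'a>(
    chains: &'a HashMap<&'a str, Vec<&'a str>>,
    model: &str,
    max_attempts: usize,
) -> Vec<&'a str> {
    chains
        .get(model)
        .map(|chain| chain.iter().copied().take(max_attempts).collect())
        .unwrap_or_default()
}
```

A model without a configured chain yields an empty list, in which case the original error would be returned to the client unchanged.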

Fallback Response Headers¶

When fallback occurs, these headers are added:

X-Fallback-Used: true
X-Original-Model: gpt-4o
X-Fallback-Model: gpt-4-turbo
X-Fallback-Reason: error_code_429
X-Fallback-Attempts: 2

Fallback Error Response¶

When all fallbacks are exhausted:

{
  "error": {
    "code": 503,
    "type": "all_fallbacks_exhausted",
    "message": "All fallback models failed for 'gpt-4o'",
    "details": {
      "original_model": "gpt-4o",
      "attempted_fallbacks": ["gpt-4-turbo", "gpt-3.5-turbo"],
      "failure_reasons": [
        {"model": "gpt-4-turbo", "reason": "error_code_503"},
        {"model": "gpt-3.5-turbo", "reason": "timeout"}
      ]
    },
    "request_id": "req_12345"
  }
}

Graceful Degradation¶

degradation:
  # Fallback models (legacy - use fallback.fallback_chains instead)
  model_fallbacks:
    "gpt-4": ["gpt-3.5-turbo", "gpt-3"]
    "claude-opus": ["claude-sonnet", "claude-haiku"]

  # Feature degradation
  features:
    streaming:
      fallback_to_non_streaming: true
    functions:
      disable_on_error: true

  # Response degradation
  response:
    reduce_max_tokens: true
    lower_temperature: true
    simplify_prompts: true

Failover Strategy¶

failover:
  strategy: priority  # or round-robin, least-failures

  backends:
    - name: primary
      priority: 1
      weight: 100

    - name: secondary
      priority: 2
      weight: 50

    - name: tertiary
      priority: 3
      weight: 10

  conditions:
    - error_rate > 0.1
    - latency_p99 > 5s
    - health_score < 0.5
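Under the `priority` strategy, selection could work roughly as follows: skip any backend that meets a failover condition, then take the remaining backend with the lowest priority number. The `Backend` struct and both functions are hypothetical, and `weight` is left out because the configuration does not say how it interacts with priority.

```rust
/// Runtime view of one backend (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Backend {
    name: &'static str,
    priority: u32,
    error_rate: f64,
    latency_p99_secs: f64,
    health_score: f64,
}

/// True when any failover condition from the configuration holds.
fn should_fail_over(backend: &Backend) -> bool {
    backend.error_rate > 0.1 || backend.latency_p99_secs > 5.0 || backend.health_score < 0.5
}

/// Picks the eligible backend with the lowest priority number, if any.
fn pick_backend(backends: &[Backend]) -> Option<&Backend> {
    backends
        .iter()
        .filter(|b| !should_fail_over(b))
        .min_by_key(|b| b.priority)
}
```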

Custom Error Handling¶

Error Middleware¶

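// Custom error handler implementation.
// The imports assume axum, chrono, and tracing; Error, ErrorResponse,
// ErrorDetails, and request_id() are defined elsewhere in the router.
use axum::{
    http::StatusCode,
    response::{IntoResponse, Response},
    Json,
};
use chrono::Utc;
use tracing::error;

pub async fn error_handler(error: Error) -> Response {
    // Every arm yields (StatusCode, &str, String) so the match type-checks.
    let (status, error_type, message) = match error {
        Error::Validation(e) => (
            StatusCode::BAD_REQUEST,
            "validation_error",
            e.to_string()
        ),
        Error::NotFound(e) => (
            StatusCode::NOT_FOUND,
            "not_found",
            e.to_string()
        ),
        // Backend details are not forwarded to the client here.
        Error::Backend(_) => (
            StatusCode::BAD_GATEWAY,
            "backend_error",
            "Backend service error".to_string()
        ),
        Error::Internal(e) => {
            // Log the detail server-side; return only a generic message.
            error!("Internal error: {:?}", e);
            (
                StatusCode::INTERNAL_SERVER_ERROR,
                "internal_error",
                "An internal error occurred".to_string()
            )
        }
    };

    let body = Json(ErrorResponse {
        error: ErrorDetails {
            code: status.as_u16(),
            error_type: error_type.to_string(),
            message,
            request_id: request_id(),
            timestamp: Utc::now(),
        }
    });

    // Pair the body with the status; Json alone would respond with 200 OK.
    (status, body).into_response()
}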

Error Transformation¶

error_transformation:
  # Map backend errors to client-friendly messages
  mappings:
    - backend_error: "CUDA out of memory"
      client_error: "Model temporarily unavailable, please retry"
      status: 503

    - backend_error: "Model not loaded"
      client_error: "Model initialization in progress"
      status: 503
      retry_after: 30

  # Hide sensitive information
  sanitization:
    remove_stack_traces: true
    remove_internal_ips: true
    remove_credentials: true
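The mapping step can be sketched as a lookup over the configured entries. Matching `backend_error` as a substring of the backend's message is an assumption (the configuration does not say whether matching is exact, substring, or pattern based), and the `Mapping` type and `transform` function are hypothetical.

```rust
/// One entry of `error_transformation.mappings` (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Mapping {
    backend_error: &'static str,
    client_error: &'static str,
    status: u16,
    retry_after: Option<u64>,
}

/// The two mappings from the configuration above.
static MAPPINGS: [Mapping; 2] = [
    Mapping {
        backend_error: "CUDA out of memory",
        client_error: "Model temporarily unavailable, please retry",
        status: 503,
        retry_after: None,
    },
    Mapping {
        backend_error: "Model not loaded",
        client_error: "Model initialization in progress",
        status: 503,
        retry_after: Some(30),
    },
];

/// Returns the first mapping whose `backend_error` occurs in the backend message.
fn transform(backend_message: &str) -> Option<&'static Mapping> {
    MAPPINGS
        .iter()
        .find(|mapping| backend_message.contains(mapping.backend_error))
}
```

Messages with no matching entry return `None`, leaving the caller free to apply its default error handling.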

Error Hooks¶

error_hooks:
  # Pre-error hooks
  pre_error:
    - log_error
    - capture_metrics
    - notify_monitoring

  # Post-error hooks
  post_error:
    - cleanup_resources
    - update_circuit_breaker
    - trigger_failover

  # Custom handlers
  handlers:
    - type: webhook
      url: https://alerts.example.com/errors
      events: [critical_error, repeated_error]

    - type: email
      to: oncall@example.com
      events: [service_down]

Debugging Errors¶

Error Logging¶

logging:
  errors:
    level: debug  # Log all errors in detail
    include_request_body: true
    include_response_body: true
    include_headers: true

  # Structured error logging
  format:
    type: json
    fields:
      - timestamp
      - level
      - error_type
      - error_message
      - request_id
      - backend_id
      - model
      - latency
      - stack_trace

Debug Endpoints¶

# Get recent errors
curl http://localhost:8080/admin/errors/recent

# Get error statistics
curl http://localhost:8080/admin/errors/stats

# Get specific error details
curl http://localhost:8080/admin/errors/req_12345

# Trigger error for testing
curl -X POST http://localhost:8080/admin/debug/error \
  -H "Content-Type: application/json" \
  -d '{"type": "backend_timeout", "backend": "primary"}'

Error Tracing¶

tracing:
  errors:
    capture_stack_trace: true
    capture_variables: true
    capture_context: true

  # Distributed tracing
  propagation:
    - tracecontext
    - baggage

  # Error sampling
  sampling:
    all_errors: true
    error_rate_threshold: 0.01

Common Error Scenarios¶

Scenario 1: All Backends Down¶

# Detection
condition: all_backends.health_status == unhealthy

# Response
response:
  status: 503
  message: "Service temporarily unavailable"
  retry_after: 30

# Recovery
recovery:
    - increase_health_check_frequency
    - notify_oncall
    - attempt_backend_restart
    - switch_to_backup_region

Scenario 2: Model Not Found¶

# Detection
condition: requested_model not in available_models

# Response
response:
  status: 404
  message: "Model '{model}' not found"
  suggestions: similar_models

# Mitigation
mitigation:
    - check_model_aliases
    - refresh_model_cache
    - try_alternative_backends

Scenario 3: Rate Limit Exceeded¶

# Detection
condition: request_count > rate_limit

# Response
response:
  status: 429
  retry_after: calculate_backoff()
  headers:
    X-RateLimit-Remaining: 0

# Handling
handling:
    - queue_request_if_premium
    - suggest_upgrade_plan
    - apply_exponential_backoff

Scenario 4: Timeout¶

# Detection
condition: request_duration > timeout

# Response
response:
  status: 504
  message: "Request timeout"

# Mitigation
mitigation:
    - try_with_reduced_max_tokens
    - switch_to_faster_backend
    - enable_streaming_if_possible

Scenario 5: Backend Mismatch¶

# Detection
condition: backend_response_format != expected_format

# Response
response:
  status: 502
  message: "Invalid backend response"

# Recovery
recovery:
    - log_response_for_debugging
    - mark_backend_as_degraded
    - retry_with_different_backend
    - update_backend_adapter

Scenario 6: File Resolution Failure¶

# Detection
condition: file_reference_not_found OR invalid_file_id

# Response
response:
  status: 404  # File not found; 400 is returned for an invalid file ID format
  message: "Failed to resolve file reference"
  type: "invalid_request_error"
  code: "file_resolution_failed"

# Details
details:
  file_id: "file-abc123"
  reason: "file not found"  # or "invalid file ID format"

# Mitigation
mitigation:
    - verify_file_was_uploaded
    - check_file_id_format_starts_with_file_prefix
    - ensure_file_not_deleted

Scenario 7: File Too Large¶

# Detection
condition: file_size > max_file_size

# Response
response:
  status: 413
  message: "File size exceeds maximum allowed"
  type: "invalid_request_error"

# Details
details:
  file_size: 600000000  # bytes
  max_size: 536870912   # bytes (512MB default)

# Mitigation
mitigation:
    - compress_file_before_upload
    - use_smaller_file
    - increase_max_file_size_in_config

Scenario 8: Too Many File References¶

# Detection
condition: file_references_count > 20

# Response
response:
  status: 400
  message: "Too many file references in request"
  type: "invalid_request_error"

# Details
details:
  count: 25
  max_allowed: 20

# Mitigation
mitigation:
    - split_request_into_multiple_calls
    - reduce_number_of_files
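The limits in Scenarios 7 and 8 can be combined into one pre-flight check. The sketch below uses the default values stated above (512 MiB per file, 20 references per request). The function name and the order of the checks are illustrative choices, not documented router behavior.

```rust
/// Default maximum file size from Scenario 7 (512 MiB).
const MAX_FILE_SIZE: u64 = 536_870_912;
/// Maximum number of file references per request from Scenario 8.
const MAX_FILE_REFERENCES: usize = 20;

/// Validates the sizes (in bytes) of all files referenced by one request.
/// On failure, returns the HTTP status and message from the scenarios above.
fn validate_files(file_sizes: &[u64]) -> Result<(), (u16, &'static str)> {
    if file_sizes.len() > MAX_FILE_REFERENCES {
        return Err((400, "Too many file references in request"));
    }
    if file_sizes.iter().any(|&size| size > MAX_FILE_SIZE) {
        return Err((413, "File size exceeds maximum allowed"));
    }
    Ok(())
}
```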

Error Monitoring¶

Metrics¶

# Error rate
rate(http_requests_total{status=~"5.."}[5m])

# Error rate by type
sum by (error_type) (rate(errors_total[5m]))

# Backend error rate
sum by (backend_id, error_type) (rate(backend_errors_total[5m]))

# Circuit breaker state
circuit_breaker_state{backend_id="primary"}

Alerts¶

alerts:
  - name: HighErrorRate
    condition: error_rate > 0.05
    duration: 5m
    severity: warning

  - name: AllBackendsDown
    condition: healthy_backends == 0
    duration: 1m
    severity: critical

  - name: CircuitBreakerOpen
    condition: circuit_breaker_state == "open"
    duration: 5m
    severity: warning

Best Practices¶

Error Design¶

Use appropriate HTTP status codes
Provide clear, actionable error messages
Include request IDs for tracing
Don't expose internal details to clients
Log errors with sufficient context

Error Handling¶

Implement retry logic with backoff
Use circuit breakers to prevent cascading failures
Set appropriate timeouts
Handle errors at the appropriate layer
Fail fast for non-recoverable errors

Error Recovery¶

Implement health checks for automatic recovery
Use fallback strategies for critical paths
Monitor error rates and patterns
Set up alerting for anomalies
Document error scenarios and recovery procedures
