Error Handling Guide

This guide covers error handling mechanisms, status codes, retry strategies, and troubleshooting for the Continuum Router.

Error Categories

Client Errors (4xx)

Errors caused by invalid client requests that should not be retried without modification.

| Category | Description | Examples |
| --- | --- | --- |
| Validation Errors | Invalid request format or parameters | Missing required fields, invalid JSON |
| Authentication Errors | Failed authentication or authorization | Invalid API key, expired token |
| Resource Errors | Requested resource not found | Model not available, endpoint not found |
| Rate Limit Errors | Too many requests | Quota exceeded, rate limit hit |

Server Errors (5xx)

Errors on the server side that may be transient and retryable.

| Category | Description | Examples |
| --- | --- | --- |
| Backend Errors | Backend service issues | Connection refused, backend timeout |
| Router Errors | Internal router issues | Configuration error, panic |
| Infrastructure Errors | Infrastructure failures | Database down, network partition |
| Capacity Errors | Resource exhaustion | Memory limit, connection pool full |

HTTP Status Codes

Success Codes (2xx)

| Status | Name | Description | When Used |
| --- | --- | --- | --- |
| 200 | OK | Request completed successfully | Normal response |
| 201 | Created | Resource created successfully | Resource creation |
| 202 | Accepted | Request accepted for processing | Async operations |
| 204 | No Content | Success with no response body | Deletions |

Client Error Codes (4xx)

| Status | Name | Description | When Used |
| --- | --- | --- | --- |
| 400 | Bad Request | Invalid request format | Malformed JSON, missing fields |
| 401 | Unauthorized | Authentication required | Missing or invalid auth |
| 403 | Forbidden | Access denied | Insufficient permissions |
| 404 | Not Found | Resource not found | Model/endpoint not available |
| 405 | Method Not Allowed | HTTP method not supported | Wrong HTTP verb |
| 408 | Request Timeout | Client request timeout | Slow client |
| 413 | Payload Too Large | Request body too large | Exceeds size limit |
| 422 | Unprocessable Entity | Validation failed | Business logic errors |
| 429 | Too Many Requests | Rate limit exceeded | Rate limiting |

Server Error Codes (5xx)

| Status | Name | Description | When Used |
| --- | --- | --- | --- |
| 500 | Internal Server Error | Unexpected server error | Unhandled exceptions |
| 501 | Not Implemented | Feature not implemented | Unsupported operations |
| 502 | Bad Gateway | Invalid backend response | Backend errors |
| 503 | Service Unavailable | Service temporarily down | All backends unhealthy |
| 504 | Gateway Timeout | Backend timeout | Backend too slow |
| 507 | Insufficient Storage | Storage full | Disk/memory full |

Error Response Format

Standard Error Response

{
  "error": {
    "code": 404,
    "type": "model_not_found",
    "message": "Model 'gpt-5' not found on any healthy backend",
    "details": {
      "requested_model": "gpt-5",
      "available_models": ["gpt-4", "gpt-3.5-turbo", "llama2"],
      "backends_checked": 3,
      "healthy_backends": 2
    },
    "request_id": "req_12345",
    "timestamp": "2024-01-15T10:30:45Z"
  }
}

Validation Error Response

{
  "error": {
    "code": 400,
    "type": "validation_error",
    "message": "Invalid request parameters",
    "details": {
      "validation_errors": [
        {
          "field": "messages",
          "error": "Required field missing"
        },
        {
          "field": "temperature",
          "error": "Must be between 0 and 2",
          "value": 3.5
        }
      ]
    },
    "request_id": "req_12346"
  }
}

Rate Limit Error Response

{
  "error": {
    "code": 429,
    "type": "rate_limit_exceeded",
    "message": "API rate limit exceeded",
    "details": {
      "limit": 100,
      "window": "1m",
      "retry_after": 45,
      "reset_at": "2024-01-15T10:31:30Z"
    },
    "headers": {
      "X-RateLimit-Limit": "100",
      "X-RateLimit-Remaining": "0",
      "X-RateLimit-Reset": "1705316490",
      "Retry-After": "45"
    }
  }
}
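
A client can honor the server-advised interval before retrying a 429. A minimal sketch, handling only the delta-seconds form of `Retry-After` shown above (`retry_after_delay` is a hypothetical helper name, not part of any client library):

```rust
use std::time::Duration;

/// Parse a Retry-After value given in delta-seconds (e.g. "45").
/// Returns None for the HTTP-date form or malformed input.
fn retry_after_delay(header: &str) -> Option<Duration> {
    header.trim().parse::<u64>().ok().map(Duration::from_secs)
}
```

A caller would sleep for the returned duration before reissuing the request.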

Backend Error Passthrough

When a backend returns a 4xx error, Continuum Router parses and forwards the original error details from the backend. This provides more actionable error information for debugging.

Supported Backend Formats:

- OpenAI API ({"error": {"message": "...", "type": "...", "param": "...", "code": "..."}})
- Anthropic Claude API ({"error": {"message": "...", "type": "..."}})
- Google Gemini API ({"error": {"message": "...", "status": "...", "code": ...}})

Example Response (with backend error passthrough):

{
  "error": {
    "code": 400,
    "type": "invalid_request_error",
    "message": "Invalid size '512x512'. Valid sizes for gpt-image-1 are: 1024x1024, 1536x1024, 1024x1536, auto",
    "param": "size"
  }
}

Behavior:

- When a backend returns a parseable error response, the original message, type, param, and code fields are preserved
- The param field is included when the backend provides it (useful for identifying which parameter caused the error)
- If the backend response cannot be parsed, a generic error message is returned
- All error responses remain OpenAI-compatible

Fallback Response (when backend error cannot be parsed):

{
  "error": {
    "code": 400,
    "type": "invalid_request_error",
    "message": "Client error: HTTP 400"
  }
}
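
The rule above amounts to a small decision: keep the backend's parsed fields when available, otherwise emit the generic fallback. A sketch with illustrative types (`BackendError` and `passthrough` are not the router's actual names):

```rust
/// Parsed fields from a backend 4xx body, when parsing succeeded.
struct BackendError {
    message: String,
    error_type: String,
}

/// Returns (type, message) for the client-facing error response.
fn passthrough(parsed: Option<BackendError>, status: u16) -> (String, String) {
    match parsed {
        // Preserve the backend's own type and message.
        Some(e) => (e.error_type, e.message),
        // Unparseable body: generic message, as in the fallback response below.
        None => (
            "invalid_request_error".to_string(),
            format!("Client error: HTTP {status}"),
        ),
    }
}
```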

Retry Strategies

Configuration

retry:
  # Basic settings
  max_attempts: 3
  initial_delay: "100ms"
  max_delay: "10s"
  backoff_multiplier: 2.0
  jitter: true

  # Retryable conditions
  retryable_status_codes:
    - 429  # Too Many Requests
    - 502  # Bad Gateway
    - 503  # Service Unavailable
    - 504  # Gateway Timeout

  retryable_errors:
    - ConnectionError
    - TimeoutError
    - TemporaryError

  # Per-endpoint configuration
  endpoints:
    "/v1/chat/completions":
      max_attempts: 5
      timeout: "60s"
    "/v1/completions":
      max_attempts: 3
      timeout: "30s"

Exponential Backoff

backoff:
  type: exponential
  base: 100ms
  multiplier: 2
  max: 10s
  jitter: 0.1  # ±10% randomization

# Delay calculation (jitter applied afterwards):
# delay = min(base * multiplier^(attempt - 1), max)
#
# Attempt 1: 100ms
# Attempt 2: 200ms
# Attempt 3: 400ms
# Attempt 4: 800ms
# ...
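
The delay formula can be sketched as follows. Jitter is omitted, and `backoff_delay` is a hypothetical helper, not part of the router's API:

```rust
use std::time::Duration;

/// Delay before the given retry attempt (1-based), capped at `max`.
/// Saturating arithmetic avoids overflow for large attempt counts.
fn backoff_delay(attempt: u32, base: Duration, multiplier: u32, max: Duration) -> Duration {
    let factor = multiplier.saturating_pow(attempt.saturating_sub(1));
    base.saturating_mul(factor).min(max)
}
```

With base 100ms, multiplier 2, and max 10s, attempts 1-4 yield 100ms, 200ms, 400ms, 800ms, matching the table above.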

Smart Retry Logic

smart_retry:
  # Retry with different backend
  try_different_backend: true

  # Reduce request on retry
  reduce_on_retry:
    max_tokens: 0.8  # Reduce by 20%
    temperature: 0.9  # Lower temperature

  # Skip retry for specific errors
  non_retryable:
    - AuthenticationError
    - ValidationError
    - PaymentRequired

Circuit Breaker

Architecture Details

For detailed implementation architecture including state machine diagrams, admin endpoints, and Prometheus metrics, see Circuit Breaker Architecture.

Configuration

circuit_breaker:
  enabled: true

  # Failure detection
  failure_threshold: 5        # Failures to open circuit
  success_threshold: 2        # Successes to close circuit

  # Timing
  timeout: "30s"              # Request timeout
  half_open_timeout: "15s"    # Half-open state duration
  reset_timeout: "60s"        # Time before retry

  # Monitoring window
  window_size: "60s"
  min_requests: 10            # Min requests for statistics

Circuit States

stateDiagram-v2
    [*] --> Closed
    Closed --> Open: Failure threshold reached
    Open --> HalfOpen: Reset timeout expired
    HalfOpen --> Closed: Success threshold reached
    HalfOpen --> Open: Failure detected
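
The diagram above can be modeled as a small state machine. This is an illustrative sketch (timers and the monitoring window omitted), not the router's implementation:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CircuitState {
    Closed { failures: u32 },
    Open,
    HalfOpen { successes: u32 },
}

struct CircuitBreaker {
    state: CircuitState,
    failure_threshold: u32,
    success_threshold: u32,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, success_threshold: u32) -> Self {
        Self {
            state: CircuitState::Closed { failures: 0 },
            failure_threshold,
            success_threshold,
        }
    }

    fn on_success(&mut self) {
        self.state = match self.state {
            // A success while closed resets the failure count.
            CircuitState::Closed { .. } => CircuitState::Closed { failures: 0 },
            // Enough successes in half-open close the circuit.
            CircuitState::HalfOpen { successes } if successes + 1 >= self.success_threshold => {
                CircuitState::Closed { failures: 0 }
            }
            CircuitState::HalfOpen { successes } => CircuitState::HalfOpen { successes: successes + 1 },
            CircuitState::Open => CircuitState::Open,
        };
    }

    fn on_failure(&mut self) {
        self.state = match self.state {
            // Failure threshold reached: open the circuit.
            CircuitState::Closed { failures } if failures + 1 >= self.failure_threshold => CircuitState::Open,
            CircuitState::Closed { failures } => CircuitState::Closed { failures: failures + 1 },
            // Any failure in half-open reopens the circuit.
            CircuitState::HalfOpen { .. } => CircuitState::Open,
            CircuitState::Open => CircuitState::Open,
        };
    }

    /// Called when reset_timeout expires while open (timer logic omitted).
    fn on_reset_timeout(&mut self) {
        if self.state == CircuitState::Open {
            self.state = CircuitState::HalfOpen { successes: 0 };
        }
    }
}
```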

Per-Backend Circuit Breaker

backends:
  - name: primary
    url: http://primary:8000
    circuit_breaker:
      failure_threshold: 3
      reset_timeout: "30s"

  - name: secondary
    url: http://secondary:8000
    circuit_breaker:
      failure_threshold: 5
      reset_timeout: "60s"

Error Recovery

Automatic Recovery

recovery:
  # Health check recovery
  health_checks:
    enabled: true
    interval: "30s"
    recovery_threshold: 2  # Consecutive successes

  # Connection pool recovery
  connection_pool:
    validation_interval: "60s"
    evict_invalid: true
    replace_invalid: true

  # Cache recovery
  cache:
    clear_on_error: false
    partial_invalidation: true

Model Fallback

The router supports automatic model fallback when primary models are unavailable. This integrates with the circuit breaker for comprehensive error recovery.

# Model fallback configuration
fallback:
  enabled: true

  fallback_chains:
    "gpt-4o":
      - "gpt-4-turbo"
      - "gpt-3.5-turbo"
    "claude-opus-4-5-20251101":
      - "claude-sonnet-4-5"
      - "claude-haiku-4-5"
    # Cross-provider fallback
    "gemini-2.5-pro":
      - "gemini-2.5-flash"
      - "gpt-4o"

  fallback_policy:
    trigger_conditions:
      error_codes: [429, 500, 502, 503, 504]
      timeout: true
      connection_error: true
      circuit_breaker_open: true
    max_fallback_attempts: 3
    fallback_timeout_multiplier: 1.5
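
Walking a fallback chain amounts to trying the requested model, then each configured fallback in order, up to the attempt limit. An illustrative sketch under assumed names (`resolve_with_fallback` and the closure-based backend call are not the router's actual API):

```rust
use std::collections::HashMap;

/// Try the requested model, then each configured fallback, allowing up to
/// `max_fallback_attempts` fallbacks. Returns the first successful response.
fn resolve_with_fallback(
    model: &str,
    chains: &HashMap<String, Vec<String>>,
    max_fallback_attempts: usize,
    mut try_model: impl FnMut(&str) -> Result<String, String>,
) -> Result<String, String> {
    let mut candidates: Vec<&str> = vec![model];
    if let Some(chain) = chains.get(model) {
        candidates.extend(chain.iter().map(|s| s.as_str()));
    }
    let mut last_err = format!("no backend available for '{model}'");
    for candidate in candidates.into_iter().take(1 + max_fallback_attempts) {
        match try_model(candidate) {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = format!("{candidate}: {e}"),
        }
    }
    Err(last_err)
}
```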

Fallback Trigger Conditions

| Condition | HTTP Status | Description |
| --- | --- | --- |
| Rate Limit | 429 | Backend rate limit exceeded |
| Server Error | 500 | Internal backend error |
| Bad Gateway | 502 | Invalid response from backend |
| Service Unavailable | 503 | Backend temporarily unavailable |
| Gateway Timeout | 504 | Backend request timeout |
| Circuit Open | N/A | Circuit breaker is open |

Fallback Response Headers

When fallback occurs, these headers are added:

X-Fallback-Used: true
X-Original-Model: gpt-4o
X-Fallback-Model: gpt-4-turbo
X-Fallback-Reason: error_code_429
X-Fallback-Attempts: 2

Fallback Error Response

When all fallbacks are exhausted:

{
  "error": {
    "code": 503,
    "type": "all_fallbacks_exhausted",
    "message": "All fallback models failed for 'gpt-4o'",
    "details": {
      "original_model": "gpt-4o",
      "attempted_fallbacks": ["gpt-4-turbo", "gpt-3.5-turbo"],
      "failure_reasons": [
        {"model": "gpt-4-turbo", "reason": "error_code_503"},
        {"model": "gpt-3.5-turbo", "reason": "timeout"}
      ]
    },
    "request_id": "req_12345"
  }
}

Graceful Degradation

degradation:
  # Fallback models (legacy - use fallback.fallback_chains instead)
  model_fallbacks:
    "gpt-4": ["gpt-3.5-turbo", "gpt-3"]
    "claude-opus": ["claude-sonnet", "claude-haiku"]

  # Feature degradation
  features:
    streaming:
      fallback_to_non_streaming: true
    functions:
      disable_on_error: true

  # Response degradation
  response:
    reduce_max_tokens: true
    lower_temperature: true
    simplify_prompts: true

Failover Strategy

failover:
  strategy: priority  # or round-robin, least-failures

  backends:
    - name: primary
      priority: 1
      weight: 100

    - name: secondary
      priority: 2
      weight: 50

    - name: tertiary
      priority: 3
      weight: 10

  conditions:
    - error_rate > 0.1
    - latency_p99 > 5s
    - health_score < 0.5

Custom Error Handling

Error Middleware

// Custom error handler implementation
pub async fn error_handler(error: Error) -> Response {
    let (status, error_type, message) = match error {
        Error::Validation(e) => (
            StatusCode::BAD_REQUEST,
            "validation_error",
            e.to_string(),
        ),
        Error::NotFound(e) => (
            StatusCode::NOT_FOUND,
            "not_found",
            e.to_string(),
        ),
        Error::Backend(e) => {
            // Log the backend detail; return a generic message to the client.
            warn!("Backend error: {:?}", e);
            (
                StatusCode::BAD_GATEWAY,
                "backend_error",
                "Backend service error".to_string(),
            )
        }
        Error::Internal(e) => {
            error!("Internal error: {:?}", e);
            (
                StatusCode::INTERNAL_SERVER_ERROR,
                "internal_error",
                "An internal error occurred".to_string(),
            )
        }
    };

    Json(ErrorResponse {
        error: ErrorDetails {
            code: status.as_u16(),
            error_type: error_type.to_string(),
            message,
            request_id: request_id(),
            timestamp: Utc::now(),
        },
    })
    .into_response()
}

Error Transformation

error_transformation:
  # Map backend errors to client-friendly messages
  mappings:
    - backend_error: "CUDA out of memory"
      client_error: "Model temporarily unavailable, please retry"
      status: 503

    - backend_error: "Model not loaded"
      client_error: "Model initialization in progress"
      status: 503
      retry_after: 30

  # Hide sensitive information
  sanitization:
    remove_stack_traces: true
    remove_internal_ips: true
    remove_credentials: true
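
The mapping table above can be sketched as a substring lookup. In practice the router applies these rules from configuration; the hard-coded matches and the `transform_backend_error` name here are purely illustrative:

```rust
/// Map a raw backend error message to a (client_error, status) pair,
/// or None when no mapping applies and the error passes through as-is.
fn transform_backend_error(backend_msg: &str) -> Option<(&'static str, u16)> {
    if backend_msg.contains("CUDA out of memory") {
        Some(("Model temporarily unavailable, please retry", 503))
    } else if backend_msg.contains("Model not loaded") {
        Some(("Model initialization in progress", 503))
    } else {
        None
    }
}
```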

Error Hooks

error_hooks:
  # Pre-error hooks
  pre_error:
    - log_error
    - capture_metrics
    - notify_monitoring

  # Post-error hooks
  post_error:
    - cleanup_resources
    - update_circuit_breaker
    - trigger_failover

  # Custom handlers
  handlers:
    - type: webhook
      url: https://alerts.example.com/errors
      events: [critical_error, repeated_error]

    - type: email
      to: oncall@example.com
      events: [service_down]

Debugging Errors

Error Logging

logging:
  errors:
    level: debug  # Log all errors in detail
    include_request_body: true
    include_response_body: true
    include_headers: true

  # Structured error logging
  format:
    type: json
    fields:
      - timestamp
      - level
      - error_type
      - error_message
      - request_id
      - backend_id
      - model
      - latency
      - stack_trace

Debug Endpoints

# Get recent errors
curl http://localhost:8080/admin/errors/recent

# Get error statistics
curl http://localhost:8080/admin/errors/stats

# Get specific error details
curl http://localhost:8080/admin/errors/req_12345

# Trigger error for testing
curl -X POST http://localhost:8080/admin/debug/error \
  -d '{"type": "backend_timeout", "backend": "primary"}'

Error Tracing

tracing:
  errors:
    capture_stack_trace: true
    capture_variables: true
    capture_context: true

  # Distributed tracing
  propagation:
    - tracecontext
    - baggage

  # Error sampling
  sampling:
    all_errors: true
    error_rate_threshold: 0.01

Common Error Scenarios

Scenario 1: All Backends Down

# Detection
condition: all_backends.health_status == unhealthy

# Response
response:
  status: 503
  message: "Service temporarily unavailable"
  retry_after: 30

# Recovery
recovery:
    - increase_health_check_frequency
    - notify_oncall
    - attempt_backend_restart
    - switch_to_backup_region

Scenario 2: Model Not Found

# Detection
condition: requested_model not in available_models

# Response
response:
  status: 404
  message: "Model '{model}' not found"
  suggestions: similar_models

# Mitigation
mitigation:
    - check_model_aliases
    - refresh_model_cache
    - try_alternative_backends

Scenario 3: Rate Limit Exceeded

# Detection
condition: request_count > rate_limit

# Response
response:
  status: 429
  retry_after: calculate_backoff()
  headers:
    X-RateLimit-Remaining: 0

# Handling
handling:
    - queue_request_if_premium
    - suggest_upgrade_plan
    - apply_exponential_backoff

Scenario 4: Timeout

# Detection
condition: request_duration > timeout

# Response
response:
  status: 504
  message: "Request timeout"

# Mitigation
mitigation:
    - try_with_reduced_max_tokens
    - switch_to_faster_backend
    - enable_streaming_if_possible

Scenario 5: Backend Mismatch

# Detection
condition: backend_response_format != expected_format

# Response
response:
  status: 502
  message: "Invalid backend response"

# Recovery
recovery:
    - log_response_for_debugging
    - mark_backend_as_degraded
    - retry_with_different_backend
    - update_backend_adapter

Scenario 6: File Resolution Failure

# Detection
condition: file_reference_not_found OR invalid_file_id

# Response
response:
  status: 404  # For file not found
  # or
  status: 400  # For invalid file ID format
  message: "Failed to resolve file reference"
  type: "invalid_request_error"
  code: "file_resolution_failed"

# Details
details:
    - file_id: "file-abc123"
    - reason: "file not found" OR "invalid file ID format"

# Mitigation
mitigation:
    - verify_file_was_uploaded
    - check_file_id_format_starts_with_file_prefix
    - ensure_file_not_deleted

Scenario 7: File Too Large

# Detection
condition: file_size > max_file_size

# Response
response:
  status: 413
  message: "File size exceeds maximum allowed"
  type: "invalid_request_error"

# Details
details:
  file_size: 600000000  # bytes
  max_size: 536870912   # bytes (512MB default)

# Mitigation
mitigation:
    - compress_file_before_upload
    - use_smaller_file
    - increase_max_file_size_in_config

Scenario 8: Too Many File References

# Detection
condition: file_references_count > 20

# Response
response:
  status: 400
  message: "Too many file references in request"
  type: "invalid_request_error"

# Details
details:
  count: 25
  max_allowed: 20

# Mitigation
mitigation:
    - split_request_into_multiple_calls
    - reduce_number_of_files

Error Monitoring

Metrics

# Error rate
rate(http_requests_total{status=~"5.."}[5m])

# Error rate by type
sum by (error_type) (rate(errors_total[5m]))

# Backend error rate
sum by (backend_id, error_type) (rate(backend_errors_total[5m]))

# Circuit breaker state
circuit_breaker_state{backend_id="primary"}

Alerts

alerts:
  - name: HighErrorRate
    condition: error_rate > 0.05
    duration: 5m
    severity: warning

  - name: AllBackendsDown
    condition: healthy_backends == 0
    duration: 1m
    severity: critical

  - name: CircuitBreakerOpen
    condition: circuit_breaker_state == "open"
    duration: 5m
    severity: warning

Best Practices

Error Design

  1. Use appropriate HTTP status codes
  2. Provide clear, actionable error messages
  3. Include request IDs for tracing
  4. Don't expose internal details to clients
  5. Log errors with sufficient context

Error Handling

  1. Implement retry logic with backoff
  2. Use circuit breakers to prevent cascading failures
  3. Set appropriate timeouts
  4. Handle errors at the appropriate layer
  5. Fail fast for non-recoverable errors

Error Recovery

  1. Implement health checks for automatic recovery
  2. Use fallback strategies for critical paths
  3. Monitor error rates and patterns
  4. Set up alerting for anomalies
  5. Document error scenarios and recovery procedures

See Also