Architecture Guide¶
This document provides a comprehensive overview of Continuum Router's architecture, design decisions, and extension points.
Table of Contents¶
- Overview
- 4-Layer Architecture
- Core Components
- Data Flow
- Dependency Injection
- Error Handling Strategy
- Extension Points
- Design Decisions
- Performance Considerations
- Rate Limiting → Configuration
- Model Fallback System → Error Handling
- Circuit Breaker → Error Handling
- File Storage → Architecture Details
Overview¶
Continuum Router is designed as a high-performance, production-ready LLM API router using a clean 4-layer architecture that provides clear separation of concerns, testability, and maintainability. The architecture follows Domain-Driven Design principles and dependency inversion to create a robust, extensible system.
Architecture Goals¶
- Separation of Concerns: Each layer has a single, well-defined responsibility
- Dependency Inversion: Higher layers depend on abstractions, not concrete implementations
- Testability: Each component can be unit tested in isolation
- Extensibility: New features can be added without modifying existing code
- Performance: Minimal overhead while maintaining clean architecture
- Reliability: Fail-fast design with comprehensive error handling
4-Layer Architecture¶
┌─────────────────────────────────────────────────────────────────┐
│ HTTP Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Routes │ │ Middleware │ │ Handlers │ │
│ │ │ │ │ │ │ │
│ │ • /v1/models │ │ • Logging │ │ • Streaming │ │
│ │ • /v1/chat/* │ │ • Metrics │ │ • Responses API │ │
│ │ • /v1/responses │ │ • Rate Limit │ │ • DTOs │ │
│ │ • /admin/* │ │ • Auth │ │ • Error │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Services Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Backend Service │ │ Model Service │ │ Proxy Service │ │
│ │ │ │ │ │ │ │
│ │ • Pool Mgmt │ │ • Aggregation │ │ • Routing │ │
│ │ • Load Balance │ │ • Caching │ │ • Streaming │ │
│ │ • Health Check │ │ • Discovery │ │ • Retry Logic │ │
│ │ │ │ • Metadata │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Health Service │ │ Service Registry│ │ Deduplication │ │
│ │ │ │ │ │ │ │
│ │ • Monitoring │ │ • Lifecycle │ │ • Cache │ │
│ │ • Status Track │ │ • Dependencies │ │ • Request Hash │ │
│ │ • Recovery │ │ • Container │ │ • TTL Mgmt │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ ┌─────────────────┐ │
│ │ File Service │ See: architecture/file-storage.md │
│ │ │ │
│ │ • Upload/Delete │ │
│ │ • Metadata Mgmt │ │
│ │ • Persistence │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Backends │ │ Cache │ │ Common │ │
│ │ │ │ │ │ │ │
│ │ • OpenAI │ │ • LRU Cache │ │ • HTTP Client │ │
│ │ • Anthropic │ │ • TTL Cache │ │ • Executor │ │
│ │ • Gemini │ │ • Retry Cache │ │ • Statistics │ │
│ │ • vLLM │ │ │ │ • URL Validator │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Configuration │ │ Backend Pool │ │ Backend Factory │ │
│ │ │ │ │ │ │ │
│ │ • File Watcher │ │ • Pool Mgmt │ │ • Create Backend │ │
│ │ • Env Override │ │ • Connection │ │ • Type Detection │ │
│ │ • Validation │ │ • Pre-warming │ │ • Config Parsing │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Core Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Models │ │ Traits │ │ Errors │ │
│ │ │ │ │ │ │ │
│ │ • Backend │ │ • BackendTrait │ │ • CoreError │ │
│ │ • Model │ │ • ServiceTrait │ │ • RouterError │ │
│ │ • Request │ │ • CacheTrait │ │ • ErrorSeverity │ │
│ │ • Response │ │ • HealthTrait │ │ • ErrorDetail │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Retry Logic │ │ Configuration │ │ Container │ │
│ │ │ │ │ │ │ │
│ │ • Policies │ │ • Models │ │ • DI Container │ │
│ │ • Strategies │ │ • Validation │ │ • Service Mgmt │ │
│ │ • Backoff │ │ • Defaults │ │ • Lifecycle │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ ┌─────────────────┐ │
│ │ Circuit Breaker │ │
│ │ │ │
│ │ • State Machine │ │
│ │ • Failure Track │ │
│ │ • Auto Recovery │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Layer Descriptions¶
1. HTTP Layer (src/http/)¶
Responsibility: Handle HTTP requests, responses, and web-specific concerns
Components¶
- Routes (routes.rs): Define HTTP endpoints and route handling
- Middleware (middleware/): Cross-cutting concerns (auth, logging, metrics, rate limiting)
- DTOs (dto/): Data Transfer Objects for HTTP serialization/deserialization
- Streaming (streaming/): Server-Sent Events (SSE) handling
Key Files¶
src/http/
├── mod.rs # HTTP layer exports
├── routes.rs # Route definitions and handlers
├── dto.rs # Request/Response DTOs
├── handlers/ # Request handlers
│ ├── mod.rs
│ └── responses.rs # Responses API handlers
├── middleware/ # HTTP middleware components
│ ├── mod.rs
│ ├── auth.rs # API key authentication middleware
│ ├── admin_auth.rs # Admin API authentication middleware
│ ├── files_auth.rs # Files API authentication middleware
│ ├── admin_audit.rs # Admin operations audit logging
│ ├── logging.rs # Request/response logging
│ ├── metrics.rs # Metrics collection
│ ├── metrics_auth.rs # Metrics endpoint authentication
│ ├── model_extractor.rs # Model extraction from requests
│ ├── prometheus.rs # Prometheus metrics integration
│ ├── rate_limit.rs # Rate limiting middleware (legacy)
│ └── rate_limit_v2/ # Enhanced rate limiting (modular)
│ ├── mod.rs # Module exports
│ ├── middleware.rs # Rate limiting middleware
│ ├── store.rs # Rate limit storage and tracking
│ └── token_bucket.rs # Token bucket algorithm
└── streaming/ # SSE streaming handlers
├── mod.rs
└── handler.rs # Streaming response handling
Middleware Components¶
The HTTP layer includes several middleware components that provide cross-cutting concerns:
- auth.rs: API key authentication for main endpoints (/v1/chat/completions, /v1/models, etc.)
  - Validates API keys from the Authorization: Bearer <key> header
  - Supports multiple API keys configured in config.yaml
  - Returns 401 Unauthorized for invalid/missing keys
- admin_auth.rs: Separate authentication for admin endpoints (/admin/*)
  - Uses dedicated admin API keys distinct from user API keys
  - Protects sensitive operations (config reload, circuit breaker control, health management)
  - Configurable via admin.api_keys in configuration
- files_auth.rs: Authentication middleware for the Files API (/v1/files/*)
  - Validates API keys specifically for file upload/download/deletion operations
  - Prevents unauthorized file access and manipulation
  - Integrates with the file storage service for permission checks
- admin_audit.rs: Audit logging middleware for admin operations
  - Records all admin API calls with timestamps and caller identification
  - Logs parameters and outcomes of sensitive operations
  - Provides an audit trail for compliance and security monitoring
  - Configurable log levels and retention policies
- rate_limit_v2/: Enhanced rate limiting system (see Rate Limiting section)
  - Token bucket algorithm with per-client tracking
  - Separate limits for sustained rate and burst protection
  - Automatic cleanup of expired client entries
  - Detailed metrics for monitoring
2. Services Layer (src/services/)¶
Responsibility: Orchestrate business logic and coordinate between infrastructure components
Components¶
- Backend Service (backend_service.rs): Manage backend pool, load balancing, health checks
- Model Service (model_service.rs): Aggregate models from backends, handle caching, enrich with metadata
- Proxy Service (proxy_service.rs): Route requests, handle retries, manage streaming
- Health Service (health_service.rs): Monitor service health, track status
- Service Registry (mod.rs): Manage service lifecycle and dependencies
Key Files¶
src/services/
├── mod.rs # Service registry and management
├── backend_service.rs # Backend management service
├── model_service.rs # Model aggregation service
├── proxy_service.rs # Request proxying and routing
├── health_service.rs # Health monitoring service
├── deduplication.rs # Request deduplication service
├── responses/ # Responses API support
│ ├── mod.rs
│ ├── converter.rs # Response format conversion
│ ├── session.rs # Session management
│ └── streaming.rs # Streaming response handling
└── streaming/ # Streaming utilities
├── mod.rs
├── parser.rs # Stream parsing logic
└── transformer.rs # Stream transformation (OpenAI/Anthropic)
3. Infrastructure Layer (src/infrastructure/)¶
Responsibility: Provide concrete implementations of external systems and technical capabilities
Components¶
- Backends (backends/): Specific backend implementations (OpenAI, Anthropic, Gemini, vLLM)
- Cache (cache/): Caching implementations (LRU, TTL-based)
- Configuration (config/): Configuration loading, watching, validation
- HTTP Client (http_client.rs): HTTP client management and optimization
Key Files¶
src/infrastructure/
├── mod.rs # Infrastructure exports and utilities
├── backends/ # Backend implementations
│ ├── mod.rs
│ ├── anthropic/ # Native Anthropic Claude backend
│ │ ├── mod.rs # Backend implementation & request transformation
│ │ └── stream.rs # SSE stream transformer (Anthropic → OpenAI)
│ ├── gemini/ # Native Google Gemini backend
│ │ ├── mod.rs # Backend implementation with TTFB optimization
│ │ └── stream.rs # SSE stream transformer (Gemini → OpenAI)
│ ├── openai/ # OpenAI-compatible backend
│ │ ├── mod.rs
│ │ ├── backend.rs # OpenAI backend implementation
│ │ └── models/ # OpenAI-specific model definitions
│ ├── factory/ # Backend factory pattern
│ │ ├── mod.rs
│ │ └── backend_factory.rs # Creates backends from config
│ ├── pool/ # Backend pooling and management
│ │ ├── mod.rs
│ │ ├── backend_pool.rs # Connection pool management
│ │ └── backend_manager.rs # Backend lifecycle management
│ ├── generic/ # Generic backend implementations
│ │ └── mod.rs
│ └── vllm.rs # vLLM backend implementation
├── common/ # Shared infrastructure utilities
│ ├── mod.rs
│ ├── executor.rs # Request execution with retry/metrics
│ ├── headers.rs # HTTP header utilities
│ ├── http_client.rs # HTTP client factory with pooling
│ ├── statistics.rs # Backend statistics collection
│ └── url_validator.rs # URL validation and security
├── cache/ # Caching implementations
│ ├── mod.rs
│ ├── lru_cache.rs # LRU cache implementation
│ └── retry_cache.rs # Retry-aware cache
├── config/ # Configuration management
│ ├── mod.rs
│ ├── loader.rs # Configuration loading
│ ├── validator.rs # Configuration validation
│ ├── timeout_validator.rs # Timeout configuration validation
│ ├── watcher.rs # File watching for hot-reload
│ ├── migrator.rs # Configuration migration orchestrator
│ ├── migration.rs # Migration types and traits
│ ├── migrations.rs # Specific migration implementations
│ ├── fixer.rs # Auto-correction logic
│ ├── backup.rs # Backup management
│ └── secrets.rs # Secret/API key management
└── lock_optimization.rs # Lock and concurrency optimization
4. Core Layer (src/core/)¶
Responsibility: Define domain models, business rules, and fundamental abstractions
Components¶
- Models (models/): Core domain entities (Backend, Model, Request, Response)
- Traits (traits.rs): Core interfaces and contracts
- Errors (errors.rs): Domain-specific error types and handling
- Retry (retry/): Retry policies and strategies
- Container (container.rs): Dependency injection container
Key Files¶
src/core/
├── mod.rs # Core exports and utilities
├── models/ # Domain models
│ ├── mod.rs
│ ├── backend.rs # Backend domain model
│ ├── model.rs # LLM model representation
│ ├── request.rs # Request models
│ └── responses.rs # Response models (Responses API)
├── traits.rs # Core traits and interfaces
├── errors.rs # Error types and handling
├── container.rs # Dependency injection container
├── async_utils.rs # Async utility functions
├── duration_utils.rs # Duration parsing utilities
├── streaming/ # Streaming models
│ ├── mod.rs
│ └── models.rs # Streaming-specific models
├── retry/ # Retry mechanisms
│ ├── mod.rs
│ ├── policy.rs # Retry policies
│ └── strategy.rs # Retry strategies
├── circuit_breaker/ # Circuit breaker pattern
│ ├── mod.rs # Module exports
│ ├── config.rs # Configuration models
│ ├── state.rs # State machine and breaker logic
│ ├── error.rs # Circuit breaker errors
│ ├── metrics.rs # Prometheus metrics
│ └── tests.rs # Unit tests
├── files/ # File processing utilities
│ ├── mod.rs # Module exports
│ ├── resolver.rs # File reference resolution in chat requests
│ ├── transformer.rs # Message transformation with file content
│ └── transformer_utils.rs # Transformation utility functions
└── config/ # Configuration models
├── mod.rs
├── models/ # Configuration data models (modular structure)
│ ├── mod.rs # Re-exports for backward compatibility
│ ├── config.rs # Main Config struct, ServerConfig, BackendConfig
│ ├── backend_type.rs # BackendType enum definitions
│ ├── model_metadata.rs # ModelMetadata, PricingInfo, CapabilityInfo
│ ├── global_prompts.rs # GlobalPrompts configuration
│ ├── samples.rs # Sample generation configurations
│ ├── validation.rs # Configuration validation logic
│ └── error.rs # Configuration-specific errors
├── timeout_models.rs # Timeout configuration models
├── cached_timeout.rs # Cached timeout resolution
├── optimized_retry.rs # Optimized retry configuration
├── metrics.rs # Metrics configuration
└── rate_limit.rs # Rate limit configuration
Core Components¶
Backend Pool¶
Location: src/backend.rs (legacy) → src/services/backend_service.rs
Purpose: Manages multiple LLM backends with intelligent load balancing
pub struct BackendPool {
backends: Arc<RwLock<Vec<Backend>>>,
load_balancer: LoadBalancingStrategy,
health_checker: Option<Arc<HealthChecker>>,
}
impl BackendPool {
// Round-robin load balancing with health awareness
pub async fn select_backend(&self) -> Option<Backend> { /* ... */ }
// Filter backends by model availability
pub async fn backends_for_model(&self, model: &str) -> Vec<Backend> { /* ... */ }
}
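A minimal sketch of the health-aware round-robin behind select_backend, reusing the Backend struct shown later in the Memory Layout section; the standalone function and atomic cursor are illustrative, not the actual API:
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch: pick the next healthy backend round-robin. The cursor is a
// shared atomic counter; unhealthy backends are skipped for one full cycle.
pub fn select_round_robin<'a>(backends: &'a [Backend], cursor: &AtomicUsize) -> Option<&'a Backend> {
    if backends.is_empty() {
        return None;
    }
    let start = cursor.fetch_add(1, Ordering::Relaxed);
    // Scan at most one full cycle, skipping unhealthy backends.
    (0..backends.len())
        .map(|i| &backends[(start + i) % backends.len()])
        .find(|b| b.is_healthy.load(Ordering::Relaxed))
}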
Health Checker¶
Location: src/health.rs → src/services/health_service.rs
Purpose: Monitor backend health with configurable thresholds and automatic recovery
pub struct HealthChecker {
backends: Arc<RwLock<Vec<Backend>>>,
config: HealthConfig,
status_map: Arc<RwLock<HashMap<String, HealthStatus>>>,
}
pub struct HealthConfig {
pub interval: Duration,
pub timeout: Duration,
pub unhealthy_threshold: u32, // Failures before marking unhealthy
pub healthy_threshold: u32, // Successes before marking healthy
}
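For illustration, a HealthConfig tuned for fairly aggressive failure detection might look like this (the values are examples, not the shipped defaults):
use std::time::Duration;

// Illustrative values only.
let health = HealthConfig {
    interval: Duration::from_secs(30),  // probe each backend every 30s
    timeout: Duration::from_secs(5),    // per-probe timeout
    unhealthy_threshold: 3,             // 3 consecutive failures -> unhealthy
    healthy_threshold: 2,               // 2 consecutive successes -> healthy
};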
Model Aggregation Service¶
Location: src/models/ (modular structure)
Purpose: Aggregate and cache model information from all backends, enrich with metadata
Module Structure (refactored from single models.rs file):
src/models/
├── mod.rs # Re-exports for backward compatibility
├── types.rs # Model, AggregatedModel, ModelList types
├── metrics.rs # ModelMetrics tracking
├── cache.rs # ModelCache implementation
├── config.rs # ModelAggregationConfig
├── fetcher.rs # Model fetching from backends
├── handlers.rs # HTTP handlers for /v1/models endpoint
├── utils.rs # Utility functions (normalize_model_id, etc.)
└── aggregation/ # Core aggregation logic
├── mod.rs # ModelAggregationService implementation
└── tests.rs # Unit tests
pub struct ModelAggregationService {
cache: Arc<RwLock<ModelCache>>,
config: ModelAggregationConfig,
backends: Arc<BackendPool>,
}
impl ModelAggregationService {
// Aggregate models from all healthy backends
pub async fn get_aggregated_models(&self) -> Result<ModelList, Error> { /* ... */ }
// Enrich models with metadata from config
pub fn merge_config_metadata(&self, models: &mut Vec<Model>) { /* ... */ }
// Cache with TTL and deduplication
pub async fn refresh_models(&self) -> Result<(), Error> { /* ... */ }
}
Proxy Module¶
Location: src/proxy/ (modular structure)
Purpose: Handle request proxying, backend selection, file resolution, and image generation/editing
Module Structure (refactored from single proxy.rs file):
src/proxy/
├── mod.rs # Re-exports for backward compatibility
├── backend.rs # Backend selection and routing logic
├── request.rs # Request execution with retry logic
├── files.rs # File reference resolution in requests
├── image_gen.rs # Image generation handling (DALL-E, Gemini, GPT Image)
├── image_edit.rs # Image editing support (/v1/images/edits)
├── image_utils.rs # Image processing utilities (multipart, validation)
├── handlers.rs # HTTP handlers for proxy endpoints
├── utils.rs # Utility functions (error responses, etc.)
└── tests.rs # Unit tests
Key Responsibilities¶
- Backend Selection: Intelligent routing to available backends
- File Resolution: Resolve file references in chat requests
- Image Generation: Support for OpenAI (DALL-E, GPT Image) and Gemini (Nano Banana) image models
- Image Editing: Image editing and variations endpoints
- Request Retry: Automatic retry with exponential backoff
- Error Handling: Standardized error responses in OpenAI format
Retry Handler¶
Location: src/services/deduplication.rs
Purpose: Implement exponential backoff with jitter and request deduplication
pub struct EnhancedRetryHandler {
config: RetryConfig,
dedup_cache: Arc<Mutex<HashMap<String, CachedResponse>>>,
dedup_ttl: Duration,
}
pub struct RetryConfig {
pub max_attempts: u32,
pub base_delay: Duration,
pub max_delay: Duration,
pub exponential_backoff: bool,
pub jitter: bool,
}
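A rough sketch of how a per-attempt delay could be derived from this config (illustrative only; the real policy lives in src/core/retry/):
use rand::Rng;
use std::time::Duration;

// Sketch: exponential growth capped at max_delay, with optional full jitter.
fn backoff_delay(config: &RetryConfig, attempt: u32) -> Duration {
    let mut delay = if config.exponential_backoff {
        config.base_delay.saturating_mul(2u32.saturating_pow(attempt))
    } else {
        config.base_delay
    };
    delay = delay.min(config.max_delay);
    if config.jitter {
        // Full jitter: uniform in [0, delay].
        let millis = rand::thread_rng().gen_range(0..=delay.as_millis() as u64);
        delay = Duration::from_millis(millis);
    }
    delay
}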
Circuit Breaker¶
Location: src/core/circuit_breaker/
Purpose: Prevent cascading failures by automatically stopping requests to failing backends
pub struct CircuitBreaker {
states: Arc<DashMap<String, BackendCircuitState>>,
config: CircuitBreakerConfig,
metrics: Option<CircuitBreakerMetrics>,
}
pub struct CircuitBreakerConfig {
pub enabled: bool,
pub failure_threshold: u32, // Failures before opening (default: 5)
pub failure_rate_threshold: f64, // Failure rate threshold (default: 0.5)
pub minimum_requests: u32, // Min requests before rate calculation
pub timeout_seconds: u64, // How long circuit stays open (default: 60s)
pub half_open_max_requests: u32, // Max requests in half-open state
pub half_open_success_threshold: u32, // Successes needed to close
}
pub enum CircuitState {
Closed, // Normal operation - requests pass through
Open, // Failing fast - requests rejected immediately
HalfOpen, // Testing recovery - limited requests allowed
}
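The transition rules can be summarized as follows (a simplified sketch; the actual logic in state.rs also applies the sliding failure-rate window and the half_open_max_requests cap):
// Sketch of the state transitions between Closed, Open, and HalfOpen.
fn next_state(
    state: CircuitState,
    cfg: &CircuitBreakerConfig,
    consecutive_failures: u32,
    half_open_successes: u32,
    open_elapsed_secs: u64,
) -> CircuitState {
    match state {
        CircuitState::Closed if consecutive_failures >= cfg.failure_threshold => CircuitState::Open,
        CircuitState::Open if open_elapsed_secs >= cfg.timeout_seconds => CircuitState::HalfOpen,
        CircuitState::HalfOpen if half_open_successes >= cfg.half_open_success_threshold => CircuitState::Closed,
        s => s,
    }
}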
Key Features¶
- Per-backend circuit breakers with independent state
- Atomic operations for lock-free state checking in hot path
- Automatic state transitions based on success/failure patterns
- Sliding window for failure rate calculation
- Prometheus metrics for observability
- Admin endpoints for manual control
Container (Dependency Injection)¶
Location: src/core/container.rs
Purpose: Manage service lifecycles and dependencies
pub struct Container {
services: Arc<RwLock<HashMap<TypeId, Box<dyn Any + Send + Sync>>>>,
singletons: Arc<RwLock<HashMap<TypeId, Arc<dyn Any + Send + Sync>>>>,
}
impl Container {
// Register singleton service
pub async fn register_singleton<T>(&self, instance: Arc<T>) -> CoreResult<()>
where T: 'static + Send + Sync { /* ... */ }
// Resolve service dependency
pub async fn resolve<T>(&self) -> CoreResult<Arc<T>>
where T: 'static + Send + Sync { /* ... */ }
}
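Typical usage is register-once, resolve-anywhere (a sketch; HttpClient::default() is a stand-in for the real constructor):
// Register a shared HttpClient once at startup...
let container = Container::new();
container.register_singleton(Arc::new(HttpClient::default())).await?;

// ...then resolve it wherever a dependency is needed.
let client: Arc<HttpClient> = container.resolve().await?;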
Data Flow¶
Request Processing Flow¶
sequenceDiagram
participant Client
participant HTTPLayer as HTTP Layer
participant ProxyService as Proxy Service
participant BackendService as Backend Service
participant ModelService as Model Service
participant Backend as LLM Backend
Client->>HTTPLayer: POST /v1/chat/completions
HTTPLayer->>HTTPLayer: Apply Middleware (auth, logging, metrics)
HTTPLayer->>ProxyService: Forward Request
ProxyService->>ModelService: Get Model Info
ModelService->>ModelService: Check Cache
alt Cache Miss
ModelService->>BackendService: Get Backends for Model
BackendService->>Backend: Query Models
Backend-->>BackendService: Model List
BackendService-->>ModelService: Filtered Backends
ModelService->>ModelService: Update Cache
end
ModelService-->>ProxyService: Model Available on Backends
ProxyService->>BackendService: Select Healthy Backend
BackendService->>BackendService: Apply Load Balancing
BackendService-->>ProxyService: Selected Backend
ProxyService->>Backend: Forward Request
Backend-->>ProxyService: Response (streaming or non-streaming)
ProxyService->>ProxyService: Apply Response Processing
ProxyService-->>HTTPLayer: Processed Response
HTTPLayer-->>Client: HTTP Response
Health Check Flow¶
sequenceDiagram
participant HealthService as Health Service
participant BackendPool as Backend Pool
participant Backend as LLM Backend
participant Cache as Health Cache
loop Every Interval
HealthService->>BackendPool: Get All Backends
BackendPool-->>HealthService: Backend List
par For Each Backend
HealthService->>Backend: GET /v1/models (or /health)
alt Success
Backend-->>HealthService: 200 OK + Model List
HealthService->>Cache: Update: consecutive_successes++
HealthService->>HealthService: Mark Healthy if threshold met
else Failure
Backend-->>HealthService: Error/Timeout
HealthService->>Cache: Update: consecutive_failures++
HealthService->>HealthService: Mark Unhealthy if threshold met
end
end
HealthService->>BackendPool: Update Backend Health Status
end
Hot Reload Service¶
Location: src/infrastructure/config/hot_reload.rs, src/services/hot_reload_service.rs
Purpose: Provide runtime configuration updates without server restart
The hot reload system enables zero-downtime configuration changes through automatic file watching and intelligent component updates.
Key Architecture Components¶
- ConfigManager: File system watching using the notify crate; publishes updates via a tokio::sync::watch channel
- HotReloadService: Computes configuration differences, classifies changes (immediate/gradual/restart)
- Component Updates: Interior mutability patterns (RwLock) for atomic updates to HealthChecker, CircuitBreaker, RateLimitStore
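The publish/subscribe shape used by ConfigManager can be sketched with tokio::sync::watch (a sketch; initial_config and new_config are placeholders for real Config values):
use tokio::sync::watch;

// ConfigManager holds the sender; interested components hold receivers.
let (tx, mut rx) = watch::channel(initial_config.clone());

// Publisher side (file watcher callback): push the freshly validated config.
tx.send(new_config)?;

// Subscriber side (e.g., a service task): react to each change.
tokio::spawn(async move {
    while rx.changed().await.is_ok() {
        let cfg = rx.borrow().clone();
        // apply the relevant immediate/gradual updates here
    }
});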
Change Classification¶
- Immediate Update: logging.level, rate_limiting.*, circuit_breaker.*, retry.*, global_prompts.*
- Gradual Update: backends.*, health_checks.*, timeouts.*
- Requires Restart: server.bind_address, server.workers
Admin API: /admin/config/hot-reload-status for inspecting hot reload capabilities
For detailed hot reload configuration, process flow, and usage examples, see configuration.md section on hot reload.
Configuration Migration System¶
Location: src/infrastructure/config/{migrator,migration,migrations,fixer,backup}.rs
Purpose: Automatically detect and fix configuration issues, migrate schemas, and ensure configuration validity
The configuration migration system provides a comprehensive solution for handling configuration evolution and maintenance. It automatically:
- Detects and migrates outdated schema versions
- Fixes common syntax errors in YAML/TOML files
- Validates and corrects configuration values
- Creates backups before making changes
- Provides dry-run capability for previewing changes
Architecture Components¶
1. Migration Orchestrator (migrator.rs)
   - Main entry point for migration operations
   - Coordinates the entire migration workflow
   - Manages backup creation and restoration
   - Implements security validations (path traversal, file size limits)
2. Migration Framework (migration.rs)
   - Defines core types and traits for migrations
   - Migration trait for implementing version upgrades
   - ConfigIssue enum for categorizing problems
   - MigrationResult for tracking changes
3. Schema Migrations (migrations.rs)
   - Concrete migration implementations (e.g., V1ToV2Migration)
   - Transforms configuration structure between versions
   - Example: Converting backend_url to a backends array
4. Auto-Correction Engine (fixer.rs)
   - Detects and fixes common configuration errors
   - Duration format correction (e.g., "10 seconds" → "10s")
   - URL validation and protocol addition
   - Field deprecation handling
5. Backup Manager (backup.rs)
   - Creates timestamped backups before modifications
   - Implements resource limits (10MB per file, 100MB total, max 50 backups)
   - Automatic cleanup of old backups
   - Preserves file permissions
Migration Workflow¶
graph TD
A[Read Config File] --> B[Validate Path & Size]
B --> C[Create Backup]
C --> D[Parse Configuration]
D --> E{Parse Success?}
E -->|No| F[Fix Syntax Errors]
F --> D
E -->|Yes| G[Detect Schema Version]
G --> H{Needs Migration?}
H -->|Yes| I[Apply Migrations]
H -->|No| J[Validate Values]
I --> J
J --> K{Issues Found?}
K -->|Yes| L[Apply Auto-Fixes]
K -->|No| M[Return Config]
L --> N[Write Updated Config]
N --> M
Security Features¶
- Path Traversal Protection: Validates paths to prevent directory traversal attacks
- File Size Limits: Maximum 10MB configuration files to prevent DoS
- Format Validation: Only processes .yaml, .yml, and .toml files
- System Directory Protection: Blocks access to sensitive system paths
- Test Mode Relaxation: Uses conditional compilation for test-friendly validation
Example Migration: v1.0 to v2.0¶
// V1ToV2Migration implementation (values are serde_yaml types)
fn migrate(&self, config: &mut Value) -> Result<(), MigrationError> {
    let Some(map) = config.as_mapping_mut() else { return Ok(()) };
    // Convert single backend_url to a backends array
    if let Some(backend_url) = map.get("backend_url").cloned() {
        let mut backend = Mapping::new();
        backend.insert("url".into(), backend_url);
        // Move the single model entry into the backend's models list
        if let Some(model) = map.get("model").cloned() {
            backend.insert("models".into(), Value::Sequence(vec![model]));
        }
        map.insert("backends".into(), Value::Sequence(vec![Value::Mapping(backend)]));
        // Remove the deprecated top-level fields
        map.remove("backend_url");
        map.remove("model");
    }
    Ok(())
}
Configuration Loading Flow¶
graph TD
A[Application Start] --> B[Config Manager Init]
B --> C{Config File Specified?}
C -->|Yes| D[Load Specified File]
C -->|No| E[Search Standard Locations]
E --> F{Config File Found?}
F -->|Yes| G[Load Config File]
F -->|No| H[Use CLI Args + Env Vars + Defaults]
D --> I[Parse YAML]
G --> I
H --> J[Create Config from Args]
I --> K[Apply Environment Variable Overrides]
J --> K
K --> L[Apply CLI Argument Overrides]
L --> M[Validate Configuration]
M --> N{Valid?}
N -->|Yes| O[Return Config]
N -->|No| P[Exit with Error]
O --> Q[Start File Watcher for Hot Reload]
Q --> R[Application Running]
Q --> S[Config File Changed]
S --> T[Reload and Validate]
T --> U{Valid?}
U -->|Yes| V[Apply New Config]
U -->|No| W[Log Error, Keep Old Config]
V --> R
W --> R
Dependency Injection¶
Service Registration¶
Services are registered in the container during application startup:
// In main.rs
async fn setup_services(config: Config) -> Result<ServiceRegistry, Error> {
let container = Arc::new(Container::new());
// Register infrastructure services
container.register_singleton(Arc::new(
HttpClient::new(&config.http_client)?
)).await?;
container.register_singleton(Arc::new(
BackendManager::new(&config.backends)?
)).await?;
// Register core services
container.register_singleton(Arc::new(
BackendServiceImpl::new(container.clone())
)).await?;
container.register_singleton(Arc::new(
ModelServiceImpl::new(container.clone())
)).await?;
// Create service registry
let registry = ServiceRegistry::new(container);
registry.initialize().await?;
Ok(registry)
}
Service Dependencies¶
Services declare their dependencies through constructor injection:
pub struct ProxyServiceImpl {
backend_service: Arc<dyn BackendService>,
model_service: Arc<dyn ModelService>,
retry_handler: Arc<dyn RetryHandler>,
http_client: Arc<HttpClient>,
}
impl ProxyServiceImpl {
    pub async fn new(container: Arc<Container>) -> CoreResult<Self> {
        Ok(Self {
            backend_service: container.resolve().await?,
            model_service: container.resolve().await?,
            retry_handler: container.resolve().await?,
            http_client: container.resolve().await?,
        })
    }
}
Benefits¶
- Testability: Services can be mocked for unit testing
- Flexibility: Implementations can be swapped without code changes
- Lifecycle Management: Container manages service initialization and cleanup
- Circular Dependency Detection: Container prevents circular dependencies
Error Handling Strategy¶
The router implements a comprehensive error handling strategy with typed errors, intelligent recovery, and user-friendly responses.
Error Type Hierarchy¶
- CoreError: Domain-level errors (validation, service failures, timeouts, configuration)
- RouterError: Application-level errors combining Core, HTTP, Backend, and Model errors
- HttpError: HTTP-specific errors (400 BadRequest, 401 Unauthorized, 404 NotFound, 500 InternalServerError, etc.)
Error Handling Principles¶
- Fail Fast: Validate inputs early with clear error messages
- Error Context: Include relevant context (field names, operation details)
- Retryable Classification: Distinguish between retryable (timeout, 503) and non-retryable (400, 401) errors
- User-Friendly Responses: Convert internal errors to OpenAI-compatible error format
- Structured Logging: Log errors with appropriate severity and context
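As an illustration, the retryable classification above might reduce to a predicate like this (a sketch with simplified inputs; the real classification lives on the error types themselves):
// Sketch: decide retryability from HTTP status and timeout flag.
fn is_retryable(status: Option<u16>, timed_out: bool) -> bool {
    if timed_out {
        return true; // timeouts are treated as transient
    }
    match status {
        Some(429) | Some(502) | Some(503) | Some(504) => true, // transient upstream failures
        Some(400) | Some(401) | Some(403) | Some(404) => false, // caller errors: never retry
        _ => false, // conservative default
    }
}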
Error Recovery Mechanisms¶
- Circuit Breaker: Prevent cascading failures (see Circuit Breaker)
- Retry with Exponential Backoff: Automatically retry transient failures
- Model Fallback: Route to alternative models when primary unavailable (see Model Fallback System)
- Graceful Degradation: Continue with reduced functionality when components fail
For detailed error handling, recovery strategies, monitoring, and troubleshooting, see error-handling.md.
Extension Points¶
Backend Type Architecture¶
The router supports multiple backend types with different API formats. Each backend type handles request/response transformation automatically.
Supported Backend Types¶
| Backend Type | API Format | Authentication | Use Case |
|---|---|---|---|
| openai | OpenAI Chat Completions | Authorization: Bearer | OpenAI, Azure OpenAI, vLLM, LocalAI |
| anthropic | Anthropic Messages API | x-api-key header | Claude models via native API |
| gemini | OpenAI-compatible | Authorization: Bearer | Google Gemini via OpenAI compatibility layer |
Anthropic Backend Architecture¶
The Anthropic backend provides native support for Claude models with automatic format translation:
┌─────────────────────────────────────────────────────────────────┐
│ OpenAI Format Request │
│ POST /v1/chat/completions │
│ { "model": "claude-haiku-4-5", "messages": [...] } │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Request Transformation Layer │
│ transform_openai_to_anthropic_request() │
│ • Extract system messages → separate `system` parameter │
│ • Transform image_url → Anthropic image format │
│ • Map max_tokens / max_completion_tokens │
│ • Convert reasoning_effort → thinking parameter │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Anthropic Messages API │
│ POST https://api.anthropic.com/v1/messages │
│ Headers: x-api-key, anthropic-version: 2023-06-01 │
│ { "model": "...", "system": "...", "messages": [...] } │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AnthropicStreamTransformer │
│ SSE Event Transformation (Anthropic → OpenAI format) │
│ • message_start → initial chunk with role │
│ • content_block_delta → content chunks │
│ • thinking_delta → reasoning_content (extended thinking) │
│ • message_delta → finish_reason mapping │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OpenAI Format Response │
│ data: {"choices":[{"delta":{"content":"..."}}]} │
└─────────────────────────────────────────────────────────────────┘
Key Transformations¶
Request Format Differences:
| Aspect | OpenAI Format | Anthropic Format |
|---|---|---|
| System prompt | messages[0].role="system" | Separate system parameter |
| Auth header | Authorization: Bearer | x-api-key |
| Max tokens | Optional | Required (max_tokens) |
| Images | image_url.url | source.type + source.data |
Extended Thinking Support:
// OpenAI reasoning_effort → Anthropic thinking
{
"reasoning_effort": "high" // OpenAI format
}
// Transforms to:
{
"thinking": {
"type": "enabled",
"budget_tokens": 32768 // Mapped from effort level
}
}
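The effort-to-budget mapping could be expressed as a small lookup (only the high → 32768 value appears above; the low/medium budgets here are illustrative assumptions):
// Sketch of the reasoning_effort -> thinking budget mapping.
fn thinking_budget(reasoning_effort: &str) -> Option<u32> {
    match reasoning_effort {
        "low" => Some(4_096),     // assumed
        "medium" => Some(16_384), // assumed
        "high" => Some(32_768),   // matches the example above
        _ => None,                // omit the thinking parameter entirely
    }
}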
Adding New Backend Types¶
1. Implement Backend Trait:

// In src/infrastructure/backends/custom_backend.rs
pub struct CustomBackend {
    client: Arc<HttpClient>,
    config: CustomBackendConfig,
}

#[async_trait]
impl BackendTrait for CustomBackend {
    async fn health_check(&self) -> CoreResult<()> { /* ... */ }
    async fn list_models(&self) -> CoreResult<Vec<Model>> { /* ... */ }
    async fn chat_completion(&self, request: ChatRequest) -> CoreResult<Response> { /* ... */ }
}

2. Register in Backend Factory:

// In src/infrastructure/backends/mod.rs
pub fn create_backend(backend_type: &str, config: &BackendConfig) -> CoreResult<Box<dyn BackendTrait>> {
    match backend_type {
        "openai" => Ok(Box::new(OpenAIBackend::new(config)?)),
        "vllm" => Ok(Box::new(VLLMBackend::new(config)?)),
        "custom" => Ok(Box::new(CustomBackend::new(config)?)), // New backend
        _ => Err(CoreError::ValidationFailed {
            message: format!("Unknown backend type: {}", backend_type),
            field: Some("backend_type".to_string()),
        }),
    }
}
Adding New Middleware¶
1. Implement Middleware Trait:

// In src/http/middleware/custom_middleware.rs
pub struct CustomMiddleware {
    config: CustomConfig,
}

impl<S> tower::Layer<S> for CustomMiddleware {
    type Service = CustomMiddlewareService<S>;
    fn layer(&self, inner: S) -> Self::Service {
        CustomMiddlewareService { inner, config: self.config.clone() }
    }
}

2. Register in HTTP Router:
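For example, attaching the middleware as a tower layer when building the axum router (the handler name and config construction are illustrative):
use axum::{routing::post, Router};

// Sketch: the layer wraps every route registered before it.
let app: Router = Router::new()
    .route("/v1/chat/completions", post(chat_completions_handler))
    .layer(CustomMiddleware { config: custom_config });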
Adding New Cache Types¶
1. Implement Cache Trait:

// In src/infrastructure/cache/redis_cache.rs
pub struct RedisCache {
    client: redis::Client,
    ttl: Duration,
}

#[async_trait]
impl CacheTrait for RedisCache {
    async fn get<T>(&self, key: &str) -> CoreResult<Option<T>>
    where T: DeserializeOwned { /* ... */ }

    async fn set<T>(&self, key: &str, value: &T, ttl: Option<Duration>) -> CoreResult<()>
    where T: Serialize { /* ... */ }
}

2. Use in Service:
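For example, injecting the Redis-backed cache wherever a CacheTrait is expected (RedisCache::new and ModelServiceImpl::with_cache are hypothetical constructors for illustration):
// Sketch: swap the cache implementation without touching service logic.
let cache: Arc<dyn CacheTrait> = Arc::new(
    RedisCache::new("redis://127.0.0.1/", Duration::from_secs(300))?
);
let model_service = ModelServiceImpl::with_cache(container.clone(), cache);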
Adding New Load Balancing Strategies¶
// In src/services/load_balancer.rs
pub enum LoadBalancingStrategy {
RoundRobin,
WeightedRoundRobin,
LeastConnections, // New strategy
Random,
}
impl LoadBalancingStrategy {
pub fn select_backend(&self, backends: &[Backend]) -> Option<&Backend> {
match self {
Self::RoundRobin => /* ... */,
Self::WeightedRoundRobin => /* ... */,
Self::LeastConnections => self.select_least_connections(backends),
Self::Random => /* ... */,
}
}
}
Design Decisions¶
Why 4-Layer Architecture?¶
Decision: Use a 4-layer architecture (HTTP → Services → Infrastructure → Core)
Rationale¶
- Clear Separation: Each layer has distinct responsibilities
- Testability: Layers can be tested independently
- Maintainability: Changes in one layer don't affect others
- Flexibility: Easy to swap implementations (e.g., different cache backends)
Trade-offs¶
- ✅ Pros: Clean, maintainable, testable, extensible
- ❌ Cons: More complexity, slight performance overhead
- Verdict: Benefits outweigh costs for a production system
Why Dependency Injection?¶
Decision: Use a custom DI container instead of compile-time injection
Rationale¶
- Runtime Flexibility: Can swap implementations based on configuration
- Service Lifecycle: Centralized management of service initialization/cleanup
- Testing: Easy to inject mocks and test doubles
Alternatives Considered¶
- Manual dependency passing: Too verbose and error-prone
- Compile-time DI (generics): Less flexible, harder to configure
Why Arc<RwLock<T>> for Shared State?¶
Decision: Use Arc<RwLock<T>> for shared mutable state
Rationale¶
- Reader-Writer Semantics: Multiple readers, exclusive writers
- Performance: Better than Arc<Mutex<T>> for read-heavy workloads
- Safety: Prevents data races at compile time
Alternatives Considered¶
- Arc<Mutex<T>>: Simpler but worse performance for reads
- Channels: Too complex for simple shared state
- Atomic types: Not suitable for complex data structures
Why async/await Throughout?¶
Decision: Use async/await for all I/O operations
Rationale¶
- Performance: Non-blocking I/O allows high concurrency
- Resource Efficiency: Lower memory usage than thread-per-request
- Ecosystem: Rust async ecosystem (Tokio, reqwest, axum) is mature
Trade-offs¶
- ✅ Pros: High performance, low resource usage, good ecosystem
- ❌ Cons: Complexity, learning curve, debugging challenges
- Verdict: Essential for high-performance network services
Why Configuration Hot-Reload?¶
Decision: Support configuration hot-reload using file watching
Rationale¶
- Zero Downtime: Update configuration without restarting
- Operations Friendly: Easy to adjust settings in production
- Development: Faster iteration during development
Implementation¶
- File system watcher detects changes
- Validate new configuration before applying
- Atomic updates to avoid inconsistent state
- Fallback to previous config on validation errors
Performance Considerations¶
Memory Management¶
- Connection Pooling: Reuse HTTP connections to reduce allocation overhead
- Smart Caching: LRU eviction prevents unbounded memory growth
- Arc Cloning: Cheap reference counting instead of deep cloning
- Streaming: Process responses in chunks to avoid loading large responses into memory
Concurrency¶
- RwLock for Read-Heavy Workloads: Multiple concurrent readers for backend pool and model cache
- Lock-Free Where Possible: Use atomics for counters and simple state
- Async Task Spawning: Background tasks for health checks and cache updates
- Bounded Channels: Prevent unbounded queuing of tasks
I/O Optimization¶
- Connection Keep-Alive: TCP connections stay open for reuse
- Streaming Responses: Forward SSE chunks without buffering
- Timeouts: Prevent hanging on slow backends
- Retry with Backoff: Avoid overwhelming failing backends
Memory Layout¶
// Optimized data structures for cache efficiency
pub struct Backend {
pub name: String, // Inline string for small names
pub url: Arc<str>, // Shared string for URL
pub weight: u32, // Compact integer
pub is_healthy: AtomicBool, // Lock-free health status
}
// Cache-friendly model storage
pub struct ModelCache {
models: HashMap<String, Arc<ModelInfo>>, // Shared model info
last_updated: AtomicU64, // Lock-free timestamp
ttl: Duration,
}
Benchmarking Results¶
Based on our benchmarks (see benches/performance_benchmarks.rs):
- Request Latency: < 5ms overhead for routing decisions
- Memory Usage: ~50MB base memory, scales linearly with backends
- Throughput: 1000+ requests/second on modest hardware
- Connection Efficiency: 100+ concurrent connections per backend with minimal memory overhead
Rate Limiting¶
The router implements sophisticated rate limiting to protect against abuse and ensure fair resource allocation across clients.
Key Features:
- Dual-window approach: sustained limit (100 req/min) + burst protection (20 req/5s)
- Client identification by API key (preferred) or IP address (fallback)
- Per-client isolation with automatic cache cleanup
- DoS prevention with short TTL for empty responses
Rate Limit V2 Architecture¶
The enhanced rate limiting system (rate_limit_v2/) provides a modular, high-performance implementation:
Module Structure¶
src/http/middleware/rate_limit_v2/
├── mod.rs # Public API and module exports
├── middleware.rs # Axum middleware integration
├── store.rs # Rate limit storage and client tracking
└── token_bucket.rs # Token bucket algorithm implementation
Components¶
- Token Bucket Algorithm (token_bucket.rs)
  - Configurable bucket capacity and refill rate
  - Atomic operations for lock-free token consumption
  - Automatic token replenishment based on elapsed time
  - Separate buckets for sustained and burst limits
- Rate Limit Store (store.rs)
  - Per-client state tracking with DashMap for concurrent access
  - Automatic cleanup of expired client entries
  - Configurable TTL for inactive clients (default: 1 hour)
  - Memory-efficient with bounded storage
- Middleware Integration (middleware.rs)
  - Extracts client identifier (API key → IP address fallback)
  - Checks both sustained and burst limits before processing
  - Returns HTTP 429 (Too Many Requests) with a Retry-After header
  - Prometheus metrics for monitoring rate limit hits
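A compact version of the refill-and-consume logic described above (a sketch; field names are illustrative, and the real token_bucket.rs stores this state atomically for lock-free consumption):
use std::time::Instant;

// Sketch of the token bucket core: refill from elapsed time, then try
// to consume one token.
struct TokenBucket {
    capacity: f64,       // burst ceiling
    tokens: f64,         // currently available tokens
    refill_per_sec: f64, // sustained rate
    last_refill: Instant,
}

impl TokenBucket {
    fn try_consume(&mut self) -> bool {
        let elapsed = self.last_refill.elapsed().as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = Instant::now();
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}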
Configuration Example¶
rate_limiting:
enabled: true
sustained:
max_requests: 100
window_seconds: 60
burst:
max_requests: 20
window_seconds: 5
cleanup_interval_seconds: 300
Decision Flow¶
Request arrives
↓
Extract client ID (API key or IP)
↓
Check sustained limit (100 req/min)
↓ OK
Check burst limit (20 req/5s)
↓ OK
Process request
For detailed configuration information, see configuration.md section on rate limiting.
Model Fallback System¶
The router implements a configurable model fallback system that automatically routes requests to alternative models when the primary model is unavailable.
Key Features:
- Automatic fallback chain execution (e.g., gpt-4o → gpt-4-turbo → gpt-3.5-turbo)
- Cross-provider fallback support with parameter translation
- Integration with circuit breaker for intelligent triggering
- Prometheus metrics for monitoring fallback usage
For detailed configuration and implementation, see error-handling.md section on model fallback.
Circuit Breaker¶
The router implements the circuit breaker pattern to prevent cascading failures and provide automatic failover when backends become unhealthy.
Three-State Machine:
| State | Behavior |
|---|---|
| Closed | Normal operation. Failures are counted. |
| Open | Fast-fail mode. Requests rejected immediately. |
| HalfOpen | Recovery testing. Limited requests allowed. |
Key Features:
- Per-backend isolation with independent state
- Lock-free atomic operations for minimal hot-path overhead
- Admin endpoints for manual control (/admin/circuit/*)
- Prometheus metrics for observability
For detailed configuration and implementation, see error-handling.md section on circuit breaker.
File Storage¶
The router provides OpenAI Files API compatible file storage with persistent metadata.
Key Features:
- Persistent metadata storage with sidecar JSON files
- Automatic recovery on server restart
- Orphan file detection and cleanup
- Pluggable backends (memory/persistent)
For detailed architecture and implementation, see File Storage Guide.
Image Generation Architecture¶
The router provides a unified interface for image generation across multiple backends (OpenAI GPT Image, DALL-E, and Google Gemini/Nano Banana) with automatic parameter translation.
Multi-Backend Image Generation¶
┌─────────────────────────────────────────────────────────────────┐
│ OpenAI-Compatible Request │
│ POST /v1/images/generations │
│ { "model": "...", "prompt": "...", "size": "1536x1024" } │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Model Router (image_gen.rs) │
│ • Detects model type (GPT Image, DALL-E, Nano Banana) │
│ • Routes to appropriate handler │
│ • Handles streaming vs non-streaming │
└─────────────────────────────────────────────────────────────────┘
│ │
┌──────────┘ └──────────┐
▼ ▼
┌───────────────────────────┐ ┌───────────────────────────┐
│ OpenAI Backend │ │ Gemini Backend │
│ (GPT Image, DALL-E) │ │ (Nano Banana) │
│ │ │ │
│ • Pass-through request │ │ • Convert to Gemini API │
│ • SSE streaming support │ │ • Map size → aspectRatio │
│ • output_format support │ │ • imageConfig generation │
└───────────────────────────┘ └───────────────────────────┘
OpenAI → Gemini Parameter Conversion¶
When using Nano Banana (Gemini) models, OpenAI-style parameters are automatically converted to Gemini's native format:
Size to Aspect Ratio Mapping¶
| OpenAI size | Gemini aspectRatio | Gemini imageSize | Notes |
|---|---|---|---|
| 256x256 | 1:1 | 1K | Minimum Gemini size |
| 512x512 | 1:1 | 1K | Minimum Gemini size |
| 1024x1024 | 1:1 | 1K | Default |
| 1536x1024 | 3:2 | 1K | Landscape (new) |
| 1024x1536 | 2:3 | 1K | Portrait (new) |
| 1792x1024 | 16:9 | 1K | Wide landscape |
| 1024x1792 | 9:16 | 1K | Tall portrait |
| 2048x2048 | 1:1 | 2K | Pro models only |
| 4096x4096 | 1:1 | 4K | Pro models only |
| auto | 1:1 | 1K | Default fallback |
Request Transformation¶
OpenAI Format (Input):
{
  "model": "...",
  "prompt": "A serene Japanese garden",
  "size": "1536x1024"
}
Gemini Format (Converted):
{
"contents": [
{
"parts": [{"text": "A serene Japanese garden"}]
}
],
"generationConfig": {
"imageConfig": {
"aspectRatio": "3:2",
"imageSize": "1K"
}
}
}
Conversion Implementation¶
The conversion is handled by src/infrastructure/backends/gemini/image_generation.rs:
pub fn convert_openai_to_gemini(request: &OpenAIImageRequest)
-> CoreResult<(String, GeminiImageRequest)>
{
// 1. Map model name
let gemini_model = map_model_to_gemini(&request.model);
// 2. Parse size to aspect ratio and size category
let parsed_size = parse_openai_size(&request.size, &request.model)?;
// 3. Build Gemini request with imageConfig
let gemini_request = GeminiImageRequest {
contents: vec![GeminiContent { parts: vec![...] }],
generation_config: Some(GeminiGenerationConfig {
image_config: Some(GeminiImageConfig {
aspect_ratio: Some(parsed_size.aspect_ratio.to_gemini_string()),
image_size: Some(parsed_size.size_category.to_gemini_image_size()),
}),
}),
};
Ok((gemini_model, gemini_request))
}
Streaming Image Generation (SSE)¶
For GPT Image models, the router supports true SSE passthrough for streaming image generation:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Client │────stream:true─▶│ Router │────stream:true─▶│ OpenAI │
│ │ │ │ │ │
│ │◀───SSE events──│ Passthrough│◀───SSE events──│ │
└─────────────┘ └─────────────┘ └─────────────┘
SSE Event Types:
| Event | Description |
|---|---|
| image_generation.partial_image | Intermediate preview during generation |
| image_generation.complete | Final image data |
| image_generation.usage | Token usage for billing |
| done | Stream completion |
Implementation (src/proxy/image_gen.rs):
async fn handle_streaming_image_generation(...) -> Result<Response, StatusCode> {
    // 1. Keep stream: true in the backend request
    // 2. Make the streaming request via bytes_stream()
    // 3. Forward SSE events through a tokio channel
    let (tx, rx) = tokio::sync::mpsc::unbounded_channel();
    tokio::spawn(async move {
        let mut stream = backend_response.bytes_stream();
        let mut event_type = String::new();
        while let Some(Ok(chunk)) = stream.next().await {
            let chunk_str = String::from_utf8_lossy(&chunk);
            // Parse SSE format (event:/data: lines) and forward events to the client
            for line in chunk_str.lines() {
                if let Some(event) = line.strip_prefix("event:") {
                    event_type = event.trim().to_string();
                }
                if let Some(data) = line.strip_prefix("data:") {
                    let event = Event::default().event(&event_type).data(data.trim());
                    let _ = tx.send(Ok(event));
                }
            }
        }
    });
    Ok(Sse::new(UnboundedReceiverStream::new(rx)).into_response())
}
GPT Image Model Features¶
The router supports enhanced parameters for GPT Image models (gpt-image-1, gpt-image-1.5, gpt-image-1-mini):
| Parameter | Description | Values |
|---|---|---|
| output_format | Image file format | png, jpeg, webp |
| output_compression | Compression level | 0-100 (jpeg/webp only) |
| background | Transparency control | transparent, opaque, auto |
| quality | Generation quality | low, medium, high, auto |
| stream | Enable SSE streaming | true, false |
| partial_images | Preview count | 0-3 |
Model Support Matrix¶
| Feature | GPT Image 1.5 | GPT Image 1 | GPT Image 1 Mini | DALL-E 3 | DALL-E 2 | Nano Banana | Nano Banana Pro |
|---|---|---|---|---|---|---|---|
| Streaming | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| output_format | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| background | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Custom quality | ✅ | ✅ | ✅ | standard/hd | ❌ | ❌ | ❌ |
| Image Edit | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Image Variations | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Max Resolution | 1536px | 1536px | 1536px | 1792px | 1024px | 1024px | 4096px |
Image Edit and Variations¶
The router provides OpenAI-compatible image editing and variations endpoints through /v1/images/edits and /v1/images/variations.
Image Editing (/v1/images/edits)¶
Endpoint: POST /v1/images/edits
Allows editing an existing image with a text prompt and optional mask. Supported by GPT Image models and DALL-E 2.
Request Format (multipart/form-data):
image: <file> # Original image (PNG, required)
prompt: <string> # Edit instructions (required)
mask: <file> # Optional mask image (PNG)
model: <string> # Model name (e.g., "gpt-image-1", "dall-e-2")
n: <integer> # Number of images (default: 1)
size: <string> # Output size (e.g., "1024x1024")
response_format: <string> # "url" or "b64_json"
Implementation (src/proxy/image_edit.rs):
- Multipart form parsing for image and mask files
- Image validation (format, size, aspect ratio)
- Model-specific parameter transformation
- Proper error handling for invalid inputs
Supported Features¶
- Transparent PNG mask support for targeted editing
- Multiple image generation (n parameter)
- Flexible output sizes
- Both URL and base64 response formats
Image Variations (/v1/images/variations)¶
Endpoint: POST /v1/images/variations
Creates variations of a given image. Supported by DALL-E 2 only.
Request Format (multipart/form-data):
image: <file> # Source image (PNG, required)
model: <string> # Model name (default: "dall-e-2")
n: <integer> # Number of variations (default: 1, max: 10)
size: <string> # Output size ("256x256", "512x512", "1024x1024")
response_format: <string> # "url" or "b64_json"
Implementation (src/proxy/image_edit.rs):
- Image file validation and preprocessing
- DALL-E 2-specific routing
- Error handling for unsupported models
- Consistent response formatting
Key Features¶
- Generate multiple variations in a single request
- Automatic image format validation
- Standard OpenAI response format compatibility
Image Utilities Module¶
The image_utils.rs module provides shared utilities for image processing:
Functions¶
- validate_image_format(): Validates PNG/JPEG format and dimensions
- parse_multipart_image_request(): Extracts images from multipart forms
- check_image_dimensions(): Validates size constraints
- format_image_error_response(): Standardized error responses
Validation Rules¶
- Maximum file size: 4MB (configurable)
- Supported formats: PNG (required for edits/variations), JPEG (generation only)
- Aspect ratio constraints per model
- Transparent PNG requirement for masks
This architecture provides a solid foundation for building a production-ready LLM router that can scale to handle thousands of requests while remaining maintainable and extensible. The clean separation of concerns makes it easy to add new features, swap implementations, and thoroughly test each component.