# Changelog

All notable changes to Continuum Router are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [0.22.0] - 2025-12-19

### Added
- Docker Support with Pre-built Binaries - Add Dockerfile and Dockerfile.alpine that download pre-built binaries from GitHub Releases (#189)
  - Debian Bookworm-based image (~50MB) for general use
  - Alpine 3.20-based image (~10MB) for minimal deployments
  - Multi-architecture support (linux/amd64, linux/arm64) using TARGETARCH
  - VERSION build argument for selecting release version
  - Non-root user execution for security
  - OCI labels for image metadata
- Container Health Check CLI - Implement `--health-check` CLI argument for container orchestration (#189)
  - Returns exit code 0 if server is healthy, 1 if unhealthy
  - Optional `--health-check-url` for custom health endpoint
  - Proper IPv6 address handling
  - 5-second default timeout
- Docker Compose Quick Start - Add docker-compose.yml for easy deployment (#189); see the sketch below
  - Volume mount for configuration
  - Environment variable support (RUST_LOG)
  - Resource limits and health checks
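A minimal sketch of such a compose file; the image path, port, and healthcheck interval are illustrative assumptions, while `--health-check` and `RUST_LOG` come from this release:

```yaml
# Sketch only -- image path, port, and interval are assumptions.
services:
  router:
    image: ghcr.io/example/continuum-router:latest  # hypothetical image path
    ports:
      - "8080:8080"                                 # assumed listen port
    volumes:
      - ./config.yaml:/app/config.yaml:ro           # config volume mount
    environment:
      RUST_LOG: info                                # log level via env var
    healthcheck:
      test: ["CMD", "continuum-router", "--health-check"]  # exit 0 = healthy, 1 = unhealthy
      interval: 30s                                 # assumed interval
```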
- Automated Docker Image Publishing - Add Docker build and push to ghcr.io in release workflow (#189)
  - Builds both Debian and Alpine images after binary release
  - Multi-platform support (linux/amd64, linux/arm64)
  - Automatic tagging with semver (VERSION, MAJOR.MINOR, latest)
  - Alpine images tagged with -alpine suffix
  - GitHub Actions cache for faster builds
- MkDocs Documentation Website - Build comprehensive documentation site with Material theme (#183)
  - Full navigation structure with Getting Started, Features, Operations, and Development sections
  - GitHub Actions workflow for automatic deployment to GitHub Pages
  - Custom stylesheets and theme configuration
- Korean Documentation Translation - Complete Korean localization of all documentation (#190)
  - All 20 documentation files translated to Korean
  - Language switcher in navigation (English/Korean)
  - Multi-language build in GitHub Actions workflow
- Dependency Security Auditing - Add cargo-deny for vulnerability scanning (#192)
  - Security advisory checking in CI workflow
  - License compliance verification
  - Dependency source validation
- Dependabot Integration - Automated dependency updates for Cargo and GitHub Actions (#192)
- Security Policy - Add comprehensive SECURITY.md with vulnerability reporting process (#191)
### Changed
- Integrate orphaned architecture documentation into MkDocs site (#186)
- Rename documentation files to lowercase kebab-case for URL-friendly filenames (#186)
- Update various GitHub Actions to latest versions (checkout@v6, setup-python@v6, upload-artifact@v6, etc.)
### Fixed
- Health check response validation logic bug (operator precedence issue)
- Address parsing fallback that was silently hiding configuration errors
- IPv6 address formatting in health check (now correctly uses bracket notation)
### Security
- Updated reqwest 0.11→0.12, prometheus 0.13→0.14, validator 0.18→0.20
- Replaced dotenv with dotenvy for better maintenance
- Added .dockerignore to exclude sensitive files from build context
## [0.21.0] - 2025-12-19

### Added
- Gemini 3 Flash Preview Model - Add support for gemini-3-flash-preview model (#168)
- Backend Error Passthrough - Pass through detailed error messages from backends for 4xx responses (#177)
  - Parse and forward original error messages from OpenAI, Anthropic, and Gemini backends
  - Preserve `param` field when available (useful for invalid parameter errors)
  - Falls back to generic error message if backend response cannot be parsed
  - Error format remains OpenAI-compatible
  - Comprehensive unit tests for error parsing across all backend formats
- Default Authentication Mode for API Endpoints - Configurable authentication enforcement for API endpoints (#173); see the sketch below
  - New `mode` field in `api_keys` configuration: `permissive` (default) or `blocking`
    - `permissive` mode: Requests without an API key are allowed (backward compatible)
    - `blocking` mode: Only authenticated requests are processed; unauthenticated requests receive 401
  - Protected endpoints: `/v1/chat/completions`, `/v1/completions`, `/v1/responses`, `/v1/images/*`, `/v1/models`
  - Health endpoints (`/health`, `/healthz`) always accessible without authentication
  - Hot reload support for authentication mode changes
  - Comprehensive integration tests for both modes
  - Updated API.md, configuration.md, and manpage documentation
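As a minimal sketch, a `blocking` setup might look like this (the `mode` field and its two values are from this release; the shape of the key entries is an assumption):

```yaml
api_keys:
  mode: blocking          # `permissive` (default) admits keyless requests; `blocking` returns 401
  keys:                   # assumed layout for key entries
    - key: "sk-example"   # hypothetical key value
```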
### Fixed
- UTF-8 Multi-byte Character Corruption - Handle UTF-8 multi-byte character corruption in streaming responses (#179)
- GPT Image response_format - Strip response_format parameter for GPT Image models (#176)
- Auto-discovery Validation - Allow auto-discovery for all backends except Anthropic (#172)
### Changed
- Updated architecture.md and fixed documentation issues (#167, #169)
- Added AGENTS.md and linked CLAUDE.md to it
## [0.20.0] - 2025-12-18

### Added
- Image Variations Support for Gemini - Add image variations support for Gemini (nano-banana) models (#165)
- Image Edit Support for Gemini - Implement limited image edit support for Gemini (nano-banana) models (#164)
- Enhanced Image Generation - Enhance /v1/images/generations with streaming and GPT Image features (#161)
- GPT Image 1.5 Model - Add gpt-image-1.5 model support (#159)
- Image Variations Endpoint - Implement /v1/images/variations endpoint for image variations (#155)
- Image Edits Endpoint - Implement /v1/images/edits endpoint for image editing (inpainting) (#156)
  - Full OpenAI Images Edit API compatibility
  - Supports GPT Image models: `gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5` (recommended)
  - Legacy support for `dall-e-2` model
  - Multipart form-data parsing with shared utilities
  - PNG image validation (format, size, square dimensions)
  - Optional mask validation (dimension matching with source image)
- Shared Image Utilities - Implement shared utilities for image edit/variations endpoints (#154)
- External Prompt Files - Support loading system prompts from external Markdown files (#146); see the config sketch below
  - New `prompt_file` field in `BackendPromptConfig` and `ModelPromptConfig`
  - New `default_file` and `prompts_dir` fields in `GlobalPromptConfig`
  - Secure path validation with path traversal attack prevention
  - REST API endpoints for prompt file management
  - File caching with size limits (100 entries max, 50MB total)
  - Hot-reload support for prompt files
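A minimal sketch of the new prompt-file fields (the field names come from the entry above; the enclosing section layout is an assumption):

```yaml
prompts:                               # assumed section name
  default_file: prompts/default.md     # GlobalPromptConfig.default_file
  prompts_dir: ./prompts               # GlobalPromptConfig.prompts_dir
  backends:                            # assumed nesting for per-backend prompts
    openai:
      prompt_file: prompts/openai.md   # BackendPromptConfig.prompt_file
```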
- Solar Open 100B Model - Add Solar Open 100B model metadata
- Automatic Model Discovery - Backends automatically discover available models from the `/v1/models` API when models are not explicitly configured (#142); see the sketch below
  - OpenAI, Gemini, and vLLM backends support auto-discovery
  - Ollama backend uses vLLM's discovery mechanism (OpenAI-compatible API)
  - 10-second timeout prevents blocking startup
  - Falls back to hardcoded defaults if discovery fails
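A sketch of a backend entry that relies on discovery; apart from `type`, the field names are assumptions:

```yaml
backends:
  - name: local-vllm            # assumed field name
    type: vllm
    url: http://localhost:8000  # assumed field name
    # no `models:` list -- discovered via the /v1/models API at startup
    # (10-second timeout; falls back to hardcoded defaults on failure)
```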
### Changed

- `BackendFactory::create_backend_from_typed_config()` is now async to support async model discovery
- Backend `from_config()` methods for OpenAI, Gemini, and vLLM are now async
### Security

- API Key Redaction - Implement API key redaction to prevent credential exposure (#150)

### Performance

- Binary Size Optimization - Optimize release binary size from 20MB to 6MB (70% reduction) (#144)

### Refactored

## [0.19.0] - 2025-12-13

### Added
- Runtime Configuration Management API - Comprehensive REST API for viewing and modifying configuration at runtime (#139)
  - Configuration Query APIs:
    - `GET /admin/config/full` - Retrieve full configuration with sensitive info masked
    - `GET /admin/config/sections` - List all 15 configuration sections
    - `GET /admin/config/{section}` - Get specific section configuration
    - `GET /admin/config/schema` - JSON Schema for client-side validation
  - Configuration Modification APIs:
    - `PUT /admin/config/{section}` - Replace section configuration
    - `PATCH /admin/config/{section}` - Partial update (JSON merge patch)
    - `POST /admin/config/validate` - Validate configuration before applying
    - `POST /admin/config/apply` - Apply configuration with hot reload
  - Configuration Save/Restore APIs:
    - `POST /admin/config/export` - Export configuration (YAML/JSON/TOML)
    - `POST /admin/config/import` - Import and apply configuration
    - `GET /admin/config/history` - View configuration change history
    - `POST /admin/config/rollback/{version}` - Rollback to previous version
  - Backend Management APIs:
    - `POST /admin/backends` - Add new backend
    - `GET /admin/backends/{name}` - Get backend configuration
    - `PUT /admin/backends/{name}` - Update backend configuration
    - `DELETE /admin/backends/{name}` - Remove backend
    - `PUT /admin/backends/{name}/weight` - Update backend weight
    - `PUT /admin/backends/{name}/models` - Update backend model list
  - Sensitive information masking for API keys, passwords, tokens
  - JSON Schema generation for all configuration sections
  - Configuration history tracking (up to 100 entries, configurable)
  - Memory-efficient history storage with size-based eviction (10MB limit)
  - Atomic version counter using AtomicU64 for thread safety
  - Structured error responses with error codes
- Admin REST API Documentation - Comprehensive developer guide (docs/admin-api.md)
  - Complete API reference with request/response examples
  - Client SDK examples for Python, JavaScript/TypeScript, and Go
  - Best practices and security considerations
- Integration Tests - 33 integration tests for Configuration Management API endpoints
### Fixed
- CRITICAL: Configuration changes now actually applied to running system
- CRITICAL: Memory growth controlled with JSON string storage and size-based eviction
- HIGH: Input validation added (1MB content limit, 32-level nesting depth)
- HIGH: Sensitive export requires elevated permission and audit logging
- HIGH: Comprehensive sensitive field detection (30+ patterns)
- MEDIUM: Validation functions now perform actual validation
- MEDIUM: Race condition fixed with AtomicU64 for version counter
- MEDIUM: Colon removed from allowed backend name characters
- MEDIUM: Structured error responses with error codes
- MEDIUM: Initialize flag prevents duplicate history entries
- LOW: Unnecessary clones removed for better performance
- LOW: Limits now configurable via AdminConfig
- LOW: Duplicate validation logic refactored
- LOW: Test coverage improved for edge cases
### Changed
- Enhanced documentation for Configuration Management API across all guides
- Updated manpage with new admin endpoints
- Updated API.md with comprehensive Configuration Management API section
## [0.18.0] - 2025-12-13

### Added
- Per-API-Key Rate Limiting - Implement per-API-key rate limiting (#137)
  - Individual rate limits for each API key
  - Configurable requests per minute per key
- API Key Management System - Comprehensive API key management and configuration system; see the sketch below
  - Multiple key sources: config file, external file, environment variables
  - Key properties: scopes, rate limits, expiration, enabled status
  - Hot reload support for key configuration changes
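A hedged sketch of a key entry with these properties; only the property list (scopes, rate limits, expiration, enabled status) comes from the entry above, and every field name below is an assumption:

```yaml
api_keys:
  keys:
    - key: "${ROUTER_KEY_1}"              # hypothetical environment-variable source
      scopes: ["chat"]                    # assumed field name
      rate_limit:
        requests_per_minute: 60           # per-key limit (#137); assumed field name
      expires_at: "2026-01-01T00:00:00Z"  # assumed field name
      enabled: true                       # assumed field name
```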
- Files API Authentication - Implement authentication and authorization for Files API (#131)
  - API key authentication for file operations
  - File ownership enforcement
  - Admin access control for all files
- Hot Reload for Runtime Configuration - Complete hot reload functionality for runtime configuration updates (#130)
  - Automatic configuration file watching
  - Classified updates: immediate, gradual, restart-required
### Changed
- Major refactoring with modular structure
- Updated architecture.md to reflect refactored module structure
### Fixed
- Add ConnectInfo extension for admin/metrics/files endpoints
- Address security vulnerabilities in API key management
- Address code quality issues in API key management
### Documentation
- Add API key management documentation
- Add comprehensive API key management tests
## [0.17.0] - 2025-12-12

### Added
- Anthropic Backend File Content Transformation - Files uploaded to the router can now be used with Anthropic backend (#126)
  - Automatic conversion of file content to Anthropic message format
  - Support for text and document files with base64 encoding
  - Seamless integration with file resolution middleware
- Gemini Backend File Content Transformation - Files uploaded to the router can now be used with Gemini backend (#127)
  - Automatic conversion of file content to Gemini API format
  - Support for inline data with proper MIME type handling
  - Cross-provider file support enables files uploaded once to work across all backends
### Fixed
- Streaming File Uploads - Implement streaming file uploads to prevent memory exhaustion (#128)
  - Large file uploads no longer load the entire file into memory
  - Streaming processing for efficient memory usage
  - Prevents OOM errors when uploading large files
### Changed

- None

## [0.16.0] - 2025-12-12

### Added
- OpenAI-Compatible Files API - Full implementation of OpenAI Files API endpoints (#111)
  - Upload files with multipart/form-data support
  - List, retrieve, and delete files
  - Download file content
  - Supports `purpose` values: fine-tune, batch, assistants, user_data
- File Resolution Middleware - Automatic file content injection for chat completions (#120)
  - Reference uploaded files in chat messages with file IDs
  - Automatic content injection into chat context
- Persistent Metadata Storage - File metadata persists across server restarts (#125)
  - Sidecar JSON files (.meta.json) stored alongside data files
  - Automatic recovery on startup with metadata rebuild from files
  - Orphan file detection and optional cleanup
- OpenAI Backend File Handling - Files uploaded locally are forwarded to OpenAI when needed (#121, #122)
- GPT-5.2 Model Support - Added GPT-5.2 model metadata to OpenAI backend (#124)
- Circuit Breaker Pattern - Automatic backend failover with circuit breaker (#93)
  - States: Closed → Open → Half-Open → Closed cycle
  - Configurable failure thresholds and recovery timeout
  - Per-backend circuit breaker instances
  - Admin endpoints for circuit breaker status and control
- Admin Endpoint Authentication - Secure admin endpoints with authentication and audit logging
- Configurable Fallback Models - Automatic model fallback for unavailable model scenarios (#50)
  - Define fallback chains for primary models (e.g., gpt-4o → gpt-4-turbo → gpt-3.5-turbo)
  - Cross-provider fallback support (e.g., OpenAI → Anthropic)
  - Automatic parameter translation between providers
  - Integration with circuit breaker for layered failover protection
  - Configurable trigger conditions (error codes, timeout, connection error, circuit breaker open)
  - Response headers indicate when fallback was used (X-Fallback-Used, X-Original-Model, X-Fallback-Model)
  - Prometheus metrics for fallback monitoring
- Pre-commit Hook - Automated code formatting and linting before commits
### Fixed
- Fallback Chain Validation - Integrate chain validation into Validate derive
- Fallback Performance - Use index-based lookup for fallback chain traversal
- Lock Contention - Reduce lock contention in FallbackService with snapshot pattern
- Security - Sanitize fallback error headers and metric labels
- Circuit Breaker Security - Add backend name validation in admin endpoints
- Thread Safety - Use CAS loop for thread-safe half-open request limiting
### Changed
- Documentation Updates - Comprehensive documentation for fallback configuration, circuit breaker, and Files API
- Code Quality - Fix clippy warnings and format code
- Pre-commit Hook Location - Move pre-commit hook to .githooks directory
## [0.15.0] - 2025-12-05

### Added
- Nano Banana API Support - Add Gemini Image Generation API support with OpenAI-compatible interface (#102)
  - Supports nano-banana and nano-banana-pro models
  - Automatic format conversion between OpenAI Images API and Gemini Imagen API
- Split /v1/models Endpoint - Standard lightweight response vs extended metadata response (#101)
  - `/v1/models` returns a lightweight response for better performance
  - `/v1/models?extended=true` returns full metadata for detailed model information
### Changed
- Extract StreamService - Streaming handler logic extracted to dedicated StreamService for modular architecture (#106)
- Eliminate Retry Logic Duplication - Consolidated retry logic code in proxy.rs (#103)
### Fixed

- Proper Error Propagation - Replace `.expect()` panics with proper error propagation in HttpClientFactory (#104)
### Performance
- LRU Cache Optimization - Use read lock instead of write lock for cache lookups (#105)
## [0.14.2] - 2025-12-05

### Added
- Token Usage Logging - Log input/output token counts on request completion (#92)
- Exclude List for Reports - Add exclude list configuration for reports
### Changed

- None

### Fixed

- None
## [0.14.1] - 2025-12-05

### Added
- TTFB Benchmark Targets - Add TTFB benchmark targets to Makefile
- Connection Pre-warming - Add connection pre-warming for Anthropic, Gemini, OpenAI backends
### Fixed
- Anthropic Backend TTFT - Optimize Anthropic backend TTFT with connection pooling and HTTP/2 (#90)
- Gemini Backend TTFT - Optimize Gemini backend TTFT with connection pooling and HTTP/2 (#88)
- Model Metadata Alias Matching - Apply base name fallback matching to aliases in model metadata lookup (#84)
### Changed
- Shared HTTP Client - Share HTTP client between HealthChecker and request handler
- Updated architecture and performance documentation
## [0.14.0] - 2025-12-04

### Added
- Global System Prompt Injection - Add router-wide global system prompt injection (#82)
### Fixed
- GitHub Actions - Replace deprecated actions-rs/toolchain with dtolnay/rust-toolchain
- macOS ARM64 Build - Add RUSTFLAGS for macOS ARM64 ring build
- musl Build - Switch to rustls-tls for musl cross-compilation support
### Changed
- Update GitHub Action runner
## [0.13.0] - 2025-12-04

### Added

- OpenAI Responses API (`/v1/responses`) - Full implementation of OpenAI's Responses API (#49)
  - Session-based response management with automatic expiration
  - Background cleanup task for expired sessions
  - Request/response format converter between Responses API and Chat Completions
- SecretString for API Keys - Secure API key storage using SecretString across all backends (#76)
- Model Metadata Override - Allow overriding /v1/models response fields via model-metadata.yaml (#75)
### Fixed
- True SSE Streaming - Implement proper Server-Sent Events streaming for /v1/responses API
### Changed
- Immediate Mode for SseParser - Reduced first-response latency with immediate parsing mode
- String Allocation Optimizations - Improved performance with reduced allocations
- Error Handling Standardization - Consistent error handling patterns across the codebase
### Security
- Session Access Control - Added proper access control for session management
- Input Validation - Comprehensive input validation for Responses API
## [0.12.0] - 2025-12-04

### Added
- SSRF Prevention Module - New UrlValidator module with comprehensive SSRF prevention (#66)
- Centralized HTTP Client Factory - HttpClientFactory for consistent HTTP client creation across backends (#67)
### Fixed
- Consistent Hash Algorithm - Handle exact hash matches in binary search for proper routing (#72)
- Replace Panics with Option Returns - Improve reliability by replacing panics with Option returns (#71)
- Remove Hardcoded Auth Requirement - /v1/models endpoint no longer requires hardcoded authentication
- GitHub Actions - Use GitHub App token for Projects V2 API access
### Changed
- Reorganize OpenAI Model Metadata - Model metadata organized by family for better maintainability (#74)
- Extract AnthropicStreamTransformer - Dedicated module for Anthropic stream transformation (#73)
- Split Backends Module - backends mod.rs split into separate modules for cleaner architecture (#69)
- Extract Embedded Tests - Tests moved to separate files for better organization (#68)
- Extract RequestExecutor - Shared common module for request execution (#65)
- Extract HeaderBuilder - Auth strategies moved to dedicated module (#64)
- Extract AtomicStatistics - Shared common module for atomic statistics
### Technical Improvements
- Improved code organization with modular architecture
- Implemented stats aggregation for better observability
- Enhanced security with SSRF prevention capabilities
## [0.11.0] - 2025-12-03

### Added

- Native Anthropic Claude API backend (`type: anthropic`) with OpenAI-compatible endpoint (#33)
  - Automatic API key loading from `CONTINUUM_ANTHROPIC_API_KEY` environment variable
  - Extended thinking block support for Claude thinking models
  - OpenAI to Claude reasoning parameter conversion (`reasoning_effort`)
  - Support for flat `reasoning_effort` parameter
- Claude 4, 4.1, 4.5 model metadata documentation
### Fixed

- Improve health check and model fetching for Anthropic/Gemini backends
- Add `Accept-Encoding: identity` header to streaming requests to prevent compression issues
- Fix `make_backend_request` in proxy.rs for proper Accept-Encoding handling
### Changed

- Refactor: apply code formatting and fix clippy warnings
- Refactor: use reqwest `no_gzip`/`no_brotli`/`no_deflate` instead of Accept-Encoding header
## [0.10.0] - 2025-12-03

### Added

- Native Google Gemini API backend (`type: gemini`) with OpenAI-compatible endpoint (#32)
  - Automatic API key loading from `CONTINUUM_GEMINI_API_KEY` environment variable
  - Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
  - Automatic `max_tokens` adjustment for thinking models to prevent response truncation
  - Support for `reasoning_effort` parameter
- Native OpenAI API backend (`type: openai`) with built-in configuration
  - Automatic API key loading from `CONTINUUM_OPENAI_API_KEY` environment variable
  - Built-in OpenAI model metadata in /v1/models response
- OpenAI Images API support (`/v1/images/generations`) for DALL-E and gpt-image-1 models (#35)
  - Configurable image generation timeout (`timeouts.request.image_generation`)
  - Comprehensive input validation for image generation parameters
  - Response format validation for image generation API
- Authenticated health checks for OpenAI and API-key backends
- API key authentication to streaming requests
- Filter /v1/models to show only configured models
- Allow any config file path when explicitly specified via -c/--config
- `.env.example` and typed backend configuration examples
- Comprehensive model metadata for GLM 4.6, Kimi K2, DeepSeek, GPT, and Qwen3 series
### Fixed

- Streaming response truncation for thinking models (gemini-2.5-pro, gemini-3-pro)
- Model ID normalization and streaming compatibility for Gemini backend
- Convert `max_tokens` to `max_completion_tokens` for newer OpenAI models
- Correct URL construction for all API endpoints
- Security: Remove sensitive data from debug logs
- Security: Add request body size limits to prevent DoS attacks
### Changed
- Refactor: Unify request retry logic with RequestType enum
- Refactor: Improve Gemini backend performance with lock-free statistics and slice returns
- Add Gemini backend documentation and max_tokens behavior documentation
- Add image generation API documentation
- Standardize capability naming in model-metadata.yaml
## [0.9.0] - 2025-12-02

### Added

- Native Google Gemini API backend (`type: gemini`) with OpenAI-compatible endpoint (#32)
  - Automatic API key loading from `CONTINUUM_GEMINI_API_KEY` environment variable
  - Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
  - Automatic `max_tokens` adjustment for thinking models to prevent response truncation
  - Support for `reasoning_effort` parameter
- OpenAI Images API support (`/v1/images/generations`) for DALL-E and gpt-image-1 models (#35)
- Configurable image generation timeout (`timeouts.request.image_generation`)
- Comprehensive model metadata for OpenAI models including GPT-5 family, o-series, audio/speech, video (Sora), and embedding models
- Enhanced rate limiting with token bucket algorithm (#11)
- Comprehensive Prometheus metrics and monitoring (#10)
- Configuration file migration and auto-correction CLI utility (#29)
- Comprehensive authentication for metrics endpoint
### Fixed
- CRITICAL: Eliminate race condition in token refill
- CRITICAL: Protect API keys with SHA-256 hashing
- CRITICAL: Prevent memory exhaustion via unbounded bucket growth
- CRITICAL: Prevent header injection vulnerabilities
- HIGH: Prevent IP spoofing via X-Forwarded-For manipulation
- HIGH: Implement singleton pattern for metrics to prevent memory leaks
- HIGH: Eliminate unnecessary string allocations
- HIGH: Implement model extraction for rate limiting
- Add comprehensive cardinality limits and label sanitization to prevent metric explosion DoS attacks
- Improve error handling to prevent panic conditions
- Resolve environment variable race condition in config test
- Fix integration test failure in metrics RequestTimer
- Fix unit test failures in metrics security module
### Changed
- Refactor: remove excessive Arc wrapping in rate limiting
- Reorganize documentation structure for better maintainability
- Add comprehensive metrics documentation
- Update documentation for rate limiting feature
- Remove development mock server and sample config files
- Remove temporary test files and improve gitignore
- Remove duplicate man page and update gitignore
- Update README.md to mention correct repo
- Update release workflows
## [0.8.0] - 2025-09-09

### Added
- Model ID alias support for metadata sharing (#27)
- Comprehensive rate limiting documentation
- Robust rate limiting to models endpoint to prevent DoS via cache poisoning
### Fixed
- Return empty list instead of 503 when all backends are unhealthy (#28)
- Improve error handling and classification
- Resolve clippy warnings for MutexGuard held across await points
### Changed
- Increase rate limits for /v1/models endpoint to be more practical
- Add alias feature documentation to configuration.md
## [0.7.1] - 2025-09-08

### Fixed
- Improve config path validation for home directory and executable paths (#26)
## [0.7.0] - 2025-09-07

### Added
- Extend /v1/models endpoint with rich metadata support (#23) (#25)
- Enhanced Configuration Management (#9) (#22)
- Advanced load balancing strategies with enhanced error handling (#21)
### Fixed
- Use streaming timeout configuration from config.yaml instead of hardcoded 25s limit
### Changed
- Add yaml to exclude list
## [0.6.0] - 2025-09-03

### Added
- GitHub Project automation workflow
- Comprehensive timeout configuration and model documentation updates
### Fixed
- Use timeout configuration from config.yaml instead of hardcoded values (#19)
- Fix clippy warnings and benchmark compilation issues
### Changed
- Apply cargo fmt
## [0.5.0] - 2025-09-02

### Added
- Extensible architecture with layered design (#16)
- Comprehensive integration tests and performance optimizations
- Complete service layer implementation
- Middleware architecture and enhanced backend abstraction
- Configurable connection pool size with CLI and config file support
- Comprehensive configuration management with YAML support (#7)
- Debian packaging and man page for continuum-router
### Fixed

- Handle `Option` correctly in tests
- Update test to handle streaming requests without model field gracefully
- Resolve floating-point precision and timing issues in tests
- Resolve test failures and deadlocks in object pool and SSE parser
- Resolve CI test failures and improve test performance
- Resolve config watcher test failures in CI environment
- Resolve initial health check race condition
- Critical security vulnerabilities in error handling and retry logic
- Adjust timeout test tolerance for timing variations
### Changed
- Extract complex types into type aliases for better readability
- Resolve all cargo fmt and clippy warnings
- Make retry configuration optional with sensible defaults
- Optimize config access and add comprehensive timeout management
- Update model names in timeout configuration to latest versions
- Complete documentation update
- Split oversized modules into layered architecture
### Performance
- Optimize config access and add comprehensive timeout management
## [0.4.0] - 2025-08-25

### Added
- Model-based routing with health monitoring (#6)
### Fixed
- Improve health check integration and SSE parsing for better compatibility
### Changed
- Update README.md
## [0.3.0] - 2025-08-25

### Added
- SSE streaming support for real-time chat completions (#5)
### Fixed
- Handle non-success status codes in streaming responses
- Allow streaming to continue even when backend returns 404 or other error status codes
- Send SSE error event first to notify client of the backend error status
## [0.2.0] - 2025-08-25

### Added
- Model aggregation from multiple endpoints (#4)
## [0.1.0] - 2025-08-24

### Added

- OpenAI-compatible endpoints and proxy functionality
  - `/v1/models` endpoint for listing available models
  - `/v1/completions` endpoint for the legacy OpenAI completions API
  - `/v1/chat/completions` endpoint for the chat API
- Multiple backends support with round-robin load balancing (#1)
- Fallback handler for undefined routes with proper error messages
### Fixed
- Improve error handling consistency across all endpoints
### Changed
- Update README with changelog and version information
## Migration Notes

### Upgrading to v0.16.0

- New Files API: OpenAI-compatible Files API is now available at `/v1/files`
  - Upload files for fine-tuning, batch processing, or assistants
  - Files are stored locally with persistent metadata
  - Configure via the `files_api` section in config.yaml
- File Resolution: Reference uploaded files in chat completions
  - Use file IDs in your chat messages for automatic content injection
- Persistent Metadata: File metadata now survives server restarts
  - Set `metadata_storage: persistent` (default) in the files_api config
  - Set `cleanup_orphans_on_startup: true` to auto-clean orphaned files
- Circuit Breaker: Add a `circuit_breaker` section to your config.yaml for automatic backend failover
  - Configure failure threshold, recovery timeout, and half-open requests
- New Fallback Feature: Add a `fallback` section to your config.yaml to enable automatic model fallback (see the combined sketch after this list)
  - Define fallback chains: `fallback_chains: { "gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"] }`
  - Configure trigger conditions in `fallback_policy`
  - Cross-provider fallback is supported (e.g., OpenAI → Anthropic)
- Circuit Breaker Integration: Set `circuit_breaker_open: true` in trigger_conditions to integrate with the existing circuit breaker
- Response Headers: Check the `X-Fallback-Used` header to detect when fallback was used
- GPT-5.2 Support: New GPT-5.2 model metadata is available
- No breaking changes from v0.15.0
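Putting the pieces together, a hedged sketch of the three new sections; the section names, `metadata_storage`, `cleanup_orphans_on_startup`, `fallback_chains`, `fallback_policy`, and `circuit_breaker_open` appear in the notes above, while the individual circuit-breaker field names and the exact `trigger_conditions` nesting are assumptions:

```yaml
files_api:
  metadata_storage: persistent        # default; metadata survives restarts
  cleanup_orphans_on_startup: true    # optional orphan cleanup

circuit_breaker:
  failure_threshold: 5                # assumed field name
  recovery_timeout_secs: 30           # assumed field name
  half_open_max_requests: 2           # assumed field name

fallback:
  fallback_chains:
    "gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"]
  fallback_policy:
    trigger_conditions:               # assumed nesting
      circuit_breaker_open: true      # ties fallback to the circuit breaker
```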
### Upgrading to v0.15.0

- Split /v1/models Endpoint: The `/v1/models` endpoint now returns a lightweight response by default
  - For extended metadata, use `/v1/models?extended=true`
  - This improves performance for clients that only need basic model information
- Nano Banana API: New support for Gemini Image Generation (Imagen) through an OpenAI-compatible interface
  - Use `nano-banana` or `nano-banana-pro` model names
- Error Handling: Improved reliability with proper error propagation instead of panics
- Performance: LRU cache now uses read locks for better concurrent performance
- No breaking changes from v0.14.x
### Upgrading to v0.13.0

- New Responses API: The `/v1/responses` endpoint is now available for OpenAI Responses API compatibility
  - Sessions are automatically managed, with background cleanup for expired sessions
  - True SSE streaming provides real-time responses
- Security: API keys are now stored using SecretString for improved security across all backends (#76)
- Model Metadata: Override /v1/models response fields via model-metadata.yaml (#75)
- No breaking changes from v0.12.0
### Upgrading to v0.12.0
- No breaking changes: This is a refactoring release with improved code organization
- Bug fix: Consistent hash routing now correctly handles exact hash matches
- Security: SSRF prevention module added for URL validation
- Reliability: Panics replaced with Option returns for better error handling
- API change: /v1/models endpoint no longer has hardcoded auth requirement
### Upgrading to v0.11.0

- New Anthropic backend: Add `type: anthropic` backends for native Anthropic Claude API support (sketch below)
  - Set the `CONTINUUM_ANTHROPIC_API_KEY` environment variable for authentication
  - Supports extended thinking with automatic parameter conversion
  - The OpenAI `reasoning_effort` parameter is automatically converted to Claude's thinking format
- Streaming improvements: Accept-Encoding fixes ensure proper streaming for all backends
- No breaking changes from v0.10.0
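A minimal backend entry might look like the following; `type: anthropic` and the environment variable are documented above, while `name` and the `models` layout are assumptions:

```yaml
backends:
  - name: claude             # assumed field name
    type: anthropic
    models:
      - claude-sonnet-4-5    # hypothetical model id
# auth: export CONTINUUM_ANTHROPIC_API_KEY=...
```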
### Upgrading to v0.10.0

- New OpenAI backend: Add `type: openai` backends for native OpenAI API support (sketch below)
  - Set the `CONTINUUM_OPENAI_API_KEY` environment variable for authentication
  - Built-in model metadata is automatically included in the /v1/models response
- Image Generation API: New `/v1/images/generations` endpoint for DALL-E models
  - Configure the timeout via `timeouts.request.image_generation` (default: 120s)
  - Supports `response_format` validation (`url` or `b64_json`)
- Gemini improvements: Streaming response truncation fixed for thinking models
  - Model ID normalization ensures proper routing
- API key authentication: Streaming requests now support API key authentication
- Security: Request body size limits prevent DoS attacks
- Newer OpenAI models automatically use `max_completion_tokens` instead of `max_tokens`
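A corresponding sketch for the OpenAI backend and the image-generation timeout; `type: openai` and the `timeouts.request.image_generation` key are documented above, while `name` and the timeout's exact format are assumptions:

```yaml
backends:
  - name: openai             # assumed field name
    type: openai             # model metadata ships built in
# auth: export CONTINUUM_OPENAI_API_KEY=...
timeouts:
  request:
    image_generation: 120    # default 120s per the note above; format assumed
```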
### Upgrading to v0.9.0

- New Gemini backend: Add `type: gemini` backends for native Google Gemini API support
  - Set the `CONTINUUM_GEMINI_API_KEY` environment variable for authentication
  - Thinking models (gemini-2.5-pro, gemini-3-pro) automatically get `max_tokens: 16384` if the client sends values below 4096
- Enhanced rate limiting with token bucket algorithm is now available (sketch below)
  - Configure rate limiting via the `rate_limiting` section in config.yaml
- Prometheus metrics are now available at the `/metrics` endpoint, with authentication
- Use the `--migrate-config-file` CLI option to migrate and fix configuration files
- Multiple critical security fixes have been applied to rate limiting
### Upgrading to v0.8.0

- Rate limiting is now enabled for the `/v1/models` endpoint
- An empty list is returned instead of a 503 error when all backends are unhealthy
- Model aliases are now supported for metadata sharing
### Upgrading to v0.7.0
- Enhanced configuration management requires updating configuration files
- New load balancing strategies are available
- Streaming timeout is now configurable via config.yaml
### Upgrading to v0.6.0
- Timeout configuration is now read from config.yaml instead of hardcoded values
- Update your configuration files to include timeout settings
### Upgrading to v0.5.0
- Major architectural refactoring with layered design
- Configuration management now supports YAML files
- Retry mechanisms have been enhanced with security improvements
- Connection pool size is now configurable
This changelog reflects the actual development history of Continuum Router from its initial release to the current version.