
Changelog

All notable changes to Continuum Router are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.22.0] - 2025-12-19

Added

  • Docker Support with Pre-built Binaries - Add Dockerfile and Dockerfile.alpine that download pre-built binaries from GitHub Releases (#189)
    • Debian Bookworm-based image (~50MB) for general use
    • Alpine 3.20-based image (~10MB) for minimal deployments
    • Multi-architecture support (linux/amd64, linux/arm64) using TARGETARCH
    • VERSION build argument for selecting release version
    • Non-root user execution for security
    • OCI labels for image metadata
  • Container Health Check CLI - Implement --health-check CLI argument for container orchestration (#189)
    • Returns exit code 0 if server is healthy, 1 if unhealthy
    • Optional --health-check-url for custom health endpoint
    • Proper IPv6 address handling
    • 5-second default timeout
  • Docker Compose Quick Start - Add docker-compose.yml for easy deployment (#189); a minimal sketch appears after this list
    • Volume mount for configuration
    • Environment variable support (RUST_LOG)
    • Resource limits and health checks
  • Automated Docker Image Publishing - Add Docker build and push to ghcr.io in release workflow (#189)
    • Builds both Debian and Alpine images after binary release
    • Multi-platform support (linux/amd64, linux/arm64)
    • Automatic tagging with semver (VERSION, MAJOR.MINOR, latest)
    • Alpine images tagged with -alpine suffix
    • GitHub Actions cache for faster builds
  • MkDocs Documentation Website - Build comprehensive documentation site with Material theme (#183)
    • Full navigation structure with Getting Started, Features, Operations, and Development sections
    • GitHub Actions workflow for automatic deployment to GitHub Pages
    • Custom stylesheets and theme configuration
  • Korean Documentation Translation - Complete Korean localization of all documentation (#190)
    • All 20 documentation files translated to Korean
    • Language switcher in navigation (English/Korean)
    • Multi-language build in GitHub Actions workflow
  • Dependency Security Auditing - Add cargo-deny for vulnerability scanning (#192)
    • Security advisory checking in CI workflow
    • License compliance verification
    • Dependency source validation
  • Dependabot Integration - Automated dependency updates for Cargo and GitHub Actions (#192)
  • Security Policy - Add comprehensive SECURITY.md with vulnerability reporting process (#191)
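
The Docker Compose quick start and the --health-check CLI above fit together as follows. This is a minimal sketch, not the shipped docker-compose.yml: the image path, port, mount point, and binary name are assumptions to adapt to your deployment.

```yaml
# Minimal sketch; image path, port, and mount point are assumptions --
# see the repository's docker-compose.yml for the real file.
services:
  continuum-router:
    image: ghcr.io/example/continuum-router:latest  # hypothetical image path
    ports:
      - "8080:8080"                                 # hypothetical listen port
    volumes:
      - ./config.yaml:/app/config.yaml:ro           # hypothetical config mount
    environment:
      RUST_LOG: info
    healthcheck:
      # The --health-check flag added in this release exits 0 when healthy, 1 when not.
      test: ["CMD", "continuum-router", "--health-check"]
      interval: 30s
      timeout: 10s   # the CLI itself defaults to a 5-second timeout
      retries: 3
```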

Changed

  • Integrate orphaned architecture documentation into MkDocs site (#186)
  • Rename documentation files to lowercase kebab-case for URL-friendly filenames (#186)
  • Update various GitHub Actions to latest versions (checkout@v6, setup-python@v6, upload-artifact@v6, etc.)

Fixed

  • Health check response validation logic bug (operator precedence issue)
  • Address parsing fallback that was silently hiding configuration errors
  • IPv6 address formatting in health check (now correctly uses bracket notation)

Security

  • Updated reqwest 0.11→0.12, prometheus 0.13→0.14, validator 0.18→0.20
  • Replaced dotenv with dotenvy for better maintenance
  • Added .dockerignore to exclude sensitive files from build context

[0.21.0] - 2025-12-19

Added

  • Gemini 3 Flash Preview Model - Add support for gemini-3-flash-preview model (#168)
  • Backend Error Passthrough - Pass through detailed error messages from backends for 4xx responses (#177)
    • Parse and forward original error messages from OpenAI, Anthropic, and Gemini backends
    • Preserve param field when available (useful for invalid parameter errors)
    • Falls back to generic error message if backend response cannot be parsed
    • Error format remains OpenAI-compatible
    • Comprehensive unit tests for error parsing across all backend formats
  • Default Authentication Mode for API Endpoints - Configurable authentication enforcement for API endpoints (#173)
    • New mode field in api_keys configuration: permissive (default) or blocking
    • permissive mode: Requests without API key are allowed (backward compatible)
    • blocking mode: Only authenticated requests are processed, unauthenticated requests receive 401
    • Protected endpoints: /v1/chat/completions, /v1/completions, /v1/responses, /v1/images/*, /v1/models
    • Health endpoints (/health, /healthz) always accessible without authentication
    • Hot reload support for authentication mode changes
    • Comprehensive integration tests for both modes
    • Updated API.md, configuration.md, and manpage documentation
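
A minimal sketch of the new authentication mode; the mode field and its two values come from this release, while the surrounding api_keys layout is assumed:

```yaml
api_keys:
  # permissive (default): requests without an API key pass through.
  # blocking: unauthenticated requests to protected endpoints receive 401;
  # /health and /healthz stay open either way.
  mode: blocking
```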

Fixed

  • UTF-8 Multi-byte Character Corruption - Handle UTF-8 multi-byte character corruption in streaming responses (#179)
  • GPT Image response_format - Strip response_format parameter for GPT Image models (#176)
  • Auto-discovery Validation - Allow auto-discovery for all backends except Anthropic (#172)

Changed

  • Updated architecture.md and fixed documentation issues (#167, #169)
  • Added AGENTS.md and linked CLAUDE.md to it

[0.20.0] - 2025-12-18

Added

  • Image Variations Support for Gemini - Add image variations support for Gemini (nano-banana) models (#165)
  • Image Edit Support for Gemini - Implement limited image edit support for Gemini (nano-banana) models (#164)
  • Enhanced Image Generation - Enhance /v1/images/generations with streaming and GPT Image features (#161)
  • GPT Image 1.5 Model - Add gpt-image-1.5 model support (#159)
  • Image Variations Endpoint - Implement /v1/images/variations endpoint for image variations (#155)
  • Image Edits Endpoint - Implement /v1/images/edits endpoint for image editing (inpainting) (#156)
    • Full OpenAI Images Edit API compatibility
    • Supports GPT Image models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5 (recommended)
    • Legacy support for dall-e-2 model
    • Multipart form-data parsing with shared utilities
    • PNG image validation (format, size, square dimensions)
    • Optional mask validation (dimension matching with source image)
  • Shared Image Utilities - Implement shared utilities for image edit/variations endpoints (#154)
  • External Prompt Files - Support loading system prompts from external Markdown files (#146); sketched after this list
    • New prompt_file field in BackendPromptConfig and ModelPromptConfig
    • New default_file and prompts_dir fields in GlobalPromptConfig
    • Secure path validation with path traversal attack prevention
    • REST API endpoints for prompt file management
    • File caching with size limits (100 entries max, 50MB total)
    • Hot-reload support for prompt files
  • Solar Open 100B Model - Add Solar Open 100B model metadata
  • Automatic Model Discovery - Backends automatically discover available models from /v1/models API when models are not explicitly configured (#142)
    • OpenAI, Gemini, and vLLM backends support auto-discovery
    • Ollama backend uses vLLM's discovery mechanism (OpenAI-compatible API)
    • 10-second timeout prevents blocking startup
    • Falls back to hardcoded defaults if discovery fails
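
The prompt-file fields and auto-discovery behavior above can be sketched together. Only the field names (prompt_file, prompts_dir, default_file) come from this release; the nesting and the backend entry shown are assumptions:

```yaml
# Nesting is assumed; only the field names come from this release.
prompts:
  prompts_dir: ./prompts        # GlobalPromptConfig.prompts_dir
  default_file: default.md      # GlobalPromptConfig.default_file

backends:
  - name: local-vllm            # hypothetical backend entry
    type: vllm
    prompt_file: vllm-system.md # BackendPromptConfig.prompt_file
    # No explicit models list: the backend queries its /v1/models API at
    # startup (10-second timeout) and falls back to hardcoded defaults.
```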

Changed

  • BackendFactory::create_backend_from_typed_config() is now async to support async model discovery
  • Backend from_config() methods for OpenAI, Gemini, and vLLM are now async

Security

  • API Key Redaction - Implement API key redaction to prevent credential exposure (#150)

Performance

  • Binary Size Optimization - Optimize release binary size from 20MB to 6MB (70% reduction) (#144)

Refactored

  • Split large files for Priority 2 of issue #147
  • Split large files to keep each under 500 lines (#148)

[0.19.0] - 2025-12-13

Added

  • Runtime Configuration Management API - Comprehensive REST API for viewing and modifying configuration at runtime (#139)
    • Configuration Query APIs:
      • GET /admin/config/full - Retrieve full configuration with sensitive info masked
      • GET /admin/config/sections - List all 15 configuration sections
      • GET /admin/config/{section} - Get specific section configuration
      • GET /admin/config/schema - JSON Schema for client-side validation
    • Configuration Modification APIs:
      • PUT /admin/config/{section} - Replace section configuration
      • PATCH /admin/config/{section} - Partial update (JSON merge patch)
      • POST /admin/config/validate - Validate configuration before applying
      • POST /admin/config/apply - Apply configuration with hot reload
    • Configuration Save/Restore APIs:
      • POST /admin/config/export - Export configuration (YAML/JSON/TOML)
      • POST /admin/config/import - Import and apply configuration
      • GET /admin/config/history - View configuration change history
      • POST /admin/config/rollback/{version} - Rollback to previous version
    • Backend Management APIs:
      • POST /admin/backends - Add new backend
      • GET /admin/backends/{name} - Get backend configuration
      • PUT /admin/backends/{name} - Update backend configuration
      • DELETE /admin/backends/{name} - Remove backend
      • PUT /admin/backends/{name}/weight - Update backend weight
      • PUT /admin/backends/{name}/models - Update backend model list
    • Sensitive information masking for API keys, passwords, tokens
    • JSON Schema generation for all configuration sections
    • Configuration history tracking (up to 100 entries, configurable; sketched after this list)
    • Memory-efficient history storage with size-based eviction (10MB limit)
    • Atomic version counter using AtomicU64 for thread safety
    • Structured error responses with error codes
  • Admin REST API Documentation - Comprehensive developer guide (docs/admin-api.md)
    • Complete API reference with request/response examples
    • Client SDK examples for Python, JavaScript/TypeScript, and Go
    • Best practices and security considerations
  • Integration Tests - 33 integration tests for Configuration Management API endpoints
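
History retention limits are configurable via AdminConfig (see the LOW item under Fixed below). The field names in this sketch are hypothetical stand-ins; docs/admin-api.md is the authoritative reference:

```yaml
# Field names here are hypothetical stand-ins; consult docs/admin-api.md.
admin:
  config_history:
    max_entries: 100          # history entries kept (release default)
    max_size_bytes: 10485760  # 10MB size-based eviction limit
```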

Fixed

  • CRITICAL: Configuration changes now actually applied to running system
  • CRITICAL: Memory growth controlled with JSON string storage and size-based eviction
  • HIGH: Input validation added (1MB content limit, 32-level nesting depth)
  • HIGH: Sensitive export requires elevated permission and audit logging
  • HIGH: Comprehensive sensitive field detection (30+ patterns)
  • MEDIUM: Validation functions now perform actual validation
  • MEDIUM: Race condition fixed with AtomicU64 for version counter
  • MEDIUM: Colon removed from allowed backend name characters
  • MEDIUM: Structured error responses with error codes
  • MEDIUM: Initialize flag prevents duplicate history entries
  • LOW: Unnecessary clones removed for better performance
  • LOW: Limits now configurable via AdminConfig
  • LOW: Duplicate validation logic refactored
  • LOW: Test coverage improved for edge cases

Changed

  • Enhanced documentation for Configuration Management API across all guides
  • Updated manpage with new admin endpoints
  • Updated API.md with comprehensive Configuration Management API section

[0.18.0] - 2025-12-13

Added

  • Per-API-Key Rate Limiting - Implement per-API-key rate limiting (#137)
    • Individual rate limits for each API key
    • Configurable requests per minute per key
  • API Key Management System - Comprehensive API key management and configuration system; sketched after this list
    • Multiple key sources: config file, external file, environment variables
    • Key properties: scopes, rate limits, expiration, enabled status
    • Hot reload support for key configuration changes
  • Files API Authentication - Implement authentication and authorization for Files API (#131)
    • API key authentication for file operations
    • File ownership enforcement
    • Admin access control for all files
  • Hot Reload for Runtime Configuration - Complete hot reload functionality for runtime configuration updates (#130)
    • Automatic configuration file watching
    • Classified updates: immediate, gradual, restart-required
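
A sketch combining the per-key properties above. The concepts (scopes, per-key rate limits, expiration, enabled status) are from this release, but every field name below is illustrative rather than the exact schema:

```yaml
# Illustrative field names only, not the exact schema.
api_keys:
  keys:
    - key: "sk-team-a"              # or load from file / environment
      scopes: ["chat"]
      rate_limit:
        requests_per_minute: 60     # per-API-key limit (#137)
      expires_at: "2026-01-01T00:00:00Z"
      enabled: true
```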

Changed

  • Major refactoring with modular structure
    • Extract CLI and app utilities into modular structure (#132)
    • Split converter.rs into modular structure (#132)
    • Split large source files into modular structure
    • Consolidate find_gemini_backend function logic
  • Updated architecture.md to reflect refactored module structure

Fixed

  • Add ConnectInfo extension for admin/metrics/files endpoints
  • Address security vulnerabilities in API key management
  • Address code quality issues in API key management

Documentation

  • Add API key management documentation
  • Add comprehensive API key management tests

[0.17.0] - 2025-12-12

Added

  • Anthropic Backend File Content Transformation - Files uploaded to the router can now be used with Anthropic backend (#126)
    • Automatic conversion of file content to Anthropic message format
    • Support for text and document files with base64 encoding
    • Seamless integration with file resolution middleware
  • Gemini Backend File Content Transformation - Files uploaded to the router can now be used with Gemini backend (#127)
    • Automatic conversion of file content to Gemini API format
    • Support for inline data with proper MIME type handling
    • Cross-provider file support enables files uploaded once to work across all backends

Fixed

  • Streaming File Uploads - Implement streaming file uploads to prevent memory exhaustion (#128)
    • Large file uploads no longer load entire file into memory
    • Streaming processing for efficient memory usage
    • Prevents OOM errors when uploading large files

Changed

  • None

[0.16.0] - 2025-12-12

Added

  • OpenAI-Compatible Files API - Full implementation of OpenAI Files API endpoints (#111)
    • Upload files with multipart/form-data support
    • List, retrieve, and delete files
    • Download file content
    • Supports purpose: fine-tune, batch, assistants, user_data
  • File Resolution Middleware - Automatic file content injection for chat completions (#120)
    • Reference uploaded files in chat messages with file IDs
    • Automatic content injection into chat context
  • Persistent Metadata Storage - File metadata persists across server restarts (#125); see the config sketch after this list
    • Sidecar JSON files (.meta.json) stored alongside data files
    • Automatic recovery on startup with metadata rebuild from files
    • Orphan file detection and optional cleanup
  • OpenAI Backend File Handling - Files uploaded locally are forwarded to OpenAI when needed (#121, #122)
  • GPT-5.2 Model Support - Added GPT-5.2 model metadata to OpenAI backend (#124)
  • Circuit Breaker Pattern - Automatic backend failover with circuit breaker (#93)
    • States: Closed → Open → Half-Open → Closed cycle
    • Configurable failure thresholds and recovery timeout
    • Per-backend circuit breaker instances
    • Admin endpoints for circuit breaker status and control
  • Admin Endpoint Authentication - Secure admin endpoints with authentication and audit logging
  • Configurable Fallback Models - Automatic model fallback for unavailable model scenarios (#50)
    • Define fallback chains for primary models (e.g., gpt-4o → gpt-4-turbo → gpt-3.5-turbo)
    • Cross-provider fallback support (e.g., OpenAI → Anthropic)
    • Automatic parameter translation between providers
    • Integration with circuit breaker for layered failover protection
    • Configurable trigger conditions (error codes, timeout, connection error, circuit breaker open)
    • Response headers indicate when fallback was used (X-Fallback-Used, X-Original-Model, X-Fallback-Model)
    • Prometheus metrics for fallback monitoring
  • Pre-commit Hook - Automated code formatting and linting before commits
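
For the persistent metadata store, the Migration Notes for v0.16.0 below name the relevant keys; this minimal files_api sketch repeats them, with any other layout details assumed:

```yaml
files_api:
  metadata_storage: persistent      # default; sidecar .meta.json files
  cleanup_orphans_on_startup: true  # opt-in cleanup of orphaned files
```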

Fixed

  • Fallback Chain Validation - Integrate chain validation into Validate derive
  • Fallback Performance - Use index-based lookup for fallback chain traversal
  • Lock Contention - Reduce lock contention in FallbackService with snapshot pattern
  • Security - Sanitize fallback error headers and metric labels
  • Circuit Breaker Security - Add backend name validation in admin endpoints
  • Thread Safety - Use CAS loop for thread-safe half-open request limiting

Changed

  • Documentation Updates - Comprehensive documentation for fallback configuration, circuit breaker, and Files API
  • Code Quality - Fix clippy warnings and format code
  • Pre-commit Hook Location - Move pre-commit hook to .githooks directory

[0.15.0] - 2025-12-05

Added

  • Nano Banana API Support - Add Gemini Image Generation API support with OpenAI-compatible interface (#102)
    • Supports nano-banana and nano-banana-pro models
    • Automatic format conversion between OpenAI Images API and Gemini Imagen API
  • Split /v1/models Endpoint - Standard lightweight response vs extended metadata response (#101)
    • /v1/models returns lightweight response for better performance
    • /v1/models?extended=true returns full metadata for detailed model information

Changed

  • Extract StreamService - Streaming handler logic extracted to dedicated StreamService for modular architecture (#106)
  • Eliminate Retry Logic Duplication - Consolidated retry logic code in proxy.rs (#103)

Fixed

  • Proper Error Propagation - Replace .expect() panics with proper error propagation in HttpClientFactory (#104)

Performance

  • LRU Cache Optimization - Use read lock instead of write lock for cache lookups (#105)

[0.14.2] - 2025-12-05

Added

  • Token Usage Logging - Log input/output token counts on request completion (#92)
  • Exclude List for Reports - Add exclude list configuration for reports

Changed

  • None

Fixed

  • None

[0.14.1] - 2025-12-05

Added

  • TTFB Benchmark Targets - Add TTFB benchmark targets to Makefile
  • Connection Pre-warming - Add connection pre-warming for Anthropic, Gemini, OpenAI backends

Fixed

  • Anthropic Backend TTFT - Optimize Anthropic backend TTFT with connection pooling and HTTP/2 (#90)
  • Gemini Backend TTFT - Optimize Gemini backend TTFT with connection pooling and HTTP/2 (#88)
  • Model Metadata Alias Matching - Apply base name fallback matching to aliases in model metadata lookup (#84)

Changed

  • Shared HTTP Client - Share HTTP client between HealthChecker and request handler
  • Updated architecture and performance documentation

[0.14.0] - 2025-12-04

Added

  • Global System Prompt Injection - Add router-wide global system prompt injection (#82)

Fixed

  • GitHub Actions - Replace deprecated actions-rs/toolchain with dtolnay/rust-toolchain
  • macOS ARM64 Build - Add RUSTFLAGS for macOS ARM64 ring build
  • musl Build - Switch to rustls-tls for musl cross-compilation support

Changed

  • Update GitHub Action runner

[0.13.0] - 2025-12-04

Added

  • OpenAI Responses API (/v1/responses) - Full implementation of OpenAI's Responses API (#49)
    • Session-based response management with automatic expiration
    • Background cleanup task for expired sessions
    • Request/response format converter between Responses API and Chat Completions
  • SecretString for API Keys - Secure API key storage using SecretString across all backends (#76)
  • Model Metadata Override - Allow overriding /v1/models response fields via model-metadata.yaml (#75)

Fixed

  • True SSE Streaming - Implement proper Server-Sent Events streaming for /v1/responses API

Changed

  • Immediate Mode for SseParser - Reduced first-response latency with immediate parsing mode
  • String Allocation Optimizations - Improved performance with reduced allocations
  • Error Handling Standardization - Consistent error handling patterns across the codebase

Security

  • Session Access Control - Added proper access control for session management
  • Input Validation - Comprehensive input validation for Responses API

[0.12.0] - 2025-12-04

Added

  • SSRF Prevention Module - New UrlValidator module with comprehensive SSRF prevention (#66)
  • Centralized HTTP Client Factory - HttpClientFactory for consistent HTTP client creation across backends (#67)

Fixed

  • Consistent Hash Algorithm - Handle exact hash matches in binary search for proper routing (#72)
  • Replace Panics with Option Returns - Improve reliability by replacing panics with Option returns (#71)
  • Remove Hardcoded Auth Requirement - /v1/models endpoint no longer requires hardcoded authentication
  • GitHub Actions - Use GitHub App token for Projects V2 API access

Changed

  • Reorganize OpenAI Model Metadata - Model metadata organized by family for better maintainability (#74)
  • Extract AnthropicStreamTransformer - Dedicated module for Anthropic stream transformation (#73)
  • Split Backends Module - backends mod.rs split into separate modules for cleaner architecture (#69)
  • Extract Embedded Tests - Tests moved to separate files for better organization (#68)
  • Extract RequestExecutor - Shared common module for request execution (#65)
  • Extract HeaderBuilder - Auth strategies moved to dedicated module (#64)
  • Extract AtomicStatistics - Shared common module for atomic statistics

Technical Improvements

  • Improved code organization with modular architecture
  • Implemented stats aggregation for better observability
  • Enhanced security with SSRF prevention capabilities

[0.11.0] - 2025-12-03

Added

  • Native Anthropic Claude API backend (type: anthropic) with OpenAI-compatible endpoint (#33)
    • Automatic API key loading from CONTINUUM_ANTHROPIC_API_KEY environment variable
    • Extended thinking block support for Claude thinking models
    • OpenAI to Claude reasoning parameter conversion (reasoning_effort)
    • Support for flat reasoning_effort parameter
  • Claude 4, 4.1, 4.5 model metadata documentation

Fixed

  • Improve health check and model fetching for Anthropic/Gemini backends
  • Add Accept-Encoding: identity header to streaming requests to prevent compression issues
  • Fix make_backend_request in proxy.rs for proper Accept-Encoding handling

Changed

  • Refactor: apply code formatting and fix clippy warnings
  • Refactor: use reqwest no_gzip/no_brotli/no_deflate instead of Accept-Encoding header

[0.10.0] - 2025-12-03

Added

  • Native Google Gemini API backend (type: gemini) with OpenAI-compatible endpoint (#32)
    • Automatic API key loading from CONTINUUM_GEMINI_API_KEY environment variable
    • Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
    • Automatic max_tokens adjustment for thinking models to prevent response truncation
    • Support for reasoning_effort parameter
  • Native OpenAI API backend (type: openai) with built-in configuration
    • Automatic API key loading from CONTINUUM_OPENAI_API_KEY environment variable
    • Built-in OpenAI model metadata in /v1/models response
  • OpenAI Images API support (/v1/images/generations) for DALL-E and gpt-image-1 models (#35)
    • Configurable image generation timeout (timeouts.request.image_generation)
    • Comprehensive input validation for image generation parameters
    • Response format validation for image generation API
  • Authenticated health checks for OpenAI and API-key backends
  • API key authentication to streaming requests
  • Filter /v1/models to show only configured models
  • Allow any config file path when explicitly specified via -c/--config
  • .env.example and typed backend configuration examples
  • Comprehensive model metadata for GLM 4.6, Kimi K2, DeepSeek, GPT, and Qwen3 series

Fixed

  • Streaming response truncation for thinking models (gemini-2.5-pro, gemini-3-pro)
  • Model ID normalization and streaming compatibility for Gemini backend
  • Convert max_tokens to max_completion_tokens for newer OpenAI models
  • Correct URL construction for all API endpoints
  • Security: Remove sensitive data from debug logs
  • Security: Add request body size limits to prevent DoS attacks

Changed

  • Refactor: Unify request retry logic with RequestType enum
  • Refactor: Improve Gemini backend performance with lock-free statistics and slice returns
  • Add Gemini backend documentation and max_tokens behavior documentation
  • Add image generation API documentation
  • Standardize capability naming in model-metadata.yaml

[0.9.0] - 2025-12-02

Added

  • Native Google Gemini API backend (type: gemini) with OpenAI-compatible endpoint (#32)
    • Automatic API key loading from CONTINUUM_GEMINI_API_KEY environment variable
    • Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
    • Automatic max_tokens adjustment for thinking models to prevent response truncation
    • Support for reasoning_effort parameter
  • OpenAI Images API support (/v1/images/generations) for DALL-E and gpt-image-1 models (#35)
  • Configurable image generation timeout (timeouts.request.image_generation)
  • Comprehensive model metadata for OpenAI models including GPT-5 family, o-series, audio/speech, video (Sora), and embedding models
  • Enhanced rate limiting with token bucket algorithm (#11)
  • Comprehensive Prometheus metrics and monitoring (#10)
  • Configuration file migration and auto-correction CLI utility (#29)
  • Comprehensive authentication for metrics endpoint

Fixed

  • CRITICAL: Eliminate race condition in token refill
  • CRITICAL: Protect API keys with SHA-256 hashing
  • CRITICAL: Prevent memory exhaustion via unbounded bucket growth
  • CRITICAL: Prevent header injection vulnerabilities
  • HIGH: Prevent IP spoofing via X-Forwarded-For manipulation
  • HIGH: Implement singleton pattern for metrics to prevent memory leaks
  • HIGH: Eliminate unnecessary string allocations
  • HIGH: Implement model extraction for rate limiting
  • Add comprehensive cardinality limits and label sanitization to prevent metric explosion DoS attacks
  • Improve error handling to prevent panic conditions
  • Resolve environment variable race condition in config test
  • Fix integration test failure in metrics RequestTimer
  • Fix unit test failures in metrics security module

Changed

  • Refactor: remove excessive Arc wrapping in rate limiting
  • Reorganize documentation structure for better maintainability
  • Add comprehensive metrics documentation
  • Update documentation for rate limiting feature
  • Remove development mock server and sample config files
  • Remove temporary test files and improve gitignore
  • Remove duplicate man page and update gitignore
  • Update README.md to mention correct repo
  • Update release workflows

[0.8.0] - 2025-09-09

Added

  • Model ID alias support for metadata sharing (#27)
  • Comprehensive rate limiting documentation
  • Robust rate limiting on the /v1/models endpoint to prevent DoS via cache poisoning

Fixed

  • Return empty list instead of 503 when all backends are unhealthy (#28)
  • Improve error handling and classification
  • Resolve clippy warnings for MutexGuard held across await points

Changed

  • Increase rate limits for /v1/models endpoint to be more practical
  • Add alias feature documentation to configuration.md

[0.7.1] - 2025-09-08

Fixed

  • Improve config path validation for home directory and executable paths (#26)

[0.7.0] - 2025-09-07

Added

  • Extend /v1/models endpoint with rich metadata support (#23) (#25)
  • Enhanced Configuration Management (#9) (#22)
  • Advanced load balancing strategies with enhanced error handling (#21)

Fixed

  • Use streaming timeout configuration from config.yaml instead of hardcoded 25s limit

Changed

  • Add yaml to exclude list

[0.6.0] - 2025-09-03

Added

  • GitHub Project automation workflow
  • Comprehensive timeout configuration and model documentation updates

Fixed

  • Use timeout configuration from config.yaml instead of hardcoded values (#19)
  • Fix clippy warnings and benchmark compilation issues

Changed

  • Apply cargo fmt

[0.5.0] - 2025-09-02

Added

  • Extensible architecture with layered design (#16)
  • Comprehensive integration tests and performance optimizations
  • Complete service layer implementation
  • Middleware architecture and enhanced backend abstraction
  • Configurable connection pool size with CLI and config file support
  • Comprehensive configuration management with YAML support (#7)
  • Debian packaging and man page for continuum-router

Fixed

  • Handle Option correctly in tests
  • Update test to handle streaming requests without model field gracefully
  • Resolve floating-point precision and timing issues in tests
  • Resolve test failures and deadlocks in object pool and SSE parser
  • Resolve CI test failures and improve test performance
  • Resolve config watcher test failures in CI environment
  • Resolve initial health check race condition
  • Critical security vulnerabilities in error handling and retry logic
  • Adjust timeout test tolerance for timing variations

Changed

  • Extract complex types into type aliases for better readability
  • Resolve all cargo fmt and clippy warnings
  • Make retry configuration optional with sensible defaults
  • Optimize config access and add comprehensive timeout management
  • Update model names in timeout configuration to latest versions
  • Complete documentation update
  • Split oversized modules into layered architecture

Performance

  • Optimize config access and add comprehensive timeout management

[0.4.0] - 2025-08-25

Added

  • Model-based routing with health monitoring (#6)

Fixed

  • Improve health check integration and SSE parsing for better compatibility

Changed

  • Update README.md

[0.3.0] - 2025-08-25

Added

  • SSE streaming support for real-time chat completions (#5)

Fixed

  • Handle non-success status codes in streaming responses
  • Allow streaming to continue even when backend returns 404 or other error status codes
  • Send SSE error event first to notify client of the backend error status

[0.2.0] - 2025-08-25

Added

  • Model aggregation from multiple endpoints (#4)

[0.1.0] - 2025-08-24

Added

  • OpenAI-compatible endpoints and proxy functionality
  • /v1/models endpoint for listing available models
  • /v1/completions endpoint for legacy OpenAI completions API
  • /v1/chat/completions endpoint for chat API
  • Multiple backends support with round-robin load balancing (#1)
  • Fallback handler for undefined routes with proper error messages

Fixed

  • Improve error handling consistency across all endpoints

Changed

  • Update README with changelog and version information

Migration Notes

Upgrading to v0.16.0

  • New Files API: OpenAI-compatible Files API is now available at /v1/files
    • Upload files for fine-tuning, batch processing, or assistants
    • Files are stored locally with persistent metadata
    • Configure via files_api section in config.yaml
  • File Resolution: Reference uploaded files in chat completions
    • Use file IDs in your chat messages for automatic content injection
  • Persistent Metadata: File metadata now survives server restarts
    • Set metadata_storage: persistent (default) in files_api config
    • Set cleanup_orphans_on_startup: true to auto-clean orphaned files
  • Circuit Breaker: Add circuit_breaker section to your config.yaml for automatic backend failover
    • Configure failure threshold, recovery timeout, and half-open requests
  • New Fallback Feature: Add fallback section to your config.yaml to enable automatic model fallback (sketched below)
    • Define fallback chains: fallback_chains: { "gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"] }
    • Configure trigger conditions in fallback_policy
    • Cross-provider fallback is supported (e.g., OpenAI → Anthropic)
  • Circuit Breaker Integration: Set circuit_breaker_open: true in trigger_conditions to integrate with existing circuit breaker
  • Response Headers: Check X-Fallback-Used header to detect when fallback was used
  • GPT-5.2 Support: New GPT-5.2 model metadata is available
  • No breaking changes from v0.15.0
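
Putting the fallback and circuit-breaker pieces above together: fallback_chains, fallback_policy, trigger_conditions, and circuit_breaker_open are named in these notes, while the circuit_breaker keys shown are assumed names.

```yaml
circuit_breaker:
  failure_threshold: 5     # assumed key name
  recovery_timeout: 30s    # assumed key name
fallback:
  fallback_chains:
    "gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"]
  fallback_policy:
    trigger_conditions:
      circuit_breaker_open: true   # layer fallback on the circuit breaker
```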

Upgrading to v0.15.0

  • Split /v1/models Endpoint: The /v1/models endpoint now returns a lightweight response by default
    • For extended metadata, use /v1/models?extended=true
    • This improves performance for clients that only need basic model information
  • Nano Banana API: New support for Gemini Image Generation (Imagen) through OpenAI-compatible interface
    • Use nano-banana or nano-banana-pro model names
  • Error Handling: Improved reliability with proper error propagation instead of panics
  • Performance: LRU cache now uses read locks for better concurrent performance
  • No breaking changes from v0.14.x

Upgrading to v0.13.0

  • New Responses API: The /v1/responses endpoint is now available for OpenAI Responses API compatibility
    • Sessions are automatically managed with background cleanup for expired sessions
    • True SSE streaming provides real-time responses
  • Security: API keys are now stored using SecretString for improved security across all backends (#76)
  • Model Metadata: Override /v1/models response fields via model-metadata.yaml (#75)
  • No breaking changes from v0.12.0

Upgrading to v0.12.0

  • No breaking changes: This is a refactoring release with improved code organization
  • Bug fix: Consistent hash routing now correctly handles exact hash matches
  • Security: SSRF prevention module added for URL validation
  • Reliability: Panics replaced with Option returns for better error handling
  • API change: /v1/models endpoint no longer has hardcoded auth requirement

Upgrading to v0.11.0

  • New Anthropic backend: Add type: anthropic backends for native Anthropic Claude API support (sketched after this list)
    • Set CONTINUUM_ANTHROPIC_API_KEY environment variable for authentication
    • Supports extended thinking with automatic parameter conversion
    • OpenAI reasoning_effort parameter is automatically converted to Claude's thinking format
  • Streaming improvements: Accept-Encoding fixes ensure proper streaming for all backends
  • No breaking changes from v0.10.0
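
A minimal backend entry for the new type; type: anthropic and the environment variable are documented above, while the list layout and name field are assumptions:

```yaml
# CONTINUUM_ANTHROPIC_API_KEY must be set in the environment.
backends:
  - name: claude       # assumed field
    type: anthropic    # documented backend type
```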

Upgrading to v0.10.0

  • New OpenAI backend: Add type: openai backends for native OpenAI API support
    • Set CONTINUUM_OPENAI_API_KEY environment variable for authentication
    • Built-in model metadata is automatically included in /v1/models response
  • Image Generation API: New /v1/images/generations endpoint for DALL-E models
    • Configure timeout via timeouts.request.image_generation (default: 120s); sketched after this list
    • Supports response_format validation (url or b64_json)
  • Gemini improvements: Streaming response truncation fixed for thinking models
    • Model ID normalization ensures proper routing
  • API key authentication: Streaming requests now support API key authentication
  • Security: Request body size limits prevent DoS attacks
  • Newer OpenAI models automatically use max_completion_tokens instead of max_tokens
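
The image-generation timeout key path is documented above; only the duration format in this sketch is assumed:

```yaml
timeouts:
  request:
    image_generation: 120s   # default 120s; duration syntax assumed
```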

Upgrading to v0.9.0

  • New Gemini backend: Add type: gemini backends for native Google Gemini API support
    • Set CONTINUUM_GEMINI_API_KEY environment variable for authentication
    • Thinking models (gemini-2.5-pro, gemini-3-pro) automatically get max_tokens: 16384 if client sends values below 4096
  • Enhanced rate limiting with token bucket algorithm is now available
  • Configure rate limiting via rate_limiting section in config.yaml (sketched after this list)
  • Prometheus metrics are now available at /metrics endpoint with authentication
  • Use --migrate-config-file CLI option to migrate and fix configuration files
  • Multiple critical security fixes have been applied to rate limiting
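
Rate limiting is configured via the rate_limiting section named above; the token-bucket parameters below are hypothetical field names, not the exact schema:

```yaml
rate_limiting:
  enabled: true              # hypothetical field
  requests_per_minute: 120   # hypothetical token-bucket refill rate
  burst_size: 20             # hypothetical bucket capacity
```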

Upgrading to v0.8.0

  • Rate limiting is now enabled for the /v1/models endpoint
  • Empty list is returned instead of 503 error when all backends are unhealthy
  • Model aliases are now supported for metadata sharing

Upgrading to v0.7.0

  • Enhanced configuration management requires updating configuration files
  • New load balancing strategies are available
  • Streaming timeout is now configurable via config.yaml

Upgrading to v0.6.0

  • Timeout configuration is now read from config.yaml instead of hardcoded values
  • Update your configuration files to include timeout settings

Upgrading to v0.5.0

  • Major architectural refactoring with layered design
  • Configuration management now supports YAML files
  • Retry mechanisms have been enhanced with security improvements
  • Connection pool size is now configurable

This changelog reflects the actual development history of Continuum Router from its initial release to the current version.