# Changelog

All notable changes to Continuum Router are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [Unreleased]

### Added
- Cohere/Jina-Compatible Rerank and Sparse Embedding Endpoints - Add support for advanced retrieval APIs (#374)
  - New `/v1/rerank` endpoint (Cohere-compatible) for document reranking as a second-stage retrieval step
  - New `/embed_sparse` endpoint (TEI/Jina-compatible) for sparse embeddings (SPLADE format)
  - Supports both simple string documents and structured documents with a text field for reranking
  - Request/response types with comprehensive validation for model, query, and documents fields
  - New capability mappings: `rerank` -> `rerank` method, `sparse_embedding` -> `embed_sparse` method
  - Example models added to model-metadata.yaml: BGE Reranker, Jina Reranker, SPLADE models
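
For reference, a hypothetical `/v1/rerank` request in the Cohere-compatible shape described above; the model name and the `top_n` field are illustrative assumptions:

```json
{
  "model": "bge-reranker-v2-m3",
  "query": "how does the router pick a backend?",
  "documents": [
    "A plain string document about load balancing.",
    { "text": "A structured document; the text field carries the content." }
  ],
  "top_n": 1
}
```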
- BGE-M3 and Multilingual Embedding Model Support - Add model metadata and configuration examples for BGE-M3 and equivalent multilingual embedding models (#373)
  - BGE-M3: 568M parameters, 1024 dimensions, 100+ languages, 8192 context. Supports dense, sparse (lexical), and ColBERT multi-vector retrieval
  - BGE-Large-EN-v1.5: 335M parameters, 1024 dimensions, English-only, 512 context
  - Multilingual-E5-Large: 560M parameters, 1024 dimensions, 100+ languages, 514 context
  - Example backend configurations for vLLM, Ollama, and Text Embeddings Inference (TEI) deployments
  - Addresses cross-lingual retrieval requirements for RAG systems
- Plain Text Support for Anthropic File Transformer - Add text/plain support to the Anthropic file transformer (#342)
  - Text files are converted to `document` blocks with base64 data (same format as PDF)
  - Maximum text file size: 32MB (same as PDF)
  - Text files don't have magic bytes validation (accepts any content)
  - Updated SUPPORTED_DOCUMENT_TYPES to include `text/plain` alongside `application/pdf`
  - Updated error messages to mention plain text support
- PDF Support for OpenAI and Anthropic File Transformers - Add PDF file support to file transformers (#340)
  - OpenAI transformer: PDFs are converted to `file` blocks with base64 data or file IDs
  - Anthropic transformer: PDFs are converted to `document` blocks with base64 data
  - PDF magic bytes validation (`%PDF-` signature) for security
  - Maximum PDF size: 32MB (100 page limit enforced by backends)
  - Images remain at 20MB limit
- Native Anthropic Responses API Support - Add native Anthropic Messages API conversion for the Responses API (#332)
  - New `AnthropicConverter` for converting Responses API requests to native Anthropic Messages format
  - Full PDF file support via Anthropic's document understanding (`input_file` with `file_data`)
  - Image file support with automatic media type detection
  - Extended thinking (reasoning) content support for Claude 3+ models
  - Streaming support with proper SSE event transformation from Anthropic format
  - Non-streaming support with complete response transformation

### Fixed

- SSRF Validation for External File URLs - Add SSRF protection when fetching external files (#332)
  - Private IP address validation (blocks 10.x.x.x, 172.16-31.x.x, 192.168.x.x, 127.x.x.x)
  - Localhost and link-local address blocking
  - IPv6 loopback and link-local address blocking
  - DNS rebinding protection via IP validation after resolution
- Media Type Whitelist for File Inputs - Add security whitelist for allowed file types (#332)
  - PDF: `application/pdf`
  - Images: `image/jpeg`, `image/png`, `image/gif`, `image/webp`
  - Rejects unsupported media types with clear error messages
- AI SDK Compatibility for Responses API Streaming - Fix Vercel AI SDK compatibility issues with Responses API streaming (#334)
  - Updated `ResponseStreamEvent` serialization to use dot-separated type names matching the OpenAI spec (e.g., `"type": "response.output_text.done"` instead of `"type": "output_text_done"`)
  - Added `item_id` field to `OutputItemAdded`, `OutputItemInProgress`, `OutputItemDone`, and other streaming events
  - Added `sequence_number` field to track event ordering (uses `u64` to prevent overflow in long streaming sessions)
  - Custom `Serialize` implementation for `ResponseStreamEvent` ensures correct JSON output format
  - All existing streaming tests updated to verify new fields are correctly serialized
- Immediate Health Check After Hot Reload Backend Sync - Trigger immediate health check when backends are added via hot reload (#367)
  - New backends added via configuration hot reload are now health-checked immediately
  - Previously, new backends remained unavailable for up to 30 seconds (the default health check interval)
  - Made `HealthChecker::perform_health_checks()` public for external invocation
  - Improves model availability responsiveness in Backend.AI GO and other clients

## [0.34.0] - 2026-01-14

### Added

- Automatic Quality Parameter Conversion - Add automatic quality parameter conversion between DALL-E and GPT Image models (#330)
  - `to_dalle_quality()` method on `ImageQuality` enum for converting GPT Image quality values to DALL-E equivalents
  - Quality conversion applied transparently in `handle_openai_image_generation()` and `handle_streaming_image_generation()`
  - Quality conversion mapping:
    - DALL-E 3: low/medium/auto → standard, high → hd
    - GPT Image: standard → medium, hd → high
    - Gemini models: quality parameter ignored (no changes needed)
  - `is_dalle3_model()` helper for exact DALL-E 3 model matching
  - `convert_quality_for_model()` helper to eliminate code duplication
  - Conversion is logged for debugging and happens transparently without user-facing warnings
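
To illustrate the mapping, a GPT-Image-style generation request aimed at a DALL-E 3 model; per the table above, the router would rewrite `"quality": "high"` to `"quality": "hd"` before forwarding (the prompt is illustrative):

```json
{
  "model": "dall-e-3",
  "prompt": "a watercolor lighthouse at dusk",
  "quality": "high"
}
```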

## [0.33.0] - 2026-01-13

### Added

- Local File Resolution for Responses API - Resolve local `file_id` references in Responses API requests (#325)
  - Files uploaded via the Files API can now be referenced using `file_id` in Responses API requests
  - FileResolver service scans requests for `file_id` references and loads content from local storage
  - File content is converted to base64 `file_data` format before sending to backends
  - Security features: file ownership verification (user_id check) and 10MB size limit for injection
  - Graceful degradation: resolution failures fall back to the original request with warning logs

### Fixed

- Responses API Flat Tool Format - Fix the Responses API `/v1/responses` endpoint to accept the flat tool format (#323)
  - Function tools now use the flat format: `{"type": "function", "name": "...", "parameters": {...}}`
  - This aligns with OpenAI's Responses API specification
  - Nested format (with a `function` wrapper object) is no longer accepted for the Responses API
  - Updated documentation with flat tool format examples
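
A sketch of an accepted `/v1/responses` request using the flat tool format; the `get_weather` tool is hypothetical:

```json
{
  "model": "gpt-4o",
  "input": "What's the weather in Seoul?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Look up current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  ]
}
```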

## [0.32.0] - 2026-01-09

### Added

- Reasoning Effort Documentation and xhigh Fallback Logging - Add comprehensive reasoning effort documentation and improve fallback logging for the xhigh effort level (#317)
  - New documentation explaining reasoning effort parameter usage
  - Improved logging when the xhigh effort level falls back to high for non-GPT-5.2 models

### Fixed

- Implicit Message Type Inference in Responses API InputItem - Support implicit message type inference when the role field is missing (#316)
  - Optimized InputItem deserializer for better performance
  - Added invalid role test coverage
  - Enables more flexible input handling in the Responses API

## [0.31.5] - 2026-01-09

### Added

- Responses API Pass-through for Native OpenAI Backends - Smart routing for the `/v1/responses` API based on backend type (#313)
  - OpenAI and Azure OpenAI backends now use pass-through mode, forwarding requests directly to the `/v1/responses` endpoint
  - Other backends (Anthropic, Gemini, vLLM, Ollama, LlamaCpp, Generic) automatically convert to their native format
  - Pass-through mode benefits: native PDF support, preserved reasoning state, access to built-in tools (websearch, filesearch), better cache utilization
  - New `router.rs` module with `ResponsesApiStrategy` enum for routing decisions
  - New `passthrough.rs` module with `PassthroughService` for direct request forwarding
  - Request payload size validation (16MB limit) for DoS prevention
  - Comprehensive test coverage for routing strategy, error handling, and request validation
- OpenAI Responses API File Input Types - Add support for multi-modal file inputs in the Responses API (#311)
  - New `input_text`, `input_file`, and `input_image` content part types
  - Support for PDF documents and images via base64 data URLs (`file_data`)
  - Support for external file URLs (`file_url`) with SSRF validation
  - Warning logs for unsupported `file_id` references (Files API integration pending)
  - Backend-specific transformers for Anthropic (document/image blocks) and Gemini (inline_data/file_data)
  - Comprehensive test coverage for all input types

### Fixed

- Pass-through Raw Error Responses - Forward raw backend error responses in pass-through mode for better error debugging

## [0.31.4] - 2026-01-07

### Fixed

- Hot Reload Support for API Key Forwarding - Fix hot reload support in proxy and streaming handlers (#310)
  - Use `current_config()` instead of a captured config snapshot in proxy and streaming handlers
  - API key and other configuration changes via hot reload now properly apply to new requests
  - Ensures runtime configuration updates affect backend request forwarding
  - Added comprehensive end-to-end tests for hot reload api_key application

## [0.31.3] - 2026-01-06

### Fixed

- Unix Socket Anthropic Request/Response Transformation - Fix Anthropic backends accessed via Unix socket failing due to missing transformations (#307, #308)
  - Unix socket transport now applies the same request transformation as HTTP transport for Anthropic backends
  - OpenAI-format requests are properly converted to Anthropic format before sending
  - Anthropic responses are transformed back to OpenAI format
  - Endpoint is correctly rewritten from `/v1/chat/completions` to `/v1/messages`
  - Added comprehensive integration tests for Unix socket Anthropic transformations
- Anthropic Non-streaming Stream Parameter - Preserve the stream parameter for non-streaming Anthropic requests (#305, #306)
  - Replace `transform_openai_to_anthropic_request` (which forces `stream: true`) with `transform_openai_to_anthropic_with_global_prompt` in the non-streaming path
  - Fixes issue where requests with `stream: false` were incorrectly sent to the Anthropic API with `stream: true`
  - Renamed `transform_openai_to_anthropic_request` to `transform_openai_to_anthropic_streaming` for clarity

### Documentation

- Jinja2 Syntax Escaping - Escape Jinja2 syntax in Korean configuration docs to prevent mkdocs-macros-plugin errors

## [0.31.2] - 2026-01-05

### Added

- Non-streaming Support for Anthropic Backend - Transform OpenAI-formatted requests to Anthropic format for non-streaming chat completion calls, and convert Anthropic responses back to OpenAI format
  - Non-streaming requests to Anthropic backends now properly transform request/response formats
  - Updated streaming handlers to use `transform_str` for proper tool call handling
- Tool Call and Tool Result Transformation for Anthropic Backend - Enable proper tool use workflows when routing to Anthropic models
  - Transform OpenAI-style `tool_calls` in assistant messages to Anthropic's `tool_use` format
  - Transform tool result messages to Anthropic's `tool_result` format
  - Enables multi-turn tool use conversations with Anthropic models

### Dependencies

- Update 12 packages including rustls, tokio-stream, and syn to latest versions

## [0.31.1] - 2026-01-04

### Fixed

- Anthropic Non-streaming Authentication Headers - Fix non-streaming Anthropic requests failing with the wrong authentication header (#300, #301)
  - Non-streaming requests to Anthropic backends now correctly use the `x-api-key` header instead of `Authorization: Bearer`
  - Added the `anthropic-version` header for all Anthropic backend requests
  - Applied consistent header handling between HTTP and Unix socket transport paths
  - Fixed issue where the Anthropic API returned an "Invalid Anthropic API Key" error (HTTP 400)

## [0.31.0] - 2026-01-04

### Added

- Unix Socket Server Binding - Add Unix socket binding support alongside TCP (#298)
  - Server can now bind to Unix domain sockets for local communication
  - Configure via `server.unix_socket` in the config file (sketched below)
  - Supports concurrent TCP and Unix socket bindings
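
A minimal configuration sketch; only the `server.unix_socket` key comes from this entry, the surrounding TCP fields are assumptions:

```yaml
server:
  host: 127.0.0.1                          # assumed TCP binding fields
  port: 8080
  unix_socket: /run/continuum-router.sock  # Unix socket, bound alongside TCP
```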
- Reasoning Parameter Support for Responses API - Add `reasoning` parameter support to the `/v1/responses` endpoint (#296)
  - Supports nested format: `{"reasoning": {"effort": "high"}}`
  - Valid effort levels: `low`, `medium`, `high`, `xhigh` (GPT-5.2 only)
  - Type-safe validation using the `ReasoningEffortLevel` enum
  - Automatic conversion to the flat `reasoning_effort` format for backends
  - Invalid effort values rejected at deserialization with clear error messages
  - Added `with_reasoning()` builder method for `ResponsesRequest`
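
For example, a `/v1/responses` request using the nested format, which the router converts to the flat `reasoning_effort` form for backends (the model and input are illustrative):

```json
{
  "model": "gpt-5.2",
  "input": "Summarize the trade-offs between B-trees and LSM trees.",
  "reasoning": { "effort": "xhigh" }
}
```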
- Configurable Health Check Endpoints - Add configurable health check endpoints per backend type
  - Customize health check paths for different backend types
  - Support for backend-specific health verification

## [0.30.0] - 2026-01-01

### Added

- Wildcard Patterns and Date Suffix Handling for Model Aliases - Support wildcard patterns and automatic date suffix handling in model aliases (#286)
  - Automatic date suffix normalization: models with date suffixes (e.g., `claude-opus-4-5-20251130`) automatically match metadata for the base model (e.g., `claude-opus-4-5-20251101`)
  - Supported date formats: `-YYYYMMDD`, `-YYYY-MM-DD`, `-YYMM`, `@YYYYMMDD`
  - Wildcard pattern matching in aliases using the `*` character
    - Prefix patterns: `claude-*` matches `claude-opus`, `claude-sonnet`, etc.
    - Suffix patterns: `*-preview` matches `gpt-4o-preview`, `o1-preview`, etc.
    - Infix patterns: `gpt-*-turbo` matches `gpt-4-turbo`, `gpt-3.5-turbo`, etc.
  - Zero-config date handling: works automatically without configuration changes
  - Matching priority: Exact ID > Exact alias > Date suffix > Wildcard > Base name fallback
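
A hypothetical model-metadata.yaml sketch of the alias patterns above; the `models`/`aliases` schema is an assumption, only the wildcard patterns come from this entry:

```yaml
models:
  - id: claude-opus-4-5
    aliases:
      - "claude-opus-*"    # prefix pattern; date suffixes like -20251130 also match with zero config
  - id: gpt-4-turbo
    aliases:
      - "gpt-*-turbo"      # infix pattern
      - "*-turbo-preview"  # suffix pattern
```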

### Fixed

- Anthropic Backend Default URL - Apply default URL for Anthropic backend when not specified (#288)
- Backend-specific owned_by Values - Replace owned_by placeholders with backend-type-specific values (#287)

### Documentation

- Translate wildcard pattern and date suffix handling documentation to Korean (#289)

## [0.29.0] - 2026-01-01

### Added

- Accelerated Health Checks During Backend Warmup - Implement accelerated health checks during backend warmup (#282)
  - When a backend returns HTTP 503 (Service Unavailable), it enters a "warming up" state
  - During warmup, health checks occur at an accelerated interval (default: 1 second)
  - Reduces model availability detection latency from up to 30 seconds to approximately 1 second
  - Configurable via `warmup_check_interval` (default: 1s) and `max_warmup_duration` (default: 300s)
  - Particularly useful for backends like llama.cpp that return HTTP 503 while loading models
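
A sketch of the warmup tuning knobs; the two key names and their defaults come from this entry, while where they nest in config.yaml is an assumption:

```yaml
health_check:
  interval: 30s               # regular health check interval (assumed key)
  warmup_check_interval: 1s   # accelerated polling while a backend answers HTTP 503
  max_warmup_duration: 300s   # stop treating the backend as warming up after this
```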
- Model Metadata CLI Option - Add --model-metadata option for specifying the model metadata file path (#281)
  - New `--model-metadata` CLI argument to specify the model metadata YAML file at runtime
  - Overrides the config file `model_metadata_file` setting
  - Supports absolute paths, relative paths, and tilde expansion (~)

### Fixed

- OpenAI owned_by Field - Replace OpenAI owned_by placeholder with 'openai' (#280)
  - Models from the OpenAI backend now correctly show `owned_by: openai` instead of placeholder text
- Admin API Race Condition - Prevent race condition in Admin API concurrent backend creation (#278)
  - Fixed issue where concurrent backend creation requests could cause data corruption
  - Added proper synchronization for backend management operations
- Hot Reload Processing Steps - Add missing processing steps to hot reload (#277)
  - Fixed issue where some configuration changes were not properly applied during hot reload
  - Ensures all processing steps are executed when configuration is reloaded
- Cloud Backend Availability Status - Cloud backends now show `available: true` in /v1/models/{model_id} (#272)
  - Fixed issue where cloud backends (OpenAI, Anthropic, Gemini) were incorrectly showing as unavailable
  - Cloud backends are now correctly marked as available when healthy

### Documentation

- Add tests and documentation for v0.29.0 features (#459da6a)

## [0.28.0] - 2025-12-31

### Added

- SSE Streaming Support for Tool Calls - Add SSE streaming support for tool calls (#258)
  - Real-time streaming of tool call responses over Server-Sent Events
  - Enables efficient streaming responses for function-calling scenarios
- llama.cpp Tool Calling Auto-Detection - Auto-detect tool calling support via the /props endpoint (#263)
  - Queries the `/props` endpoint during model discovery to analyze the `chat_template`
  - Detects tool-related keywords (`tool`, `tools`, `tool_call`, `function`, etc.)
  - Automatically enables the `function_calling` capability when detected
  - Graceful fallback when the `/props` endpoint is unavailable
  - Works with both HTTP and Unix socket backends
- Extended /v1/models/{model_id} Endpoint - Extend with rich metadata fields (#262)
  - Returns comprehensive model metadata including capabilities and pricing
  - Enhanced response format with additional model information
- Tool Result Message Transformation - Implement tool result message transformation for multi-turn conversations (#265)
  - Transforms tool result messages (`role: "tool"`) to backend-native formats
  - Anthropic: converts to `user` role with `tool_result` content blocks
  - Gemini: converts to `function` role with `functionResponse` parts
  - Combines consecutive tool results for Anthropic (parallel tool calls)
  - Automatic function name lookup for Gemini transformations
  - Preserves the `is_error` indicator for error responses
- Backend-specific owned_by Placeholders - Add owned_by placeholders for llamacpp, vllm, ollama, http backends (#267)

### Improved

- CLI Help Output Formatting - Improve --help output formatting with title header and project attribution (#269)
  - Enhanced visual appearance for command-line help
  - Added project attribution in help output

### Fixed

- Model Metadata Cache Sync - Sync model metadata cache with ConfigManager (#270)
  - Ensures the model cache properly reflects configuration changes

### CI/CD

- Comprehensive integration tests for tool calling (#264)

### Dependencies

- Bump the minor-and-patch group with 3 updates (#257)

### Technical

- Fix dead_code warnings for Unix-only items on Windows builds

## [0.27.0] - 2025-12-28

### Added

- llama.cpp Tool Calling Auto-Detection - Automatic detection of tool calling support for llama.cpp backends (#260)
  - Queries the `/props` endpoint during model discovery to analyze the `chat_template`
  - Detects tool-related keywords (`tool`, `tools`, `tool_call`, `function`, etc.)
  - Automatically enables the `function_calling` capability when detected
  - Graceful fallback when the `/props` endpoint is unavailable
  - Works with both HTTP and Unix socket backends
- Complete Unix Socket Support - Full Unix socket support for model discovery and streaming (#248, #252, #253, #254, #256)
  - SSE/streaming support for Unix socket backends, enabling real-time responses over local sockets
  - Backend type auto-detection for Unix socket connections
  - vLLM model discovery support via Unix sockets
  - llama.cpp model discovery support via Unix sockets
  - Model fetcher fully supports Unix socket backends
- Tool Call Transformation - Implement OpenAI tool call transformation across all backends (#244, #245, #246)
  - Tool definition transformation for Anthropic, Gemini, and llama.cpp backends
  - Tool choice transformation with support for `auto`, `none`, `required`, and specific function selection
  - Tool call response transformation for a unified response format across providers
- Multi-Turn Tool Conversation Support - Message transformation for tool calling in multi-turn conversations (#241)
  - Transforms tool result messages (`role: "tool"`) to backend-native formats
  - Anthropic: converts to `user` role with `tool_result` content blocks
  - Gemini: converts to `function` role with `functionResponse` parts
  - Combines consecutive tool results for Anthropic (parallel tool calls)
  - Automatic function name lookup for Gemini transformations
  - Preserves the `is_error` indicator for error responses

## [0.26.0] - 2025-12-27

### Added

- Tool Choice Transformation - Automatic transformation of the `tool_choice` parameter across backends (#239)
  - Supports `auto`, `none`, `required`, and specific function selection
  - Anthropic: transforms to the `{"type": "auto|any|tool"}` format, handles "none" by removing tools
  - Gemini: transforms to the `tool_config.function_calling_config` structure
  - llama.cpp: preserves the `parallel_tool_calls` parameter for parallel function calling
  - Integrates with the model fallback system for cross-provider tool calling
- Single Model Retrieval Endpoint - Add GET /v1/models/{model} endpoint for single model retrieval with availability status (#236)
  - Returns model information with an additional `available` field indicating real-time availability
  - `available: true` when at least one healthy backend provides the model
  - `available: false` when the model exists but all backends providing it are unhealthy
  - Returns 404 if the model does not exist on any backend
  - Optimized performance: avoids full model aggregation by targeting a specific model lookup

## [0.25.0] - 2025-12-26

### Added

- CORS (Cross-Origin Resource Sharing) Support - Configurable CORS middleware for embedding the router in web applications (#234)
  - Support for Tauri desktop apps, Electron apps, and web frontends
  - Wildcard origins and port patterns (e.g., `http://localhost:*`)
  - Custom schemes support (e.g., `tauri://localhost`)
  - Configurable methods, headers, and credentials
  - Preflight cache with configurable max-age
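
A hypothetical `cors` section illustrating the options above; the exact key names are assumptions:

```yaml
cors:
  allowed_origins:
    - "http://localhost:*"   # wildcard port pattern for local frontends
    - "tauri://localhost"    # custom scheme for Tauri desktop apps
  allowed_methods: [GET, POST, OPTIONS]
  allow_credentials: false
  max_age: 3600              # preflight cache lifetime in seconds
```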
- Unix Domain Socket Backend Support - Secure local LLM communication via Unix sockets (#232)
  - Use the `unix:///path/to/socket` URL scheme for local backends
  - Better security through file system permissions (no TCP port exposure)
  - Lower latency than localhost TCP (~30% improvement)
  - No port conflicts when running multiple LLM servers
  - Platform support: Linux and macOS (Windows planned for future releases)
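
A backend sketch using the `unix://` scheme from this entry; the surrounding `backends` fields are assumptions:

```yaml
backends:
  - name: local-llama
    type: llamacpp
    url: unix:///var/run/llama.sock   # file permissions gate access, no TCP port exposed
```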

## [0.23.1] - 2025-12-25

### CI/CD

- Windows Build Support - Add Windows x86_64 build target to release workflow (#224)
  - Enables native Windows builds in the release pipeline
  - Cross-compilation from Linux using mingw-w64

## [0.23.0] - 2025-12-23

### Added

- GLM 4.7 Model Support - Add support for Zhipu AI's GLM 4.7 model with thinking capabilities (#222)
  - Model metadata in model-metadata.yaml with full specifications (355B MoE, 32B active parameters)
  - Support for thinking parameters: `enable_thinking` (boolean) and `thinking_budget` (1-204,800 tokens)
  - 200K context window with up to 131K token output
  - Z.AI backend configuration example in config.yaml.example
  - SiliconFlow alternative backend configuration
  - Comprehensive integration tests for model metadata
  - Pricing: $0.60/1M input tokens, $2.20/1M output tokens
- Thinking Pattern Metadata - Add thinking pattern metadata for models with implicit start tags (#218)
  - Support for models that use implicit thinking start tags
  - Pattern-based detection for thinking content extraction
- GCP Service Account Authentication - Add GCP Service Account authentication support for the Gemini backend (#208)
  - Support for JSON key file authentication
  - Environment variable based authentication
  - Automatic token refresh and management
- Distributed Tracing - Add distributed tracing with correlation ID propagation (#207)
  - W3C Trace Context support with the traceparent header
  - Configurable trace ID, request ID, and correlation ID headers
  - Trace ID propagation across all retry attempts
  - Security validation for trace IDs from headers
- New Model Metadata - Add model metadata for NVIDIA Nemotron 3 Nano, Qwen Image Layered, and Kakao Kanana-2 (#202)
- ASCII Diagram Replacement - Add ASCII diagram to image replacement system for MkDocs (#200)
  - Automatic replacement of ASCII diagrams with SVG images during the MkDocs build
  - Preserves ASCII art visibility in raw Markdown

### Changed

- CI Optimization - Skip Rust tests when only non-code files change (#204)
  - Faster CI for documentation-only changes
  - Path-based filtering for test execution

### Fixed

- Cache Stampede Prevention - Prevent cache stampede with singleflight, stale-while-revalidate, and background refresh (#220)
  - Singleflight pattern prevents a thundering herd on model cache expiration
  - Stale-while-revalidate returns cached data immediately while refreshing in the background
  - Background refresh proactively updates the cache before expiration
- Hot Reload for Global Prompts - Apply global_prompts changes via hot reload (#219)
  - Global prompt configuration changes now take effect without a restart
- Model Cache Invalidation - Invalidate model cache when backend config changes (#206)
  - Backend configuration changes now properly trigger a model cache refresh
- Documentation Improvements - Improve diagram rendering with inline SVG and responsive sizing
- Translation Typo - Fix translation typo in documentation
- Docker CI Fixes - Handle multi-line tags in Docker manifest creation
- Private Repo Access - Use gh CLI for private repo asset download in CI
- GitHub Token Auth - Add GitHub token authentication for private repository access
- Docs Workflow - Remove release trigger from docs workflow to avoid environment protection error

### CI/CD

- Bump actions/github-script from 7 to 8 (#210)
- Bump apple-actions/import-codesign-certs from 3 to 6 (#212)
- Bump actions/cache from 4 to 5 (#211)
- Bump actions/checkout from 4 to 6 (#209)

## [0.22.0] - 2025-12-19

### Added

- Docker Support with Pre-built Binaries - Add Dockerfile and Dockerfile.alpine that download pre-built binaries from GitHub Releases (#189)
  - Debian Bookworm-based image (~50MB) for general use
  - Alpine 3.20-based image (~10MB) for minimal deployments
  - Multi-architecture support (linux/amd64, linux/arm64) using TARGETARCH
  - VERSION build argument for selecting the release version
  - Non-root user execution for security
  - OCI labels for image metadata
- Container Health Check CLI - Implement `--health-check` CLI argument for container orchestration (#189)
  - Returns exit code 0 if the server is healthy, 1 if unhealthy
  - Optional `--health-check-url` for a custom health endpoint
  - Proper IPv6 address handling
  - 5-second default timeout
- Docker Compose Quick Start - Add docker-compose.yml for easy deployment (#189)
  - Volume mount for configuration
  - Environment variable support (RUST_LOG)
  - Resource limits and health checks
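
A minimal docker-compose.yml sketch covering the features listed above; the image path and mount target are hypothetical:

```yaml
services:
  continuum-router:
    image: ghcr.io/example/continuum-router:latest   # hypothetical image path
    volumes:
      - ./config.yaml:/etc/continuum-router/config.yaml:ro
    environment:
      RUST_LOG: info
    healthcheck:
      test: ["CMD", "continuum-router", "--health-check"]
      interval: 30s
```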
- Automated Docker Image Publishing - Add Docker build and push to ghcr.io in the release workflow (#189)
  - Builds both Debian and Alpine images after the binary release
  - Multi-platform support (linux/amd64, linux/arm64)
  - Automatic tagging with semver (VERSION, MAJOR.MINOR, latest)
  - Alpine images tagged with the -alpine suffix
  - GitHub Actions cache for faster builds
- MkDocs Documentation Website - Build comprehensive documentation site with Material theme (#183)
  - Full navigation structure with Getting Started, Features, Operations, and Development sections
  - GitHub Actions workflow for automatic deployment to GitHub Pages
  - Custom stylesheets and theme configuration
- Korean Documentation Translation - Complete Korean localization of all documentation (#190)
  - All 20 documentation files translated to Korean
  - Language switcher in navigation (English/Korean)
  - Multi-language build in GitHub Actions workflow
- Dependency Security Auditing - Add cargo-deny for vulnerability scanning (#192)
  - Security advisory checking in CI workflow
  - License compliance verification
  - Dependency source validation
- Dependabot Integration - Automated dependency updates for Cargo and GitHub Actions (#192)
- Security Policy - Add comprehensive SECURITY.md with vulnerability reporting process (#191)

### Changed

- Integrate orphaned architecture documentation into MkDocs site (#186)
- Rename documentation files to lowercase kebab-case for URL-friendly filenames (#186)
- Update various GitHub Actions to latest versions (checkout@v6, setup-python@v6, upload-artifact@v6, etc.)

### Fixed

- Health check response validation logic bug (operator precedence issue)
- Address parsing fallback that was silently hiding configuration errors
- IPv6 address formatting in health check (now correctly uses bracket notation)

### Security

- Updated reqwest 0.11→0.12, prometheus 0.13→0.14, validator 0.18→0.20
- Replaced dotenv with dotenvy for better maintenance
- Added .dockerignore to exclude sensitive files from build context

## [0.21.0] - 2025-12-19

### Added

- Gemini 3 Flash Preview Model - Add support for gemini-3-flash-preview model (#168)
- Backend Error Passthrough - Pass through detailed error messages from backends for 4xx responses (#177)
  - Parse and forward original error messages from OpenAI, Anthropic, and Gemini backends
  - Preserve the `param` field when available (useful for invalid parameter errors)
  - Falls back to a generic error message if the backend response cannot be parsed
  - Error format remains OpenAI-compatible
  - Comprehensive unit tests for error parsing across all backend formats
- Default Authentication Mode for API Endpoints - Configurable authentication enforcement for API endpoints (#173)
  - New `mode` field in the `api_keys` configuration: `permissive` (default) or `blocking`
  - `permissive` mode: requests without an API key are allowed (backward compatible)
  - `blocking` mode: only authenticated requests are processed; unauthenticated requests receive 401
  - Protected endpoints: `/v1/chat/completions`, `/v1/completions`, `/v1/responses`, `/v1/images/*`, `/v1/models`
  - Health endpoints (`/health`, `/healthz`) always accessible without authentication
  - Hot reload support for authentication mode changes
  - Comprehensive integration tests for both modes
  - Updated API.md, configuration.md, and manpage documentation
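
A sketch of blocking mode; the `mode` values come from this entry, while the shape of the key list is an assumption:

```yaml
api_keys:
  mode: blocking            # reject unauthenticated requests with 401 (default: permissive)
  keys:
    - key: "sk-example-1"   # hypothetical key entry; /health and /healthz stay open
```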

### Fixed

- UTF-8 Multi-byte Character Corruption - Handle UTF-8 multi-byte character corruption in streaming responses (#179)
- GPT Image response_format - Strip response_format parameter for GPT Image models (#176)
- Auto-discovery Validation - Allow auto-discovery for all backends except Anthropic (#172)

### Changed

- Updated architecture.md and fixed documentation issues (#167, #169)
- Added AGENTS.md and linked CLAUDE.md to it

## [0.20.0] - 2025-12-18

### Added

- Image Variations Support for Gemini - Add image variations support for Gemini (nano-banana) models (#165)
- Image Edit Support for Gemini - Implement limited image edit support for Gemini (nano-banana) models (#164)
- Enhanced Image Generation - Enhance /v1/images/generations with streaming and GPT Image features (#161)
- GPT Image 1.5 Model - Add gpt-image-1.5 model support (#159)
- Image Variations Endpoint - Implement /v1/images/variations endpoint for image variations (#155)
- Image Edits Endpoint - Implement /v1/images/edits endpoint for image editing (inpainting) (#156)
  - Full OpenAI Images Edit API compatibility
  - Supports GPT Image models: `gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5` (recommended)
  - Legacy support for the `dall-e-2` model
  - Multipart form-data parsing with shared utilities
  - PNG image validation (format, size, square dimensions)
  - Optional mask validation (dimension matching with the source image)
- Shared Image Utilities - Implement shared utilities for image edit/variations endpoints (#154)
- External Prompt Files - Support loading system prompts from external Markdown files (#146)
  - New `prompt_file` field in `BackendPromptConfig` and `ModelPromptConfig`
  - New `default_file` and `prompts_dir` fields in `GlobalPromptConfig`
  - Secure path validation with path traversal attack prevention
  - REST API endpoints for prompt file management
  - File caching with size limits (100 entries max, 50MB total)
  - Hot-reload support for prompt files
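
A hypothetical prompt-file configuration; the `prompt_file`, `default_file`, and `prompts_dir` field names come from this entry, and the YAML nesting is an assumption:

```yaml
global_prompts:
  prompts_dir: ./prompts
  default_file: system.md    # used when no backend- or model-level prompt_file is set
backends:
  - name: anthropic-main
    prompt:
      prompt_file: claude.md # resolved under prompts_dir; path traversal is rejected
```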
- Solar Open 100B Model - Add Solar Open 100B model metadata
- Automatic Model Discovery - Backends automatically discover available models from the `/v1/models` API when models are not explicitly configured (#142)
  - OpenAI, Gemini, and vLLM backends support auto-discovery
  - Ollama backend uses vLLM's discovery mechanism (OpenAI-compatible API)
  - 10-second timeout prevents blocking startup
  - Falls back to hardcoded defaults if discovery fails

### Changed

- `BackendFactory::create_backend_from_typed_config()` is now async to support async model discovery
- Backend `from_config()` methods for OpenAI, Gemini, and vLLM are now async

### Security

- API Key Redaction - Implement API key redaction to prevent credential exposure (#150)

### Performance

- Binary Size Optimization - Optimize release binary size from 20MB to 6MB (70% reduction) (#144)

### Refactored

## [0.19.0] - 2025-12-13

### Added

- Runtime Configuration Management API - Comprehensive REST API for viewing and modifying configuration at runtime (#139)
  - Configuration Query APIs:
    - `GET /admin/config/full` - Retrieve full configuration with sensitive info masked
    - `GET /admin/config/sections` - List all 15 configuration sections
    - `GET /admin/config/{section}` - Get specific section configuration
    - `GET /admin/config/schema` - JSON Schema for client-side validation
  - Configuration Modification APIs:
    - `PUT /admin/config/{section}` - Replace section configuration
    - `PATCH /admin/config/{section}` - Partial update (JSON merge patch)
    - `POST /admin/config/validate` - Validate configuration before applying
    - `POST /admin/config/apply` - Apply configuration with hot reload
  - Configuration Save/Restore APIs:
    - `POST /admin/config/export` - Export configuration (YAML/JSON/TOML)
    - `POST /admin/config/import` - Import and apply configuration
    - `GET /admin/config/history` - View configuration change history
    - `POST /admin/config/rollback/{version}` - Rollback to a previous version
  - Backend Management APIs:
    - `POST /admin/backends` - Add new backend
    - `GET /admin/backends/{name}` - Get backend configuration
    - `PUT /admin/backends/{name}` - Update backend configuration
    - `DELETE /admin/backends/{name}` - Remove backend
    - `PUT /admin/backends/{name}/weight` - Update backend weight
    - `PUT /admin/backends/{name}/models` - Update backend model list
  - Sensitive information masking for API keys, passwords, tokens
  - JSON Schema generation for all configuration sections
  - Configuration history tracking (up to 100 entries, configurable)
  - Memory-efficient history storage with size-based eviction (10MB limit)
  - Atomic version counter using AtomicU64 for thread safety
  - Structured error responses with error codes
- Admin REST API Documentation - Comprehensive developer guide (docs/admin-api.md)
  - Complete API reference with request/response examples
  - Client SDK examples for Python, JavaScript/TypeScript, and Go
  - Best practices and security considerations
- Integration Tests - 33 integration tests for Configuration Management API endpoints

### Fixed

- CRITICAL: Configuration changes now actually applied to running system
- CRITICAL: Memory growth controlled with JSON string storage and size-based eviction
- HIGH: Input validation added (1MB content limit, 32-level nesting depth)
- HIGH: Sensitive export requires elevated permission and audit logging
- HIGH: Comprehensive sensitive field detection (30+ patterns)
- MEDIUM: Validation functions now perform actual validation
- MEDIUM: Race condition fixed with AtomicU64 for version counter
- MEDIUM: Colon removed from allowed backend name characters
- MEDIUM: Structured error responses with error codes
- MEDIUM: Initialize flag prevents duplicate history entries
- LOW: Unnecessary clones removed for better performance
- LOW: Limits now configurable via AdminConfig
- LOW: Duplicate validation logic refactored
- LOW: Test coverage improved for edge cases

### Changed

- Enhanced documentation for Configuration Management API across all guides
- Updated manpage with new admin endpoints
- Updated API.md with comprehensive Configuration Management API section

## [0.18.0] - 2025-12-13

### Added

- Per-API-Key Rate Limiting - Implement per-API-key rate limiting (#137)
  - Individual rate limits for each API key
  - Configurable requests per minute per key
- API Key Management System - Comprehensive API key management and configuration system
  - Multiple key sources: config file, external file, environment variables
  - Key properties: scopes, rate limits, expiration, enabled status
  - Hot reload support for key configuration changes
- Files API Authentication - Implement authentication and authorization for the Files API (#131)
  - API key authentication for file operations
  - File ownership enforcement
  - Admin access control for all files
- Hot Reload for Runtime Configuration - Complete hot reload functionality for runtime configuration updates (#130)
  - Automatic configuration file watching
  - Classified updates: immediate, gradual, restart-required

### Changed

- Major refactoring with modular structure
- Updated architecture.md to reflect refactored module structure

### Fixed

- Add ConnectInfo extension for admin/metrics/files endpoints
- Address security vulnerabilities in API key management
- Address code quality issues in API key management

### Documentation

- Add API key management documentation
- Add comprehensive API key management tests

## [0.17.0] - 2025-12-12

### Added

- Anthropic Backend File Content Transformation - Files uploaded to the router can now be used with the Anthropic backend (#126)
  - Automatic conversion of file content to Anthropic message format
  - Support for text and document files with base64 encoding
  - Seamless integration with file resolution middleware
- Gemini Backend File Content Transformation - Files uploaded to the router can now be used with the Gemini backend (#127)
  - Automatic conversion of file content to Gemini API format
  - Support for inline data with proper MIME type handling
  - Cross-provider file support enables files uploaded once to work across all backends

### Fixed

- Streaming File Uploads - Implement streaming file uploads to prevent memory exhaustion (#128)
  - Large file uploads no longer load the entire file into memory
  - Streaming processing for efficient memory usage
  - Prevents OOM errors when uploading large files

### Changed

- None

## [0.16.0] - 2025-12-12

### Added

- OpenAI-Compatible Files API - Full implementation of OpenAI Files API endpoints (#111)
  - Upload files with multipart/form-data support
  - List, retrieve, and delete files
  - Download file content
  - Supports purpose: fine-tune, batch, assistants, user_data
- File Resolution Middleware - Automatic file content injection for chat completions (#120)
  - Reference uploaded files in chat messages with file IDs
  - Automatic content injection into chat context
- Persistent Metadata Storage - File metadata persists across server restarts (#125)
  - Sidecar JSON files (.meta.json) stored alongside data files
  - Automatic recovery on startup with metadata rebuild from files
  - Orphan file detection and optional cleanup
- OpenAI Backend File Handling - Files uploaded locally are forwarded to OpenAI when needed (#121, #122)
- GPT-5.2 Model Support - Added GPT-5.2 model metadata to OpenAI backend (#124)
- Circuit Breaker Pattern - Automatic backend failover with circuit breaker (#93)
  - States: Closed → Open → Half-Open → Closed cycle
  - Configurable failure thresholds and recovery timeout
  - Per-backend circuit breaker instances
  - Admin endpoints for circuit breaker status and control
- Admin Endpoint Authentication - Secure admin endpoints with authentication and audit logging
- Configurable Fallback Models - Automatic model fallback for unavailable model scenarios (#50)
  - Define fallback chains for primary models (e.g., gpt-4o → gpt-4-turbo → gpt-3.5-turbo)
  - Cross-provider fallback support (e.g., OpenAI → Anthropic)
  - Automatic parameter translation between providers
  - Integration with circuit breaker for layered failover protection
  - Configurable trigger conditions (error codes, timeout, connection error, circuit breaker open)
  - Response headers indicate when fallback was used (X-Fallback-Used, X-Original-Model, X-Fallback-Model)
  - Prometheus metrics for fallback monitoring
- Pre-commit Hook - Automated code formatting and linting before commits

### Fixed

- Fallback Chain Validation - Integrate chain validation into Validate derive
- Fallback Performance - Use index-based lookup for fallback chain traversal
- Lock Contention - Reduce lock contention in FallbackService with snapshot pattern
- Security - Sanitize fallback error headers and metric labels
- Circuit Breaker Security - Add backend name validation in admin endpoints
- Thread Safety - Use CAS loop for thread-safe half-open request limiting

### Changed

- Documentation Updates - Comprehensive documentation for fallback configuration, circuit breaker, and Files API
- Code Quality - Fix clippy warnings and format code
- Pre-commit Hook Location - Move pre-commit hook to .githooks directory

## [0.15.0] - 2025-12-05

### Added

- Nano Banana API Support - Add Gemini Image Generation API support with OpenAI-compatible interface (#102)
  - Supports nano-banana and nano-banana-pro models
  - Automatic format conversion between OpenAI Images API and Gemini Imagen API
- Split /v1/models Endpoint - Standard lightweight response vs extended metadata response (#101)
  - `/v1/models` returns a lightweight response for better performance
  - `/v1/models?extended=true` returns full metadata for detailed model information

### Changed

- Extract StreamService - Streaming handler logic extracted to dedicated StreamService for modular architecture (#106)
- Eliminate Retry Logic Duplication - Consolidated retry logic code in proxy.rs (#103)

### Fixed

- Proper Error Propagation - Replace `.expect()` panics with proper error propagation in HttpClientFactory (#104)

### Performance

- LRU Cache Optimization - Use read lock instead of write lock for cache lookups (#105)

## [0.14.2] - 2025-12-05

### Added

- Token Usage Logging - Log input/output token counts on request completion (#92)
- Exclude List for Reports - Add exclude list configuration for reports

### Changed

- None

### Fixed

- None

## [0.14.1] - 2025-12-05

### Added

- TTFB Benchmark Targets - Add TTFB benchmark targets to Makefile
- Connection Pre-warming - Add connection pre-warming for Anthropic, Gemini, OpenAI backends

### Fixed

- Anthropic Backend TTFT - Optimize Anthropic backend TTFT with connection pooling and HTTP/2 (#90)
- Gemini Backend TTFT - Optimize Gemini backend TTFT with connection pooling and HTTP/2 (#88)
- Model Metadata Alias Matching - Apply base name fallback matching to aliases in model metadata lookup (#84)

### Changed

- Shared HTTP Client - Share HTTP client between HealthChecker and request handler
- Updated architecture and performance documentation

## [0.14.0] - 2025-12-04

### Added

- Global System Prompt Injection - Add router-wide global system prompt injection (#82)

### Fixed

- GitHub Actions - Replace deprecated actions-rs/toolchain with dtolnay/rust-toolchain
- macOS ARM64 Build - Add RUSTFLAGS for macOS ARM64 ring build
- musl Build - Switch to rustls-tls for musl cross-compilation support

### Changed

- Update GitHub Action runner

## [0.13.0] - 2025-12-04

### Added

- OpenAI Responses API (`/v1/responses`) - Full implementation of OpenAI's Responses API (#49)
  - Session-based response management with automatic expiration
  - Background cleanup task for expired sessions
  - Request/response format converter between the Responses API and Chat Completions
- SecretString for API Keys - Secure API key storage using SecretString across all backends (#76)
- Model Metadata Override - Allow overriding /v1/models response fields via model-metadata.yaml (#75)

### Fixed

- True SSE Streaming - Implement proper Server-Sent Events streaming for /v1/responses API

### Changed

- Immediate Mode for SseParser - Reduced first-response latency with immediate parsing mode
- String Allocation Optimizations - Improved performance with reduced allocations
- Error Handling Standardization - Consistent error handling patterns across the codebase

### Security

- Session Access Control - Added proper access control for session management
- Input Validation - Comprehensive input validation for Responses API

## [0.12.0] - 2025-12-04

### Added

- SSRF Prevention Module - New UrlValidator module with comprehensive SSRF prevention (#66)
- Centralized HTTP Client Factory - HttpClientFactory for consistent HTTP client creation across backends (#67)

### Fixed

- Consistent Hash Algorithm - Handle exact hash matches in binary search for proper routing (#72)
- Replace Panics with Option Returns - Improve reliability by replacing panics with Option returns (#71)
- Remove Hardcoded Auth Requirement - /v1/models endpoint no longer requires hardcoded authentication
- GitHub Actions - Use GitHub App token for Projects V2 API access

### Changed

- Reorganize OpenAI Model Metadata - Model metadata organized by family for better maintainability (#74)
- Extract AnthropicStreamTransformer - Dedicated module for Anthropic stream transformation (#73)
- Split Backends Module - backends mod.rs split into separate modules for cleaner architecture (#69)
- Extract Embedded Tests - Tests moved to separate files for better organization (#68)
- Extract RequestExecutor - Shared common module for request execution (#65)
- Extract HeaderBuilder - Auth strategies moved to dedicated module (#64)
- Extract AtomicStatistics - Shared common module for atomic statistics

### Technical Improvements

- Improved code organization with modular architecture
- Implemented stats aggregation for better observability
- Enhanced security with SSRF prevention capabilities

## [0.11.0] - 2025-12-03

### Added

- Native Anthropic Claude API backend (`type: anthropic`) with OpenAI-compatible endpoint (#33)
  - Automatic API key loading from the `CONTINUUM_ANTHROPIC_API_KEY` environment variable
  - Extended thinking block support for Claude thinking models
  - OpenAI to Claude reasoning parameter conversion (`reasoning_effort`)
  - Support for the flat `reasoning_effort` parameter
- Claude 4, 4.1, 4.5 model metadata documentation

### Fixed

- Improve health check and model fetching for Anthropic/Gemini backends
- Add `Accept-Encoding: identity` header to streaming requests to prevent compression issues
- Fix `make_backend_request` in proxy.rs for proper Accept-Encoding handling

### Changed

- Refactor: apply code formatting and fix clippy warnings
- Refactor: use reqwest `no_gzip`/`no_brotli`/`no_deflate` instead of the Accept-Encoding header

## [0.10.0] - 2025-12-03

### Added

- Native Google Gemini API backend (`type: gemini`) with OpenAI-compatible endpoint (#32)
  - Automatic API key loading from the `CONTINUUM_GEMINI_API_KEY` environment variable
  - Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
  - Automatic `max_tokens` adjustment for thinking models to prevent response truncation
  - Support for the `reasoning_effort` parameter
- Native OpenAI API backend (`type: openai`) with built-in configuration
  - Automatic API key loading from the `CONTINUUM_OPENAI_API_KEY` environment variable
  - Built-in OpenAI model metadata in the /v1/models response
- OpenAI Images API support (`/v1/images/generations`) for DALL-E and gpt-image-1 models (#35)
  - Configurable image generation timeout (`timeouts.request.image_generation`)
  - Comprehensive input validation for image generation parameters
  - Response format validation for the image generation API
- Authenticated health checks for OpenAI and API-key backends
- API key authentication for streaming requests
- Filter /v1/models to show only configured models
- Allow any config file path when explicitly specified via -c/--config
- `.env.example` and typed backend configuration examples
- Comprehensive model metadata for GLM 4.6, Kimi K2, DeepSeek, GPT, and Qwen3 series

### Fixed

- Streaming response truncation for thinking models (gemini-2.5-pro, gemini-3-pro)
- Model ID normalization and streaming compatibility for Gemini backend
- Convert `max_tokens` to `max_completion_tokens` for newer OpenAI models
- Correct URL construction for all API endpoints
- Security: Remove sensitive data from debug logs
- Security: Add request body size limits to prevent DoS attacks

### Changed

- Refactor: Unify request retry logic with RequestType enum
- Refactor: Improve Gemini backend performance with lock-free statistics and slice returns
- Add Gemini backend documentation and max_tokens behavior documentation
- Add image generation API documentation
- Standardize capability naming in model-metadata.yaml

## [0.9.0] - 2025-12-02

### Added

- Native Google Gemini API backend (`type: gemini`) with OpenAI-compatible endpoint (#32)
  - Automatic API key loading from the `CONTINUUM_GEMINI_API_KEY` environment variable
  - Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
  - Automatic `max_tokens` adjustment for thinking models to prevent response truncation
  - Support for the `reasoning_effort` parameter
- OpenAI Images API support (`/v1/images/generations`) for DALL-E and gpt-image-1 models (#35)
- Configurable image generation timeout (`timeouts.request.image_generation`)
- Comprehensive model metadata for OpenAI models including GPT-5 family, o-series, audio/speech, video (Sora), and embedding models
- Enhanced rate limiting with token bucket algorithm (#11)
- Comprehensive Prometheus metrics and monitoring (#10)
- Configuration file migration and auto-correction CLI utility (#29)
- Comprehensive authentication for metrics endpoint

### Fixed

- CRITICAL: Eliminate race condition in token refill
- CRITICAL: Protect API keys with SHA-256 hashing
- CRITICAL: Prevent memory exhaustion via unbounded bucket growth
- CRITICAL: Prevent header injection vulnerabilities
- HIGH: Prevent IP spoofing via X-Forwarded-For manipulation
- HIGH: Implement singleton pattern for metrics to prevent memory leaks
- HIGH: Eliminate unnecessary string allocations
- HIGH: Implement model extraction for rate limiting
- Add comprehensive cardinality limits and label sanitization to prevent metric explosion DoS attacks
- Improve error handling to prevent panic conditions
- Resolve environment variable race condition in config test
- Fix integration test failure in metrics RequestTimer
- Fix unit test failures in metrics security module

### Changed

- Refactor: remove excessive Arc wrapping in rate limiting
- Reorganize documentation structure for better maintainability
- Add comprehensive metrics documentation
- Update documentation for rate limiting feature
- Remove development mock server and sample config files
- Remove temporary test files and improve gitignore
- Remove duplicate man page and update gitignore
- Update README.md to mention correct repo
- Update release workflows

## [0.8.0] - 2025-09-09

### Added

- Model ID alias support for metadata sharing (#27)
- Comprehensive rate limiting documentation
- Robust rate limiting to models endpoint to prevent DoS via cache poisoning

### Fixed

- Return empty list instead of 503 when all backends are unhealthy (#28)
- Improve error handling and classification
- Resolve clippy warnings for MutexGuard held across await points

### Changed

- Increase rate limits for /v1/models endpoint to be more practical
- Add alias feature documentation to configuration.md

## [0.7.1] - 2025-09-08

### Fixed

- Improve config path validation for home directory and executable paths (#26)

## [0.7.0] - 2025-09-07

### Added

- Extend /v1/models endpoint with rich metadata support (#23) (#25)
- Enhanced Configuration Management (#9) (#22)
- Advanced load balancing strategies with enhanced error handling (#21)

### Fixed

- Use streaming timeout configuration from config.yaml instead of hardcoded 25s limit

### Changed

- Add yaml to exclude list

## [0.6.0] - 2025-09-03

### Added

- GitHub Project automation workflow
- Comprehensive timeout configuration and model documentation updates

### Fixed

- Use timeout configuration from config.yaml instead of hardcoded values (#19)
- Fix clippy warnings and benchmark compilation issues

### Changed

- Apply cargo fmt

## [0.5.0] - 2025-09-02

### Added

- Extensible architecture with layered design (#16)
- Comprehensive integration tests and performance optimizations
- Complete service layer implementation
- Middleware architecture and enhanced backend abstraction
- Configurable connection pool size with CLI and config file support
- Comprehensive configuration management with YAML support (#7)
- Debian packaging and man page for continuum-router

### Fixed

- Handle `Option` values correctly in tests
- Update test to handle streaming requests without a model field gracefully
- Resolve floating-point precision and timing issues in tests
- Resolve test failures and deadlocks in object pool and SSE parser
- Resolve CI test failures and improve test performance
- Resolve config watcher test failures in CI environment
- Resolve initial health check race condition
- Critical security vulnerabilities in error handling and retry logic
- Adjust timeout test tolerance for timing variations

### Changed

- Extract complex types into type aliases for better readability
- Resolve all cargo fmt and clippy warnings
- Make retry configuration optional with sensible defaults
- Optimize config access and add comprehensive timeout management
- Update model names in timeout configuration to latest versions
- Complete documentation update
- Split oversized modules into layered architecture

### Performance

- Optimize config access and add comprehensive timeout management

## [0.4.0] - 2025-08-25

### Added

- Model-based routing with health monitoring (#6)

### Fixed

- Improve health check integration and SSE parsing for better compatibility

### Changed

- Update README.md

## [0.3.0] - 2025-08-25

### Added

- SSE streaming support for real-time chat completions (#5)

### Fixed

- Handle non-success status codes in streaming responses
- Allow streaming to continue even when backend returns 404 or other error status codes
- Send SSE error event first to notify client of the backend error status

## [0.2.0] - 2025-08-25

### Added

- Model aggregation from multiple endpoints (#4)

## [0.1.0] - 2025-08-24

### Added

- OpenAI-compatible endpoints and proxy functionality
  - `/v1/models` endpoint for listing available models
  - `/v1/completions` endpoint for the legacy OpenAI completions API
  - `/v1/chat/completions` endpoint for the chat API
- Multiple backends support with round-robin load balancing (#1)
- Fallback handler for undefined routes with proper error messages

### Fixed

- Improve error handling consistency across all endpoints

### Changed

- Update README with changelog and version information

## Migration Notes

### Upgrading to v0.16.0

- New Files API: The OpenAI-compatible Files API is now available at `/v1/files`
  - Upload files for fine-tuning, batch processing, or assistants
  - Files are stored locally with persistent metadata
  - Configure via the `files_api` section in config.yaml (sketched below)
- File Resolution: Reference uploaded files in chat completions
  - Use file IDs in your chat messages for automatic content injection
- Persistent Metadata: File metadata now survives server restarts
  - Set `metadata_storage: persistent` (default) in the files_api config
  - Set `cleanup_orphans_on_startup: true` to auto-clean orphaned files
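
A `files_api` sketch; `metadata_storage` and `cleanup_orphans_on_startup` come from the notes above, while the storage path key is a hypothetical illustration:

```yaml
files_api:
  metadata_storage: persistent       # default; sidecar .meta.json files survive restarts
  cleanup_orphans_on_startup: true   # remove data files whose metadata cannot be rebuilt
  storage_dir: ./files               # assumed key for the local storage location
```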
- Circuit Breaker: Add a `circuit_breaker` section to your config.yaml for automatic backend failover
  - Configure failure threshold, recovery timeout, and half-open requests
- New Fallback Feature: Add a `fallback` section to your config.yaml to enable automatic model fallback (sketched below)
  - Define fallback chains: `fallback_chains: { "gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"] }`
  - Configure trigger conditions in `fallback_policy`
  - Cross-provider fallback is supported (e.g., OpenAI → Anthropic)
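
Expanding the inline example into a fuller sketch; trigger-condition names other than `circuit_breaker_open` are assumptions:

```yaml
fallback:
  fallback_chains:
    gpt-4o: [gpt-4-turbo, gpt-3.5-turbo]
    claude-opus-4-5: [gpt-4o]        # hypothetical cross-provider chain
  fallback_policy:
    trigger_conditions:
      circuit_breaker_open: true     # integrates with the circuit breaker section
```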
- Circuit Breaker Integration: Set `circuit_breaker_open: true` in trigger_conditions to integrate with the existing circuit breaker
- Response Headers: Check the `X-Fallback-Used` header to detect when fallback was used
- GPT-5.2 Support: New GPT-5.2 model metadata is available
- No breaking changes from v0.15.0

### Upgrading to v0.15.0

- Split /v1/models Endpoint: The `/v1/models` endpoint now returns a lightweight response by default
  - For extended metadata, use `/v1/models?extended=true`
  - This improves performance for clients that only need basic model information
- Nano Banana API: New support for Gemini Image Generation (Imagen) through an OpenAI-compatible interface
  - Use the `nano-banana` or `nano-banana-pro` model names
- Error Handling: Improved reliability with proper error propagation instead of panics
- Performance: LRU cache now uses read locks for better concurrent performance
- No breaking changes from v0.14.x

### Upgrading to v0.13.0

- New Responses API: The `/v1/responses` endpoint is now available for OpenAI Responses API compatibility
  - Sessions are automatically managed with background cleanup of expired sessions
  - True SSE streaming provides real-time responses
- Security: API keys are now stored using SecretString for improved security across all backends (#76)
- Model Metadata: Override /v1/models response fields via model-metadata.yaml (#75)
- No breaking changes from v0.12.0

### Upgrading to v0.12.0

- No breaking changes: This is a refactoring release with improved code organization
- Bug fix: Consistent hash routing now correctly handles exact hash matches
- Security: SSRF prevention module added for URL validation
- Reliability: Panics replaced with Option returns for better error handling
- API change: /v1/models endpoint no longer has hardcoded auth requirement

### Upgrading to v0.11.0

- New Anthropic backend: Add `type: anthropic` backends for native Anthropic Claude API support
  - Set the `CONTINUUM_ANTHROPIC_API_KEY` environment variable for authentication
  - Supports extended thinking with automatic parameter conversion
  - The OpenAI `reasoning_effort` parameter is automatically converted to Claude's thinking format
- Streaming improvements: Accept-Encoding fixes ensure proper streaming for all backends
- No breaking changes from v0.10.0

### Upgrading to v0.10.0

- New OpenAI backend: Add `type: openai` backends for native OpenAI API support
  - Set the `CONTINUUM_OPENAI_API_KEY` environment variable for authentication
  - Built-in model metadata is automatically included in the /v1/models response
- Image Generation API: New `/v1/images/generations` endpoint for DALL-E models
  - Configure timeout via `timeouts.request.image_generation` (default: 120s)
  - Supports `response_format` validation (url or b64_json)
- Gemini improvements: Streaming response truncation fixed for thinking models
  - Model ID normalization ensures proper routing
- API key authentication: Streaming requests now support API key authentication
- Security: Request body size limits prevent DoS attacks
- Newer OpenAI models automatically use `max_completion_tokens` instead of `max_tokens`

### Upgrading to v0.9.0

- New Gemini backend: Add `type: gemini` backends for native Google Gemini API support
  - Set the `CONTINUUM_GEMINI_API_KEY` environment variable for authentication
  - Thinking models (gemini-2.5-pro, gemini-3-pro) automatically get `max_tokens: 16384` if the client sends values below 4096
- Enhanced rate limiting with the token bucket algorithm is now available
- Configure rate limiting via the `rate_limiting` section in config.yaml
- Prometheus metrics are now available at the `/metrics` endpoint with authentication
- Use the `--migrate-config-file` CLI option to migrate and fix configuration files
- Multiple critical security fixes have been applied to rate limiting

### Upgrading to v0.8.0

- Rate limiting is now enabled for the `/v1/models` endpoint
- An empty list is returned instead of a 503 error when all backends are unhealthy
- Model aliases are now supported for metadata sharing

### Upgrading to v0.7.0

- Enhanced configuration management requires updating configuration files
- New load balancing strategies are available
- Streaming timeout is now configurable via config.yaml

### Upgrading to v0.6.0

- Timeout configuration is now read from config.yaml instead of hardcoded values
- Update your configuration files to include timeout settings

### Upgrading to v0.5.0

- Major architectural refactoring with layered design
- Configuration management now supports YAML files
- Retry mechanisms have been enhanced with security improvements
- Connection pool size is now configurable
This changelog reflects the actual development history of Continuum Router from its initial release to the current version.