# Changelog

All notable changes to Continuum Router are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [Unreleased]

### Added
- Cohere/Jina-Compatible Rerank and Sparse Embedding Endpoints - Add support for advanced retrieval APIs (#374)
  - New `/v1/rerank` endpoint (Cohere-compatible) for document reranking as a second-stage retrieval step
  - New `/embed_sparse` endpoint (TEI/Jina-compatible) for sparse embeddings (SPLADE format)
  - Supports both simple string documents and structured documents with a text field for reranking
  - Request/response types with comprehensive validation for model, query, and documents fields
  - New capability mappings: `rerank` -> `rerank` method, `sparse_embedding` -> `embed_sparse` method
  - Example models added to model-metadata.yaml: BGE Reranker, Jina Reranker, SPLADE models
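
For reference, a hypothetical `/v1/rerank` request in the Cohere-compatible shape described above; the model name and the `top_n` field are illustrative assumptions:

```json
{
  "model": "bge-reranker-v2-m3",
  "query": "how does the router pick a backend?",
  "documents": [
    "A plain string document about load balancing.",
    { "text": "A structured document; the text field carries the content." }
  ],
  "top_n": 1
}
```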
- BGE-M3 and Multilingual Embedding Model Support - Add model metadata and configuration examples for BGE-M3 and equivalent multilingual embedding models (#373)
  - BGE-M3: 568M parameters, 1024 dimensions, 100+ languages, 8192 context. Supports dense, sparse (lexical), and ColBERT multi-vector retrieval
  - BGE-Large-EN-v1.5: 335M parameters, 1024 dimensions, English-only, 512 context
  - Multilingual-E5-Large: 560M parameters, 1024 dimensions, 100+ languages, 514 context
  - Example backend configurations for vLLM, Ollama, and Text Embeddings Inference (TEI) deployments
  - Addresses cross-lingual retrieval requirements for RAG systems
- Plain Text Support for Anthropic File Transformer - Add text/plain support to the Anthropic file transformer (#342)
  - Text files are converted to `document` blocks with base64 data (same format as PDF)
  - Maximum text file size: 32MB (same as PDF)
  - Text files don't have magic bytes validation (accepts any content)
  - Updated SUPPORTED_DOCUMENT_TYPES to include `text/plain` alongside `application/pdf`
  - Updated error messages to mention plain text support
- PDF Support for OpenAI and Anthropic File Transformers - Add PDF file support to file transformers (#340)
  - OpenAI transformer: PDFs are converted to `file` blocks with base64 data or file IDs
  - Anthropic transformer: PDFs are converted to `document` blocks with base64 data
  - PDF magic bytes validation (`%PDF-` signature) for security
  - Maximum PDF size: 32MB (100 page limit enforced by backends)
  - Images remain at 20MB limit
- Native Anthropic Responses API Support - Add native Anthropic Messages API conversion for the Responses API (#332)
  - New `AnthropicConverter` for converting Responses API requests to native Anthropic Messages format
  - Full PDF file support via Anthropic's document understanding (`input_file` with `file_data`)
  - Image file support with automatic media type detection
  - Extended thinking (reasoning) content support for Claude 3+ models
  - Streaming support with proper SSE event transformation from Anthropic format
  - Non-streaming support with complete response transformation

### Fixed

- SSRF Validation for External File URLs - Add SSRF protection when fetching external files (#332)
  - Private IP address validation (blocks 10.x.x.x, 172.16-31.x.x, 192.168.x.x, 127.x.x.x)
  - Localhost and link-local address blocking
  - IPv6 loopback and link-local address blocking
  - DNS rebinding protection via IP validation after resolution
- Media Type Whitelist for File Inputs - Add security whitelist for allowed file types (#332)
  - PDF: `application/pdf`
  - Images: `image/jpeg`, `image/png`, `image/gif`, `image/webp`
  - Rejects unsupported media types with clear error messages
- AI SDK Compatibility for Responses API Streaming - Fix Vercel AI SDK compatibility issues with Responses API streaming (#334)
  - Updated `ResponseStreamEvent` serialization to use dot-separated type names matching the OpenAI spec (e.g., `"type": "response.output_text.done"` instead of `"type": "output_text_done"`)
  - Added `item_id` field to `OutputItemAdded`, `OutputItemInProgress`, `OutputItemDone`, and other streaming events
  - Added `sequence_number` field to track event ordering (uses `u64` to prevent overflow in long streaming sessions)
  - Custom `Serialize` implementation for `ResponseStreamEvent` ensures correct JSON output format
  - All existing streaming tests updated to verify new fields are correctly serialized
- Immediate Health Check After Hot Reload Backend Sync - Trigger immediate health check when backends are added via hot reload (#367)
  - New backends added via configuration hot reload are now health-checked immediately
  - Previously, new backends remained unavailable for up to 30 seconds (the default health check interval)
  - Made `HealthChecker::perform_health_checks()` public for external invocation
  - Improves model availability responsiveness in Backend.AI GO and other clients

## [0.34.0] - 2026-01-14

### Added

- Automatic Quality Parameter Conversion - Add automatic quality parameter conversion between DALL-E and GPT Image models (#330)
  - `to_dalle_quality()` method on `ImageQuality` enum for converting GPT Image quality values to DALL-E equivalents
  - Quality conversion applied transparently in `handle_openai_image_generation()` and `handle_streaming_image_generation()`
  - Quality conversion mapping:
    - DALL-E 3: low/medium/auto → standard, high → hd
    - GPT Image: standard → medium, hd → high
    - Gemini models: quality parameter ignored (no changes needed)
  - `is_dalle3_model()` helper for exact DALL-E 3 model matching
  - `convert_quality_for_model()` helper to eliminate code duplication
  - Conversion is logged for debugging and happens transparently without user-facing warnings
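
To illustrate the mapping, a GPT-Image-style generation request aimed at a DALL-E 3 model; per the table above, the router would rewrite `"quality": "high"` to `"quality": "hd"` before forwarding (the prompt is illustrative):

```json
{
  "model": "dall-e-3",
  "prompt": "a watercolor lighthouse at dusk",
  "quality": "high"
}
```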

## [0.33.0] - 2026-01-13

### Added

- Local File Resolution for Responses API - Resolve local `file_id` references in Responses API requests (#325)
  - Files uploaded via the Files API can now be referenced using `file_id` in Responses API requests
  - FileResolver service scans requests for `file_id` references and loads content from local storage
  - File content is converted to base64 `file_data` format before sending to backends
  - Security features: file ownership verification (user_id check) and 10MB size limit for injection
  - Graceful degradation: resolution failures fall back to the original request with warning logs

### Fixed

- Responses API Flat Tool Format - Fix the Responses API `/v1/responses` endpoint to accept the flat tool format (#323)
  - Function tools now use the flat format: `{"type": "function", "name": "...", "parameters": {...}}`
  - This aligns with OpenAI's Responses API specification
  - Nested format (with a `function` wrapper object) is no longer accepted for the Responses API
  - Updated documentation with flat tool format examples
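
A sketch of an accepted `/v1/responses` request using the flat tool format; the `get_weather` tool is hypothetical:

```json
{
  "model": "gpt-4o",
  "input": "What's the weather in Seoul?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Look up current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  ]
}
```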

## [0.32.0] - 2026-01-09

### Added

- Reasoning Effort Documentation and xhigh Fallback Logging - Add comprehensive reasoning effort documentation and improve fallback logging for the xhigh effort level (#317)
  - New documentation explaining reasoning effort parameter usage
  - Improved logging when the xhigh effort level falls back to high for non-GPT-5.2 models

### Fixed

- Implicit Message Type Inference in Responses API InputItem - Support implicit message type inference when the role field is missing (#316)
  - Optimized InputItem deserializer for better performance
  - Added invalid role test coverage
  - Enables more flexible input handling in the Responses API

## [0.31.5] - 2026-01-09

### Added

- Responses API Pass-through for Native OpenAI Backends - Smart routing for the `/v1/responses` API based on backend type (#313)
  - OpenAI and Azure OpenAI backends now use pass-through mode, forwarding requests directly to the `/v1/responses` endpoint
  - Other backends (Anthropic, Gemini, vLLM, Ollama, LlamaCpp, Generic) automatically convert to their native format
  - Pass-through mode benefits: native PDF support, preserved reasoning state, access to built-in tools (websearch, filesearch), better cache utilization
  - New `router.rs` module with `ResponsesApiStrategy` enum for routing decisions
  - New `passthrough.rs` module with `PassthroughService` for direct request forwarding
  - Request payload size validation (16MB limit) for DoS prevention
  - Comprehensive test coverage for routing strategy, error handling, and request validation
- OpenAI Responses API File Input Types - Add support for multi-modal file inputs in the Responses API (#311)
  - New `input_text`, `input_file`, and `input_image` content part types
  - Support for PDF documents and images via base64 data URLs (`file_data`)
  - Support for external file URLs (`file_url`) with SSRF validation
  - Warning logs for unsupported `file_id` references (Files API integration pending)
  - Backend-specific transformers for Anthropic (document/image blocks) and Gemini (inline_data/file_data)
  - Comprehensive test coverage for all input types

### Fixed

- Pass-through Raw Error Responses - Forward raw backend error responses in pass-through mode for better error debugging

## [0.31.4] - 2026-01-07

### Fixed

- Hot Reload Support for API Key Forwarding - Fix hot reload support in proxy and streaming handlers (#310)
  - Use `current_config()` instead of a captured config snapshot in proxy and streaming handlers
  - API key and other configuration changes via hot reload now properly apply to new requests
  - Ensures runtime configuration updates affect backend request forwarding
  - Added comprehensive end-to-end tests for hot reload api_key application

## [0.31.3] - 2026-01-06

### Fixed

- Unix Socket Anthropic Request/Response Transformation - Fix Anthropic backends accessed via Unix socket failing due to missing transformations (#307, #308)
  - Unix socket transport now applies the same request transformation as HTTP transport for Anthropic backends
  - OpenAI-format requests are properly converted to Anthropic format before sending
  - Anthropic responses are transformed back to OpenAI format
  - Endpoint is correctly rewritten from `/v1/chat/completions` to `/v1/messages`
  - Added comprehensive integration tests for Unix socket Anthropic transformations
- Anthropic Non-streaming Stream Parameter - Preserve the stream parameter for non-streaming Anthropic requests (#305, #306)
  - Replace `transform_openai_to_anthropic_request` (which forces `stream: true`) with `transform_openai_to_anthropic_with_global_prompt` in the non-streaming path
  - Fixes issue where requests with `stream: false` were incorrectly sent to the Anthropic API with `stream: true`
  - Renamed `transform_openai_to_anthropic_request` to `transform_openai_to_anthropic_streaming` for clarity

### Documentation

- Jinja2 Syntax Escaping - Escape Jinja2 syntax in Korean configuration docs to prevent mkdocs-macros-plugin errors

## [0.31.2] - 2026-01-05

### Added

- Non-streaming Support for Anthropic Backend - Transform OpenAI-formatted requests to Anthropic format for non-streaming chat completion calls, and convert Anthropic responses back to OpenAI format
  - Non-streaming requests to Anthropic backends now properly transform request/response formats
  - Updated streaming handlers to use `transform_str` for proper tool call handling
- Tool Call and Tool Result Transformation for Anthropic Backend - Enable proper tool use workflows when routing to Anthropic models
  - Transform OpenAI-style `tool_calls` in assistant messages to Anthropic's `tool_use` format
  - Transform tool result messages to Anthropic's `tool_result` format
  - Enables multi-turn tool use conversations with Anthropic models

### Dependencies

- Update 12 packages including rustls, tokio-stream, and syn to latest versions

## [0.31.1] - 2026-01-04

### Fixed

- Anthropic Non-streaming Authentication Headers - Fix non-streaming Anthropic requests failing with the wrong authentication header (#300, #301)
  - Non-streaming requests to Anthropic backends now correctly use the `x-api-key` header instead of `Authorization: Bearer`
  - Added the `anthropic-version` header for all Anthropic backend requests
  - Applied consistent header handling between HTTP and Unix socket transport paths
  - Fixed issue where the Anthropic API returned an "Invalid Anthropic API Key" error (HTTP 400)

## [0.31.0] - 2026-01-04

### Added

- Unix Socket Server Binding - Add Unix socket binding support alongside TCP (#298)
  - Server can now bind to Unix domain sockets for local communication
  - Configure via `server.unix_socket` in the config file (sketched below)
  - Supports concurrent TCP and Unix socket bindings
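
A minimal configuration sketch; only the `server.unix_socket` key comes from this entry, the surrounding TCP fields are assumptions:

```yaml
server:
  host: 127.0.0.1                          # assumed TCP binding fields
  port: 8080
  unix_socket: /run/continuum-router.sock  # Unix socket, bound alongside TCP
```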
- Reasoning Parameter Support for Responses API - Add `reasoning` parameter support to the `/v1/responses` endpoint (#296)
  - Supports nested format: `{"reasoning": {"effort": "high"}}`
  - Valid effort levels: `low`, `medium`, `high`, `xhigh` (GPT-5.2 only)
  - Type-safe validation using the `ReasoningEffortLevel` enum
  - Automatic conversion to the flat `reasoning_effort` format for backends
  - Invalid effort values rejected at deserialization with clear error messages
  - Added `with_reasoning()` builder method for `ResponsesRequest`
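
For example, a `/v1/responses` request using the nested format, which the router converts to the flat `reasoning_effort` form for backends (the model and input are illustrative):

```json
{
  "model": "gpt-5.2",
  "input": "Summarize the trade-offs between B-trees and LSM trees.",
  "reasoning": { "effort": "xhigh" }
}
```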
- Configurable Health Check Endpoints - Add configurable health check endpoints per backend type
  - Customize health check paths for different backend types
  - Support for backend-specific health verification

## [0.30.0] - 2026-01-01

### Added

- Wildcard Patterns and Date Suffix Handling for Model Aliases - Support wildcard patterns and automatic date suffix handling in model aliases (#286)
  - Automatic date suffix normalization: models with date suffixes (e.g., `claude-opus-4-5-20251130`) automatically match metadata for the base model (e.g., `claude-opus-4-5-20251101`)
  - Supported date formats: `-YYYYMMDD`, `-YYYY-MM-DD`, `-YYMM`, `@YYYYMMDD`
  - Wildcard pattern matching in aliases using the `*` character
    - Prefix patterns: `claude-*` matches `claude-opus`, `claude-sonnet`, etc.
    - Suffix patterns: `*-preview` matches `gpt-4o-preview`, `o1-preview`, etc.
    - Infix patterns: `gpt-*-turbo` matches `gpt-4-turbo`, `gpt-3.5-turbo`, etc.
  - Zero-config date handling: works automatically without configuration changes
  - Matching priority: Exact ID > Exact alias > Date suffix > Wildcard > Base name fallback
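
A hypothetical model-metadata.yaml sketch of the alias patterns above; the `models`/`aliases` schema is an assumption, only the wildcard patterns come from this entry:

```yaml
models:
  - id: claude-opus-4-5
    aliases:
      - "claude-opus-*"    # prefix pattern; date suffixes like -20251130 also match with zero config
  - id: gpt-4-turbo
    aliases:
      - "gpt-*-turbo"      # infix pattern
      - "*-turbo-preview"  # suffix pattern
```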

### Fixed

- Anthropic Backend Default URL - Apply default URL for Anthropic backend when not specified (#288)
- Backend-specific owned_by Values - Replace owned_by placeholders with backend-type-specific values (#287)

### Documentation

- Translate wildcard pattern and date suffix handling documentation to Korean (#289)

## [0.29.0] - 2026-01-01

### Added

- Accelerated Health Checks During Backend Warmup - Implement accelerated health checks during backend warmup (#282)
  - When a backend returns HTTP 503 (Service Unavailable), it enters a "warming up" state
  - During warmup, health checks occur at an accelerated interval (default: 1 second)
  - Reduces model availability detection latency from up to 30 seconds to approximately 1 second
  - Configurable via `warmup_check_interval` (default: 1s) and `max_warmup_duration` (default: 300s)
  - Particularly useful for backends like llama.cpp that return HTTP 503 while loading models
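
A sketch of the warmup tuning knobs; the two key names and their defaults come from this entry, while where they nest in config.yaml is an assumption:

```yaml
health_check:
  interval: 30s               # regular health check interval (assumed key)
  warmup_check_interval: 1s   # accelerated polling while a backend answers HTTP 503
  max_warmup_duration: 300s   # stop treating the backend as warming up after this
```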
- Model Metadata CLI Option - Add --model-metadata option for specifying the model metadata file path (#281)
  - New `--model-metadata` CLI argument to specify the model metadata YAML file at runtime
  - Overrides the config file `model_metadata_file` setting
  - Supports absolute paths, relative paths, and tilde expansion (~)

### Fixed

- OpenAI owned_by Field - Replace OpenAI owned_by placeholder with 'openai' (#280)
  - Models from the OpenAI backend now correctly show `owned_by: openai` instead of placeholder text
- Admin API Race Condition - Prevent race condition in Admin API concurrent backend creation (#278)
  - Fixed issue where concurrent backend creation requests could cause data corruption
  - Added proper synchronization for backend management operations
- Hot Reload Processing Steps - Add missing processing steps to hot reload (#277)
  - Fixed issue where some configuration changes were not properly applied during hot reload
  - Ensures all processing steps are executed when configuration is reloaded
- Cloud Backend Availability Status - Cloud backends now show `available: true` in /v1/models/{model_id} (#272)
  - Fixed issue where cloud backends (OpenAI, Anthropic, Gemini) were incorrectly showing as unavailable
  - Cloud backends are now correctly marked as available when healthy

### Documentation

- Add tests and documentation for v0.29.0 features (#459da6a)

## [0.28.0] - 2025-12-31

### Added

- SSE Streaming Support for Tool Calls - Add SSE streaming support for tool calls (#258)
  - Real-time streaming of tool call responses over Server-Sent Events
  - Enables efficient streaming responses for function-calling scenarios
- llama.cpp Tool Calling Auto-Detection - Auto-detect tool calling support via the /props endpoint (#263)
  - Queries the `/props` endpoint during model discovery to analyze the `chat_template`
  - Detects tool-related keywords (`tool`, `tools`, `tool_call`, `function`, etc.)
  - Automatically enables the `function_calling` capability when detected
  - Graceful fallback when the `/props` endpoint is unavailable
  - Works with both HTTP and Unix socket backends
- Extended /v1/models/{model_id} Endpoint - Extend with rich metadata fields (#262)
  - Returns comprehensive model metadata including capabilities and pricing
  - Enhanced response format with additional model information
- Tool Result Message Transformation - Implement tool result message transformation for multi-turn conversations (#265)
  - Transforms tool result messages (`role: "tool"`) to backend-native formats
  - Anthropic: converts to `user` role with `tool_result` content blocks
  - Gemini: converts to `function` role with `functionResponse` parts
  - Combines consecutive tool results for Anthropic (parallel tool calls)
  - Automatic function name lookup for Gemini transformations
  - Preserves the `is_error` indicator for error responses
- Backend-specific owned_by Placeholders - Add owned_by placeholders for llamacpp, vllm, ollama, http backends (#267)

### Improved

- CLI Help Output Formatting - Improve --help output formatting with title header and project attribution (#269)
  - Enhanced visual appearance for command-line help
  - Added project attribution in help output

### Fixed

- Model Metadata Cache Sync - Sync model metadata cache with ConfigManager (#270)
  - Ensures the model cache properly reflects configuration changes

### CI/CD

- Comprehensive integration tests for tool calling (#264)

### Dependencies

- Bump the minor-and-patch group with 3 updates (#257)

### Technical

- Fix dead_code warnings for Unix-only items on Windows builds

## [0.27.0] - 2025-12-28

### Added

- llama.cpp Tool Calling Auto-Detection - Automatic detection of tool calling support for llama.cpp backends (#260)
  - Queries the `/props` endpoint during model discovery to analyze the `chat_template`
  - Detects tool-related keywords (`tool`, `tools`, `tool_call`, `function`, etc.)
  - Automatically enables the `function_calling` capability when detected
  - Graceful fallback when the `/props` endpoint is unavailable
  - Works with both HTTP and Unix socket backends
- Complete Unix Socket Support - Full Unix socket support for model discovery and streaming (#248, #252, #253, #254, #256)
  - SSE/streaming support for Unix socket backends, enabling real-time responses over local sockets
  - Backend type auto-detection for Unix socket connections
  - vLLM model discovery support via Unix sockets
  - llama.cpp model discovery support via Unix sockets
  - Model fetcher fully supports Unix socket backends
- Tool Call Transformation - Implement OpenAI tool call transformation across all backends (#244, #245, #246)
  - Tool definition transformation for Anthropic, Gemini, and llama.cpp backends
  - Tool choice transformation with support for `auto`, `none`, `required`, and specific function selection
  - Tool call response transformation for a unified response format across providers
- Multi-Turn Tool Conversation Support - Message transformation for tool calling in multi-turn conversations (#241)
  - Transforms tool result messages (`role: "tool"`) to backend-native formats
  - Anthropic: converts to `user` role with `tool_result` content blocks
  - Gemini: converts to `function` role with `functionResponse` parts
  - Combines consecutive tool results for Anthropic (parallel tool calls)
  - Automatic function name lookup for Gemini transformations
  - Preserves the `is_error` indicator for error responses

## [0.26.0] - 2025-12-27

### Added

- Tool Choice Transformation - Automatic transformation of the `tool_choice` parameter across backends (#239)
  - Supports `auto`, `none`, `required`, and specific function selection
  - Anthropic: transforms to the `{"type": "auto|any|tool"}` format, handles "none" by removing tools
  - Gemini: transforms to the `tool_config.function_calling_config` structure
  - llama.cpp: preserves the `parallel_tool_calls` parameter for parallel function calling
  - Integrates with the model fallback system for cross-provider tool calling
- Single Model Retrieval Endpoint - Add GET /v1/models/{model} endpoint for single model retrieval with availability status (#236)
  - Returns model information with an additional `available` field indicating real-time availability
  - `available: true` when at least one healthy backend provides the model
  - `available: false` when the model exists but all backends providing it are unhealthy
  - Returns 404 if the model does not exist on any backend
  - Optimized performance: avoids full model aggregation by targeting a specific model lookup

## [0.25.0] - 2025-12-26

### Added

- CORS (Cross-Origin Resource Sharing) Support - Configurable CORS middleware for embedding the router in web applications (#234)
  - Support for Tauri desktop apps, Electron apps, and web frontends
  - Wildcard origins and port patterns (e.g., `http://localhost:*`)
  - Custom schemes support (e.g., `tauri://localhost`)
  - Configurable methods, headers, and credentials
  - Preflight cache with configurable max-age
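
A hypothetical `cors` section illustrating the options above; the exact key names are assumptions:

```yaml
cors:
  allowed_origins:
    - "http://localhost:*"   # wildcard port pattern for local frontends
    - "tauri://localhost"    # custom scheme for Tauri desktop apps
  allowed_methods: [GET, POST, OPTIONS]
  allow_credentials: false
  max_age: 3600              # preflight cache lifetime in seconds
```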
- Unix Domain Socket Backend Support - Secure local LLM communication via Unix sockets (#232)
  - Use the `unix:///path/to/socket` URL scheme for local backends
  - Better security through file system permissions (no TCP port exposure)
  - Lower latency than localhost TCP (~30% improvement)
  - No port conflicts when running multiple LLM servers
  - Platform support: Linux and macOS (Windows planned for future releases)
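
A backend sketch using the `unix://` scheme from this entry; the surrounding `backends` fields are assumptions:

```yaml
backends:
  - name: local-llama
    type: llamacpp
    url: unix:///var/run/llama.sock   # file permissions gate access, no TCP port exposed
```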

## [0.23.1] - 2025-12-25

### CI/CD

- Windows Build Support - Add Windows x86_64 build target to release workflow (#224)
  - Enables native Windows builds in the release pipeline
  - Cross-compilation from Linux using mingw-w64

## [0.23.0] - 2025-12-23

### Added

- GLM 4.7 Model Support - Add support for Zhipu AI's GLM 4.7 model with thinking capabilities (#222)
  - Model metadata in model-metadata.yaml with full specifications (355B MoE, 32B active parameters)
  - Support for thinking parameters: `enable_thinking` (boolean) and `thinking_budget` (1-204,800 tokens)
  - 200K context window with up to 131K token output
  - Z.AI backend configuration example in config.yaml.example
  - SiliconFlow alternative backend configuration
  - Comprehensive integration tests for model metadata
  - Pricing: $0.60/1M input tokens, $2.20/1M output tokens
- Thinking Pattern Metadata - Add thinking pattern metadata for models with implicit start tags (#218)
  - Support for models that use implicit thinking start tags
  - Pattern-based detection for thinking content extraction
- GCP Service Account Authentication - Add GCP Service Account authentication support for the Gemini backend (#208)
  - Support for JSON key file authentication
  - Environment variable based authentication
  - Automatic token refresh and management
- Distributed Tracing - Add distributed tracing with correlation ID propagation (#207)
  - W3C Trace Context support with the traceparent header
  - Configurable trace ID, request ID, and correlation ID headers
  - Trace ID propagation across all retry attempts
  - Security validation for trace IDs from headers
- New Model Metadata - Add model metadata for NVIDIA Nemotron 3 Nano, Qwen Image Layered, and Kakao Kanana-2 (#202)
- ASCII Diagram Replacement - Add ASCII diagram to image replacement system for MkDocs (#200)
  - Automatic replacement of ASCII diagrams with SVG images during the MkDocs build
  - Preserves ASCII art visibility in raw Markdown

### Changed

- CI Optimization - Skip Rust tests when only non-code files change (#204)
  - Faster CI for documentation-only changes
  - Path-based filtering for test execution

### Fixed

- Cache Stampede Prevention - Prevent cache stampede with singleflight, stale-while-revalidate, and background refresh (#220)
  - Singleflight pattern prevents a thundering herd on model cache expiration
  - Stale-while-revalidate returns cached data immediately while refreshing in the background
  - Background refresh proactively updates the cache before expiration
- Hot Reload for Global Prompts - Apply global_prompts changes via hot reload (#219)
  - Global prompt configuration changes now take effect without a restart
- Model Cache Invalidation - Invalidate model cache when backend config changes (#206)
  - Backend configuration changes now properly trigger a model cache refresh
- Documentation Improvements - Improve diagram rendering with inline SVG and responsive sizing
- Translation Typo - Fix translation typo in documentation
- Docker CI Fixes - Handle multi-line tags in Docker manifest creation
- Private Repo Access - Use gh CLI for private repo asset download in CI
- GitHub Token Auth - Add GitHub token authentication for private repository access
- Docs Workflow - Remove release trigger from docs workflow to avoid environment protection error

### CI/CD

- Bump actions/github-script from 7 to 8 (#210)
- Bump apple-actions/import-codesign-certs from 3 to 6 (#212)
- Bump actions/cache from 4 to 5 (#211)
- Bump actions/checkout from 4 to 6 (#209)

## [0.22.0] - 2025-12-19

### Added

- Docker Support with Pre-built Binaries - Add Dockerfile and Dockerfile.alpine that download pre-built binaries from GitHub Releases (#189)
  - Debian Bookworm-based image (~50MB) for general use
  - Alpine 3.20-based image (~10MB) for minimal deployments
  - Multi-architecture support (linux/amd64, linux/arm64) using TARGETARCH
  - VERSION build argument for selecting the release version
  - Non-root user execution for security
  - OCI labels for image metadata
- Container Health Check CLI - Implement `--health-check` CLI argument for container orchestration (#189)
  - Returns exit code 0 if the server is healthy, 1 if unhealthy
  - Optional `--health-check-url` for a custom health endpoint
  - Proper IPv6 address handling
  - 5-second default timeout
- Docker Compose Quick Start - Add docker-compose.yml for easy deployment (#189)
  - Volume mount for configuration
  - Environment variable support (RUST_LOG)
  - Resource limits and health checks
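
A minimal docker-compose.yml sketch covering the features listed above; the image path and mount target are hypothetical:

```yaml
services:
  continuum-router:
    image: ghcr.io/example/continuum-router:latest   # hypothetical image path
    volumes:
      - ./config.yaml:/etc/continuum-router/config.yaml:ro
    environment:
      RUST_LOG: info
    healthcheck:
      test: ["CMD", "continuum-router", "--health-check"]
      interval: 30s
```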
- Automated Docker Image Publishing - Add Docker build and push to ghcr.io in the release workflow (#189)
  - Builds both Debian and Alpine images after the binary release
  - Multi-platform support (linux/amd64, linux/arm64)
  - Automatic tagging with semver (VERSION, MAJOR.MINOR, latest)
  - Alpine images tagged with the -alpine suffix
  - GitHub Actions cache for faster builds
- MkDocs Documentation Website - Build comprehensive documentation site with Material theme (#183)
  - Full navigation structure with Getting Started, Features, Operations, and Development sections
  - GitHub Actions workflow for automatic deployment to GitHub Pages
  - Custom stylesheets and theme configuration
- Korean Documentation Translation - Complete Korean localization of all documentation (#190)
  - All 20 documentation files translated to Korean
  - Language switcher in navigation (English/Korean)
  - Multi-language build in GitHub Actions workflow
- Dependency Security Auditing - Add cargo-deny for vulnerability scanning (#192)
  - Security advisory checking in CI workflow
  - License compliance verification
  - Dependency source validation
- Dependabot Integration - Automated dependency updates for Cargo and GitHub Actions (#192)
- Security Policy - Add comprehensive SECURITY.md with vulnerability reporting process (#191)

### Changed

- Integrate orphaned architecture documentation into MkDocs site (#186)
- Rename documentation files to lowercase kebab-case for URL-friendly filenames (#186)
- Update various GitHub Actions to latest versions (checkout@v6, setup-python@v6, upload-artifact@v6, etc.)

### Fixed

- Health check response validation logic bug (operator precedence issue)
- Address parsing fallback that was silently hiding configuration errors
- IPv6 address formatting in health check (now correctly uses bracket notation)

### Security

- Updated reqwest 0.11→0.12, prometheus 0.13→0.14, validator 0.18→0.20
- Replaced dotenv with dotenvy for better maintenance
- Added .dockerignore to exclude sensitive files from build context

## [0.21.0] - 2025-12-19

### Added

- Gemini 3 Flash Preview Model - Add support for gemini-3-flash-preview model (#168)
- Backend Error Passthrough - Pass through detailed error messages from backends for 4xx responses (#177)
  - Parse and forward original error messages from OpenAI, Anthropic, and Gemini backends
  - Preserve the `param` field when available (useful for invalid parameter errors)
  - Falls back to a generic error message if the backend response cannot be parsed
  - Error format remains OpenAI-compatible
  - Comprehensive unit tests for error parsing across all backend formats
- Default Authentication Mode for API Endpoints - Configurable authentication enforcement for API endpoints (#173)
  - New `mode` field in the `api_keys` configuration: `permissive` (default) or `blocking`
  - `permissive` mode: requests without an API key are allowed (backward compatible)
  - `blocking` mode: only authenticated requests are processed; unauthenticated requests receive 401
  - Protected endpoints: `/v1/chat/completions`, `/v1/completions`, `/v1/responses`, `/v1/images/*`, `/v1/models`
  - Health endpoints (`/health`, `/healthz`) always accessible without authentication
  - Hot reload support for authentication mode changes
  - Comprehensive integration tests for both modes
  - Updated API.md, configuration.md, and manpage documentation
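
A sketch of blocking mode; the `mode` values come from this entry, while the shape of the key list is an assumption:

```yaml
api_keys:
  mode: blocking            # reject unauthenticated requests with 401 (default: permissive)
  keys:
    - key: "sk-example-1"   # hypothetical key entry; /health and /healthz stay open
```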

### Fixed

- UTF-8 Multi-byte Character Corruption - Handle UTF-8 multi-byte character corruption in streaming responses (#179)
- GPT Image response_format - Strip response_format parameter for GPT Image models (#176)
- Auto-discovery Validation - Allow auto-discovery for all backends except Anthropic (#172)

### Changed

- Updated architecture.md and fixed documentation issues (#167, #169)
- Added AGENTS.md and linked CLAUDE.md to it

## [0.20.0] - 2025-12-18

### Added

- Image Variations Support for Gemini - Add image variations support for Gemini (nano-banana) models (#165)
- Image Edit Support for Gemini - Implement limited image edit support for Gemini (nano-banana) models (#164)
- Enhanced Image Generation - Enhance /v1/images/generations with streaming and GPT Image features (#161)
- GPT Image 1.5 Model - Add gpt-image-1.5 model support (#159)
- Image Variations Endpoint - Implement /v1/images/variations endpoint for image variations (#155)
- Image Edits Endpoint - Implement /v1/images/edits endpoint for image editing (inpainting) (#156)
  - Full OpenAI Images Edit API compatibility
  - Supports GPT Image models: `gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5` (recommended)
  - Legacy support for the `dall-e-2` model
  - Multipart form-data parsing with shared utilities
  - PNG image validation (format, size, square dimensions)
  - Optional mask validation (dimension matching with the source image)
- Shared Image Utilities - Implement shared utilities for image edit/variations endpoints (#154)
- External Prompt Files - Support loading system prompts from external Markdown files (#146)
  - New `prompt_file` field in `BackendPromptConfig` and `ModelPromptConfig`
  - New `default_file` and `prompts_dir` fields in `GlobalPromptConfig`
  - Secure path validation with path traversal attack prevention
  - REST API endpoints for prompt file management
  - File caching with size limits (100 entries max, 50MB total)
  - Hot-reload support for prompt files
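
A hypothetical prompt-file configuration; the `prompt_file`, `default_file`, and `prompts_dir` field names come from this entry, and the YAML nesting is an assumption:

```yaml
global_prompts:
  prompts_dir: ./prompts
  default_file: system.md    # used when no backend- or model-level prompt_file is set
backends:
  - name: anthropic-main
    prompt:
      prompt_file: claude.md # resolved under prompts_dir; path traversal is rejected
```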
- Solar Open 100B Model - Add Solar Open 100B model metadata
- Automatic Model Discovery - Backends automatically discover available models from the `/v1/models` API when models are not explicitly configured (#142)
  - OpenAI, Gemini, and vLLM backends support auto-discovery
  - Ollama backend uses vLLM's discovery mechanism (OpenAI-compatible API)
  - 10-second timeout prevents blocking startup
  - Falls back to hardcoded defaults if discovery fails

### Changed

- `BackendFactory::create_backend_from_typed_config()` is now async to support async model discovery
- Backend `from_config()` methods for OpenAI, Gemini, and vLLM are now async

### Security

- API Key Redaction - Implement API key redaction to prevent credential exposure (#150)

### Performance

- Binary Size Optimization - Optimize release binary size from 20MB to 6MB (70% reduction) (#144)

### Refactored

## [0.19.0] - 2025-12-13

### Added

- Runtime Configuration Management API - Comprehensive REST API for viewing and modifying configuration at runtime (#139)
  - Configuration Query APIs:
    - `GET /admin/config/full` - Retrieve full configuration with sensitive info masked
    - `GET /admin/config/sections` - List all 15 configuration sections
    - `GET /admin/config/{section}` - Get specific section configuration
    - `GET /admin/config/schema` - JSON Schema for client-side validation
  - Configuration Modification APIs:
    - `PUT /admin/config/{section}` - Replace section configuration
    - `PATCH /admin/config/{section}` - Partial update (JSON merge patch)
    - `POST /admin/config/validate` - Validate configuration before applying
    - `POST /admin/config/apply` - Apply configuration with hot reload
  - Configuration Save/Restore APIs:
    - `POST /admin/config/export` - Export configuration (YAML/JSON/TOML)
    - `POST /admin/config/import` - Import and apply configuration
    - `GET /admin/config/history` - View configuration change history
    - `POST /admin/config/rollback/{version}` - Rollback to a previous version
  - Backend Management APIs:
    - `POST /admin/backends` - Add new backend
    - `GET /admin/backends/{name}` - Get backend configuration
    - `PUT /admin/backends/{name}` - Update backend configuration
    - `DELETE /admin/backends/{name}` - Remove backend
    - `PUT /admin/backends/{name}/weight` - Update backend weight
    - `PUT /admin/backends/{name}/models` - Update backend model list
  - Sensitive information masking for API keys, passwords, tokens
  - JSON Schema generation for all configuration sections
  - Configuration history tracking (up to 100 entries, configurable)
  - Memory-efficient history storage with size-based eviction (10MB limit)
  - Atomic version counter using AtomicU64 for thread safety
  - Structured error responses with error codes
- Admin REST API Documentation - Comprehensive developer guide (docs/admin-api.md)
  - Complete API reference with request/response examples
  - Client SDK examples for Python, JavaScript/TypeScript, and Go
  - Best practices and security considerations
- Integration Tests - 33 integration tests for Configuration Management API endpoints

### Fixed

- CRITICAL: Configuration changes now actually applied to running system
- CRITICAL: Memory growth controlled with JSON string storage and size-based eviction
- HIGH: Input validation added (1MB content limit, 32-level nesting depth)
- HIGH: Sensitive export requires elevated permission and audit logging
- HIGH: Comprehensive sensitive field detection (30+ patterns)
- MEDIUM: Validation functions now perform actual validation
- MEDIUM: Race condition fixed with AtomicU64 for version counter
- MEDIUM: Colon removed from allowed backend name characters
- MEDIUM: Structured error responses with error codes
- MEDIUM: Initialize flag prevents duplicate history entries
- LOW: Unnecessary clones removed for better performance
- LOW: Limits now configurable via AdminConfig
- LOW: Duplicate validation logic refactored
- LOW: Test coverage improved for edge cases

### Changed

- Enhanced documentation for Configuration Management API across all guides
- Updated manpage with new admin endpoints
- Updated API.md with comprehensive Configuration Management API section

## [0.18.0] - 2025-12-13

### Added

- Per-API-Key Rate Limiting - Implement per-API-key rate limiting (#137)
  - Individual rate limits for each API key
  - Configurable requests per minute per key
- API Key Management System - Comprehensive API key management and configuration system
  - Multiple key sources: config file, external file, environment variables
  - Key properties: scopes, rate limits, expiration, enabled status
  - Hot reload support for key configuration changes
- Files API Authentication - Implement authentication and authorization for the Files API (#131)
  - API key authentication for file operations
  - File ownership enforcement
  - Admin access control for all files
- Hot Reload for Runtime Configuration - Complete hot reload functionality for runtime configuration updates (#130)
  - Automatic configuration file watching
  - Classified updates: immediate, gradual, restart-required

### Changed

- Major refactoring with modular structure
- Updated architecture.md to reflect refactored module structure

### Fixed

- Add ConnectInfo extension for admin/metrics/files endpoints
- Address security vulnerabilities in API key management
- Address code quality issues in API key management

### Documentation

- Add API key management documentation
- Add comprehensive API key management tests

## [0.17.0] - 2025-12-12

### Added

- Anthropic Backend File Content Transformation - Files uploaded to the router can now be used with the Anthropic backend (#126)
  - Automatic conversion of file content to Anthropic message format
  - Support for text and document files with base64 encoding
  - Seamless integration with file resolution middleware
- Gemini Backend File Content Transformation - Files uploaded to the router can now be used with the Gemini backend (#127)
  - Automatic conversion of file content to Gemini API format
  - Support for inline data with proper MIME type handling
  - Cross-provider file support enables files uploaded once to work across all backends

### Fixed

- Streaming File Uploads - Implement streaming file uploads to prevent memory exhaustion (#128)
  - Large file uploads no longer load the entire file into memory
  - Streaming processing for efficient memory usage
  - Prevents OOM errors when uploading large files

### Changed

- None

## [0.16.0] - 2025-12-12

### Added

- OpenAI-Compatible Files API - Full implementation of OpenAI Files API endpoints (#111)
  - Upload files with multipart/form-data support
  - List, retrieve, and delete files
  - Download file content
  - Supports purpose: fine-tune, batch, assistants, user_data
- File Resolution Middleware - Automatic file content injection for chat completions (#120)
  - Reference uploaded files in chat messages with file IDs
  - Automatic content injection into chat context
- Persistent Metadata Storage - File metadata persists across server restarts (#125)
  - Sidecar JSON files (.meta.json) stored alongside data files
  - Automatic recovery on startup with metadata rebuild from files
  - Orphan file detection and optional cleanup
- OpenAI Backend File Handling - Files uploaded locally are forwarded to OpenAI when needed (#121, #122)
- GPT-5.2 Model Support - Added GPT-5.2 model metadata to OpenAI backend (#124)
- Circuit Breaker Pattern - Automatic backend failover with circuit breaker (#93)
  - States: Closed → Open → Half-Open → Closed cycle
  - Configurable failure thresholds and recovery timeout
  - Per-backend circuit breaker instances
  - Admin endpoints for circuit breaker status and control
- Admin Endpoint Authentication - Secure admin endpoints with authentication and audit logging
- Configurable Fallback Models - Automatic model fallback for unavailable model scenarios (#50)
  - Define fallback chains for primary models (e.g., gpt-4o → gpt-4-turbo → gpt-3.5-turbo)
  - Cross-provider fallback support (e.g., OpenAI → Anthropic)
  - Automatic parameter translation between providers
  - Integration with circuit breaker for layered failover protection
  - Configurable trigger conditions (error codes, timeout, connection error, circuit breaker open)
  - Response headers indicate when fallback was used (X-Fallback-Used, X-Original-Model, X-Fallback-Model)
  - Prometheus metrics for fallback monitoring
- Pre-commit Hook - Automated code formatting and linting before commits

### Fixed

- Fallback Chain Validation - Integrate chain validation into Validate derive
- Fallback Performance - Use index-based lookup for fallback chain traversal
- Lock Contention - Reduce lock contention in FallbackService with snapshot pattern
- Security - Sanitize fallback error headers and metric labels
- Circuit Breaker Security - Add backend name validation in admin endpoints
- Thread Safety - Use CAS loop for thread-safe half-open request limiting

### Changed

- Documentation Updates - Comprehensive documentation for fallback configuration, circuit breaker, and Files API
- Code Quality - Fix clippy warnings and format code
- Pre-commit Hook Location - Move pre-commit hook to .githooks directory

## [0.15.0] - 2025-12-05

### Added

- Nano Banana API Support - Add Gemini Image Generation API support with OpenAI-compatible interface (#102)
  - Supports nano-banana and nano-banana-pro models
  - Automatic format conversion between OpenAI Images API and Gemini Imagen API
- Split /v1/models Endpoint - Standard lightweight response vs extended metadata response (#101)
  - `/v1/models` returns a lightweight response for better performance
  - `/v1/models?extended=true` returns full metadata for detailed model information

### Changed

- Extract StreamService - Streaming handler logic extracted to dedicated StreamService for modular architecture (#106)
- Eliminate Retry Logic Duplication - Consolidated retry logic code in proxy.rs (#103)

### Fixed

- Proper Error Propagation - Replace `.expect()` panics with proper error propagation in HttpClientFactory (#104)

### Performance

- LRU Cache Optimization - Use read lock instead of write lock for cache lookups (#105)

## [0.14.2] - 2025-12-05

### Added

- Token Usage Logging - Log input/output token counts on request completion (#92)
- Exclude List for Reports - Add exclude list configuration for reports

### Changed

- None

### Fixed

- None

## [0.14.1] - 2025-12-05

### Added

- TTFB Benchmark Targets - Add TTFB benchmark targets to Makefile
- Connection Pre-warming - Add connection pre-warming for Anthropic, Gemini, OpenAI backends

### Fixed

- Anthropic Backend TTFT - Optimize Anthropic backend TTFT with connection pooling and HTTP/2 (#90)
- Gemini Backend TTFT - Optimize Gemini backend TTFT with connection pooling and HTTP/2 (#88)
- Model Metadata Alias Matching - Apply base name fallback matching to aliases in model metadata lookup (#84)

### Changed

- Shared HTTP Client - Share HTTP client between HealthChecker and request handler
- Updated architecture and performance documentation

## [0.14.0] - 2025-12-04

### Added

- Global System Prompt Injection - Add router-wide global system prompt injection (#82)

### Fixed

- GitHub Actions - Replace deprecated actions-rs/toolchain with dtolnay/rust-toolchain
- macOS ARM64 Build - Add RUSTFLAGS for macOS ARM64 ring build
- musl Build - Switch to rustls-tls for musl cross-compilation support

### Changed

- Update GitHub Action runner

## [0.13.0] - 2025-12-04

### Added

- OpenAI Responses API (`/v1/responses`) - Full implementation of OpenAI's Responses API (#49)
  - Session-based response management with automatic expiration
  - Background cleanup task for expired sessions
  - Request/response format converter between the Responses API and Chat Completions
- SecretString for API Keys - Secure API key storage using SecretString across all backends (#76)
- Model Metadata Override - Allow overriding /v1/models response fields via model-metadata.yaml (#75)

### Fixed

- True SSE Streaming - Implement proper Server-Sent Events streaming for /v1/responses API

### Changed

- Immediate Mode for SseParser - Reduced first-response latency with immediate parsing mode
- String Allocation Optimizations - Improved performance with reduced allocations
- Error Handling Standardization - Consistent error handling patterns across the codebase

### Security

- Session Access Control - Added proper access control for session management
- Input Validation - Comprehensive input validation for Responses API

## [0.12.0] - 2025-12-04

### Added

- SSRF Prevention Module - New UrlValidator module with comprehensive SSRF prevention (#66)
- Centralized HTTP Client Factory - HttpClientFactory for consistent HTTP client creation across backends (#67)

### Fixed

- Consistent Hash Algorithm - Handle exact hash matches in binary search for proper routing (#72)
- Replace Panics with Option Returns - Improve reliability by replacing panics with Option returns (#71)
- Remove Hardcoded Auth Requirement - /v1/models endpoint no longer requires hardcoded authentication
- GitHub Actions - Use GitHub App token for Projects V2 API access

### Changed

- Reorganize OpenAI Model Metadata - Model metadata organized by family for better maintainability (#74)
- Extract AnthropicStreamTransformer - Dedicated module for Anthropic stream transformation (#73)
- Split Backends Module - backends mod.rs split into separate modules for cleaner architecture (#69)
- Extract Embedded Tests - Tests moved to separate files for better organization (#68)
- Extract RequestExecutor - Shared common module for request execution (#65)
- Extract HeaderBuilder - Auth strategies moved to dedicated module (#64)
- Extract AtomicStatistics - Shared common module for atomic statistics

### Technical Improvements

- Improved code organization with modular architecture
- Implemented stats aggregation for better observability
- Enhanced security with SSRF prevention capabilities

## [0.11.0] - 2025-12-03

### Added

- Native Anthropic Claude API backend (`type: anthropic`) with OpenAI-compatible endpoint (#33)
  - Automatic API key loading from the `CONTINUUM_ANTHROPIC_API_KEY` environment variable
  - Extended thinking block support for Claude thinking models
  - OpenAI to Claude reasoning parameter conversion (`reasoning_effort`)
  - Support for the flat `reasoning_effort` parameter
- Claude 4, 4.1, 4.5 model metadata documentation

### Fixed

- Improve health check and model fetching for Anthropic/Gemini backends
- Add `Accept-Encoding: identity` header to streaming requests to prevent compression issues
- Fix `make_backend_request` in proxy.rs for proper Accept-Encoding handling

### Changed

- Refactor: apply code formatting and fix clippy warnings
- Refactor: use reqwest `no_gzip`/`no_brotli`/`no_deflate` instead of the Accept-Encoding header

## [0.10.0] - 2025-12-03

### Added

- Native Google Gemini API backend (`type: gemini`) with OpenAI-compatible endpoint (#32)
  - Automatic API key loading from the `CONTINUUM_GEMINI_API_KEY` environment variable
  - Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
  - Automatic `max_tokens` adjustment for thinking models to prevent response truncation
  - Support for the `reasoning_effort` parameter
- Native OpenAI API backend (`type: openai`) with built-in configuration
  - Automatic API key loading from the `CONTINUUM_OPENAI_API_KEY` environment variable
  - Built-in OpenAI model metadata in the /v1/models response
- OpenAI Images API support (`/v1/images/generations`) for DALL-E and gpt-image-1 models (#35)
  - Configurable image generation timeout (`timeouts.request.image_generation`)
  - Comprehensive input validation for image generation parameters
  - Response format validation for the image generation API
- Authenticated health checks for OpenAI and API-key backends
- API key authentication for streaming requests
- Filter /v1/models to show only configured models
- Allow any config file path when explicitly specified via -c/--config
- `.env.example` and typed backend configuration examples
- Comprehensive model metadata for GLM 4.6, Kimi K2, DeepSeek, GPT, and Qwen3 series

### Fixed

- Streaming response truncation for thinking models (gemini-2.5-pro, gemini-3-pro)
- Model ID normalization and streaming compatibility for Gemini backend
- Convert `max_tokens` to `max_completion_tokens` for newer OpenAI models
- Correct URL construction for all API endpoints
- Security: Remove sensitive data from debug logs
- Security: Add request body size limits to prevent DoS attacks

### Changed

- Refactor: Unify request retry logic with RequestType enum
- Refactor: Improve Gemini backend performance with lock-free statistics and slice returns
- Add Gemini backend documentation and max_tokens behavior documentation
- Add image generation API documentation
- Standardize capability naming in model-metadata.yaml

## [0.9.0] - 2025-12-02

### Added

- Native Google Gemini API backend (`type: gemini`) with OpenAI-compatible endpoint (#32)
  - Automatic API key loading from the `CONTINUUM_GEMINI_API_KEY` environment variable
  - Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
  - Automatic `max_tokens` adjustment for thinking models to prevent response truncation
  - Support for the `reasoning_effort` parameter
- OpenAI Images API support (`/v1/images/generations`) for DALL-E and gpt-image-1 models (#35)
- Configurable image generation timeout (`timeouts.request.image_generation`)
- Comprehensive model metadata for OpenAI models including GPT-5 family, o-series, audio/speech, video (Sora), and embedding models
- Enhanced rate limiting with token bucket algorithm (#11)
- Comprehensive Prometheus metrics and monitoring (#10)
- Configuration file migration and auto-correction CLI utility (#29)
- Comprehensive authentication for metrics endpoint

### Fixed

- CRITICAL: Eliminate race condition in token refill
- CRITICAL: Protect API keys with SHA-256 hashing
- CRITICAL: Prevent memory exhaustion via unbounded bucket growth
- CRITICAL: Prevent header injection vulnerabilities
- HIGH: Prevent IP spoofing via X-Forwarded-For manipulation
- HIGH: Implement singleton pattern for metrics to prevent memory leaks
- HIGH: Eliminate unnecessary string allocations
- HIGH: Implement model extraction for rate limiting
- Add comprehensive cardinality limits and label sanitization to prevent metric explosion DoS attacks
- Improve error handling to prevent panic conditions
- Resolve environment variable race condition in config test
- Fix integration test failure in metrics RequestTimer
- Fix unit test failures in metrics security module

### Changed

- Refactor: remove excessive Arc wrapping in rate limiting
- Reorganize documentation structure for better maintainability
- Add comprehensive metrics documentation
- Update documentation for rate limiting feature
- Remove development mock server and sample config files
- Remove temporary test files and improve gitignore
- Remove duplicate man page and update gitignore
- Update README.md to mention correct repo
- Update release workflows

## [0.8.0] - 2025-09-09

### Added

- Model ID alias support for metadata sharing (#27)
- Comprehensive rate limiting documentation
- Robust rate limiting to models endpoint to prevent DoS via cache poisoning

### Fixed

- Return empty list instead of 503 when all backends are unhealthy (#28)
- Improve error handling and classification
- Resolve clippy warnings for MutexGuard held across await points

### Changed

- Increase rate limits for /v1/models endpoint to be more practical
- Add alias feature documentation to configuration.md

## [0.7.1] - 2025-09-08

### Fixed

- Improve config path validation for home directory and executable paths (#26)

## [0.7.0] - 2025-09-07

### Added

- Extend /v1/models endpoint with rich metadata support (#23) (#25)
- Enhanced Configuration Management (#9) (#22)
- Advanced load balancing strategies with enhanced error handling (#21)

### Fixed

- Use streaming timeout configuration from config.yaml instead of hardcoded 25s limit

### Changed

- Add yaml to exclude list

## [0.6.0] - 2025-09-03

### Added

- GitHub Project automation workflow
- Comprehensive timeout configuration and model documentation updates

### Fixed

- Use timeout configuration from config.yaml instead of hardcoded values (#19)
- Fix clippy warnings and benchmark compilation issues

### Changed

- Apply cargo fmt

## [0.5.0] - 2025-09-02

### Added

- Extensible architecture with layered design (#16)
- Comprehensive integration tests and performance optimizations
- Complete service layer implementation
- Middleware architecture and enhanced backend abstraction
- Configurable connection pool size with CLI and config file support
- Comprehensive configuration management with YAML support (#7)
- Debian packaging and man page for continuum-router

### Fixed

- Handle `Option` values correctly in tests
- Update test to handle streaming requests without a model field gracefully
- Resolve floating-point precision and timing issues in tests
- Resolve test failures and deadlocks in object pool and SSE parser
- Resolve CI test failures and improve test performance
- Resolve config watcher test failures in CI environment
- Resolve initial health check race condition
- Critical security vulnerabilities in error handling and retry logic
- Adjust timeout test tolerance for timing variations

### Changed

- Extract complex types into type aliases for better readability
- Resolve all cargo fmt and clippy warnings
- Make retry configuration optional with sensible defaults
- Optimize config access and add comprehensive timeout management
- Update model names in timeout configuration to latest versions
- Complete documentation update
- Split oversized modules into layered architecture

### Performance

- Optimize config access and add comprehensive timeout management

## [0.4.0] - 2025-08-25

### Added

- Model-based routing with health monitoring (#6)

### Fixed

- Improve health check integration and SSE parsing for better compatibility

### Changed

- Update README.md

## [0.3.0] - 2025-08-25

### Added

- SSE streaming support for real-time chat completions (#5)

### Fixed

- Handle non-success status codes in streaming responses
- Allow streaming to continue even when backend returns 404 or other error status codes
- Send SSE error event first to notify client of the backend error status

## [0.2.0] - 2025-08-25

### Added

- Model aggregation from multiple endpoints (#4)

## [0.1.0] - 2025-08-24

### Added

- OpenAI-compatible endpoints and proxy functionality
  - `/v1/models` endpoint for listing available models
  - `/v1/completions` endpoint for the legacy OpenAI completions API
  - `/v1/chat/completions` endpoint for the chat API
- Multiple backends support with round-robin load balancing (#1)
- Fallback handler for undefined routes with proper error messages

### Fixed

- Improve error handling consistency across all endpoints

### Changed

- Update README with changelog and version information

## Migration Notes

### Upgrading to v0.16.0

- New Files API: The OpenAI-compatible Files API is now available at `/v1/files`
  - Upload files for fine-tuning, batch processing, or assistants
  - Files are stored locally with persistent metadata
  - Configure via the `files_api` section in config.yaml (sketched below)
- File Resolution: Reference uploaded files in chat completions
  - Use file IDs in your chat messages for automatic content injection
- Persistent Metadata: File metadata now survives server restarts
  - Set `metadata_storage: persistent` (default) in the files_api config
  - Set `cleanup_orphans_on_startup: true` to auto-clean orphaned files
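
A `files_api` sketch; `metadata_storage` and `cleanup_orphans_on_startup` come from the notes above, while the storage path key is a hypothetical illustration:

```yaml
files_api:
  metadata_storage: persistent       # default; sidecar .meta.json files survive restarts
  cleanup_orphans_on_startup: true   # remove data files whose metadata cannot be rebuilt
  storage_dir: ./files               # assumed key for the local storage location
```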
- Circuit Breaker: Add a `circuit_breaker` section to your config.yaml for automatic backend failover
  - Configure failure threshold, recovery timeout, and half-open requests
- New Fallback Feature: Add a `fallback` section to your config.yaml to enable automatic model fallback (sketched below)
  - Define fallback chains: `fallback_chains: { "gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"] }`
  - Configure trigger conditions in `fallback_policy`
  - Cross-provider fallback is supported (e.g., OpenAI → Anthropic)
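
Expanding the inline example into a fuller sketch; trigger-condition names other than `circuit_breaker_open` are assumptions:

```yaml
fallback:
  fallback_chains:
    gpt-4o: [gpt-4-turbo, gpt-3.5-turbo]
    claude-opus-4-5: [gpt-4o]        # hypothetical cross-provider chain
  fallback_policy:
    trigger_conditions:
      circuit_breaker_open: true     # integrates with the circuit breaker section
```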
- Circuit Breaker Integration: Set `circuit_breaker_open: true` in trigger_conditions to integrate with the existing circuit breaker
- Response Headers: Check the `X-Fallback-Used` header to detect when fallback was used
- GPT-5.2 Support: New GPT-5.2 model metadata is available
- No breaking changes from v0.15.0

### Upgrading to v0.15.0

- Split /v1/models Endpoint: The `/v1/models` endpoint now returns a lightweight response by default
  - For extended metadata, use `/v1/models?extended=true`
  - This improves performance for clients that only need basic model information
- Nano Banana API: New support for Gemini Image Generation (Imagen) through an OpenAI-compatible interface
  - Use the `nano-banana` or `nano-banana-pro` model names
- Error Handling: Improved reliability with proper error propagation instead of panics
- Performance: LRU cache now uses read locks for better concurrent performance
- No breaking changes from v0.14.x

### Upgrading to v0.13.0

- New Responses API: The `/v1/responses` endpoint is now available for OpenAI Responses API compatibility
  - Sessions are automatically managed with background cleanup of expired sessions
  - True SSE streaming provides real-time responses
- Security: API keys are now stored using SecretString for improved security across all backends (#76)
- Model Metadata: Override /v1/models response fields via model-metadata.yaml (#75)
- No breaking changes from v0.12.0

### Upgrading to v0.12.0

- No breaking changes: This is a refactoring release with improved code organization
- Bug fix: Consistent hash routing now correctly handles exact hash matches
- Security: SSRF prevention module added for URL validation
- Reliability: Panics replaced with Option returns for better error handling
- API change: /v1/models endpoint no longer has hardcoded auth requirement

### Upgrading to v0.11.0

- New Anthropic backend: Add `type: anthropic` backends for native Anthropic Claude API support
  - Set the `CONTINUUM_ANTHROPIC_API_KEY` environment variable for authentication
  - Supports extended thinking with automatic parameter conversion
  - The OpenAI `reasoning_effort` parameter is automatically converted to Claude's thinking format
- Streaming improvements: Accept-Encoding fixes ensure proper streaming for all backends
- No breaking changes from v0.10.0

### Upgrading to v0.10.0

- New OpenAI backend: Add `type: openai` backends for native OpenAI API support
  - Set the `CONTINUUM_OPENAI_API_KEY` environment variable for authentication
  - Built-in model metadata is automatically included in the /v1/models response
- Image Generation API: New `/v1/images/generations` endpoint for DALL-E models
  - Configure timeout via `timeouts.request.image_generation` (default: 120s)
  - Supports `response_format` validation (url or b64_json)
- Gemini improvements: Streaming response truncation fixed for thinking models
  - Model ID normalization ensures proper routing
- API key authentication: Streaming requests now support API key authentication
- Security: Request body size limits prevent DoS attacks
- Newer OpenAI models automatically use `max_completion_tokens` instead of `max_tokens`

### Upgrading to v0.9.0

- New Gemini backend: Add `type: gemini` backends for native Google Gemini API support
  - Set the `CONTINUUM_GEMINI_API_KEY` environment variable for authentication
  - Thinking models (gemini-2.5-pro, gemini-3-pro) automatically get `max_tokens: 16384` if the client sends values below 4096
- Enhanced rate limiting with the token bucket algorithm is now available
- Configure rate limiting via the `rate_limiting` section in config.yaml
- Prometheus metrics are now available at the `/metrics` endpoint with authentication
- Use the `--migrate-config-file` CLI option to migrate and fix configuration files
- Multiple critical security fixes have been applied to rate limiting

### Upgrading to v0.8.0

- Rate limiting is now enabled for the `/v1/models` endpoint
- An empty list is returned instead of a 503 error when all backends are unhealthy
- Model aliases are now supported for metadata sharing

### Upgrading to v0.7.0

- Enhanced configuration management requires updating configuration files
- New load balancing strategies are available
- Streaming timeout is now configurable via config.yaml

### Upgrading to v0.6.0

- Timeout configuration is now read from config.yaml instead of hardcoded values
- Update your configuration files to include timeout settings

### Upgrading to v0.5.0

- Major architectural refactoring with layered design
- Configuration management now supports YAML files
- Retry mechanisms have been enhanced with security improvements
- Connection pool size is now configurable
This changelog reflects the actual development history of Continuum Router from its initial release to the current version.