
Changelog

All notable changes to Continuum Router are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Cohere/Jina-Compatible Rerank and Sparse Embedding Endpoints - Add support for advanced retrieval APIs (#374)

    • New /v1/rerank endpoint (Cohere-compatible) for document reranking as a second-stage retrieval step
    • New /embed_sparse endpoint (TEI/Jina-compatible) for sparse embeddings (SPLADE format)
    • Supports both simple string documents and structured documents with a text field for reranking
    • Request/response types with comprehensive validation for model, query, and documents fields
    • New capability mappings: rerank -> rerank method, sparse_embedding -> embed_sparse method
    • Example models added to model-metadata.yaml: BGE Reranker, Jina Reranker, SPLADE models
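
A minimal request sketch for the two new endpoints (Python; the host, port, and model names are assumptions, and the sparse-embedding body follows TEI's inputs convention):

```python
import requests

BASE = "http://localhost:8080"  # assumed router address

# Cohere-compatible rerank request; documents may be plain strings or
# objects with a "text" field, per the entry above.
rerank = requests.post(f"{BASE}/v1/rerank", json={
    "model": "bge-reranker-v2-m3",  # hypothetical model name
    "query": "What is Continuum Router?",
    "documents": [
        "Continuum Router is an LLM routing proxy.",
        {"text": "An unrelated document about cooking."},
    ],
})
print(rerank.json())

# TEI/Jina-compatible sparse embedding request (SPLADE-style output).
sparse = requests.post(f"{BASE}/embed_sparse", json={
    "model": "splade-v3",  # hypothetical model name
    "inputs": ["What is Continuum Router?"],
})
print(sparse.json())
```
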
  • BGE-M3 and Multilingual Embedding Model Support - Add model metadata and configuration examples for BGE-M3 and equivalent multilingual embedding models (#373)

    • BGE-M3: 568M parameters, 1024 dimensions, 100+ languages, 8192 context. Supports dense, sparse (lexical), and ColBERT multi-vector retrieval
    • BGE-Large-EN-v1.5: 335M parameters, 1024 dimensions, English-only, 512 context
    • Multilingual-E5-Large: 560M parameters, 1024 dimensions, 100+ languages, 514 context
    • Example backend configurations for vLLM, Ollama, and Text Embeddings Inference (TEI) deployments
    • Addresses cross-lingual retrieval requirements for RAG systems
  • Plain Text Support for Anthropic File Transformer - Add text/plain support to Anthropic file transformer (#342)

    • Text files are converted to document blocks with base64 data (same format as PDF)
    • Maximum text file size: 32MB (same as PDF)
    • Text files skip magic bytes validation (any content is accepted)
    • Updated SUPPORTED_DOCUMENT_TYPES to include text/plain alongside application/pdf
    • Updated error messages to mention plain text support
  • PDF Support for OpenAI and Anthropic File Transformers - Add PDF file support to file transformers (#340)

    • OpenAI transformer: PDFs are converted to file blocks with base64 data or file IDs
    • Anthropic transformer: PDFs are converted to document blocks with base64 data
    • PDF magic bytes validation (%PDF- signature) for security
    • Maximum PDF size: 32MB (100 page limit enforced by backends)
    • Images remain at 20MB limit
  • Native Anthropic Responses API Support - Add native Anthropic Messages API conversion for Responses API (#332)

    • New AnthropicConverter for converting Responses API requests to native Anthropic Messages format
    • Full PDF file support via Anthropic's document understanding (input_file with file_data)
    • Image file support with automatic media type detection
    • Extended thinking (reasoning) content support for Claude 3+ models
    • Streaming support with proper SSE event transformation from Anthropic format
    • Non-streaming support with complete response transformation

Fixed

  • SSRF Validation for External File URLs - Add SSRF protection when fetching external files (#332)

    • Private IP address validation (blocks 10.x.x.x, 172.16-31.x.x, 192.168.x.x, 127.x.x.x)
    • Localhost and link-local address blocking
    • IPv6 loopback and link-local address blocking
    • DNS rebinding protection via IP validation after resolution
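
The rules above amount to resolve-then-validate. A minimal Python sketch of the same checks (the router implements them in Rust; this is illustration only):

```python
import ipaddress
import socket

def url_host_is_blocked(host: str) -> bool:
    """Resolve first (DNS-rebinding protection), then reject private,
    loopback, and link-local addresses for every resolved IP."""
    for info in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return True
    return False

assert url_host_is_blocked("localhost")      # loopback
assert url_host_is_blocked("192.168.1.10")   # private range
assert not url_host_is_blocked("1.1.1.1")    # public address
```
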
  • Media Type Whitelist for File Inputs - Add security whitelist for allowed file types (#332)

    • PDF: application/pdf
    • Images: image/jpeg, image/png, image/gif, image/webp
    • Rejects unsupported media types with clear error messages
  • AI SDK Compatibility for Responses API Streaming - Fix Vercel AI SDK compatibility issues with Responses API streaming (#334)

    • Updated ResponseStreamEvent serialization to use dot-separated type names matching OpenAI spec (e.g., "type": "response.output_text.done" instead of "type": "output_text_done")
    • Added item_id field to OutputItemAdded, OutputItemInProgress, OutputItemDone, and other streaming events
    • Added sequence_number field to track event ordering (uses u64 to prevent overflow in long streaming sessions)
    • Custom Serialize implementation for ResponseStreamEvent ensures correct JSON output format
    • All existing streaming tests updated to verify new fields are correctly serialized
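
An illustrative serialized event after this fix (field names come from the bullets above; the values are made up):

```python
import json

event = {
    "type": "response.output_text.done",  # dot-separated, matching the OpenAI spec
    "item_id": "item_123",
    "sequence_number": 42,                # u64 on the Rust side
    "text": "Hello!",
}
print(f"data: {json.dumps(event)}")
```
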
  • Immediate Health Check After Hot Reload Backend Sync - Trigger immediate health check when backends are added via hot reload (#367)

    • New backends added via configuration hot reload are now health-checked immediately
    • Previously, new backends remained unavailable for up to 30 seconds (the default health check interval)
    • Made HealthChecker::perform_health_checks() public for external invocation
    • Improves model availability responsiveness in Backend.AI GO and other clients

[0.34.0] - 2026-01-14

Added

  • Automatic Quality Parameter Conversion - Add automatic quality parameter conversion between DALL-E and GPT Image models (#330)
    • to_dalle_quality() method on ImageQuality enum for converting GPT Image quality values to DALL-E equivalents
    • Quality conversion applied transparently in handle_openai_image_generation() and handle_streaming_image_generation()
    • Quality conversion mapping:
      • DALL-E 3: low/medium/auto → standard, high → hd
      • GPT Image: standard → medium, hd → high
      • Gemini models: quality parameter ignored (no changes needed)
    • is_dalle3_model() helper for exact DALL-E 3 model matching
    • convert_quality_for_model() helper to eliminate code duplication
    • Conversion is logged for debugging and happens transparently without user-facing warnings
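
A Python sketch of the mapping above (the router implements it in Rust, with exact DALL-E 3 model matching rather than the prefix check used here):

```python
def convert_quality_for_model(model: str, quality: str) -> str | None:
    if model.startswith("dall-e-3"):  # the real helper matches exactly
        return "hd" if quality == "high" else "standard"  # low/medium/auto -> standard
    if model.startswith("gpt-image"):
        return {"standard": "medium", "hd": "high"}.get(quality, quality)
    return None  # Gemini models: quality parameter ignored
```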

[0.33.0] - 2026-01-13

Added

  • Local File Resolution for Responses API - Resolve local file_id references in Responses API requests (#325)
    • Files uploaded via the Files API can now be referenced using file_id in Responses API requests
    • FileResolver service scans requests for file_id references and loads content from local storage
    • File content is converted to base64 file_data format before sending to backends
    • Security features: file ownership verification (user_id check) and 10MB size limit for injection
    • Graceful degradation: resolution failures fall back to original request with warning logs

Fixed

  • Responses API Flat Tool Format - Fix Responses API /v1/responses endpoint to accept flat tool format (#323)
    • Function tools now use flat format: {"type": "function", "name": "...", "parameters": {...}}
    • This aligns with OpenAI's Responses API specification
    • Nested format (with function wrapper object) is no longer accepted for Responses API
    • Updated documentation with flat tool format examples
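
The two shapes side by side (Python literals; the function name and schema are hypothetical):

```python
flat_tool = {  # accepted by /v1/responses after this fix
    "type": "function",
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
    },
}

nested_tool = {  # Chat Completions style; now rejected on /v1/responses
    "type": "function",
    "function": {"name": "get_weather", "parameters": {}},
}
```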

[0.32.0] - 2026-01-09

Added

  • Reasoning Effort Documentation and xhigh Fallback Logging - Add comprehensive reasoning effort documentation and improve fallback logging for xhigh effort level (#317)
    • New documentation explaining reasoning effort parameter usage
    • Improved logging when xhigh effort level falls back to high for non-GPT-5.2 models

Fixed

  • Implicit Message Type Inference in Responses API InputItem - Support implicit message type inference when role field is missing (#316)
    • Optimized InputItem deserializer for better performance
    • Added invalid role test coverage
    • Enables more flexible input handling in Responses API

[0.31.5] - 2026-01-09

Added

  • Responses API Pass-through for Native OpenAI Backends - Smart routing for /v1/responses API based on backend type (#313)

    • OpenAI and Azure OpenAI backends now use pass-through mode, forwarding requests directly to /v1/responses endpoint
    • Other backends (Anthropic, Gemini, vLLM, Ollama, LlamaCpp, Generic) automatically convert to their native format
    • Pass-through mode benefits: native PDF support, preserved reasoning state, access to built-in tools (web_search, file_search), better cache utilization
    • New router.rs module with ResponsesApiStrategy enum for routing decisions
    • New passthrough.rs module with PassthroughService for direct request forwarding
    • Request payload size validation (16MB limit) for DoS prevention
    • Comprehensive test coverage for routing strategy, error handling, and request validation
  • OpenAI Responses API File Input Types - Add support for multi-modal file inputs in Responses API (#311)

    • New input_text, input_file, and input_image content part types
    • Support for PDF documents and images via base64 data URLs (file_data)
    • Support for external file URLs (file_url) with SSRF validation
    • Warning logs for unsupported file_id references (Files API integration pending)
    • Backend-specific transformers for Anthropic (document/image blocks) and Gemini (inline_data/file_data)
    • Comprehensive test coverage for all input types
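
A sketch of a multi-modal request using these content parts (Python; the model name and file are assumptions, and the payload shape follows the field names listed above):

```python
import base64
import requests

with open("report.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

resp = requests.post("http://localhost:8080/v1/responses", json={
    "model": "claude-sonnet-4-5",  # hypothetical model
    "input": [{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Summarize this document."},
            {"type": "input_file",
             "filename": "report.pdf",
             "file_data": f"data:application/pdf;base64,{pdf_b64}"},
        ],
    }],
})
print(resp.json())
```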

Fixed

  • Pass-through Raw Error Responses - Forward raw backend error responses in pass-through mode for better error debugging

[0.31.4] - 2026-01-07

Fixed

  • Hot Reload Support for API Key Forwarding - Fix hot reload support in proxy and streaming handlers (#310)
    • Use current_config() instead of captured config snapshot in proxy and streaming handlers
    • API key and other configuration changes via hot reload now properly apply to new requests
    • Ensures runtime configuration updates affect backend request forwarding
    • Added comprehensive end-to-end tests for hot reload api_key application

[0.31.3] - 2026-01-06

Fixed

  • Unix Socket Anthropic Request/Response Transformation - Fix Anthropic backends accessed via Unix socket failing due to missing transformations (#307, #308)

    • Unix socket transport now applies the same request transformation as HTTP transport for Anthropic backends
    • OpenAI-format requests are properly converted to Anthropic format before sending
    • Anthropic responses are transformed back to OpenAI format
    • Endpoint is correctly rewritten from /v1/chat/completions to /v1/messages
    • Added comprehensive integration tests for Unix socket Anthropic transformations
  • Anthropic Non-streaming Stream Parameter - Preserve stream parameter for non-streaming Anthropic requests (#305, #306)

    • Replace transform_openai_to_anthropic_request (which forces stream: true) with transform_openai_to_anthropic_with_global_prompt in non-streaming path
    • Fixes issue where requests with stream: false were incorrectly sent to Anthropic API with stream: true
    • Renamed transform_openai_to_anthropic_request to transform_openai_to_anthropic_streaming for clarity

Documentation

  • Jinja2 Syntax Escaping - Escape Jinja2 syntax in Korean configuration docs to prevent mkdocs-macros-plugin errors

[0.31.2] - 2026-01-05

Added

  • Non-streaming Support for Anthropic Backend - Transform OpenAI-formatted requests to Anthropic format for non-streaming chat completion calls, and convert Anthropic responses back to OpenAI format

    • Non-streaming requests to Anthropic backends now properly transform request/response formats
    • Updated streaming handlers to use transform_str for proper tool call handling
  • Tool Call and Tool Result Transformation for Anthropic Backend - Enable proper tool use workflows when routing to Anthropic models

    • Transform OpenAI-style tool_calls in assistant messages to Anthropic's tool_use format
    • Transform tool result messages to Anthropic's tool_result format
    • Enables multi-turn tool use conversations with Anthropic models

Dependencies

  • Update 12 packages including rustls, tokio-stream, and syn to latest versions

[0.31.1] - 2026-01-04

Fixed

  • Anthropic Non-streaming Authentication Headers - Fix non-streaming Anthropic requests failing with wrong authentication header (#300, #301)
    • Non-streaming requests to Anthropic backends now correctly use x-api-key header instead of Authorization: Bearer
    • Added anthropic-version header for all Anthropic backend requests
    • Applied consistent header handling between HTTP and Unix socket transport paths
    • Fixed issue where Anthropic API returned "Invalid Anthropic API Key" error (HTTP 400)

[0.31.0] - 2026-01-04

Added

  • Unix Socket Server Binding - Add Unix socket binding support alongside TCP (#298)

    • Server can now bind to Unix domain sockets for local communication
    • Configure via server.unix_socket in config file
    • Supports concurrent TCP and Unix socket bindings
  • Reasoning Parameter Support for Responses API - Add reasoning parameter support to /v1/responses endpoint (#296)

    • Supports nested format: {"reasoning": {"effort": "high"}}
    • Valid effort levels: low, medium, high, xhigh (GPT-5.2 only)
    • Type-safe validation using ReasoningEffortLevel enum
    • Automatic conversion to flat reasoning_effort format for backends
    • Invalid effort values rejected at deserialization with clear error messages
    • Added with_reasoning() builder method for ResponsesRequest
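
A request sketch using the nested format (Python; host and model are assumptions):

```python
import requests

resp = requests.post("http://localhost:8080/v1/responses", json={
    "model": "gpt-5.2",
    "input": "Prove that the square root of 2 is irrational.",
    "reasoning": {"effort": "high"},  # low | medium | high | xhigh (GPT-5.2 only)
})
# The router converts this to the flat reasoning_effort form for backends.
```
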
  • Configurable Health Check Endpoints - Add configurable health check endpoints per backend type

    • Customize health check paths for different backend types
    • Support for backend-specific health verification

[0.30.0] - 2026-01-01

Added

  • Wildcard Patterns and Date Suffix Handling for Model Aliases - Support wildcard patterns and automatic date suffix handling in model aliases (#286)
    • Automatic date suffix normalization: Models with date suffixes (e.g., claude-opus-4-5-20251130) automatically match metadata for base model (e.g., claude-opus-4-5-20251101)
    • Supported date formats: -YYYYMMDD, -YYYY-MM-DD, -YYMM, @YYYYMMDD
    • Wildcard pattern matching in aliases using * character
    • Prefix patterns: claude-* matches claude-opus, claude-sonnet, etc.
    • Suffix patterns: *-preview matches gpt-4o-preview, o1-preview, etc.
    • Infix patterns: gpt-*-turbo matches gpt-4-turbo, gpt-3.5-turbo, etc.
    • Zero-config date handling: Works automatically without configuration changes
    • Matching priority: Exact ID > Exact alias > Date suffix > Wildcard > Base name fallback
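
The wildcard patterns behave like shell-style globs on model IDs; a quick illustration using Python's fnmatch (the router's matcher additionally handles date suffixes and the priority order above):

```python
from fnmatch import fnmatch

assert fnmatch("claude-opus", "claude-*")       # prefix pattern
assert fnmatch("o1-preview", "*-preview")       # suffix pattern
assert fnmatch("gpt-3.5-turbo", "gpt-*-turbo")  # infix pattern
```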

Fixed

  • Anthropic Backend Default URL - Apply default URL for Anthropic backend when not specified (#288)
  • Backend-specific owned_by Values - Replace owned_by placeholders with backend-type-specific values (#287)

Documentation

  • Translate wildcard pattern and date suffix handling documentation to Korean (#289)

[0.29.0] - 2026-01-01

Added

  • Accelerated Health Checks During Backend Warmup - Implement accelerated health check during backend warmup (#282)
    • When a backend returns HTTP 503 (Service Unavailable), it enters a "warming up" state
    • During warmup, health checks occur at an accelerated interval (default: 1 second)
    • Reduces model availability detection latency from up to 30 seconds to approximately 1 second
    • Configurable via warmup_check_interval (default: 1s) and max_warmup_duration (default: 300s)
    • Particularly useful for backends like llama.cpp that return HTTP 503 while loading models
  • Model Metadata CLI Option - Add --model-metadata option for specifying model metadata file path (#281)
    • New --model-metadata CLI argument to specify model metadata YAML file at runtime
    • Overrides config file model_metadata_file setting
    • Supports absolute paths, relative paths, and tilde expansion (~)

Fixed

  • OpenAI owned_by Field - Replace OpenAI owned_by placeholder with 'openai' (#280)
    • Models from OpenAI backend now correctly show owned_by: openai instead of placeholder text
  • Admin API Race Condition - Prevent race condition in Admin API concurrent backend creation (#278)
    • Fixed issue where concurrent backend creation requests could cause data corruption
    • Added proper synchronization for backend management operations
  • Hot Reload Processing Steps - Add missing processing steps to hot reload (#277)
    • Fixed issue where some configuration changes were not properly applied during hot reload
    • Ensures all processing steps are executed when configuration is reloaded
  • Cloud Backend Availability Status - Cloud backends now show available:true in /v1/models/{model_id} (#272)
    • Fixed issue where cloud backends (OpenAI, Anthropic, Gemini) were incorrectly showing as unavailable
    • Cloud backends are now correctly marked as available when healthy

Documentation

  • Add tests and documentation for v0.29.0 features (commit 459da6a)

[0.28.0] - 2025-12-31

Added

  • SSE Streaming Support for Tool Calls - Add SSE streaming support for tool calls (#258)
    • Real-time streaming of tool call responses over Server-Sent Events
    • Enables efficient streaming responses for function-calling scenarios
  • llama.cpp Tool Calling Auto-Detection - Auto-detect tool calling support via /props endpoint (#263)
    • Queries /props endpoint during model discovery to analyze chat_template
    • Detects tool-related keywords (tool, tools, tool_call, function, etc.)
    • Automatically enables function_calling capability when detected
    • Graceful fallback when /props endpoint is unavailable
    • Works with both HTTP and Unix socket backends
  • Extended /v1/models/{model_id} Endpoint - Extend with rich metadata fields (#262)
    • Returns comprehensive model metadata including capabilities and pricing
    • Enhanced response format with additional model information
  • Tool Result Message Transformation - Implement tool result message transformation for multi-turn conversations (#265)
    • Transforms tool result messages (role: "tool") to backend-native formats
    • Anthropic: Converts to user role with tool_result content blocks
    • Gemini: Converts to function role with functionResponse parts
    • Combines consecutive tool results for Anthropic (parallel tool calls)
    • Automatic function name lookup for Gemini transformations
    • Preserves is_error indicator for error responses
  • Backend-specific owned_by Placeholders - Add owned_by placeholders for llamacpp, vllm, ollama, http backends (#267)

Improved

  • CLI Help Output Formatting - Improve --help output formatting with title header and project attribution (#269)
    • Enhanced visual appearance for command-line help
    • Added project attribution in help output

Fixed

  • Model Metadata Cache Sync - Sync model metadata cache with ConfigManager (#270)
    • Ensures model cache properly reflects configuration changes

CI/CD

  • Comprehensive integration tests for tool calling (#264)

Dependencies

  • Bump the minor-and-patch group with 3 updates (#257)

Technical

  • Fix dead_code warnings for Unix-only items on Windows builds

[0.27.0] - 2025-12-28

Added

  • llama.cpp Tool Calling Auto-Detection - Automatic detection of tool calling support for llama.cpp backends (#260)
    • Queries /props endpoint during model discovery to analyze chat_template
    • Detects tool-related keywords (tool, tools, tool_call, function, etc.)
    • Automatically enables function_calling capability when detected
    • Graceful fallback when /props endpoint is unavailable
    • Works with both HTTP and Unix socket backends
  • Complete Unix Socket Support - Full Unix socket support for model discovery and streaming (#248, #252, #253, #254, #256)
    • SSE/streaming support for Unix socket backends, enabling real-time responses over local sockets
    • Backend type auto-detection for Unix socket connections
    • vLLM model discovery support via Unix sockets
    • llama.cpp model discovery support via Unix sockets
    • Model fetcher fully supports Unix socket backends
  • Tool Call Transformation - Implement OpenAI tool call transformation across all backends (#244, #245, #246)
    • Tool definition transformation for Anthropic, Gemini, and llama.cpp backends
    • Tool choice transformation with support for auto, none, required, and specific function selection
    • Tool call response transformation for unified response format across providers
  • Multi-Turn Tool Conversation Support - Message transformation for tool calling in multi-turn conversations (#241)
    • Transforms tool result messages (role: "tool") to backend-native formats
    • Anthropic: Converts to user role with tool_result content blocks
    • Gemini: Converts to function role with functionResponse parts
    • Combines consecutive tool results for Anthropic (parallel tool calls)
    • Automatic function name lookup for Gemini transformations
    • Preserves is_error indicator for error responses

[0.26.0] - 2025-12-27

Added

  • Tool Choice Transformation - Automatic transformation of tool_choice parameter across backends (#239)
    • Supports auto, none, required, and specific function selection
    • Anthropic: Transforms to {"type": "auto|any|tool"} format, handles "none" by removing tools
    • Gemini: Transforms to tool_config.function_calling_config structure
    • llama.cpp: Preserves parallel_tool_calls parameter for parallel function calling
    • Integrates with model fallback system for cross-provider tool calling
  • Single Model Retrieval Endpoint - Add GET /v1/models/{model} endpoint for single model retrieval with availability status (#236)
    • Returns model information with an additional available field indicating real-time availability
    • available: true when at least one healthy backend provides the model
    • available: false when the model exists but all backends providing it are unhealthy
    • Returns 404 if the model does not exist across any backend
    • Optimized performance: avoids full model aggregation by targeting specific model lookup
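
A usage sketch (Python; host and model name are assumptions):

```python
import requests

r = requests.get("http://localhost:8080/v1/models/gpt-4o")
if r.status_code == 404:
    print("model not served by any backend")
else:
    body = r.json()
    print(body["id"], "available:", body["available"])
```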

[0.25.0] - 2025-12-26

Added

  • CORS (Cross-Origin Resource Sharing) Support - Configurable CORS middleware for embedding the router in web applications (#234)
    • Support for Tauri desktop apps, Electron apps, and web frontends
    • Wildcard origins and port patterns (e.g., http://localhost:*)
    • Custom schemes support (e.g., tauri://localhost)
    • Configurable methods, headers, and credentials
    • Preflight cache with configurable max-age
  • Unix Domain Socket Backend Support - Secure local LLM communication via Unix sockets (#232)
    • Use unix:///path/to/socket URL scheme for local backends
    • Better security through file system permissions (no TCP port exposure)
    • Lower latency than localhost TCP (~30% improvement)
    • No port conflicts when running multiple LLM servers
    • Platform support: Linux and macOS (Windows planned for future releases)
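
A quick way to verify a CORS setup is to replay a browser preflight (Python; the origin and configured values are assumptions):

```python
import requests

r = requests.options(
    "http://localhost:8080/v1/chat/completions",
    headers={
        "Origin": "tauri://localhost",
        "Access-Control-Request-Method": "POST",
        "Access-Control-Request-Headers": "authorization, content-type",
    },
)
print(r.status_code, r.headers.get("Access-Control-Allow-Origin"))
```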

[0.23.1] - 2025-12-25

CI/CD

  • Windows Build Support - Add Windows x86_64 build target to release workflow (#224)
    • Enables native Windows builds in the release pipeline
    • Cross-compilation from Linux using mingw-w64

[0.23.0] - 2025-12-23

Added

  • GLM 4.7 Model Support - Add support for Zhipu AI's GLM 4.7 model with thinking capabilities (#222)
    • Model metadata in model-metadata.yaml with full specifications (355B MoE, 32B active parameters)
    • Support for thinking parameters: enable_thinking (boolean) and thinking_budget (1-204,800 tokens)
    • 200K context window with up to 131K token output
    • Z.AI backend configuration example in config.yaml.example
    • SiliconFlow alternative backend configuration
    • Comprehensive integration tests for model metadata
    • Pricing: $0.60/1M input tokens, $2.20/1M output tokens
  • Thinking Pattern Metadata - Add thinking pattern metadata for models with implicit start tags (#218)
    • Support for models that use implicit thinking start tags
    • Pattern-based detection for thinking content extraction
  • GCP Service Account Authentication - Add GCP Service Account authentication support for Gemini backend (#208)
    • Support for JSON key file authentication
    • Environment variable based authentication
    • Automatic token refresh and management
  • Distributed Tracing - Add distributed tracing with correlation ID propagation (#207)
    • W3C Trace Context support with traceparent header
    • Configurable trace ID, request ID, and correlation ID headers
    • Trace ID propagation across all retry attempts
    • Security validation for trace IDs from headers
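
A client can opt into tracing by sending a W3C traceparent header (format: version-traceid-spanid-flags; the IDs below are illustrative):

```python
import requests

headers = {
    "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
}
requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers=headers,
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]},
)
```
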
  • New Model Metadata - Add model metadata for NVIDIA Nemotron 3 Nano, Qwen Image Layered, and Kakao Kanana-2 (#202)
  • ASCII Diagram Replacement - Add ASCII diagram to image replacement system for MkDocs (#200)
    • Automatic replacement of ASCII diagrams with SVG images during MkDocs build
    • Preserve ASCII art visibility in raw Markdown

Changed

  • CI Optimization - Skip Rust tests when only non-code files change (#204)
    • Faster CI for documentation-only changes
    • Path-based filtering for test execution

Fixed

  • Cache Stampede Prevention - Prevent cache stampede with singleflight, stale-while-revalidate, and background refresh (#220)
    • Singleflight pattern prevents thundering herd on model cache expiration
    • Stale-while-revalidate returns cached data immediately while refreshing in background
    • Background refresh proactively updates cache before expiration
  • Hot Reload for Global Prompts - Apply global_prompts changes via hot reload (#219)
    • Global prompt configuration changes now take effect without restart
  • Model Cache Invalidation - Invalidate model cache when backend config changes (#206)
    • Backend configuration changes now properly trigger model cache refresh
  • Documentation Improvements - Improve diagram rendering with inline SVG and responsive sizing
  • Translation Typo - Fix translation typo in documentation
  • Docker CI Fixes - Handle multi-line tags in Docker manifest creation
  • Private Repo Access - Use gh CLI for private repo asset download in CI
  • GitHub Token Auth - Add GitHub token authentication for private repository access
  • Docs Workflow - Remove release trigger from docs workflow to avoid environment protection error

CI/CD

  • Bump actions/github-script from 7 to 8 (#210)
  • Bump apple-actions/import-codesign-certs from 3 to 6 (#212)
  • Bump actions/cache from 4 to 5 (#211)
  • Bump actions/checkout from 4 to 6 (#209)

[0.22.0] - 2025-12-19

Added

  • Docker Support with Pre-built Binaries - Add Dockerfile and Dockerfile.alpine that download pre-built binaries from GitHub Releases (#189)
    • Debian Bookworm-based image (~50MB) for general use
    • Alpine 3.20-based image (~10MB) for minimal deployments
    • Multi-architecture support (linux/amd64, linux/arm64) using TARGETARCH
    • VERSION build argument for selecting release version
    • Non-root user execution for security
    • OCI labels for image metadata
  • Container Health Check CLI - Implement --health-check CLI argument for container orchestration (#189)
    • Returns exit code 0 if server is healthy, 1 if unhealthy
    • Optional --health-check-url for custom health endpoint
    • Proper IPv6 address handling
    • 5-second default timeout
  • Docker Compose Quick Start - Add docker-compose.yml for easy deployment (#189)
    • Volume mount for configuration
    • Environment variable support (RUST_LOG)
    • Resource limits and health checks
  • Automated Docker Image Publishing - Add Docker build and push to ghcr.io in release workflow (#189)
    • Builds both Debian and Alpine images after binary release
    • Multi-platform support (linux/amd64, linux/arm64)
    • Automatic tagging with semver (VERSION, MAJOR.MINOR, latest)
    • Alpine images tagged with -alpine suffix
    • GitHub Actions cache for faster builds
  • MkDocs Documentation Website - Build comprehensive documentation site with Material theme (#183)
    • Full navigation structure with Getting Started, Features, Operations, and Development sections
    • GitHub Actions workflow for automatic deployment to GitHub Pages
    • Custom stylesheets and theme configuration
  • Korean Documentation Translation - Complete Korean localization of all documentation (#190)
    • All 20 documentation files translated to Korean
    • Language switcher in navigation (English/Korean)
    • Multi-language build in GitHub Actions workflow
  • Dependency Security Auditing - Add cargo-deny for vulnerability scanning (#192)
    • Security advisory checking in CI workflow
    • License compliance verification
    • Dependency source validation
  • Dependabot Integration - Automated dependency updates for Cargo and GitHub Actions (#192)
  • Security Policy - Add comprehensive SECURITY.md with vulnerability reporting process (#191)

Changed

  • Integrate orphaned architecture documentation into MkDocs site (#186)
  • Rename documentation files to lowercase kebab-case for URL-friendly filenames (#186)
  • Update various GitHub Actions to latest versions (checkout@v6, setup-python@v6, upload-artifact@v6, etc.)

Fixed

  • Health check response validation logic bug (operator precedence issue)
  • Address parsing fallback that was silently hiding configuration errors
  • IPv6 address formatting in health check (now correctly uses bracket notation)

Security

  • Updated reqwest 0.11→0.12, prometheus 0.13→0.14, validator 0.18→0.20
  • Replaced dotenv with dotenvy for better maintenance
  • Added .dockerignore to exclude sensitive files from build context

[0.21.0] - 2025-12-19

Added

  • Gemini 3 Flash Preview Model - Add support for gemini-3-flash-preview model (#168)
  • Backend Error Passthrough - Pass through detailed error messages from backends for 4xx responses (#177)
    • Parse and forward original error messages from OpenAI, Anthropic, and Gemini backends
    • Preserve param field when available (useful for invalid parameter errors)
    • Falls back to generic error message if backend response cannot be parsed
    • Error format remains OpenAI-compatible
    • Comprehensive unit tests for error parsing across all backend formats
  • Default Authentication Mode for API Endpoints - Configurable authentication enforcement for API endpoints (#173)
    • New mode field in api_keys configuration: permissive (default) or blocking
    • permissive mode: Requests without API key are allowed (backward compatible)
    • blocking mode: Only authenticated requests are processed, unauthenticated requests receive 401
    • Protected endpoints: /v1/chat/completions, /v1/completions, /v1/responses, /v1/images/*, /v1/models
    • Health endpoints (/health, /healthz) always accessible without authentication
    • Hot reload support for authentication mode changes
    • Comprehensive integration tests for both modes
    • Updated API.md, configuration.md, and manpage documentation
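
Behavior sketch in blocking mode (Python; the key value is an assumption):

```python
import requests

BASE = "http://localhost:8080"

# Unauthenticated request to a protected endpoint: rejected in blocking mode.
assert requests.get(f"{BASE}/v1/models").status_code == 401

# Health endpoints stay open regardless of mode.
assert requests.get(f"{BASE}/health").status_code == 200

# Authenticated request.
r = requests.get(f"{BASE}/v1/models",
                 headers={"Authorization": "Bearer sk-example-key"})
```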

Fixed

  • UTF-8 Multi-byte Character Corruption - Handle UTF-8 multi-byte character corruption in streaming responses (#179)
  • GPT Image response_format - Strip response_format parameter for GPT Image models (#176)
  • Auto-discovery Validation - Allow auto-discovery for all backends except Anthropic (#172)

Changed

  • Updated architecture.md and fixed documentation issues (#167, #169)
  • Added AGENTS.md and linked CLAUDE.md to it

[0.20.0] - 2025-12-18

Added

  • Image Variations Support for Gemini - Add image variations support for Gemini (nano-banana) models (#165)
  • Image Edit Support for Gemini - Implement limited image edit support for Gemini (nano-banana) models (#164)
  • Enhanced Image Generation - Enhance /v1/images/generations with streaming and GPT Image features (#161)
  • GPT Image 1.5 Model - Add gpt-image-1.5 model support (#159)
  • Image Variations Endpoint - Implement /v1/images/variations endpoint for image variations (#155)
  • Image Edits Endpoint - Implement /v1/images/edits endpoint for image editing (inpainting) (#156)
    • Full OpenAI Images Edit API compatibility
    • Supports GPT Image models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5 (recommended)
    • Legacy support for dall-e-2 model
    • Multipart form-data parsing with shared utilities
    • PNG image validation (format, size, square dimensions)
    • Optional mask validation (dimension matching with source image)
  • Shared Image Utilities - Implement shared utilities for image edit/variations endpoints (#154)
  • External Prompt Files - Support loading system prompts from external Markdown files (#146)
    • New prompt_file field in BackendPromptConfig and ModelPromptConfig
    • New default_file and prompts_dir fields in GlobalPromptConfig
    • Secure path validation with path traversal attack prevention
    • REST API endpoints for prompt file management
    • File caching with size limits (100 entries max, 50MB total)
    • Hot-reload support for prompt files
  • Solar Open 100B Model - Add Solar Open 100B model metadata
  • Automatic Model Discovery - Backends automatically discover available models from /v1/models API when models are not explicitly configured (#142)
    • OpenAI, Gemini, and vLLM backends support auto-discovery
    • Ollama backend uses vLLM's discovery mechanism (OpenAI-compatible API)
    • 10-second timeout prevents blocking startup
    • Falls back to hardcoded defaults if discovery fails

Changed

  • BackendFactory::create_backend_from_typed_config() is now async to support async model discovery
  • Backend from_config() methods for OpenAI, Gemini, and vLLM are now async

Security

  • API Key Redaction - Implement API key redaction to prevent credential exposure (#150)

Performance

  • Binary Size Optimization - Optimize release binary size from 20MB to 6MB (70% reduction) (#144)

Refactored

  • Split large files for Priority 2 of issue #147
  • Split large files to keep each under 500 lines (#148)

[0.19.0] - 2025-12-13

Added

  • Runtime Configuration Management API - Comprehensive REST API for viewing and modifying configuration at runtime (#139)
    • Configuration Query APIs:
      • GET /admin/config/full - Retrieve full configuration with sensitive info masked
      • GET /admin/config/sections - List all 15 configuration sections
      • GET /admin/config/{section} - Get specific section configuration
      • GET /admin/config/schema - JSON Schema for client-side validation
    • Configuration Modification APIs:
      • PUT /admin/config/{section} - Replace section configuration
      • PATCH /admin/config/{section} - Partial update (JSON merge patch)
      • POST /admin/config/validate - Validate configuration before applying
      • POST /admin/config/apply - Apply configuration with hot reload
    • Configuration Save/Restore APIs:
      • POST /admin/config/export - Export configuration (YAML/JSON/TOML)
      • POST /admin/config/import - Import and apply configuration
      • GET /admin/config/history - View configuration change history
      • POST /admin/config/rollback/{version} - Rollback to previous version
    • Backend Management APIs:
      • POST /admin/backends - Add new backend
      • GET /admin/backends/{name} - Get backend configuration
      • PUT /admin/backends/{name} - Update backend configuration
      • DELETE /admin/backends/{name} - Remove backend
      • PUT /admin/backends/{name}/weight - Update backend weight
      • PUT /admin/backends/{name}/models - Update backend model list
    • Sensitive information masking for API keys, passwords, tokens
    • JSON Schema generation for all configuration sections
    • Configuration history tracking (up to 100 entries, configurable)
    • Memory-efficient history storage with size-based eviction (10MB limit)
    • Atomic version counter using AtomicU64 for thread safety
    • Structured error responses with error codes
  • Admin REST API Documentation - Comprehensive developer guide (docs/admin-api.md)
    • Complete API reference with request/response examples
    • Client SDK examples for Python, JavaScript/TypeScript, and Go
    • Best practices and security considerations
  • Integration Tests - 33 integration tests for Configuration Management API endpoints
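
A minimal read-validate-apply round trip (Python; admin authentication headers are omitted, and the rate_limiting payload shown is hypothetical):

```python
import requests

BASE = "http://localhost:8080"

# List the available configuration sections.
sections = requests.get(f"{BASE}/admin/config/sections").json()
print(sections)

patch = {"requests_per_minute": 120}  # hypothetical field
# Validate before applying, then send a JSON merge patch to one section.
requests.post(f"{BASE}/admin/config/validate",
              json={"rate_limiting": patch})
requests.patch(f"{BASE}/admin/config/rate_limiting", json=patch)
```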

Fixed

  • CRITICAL: Configuration changes now actually applied to running system
  • CRITICAL: Memory growth controlled with JSON string storage and size-based eviction
  • HIGH: Input validation added (1MB content limit, 32-level nesting depth)
  • HIGH: Sensitive export requires elevated permission and audit logging
  • HIGH: Comprehensive sensitive field detection (30+ patterns)
  • MEDIUM: Validation functions now perform actual validation
  • MEDIUM: Race condition fixed with AtomicU64 for version counter
  • MEDIUM: Colon removed from allowed backend name characters
  • MEDIUM: Structured error responses with error codes
  • MEDIUM: Initialize flag prevents duplicate history entries
  • LOW: Unnecessary clones removed for better performance
  • LOW: Limits now configurable via AdminConfig
  • LOW: Duplicate validation logic refactored
  • LOW: Test coverage improved for edge cases

Changed

  • Enhanced documentation for Configuration Management API across all guides
  • Updated manpage with new admin endpoints
  • Updated API.md with comprehensive Configuration Management API section

[0.18.0] - 2025-12-13

Added

  • Per-API-Key Rate Limiting - Implement per-API-key rate limiting (#137)
    • Individual rate limits for each API key
    • Configurable requests per minute per key
  • API Key Management System - Comprehensive API key management and configuration system
    • Multiple key sources: config file, external file, environment variables
    • Key properties: scopes, rate limits, expiration, enabled status
    • Hot reload support for key configuration changes
  • Files API Authentication - Implement authentication and authorization for Files API (#131)
    • API key authentication for file operations
    • File ownership enforcement
    • Admin access control for all files
  • Hot Reload for Runtime Configuration - Complete hot reload functionality for runtime configuration updates (#130)
    • Automatic configuration file watching
    • Classified updates: immediate, gradual, restart-required

Changed

  • Major refactoring with modular structure
    • Extract CLI and app utilities into modular structure (#132)
    • Split converter.rs into modular structure (#132)
    • Split large source files into modular structure
    • Consolidate find_gemini_backend function logic
  • Updated architecture.md to reflect refactored module structure

Fixed

  • Add ConnectInfo extension for admin/metrics/files endpoints
  • Address security vulnerabilities in API key management
  • Address code quality issues in API key management

Documentation

  • Add API key management documentation
  • Add comprehensive API key management tests

[0.17.0] - 2025-12-12

Added

  • Anthropic Backend File Content Transformation - Files uploaded to the router can now be used with Anthropic backend (#126)
    • Automatic conversion of file content to Anthropic message format
    • Support for text and document files with base64 encoding
    • Seamless integration with file resolution middleware
  • Gemini Backend File Content Transformation - Files uploaded to the router can now be used with Gemini backend (#127)
    • Automatic conversion of file content to Gemini API format
    • Support for inline data with proper MIME type handling
    • Cross-provider file support enables files uploaded once to work across all backends

Fixed

  • Streaming File Uploads - Implement streaming file uploads to prevent memory exhaustion (#128)
    • Large file uploads no longer load entire file into memory
    • Streaming processing for efficient memory usage
    • Prevents OOM errors when uploading large files

Changed

  • None

[0.16.0] - 2025-12-12

Added

  • OpenAI-Compatible Files API - Full implementation of OpenAI Files API endpoints (#111)
    • Upload files with multipart/form-data support
    • List, retrieve, and delete files
    • Download file content
    • Supports purpose: fine-tune, batch, assistants, user_data
  • File Resolution Middleware - Automatic file content injection for chat completions (#120)
    • Reference uploaded files in chat messages with file IDs
    • Automatic content injection into chat context
  • Persistent Metadata Storage - File metadata persists across server restarts (#125)
    • Sidecar JSON files (.meta.json) stored alongside data files
    • Automatic recovery on startup with metadata rebuild from files
    • Orphan file detection and optional cleanup
  • OpenAI Backend File Handling - Files uploaded locally are forwarded to OpenAI when needed (#121, #122)
  • GPT-5.2 Model Support - Added GPT-5.2 model metadata to OpenAI backend (#124)
  • Circuit Breaker Pattern - Automatic backend failover with circuit breaker (#93)
    • States: Closed → Open → Half-Open → Closed cycle
    • Configurable failure thresholds and recovery timeout
    • Per-backend circuit breaker instances
    • Admin endpoints for circuit breaker status and control
  • Admin Endpoint Authentication - Secure admin endpoints with authentication and audit logging
  • Configurable Fallback Models - Automatic model fallback for unavailable model scenarios (#50)
    • Define fallback chains for primary models (e.g., gpt-4o → gpt-4-turbo → gpt-3.5-turbo)
    • Cross-provider fallback support (e.g., OpenAI → Anthropic)
    • Automatic parameter translation between providers
    • Integration with circuit breaker for layered failover protection
    • Configurable trigger conditions (error codes, timeout, connection error, circuit breaker open)
    • Response headers indicate when fallback was used (X-Fallback-Used, X-Original-Model, X-Fallback-Model)
    • Prometheus metrics for fallback monitoring
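
Clients can detect a fallback from the response headers (Python; the exact header value format is an assumption):

```python
import requests

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]},
)
if r.headers.get("X-Fallback-Used"):
    print("fell back:", r.headers.get("X-Original-Model"),
          "->", r.headers.get("X-Fallback-Model"))
```
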
  • Pre-commit Hook - Automated code formatting and linting before commits

Fixed

  • Fallback Chain Validation - Integrate chain validation into Validate derive
  • Fallback Performance - Use index-based lookup for fallback chain traversal
  • Lock Contention - Reduce lock contention in FallbackService with snapshot pattern
  • Security - Sanitize fallback error headers and metric labels
  • Circuit Breaker Security - Add backend name validation in admin endpoints
  • Thread Safety - Use CAS loop for thread-safe half-open request limiting

Changed

  • Documentation Updates - Comprehensive documentation for fallback configuration, circuit breaker, and Files API
  • Code Quality - Fix clippy warnings and format code
  • Pre-commit Hook Location - Move pre-commit hook to .githooks directory

[0.15.0] - 2025-12-05

Added

  • Nano Banana API Support - Add Gemini Image Generation API support with OpenAI-compatible interface (#102)
    • Supports nano-banana and nano-banana-pro models
    • Automatic format conversion between OpenAI Images API and Gemini Imagen API
  • Split /v1/models Endpoint - Standard lightweight response vs extended metadata response (#101)
    • /v1/models returns lightweight response for better performance
    • /v1/models?extended=true returns full metadata for detailed model information

Changed

  • Extract StreamService - Streaming handler logic extracted to dedicated StreamService for modular architecture (#106)
  • Eliminate Retry Logic Duplication - Consolidated retry logic code in proxy.rs (#103)

Fixed

  • Proper Error Propagation - Replace .expect() panics with proper error propagation in HttpClientFactory (#104)

Performance

  • LRU Cache Optimization - Use read lock instead of write lock for cache lookups (#105)

[0.14.2] - 2025-12-05

Added

  • Token Usage Logging - Log input/output token counts on request completion (#92)
  • Exclude List for Reports - Add exclude list configuration for reports

Changed

  • None

Fixed

  • None

[0.14.1] - 2025-12-05

Added

  • TTFB Benchmark Targets - Add TTFB benchmark targets to Makefile
  • Connection Pre-warming - Add connection pre-warming for Anthropic, Gemini, OpenAI backends

Fixed

  • Anthropic Backend TTFT - Optimize Anthropic backend TTFT with connection pooling and HTTP/2 (#90)
  • Gemini Backend TTFT - Optimize Gemini backend TTFT with connection pooling and HTTP/2 (#88)
  • Model Metadata Alias Matching - Apply base name fallback matching to aliases in model metadata lookup (#84)

Changed

  • Shared HTTP Client - Share HTTP client between HealthChecker and request handler
  • Updated architecture and performance documentation

[0.14.0] - 2025-12-04

Added

  • Global System Prompt Injection - Add router-wide global system prompt injection (#82)

Fixed

  • GitHub Actions - Replace deprecated actions-rs/toolchain with dtolnay/rust-toolchain
  • macOS ARM64 Build - Add RUSTFLAGS for macOS ARM64 ring build
  • musl Build - Switch to rustls-tls for musl cross-compilation support

Changed

  • Update GitHub Action runner

[0.13.0] - 2025-12-04

Added

  • OpenAI Responses API (/v1/responses) - Full implementation of OpenAI's Responses API (#49)
    • Session-based response management with automatic expiration
    • Background cleanup task for expired sessions
    • Request/response format converter between Responses API and Chat Completions
  • SecretString for API Keys - Secure API key storage using SecretString across all backends (#76)
  • Model Metadata Override - Allow overriding /v1/models response fields via model-metadata.yaml (#75)

Fixed

  • True SSE Streaming - Implement proper Server-Sent Events streaming for /v1/responses API

Changed

  • Immediate Mode for SseParser - Reduced first-response latency with immediate parsing mode
  • String Allocation Optimizations - Improved performance with reduced allocations
  • Error Handling Standardization - Consistent error handling patterns across the codebase

Security

  • Session Access Control - Added proper access control for session management
  • Input Validation - Comprehensive input validation for Responses API

[0.12.0] - 2025-12-04

Added

  • SSRF Prevention Module - New UrlValidator module with comprehensive SSRF prevention (#66)
  • Centralized HTTP Client Factory - HttpClientFactory for consistent HTTP client creation across backends (#67)

Fixed

  • Consistent Hash Algorithm - Handle exact hash matches in binary search for proper routing (#72)
  • Replace Panics with Option Returns - Improve reliability by replacing panics with Option returns (#71)
  • Remove Hardcoded Auth Requirement - /v1/models endpoint no longer requires hardcoded authentication
  • GitHub Actions - Use GitHub App token for Projects V2 API access

Changed

  • Reorganize OpenAI Model Metadata - Model metadata organized by family for better maintainability (#74)
  • Extract AnthropicStreamTransformer - Dedicated module for Anthropic stream transformation (#73)
  • Split Backends Module - backends mod.rs split into separate modules for cleaner architecture (#69)
  • Extract Embedded Tests - Tests moved to separate files for better organization (#68)
  • Extract RequestExecutor - Shared common module for request execution (#65)
  • Extract HeaderBuilder - Auth strategies moved to dedicated module (#64)
  • Extract AtomicStatistics - Shared common module for atomic statistics

Technical Improvements

  • Improved code organization with modular architecture
  • Implemented stats aggregation for better observability
  • Enhanced security with SSRF prevention capabilities

[0.11.0] - 2025-12-03

Added

  • Native Anthropic Claude API backend (type: anthropic) with OpenAI-compatible endpoint (#33)
    • Automatic API key loading from CONTINUUM_ANTHROPIC_API_KEY environment variable
    • Extended thinking block support for Claude thinking models
    • OpenAI to Claude reasoning parameter conversion (reasoning_effort)
    • Support for flat reasoning_effort parameter
  • Claude 4, 4.1, 4.5 model metadata documentation

Fixed

  • Improve health check and model fetching for Anthropic/Gemini backends
  • Add Accept-Encoding: identity header to streaming requests to prevent compression issues
  • Fix make_backend_request in proxy.rs for proper Accept-Encoding handling

Changed

  • Refactor: apply code formatting and fix clippy warnings
  • Refactor: use reqwest no_gzip/no_brotli/no_deflate instead of Accept-Encoding header

[0.10.0] - 2025-12-03

Added

  • Native Google Gemini API backend (type: gemini) with OpenAI-compatible endpoint (#32)
    • Automatic API key loading from CONTINUUM_GEMINI_API_KEY environment variable
    • Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
    • Automatic max_tokens adjustment for thinking models to prevent response truncation
    • Support for reasoning_effort parameter
  • Native OpenAI API backend (type: openai) with built-in configuration
    • Automatic API key loading from CONTINUUM_OPENAI_API_KEY environment variable
    • Built-in OpenAI model metadata in /v1/models response
  • OpenAI Images API support (/v1/images/generations) for DALL-E and gpt-image-1 models (#35)
    • Configurable image generation timeout (timeouts.request.image_generation)
    • Comprehensive input validation for image generation parameters
    • Response format validation for image generation API
  • Authenticated health checks for OpenAI and API-key backends
  • API key authentication to streaming requests
  • Filter /v1/models to show only configured models
  • Allow any config file path when explicitly specified via -c/--config
  • .env.example and typed backend configuration examples
  • Comprehensive model metadata for GLM 4.6, Kimi K2, DeepSeek, GPT, and Qwen3 series

Fixed

  • Streaming response truncation for thinking models (gemini-2.5-pro, gemini-3-pro)
  • Model ID normalization and streaming compatibility for Gemini backend
  • Convert max_tokens to max_completion_tokens for newer OpenAI models
  • Correct URL construction for all API endpoints
  • Security: Remove sensitive data from debug logs
  • Security: Add request body size limits to prevent DoS attacks

Changed

  • Refactor: Unify request retry logic with RequestType enum
  • Refactor: Improve Gemini backend performance with lock-free statistics and slice returns
  • Add Gemini backend documentation and max_tokens behavior documentation
  • Add image generation API documentation
  • Standardize capability naming in model-metadata.yaml

[0.9.0] - 2025-12-02

Added

  • Native Google Gemini API backend (type: gemini) with OpenAI-compatible endpoint (#32)
    • Automatic API key loading from CONTINUUM_GEMINI_API_KEY environment variable
    • Extended 300s streaming timeout for thinking models (gemini-2.5-pro, gemini-3-pro)
    • Automatic max_tokens adjustment for thinking models to prevent response truncation
    • Support for reasoning_effort parameter
  • OpenAI Images API support (/v1/images/generations) for DALL-E and gpt-image-1 models (#35)
  • Configurable image generation timeout (timeouts.request.image_generation)
  • Comprehensive model metadata for OpenAI models including GPT-5 family, o-series, audio/speech, video (Sora), and embedding models
  • Enhanced rate limiting with token bucket algorithm (#11)
  • Comprehensive Prometheus metrics and monitoring (#10)
  • Configuration file migration and auto-correction CLI utility (#29)
  • Comprehensive authentication for metrics endpoint

Fixed

  • CRITICAL: Eliminate race condition in token refill
  • CRITICAL: Protect API keys with SHA-256 hashing
  • CRITICAL: Prevent memory exhaustion via unbounded bucket growth
  • CRITICAL: Prevent header injection vulnerabilities
  • HIGH: Prevent IP spoofing via X-Forwarded-For manipulation
  • HIGH: Implement singleton pattern for metrics to prevent memory leaks
  • HIGH: Eliminate unnecessary string allocations
  • HIGH: Implement model extraction for rate limiting
  • Add comprehensive cardinality limits and label sanitization to prevent metric explosion DoS attacks
  • Improve error handling to prevent panic conditions
  • Resolve environment variable race condition in config test
  • Fix integration test failure in metrics RequestTimer
  • Fix unit test failures in metrics security module

Changed

  • Refactor: remove excessive Arc wrapping in rate limiting
  • Reorganize documentation structure for better maintainability
  • Add comprehensive metrics documentation
  • Update documentation for rate limiting feature
  • Remove development mock server and sample config files
  • Remove temporary test files and improve gitignore
  • Remove duplicate man page and update gitignore
  • Update README.md to mention correct repo
  • Update release workflows

[0.8.0] - 2025-09-09

Added

  • Model ID alias support for metadata sharing (#27)
  • Comprehensive rate limiting documentation
  • Robust rate limiting to models endpoint to prevent DoS via cache poisoning

Fixed

  • Return empty list instead of 503 when all backends are unhealthy (#28)
  • Improve error handling and classification
  • Resolve clippy warnings for MutexGuard held across await points

Changed

  • Increase rate limits for /v1/models endpoint to be more practical
  • Add alias feature documentation to configuration.md

[0.7.1] - 2025-09-08

Fixed

  • Improve config path validation for home directory and executable paths (#26)

[0.7.0] - 2025-09-07

Added

  • Extend /v1/models endpoint with rich metadata support (#23) (#25)
  • Enhanced Configuration Management (#9) (#22)
  • Advanced load balancing strategies with enhanced error handling (#21)

Fixed

  • Use streaming timeout configuration from config.yaml instead of hardcoded 25s limit

Changed

  • Add yaml to exclude list

[0.6.0] - 2025-09-03

Added

  • GitHub Project automation workflow
  • Comprehensive timeout configuration and model documentation updates

Fixed

  • Use timeout configuration from config.yaml instead of hardcoded values (#19)
  • Fix clippy warnings and benchmark compilation issues

Changed

  • Apply cargo fmt

[0.5.0] - 2025-09-02

Added

  • Extensible architecture with layered design (#16)
  • Comprehensive integration tests and performance optimizations
  • Complete service layer implementation
  • Middleware architecture and enhanced backend abstraction
  • Configurable connection pool size with CLI and config file support
  • Comprehensive configuration management with YAML support (#7)
  • Debian packaging and man page for continuum-router

Fixed

  • Handle Option correctly in tests
  • Update test to handle streaming requests without model field gracefully
  • Resolve floating-point precision and timing issues in tests
  • Resolve test failures and deadlocks in object pool and SSE parser
  • Resolve CI test failures and improve test performance
  • Resolve config watcher test failures in CI environment
  • Resolve initial health check race condition
  • Critical security vulnerabilities in error handling and retry logic
  • Adjust timeout test tolerance for timing variations

Changed

  • Extract complex types into type aliases for better readability
  • Resolve all cargo fmt and clippy warnings
  • Make retry configuration optional with sensible defaults
  • Optimize config access and add comprehensive timeout management
  • Update model names in timeout configuration to latest versions
  • Complete documentation update
  • Split oversized modules into layered architecture

Performance

  • Optimize config access and add comprehensive timeout management

[0.4.0] - 2025-08-25

Added

  • Model-based routing with health monitoring (#6)

Fixed

  • Improve health check integration and SSE parsing for better compatibility

Changed

  • Update README.md

[0.3.0] - 2025-08-25

Added

  • SSE streaming support for real-time chat completions (#5)

Fixed

  • Handle non-success status codes in streaming responses
  • Allow streaming to continue even when backend returns 404 or other error status codes
  • Send SSE error event first to notify client of the backend error status

[0.2.0] - 2025-08-25

Added

  • Model aggregation from multiple endpoints (#4)

[0.1.0] - 2025-08-24

Added

  • OpenAI-compatible endpoints and proxy functionality
  • /v1/models endpoint for listing available models
  • /v1/completions endpoint for legacy OpenAI completions API
  • /v1/chat/completions endpoint for chat API
  • Multiple backends support with round-robin load balancing (#1)
  • Fallback handler for undefined routes with proper error messages

Fixed

  • Improve error handling consistency across all endpoints

Changed

  • Update README with changelog and version information

Migration Notes

Upgrading to v0.16.0

  • New Files API: OpenAI-compatible Files API is now available at /v1/files
    • Upload files for fine-tuning, batch processing, or assistants
    • Files are stored locally with persistent metadata
    • Configure via files_api section in config.yaml
  • File Resolution: Reference uploaded files in chat completions
    • Use file IDs in your chat messages for automatic content injection
  • Persistent Metadata: File metadata now survives server restarts
    • Set metadata_storage: persistent (default) in files_api config
    • Set cleanup_orphans_on_startup: true to auto-clean orphaned files
  • Circuit Breaker: Add circuit_breaker section to your config.yaml for automatic backend failover
    • Configure failure threshold, recovery timeout, and half-open requests
  • New Fallback Feature: Add fallback section to your config.yaml to enable automatic model fallback
    • Define fallback chains: fallback_chains: { "gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"] }
    • Configure trigger conditions in fallback_policy
    • Cross-provider fallback is supported (e.g., OpenAI → Anthropic)
  • Circuit Breaker Integration: Set circuit_breaker_open: true in trigger_conditions to integrate with existing circuit breaker
  • Response Headers: Check X-Fallback-Used header to detect when fallback was used
  • GPT-5.2 Support: New GPT-5.2 model metadata is available
  • No breaking changes from v0.15.0

Upgrading to v0.15.0

  • Split /v1/models Endpoint: The /v1/models endpoint now returns a lightweight response by default
    • For extended metadata, use /v1/models?extended=true
    • This improves performance for clients that only need basic model information
  • Nano Banana API: New support for Gemini Image Generation (Imagen) through OpenAI-compatible interface
    • Use nano-banana or nano-banana-pro model names
  • Error Handling: Improved reliability with proper error propagation instead of panics
  • Performance: LRU cache now uses read locks for better concurrent performance
  • No breaking changes from v0.14.x

Upgrading to v0.13.0

  • New Responses API: The /v1/responses endpoint is now available for OpenAI Responses API compatibility
    • Sessions are automatically managed with background cleanup for expired sessions
    • True SSE streaming provides real-time responses
  • Security: API keys are now stored using SecretString for improved security across all backends (#76)
  • Model Metadata: Override /v1/models response fields via model-metadata.yaml (#75)
  • No breaking changes from v0.12.0

Upgrading to v0.12.0

  • No breaking changes: This is a refactoring release with improved code organization
  • Bug fix: Consistent hash routing now correctly handles exact hash matches
  • Security: SSRF prevention module added for URL validation
  • Reliability: Panics replaced with Option returns for better error handling
  • API change: /v1/models endpoint no longer has hardcoded auth requirement

Upgrading to v0.11.0

  • New Anthropic backend: Add type: anthropic backends for native Anthropic Claude API support
    • Set CONTINUUM_ANTHROPIC_API_KEY environment variable for authentication
    • Supports extended thinking with automatic parameter conversion
    • OpenAI reasoning_effort parameter is automatically converted to Claude's thinking format
  • Streaming improvements: Accept-Encoding fixes ensure proper streaming for all backends
  • No breaking changes from v0.10.0

Upgrading to v0.10.0

  • New OpenAI backend: Add type: openai backends for native OpenAI API support
    • Set CONTINUUM_OPENAI_API_KEY environment variable for authentication
    • Built-in model metadata is automatically included in /v1/models response
  • Image Generation API: New /v1/images/generations endpoint for DALL-E models
    • Configure timeout via timeouts.request.image_generation (default: 120s)
    • Supports response_format validation (url or b64_json)
  • Gemini improvements: Streaming response truncation fixed for thinking models
    • Model ID normalization ensures proper routing
  • API key authentication: Streaming requests now support API key authentication
  • Security: Request body size limits prevent DoS attacks
  • Newer OpenAI models automatically use max_completion_tokens instead of max_tokens

Upgrading to v0.9.0

  • New Gemini backend: Add type: gemini backends for native Google Gemini API support
    • Set CONTINUUM_GEMINI_API_KEY environment variable for authentication
    • Thinking models (gemini-2.5-pro, gemini-3-pro) automatically get max_tokens: 16384 if client sends values below 4096
  • Enhanced rate limiting with token bucket algorithm is now available
  • Configure rate limiting via rate_limiting section in config.yaml
  • Prometheus metrics are now available at /metrics endpoint with authentication
  • Use --migrate-config-file CLI option to migrate and fix configuration files
  • Multiple critical security fixes have been applied to rate limiting

Upgrading to v0.8.0

  • Rate limiting is now enabled for the /v1/models endpoint
  • Empty list is returned instead of 503 error when all backends are unhealthy
  • Model aliases are now supported for metadata sharing

Upgrading to v0.7.0

  • Enhanced configuration management requires updating configuration files
  • New load balancing strategies are available
  • Streaming timeout is now configurable via config.yaml

Upgrading to v0.6.0

  • Timeout configuration is now read from config.yaml instead of hardcoded values
  • Update your configuration files to include timeout settings

Upgrading to v0.5.0

  • Major architectural refactoring with layered design
  • Configuration management now supports YAML files
  • Retry mechanisms have been enhanced with security improvements
  • Connection pool size is now configurable

This changelog reflects the actual development history of Continuum Router from its initial release to the current version.