Changelog

All notable changes to Continuum Router are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Added

ChatGPT subscription / Codex backend authentication via OAuth device flow (#551, #592)
continuum-router auth login --backend <name> runs the OpenAI Codex three-step headless device-code flow: POST /api/accounts/deviceauth/usercode to mint a one-time user_code, polling of POST /api/accounts/deviceauth/token, and a PKCE exchange at /oauth/token. The standards-compliant RFC 8628 device flow remains available for any future provider that implements it; the new OpenAICodexDeviceFlowClient is selected automatically for provider: openai.
Tokens are wrapped in SecretString and written to the configured token_store with mode 0600 on Unix, using an O_CREAT|O_EXCL open followed by an atomic rename; a random temp-file suffix prevents concurrent saves from colliding, and a partial write is unlinked on failure so secret material does not linger on disk.
Access-token expiry is parsed from the JWT exp claim (with a 1-hour fallback for non-JWT tokens) and clamped to a useful minimum so a degenerate expires_in from the provider cannot trigger a refresh storm.
Proactive refresh fires 60 s before expiry, single-flighted with a tokio::sync::Mutex. A 401 from the upstream backend triggers exactly one forced refresh and a single retry; the previous refresh token is preserved race-free when the provider omits refresh_token from a refresh response.
The strategy reports an identity_fingerprint() (backend name, client_id, token_store) so that hot-reload rebuilds the strategy when any of those rotate, instead of silently keeping the prior in-memory state.
The CLI strips C0/C1 control characters from verification_uri_complete and user_code before printing, so a hostile provider response cannot inject ANSI escapes that rewrite the terminal.
Every device-flow and runtime request to auth.openai.com / chatgpt.com/backend-api/codex carries originator: codex_cli_rs (configurable via auth.oauth.originator) and a codex_cli_rs/<version> User-Agent (configurable via auth.oauth.user_agent), matching the official Codex CLI so Cloudflare admits the traffic instead of returning a 403 JS challenge.
auth.type: oauth is accepted in YAML alongside the legacy o_auth snake_case rendering. client_id and scope default to the public Codex CLI values; only token_store is required for the ChatGPT-subscription case.
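The token-store write path described above can be sketched with the standard library alone. This is a minimal illustration, not the router's code: `save_token_atomically`, `temp_path_for`, and the PID-plus-clock suffix are hypothetical stand-ins (the router uses a random suffix and wraps tokens in `SecretString`). It shows the ordering that matters: create the temp file exclusively with mode 0600, write and fsync, rename over the target, and unlink the temp file if anything fails.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

#[cfg(unix)]
use std::os::unix::fs::OpenOptionsExt;

/// Hypothetical helper: a sibling temp path with a per-call suffix.
/// The router uses a random suffix; PID + clock nanos keep this sketch std-only.
fn temp_path_for(target: &Path) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let stem = target.file_name().and_then(|n| n.to_str()).unwrap_or("token");
    target.with_file_name(format!(".{stem}.tmp-{}-{nanos}", std::process::id()))
}

fn write_and_sync(file: &mut File, data: &[u8]) -> io::Result<()> {
    file.write_all(data)?;
    file.sync_all() // bytes reach disk before the rename publishes them
}

/// Hypothetical name: save `secret` to `target` without ever exposing it with
/// loose permissions or leaving a half-written file behind.
fn save_token_atomically(target: &Path, secret: &[u8]) -> io::Result<()> {
    let tmp = temp_path_for(target);

    let mut opts = OpenOptions::new();
    // create_new(true) is O_CREAT|O_EXCL: refuse to reuse an existing temp file.
    opts.write(true).create_new(true);
    #[cfg(unix)]
    opts.mode(0o600); // owner read/write only, applied when the file is created

    // If the exclusive open fails, nothing of ours exists yet, so just return.
    let mut file = opts.open(&tmp)?;
    let written = write_and_sync(&mut file, secret);
    drop(file); // close before renaming (required on some platforms)

    // rename() atomically replaces `target` when both paths share a filesystem.
    let result = written.and_then(|_| fs::rename(&tmp, target));
    if result.is_err() {
        // Best-effort cleanup so partial secret material does not linger.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

Because the mode is set at creation time, there is no window in which the secret is readable by other users; because `create_new` refuses an existing path, two writers can never interleave bytes in the same temp file.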
Anthropic Messages and Chat Completions surfaces both transparently route to the ChatGPT Codex backend (#592)
Any backend whose auth.type is oauth and whose provider uses the Codex flow (currently openai) is forced through the Responses API for every request, regardless of per-model responses_only metadata. chatgpt.com/backend-api/codex exposes /responses only — no /chat/completions — so chat-shaped models (e.g. gpt-5.5, alias-mapped claude-haiku-4-5) and unknown model IDs all dispatch through /v1/responses → …/backend-api/codex/responses. Non-OAuth OpenAI backends continue to honor the per-model responses_only flag.
New core::url_utils::compose_backend_url centralizes backend URL composition for the three OpenAI-compatible roots (/v1, /openai, /backend-api/codex). Replaces ad-hoc ends_with("/v1") || ends_with("/openai") checks across proxy/backend.rs, http/handlers/responses.rs, http/streaming/handler.rs, services/responses/stream_service.rs, and the Anthropic handler so the /backend-api/codex rule applies uniformly.
The proxy hot path (proxy/backend.rs, proxy/responses_only.rs, proxy/image_gen.rs, proxy/image_edit.rs) now flows through a backend-name-keyed AuthStrategyRegistry exposed on AppState via src/proxy/oauth_helper.rs. The helper looks up the strategy, calls refresh_if_needed() before sending, replaces the static-bearer header with one derived from the strategy, and force-refreshes + retries once on a 401. Static api_key auth continues to work unchanged when no strategy is registered.
The Anthropic-compatible handler (src/http/handlers/anthropic/handler.rs) consults the same registry. Client-supplied Authorization: sk-ant-… and x-api-key headers are dropped when the backend has an OAuth strategy, instead of being forwarded to OpenAI as the bearer.
Model fetcher detects OAuth-authed backends and falls back to the configured models list rather than probing /v1/models, since chatgpt.com/backend-api/codex does not expose a models endpoint.
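A minimal sketch of the URL-composition rule, assuming the helper appends the endpoint directly when the configured base already ends in one of the three roots and inserts `/v1` otherwise. The signature below is hypothetical: the real `core::url_utils::compose_backend_url` may take different arguments, and Azure deployment paths and query parameters are ignored here.

```rust
/// Roots after which OpenAI-style endpoint paths are appended as-is.
const KNOWN_ROOTS: [&str; 3] = ["/v1", "/openai", "/backend-api/codex"];

/// Hypothetical sketch: `endpoint` is a path such as "/responses" or
/// "/chat/completions", written without the "/v1" prefix.
fn compose_backend_url(base: &str, endpoint: &str) -> String {
    let base = base.trim_end_matches('/');
    let endpoint = endpoint.trim_start_matches('/');
    if KNOWN_ROOTS.iter().any(|root| base.ends_with(root)) {
        // Base already names an API root: do not add another "/v1".
        format!("{base}/{endpoint}")
    } else {
        // Bare host (or other prefix): assume the standard "/v1" root.
        format!("{base}/v1/{endpoint}")
    }
}
```

Centralizing the rule means a new root such as `/backend-api/codex` is added in one list instead of in every `ends_with` check.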
Codex-compatible Responses API extensions (#536, #537)
POST /v1/responses/compact endpoint for context compaction — passthrough to OpenAI / Azure OpenAI native /v1/responses/compact; other backend types return 501.
store field on ResponsesRequest (defaults to true) controls upstream session persistence; Codex sends store: false for ephemeral requests.
output_text content part type alongside input_text so converters can differentiate assistant vs. user content in input items. All converters (OpenAI, Anthropic, Gemini) handle the new variant.
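The force-refresh-and-retry-once behavior described for the OAuth proxy hot path can be modeled with a small trait. `AuthStrategy`, `send_with_oauth`, and `MockStrategy` are hypothetical stand-ins: the router's strategy is async and single-flights refreshes behind a `tokio::sync::Mutex`. The sketch captures only the control flow: refresh if needed, send, and on a 401 force exactly one refresh and one retry.

```rust
/// Minimal synchronous stand-in for the router's auth strategy (hypothetical).
trait AuthStrategy {
    fn refresh_if_needed(&mut self);
    fn force_refresh(&mut self);
    fn bearer(&self) -> String;
}

/// `send` receives the Authorization header value and returns an HTTP status.
fn send_with_oauth<S: AuthStrategy, F: FnMut(&str) -> u16>(strategy: &mut S, mut send: F) -> u16 {
    strategy.refresh_if_needed(); // proactive refresh before the first attempt
    let status = send(&strategy.bearer());
    if status != 401 {
        return status;
    }
    // Exactly one forced refresh and one retry; a second 401 is returned as-is.
    strategy.force_refresh();
    send(&strategy.bearer())
}

/// Test double: each forced refresh mints a new token.
struct MockStrategy {
    token: String,
    forced: u32,
}

impl AuthStrategy for MockStrategy {
    fn refresh_if_needed(&mut self) {}
    fn force_refresh(&mut self) {
        self.forced += 1;
        self.token = format!("fresh-{}", self.forced);
    }
    fn bearer(&self) -> String {
        format!("Bearer {}", self.token)
    }
}
```

Bounding the retry to one attempt keeps a revoked refresh token from turning every request into a refresh loop.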

v1.5.6 - 2026-04-29

Fixed

/v1/chat/completions returned HTTP 502 responses_parse_failed for responses_only reasoning models (gpt-5.4-pro, gpt-5.5-pro). OpenAI's /v1/responses payload for these models contains output items shaped like { "id": "rs_...", "type": "reasoning", "summary": [] }, but OutputItem::Reasoning required content and status, so serde rejected the payload with missing field 'content'. The Anthropic Messages surface bypassed the strict variant on a different conversion path, masking the bug until directly tested. content and status are now optional on OutputItem::Reasoning; reasoning items are dropped before reaching Chat Completions clients (per existing project policy), so body shape is irrelevant beyond successful deserialization. (#594)

Changed

Realign gemini-3.1-pro-preview as the canonical metadata id for the Gemini 3.1 Pro family in model-metadata.yaml, with gemini-3.1-pro (and existing -latest / -customtools forms) demoted to aliases. Matches what generativelanguage.googleapis.com actually serves today — the canonical gemini-3.1-pro form returns 404 from upstream — and avoids implying GA availability that does not exist yet. The metadata cache still resolves both forms to the same entry. Note: alias-to-canonical rewriting on the upstream-bound payload is out of scope for this release; clients calling with the gemini-3.1-pro alias will still hit upstream 404 until that work lands. (#594)
Sample config.yaml registers the newly-available pro / 5.5 family models so the responses_only dispatch path can be exercised end-to-end against real upstreams (gpt-5.4-pro, gpt-5.2-pro, gpt-5.5, gpt-5.5-pro, claude-opus-4-7, gemini-3.1-pro, gemini-3.1-pro-preview); duplicate claude-haiku-4-5 entry removed.

v1.5.5 - 2026-04-27

Added

Transparent Responses-API routing for OpenAI Pro models (epic #581)
New responses_only: true capability flag in model-metadata.yaml and the built-in OpenAI registry marks gpt-5.2-pro, gpt-5.4-pro, and gpt-5.5-pro as served only on /v1/responses upstream (#574, #582)
/v1/chat/completions requests for responses_only models are dispatched to the upstream /v1/responses endpoint and translated back into a strict-mode chat.completion (or chat.completion.chunk for streaming) envelope, transparent to the client. Stream usage is gated by stream_options.include_usage, and per-model latency / success counters are recorded for the responses_only path (#578, #584)
/anthropic/v1/messages requests for responses_only models are converted to the Responses API shape, dispatched to /v1/responses, and translated back into Anthropic Messages JSON (or the Anthropic SSE event sequence for streaming) — tool-call round-trips, web-search emulation, and Unix-socket transports all branch on the flag (#575, #577, #583, #585, #586)
Anthropic Messages <-> Responses request transformer covers system → instructions, tools, tool_choice (including disable_parallel_tool_use → parallel_tool_calls: false), max_tokens → max_output_tokens, reasoning effort derivation, and multi-turn tool round-trips; the response transformer preserves thinking/text/tool_use ordering and stop-reason fidelity (#575, #583)
SSE streaming bridge (AnthropicResponsesStreamTranslator) maps Responses API events to Anthropic Messages events while preserving Anthropic's strict event-ordering invariants (single message_start, paired content_block_start/content_block_stop, terminal message_stop); handles mid-stream error / response.failed / response.cancelled, response.incomplete → stop_reason: max_tokens, deferred input tokens, and graceful early-close synthesis (#576, #585)
Only OpenAI and Azure OpenAI backends serve /v1/responses; pairing a responses_only model with another backend type produces a 400 invalid_request_error before any upstream call (rejection fires on both /v1/chat/completions and /anthropic/v1/messages surfaces) (#577, #589)
The first dispatch per (backend, model) pair logs at info level so operators can confirm Responses-API routing without enabling debug logs
Anthropic Messages → Responses requests explicitly send store: false to avoid upstream side-effects (#589)
22 deterministic, in-process integration tests covering the {Anthropic, Chat} × {gpt-5.4-pro, gpt-5.2-pro} × {non-streaming, streaming} × {plain, tool-call, reasoning} matrix, mid-stream backend-failure negatives on both surfaces, and an upstream byte-fragmentation regression guard (#579, #588)
Documented in docs/en/configuration/advanced.md (Responses-API-only Models section split into Models-marked-out-of-the-box, Marking-a-new-model, Dispatch-behavior, and Backend-type-constraint subsections), docs/en/architecture.md (Responses-API Routing data-flow diagram), and the docs/en/api.md Chat Completions and Anthropic Messages surface notes with a Transparent-Responses-API-routing subsection (#580, #587)
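The ordering invariants of the streaming bridge can be illustrated with a small state machine. This is a simplified sketch, not `AnthropicResponsesStreamTranslator` itself: the `Upstream` enum collapses the Responses API event stream to five cases, output events are plain strings, and mapping a failure to a single `error` event is an assumption. Three flags are enough to guarantee one `message_start`, balanced `content_block_start`/`content_block_stop` pairs, and nothing after the terminal event.

```rust
/// Simplified upstream Responses API events (hypothetical reduction).
enum Upstream {
    TextDelta(String), // roughly response.output_text.delta
    TextDone,          // the current text output item finished
    Completed,         // response.completed
    Incomplete,        // response.incomplete, e.g. output-token limit hit
    Failed,            // response.failed or a mid-stream error
}

#[derive(Default)]
struct Translator {
    started: bool,
    block_open: bool,
    finished: bool,
    events: Vec<String>, // emitted Anthropic event names, with payload hints
}

impl Translator {
    fn emit(&mut self, event: String) {
        self.events.push(event);
    }

    fn ensure_started(&mut self) {
        if !self.started {
            self.started = true;
            self.emit("message_start".to_string());
        }
    }

    fn close_block(&mut self) {
        if self.block_open {
            self.block_open = false;
            self.emit("content_block_stop".to_string());
        }
    }

    fn finish(&mut self, stop_reason: &str) {
        self.ensure_started();
        self.close_block();
        self.emit(format!("message_delta:{stop_reason}"));
        self.emit("message_stop".to_string());
        self.finished = true;
    }

    fn on_event(&mut self, event: Upstream) {
        if self.finished {
            return; // nothing may follow the terminal event
        }
        match event {
            Upstream::TextDelta(text) => {
                self.ensure_started();
                if !self.block_open {
                    self.block_open = true;
                    self.emit("content_block_start".to_string());
                }
                self.emit(format!("content_block_delta:{text}"));
            }
            Upstream::TextDone => self.close_block(),
            Upstream::Completed => self.finish("end_turn"),
            Upstream::Incomplete => self.finish("max_tokens"),
            Upstream::Failed => {
                self.close_block(); // keep start/stop pairs balanced
                self.emit("error".to_string());
                self.finished = true;
            }
        }
    }

    /// Upstream closed without a terminal event: synthesize a clean ending.
    fn on_eof(&mut self) {
        if !self.finished {
            self.finish("end_turn");
        }
    }
}
```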

Fixed

Chat Completions responses-only routing now rejects incompatible-only backend configs before upstream dispatch and chooses a compatible OpenAI/Azure Responses backend when available (#589)
Chat assistant tool_calls[] are preserved as Responses function_call input items for stateless tool-result turns over /v1/chat/completions (#589)

v1.5.4 - 2026-04-25

Changed

Refresh model-metadata.yaml for late-April 2026 frontier model releases (#572, #573)
Add GPT-5.5 ($5/$30 per 1M, 1M context, knowledge cutoff 2025-12, omnimodal, leads Terminal-Bench 2.0 at 82.7%) and GPT-5.5 Pro ($30/$180 per 1M, Responses API only, deep reasoning) — released 2026-04-23
Add DeepSeek V4 Pro (1.6T total / 49B active MoE, 1M context, 384K max output, three reasoning effort modes) and DeepSeek V4 Flash (284B total / 13B active MoE, 1M context, 384K max output) with deepseek-chat and deepseek-reasoner retained as deprecated aliases per official API docs — released 2026-04-24
Add gpt-image-2 (token-billed instead of per-image: text $5/$30, image $8/$30 per 1M tokens; 1K/2K/4K resolution tiers; ~99% text accuracy in any language; built-in reasoning before generation; context-aware multi-turn editing; gpt-image-2-latest alias) — released 2026-04-21
Add Claude Opus 4.7 ($5/$25 per 1M, 1M context, 128K max output, knowledge cutoff 2026-01, high-resolution image support up to 2576px / 3.75MP, new tokenizer with ~1.0–1.35× token usage vs prior models, new xhigh effort level) — released 2026-04-16
Promote Gemini 3.1 series from preview to GA, retaining -preview suffix as alias for fallback compatibility (#573)
gemini-3.1-pro-preview → gemini-3.1-pro (with gemini-3.1-pro-preview, gemini-3.1-pro-preview-customtools, and gemini-3.1-pro-latest aliases)
gemini-3.1-flash-image-preview → gemini-3.1-flash-image (with gemini-3.1-flash-image-preview, nano-banana-2, and gemini-3.1-flash-image-latest aliases)
gemini-3.1-flash-lite-preview → gemini-3.1-flash-lite (with gemini-3.1-flash-lite-preview and gemini-3.1-flash-lite-latest aliases)
Updated gemini-3-flash-preview deprecation note to point to the new GA gemini-3.1-pro id

v1.5.3 - 2026-04-23

Added

HuggingFace repo-prefix stripping as a new matching phase (phase 5) in src/models/pattern_matching.rs (#555)
try_strip_hf_repo_prefix() validates a vendor/repo (or org/team/repo) prefix against a MAX_PREFIX_SEGMENTS = 3 bound, rejects empty segments (/repo, vendor/, vendor//repo), and rejects any ASCII whitespace before returning the residual
Phase 5 re-enters phases 1-4 on the stripped residual with a structurally-enforced recursion depth of exactly 1 (the re-entry call clears the allow_prefix_strip gate), so prefix stripping composes with the existing layered suffix peel in a single lookup — the motivating case unsloth/Qwen3.6-35B-A3B-GGUF now resolves to qwen3.6-35b-a3b without any hand-registered alias
Phase 5 runs before the wildcard phase; the blast-radius audit confirmed no *-bearing alias in model-metadata.yaml contains /, so the ordering change is behavior-neutral for existing routing
Phase numbering in tracing output realigned to match the documented phase chain (previous code emitted phase = 7 for the namespace fallback while comments called it phase 6)
12 new unit tests covering standard HF form, composition with suffix peel, case-sensitive vendor, registered-alias precedence, unresolvable residual, three-segment form, segment-cap rejection, no-slash input, whitespace rejection, empty segments, re-entry bounding, and alias-phase precedence
9 new integration tests in tests/format_suffix_normalization_test.rs exercising the full RouterConfig / BackendConfig public API through phase 5
Pipeline doc updated in docs/en/configuration/advanced.md (and Korean counterpart) with a new "HuggingFace repo-prefix stripping (phase 5)" section covering the composition semantics, security bounds, and out-of-scope list (hyphen prefixes, HF API discovery)
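A minimal sketch of the phase-5 validation rules and the bounded re-entry. The names `try_strip_hf_repo_prefix` and `MAX_PREFIX_SEGMENTS` come from the entries above, but this body is reconstructed from the listed rules, and `resolve` is a hypothetical stand-in for phases 1-4 that only performs a case-insensitive exact match (the real pipeline also peels suffixes such as `-GGUF`).

```rust
const MAX_PREFIX_SEGMENTS: usize = 3;

/// Returns the residual repo name for `vendor/repo` or `org/team/repo`,
/// or `None` when there is no prefix or validation fails.
fn try_strip_hf_repo_prefix(model_id: &str) -> Option<&str> {
    if !model_id.contains('/') {
        return None; // nothing to strip
    }
    if model_id.bytes().any(|b| b.is_ascii_whitespace()) {
        return None; // e.g. "vendor/my repo"
    }
    let segments: Vec<&str> = model_id.split('/').collect();
    if segments.len() > MAX_PREFIX_SEGMENTS || segments.iter().any(|s| s.is_empty()) {
        return None; // too deep, or "/repo", "vendor/", "vendor//repo"
    }
    segments.last().copied()
}

/// Hypothetical stand-in for phases 1-4 followed by the phase-5 re-entry.
fn resolve(model_id: &str, known: &[&str], allow_prefix_strip: bool) -> Option<String> {
    if let Some(hit) = known.iter().find(|k| k.eq_ignore_ascii_case(model_id)) {
        return Some(hit.to_string()); // explicit IDs and aliases win first
    }
    if allow_prefix_strip {
        if let Some(residual) = try_strip_hf_repo_prefix(model_id) {
            // Re-enter with the gate cleared, so recursion depth is exactly 1.
            return resolve(residual, known, false);
        }
    }
    None
}
```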

Changed

Replaced the previous phase-6 namespace fallback with the new phase-5 HuggingFace prefix-strip layer. The previous phase was case-sensitive and did not compose with suffix peel; the new phase applies stricter input validation (segment cap, empty-segment rejection, whitespace rejection) but composes with phase 4's case-insensitive peel through the bounded re-entry. Pathological inputs above MAX_PREFIX_SEGMENTS (3) — such as provider/deep/nested/model — are now rejected by phase 5 rather than silently matched via recursive rsplit_once fallback (#555)
Aliases currently classified as vendor-prefix in the #560 audit (e.g., Qwen/Qwen3.6-35B-A3B, MiniMaxAI/MiniMax-M2.5) are now peel-coverable-adjacent post-#555: phase 2 still wins on the explicit alias, but phase 5 + phase 4 together reach the same metadata. Retroactive removal is deferred to a follow-up audit per #555 design section 7

Fixed

POST /anthropic/v1/messages now works when the selected backend is configured with a unix:// URL (#567)
Native Anthropic backends and OpenAI-compatible backends both work over Unix sockets, for both non-streaming and streaming requests
Socket paths containing spaces (e.g. macOS ~/Library/Application Support/...) are handled correctly
Auth header selection (x-api-key for Anthropic backends, Authorization: Bearer for OpenAI-compatible backends) is correct on the Unix socket path
anthropic-version header is added automatically for Anthropic backends on the Unix socket path, matching the HTTP path behavior
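The header rules for the Unix-socket path can be expressed as a small lookup. `Kind` and `upstream_auth_headers` are hypothetical. The `anthropic-version` value `2023-06-01` is the version string documented by Anthropic and is only assumed here, since the entry does not say which value the router sends.

```rust
#[derive(Clone, Copy)]
enum Kind {
    Anthropic,
    OpenAiCompatible,
}

/// Hypothetical helper mirroring the header rules listed above.
fn upstream_auth_headers(kind: Kind, api_key: &str) -> Vec<(&'static str, String)> {
    match kind {
        Kind::Anthropic => vec![
            ("x-api-key", api_key.to_string()),
            // Assumed default; Anthropic requires an anthropic-version header.
            ("anthropic-version", "2023-06-01".to_string()),
        ],
        Kind::OpenAiCompatible => vec![("authorization", format!("Bearer {api_key}"))],
    }
}
```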

v1.5.2 - 2026-04-21

Added

Regression tests locking down the transport-layer passthrough contract for llama.cpp and MLxcel backends (#562)
New tests/llamacpp_passthrough_test.rs and tests/mlxcel_passthrough_test.rs covering all four passthrough call sites: direct backend execute_chat_completion, factory-built backend (BackendFactory -> LlamaCppBackend), proxy/backend.rs HTTP handler, and the streaming handler
New test_mlxcel_factory_backend_passthrough_nonstandard_fields asserts that BackendFactory -> LlamaCppBackend::execute_chat_completion preserves non-standard fields byte-for-byte at transport time
Anthropic input test (tests/anthropic_input_test.rs) extended with explicit passthrough coverage
docs/en/architecture/backend-passthrough.md and its Korean counterpart docs/ko/architecture/backend-passthrough.md documenting the passthrough contract, the four guarded call sites, and the list of router-side transforms that run before transport (global_prompts, transform_payload_for_openai for o1/o3/gpt-5*, web_search injection) (#562, #563)
docs/reports/alias-audit-2026-04.md classifying every alias in model-metadata.yaml into peel-redundant, peel-redundant-but-kept, and peel-independent categories, with an "aliases vs peel" policy section added to docs/en/configuration/advanced.md (and the Korean counterpart) explaining when to prefer each mechanism (#560)

Changed

Narrowed the passthrough contract from an implied "byte-equivalent" global guarantee to a transport-layer scope — the router may still run global_prompts injection, o1/o3/gpt-5* payload transforms, and web_search tool injection before transport, but no provider-specific rewriting happens at the transport boundary (#563)
Comment-only clarifications in src/http/streaming/handler.rs, src/infrastructure/backends/factory/backend_factory.rs, src/infrastructure/backends/llamacpp/backend.rs, and src/proxy/backend.rs
Audited model-metadata.yaml aliases for peel-normalization redundancy: removed aliases that differ from the canonical ID only by suffixes already handled by the layered peel (-4bit, -q4_k_m, -fp8, -gguf, -mlx, -awq, etc.), while preserving aliases that encode canonical flavor variants (-qat, -instruct) or disambiguate parameter counts (#557)
New tests/alias_audit_helper.rs and tests/format_suffix_normalization_test.rs enforce the peel-vs-alias boundary going forward

CI

Target Ubuntu 26.04 LTS (Resolute) instead of 25.10 (Questing) in the Debian build workflow
Fall back to createdAt when release publishedAt is null in debian/update-changelog.sh to prevent changelog regression when the latest release is still in draft

v1.5.1 - 2026-04-20

Added

Built-in web_search tool for self-hosted LLM backends (#553)
Router-level tool transparently injected into chat completion requests for vLLM, Ollama, llama.cpp, MLxcel, LM Studio, Continuum Router, and Generic backends
Pluggable SearchProvider trait under src/services/search/ with SerperProvider implementation; Exa and Brave scaffolded behind the same trait
Configurable inject_policy (auto/always/never) with per-backend overrides; commercial backends (OpenAI, Azure, Gemini, Anthropic) left untouched so their native web_search continues to flow through unchanged
Bounded non-streaming tool-execution loop parses web_search tool calls, executes the provider, appends tool-role results, and re-invokes the backend up to max_tool_iterations rounds
New BackendTypeConfig::is_self_hosted / is_commercial helpers covered by unit tests enforcing the commercial/self-hosted partition invariant
API keys redacted in Debug output and never logged; hot-reload friendly WebSearchConfig with ${ENV} substitution
Prometheus counters for tool calls, injections, and iteration-cap hits under src/metrics/web_search
Layered quantization and format suffix normalization for model metadata lookup (#549)
New layered_format_strip() in src/models/pattern_matching.rs iteratively peels allowlisted quantization/format/flavor tokens from the right side of a model ID, retrying exact-id/alias/date-suffix matches after each peel
Token categories: BIT_WIDTH, GGUF_QUANT, FP_FORMAT, INT_FORMAT, LIBRARY, IMATRIX, UNSLOTH, CONTAINER, FLAVOR (all case-insensitive)
Parameter-count suffixes preserved: -Nbit stripped as quantization; -Nb, -aNb, -eNb, -0.6b kept as parameter counts
Canonical base IDs ending in allowlisted flavors (e.g. gemma-3-12b-qat) win via exact-id match before peel runs
Normalization pipeline wired into find_matching_config, BackendConfig::get_model_metadata, RouterConfig::get_model_metadata, RouterConfig::get_thinking_pattern_config, resolve_model_tier (routing), and get_model_profile (admin)
Model metadata for GLM 5.1, Qwen 3.6, and MiniMax M2.7 (#548)
Teams release notification posted to Microsoft Teams via Power Automate webhook after build and Docker jobs
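The bounded non-streaming web_search loop can be sketched with closures standing in for the backend and the `SearchProvider`. Everything here is hypothetical: real messages also carry the assistant's tool-call message and OpenAI-style `tool` role entries, and returning an error once `max_tool_iterations` rounds are used up is an assumption about the cap behavior.

```rust
/// One backend reply: a final answer or a batch of web_search queries.
enum Turn {
    Final(String),
    SearchCalls(Vec<String>),
}

#[allow(dead_code)]
enum Msg {
    User(String),
    ToolResult(String),
}

/// Call the backend, execute any web_search calls, append tool results, and
/// re-invoke, running at most `max_tool_iterations` rounds of tool execution.
fn run_with_web_search(
    mut messages: Vec<Msg>,
    max_tool_iterations: usize,
    mut backend: impl FnMut(&[Msg]) -> Turn,
    mut search: impl FnMut(&str) -> String,
) -> Result<String, String> {
    let mut rounds = 0;
    loop {
        match backend(messages.as_slice()) {
            Turn::Final(answer) => return Ok(answer),
            Turn::SearchCalls(queries) => {
                if rounds == max_tool_iterations {
                    return Err(format!("web_search cap of {max_tool_iterations} rounds reached"));
                }
                rounds += 1;
                for query in queries {
                    // Execute the provider and append a tool-role result.
                    messages.push(Msg::ToolResult(search(query.as_str())));
                }
            }
        }
    }
}
```

The cap bounds both provider spend and latency even when a model keeps asking for more searches.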

Changed

Migrate documentation toolchain from MkDocs + Material for MkDocs to Zensical — reads mkdocs.yml natively and bundles required extensions

Fixed

Security: Cap layered peel phase with MAX_MODEL_ID_LEN=256 and MAX_PEEL_ITERATIONS=8 to eliminate DoS via pathological model IDs (previously O(n²) allocation on inputs like -4bit-4bit-4bit-...)
Security: Enforce 256-char model field length at /v1/chat/completions, /v1/completions, /v1/embeddings, and /v1/embeddings/sparse (parity with existing /v1/responses check)
Consolidate 7-phase metadata matching pipeline into a single implementation (find_matching_config_slice) with thin adapters at each call site, eliminating drift between BackendConfig, Config::get_model_metadata, Config::get_thinking_pattern_config, and find_matching_config
Replace cfg.to_ascii_lowercase() == peel with str::eq_ignore_ascii_case on the hot path (~4000 fewer per-request String allocations)
Pin Pygments <2.20 to fix MkDocs build failure (superseded by Zensical migration)
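The layered suffix peel and its DoS caps can be sketched as a loop over string slices. `MAX_MODEL_ID_LEN` and `MAX_PEEL_ITERATIONS` use the values from the entries above. The token allowlist is a small illustrative subset (the real one spans BIT_WIDTH, GGUF_QUANT, FP_FORMAT, FLAVOR, and other categories), and the exact-match `lookup` stands in for the full exact-id/alias/date-suffix phases. Comparisons use `eq_ignore_ascii_case`, so no `String` is allocated per peel.

```rust
const MAX_MODEL_ID_LEN: usize = 256;
const MAX_PEEL_ITERATIONS: usize = 8;

/// Illustrative subset of non-bit-width tokens that may be peeled.
const OTHER_TOKENS: [&str; 6] = ["q4_k_m", "fp8", "gguf", "mlx", "awq", "qat"];

fn is_peelable(token: &str) -> bool {
    let b = token.as_bytes();
    // BIT_WIDTH: "<digits>bit" such as "4bit"; a parameter count like "8b" never matches.
    let is_bit_width = b.len() > 3
        && b[b.len() - 3..].eq_ignore_ascii_case(b"bit")
        && b[..b.len() - 3].iter().all(u8::is_ascii_digit);
    is_bit_width || OTHER_TOKENS.iter().any(|t| t.eq_ignore_ascii_case(token))
}

fn lookup<'a>(id: &str, known: &[&'a str]) -> Option<&'a str> {
    known.iter().copied().find(|k| k.eq_ignore_ascii_case(id))
}

fn layered_format_strip<'a>(model_id: &str, known: &[&'a str]) -> Option<&'a str> {
    if model_id.len() > MAX_MODEL_ID_LEN {
        return None; // reject oversized IDs up front
    }
    let mut current = model_id;
    for _ in 0..MAX_PEEL_ITERATIONS {
        // Exact match first, so canonical IDs ending in a flavor (e.g. "-qat") win.
        if let Some(hit) = lookup(current, known) {
            return Some(hit);
        }
        let (rest, token) = current.rsplit_once('-')?;
        if !is_peelable(token) {
            return None;
        }
        current = rest; // a shorter slice of the input, not a new String
    }
    lookup(current, known) // final check after the last permitted peel
}
```

In this sketch an ID needing more than eight peels simply fails to resolve, which keeps the worst case linear in the capped input length.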

CI

Bump softprops/action-gh-release from 2 to 3 (#544)
Bump actions/github-script from 8 to 9 (#545)
Bump actions/upload-pages-artifact from 4 to 5 (#554)

Documentation

Document suffix-order ambiguity (-qat-4bit vs -4bit-qat) and internal peel phase bounds in docs/en/configuration/advanced.md
Add pattern_matching.rs to Model Aggregation Service module listing in docs/en/architecture.md with cross-reference to suffix normalization section
New docs/en/web-search.md feature documentation; config.yaml.example extended with web_search section

v1.5.0 - 2026-04-11

Added

Smart routing system with model tier & capability profile registry (#525, #531)
Rule-based request classifier & smart routing policy engine (#526, #532)
Load-aware dynamic tier adjustment (#527, #533)
LLM-based request classifier with hybrid mode (#528, #534)
Smart routing observability, admin API & documentation (#529, #535)
Codex-compatible Responses API extensions (#536, #537)

Changed

Upgrade core dependencies — axum 0.8, sha2 0.11, rand 0.10 (#523)
Add Gemma 4 model family metadata (#538)

Fixed

Complete smart routing integration gaps
Increase DefaultTransformer PDF size limit from 20MB to 32MB (#542)

CI

Bump actions/deploy-pages from 4 to 5 (#521)

Dependencies

Bump the minor-and-patch dependency group with 4 updates (#539)

Documentation

Add Codex-compatible Responses API gap analysis report

v1.4.5 - 2026-03-27

Fixed

Return 400 error when file references are used without file service configured (#519)

Changed

Add GLM-5-Turbo model metadata (#516)

Documentation

Fix Korean anti-AI-slop violations in ko/ documentation
Fix slop word and transition word in api.md

v1.4.4 - 2026-03-18

Fixed

Fix Anthropic thinking failing for high/xhigh reasoning effort — budget_tokens (32768) exceeded default max_tokens (16384), causing API rejection (#514)
Auto-adjust max_tokens to budget_tokens + 4096 when thinking is enabled and budget exceeds max
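The v1.4.4 adjustment reduces to a one-line rule. In this sketch `effective_max_tokens` is a hypothetical name, and also adjusting when `budget_tokens == max_tokens` is an assumption, based on Anthropic's requirement that `budget_tokens` be less than `max_tokens`.

```rust
/// Headroom added on top of the thinking budget, per the entry above.
const THINKING_HEADROOM: u32 = 4096;

/// `thinking_budget` is None when thinking is disabled.
fn effective_max_tokens(max_tokens: u32, thinking_budget: Option<u32>) -> u32 {
    match thinking_budget {
        // Assumption: adjust on equality too, since the budget must stay below max_tokens.
        Some(budget) if budget >= max_tokens => budget.saturating_add(THINKING_HEADROOM),
        _ => max_tokens, // thinking disabled, or the budget already fits
    }
}
```

With the defaults from the bug report, a 32768-token budget against a 16384 `max_tokens` becomes 32768 + 4096 = 36864.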

Changed

Add GPT-5.4 model family: gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano with 1M context window (#515)
Update Gemini 3 series: add gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview; mark gemini-3-pro-preview as deprecated
Recognize Gemini 3 Flash and 3.1 Flash-Lite as thinking models for include_thoughts auto-injection
Update Claude 4.6 models: context window to 1M (GA), fix Sonnet 4.6 max_output to 64K, correct knowledge cutoffs
Update config examples and documentation with latest model names across 8 files

v1.4.3 - 2026-03-18

Fixed

Fix Gemini thinking models (2.5 Pro, 3 Pro, etc.) not returning reasoning_content in streaming responses through the router (#513)
Replaced transform_payload_for_gemini() with transform_request_gemini() across all three Gemini streaming paths to ensure include_thoughts: true auto-injection

v1.4.2 - 2026-03-17

Changed

Change mid-stream fallback default to enabled for improved streaming reliability (#504)
Breaking: Mid-stream fallback is now enabled by default; set mid_stream_fallback.enabled: false to restore previous behavior

Documentation

Add failover latency tuning guide for optimizing fallback behavior

v1.4.1 - 2026-03-17

Added

Mid-stream fallback for streaming inference (#497) — when a backend fails mid-stream during SSE streaming, the router transparently retries with a fallback backend

Changed

Decouple pre-stream fallback from mid-stream fallback (#500) — each can now be independently enabled/disabled
Bump dependency versions to latest major releases

Fixed

Fix streaming config changes not detected in hot reload system (#503)
Fix mid-stream connection errors leaking to client during fallback (#502)
Remove unused config crate dependency

CI

Bump dorny/paths-filter from 3 to 4 (#493)
Bump actions/create-github-app-token from 2 to 3 (#494)

v1.4.0 - 2026-03-14

Added

Prefix-aware routing: PrefixAwareHash selection strategy with Consistent Hash with Bounded Loads (CHWBL) (#455, #457, #461)
Response caching: SHA256-based cache key computation with streaming response buffering and post-completion caching (#456, #459, #462)
Multi-tier CacheStore: in-memory backend (#466), Redis/Valkey backend with connection pooling (#467), and S3-backed tiered L1/L2 cache (#483)
KV cache index: shared data structure (#470), KV event consumer for vLLM backend streams (#471), prefix overlap scoring integrated into backend selection (#473), configuration/metrics/admin endpoints (#474)
Tiered KV cache with storage-tier awareness (GPU hot / external warm) (#484)
Disaggregated prefill/decode orchestration with external KV tensor transfer (#485)
Anthropic cache_control breakpoint auto-injection (#460)
Multimodal embedding support for Gemini Embedding 2 (#492)
Shared cache configuration and operational metrics (#468)
30 new models added to model-metadata.yaml (#472)
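Consistent Hash with Bounded Loads can be sketched in a few dozen lines. Virtual nodes are hashed onto a ring, the request key (for prefix-aware routing, a prompt prefix) is hashed, and the walk proceeds clockwise to the first backend whose in-flight count is below `ceil(c * (total + 1) / n)`. This is the textbook CHWBL algorithm, not the router's `PrefixAwareHash` code. The balance factor, virtual-node count, and use of `DefaultHasher` are illustrative choices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

struct Chwbl {
    ring: Vec<(u64, usize)>, // (point on the ring, backend index), sorted by point
    loads: Vec<usize>,       // in-flight requests per backend
    balance_factor: f64,     // c >= 1.0: how far above average a backend may go
}

impl Chwbl {
    fn new(backends: usize, vnodes: usize, balance_factor: f64) -> Self {
        let mut ring = Vec::with_capacity(backends * vnodes);
        for b in 0..backends {
            for v in 0..vnodes {
                ring.push((hash_of(&(b, v)), b));
            }
        }
        ring.sort_unstable();
        Chwbl { ring, loads: vec![0; backends], balance_factor }
    }

    /// Pick a backend for `key` and record one in-flight request on it.
    fn pick(&mut self, key: &str) -> usize {
        let total: usize = self.loads.iter().sum();
        let n = self.loads.len() as f64;
        // Bounded load: no backend may exceed ceil(c * (total + 1) / n).
        let cap = (self.balance_factor * (total as f64 + 1.0) / n).ceil() as usize;
        let h = hash_of(key);
        let start = self.ring.partition_point(|&(point, _)| point < h);
        let len = self.ring.len();
        for i in 0..len {
            let (_, b) = self.ring[(start + i) % len];
            if self.loads[b] < cap {
                self.loads[b] += 1;
                return b;
            }
        }
        unreachable!("with c >= 1 at least one backend is below the cap")
    }

    fn release(&mut self, backend: usize) {
        self.loads[backend] -= 1;
    }
}
```

Under light load the same key keeps landing on the same backend, which is what makes prefix caching effective. Once that backend reaches the cap, the walk spills over to the next backend on the ring instead of overloading it.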

Changed

Rename VAST-specific identifiers to generic S3/external storage names (#490) — update configuration files if using VAST-specific field names

Fixed

Make RequestExecutor transport-aware for Unix socket paths with spaces (#488)
Replace relative source tree links with GitHub URLs in docs

CI

Bump docker/setup-qemu-action from 3 to 4 (#428)
Bump docker/metadata-action from 5 to 6 (#426)
Bump docker/setup-buildx-action from 3 to 4 (#429)
Bump docker/build-push-action from 6 to 7 (#430)
Bump docker/login-action from 3 to 4 (#427)

Documentation

Comprehensive KV cache feature documentation, benchmarks, and config examples (#477)
VAST Data connection guide and integration examples (#486)
Sync Korean documentation with English counterparts
Split monolithic configuration.md into 6 smaller files

v1.3.0 - 2026-03-12

Added

Agent Communication Protocol (ACP) support with JSON-RPC 2.0 protocol layer and stdio transport (#414, #420)
ACP session management with protocol lifecycle, initialize/shutdown handshake (#415, #421)
ACP-to-LLM inference pipeline with streaming support (#416, #422)
ACP tool call reporting and permission delegation (#417, #423)
MCP-over-ACP bridge for MCP server tunneling (#418, #424)
ACP agent registry with metadata and configuration support (#419, #425)
ACP integration tests for protocol lifecycle and session management

Fixed

Resolve clippy field_reassign_with_default warnings in ACP integration tests

CI

Bump actions/upload-artifact from 6 to 7 (#398)

Documentation

ACP architecture documentation with MkDocs integration
ACP practical usage guide with IDE integration examples
KV cache integration plan for router-level caching strategies

v1.2.1 - 2026-03-07

Added

MLxcel backend type support for MLX-based model serving (#412, #413) — fully API-compatible with llama-server, reusing the same backend implementation for health checks, model discovery, and proxying

v1.2.0 - 2026-03-06

Added

Admin Statistics API with comprehensive request-level statistics collection and reporting (#409)
Endpoints: GET /admin/stats, GET /admin/stats/models, GET /admin/stats/backends, POST /admin/stats/reset
Time-windowed queries, token usage tracking, latency percentiles (p50, p95, p99)
Statistics persistence with configurable snapshot path, interval, and staleness checks (#410, #411)
Atomic writes, restore on startup, final snapshot on graceful shutdown
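Latency percentiles such as p50, p95, and p99 can be computed from recorded samples with the nearest-rank method. This is one common definition, assumed here; the entry does not say which method `/admin/stats` uses, and a production collector would more likely use a histogram or streaming sketch than sort raw samples on every query.

```rust
/// Nearest-rank percentile of `samples` (e.g. latencies in ms), with `p` in [0, 100].
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None; // no data, or p out of range (including NaN)
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // 1-based rank = ceil(p * n / 100), clamped to at least 1 for p = 0.
    let rank = (p * sorted.len() as f64 / 100.0).ceil().max(1.0) as usize;
    Some(sorted[rank - 1])
}
```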

Documentation

Add admin stats and persistence to configuration guide
Add post-refactoring benchmark report for v1.1.0 (#407)

v1.1.1 - 2026-03-04

Added

Embeddable library crate (Phase 1) — use continuum-router as a Rust dependency (#394)
Type-safe config builders for programmatic library usage (#400)
Cargo feature flags for optional library dependencies (#399)
Persistent storage for runtime API keys (#405)
New LLM model metadata entries (#403)

Fixed

Fix Gemini-specific transforms incorrectly applied in Anthropic handler (#404)

v1.1.0 - 2026-03-01

Added

Embedded WebUI for configuration management and API key administration (#388)
Windows AF_UNIX socket support via socket2 crate (#390)
Nano Banana 2 (Gemini Image Generation) support

Fixed

Resolve compilation error in ClientAddr::is_unix for tuple variant matching
Resolve Windows AF_UNIX socket accept failure and config validation
Accept Windows absolute paths in Unix socket config validation (#393)
Resolve Windows compilation errors in Unix socket tests and transport parsing (#392)

v1.0.0 - 2026-02-19

Added

Continuum Router federation — router-to-router chaining as a new backend type (#385)
LM Studio as a dedicated backend type (#381)
Anthropic adaptive thinking effort parameter (output_config.effort) (#384)
Adaptive thinking and auto reasoning effort level across backends (#378)
Cohere/Jina-compatible rerank and sparse embedding endpoints (#374)
BGE-M3 and multilingual embedding model support (#373)
Claude Opus 4.6 model metadata
Qwen3-Coder-Next, Qwen3-VL-30B/8B model metadata

Changed

Handle SIGTERM for graceful shutdown on Unix systems (#370)
Reduce per-backend filter and model metadata log verbosity during model refresh (#371, #375)

CI

Replace Ubuntu 24.10 with 25.10 in deb build matrix (#376)

v0.36.1 - 2026-01-30

Fixed

Trigger immediate health check after sync_backends during hot reload (#368) — new backends now available within 1-2 seconds instead of up to 30 seconds
Sync healthcheckinfo and use URL-based updates during hot reload (#369) — new backends properly receive API key authentication
Accelerate health checks for recently added backends — 1-second check interval for 5 minutes after addition
Trigger model cache refresh when backends transition to healthy state with 5-second debounce
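The hot-reload timing rules can be sketched as two small pieces of logic. The 1-second interval, 5-minute window, and 5-second debounce come from the entries above. The 30-second default interval is inferred from "up to 30 seconds", and suppressing transitions within 5 seconds of the last refresh (leading edge) is only one possible reading of the debounce; the router's exact semantics may differ.

```rust
use std::time::Duration;

const FAST_INTERVAL: Duration = Duration::from_secs(1);
const FAST_WINDOW: Duration = Duration::from_secs(5 * 60);
// Assumption: the normal interval is 30 s ("up to 30 seconds" above).
const DEFAULT_INTERVAL: Duration = Duration::from_secs(30);
const REFRESH_DEBOUNCE: Duration = Duration::from_secs(5);

/// Interval before the next health check for a backend added `age` ago.
fn next_check_interval(age: Duration) -> Duration {
    if age < FAST_WINDOW { FAST_INTERVAL } else { DEFAULT_INTERVAL }
}

/// Decides whether an unhealthy -> healthy transition should refresh the model
/// cache. `now` is a monotonic timestamp, e.g. time elapsed since startup.
struct RefreshDebouncer {
    last_refresh: Option<Duration>,
}

impl RefreshDebouncer {
    fn should_refresh(&mut self, was_healthy: bool, is_healthy: bool, now: Duration) -> bool {
        if was_healthy || !is_healthy {
            return false; // only the unhealthy -> healthy edge matters
        }
        match self.last_refresh {
            Some(prev) if now.saturating_sub(prev) < REFRESH_DEBOUNCE => false,
            _ => {
                self.last_refresh = Some(now);
                true
            }
        }
    }
}
```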

v0.36.0 - 2026-01-27

Added

Native Anthropic Messages API handler with endpoint routing (#355)
Anthropic to OpenAI request/response transformation (#356, #357)
Anthropic streaming response format (#358)
Direct Anthropic to Gemini request/response transformation (#359)
file_id source type and file resolution for Anthropic input (#360)
Claude Code compatibility for Anthropic handler (#365)
Tiered token counting for all backend types
Parallel file reference resolution for improved performance
Anthropic-version header format validation

Fixed

Require HTTPS for image and document URLs to prevent SSRF
Return generic error messages to clients instead of backend details
Use authenticated user_id from API key for file ownership checks
Use UUID v4 for secure message/tool ID generation
Place tool messages before user text in Anthropic-to-OpenAI conversion
Override stop_reason to tool_use when tool_use blocks are present
Apply max_completion_tokens conversion for OpenAI-routed Anthropic requests
Propagate file access denied and not found errors to client
Call current_config() once per request for consistent behavior

Refactored

Extract common SSE event type and data extraction logic
Add parse_bytes method to SseParser for proper UTF-8 handling
Remove unnecessary Arc wrapper in AnthropicFileResolver
Box FileResolutionResult::Resolved to reduce enum size

v0.35.0 - 2026-01-23¶

Added¶

Gemini 3 thoughtSignature support in function calling (#354)
PDF support for OpenAI and Anthropic file transformers (#340)
Text/plain support for AnthropicFileTransformer (#342)

Fixed¶

Add PDF support to DefaultTransformer and file resolution (#343)
Add tool message transformation to non-streaming Anthropic requests (#344)
Reject non-image files in DefaultTransformer with clear error message (#338)
Fix AI SDK incompatibility with Responses API streaming format (#335)

v0.34.0 - 2026-01-16¶

Added¶

Automatic quality parameter conversion between DALL-E and GPT Image models (#330)
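
A possible shape for the quality conversion is sketched below. DALL-E 3 accepts `standard` and `hd`, while GPT Image models accept `low`, `medium`, `high`, and `auto`; the exact cross-mapping (for example `hd` to `high`, `auto` to `standard`) and the `gpt-image` prefix check are assumptions, and the function names are hypothetical.

```rust
/// Hypothetical family check based on the model id prefix.
fn is_gpt_image(model: &str) -> bool {
    model.starts_with("gpt-image")
}

/// Translates a requested `quality` into a value the target model family accepts.
/// Returns None for values neither family understands.
fn convert_quality(model: &str, quality: &str) -> Option<&'static str> {
    if is_gpt_image(model) {
        // GPT Image: low / medium / high / auto.
        match quality {
            "hd" | "high" => Some("high"),
            "standard" | "medium" => Some("medium"),
            "low" => Some("low"),
            "auto" => Some("auto"),
            _ => None,
        }
    } else {
        // DALL-E: standard / hd.
        match quality {
            "hd" | "high" => Some("hd"),
            "standard" | "medium" | "low" | "auto" => Some("standard"),
            _ => None,
        }
    }
}
```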

Changed¶

Native Anthropic conversion for Responses API PDF file uploads (#332)

Fixed¶

Gemini streaming tool_calls compatibility fixes (#333): missing index field, tool_choice format preservation, removal of unnecessary transformation

v0.33.0 - 2026-01-13¶

Added¶

/v1/embeddings endpoint for embedding API support (#319)
Resolve local file_id references in Responses API requests (#326)
user_data and evals purpose values for Files API (#322)

Fixed¶

Use flat tool format for Responses API function tools (#324)
Improve Unix socket test stability for parallel execution (#328)

v0.32.0 - 2026-01-09¶

Added¶

Reasoning effort documentation and improved xhigh fallback logging (#317)

Fixed¶

Support implicit message type inference in Responses API InputItem (#316)

Refactored¶

Optimize InputItem deserializer and add invalid role test

v0.31.5 - 2026-01-09¶

Added¶

Responses API pass-through support for native OpenAI backends (#313) — smart routing based on backend type with direct forwarding to /v1/responses endpoint
OpenAI Responses API file input types (#311) — support for input_text, input_file, input_image content parts with SSRF validation

Fixed¶

Forward raw backend error responses in pass-through mode
Address security and performance issues in Responses API pass-through

v0.31.4 - 2026-01-07¶

Fixed¶

Use current_config() for hot reload support in proxy handlers (#310) — API key and configuration changes via hot reload now properly apply to new requests

v0.31.3 - 2026-01-06¶

Fixed¶

Add Anthropic transformations to Unix socket transport (#308) — Unix socket transport now applies the same request/response transformations as HTTP transport
Preserve stream parameter for non-streaming Anthropic requests (#306)

v0.31.2 - 2026-01-05¶

Added¶

Non-streaming support for Anthropic backend requests
Tool call and tool result transformation for Anthropic backend — enables multi-turn tool use conversations

v0.31.1 - 2026-01-04¶

Fixed¶

Non-streaming Anthropic requests failing with wrong authentication header (#301) — now correctly uses x-api-key header instead of Authorization: Bearer
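
The header difference behind this fix can be sketched as below. Anthropic's Messages API authenticates with an `x-api-key` header and expects an `anthropic-version` header (pinned here to `2023-06-01`), while OpenAI-compatible backends use `Authorization: Bearer`. `BackendKind` and `auth_headers` are hypothetical names, not the router's API.

```rust
#[derive(Clone, Copy)]
enum BackendKind {
    OpenAi,
    Anthropic,
}

/// Authentication headers for a backend request. The Anthropic branch also
/// sets the API version header that the Messages API expects.
fn auth_headers(kind: BackendKind, api_key: &str) -> Vec<(&'static str, String)> {
    match kind {
        BackendKind::OpenAi => vec![("authorization", format!("Bearer {api_key}"))],
        BackendKind::Anthropic => vec![
            ("x-api-key", api_key.to_string()),
            ("anthropic-version", "2023-06-01".to_string()),
        ],
    }
}
```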

v0.31.0 - 2026-01-04¶

Added¶

Unix socket server binding alongside TCP (#298) — supports unix: URI scheme, socket_mode configuration, auto-cleanup
Reasoning parameter support for Responses API (#296) with nested format and low/medium/high/xhigh effort levels
xhigh reasoning effort support for GPT-5.2 thinking models with auto-downgrade for unsupported models
Configurable health check endpoints per backend type (#293) — custom endpoint, fallback endpoints, method, body, accept_status, and headers
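
The `xhigh` auto-downgrade can be sketched as a small clamp. The capability rule (only model ids starting with `gpt-5.2` accept `xhigh`) is a hypothetical stand-in for whatever metadata the router actually consults, and the enum and function names are illustrative.

```rust
/// Reasoning effort levels named in the changelog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effort {
    Low,
    Medium,
    High,
    XHigh,
}

fn parse_effort(s: &str) -> Option<Effort> {
    match s {
        "low" => Some(Effort::Low),
        "medium" => Some(Effort::Medium),
        "high" => Some(Effort::High),
        "xhigh" => Some(Effort::XHigh),
        _ => None,
    }
}

/// Hypothetical capability check; a real router would consult model metadata.
fn supports_xhigh(model: &str) -> bool {
    model.starts_with("gpt-5.2")
}

/// Downgrades `xhigh` to `high` for models that do not accept it.
fn effective_effort(model: &str, requested: Effort) -> Effort {
    if requested == Effort::XHigh && !supports_xhigh(model) {
        Effort::High
    } else {
        requested
    }
}
```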

Changed¶

Comprehensive reasoning parameter normalization across backends (#294)

v0.30.0 - 2026-01-01¶

Added¶

Wildcard patterns and date suffix handling in model aliases (#286) — automatic date suffix normalization, * pattern matching (prefix, suffix, infix), zero-config date handling
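
Date-suffix normalization and single-`*` alias matching might look like the sketch below. It assumes suffixes take the form `-YYYYMMDD` or `-YYYY-MM-DD` and that a pattern contains at most one `*`; the function names are hypothetical.

```rust
/// Strips a trailing release-date suffix in either `-YYYYMMDD` or
/// `-YYYY-MM-DD` form; returns the input unchanged otherwise.
fn strip_date_suffix(model: &str) -> &str {
    if let Some(idx) = model.rfind('-') {
        let tail = &model[idx + 1..];
        if tail.len() == 8 && tail.bytes().all(|b| b.is_ascii_digit()) {
            return &model[..idx];
        }
    }
    let b = model.as_bytes();
    if b.len() > 11 {
        let t = &b[b.len() - 11..];
        let digits_ok = [1usize, 2, 3, 4, 6, 7, 9, 10].iter().all(|&i| t[i].is_ascii_digit());
        if t[0] == b'-' && t[5] == b'-' && t[8] == b'-' && digits_ok {
            // Safe to slice: the byte at this index is an ASCII '-'.
            return &model[..b.len() - 11];
        }
    }
    model
}

/// Matches a pattern with at most one `*` (prefix, suffix, or infix).
fn wildcard_match(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == name,
        Some((prefix, suffix)) => {
            // The length check stops prefix and suffix from overlapping.
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
    }
}

/// An alias matches the requested id as-is or with its date suffix removed.
fn alias_matches(pattern: &str, requested: &str) -> bool {
    wildcard_match(pattern, requested) || wildcard_match(pattern, strip_date_suffix(requested))
}
```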

Fixed¶

Apply default URL for Anthropic backend when not specified (#288)
Replace owned_by placeholders with backend-type-specific values (#287)

Documentation¶

Translate wildcard pattern and date suffix handling documentation to Korean (#289)

v0.29.0 - 2026-01-01¶

Added¶

Accelerated health checks during backend warmup (#282) — 1s interval on HTTP 503, configurable via warmup_check_interval and max_warmup_duration
--model-metadata CLI option for specifying model metadata file path at runtime (#281)

Fixed¶

Replace OpenAI owned_by placeholder with 'openai' (#280)
Prevent race condition in Admin API concurrent backend creation (#278)
Fix missing processing steps in hot reload (#277)
Cloud backends now show available: true in /v1/models/{model_id} (#272)

v0.28.0 - 2025-12-31¶

Added¶

SSE streaming support for tool calls (#258)
llama.cpp tool calling auto-detection via /props endpoint (#263)
Extended /v1/models/{model_id} endpoint with rich metadata fields (#262)
Tool result message transformation for multi-turn conversations (#265)
Backend-specific owned_by placeholders for llamacpp, vllm, ollama, http (#267)

Changed¶

Improved --help output formatting with title header and project attribution (#269)

Fixed¶

Sync model metadata cache with ConfigManager (#270)

v0.27.0 - 2025-12-29¶

Added¶

Complete Unix socket support for model discovery and SSE streaming (#248, #252, #253, #254, #256)
SSE/streaming for Unix socket backends
Backend type auto-detection for Unix sockets
vLLM and llama.cpp model discovery via Unix sockets
Tool call transformation across all backends (#244, #245, #246) — tool definitions, tool_choice, and tool call responses for Anthropic, Gemini, and llama.cpp

v0.26.0 - 2025-12-27¶

Added¶

GET /v1/models/{model} endpoint for single model retrieval with real-time availability status (#236)

v0.25.0 - 2025-12-26¶

Added¶

CORS (Cross-Origin Resource Sharing) support (#234) — configurable origins, wildcard patterns, custom schemes (e.g., tauri://localhost), preflight cache
Unix Domain Socket backend support (#232) — unix:///path/to/socket scheme, lower latency than localhost TCP
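
Origin matching with wildcard patterns and custom schemes might look like the sketch below. The pattern syntax (`*` for any origin, `scheme://*.domain` for any subdomain, anything else compared exactly) is an assumption for illustration; the sketch does not handle ports in wildcard patterns or case normalization.

```rust
/// Returns true if `origin` is permitted by any configured pattern.
fn origin_allowed(allowed: &[&str], origin: &str) -> bool {
    allowed.iter().any(|pattern| {
        if *pattern == "*" {
            return true;
        }
        match pattern.split_once("://*.") {
            // "https://*.example.com" matches "https://api.example.com"
            // but neither "https://example.com" nor another scheme.
            Some((scheme, domain)) => match origin
                .strip_prefix(scheme)
                .and_then(|rest| rest.strip_prefix("://"))
            {
                Some(host) => host.ends_with(&format!(".{domain}")),
                None => false,
            },
            // Exact match, which also covers custom schemes like tauri://localhost.
            None => *pattern == origin,
        }
    })
}
```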

v0.24.0 - 2025-12-26¶

Added¶

llama.cpp backend support for local LLM inference (#230)
Allow router to start without any backends configured (#226)

Changed¶

Enable hot reload for backend additions/removals from config (#229)

v0.23.1 - 2025-12-25¶

CI¶

Add Windows x86_64 build target to release workflow (#224)

v0.23.0 - 2025-12-23¶

Added¶

GLM 4.7 model support with thinking capabilities (#222)
GCP Service Account authentication support for Gemini (#208)
Distributed tracing with correlation ID propagation (#207) — W3C Trace Context with traceparent header
Thinking pattern metadata for models with implicit start tags (#218)
Model metadata for NVIDIA Nemotron 3 Nano, Qwen Image Layered, and Kakao Kanana-2 (#202)
ASCII diagram to image replacement system for MkDocs (#200)
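
A W3C Trace Context `traceparent` header has the form `version-traceid-parentid-flags`: version `00`, a 32-digit lowercase hex trace id, a 16-digit lowercase hex parent id, and 2 hex flag digits, with all-zero ids invalid. The sketch below parses it and builds the header to forward upstream. It accepts only version `00` (a simplification), the names are hypothetical, and the caller is assumed to supply a fresh span id.

```rust
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: String,
    parent_id: String,
    flags: u8,
}

/// True if `s` is exactly `len` lowercase hex digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Parses a version-00 `traceparent` header; returns None if it is invalid.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 {
        return None;
    }
    let (version, trace_id, parent_id, flags) = (parts[0], parts[1], parts[2], parts[3]);
    if version != "00"
        || !is_lower_hex(trace_id, 32)
        || !is_lower_hex(parent_id, 16)
        || !is_lower_hex(flags, 2)
    {
        return None;
    }
    // All-zero trace and parent ids are explicitly invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        flags: u8::from_str_radix(flags, 16).ok()?,
    })
}

impl TraceParent {
    /// Header for the upstream request: same trace id, with this hop's span
    /// id (16 lowercase hex digits, supplied by the caller) as the parent id.
    fn child_header(&self, span_id: &str) -> String {
        format!("00-{}-{}-{:02x}", self.trace_id, span_id, self.flags)
    }
}
```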

Fixed¶

Prevent cache stampede with singleflight, stale-while-revalidate, and background refresh (#220)
Apply global_prompts changes via hot reload (#219)
Invalidate model cache when backend config changes (#206)
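
The stale-while-revalidate decision behind the stampede fix can be sketched as below. It is a simplified illustration with hypothetical names and TTLs: only the background refresh is single-flighted here (an atomic flag ensures one refresher at a time), whereas a full implementation would also coalesce concurrent fetches on a cold miss.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum CacheDecision {
    /// Entry is within its fresh TTL.
    ServeFresh,
    /// Entry is stale; this caller won the race and must start the refresh.
    ServeStaleAndRefresh,
    /// Entry is stale; another caller is already refreshing it.
    ServeStale,
    /// No usable entry; the caller has to fetch synchronously.
    FetchNow,
}

struct ModelCache {
    fresh_ttl: Duration,
    stale_ttl: Duration,
    refreshing: AtomicBool,
}

impl ModelCache {
    fn new(fresh_ttl: Duration, stale_ttl: Duration) -> Self {
        Self { fresh_ttl, stale_ttl, refreshing: AtomicBool::new(false) }
    }

    /// `age` is the time since the entry was stored, or None on a miss.
    fn decide(&self, age: Option<Duration>) -> CacheDecision {
        match age {
            Some(a) if a <= self.fresh_ttl => CacheDecision::ServeFresh,
            Some(a) if a <= self.fresh_ttl + self.stale_ttl => {
                // Only the caller that flips `refreshing` from false to true
                // launches the background refresh.
                let won = self
                    .refreshing
                    .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
                    .is_ok();
                if won {
                    CacheDecision::ServeStaleAndRefresh
                } else {
                    CacheDecision::ServeStale
                }
            }
            _ => CacheDecision::FetchNow,
        }
    }

    /// Called by the refresh task when it completes (successfully or not).
    fn refresh_finished(&self) {
        self.refreshing.store(false, Ordering::Release);
    }
}
```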

CI¶

Skip Rust tests in CI when only non-code files change (#204)
Bump actions/github-script from 7 to 8 (#210)
Bump apple-actions/import-codesign-certs from 3 to 6 (#212)
Bump actions/cache from 4 to 5 (#211)
Bump actions/checkout from 4 to 6 (#209)

v0.22.0 - 2025-12-19¶

Added¶

Docker support with pre-built binary images — Debian (~50MB) and Alpine (~10MB) with multi-arch support (#198)
Container health check CLI (--health-check) for orchestration (#198)
Docker Compose quick start configuration
Automated Docker image publishing to ghcr.io in release workflow
MkDocs documentation website with Material theme (#183)
Korean documentation translation (i18n) — complete localization of all 20 documentation files (#190)
Security policy with vulnerability reporting process (#191)
Dependency security auditing with cargo-deny and Dependabot (#192)

Changed¶

Integrate orphaned architecture documentation into MkDocs site (#186)
Rename documentation files to lowercase kebab-case for URL-friendly filenames

Fixed¶

Fix health check response validation logic bug (operator precedence)
Fix address parsing fallback silently hiding configuration errors
Fix IPv6 address formatting in health check

v0.21.0 - 2025-12-19¶

Added¶

Gemini 3 Flash Preview model support (#168)
Default authentication mode for API endpoints (#173) — permissive (default) or blocking mode
Backend error passthrough for 4xx responses (#177) — parse and forward original error messages from OpenAI, Anthropic, and Gemini

Fixed¶

Handle UTF-8 multi-byte character corruption in streaming responses (#179)
Strip response_format parameter for GPT Image models (#176)
Allow auto-discovery for all backends except Anthropic (#172)
Always return b64_json field for Gemini image generation responses (#181)
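
Multi-byte corruption in streams typically happens when a network chunk ends in the middle of a UTF-8 sequence and each chunk is decoded on its own. One way to avoid it is sketched below using only `std::str::from_utf8` and its `Utf8Error` details: an incomplete trailing sequence is held back for the next chunk, and genuinely invalid bytes become U+FFFD. `Utf8ChunkDecoder` is a hypothetical name, not the router's type.

```rust
struct Utf8ChunkDecoder {
    pending: Vec<u8>,
}

impl Utf8ChunkDecoder {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Appends a network chunk and returns all text that is complete so far.
    fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        let mut out = String::new();
        loop {
            let err = match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    None
                }
                Err(e) => Some(e),
            };
            let Some(e) = err else {
                self.pending.clear();
                return out;
            };
            let valid = e.valid_up_to();
            out.push_str(std::str::from_utf8(&self.pending[..valid]).expect("validated prefix"));
            match e.error_len() {
                // Truncated sequence at the end: keep it until more bytes arrive.
                None => {
                    self.pending.drain(..valid);
                    return out;
                }
                // Invalid bytes: replace them and keep decoding the rest.
                Some(bad) => {
                    out.push('\u{FFFD}');
                    self.pending.drain(..valid + bad);
                }
            }
        }
    }
}
```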

v0.20.0 - 2025-12-18¶

Added¶

Image variations support for Gemini (nano-banana) models (#165)
Image edit support for Gemini (nano-banana) models (#164)
Enhanced /v1/images/generations with streaming and GPT Image features (#161)
gpt-image-1.5 model support (#159)
/v1/images/variations endpoint (#155)
/v1/images/edits endpoint for image editing and inpainting (#156)
External Markdown file support for system prompts with REST API management (#146)
Automatic model discovery for backends without explicit model list (#142)
Solar Open 100B model

Security¶

API key redaction to prevent credential exposure in logs and error messages (#150)

Changed¶

Optimized release binary size from 20MB to 6MB (70% reduction) (#144)

Refactored¶

Split large files to keep each under 500 lines (#147, #148)

v0.19.0 - 2025-12-13¶

Added¶

Runtime Configuration Management API (#139)
Configuration query, modification, save/restore, and backend management APIs
Sensitive information masking, JSON Schema generation, configuration history with rollback (up to 50 entries)
Comprehensive Admin REST API reference documentation
33 integration tests for configuration API endpoints

Security¶

Input validation with 1MB content limit and 32-level nesting depth
Audit logging for sensitive data exports with 30+ sensitive field patterns
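
The size and nesting limits can be enforced cheaply before full JSON parsing, as in the sketch below. It assumes "1MB" means 1,048,576 bytes, and it checks only size and bracket depth (ignoring brackets inside strings), not JSON validity; `validate_json_shape` is a hypothetical name.

```rust
const MAX_BODY_BYTES: usize = 1024 * 1024;
const MAX_DEPTH: usize = 32;

/// Rejects bodies that are too large or nest objects/arrays too deeply.
fn validate_json_shape(body: &[u8]) -> Result<(), String> {
    if body.len() > MAX_BODY_BYTES {
        return Err(format!("body exceeds {MAX_BODY_BYTES} bytes"));
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for &b in body {
        if in_string {
            // Brackets inside string literals do not count toward depth.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => {
                depth += 1;
                if depth > MAX_DEPTH {
                    return Err(format!("nesting deeper than {MAX_DEPTH} levels"));
                }
            }
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```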

v0.18.0 - 2025-12-13¶

Added¶

Per-API-key rate limiting (#137)
API key management and configuration system
Files API authentication and authorization (#131)
Hot reload for runtime configuration updates (#130)

Fixed¶

Add ConnectInfo extension for admin/metrics/files endpoints
Address security vulnerabilities in API key management

Refactored¶

Extract CLI and app utilities into modular structure (#132)
Split converter.rs into modular structure (#132)
Split large source files into modular components

v0.17.0 - 2025-12-12¶

Added¶

Anthropic backend file content transformation (#126)
Gemini backend file content transformation (#127)

Fixed¶

Streaming file uploads to prevent memory exhaustion (#128)

v0.16.0 - 2025-12-12¶

Added¶

OpenAI-compatible Files API endpoints (#111)
File resolution middleware for chat completions (#120)
OpenAI backend file handling strategy (#121, #122)
Persistent metadata storage for Files API (#125)
GPT-5.2 model support (#124)
Circuit breaker pattern for automatic backend failover
Admin endpoint authentication and audit logging
Configurable fallback models for unavailable model scenarios with cross-provider support
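
A per-backend circuit breaker for automatic failover is usually a three-state machine, sketched below. The threshold and cooldown are caller-supplied, the names are hypothetical, and the half-open state lets exactly one probe through (a probe that never reports would keep the breaker half-open, which a production version would need to time out).

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum BreakerState {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: BreakerState,
    consecutive_failures: u32,
    failure_threshold: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { state: BreakerState::Closed, consecutive_failures: 0, failure_threshold, cooldown }
    }

    /// Whether a request may be sent to the backend at `now`.
    fn allow_request(&mut self, now: Instant) -> bool {
        match self.state {
            BreakerState::Closed => true,
            // A probe is already in flight; route elsewhere until it reports.
            BreakerState::HalfOpen => false,
            BreakerState::Open { since } => {
                if now.saturating_duration_since(since) >= self.cooldown {
                    self.state = BreakerState::HalfOpen; // let one probe through
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = BreakerState::Closed;
    }

    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.state == BreakerState::HalfOpen || self.consecutive_failures >= self.failure_threshold {
            self.state = BreakerState::Open { since: now };
        }
    }
}
```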

Fixed¶

Sanitize fallback error headers and metric labels
Use index-based lookup for fallback chain traversal
Reduce lock contention in FallbackService with snapshot pattern

v0.15.0 - 2025-12-05¶

Added¶

Nano Banana (Gemini Image Generation) API support (#102)
Split /v1/models endpoint — standard lightweight vs extended metadata response (#101)

Changed¶

Optimize LRU cache to use read lock for cache lookups (#105)

Fixed¶

Replace .expect() panics with proper error propagation in HttpClientFactory (#104)

Refactored¶

Extract streaming handler logic to dedicated StreamService (#106)
Eliminate retry logic code duplication in proxy.rs (#103)

v0.14.2 - 2025-12-05¶

Added¶

Log token usage (input/output tokens) on request completion (#92)

v0.14.1 - 2025-12-05¶

Fixed¶

Optimize Anthropic backend TTFT with connection pooling and HTTP/2 (#90)
Optimize Gemini backend TTFT with connection pooling and HTTP/2 (#88)
Apply base name fallback matching to aliases in model metadata lookup (#84)

v0.14.0 - 2025-12-04¶

Added¶

Router-wide global system prompt injection (#82)

CI¶

Replace deprecated actions-rs/toolchain with dtolnay/rust-toolchain
Add RUSTFLAGS for macOS ARM64 ring build
Switch to rustls-tls for musl cross-compilation support

v0.13.0 - 2025-12-04¶

Added¶

OpenAI /v1/responses API support with session management (#49)
True SSE streaming for /v1/responses API
Background cleanup task for expired sessions
Override /v1/models response fields via model-metadata.yaml (#75)

Security¶

SecretString for API key storage across all backends (#76)
Session access control and input validation for Responses API

Changed¶

Immediate mode for SseParser for reduced first-response latency

Refactored¶

String allocation optimizations and error handling standardization

v0.12.0 - 2025-12-04¶

Fixed¶

Handle exact hash matches in consistent hash binary search (#72)
Replace panics with Option returns and implement stats aggregation (#71)
Remove hardcoded auth requirement from /v1/models endpoint
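
For the consistent-hash fix, the key detail is that `binary_search` returns `Ok(i)` on an exact hit and `Err(i)` with the insertion point otherwise; both must map to a ring position, with the past-the-end case wrapping to the first point. A sketch with hypothetical names follows. It uses `DefaultHasher`, whose algorithm is unspecified and may change between Rust releases, so a production ring would pin a specific hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

/// Builds a ring of (point hash, backend index) pairs with `vnodes`
/// virtual nodes per backend, sorted by point hash.
fn build_ring(backends: &[&str], vnodes: usize) -> Vec<(u64, usize)> {
    let mut ring: Vec<(u64, usize)> = backends
        .iter()
        .enumerate()
        .flat_map(|(i, name)| (0..vnodes).map(move |v| (hash_of(&(name, v)), i)))
        .collect();
    ring.sort_unstable();
    ring
}

/// Returns the backend owning hash `h`: the first point >= h, wrapping around.
fn lookup(ring: &[(u64, usize)], h: u64) -> Option<usize> {
    if ring.is_empty() {
        return None;
    }
    let idx = match ring.binary_search_by_key(&h, |&(point, _)| point) {
        Ok(i) => i,               // exact hit: that point owns h
        Err(i) => i % ring.len(), // insertion point; past the end wraps to 0
    };
    Some(ring[idx].1)
}
```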

Refactored¶

Reorganize OpenAI model metadata by family (#74)
Extract AnthropicStreamTransformer to dedicated module (#73)
Split backends mod.rs into separate modules (#69)
Extract embedded tests to separate files (#68)
Create HttpClientFactory for centralized HTTP client creation (#67)
Create UrlValidator module with SSRF prevention (#66)
Extract RequestExecutor to shared common module (#65)
Extract HeaderBuilder with auth strategies (#64)
Extract AtomicStatistics to shared common module

v0.11.0 - 2025-12-03¶

Added¶

Native Anthropic Claude API backend with extended thinking support
OpenAI to Claude reasoning parameter conversion
Flat reasoning_effort parameter for Anthropic
Claude 4, 4.1, 4.5 model metadata

Fixed¶

Improve health check and model fetching for Anthropic/Gemini backends
Accept-Encoding fixes for streaming — use identity header and disable compression

v0.10.0 - 2025-12-03¶

Added¶

Native Google Gemini API backend support
OpenAI Images API support for image generation
Authenticated health checks for OpenAI and API-key backends
Built-in OpenAI model metadata for /v1/models response
API key authentication for streaming requests
Configurable image generation timeout
response_format validation for image generation API

Fixed¶

Convert max_tokens to max_completion_tokens for newer OpenAI models
Correct URL construction for all API endpoints
Request body size limits to prevent DoS attacks
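
OpenAI's reasoning models reject `max_tokens` and require `max_completion_tokens` instead. The rename can be sketched as below; the model-prefix list is an assumption, and a `BTreeMap<String, String>` stands in for the JSON request body.

```rust
use std::collections::BTreeMap;

/// Hypothetical rule for models that require `max_completion_tokens`.
fn uses_max_completion_tokens(model: &str) -> bool {
    ["o1", "o3", "o4", "gpt-5"].iter().any(|p| model.starts_with(p))
}

/// Renames `max_tokens` for models that reject it. An explicit
/// `max_completion_tokens` already in the request wins.
fn normalize_token_limit(model: &str, params: &mut BTreeMap<String, String>) {
    if !uses_max_completion_tokens(model) {
        return;
    }
    if let Some(value) = params.remove("max_tokens") {
        params.entry("max_completion_tokens".to_string()).or_insert(value);
    }
}
```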

Security¶

Remove sensitive data from debug logs

Refactored¶

Unify request retry logic with RequestType enum

v0.9.0 - 2025-12-02¶

Added¶

Enhanced rate limiting with token bucket algorithm
Comprehensive Prometheus metrics and monitoring (#10)
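
A token bucket refills continuously at a fixed rate up to a capacity, and each request spends one token. The sketch below computes refill and spend in one `&mut self` call, so placing the bucket behind a single lock keeps refill and consumption atomic. The names and parameters are hypothetical, not the router's configuration keys.

```rust
use std::time::Instant;

struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now: Instant) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: now }
    }

    /// Refills for the elapsed time, then tries to take one token.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        // Capping at capacity keeps idle buckets from growing without bound.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now.max(self.last);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```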

Security¶

Prevent IP spoofing via X-Forwarded-For manipulation
Prevent header injection vulnerabilities
Eliminate race condition in token refill
Protect API keys with SHA-256 hashing
Prevent memory exhaustion via unbounded bucket growth
Comprehensive authentication for metrics endpoint
Cardinality limits and label sanitization to prevent metric explosion DoS
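
A common way to prevent `X-Forwarded-For` spoofing is sketched below. The header is consulted only when the TCP peer is a configured trusted proxy, and it is read right to left, since the rightmost entries were appended by infrastructure the router trusts while anything further left may be client-supplied. The function name and the trusted-proxy list are hypothetical.

```rust
use std::net::IpAddr;

/// Resolves the client address used for rate limiting.
fn client_ip(peer: IpAddr, xff: Option<&str>, trusted_proxies: &[IpAddr]) -> IpAddr {
    if !trusted_proxies.contains(&peer) {
        // Direct client: any X-Forwarded-For it sent is attacker-controlled.
        return peer;
    }
    let Some(header) = xff else {
        return peer;
    };
    for hop in header.rsplit(',') {
        match hop.trim().parse::<IpAddr>() {
            // Skip hops added by our own proxies.
            Ok(ip) if trusted_proxies.contains(&ip) => continue,
            Ok(ip) => return ip,
            // Malformed entry: stop rather than trust anything to its left.
            Err(_) => return peer,
        }
    }
    // Every hop was a trusted proxy; fall back to the peer address.
    peer
}
```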

Fixed¶

Implement singleton pattern for metrics to prevent memory leaks
Improve error handling to prevent panic conditions
Resolve environment variable race condition in config test
Fix integration test failures in metrics

v0.8.0 - 2025-09-09¶

Added¶

Model ID alias support for metadata sharing (#27)

Fixed¶

Return empty list instead of 503 when all backends are unhealthy (#28)

v0.7.1 - 2025-09-08¶

Fixed¶

Improve config path validation for home directory and executable paths (#26)

v0.7.0 - 2025-09-07¶

Added¶

Rich metadata support for /v1/models endpoint (#23, #25)
Enhanced configuration management (#9, #22)
Advanced load balancing strategies (Weighted, Least-Latency, Consistent-Hash) with enhanced error handling (#21)

Fixed¶

Use streaming timeout configuration from config.yaml instead of hardcoded 25s limit

v0.6.0 - 2025-09-03¶

Fixed¶

Use timeout configuration from config.yaml instead of hardcoded values (#19)

Documentation¶

Comprehensive timeout configuration and model documentation updates

v0.5.0 - 2025-09-02¶

Added¶

Optional retry configuration with sensible defaults
Comprehensive integration tests and performance optimizations
Complete service layer implementation
Middleware architecture and enhanced backend abstraction

Fixed¶

Handle streaming requests without model field gracefully
Resolve floating-point precision and timing issues in tests
Resolve test failures and deadlocks in object pool and SSE parser
Resolve initial health check race condition

Refactored¶

Split oversized modules into layered architecture
Extract complex types into type aliases for better readability

v0.4.0 - 2025-08-25¶

Added¶

Model-based routing with health monitoring

Fixed¶

Improve health check integration and SSE parsing

v0.3.0 - 2025-08-25¶

Added¶

SSE streaming support for real-time chat completions (#5)
Model aggregation from multiple endpoints (#4)

v0.2.0 - 2025-08-25¶

Added¶

Multiple backends support with round-robin load balancing (#1)

v0.1.0 - 2025-08-24¶

Added¶

Initial release with OpenAI-compatible endpoints and proxy functionality

Unreleased ¶