# Changelog
All notable changes to Continuum Router are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## Unreleased
### Added
- ChatGPT subscription / Codex backend authentication via OAuth device flow (#551, #592)
  - `continuum-router auth login --backend <name>` runs the OpenAI Codex three-step headless device-code flow: `POST /api/accounts/deviceauth/usercode` to mint a one-time `user_code`, `POST /api/accounts/deviceauth/token` polling, and a PKCE exchange at `/oauth/token`. Standards-compliant RFC 8628 device flow remains available for any future provider that implements it; the new `OpenAICodexDeviceFlowClient` is selected automatically for `provider: openai`.
  - Tokens are wrapped in `SecretString`, written to the configured `token_store` with mode `0600` on Unix using an `O_CREAT|O_EXCL` open + atomic rename; a random tempfile suffix prevents concurrent saves from colliding, and a partial write is unlinked on failure so secret material does not linger on disk.
  - Access-token expiry is parsed from the JWT `exp` claim (with a 1-hour fallback for non-JWT tokens) and clamped to a useful minimum so a degenerate `expires_in` from the provider cannot trigger a refresh storm.
  - Proactive refresh fires 60 s before expiry, single-flighted with a `tokio::sync::Mutex`. A `401` from the upstream backend triggers exactly one forced refresh and a single retry; the previous refresh token is preserved race-free when the provider omits `refresh_token` from a refresh response.
  - The strategy reports an `identity_fingerprint()` (backend name, `client_id`, `token_store`) so that hot-reload rebuilds the strategy when any of those rotate, instead of silently keeping the prior in-memory state.
  - The CLI strips C0/C1 control characters from `verification_uri_complete` and `user_code` before printing, so a hostile provider response cannot inject ANSI escapes that rewrite the terminal.
  - Every device-flow and runtime request to `auth.openai.com` / `chatgpt.com/backend-api/codex` carries `originator: codex_cli_rs` (configurable via `auth.oauth.originator`) and a `codex_cli_rs/<version>` User-Agent (configurable via `auth.oauth.user_agent`), matching the official Codex CLI so Cloudflare admits the traffic instead of returning a 403 JS challenge.
  - `auth.type: oauth` is accepted in YAML alongside the legacy `o_auth` snake_case rendering. `client_id` and `scope` default to the public Codex CLI values; only `token_store` is required for the ChatGPT-subscription case. A config sketch follows.
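  A minimal backend sketch for the ChatGPT-subscription case, assuming only the field names called out above; the exact nesting is an assumption — consult the shipped sample `config.yaml` for the authoritative layout:

  ```yaml
  backends:
    - name: chatgpt-codex                        # hypothetical backend name
      type: openai
      url: https://chatgpt.com/backend-api/codex
      auth:
        type: oauth                              # legacy `o_auth` spelling also accepted
        token_store: ~/.config/continuum-router/codex-token.json
        # client_id and scope default to the public Codex CLI values
  ```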
- Anthropic Messages and Chat Completions surfaces both transparently route to the ChatGPT Codex backend (#592)
  - Any backend whose `auth.type` is `oauth` and whose provider uses the Codex flow (currently `openai`) is forced through the Responses API for every request, regardless of per-model `responses_only` metadata. `chatgpt.com/backend-api/codex` exposes `/responses` only — no `/chat/completions` — so chat-shaped models (e.g. `gpt-5.5`, alias-mapped `claude-haiku-4-5`) and unknown model IDs all dispatch through `/v1/responses` → `…/backend-api/codex/responses`. Non-OAuth OpenAI backends continue to honor the per-model `responses_only` flag.
  - New `core::url_utils::compose_backend_url` centralizes backend URL composition for the three OpenAI-compatible roots (`/v1`, `/openai`, `/backend-api/codex`). Replaces ad-hoc `ends_with("/v1") || ends_with("/openai")` checks across `proxy/backend.rs`, `http/handlers/responses.rs`, `http/streaming/handler.rs`, `services/responses/stream_service.rs`, and the Anthropic handler so the `/backend-api/codex` rule applies uniformly; see the sketch after this list.
  - The proxy hot path (`proxy/backend.rs`, `proxy/responses_only.rs`, `proxy/image_gen.rs`, `proxy/image_edit.rs`) now flows through a backend-name-keyed `AuthStrategyRegistry` exposed on `AppState` via `src/proxy/oauth_helper.rs`. The helper looks up the strategy, calls `refresh_if_needed()` before sending, replaces the static-bearer header with one derived from the strategy, and force-refreshes + retries once on a 401. Static `api_key` auth continues to work unchanged when no strategy is registered.
  - The Anthropic-compatible handler (`src/http/handlers/anthropic/handler.rs`) consults the same registry. Client-supplied `Authorization: sk-ant-…` and `x-api-key` headers are dropped when the backend has an OAuth strategy, instead of being forwarded to OpenAI as the bearer.
  - Model fetcher detects OAuth-authed backends and falls back to the configured `models` list rather than probing `/v1/models`, since `chatgpt.com/backend-api/codex` does not expose a models endpoint.
- Codex-compatible Responses API extensions (#536, #537)
  - `POST /v1/responses/compact` endpoint for context compaction — passthrough to OpenAI / Azure OpenAI native `/v1/responses/compact`; other backend types return `501`.
  - `store` field on `ResponsesRequest` (defaults to `true`) controls upstream session persistence; Codex sends `store: false` for ephemeral requests.
  - `output_text` content part type alongside `input_text` so converters can differentiate assistant vs. user content in input items. All converters (OpenAI, Anthropic, Gemini) handle the new variant.
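How `compose_backend_url` resolves the `/responses` endpoint for the three recognized roots — an illustrative mapping only (hosts other than `chatgpt.com` are hypothetical):

```yaml
backends:
  - url: https://api.openai.com/v1                # -> .../v1/responses
  - url: https://example.openai.azure.com/openai  # -> .../openai/responses
  - url: https://chatgpt.com/backend-api/codex    # -> .../backend-api/codex/responses
```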
## v1.5.6 - 2026-04-29
### Fixed
- `/v1/chat/completions` returned HTTP 502 `responses_parse_failed` for `responses_only` reasoning models (gpt-5.4-pro, gpt-5.5-pro). OpenAI's `/v1/responses` payload for these models contains output items shaped like `{ "id": "rs_...", "type": "reasoning", "summary": [] }`, but `OutputItem::Reasoning` required `content` and `status`, so serde rejected the payload with `missing field 'content'`. The Anthropic Messages surface bypassed the strict variant on a different conversion path, masking the bug until directly tested. `content` and `status` are now optional on `OutputItem::Reasoning`; reasoning items are dropped before reaching Chat Completions clients (per existing project policy), so body shape is irrelevant beyond successful deserialization. (#594)
### Changed
- Realign `gemini-3.1-pro-preview` as the canonical metadata id for the Gemini 3.1 Pro family in `model-metadata.yaml`, with `gemini-3.1-pro` (and existing `-latest` / `-customtools` forms) demoted to aliases. Matches what `generativelanguage.googleapis.com` actually serves today — the canonical `gemini-3.1-pro` form returns 404 from upstream — and avoids implying GA availability that does not exist yet. The metadata cache still resolves both forms to the same entry. Note: alias-to-canonical rewriting on the upstream-bound payload is out of scope for this release; clients calling with the `gemini-3.1-pro` alias will still hit upstream 404 until that work lands. (#594)
- Sample `config.yaml` registers the newly-available pro / 5.5 family models so the `responses_only` dispatch path can be exercised end-to-end against real upstreams (`gpt-5.4-pro`, `gpt-5.2-pro`, `gpt-5.5`, `gpt-5.5-pro`, `claude-opus-4-7`, `gemini-3.1-pro`, `gemini-3.1-pro-preview`); duplicate `claude-haiku-4-5` entry removed.
## v1.5.5 - 2026-04-27
### Added
- Transparent Responses-API routing for OpenAI Pro models (epic #581)
  - New `responses_only: true` capability flag in `model-metadata.yaml` and the built-in OpenAI registry marks `gpt-5.2-pro`, `gpt-5.4-pro`, and `gpt-5.5-pro` as served only on `/v1/responses` upstream (#574, #582); a metadata sketch follows this list
  - `/v1/chat/completions` requests for `responses_only` models are dispatched to the upstream `/v1/responses` endpoint and translated back into a strict-mode `chat.completion` (or `chat.completion.chunk` for streaming) envelope, transparent to the client. Stream `usage` is gated by `stream_options.include_usage`, and per-model latency / success counters are recorded for the responses_only path (#578, #584)
  - `/anthropic/v1/messages` requests for `responses_only` models are converted to the Responses API shape, dispatched to `/v1/responses`, and translated back into Anthropic Messages JSON (or the Anthropic SSE event sequence for streaming) — tool-call round-trips, web-search emulation, and Unix-socket transports all branch on the flag (#575, #577, #583, #585, #586)
  - Anthropic Messages <-> Responses request transformer covers system → instructions, tools, `tool_choice` (including `disable_parallel_tool_use` → `parallel_tool_calls: false`), `max_tokens` → `max_output_tokens`, reasoning effort derivation, and multi-turn tool round-trips; the response transformer preserves thinking/text/tool_use ordering and stop-reason fidelity (#575, #583)
  - SSE streaming bridge (`AnthropicResponsesStreamTranslator`) maps Responses API events to Anthropic Messages events while preserving Anthropic's strict event-ordering invariants (single `message_start`, paired `content_block_start` / `content_block_stop`, terminal `message_stop`); handles mid-stream `error` / `response.failed` / `response.cancelled`, `response.incomplete` → `stop_reason: max_tokens`, deferred input tokens, and graceful early-close synthesis (#576, #585)
  - Only OpenAI and Azure OpenAI backends serve `/v1/responses`; pairing a `responses_only` model with another backend type produces a `400 invalid_request_error` before any upstream call (rejection fires on both `/v1/chat/completions` and `/anthropic/v1/messages` surfaces) (#577, #589)
  - The first dispatch per `(backend, model)` pair logs at `info` level so operators can confirm Responses-API routing without enabling debug logs
  - Anthropic Messages → Responses requests explicitly send `store: false` to avoid upstream side-effects (#589)
  - 22 deterministic, in-process integration tests covering the {Anthropic, Chat} × {gpt-5.4-pro, gpt-5.2-pro} × {non-streaming, streaming} × {plain, tool-call, reasoning} matrix, mid-stream backend-failure negatives on both surfaces, and an upstream byte-fragmentation regression guard (#579, #588)
  - Documented in `docs/en/configuration/advanced.md` (Responses-API-only Models section split into Models-marked-out-of-the-box, Marking-a-new-model, Dispatch-behavior, and Backend-type-constraint subsections), `docs/en/architecture.md` (Responses-API Routing data-flow diagram), and the `docs/en/api.md` Chat Completions and Anthropic Messages surface notes with a Transparent-Responses-API-routing subsection (#580, #587)
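A hypothetical `model-metadata.yaml` excerpt marking a model as Responses-API-only; the `responses_only` flag name comes from this entry, while the surrounding structure is an assumption:

```yaml
models:
  - id: gpt-5.4-pro
    responses_only: true   # /v1/chat/completions and /anthropic/v1/messages dispatch via /v1/responses
```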
### Fixed
- Chat Completions responses-only routing now rejects incompatible-only backend configs before upstream dispatch and chooses a compatible OpenAI/Azure Responses backend when available (#589)
- Chat assistant `tool_calls[]` are preserved as Responses `function_call` input items for stateless tool-result turns over `/v1/chat/completions` (#589)
## v1.5.4 - 2026-04-25
### Changed
- Refresh `model-metadata.yaml` for late-April 2026 frontier model releases (#572, #573)
  - Add GPT-5.5 ($5/$30 per 1M, 1M context, knowledge cutoff 2025-12, omnimodal, leads Terminal-Bench 2.0 at 82.7%) and GPT-5.5 Pro ($30/$180 per 1M, Responses API only, deep reasoning) — released 2026-04-23
  - Add DeepSeek V4 Pro (1.6T total / 49B active MoE, 1M context, 384K max output, three reasoning effort modes) and DeepSeek V4 Flash (284B total / 13B active MoE, 1M context, 384K max output) with `deepseek-chat` and `deepseek-reasoner` retained as deprecated aliases per official API docs — released 2026-04-24
  - Add `gpt-image-2` (token-billed instead of per-image: text $5/$30, image $8/$30 per 1M tokens; 1K/2K/4K resolution tiers; ~99% text accuracy in any language; built-in reasoning before generation; context-aware multi-turn editing; `gpt-image-2-latest` alias) — released 2026-04-21
  - Add Claude Opus 4.7 ($5/$25 per 1M, 1M context, 128K max output, knowledge cutoff 2026-01, high-resolution image support up to 2576px / 3.75MP, new tokenizer with ~1.0–1.35× token usage vs prior models, new `xhigh` effort level) — released 2026-04-16
- Promote Gemini 3.1 series from preview to GA, retaining the `-preview` suffix as alias for fallback compatibility (#573); a sketch follows this list
  - `gemini-3.1-pro-preview` → `gemini-3.1-pro` (with `gemini-3.1-pro-preview`, `gemini-3.1-pro-preview-customtools`, and `gemini-3.1-pro-latest` aliases)
  - `gemini-3.1-flash-image-preview` → `gemini-3.1-flash-image` (with `gemini-3.1-flash-image-preview`, `nano-banana-2`, and `gemini-3.1-flash-image-latest` aliases)
  - `gemini-3.1-flash-lite-preview` → `gemini-3.1-flash-lite` (with `gemini-3.1-flash-lite-preview` and `gemini-3.1-flash-lite-latest` aliases)
  - Updated `gemini-3-flash-preview` deprecation note to point to the new GA `gemini-3.1-pro` id
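A sketch of what one of the GA promotions might look like in `model-metadata.yaml` (alias names come from this entry; the field layout is an assumption):

```yaml
- id: gemini-3.1-flash-lite
  aliases:
    - gemini-3.1-flash-lite-preview
    - gemini-3.1-flash-lite-latest
```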
## v1.5.3 - 2026-04-23
### Added
- HuggingFace repo-prefix stripping as a new matching phase (phase 5) in `src/models/pattern_matching.rs` (#555)
  - `try_strip_hf_repo_prefix()` validates a `vendor/repo` (or `org/team/repo`) prefix against a `MAX_PREFIX_SEGMENTS = 3` bound, rejects empty segments (`/repo`, `vendor/`, `vendor//repo`), and rejects any ASCII whitespace before returning the residual
  - Phase 5 re-enters phases 1-4 on the stripped residual with a structurally-enforced recursion depth of exactly 1 (the re-entry call clears the `allow_prefix_strip` gate), so prefix stripping composes with the existing layered suffix peel in a single lookup — the motivating case `unsloth/Qwen3.6-35B-A3B-GGUF` now resolves to `qwen3.6-35b-a3b` without any hand-registered alias (walked through after this list)
  - Phase 5 runs before the wildcard phase; the blast-radius audit confirmed no `*`-bearing alias in `model-metadata.yaml` contains `/`, so the ordering change is behavior-neutral for existing routing
  - Phase numbering in tracing output realigned to match the documented phase chain (previous code emitted `phase = 7` for the namespace fallback while comments called it phase 6)
  - 12 new unit tests covering standard HF form, composition with suffix peel, case-sensitive vendor, registered-alias precedence, unresolvable residual, three-segment form, segment-cap rejection, no-slash input, whitespace rejection, empty segments, re-entry bounding, and alias-phase precedence
  - 9 new integration tests in `tests/format_suffix_normalization_test.rs` exercising the full `RouterConfig` / `BackendConfig` public API through phase 5
  - Pipeline doc updated in `docs/en/configuration/advanced.md` (and Korean counterpart) with a new "HuggingFace repo-prefix stripping (phase 5)" section covering the composition semantics, security bounds, and out-of-scope list (hyphen prefixes, HF API discovery)
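How the motivating lookup composes, as a step-by-step trace of the phases described above (intermediate forms are inferred from this entry):

```yaml
# unsloth/Qwen3.6-35B-A3B-GGUF
#   phase 5: try_strip_hf_repo_prefix() removes "unsloth/"  -> Qwen3.6-35B-A3B-GGUF
#   re-entry into phases 1-4 (depth 1, allow_prefix_strip cleared):
#     phase 4 case-insensitive suffix peel strips "-GGUF"   -> qwen3.6-35b-a3b
#     exact-id/alias match retried after the peel           -> resolves, no hand-registered alias
```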
### Changed
- Replaced the previous phase-6 namespace fallback with the new phase-5 HuggingFace prefix-strip layer. The previous phase was case-sensitive and did not compose with suffix peel; the new phase applies stricter input validation (segment cap, empty-segment rejection, whitespace rejection) but composes with phase 4's case-insensitive peel through the bounded re-entry. Pathological inputs above `MAX_PREFIX_SEGMENTS` (3) — such as `provider/deep/nested/model` — are now rejected by phase 5 rather than silently matched via recursive `rsplit_once` fallback (#555)
- Aliases currently classified as `vendor-prefix` in the #560 audit (e.g., `Qwen/Qwen3.6-35B-A3B`, `MiniMaxAI/MiniMax-M2.5`) are now peel-coverable-adjacent post-#555: phase 2 still wins on the explicit alias, but phase 5 + phase 4 together reach the same metadata. Retroactive removal is deferred to a follow-up audit per #555 design section 7
### Fixed
- `POST /anthropic/v1/messages` now works when the selected backend is configured with a `unix://` URL (#567)
  - Native Anthropic backends and OpenAI-compatible backends both work over Unix sockets, for both non-streaming and streaming requests
  - Socket paths containing spaces (e.g. macOS `~/Library/Application Support/...`) are handled correctly
  - Auth header selection (`x-api-key` for Anthropic backends, `Authorization: Bearer` for OpenAI-compatible backends) is correct on the Unix socket path
  - `anthropic-version` header is added automatically for Anthropic backends on the Unix socket path, matching the HTTP path behavior
## v1.5.2 - 2026-04-21
### Added
- Regression tests locking down the transport-layer passthrough contract for llama.cpp and MLxcel backends (#562)
  - New `tests/llamacpp_passthrough_test.rs` and `tests/mlxcel_passthrough_test.rs` covering all four passthrough call sites: direct backend `execute_chat_completion`, factory-built backend (`BackendFactory` -> `LlamaCppBackend`), `proxy/backend.rs` HTTP handler, and the streaming handler
  - New `test_mlxcel_factory_backend_passthrough_nonstandard_fields` asserts that `BackendFactory -> LlamaCppBackend::execute_chat_completion` preserves non-standard fields byte-for-byte at transport time
  - Anthropic input test (`tests/anthropic_input_test.rs`) extended with explicit passthrough coverage
- `docs/en/architecture/backend-passthrough.md` and its Korean counterpart `docs/ko/architecture/backend-passthrough.md` documenting the passthrough contract, the four guarded call sites, and the list of router-side transforms that run before transport (`global_prompts`, `transform_payload_for_openai` for o1/o3/gpt-5*, `web_search` injection) (#562, #563)
- `docs/reports/alias-audit-2026-04.md` classifying every alias in `model-metadata.yaml` into peel-redundant, peel-redundant-but-kept, and peel-independent categories, with an "aliases vs peel" policy section added to `docs/en/configuration/advanced.md` (and the Korean counterpart) explaining when to prefer each mechanism (#560)
### Changed
- Narrowed the passthrough contract from an implied "byte-equivalent" global guarantee to a transport-layer scope — the router may still run `global_prompts` injection, o1/o3/gpt-5* payload transforms, and `web_search` tool injection before transport, but no provider-specific rewriting happens at the transport boundary (#563)
- Comment-only clarifications in `src/http/streaming/handler.rs`, `src/infrastructure/backends/factory/backend_factory.rs`, `src/infrastructure/backends/llamacpp/backend.rs`, and `src/proxy/backend.rs`
- Audited `model-metadata.yaml` aliases for peel-normalization redundancy: removed aliases that differ from the canonical ID only by suffixes already handled by the layered peel (`-4bit`, `-q4_k_m`, `-fp8`, `-gguf`, `-mlx`, `-awq`, etc.), while preserving aliases that encode canonical flavor variants (`-qat`, `-instruct`) or disambiguate parameter counts (#557)
- New `tests/alias_audit_helper.rs` and `tests/format_suffix_normalization_test.rs` enforce the peel-vs-alias boundary going forward
### CI
- Target Ubuntu 26.04 LTS (Resolute) instead of 25.10 (Questing) in the Debian build workflow
- Fall back to `createdAt` when release `publishedAt` is null in `debian/update-changelog.sh` to prevent changelog regression when the latest release is still in draft
## v1.5.1 - 2026-04-20
### Added
- Built-in `web_search` tool for self-hosted LLM backends (#553)
  - Router-level tool transparently injected into chat completion requests for vLLM, Ollama, llama.cpp, MLxcel, LM Studio, Continuum Router, and Generic backends
  - Pluggable `SearchProvider` trait under `src/services/search/` with `SerperProvider` implementation; Exa and Brave scaffolded behind the same trait
  - Configurable `inject_policy` (auto/always/never) with per-backend overrides; commercial backends (OpenAI, Azure, Gemini, Anthropic) left untouched so their native `web_search` continues to flow through unchanged; a config sketch follows this list
  - Bounded non-streaming tool-execution loop parses `web_search` tool calls, executes the provider, appends tool-role results, and re-invokes the backend up to `max_tool_iterations` rounds
  - New `BackendTypeConfig::is_self_hosted` / `is_commercial` helpers covered by unit tests enforcing the commercial/self-hosted partition invariant
  - API keys redacted in Debug output and never logged; hot-reload friendly `WebSearchConfig` with `${ENV}` substitution
  - Prometheus counters for tool calls, injections, and iteration-cap hits under `src/metrics/web_search`
- Layered quantization and format suffix normalization for model metadata lookup (#549)
  - New `layered_format_strip()` in `src/models/pattern_matching.rs` iteratively peels allowlisted quantization/format/flavor tokens from the right side of a model ID, retrying exact-id/alias/date-suffix matches after each peel
  - Token categories: `BIT_WIDTH`, `GGUF_QUANT`, `FP_FORMAT`, `INT_FORMAT`, `LIBRARY`, `IMATRIX`, `UNSLOTH`, `CONTAINER`, `FLAVOR` (all case-insensitive)
  - Parameter-count suffixes preserved: `-Nbit` stripped as quantization; `-Nb`, `-aNb`, `-eNb`, `-0.6b` kept as parameter counts
  - Canonical base IDs ending in allowlisted flavors (e.g. `gemma-3-12b-qat`) win via exact-id match before peel runs
  - Normalization pipeline wired into `find_matching_config`, `BackendConfig::get_model_metadata`, `RouterConfig::get_model_metadata`, `RouterConfig::get_thinking_pattern_config`, `resolve_model_tier` (routing), and `get_model_profile` (admin)
- Model metadata for GLM 5.1, Qwen 3.6, and MiniMax M2.7 (#548)
- Teams release notification posted to Microsoft Teams via Power Automate webhook after build and Docker jobs
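A minimal `web_search` config sketch using the names called out above (`inject_policy`, `max_tool_iterations`, `${ENV}` substitution); the exact key layout is an assumption — the `config.yaml.example` noted under Documentation below is authoritative:

```yaml
web_search:
  provider: serper             # SerperProvider; Exa and Brave scaffolded behind the same trait
  api_key: ${SERPER_API_KEY}   # redacted in Debug output, never logged
  inject_policy: auto          # auto | always | never; per-backend overrides supported
  max_tool_iterations: 3       # bound on the tool-execution loop (value assumed)
```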
### Changed
- Migrate documentation toolchain from MkDocs + Material for MkDocs to Zensical — reads `mkdocs.yml` natively and bundles required extensions
### Fixed
- Security: Cap layered peel phase with `MAX_MODEL_ID_LEN=256` and `MAX_PEEL_ITERATIONS=8` to eliminate DoS via pathological model IDs (previously O(n²) allocation on inputs like `-4bit-4bit-4bit-...`)
- Security: Enforce 256-char model field length at `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, and `/v1/embeddings/sparse` (parity with existing `/v1/responses` check)
- Consolidate 7-phase metadata matching pipeline into a single implementation (`find_matching_config_slice`) with thin adapters at each call site, eliminating drift between `BackendConfig`, `Config::get_model_metadata`, `Config::get_thinking_pattern_config`, and `find_matching_config`
- Replace `cfg.to_ascii_lowercase() == peel` with `str::eq_ignore_ascii_case` on the hot path (~4000 fewer per-request String allocations)
- Pin Pygments <2.20 to fix MkDocs build failure (superseded by Zensical migration)
### CI
- Bump softprops/action-gh-release from 2 to 3 (#544)
- Bump actions/github-script from 8 to 9 (#545)
- Bump actions/upload-pages-artifact from 4 to 5 (#554)
### Documentation
- Document suffix-order ambiguity (`-qat-4bit` vs `-4bit-qat`) and internal peel phase bounds in `docs/en/configuration/advanced.md`
- Add `pattern_matching.rs` to Model Aggregation Service module listing in `docs/en/architecture.md` with cross-reference to suffix normalization section
- New `docs/en/web-search.md` feature documentation; `config.yaml.example` extended with `web_search` section
## v1.5.0 - 2026-04-11
### Added
- Smart routing system with model tier & capability profile registry (#525, #531)
- Rule-based request classifier & smart routing policy engine (#526, #532)
- Load-aware dynamic tier adjustment (#527, #533)
- LLM-based request classifier with hybrid mode (#528, #534)
- Smart routing observability, admin API & documentation (#529, #535)
- Codex-compatible Responses API extensions (#536, #537)
### Changed
- Upgrade core dependencies — axum 0.8, sha2 0.11, rand 0.10 (#523)
- Add Gemma 4 model family metadata (#538)
### Fixed
- Complete smart routing integration gaps
- Increase DefaultTransformer PDF size limit from 20MB to 32MB (#542)
### CI
- Bump actions/deploy-pages from 4 to 5 (#521)
### Dependencies
- Bump the minor-and-patch dependency group with 4 updates (#539)
### Documentation
- Add Codex-compatible Responses API gap analysis report
## v1.4.5 - 2026-03-27
### Fixed
- Return 400 error when file references are used without file service configured (#519)
### Changed
- Add GLM-5-Turbo model metadata (#516)
### Documentation
- Fix Korean anti-AI-slop violations in ko/ documentation
- Fix slop word and transition word in api.md
## v1.4.4 - 2026-03-18
### Fixed
- Fix Anthropic thinking failing for high/xhigh reasoning effort — `budget_tokens` (32768) exceeded default `max_tokens` (16384), causing API rejection (#514)
- Auto-adjust `max_tokens` to `budget_tokens + 4096` when thinking is enabled and budget exceeds max; a worked example follows
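The adjustment arithmetic, using the numbers from this entry:

```yaml
# thinking enabled; budget exceeds the default ceiling:
#   budget_tokens: 32768
#   max_tokens:    16384   # default — upstream previously rejected the request
# router now sends:
#   max_tokens:    36864   # budget_tokens + 4096
```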
### Changed
- Add GPT-5.4 model family: `gpt-5.4`, `gpt-5.4-pro`, `gpt-5.4-mini`, `gpt-5.4-nano` with 1M context window (#515)
- Update Gemini 3 series: add `gemini-3.1-pro-preview`, `gemini-3-flash-preview`, `gemini-3.1-flash-lite-preview`; mark `gemini-3-pro-preview` as deprecated
- Recognize Gemini 3 Flash and 3.1 Flash-Lite as thinking models for `include_thoughts` auto-injection
- Update Claude 4.6 models: context window to 1M (GA), fix Sonnet 4.6 max_output to 64K, correct knowledge cutoffs
- Update config examples and documentation with latest model names across 8 files
## v1.4.3 - 2026-03-18
### Fixed
- Fix Gemini thinking models (2.5 Pro, 3 Pro, etc.) not returning `reasoning_content` in streaming responses through the router (#513)
- Replaced `transform_payload_for_gemini()` with `transform_request_gemini()` across all three Gemini streaming paths to ensure `include_thoughts: true` auto-injection
## v1.4.2 - 2026-03-17
### Changed
- Change mid-stream fallback default to enabled for improved streaming reliability (#504)
- Breaking: Mid-stream fallback is now enabled by default; set `mid_stream_fallback.enabled: false` to restore previous behavior
### Documentation
- Add failover latency tuning guide for optimizing fallback behavior
## v1.4.1 - 2026-03-17
### Added
- Mid-stream fallback for streaming inference (#497) — when a backend fails mid-stream during SSE streaming, the router transparently retries with a fallback backend
### Changed
- Decouple pre-stream fallback from mid-stream fallback (#500) — each can now be independently enabled/disabled
- Bump dependency versions to latest major releases
### Fixed
- Fix streaming config changes not detected in hot reload system (#503)
- Fix mid-stream connection errors leaking to client during fallback (#502)
- Remove unused config crate dependency
## v1.4.0 - 2026-03-14
### Added
- Prefix-aware routing: PrefixAwareHash selection strategy with Consistent Hash with Bounded Loads (CHWBL) (#455, #457, #461)
- Response caching: SHA256-based cache key computation with streaming response buffering and post-completion caching (#456, #459, #462)
- Multi-tier CacheStore: in-memory backend (#466), Redis/Valkey backend with connection pooling (#467), and S3-backed tiered L1/L2 cache (#483)
- KV cache index: shared data structure (#470), KV event consumer for vLLM backend streams (#471), prefix overlap scoring integrated into backend selection (#473), configuration/metrics/admin endpoints (#474)
- Tiered KV cache with storage-tier awareness (GPU hot / external warm) (#484)
- Disaggregated prefill/decode orchestration with external KV tensor transfer (#485)
- Anthropic cache_control breakpoint auto-injection (#460)
- Multimodal embedding support for Gemini Embedding 2 (#492)
- Shared cache configuration and operational metrics (#468)
- 30 new models added to model-metadata.yaml (#472)
### Changed
- Rename VAST-specific identifiers to generic S3/external storage names (#490) — update configuration files if using VAST-specific field names
### Fixed
- Make RequestExecutor transport-aware for Unix socket paths with spaces (#488)
- Replace relative source tree links with GitHub URLs in docs
### CI
- Bump docker/setup-qemu-action from 3 to 4 (#428)
- Bump docker/metadata-action from 5 to 6 (#426)
- Bump docker/setup-buildx-action from 3 to 4 (#429)
- Bump docker/build-push-action from 6 to 7 (#430)
- Bump docker/login-action from 3 to 4 (#427)
### Documentation
- Comprehensive KV cache feature documentation, benchmarks, and config examples (#477)
- VAST Data connection guide and integration examples (#486)
- Sync Korean documentation with English counterparts
- Split monolithic configuration.md into 6 smaller files
## v1.3.0 - 2026-03-12
### Added
- Agent Communication Protocol (ACP) support with JSON-RPC 2.0 protocol layer and stdio transport (#414, #420)
- ACP session management with protocol lifecycle, initialize/shutdown handshake (#415, #421)
- ACP-to-LLM inference pipeline with streaming support (#416, #422)
- ACP tool call reporting and permission delegation (#417, #423)
- MCP-over-ACP bridge for MCP server tunneling (#418, #424)
- ACP agent registry with metadata and configuration support (#419, #425)
- ACP integration tests for protocol lifecycle and session management
### Fixed
- Resolve clippy `field_reassign_with_default` warnings in ACP integration tests
### CI
- Bump actions/upload-artifact from 6 to 7 (#398)
### Documentation
- ACP architecture documentation with MkDocs integration
- ACP practical usage guide with IDE integration examples
- KV cache integration plan for router-level caching strategies
## v1.2.1 - 2026-03-07
### Added
- MLxcel backend type support for MLX-based model serving (#412, #413) — fully API-compatible with llama-server, reusing the same backend implementation for health checks, model discovery, and proxying
## v1.2.0 - 2026-03-06
### Added
- Admin Statistics API with comprehensive request-level statistics collection and reporting (#409)
  - Endpoints: `GET /admin/stats`, `GET /admin/stats/models`, `GET /admin/stats/backends`, `POST /admin/stats/reset`
  - Time-windowed queries, token usage tracking, latency percentiles (p50, p95, p99)
- Statistics persistence with configurable snapshot path, interval, and staleness checks (#410, #411)
  - Atomic writes, restore on startup, final snapshot on graceful shutdown
### Documentation
- Add admin stats and persistence to configuration guide
- Add post-refactoring benchmark report for v1.1.0 (#407)
## v1.1.1 - 2026-03-04
### Added
- Embeddable library crate (Phase 1) — use continuum-router as a Rust dependency (#394)
- Type-safe config builders for programmatic library usage (#400)
- Cargo feature flags for optional library dependencies (#399)
- Persistent storage for runtime API keys (#405)
- New LLM model metadata entries (#403)
### Fixed
- Fix Gemini-specific transforms incorrectly applied in Anthropic handler (#404)
## v1.1.0 - 2026-03-01
### Added
- Embedded WebUI for configuration management and API key administration (#388)
- Windows AF_UNIX socket support via socket2 crate (#390)
- Nano Banana 2 (Gemini Image Generation) support
### Fixed
- Resolve compilation error in `ClientAddr::is_unix` for tuple variant matching
- Resolve Windows AF_UNIX socket accept failure and config validation
- Accept Windows absolute paths in Unix socket config validation (#393)
- Resolve Windows compilation errors in Unix socket tests and transport parsing (#392)
## v1.0.0 - 2026-02-19
### Added
- Continuum Router federation — router-to-router chaining as a new backend type (#385)
- LM Studio as a dedicated backend type (#381)
- Anthropic adaptive thinking effort parameter (`output_config.effort`) (#384)
- Adaptive thinking and auto reasoning effort level across backends (#378)
- Cohere/Jina-compatible rerank and sparse embedding endpoints (#374)
- BGE-M3 and multilingual embedding model support (#373)
- Claude Opus 4.6 model metadata
- Qwen3-Coder-Next, Qwen3-VL-30B/8B model metadata
### Changed
- Handle SIGTERM for graceful shutdown on Unix systems (#370)
- Reduce per-backend filter and model metadata log verbosity during model refresh (#371, #375)
### CI
- Replace Ubuntu 24.10 with 25.10 in deb build matrix (#376)
## v0.36.1 - 2026-01-30
### Fixed
- Trigger immediate health check after sync_backends during hot reload (#368) — new backends now available within 1-2 seconds instead of up to 30 seconds
- Sync health-check info and use URL-based updates during hot reload (#369) — new backends properly receive API key authentication
- Accelerate health checks for recently added backends — 1-second check interval for 5 minutes after addition
- Trigger model cache refresh when backends transition to healthy state with 5-second debounce
## v0.36.0 - 2026-01-27
### Added
- Native Anthropic Messages API handler with endpoint routing (#355)
- Anthropic to OpenAI request/response transformation (#356, #357)
- Anthropic streaming response format (#358)
- Direct Anthropic to Gemini request/response transformation (#359)
- File_id source type and file resolution for Anthropic input (#360)
- Claude Code compatibility for Anthropic handler (#365)
- Tiered token counting for all backend types
- Parallel file reference resolution for improved performance
- Anthropic-version header format validation
### Fixed
- Require HTTPS for image and document URLs to prevent SSRF
- Return generic error messages to clients instead of backend details
- Use authenticated user_id from API key for file ownership checks
- Use UUID v4 for secure message/tool ID generation
- Place tool messages before user text in Anthropic-to-OpenAI conversion
- Override `stop_reason` to `tool_use` when `tool_use` blocks are present
- Apply `max_completion_tokens` conversion for OpenAI-routed Anthropic requests
- Propagate file access denied and not found errors to client
- Call current_config() once per request for consistent behavior
### Refactored
- Extract common SSE event type and data extraction logic
- Add parse_bytes method to SseParser for proper UTF-8 handling
- Remove unnecessary Arc wrapper in AnthropicFileResolver
- Box FileResolutionResult::Resolved to reduce enum size
## v0.35.0 - 2026-01-23
### Added
- Gemini 3 thoughtSignature support in function calling (#354)
- PDF support for OpenAI and Anthropic file transformers (#340)
- Text/plain support for AnthropicFileTransformer (#342)
### Fixed
- Add PDF support to DefaultTransformer and file resolution (#343)
- Add tool message transformation to non-streaming Anthropic requests (#344)
- Reject non-image files in DefaultTransformer with clear error message (#338)
- Fix AI SDK incompatibility with Responses API streaming format (#335)
## v0.34.0 - 2026-01-16
### Added
- Automatic quality parameter conversion between DALL-E and GPT Image models (#330)
### Changed
- Native Anthropic conversion for Responses API PDF file uploads (#332)
### Fixed
- Gemini streaming tool-call compatibility fixes (#333) — missing `index` field, `tool_choice` format preservation, unnecessary transformation removal
## v0.33.0 - 2026-01-13
### Added
- `/v1/embeddings` endpoint for embedding API support (#319)
- Resolve local `file_id` references in Responses API requests (#326)
- `user_data` and `evals` purpose values for Files API (#322)
### Fixed
- Use flat tool format for Responses API function tools (#324)
- Improve Unix socket test stability for parallel execution (#328)
## v0.32.0 - 2026-01-09
### Added
- Reasoning effort documentation and improved xhigh fallback logging (#317)
### Fixed
- Support implicit message type inference in Responses API InputItem (#316)
### Refactored
- Optimize InputItem deserializer and add invalid role test
## v0.31.5 - 2026-01-09
### Added
- Responses API pass-through support for native OpenAI backends (#313) — smart routing based on backend type with direct forwarding to the `/v1/responses` endpoint
- OpenAI Responses API file input types (#311) — support for `input_text`, `input_file`, `input_image` content parts with SSRF validation
### Fixed
- Forward raw backend error responses in pass-through mode
- Address security and performance issues in Responses API pass-through
## v0.31.4 - 2026-01-07
### Fixed
- Use current_config() for hot reload support in proxy handlers (#310) — API key and configuration changes via hot reload now properly apply to new requests
## v0.31.3 - 2026-01-06
### Fixed
- Add Anthropic transformations to Unix socket transport (#308) — Unix socket transport now applies the same request/response transformations as HTTP transport
- Preserve stream parameter for non-streaming Anthropic requests (#306)
## v0.31.2 - 2026-01-05
### Added
- Non-streaming support for Anthropic backend requests
- Tool call and tool result transformation for Anthropic backend — enables multi-turn tool use conversations
## v0.31.1 - 2026-01-04
### Fixed
- Non-streaming Anthropic requests failing with wrong authentication header (#301) — now correctly uses the `x-api-key` header instead of `Authorization: Bearer`
## v0.31.0 - 2026-01-04
### Added
- Unix socket server binding alongside TCP (#298) — supports the `unix:` URI scheme, `socket_mode` configuration, and auto-cleanup; a config sketch follows this list
- Reasoning parameter support for Responses API (#296) with nested format and low/medium/high/xhigh effort levels
- xhigh reasoning effort support for GPT-5.2 thinking models with auto-downgrade for unsupported models
- Configurable health check endpoints per backend type (#293) — custom endpoint, fallback endpoints, method, body, accept_status, and headers
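A hypothetical listener config for the Unix socket binding above — the `unix:` scheme and `socket_mode` key come from this entry, while the surrounding key names are assumptions:

```yaml
server:
  bind: unix:///var/run/continuum-router.sock
  socket_mode: "0660"   # permissions on the socket file; stale sockets are cleaned up automatically
```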
### Changed
- Comprehensive reasoning parameter normalization across backends (#294)
## v0.30.0 - 2026-01-01
### Added
- Wildcard patterns and date suffix handling in model aliases (#286) — automatic date suffix normalization, `*` pattern matching (prefix, suffix, infix), zero-config date handling; examples below
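Illustrative alias patterns for the wildcard support above (the alias values are hypothetical; their placement in model-metadata.yaml is an assumption):

```yaml
aliases:
  - "my-model-*"     # prefix match
  - "*-latest"       # suffix match
  - "*-preview-*"    # infix match
# date suffixes (e.g. some-model-20260101) normalize to the base id with zero config
```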
### Fixed
- Apply default URL for Anthropic backend when not specified (#288)
- Replace owned_by placeholders with backend-type-specific values (#287)
### Documentation
- Translate wildcard pattern and date suffix handling documentation to Korean (#289)
## v0.29.0 - 2026-01-01
### Added
- Accelerated health checks during backend warmup (#282) — 1s interval on HTTP 503, configurable via `warmup_check_interval` and `max_warmup_duration`; a sketch follows this list
- `--model-metadata` CLI option for specifying model metadata file path at runtime (#281)
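A sketch of the warmup tuning keys named above; the nesting and the values shown are assumptions:

```yaml
health_check:
  warmup_check_interval: 1s   # fast polling while the backend returns HTTP 503
  max_warmup_duration: 5m     # stop accelerated checks after this window
```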
### Fixed
- Replace OpenAI owned_by placeholder with 'openai' (#280)
- Prevent race condition in Admin API concurrent backend creation (#278)
- Fix missing processing steps in hot reload (#277)
- Cloud backends now show `available: true` in `/v1/models/{model_id}` (#272)
## v0.28.0 - 2025-12-31
### Added
- SSE streaming support for tool calls (#258)
- llama.cpp tool calling auto-detection via the `/props` endpoint (#263)
- Extended `/v1/models/{model_id}` endpoint with rich metadata fields (#262)
- Tool result message transformation for multi-turn conversations (#265)
- Backend-specific owned_by placeholders for llamacpp, vllm, ollama, http (#267)
### Changed
- Improved `--help` output formatting with title header and project attribution (#269)
### Fixed
- Sync model metadata cache with ConfigManager (#270)
## v0.27.0 - 2025-12-29
### Added
- Complete Unix socket support for model discovery and SSE streaming (#248, #252, #253, #254, #256)
- SSE/streaming for Unix socket backends
- Backend type auto-detection for Unix sockets
- vLLM and llama.cpp model discovery via Unix sockets
- Tool call transformation across all backends (#244, #245, #246) — tool definitions, tool_choice, and tool call responses for Anthropic, Gemini, and llama.cpp
## v0.26.0 - 2025-12-27
### Added
- `GET /v1/models/{model}` endpoint for single model retrieval with real-time availability status (#236)
## v0.25.0 - 2025-12-26
### Added
- CORS (Cross-Origin Resource Sharing) support (#234) — configurable origins, wildcard patterns, custom schemes (e.g., `tauri://localhost`), preflight cache; a config sketch follows this list
- Unix Domain Socket backend support (#232) — `unix:///path/to/socket` scheme, lower latency than localhost TCP
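A hypothetical CORS sketch covering the options named above (key names are assumptions):

```yaml
cors:
  allowed_origins:
    - "https://app.example.com"
    - "tauri://localhost"      # custom scheme
    - "https://*.example.dev"  # wildcard pattern
  max_age: 600                 # preflight cache lifetime in seconds (key name assumed)
```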
## v0.24.0 - 2025-12-26
### Added
- llama.cpp backend support for local LLM inference (#230)
- Allow router to start without any backends configured (#226)
### Changed
- Enable hot reload for backend additions/removals from config (#229)
## v0.23.1 - 2025-12-25
### CI
- Add Windows x86_64 build target to release workflow (#224)
## v0.23.0 - 2025-12-23
### Added
- GLM 4.7 model support with thinking capabilities (#222)
- GCP Service Account authentication support for Gemini (#208)
- Distributed tracing with correlation ID propagation (#207) — W3C Trace Context with traceparent header
- Thinking pattern metadata for models with implicit start tags (#218)
- Model metadata for NVIDIA Nemotron 3 Nano, Qwen Image Layered, and Kakao Kanana-2 (#202)
- ASCII diagram to image replacement system for MkDocs (#200)
### Fixed
- Prevent cache stampede with singleflight, stale-while-revalidate, and background refresh (#220)
- Apply global_prompts changes via hot reload (#219)
- Invalidate model cache when backend config changes (#206)
### CI
- Skip Rust tests in CI when only non-code files change (#204)
- Bump actions/github-script from 7 to 8 (#210)
- Bump apple-actions/import-codesign-certs from 3 to 6 (#212)
- Bump actions/cache from 4 to 5 (#211)
- Bump actions/checkout from 4 to 6 (#209)
## v0.22.0 - 2025-12-19
### Added
- Docker support with pre-built binary images — Debian (~50MB) and Alpine (~10MB) with multi-arch support (#198)
- Container health check CLI (`--health-check`) for orchestration (#198)
- Docker Compose quick start configuration
- Automated Docker image publishing to ghcr.io in release workflow
- MkDocs documentation website with Material theme (#183)
- Korean documentation translation (i18n) — complete localization of all 20 documentation files (#190)
- Security policy with vulnerability reporting process (#191)
- Dependency security auditing with cargo-deny and Dependabot (#192)
### Changed
- Integrate orphaned architecture documentation into MkDocs site (#186)
- Rename documentation files to lowercase kebab-case for URL-friendly filenames
### Fixed
- Fix health check response validation logic bug (operator precedence)
- Fix address parsing fallback silently hiding configuration errors
- Fix IPv6 address formatting in health check
## v0.21.0 - 2025-12-19
### Added
- Gemini 3 Flash Preview model support (#168)
- Default authentication mode for API endpoints (#173) — `permissive` (default) or `blocking` mode
- Backend error passthrough for 4xx responses (#177) — parse and forward original error messages from OpenAI, Anthropic, and Gemini
### Fixed
- Handle UTF-8 multi-byte character corruption in streaming responses (#179)
- Strip response_format parameter for GPT Image models (#176)
- Allow auto-discovery for all backends except Anthropic (#172)
- Always return b64_json field for Gemini image generation responses (#181)
## v0.20.0 - 2025-12-18
### Added
- Image variations support for Gemini (nano-banana) models (#165)
- Image edit support for Gemini (nano-banana) models (#164)
- Enhanced `/v1/images/generations` with streaming and GPT Image features (#161)
- gpt-image-1.5 model support (#159)
- `/v1/images/variations` endpoint (#155)
- `/v1/images/edits` endpoint for image editing and inpainting (#156)
- External Markdown file support for system prompts with REST API management (#146)
- Automatic model discovery for backends without explicit model list (#142)
- Solar Open 100B model
### Security
- API key redaction to prevent credential exposure in logs and error messages (#150)
### Changed
- Optimized release binary size from 20MB to 6MB (70% reduction) (#144)
## v0.19.0 - 2025-12-13
### Added
- Runtime Configuration Management API (#139)
- Configuration query, modification, save/restore, and backend management APIs
- Sensitive information masking, JSON Schema generation, configuration history with rollback (up to 50 entries)
- Comprehensive Admin REST API reference documentation
- 33 integration tests for configuration API endpoints
### Security
- Input validation with 1MB content limit and 32-level nesting depth
- Audit logging for sensitive data exports with 30+ sensitive field patterns
## v0.18.0 - 2025-12-13
### Added
- Per-API-key rate limiting (#137)
- API key management and configuration system
- Files API authentication and authorization (#131)
- Hot reload for runtime configuration updates (#130)
### Fixed
- Add ConnectInfo extension for admin/metrics/files endpoints
- Address security vulnerabilities in API key management
### Refactored
- Extract CLI and app utilities into modular structure (#132)
- Split converter.rs into modular structure (#132)
- Split large source files into modular components
## v0.17.0 - 2025-12-12
### Added
- Anthropic backend file content transformation (#126)
- Gemini backend file content transformation (#127)
### Fixed
- Streaming file uploads to prevent memory exhaustion (#128)
## v0.16.0 - 2025-12-12
### Added
- OpenAI-compatible Files API endpoints (#111)
- File resolution middleware for chat completions (#120)
- OpenAI backend file handling strategy (#121, #122)
- Persistent metadata storage for Files API (#125)
- GPT-5.2 model support (#124)
- Circuit breaker pattern for automatic backend failover
- Admin endpoint authentication and audit logging
- Configurable fallback models for unavailable model scenarios with cross-provider support
### Fixed
- Sanitize fallback error headers and metric labels
- Use index-based lookup for fallback chain traversal
- Reduce lock contention in FallbackService with snapshot pattern
## v0.15.0 - 2025-12-05
### Added
- Nano Banana (Gemini Image Generation) API support (#102)
- Split `/v1/models` endpoint — standard lightweight vs extended metadata response (#101)
### Changed
- Optimize LRU cache to use read lock for cache lookups (#105)
### Fixed
- Replace `.expect()` panics with proper error propagation in HttpClientFactory (#104)
### Refactored
- Extract streaming handler logic to dedicated StreamService (#106)
- Eliminate retry logic code duplication in proxy.rs (#103)
## v0.14.2 - 2025-12-05
### Added
- Log token usage (input/output tokens) on request completion (#92)
## v0.14.1 - 2025-12-05
### Fixed
- Optimize Anthropic backend TTFT with connection pooling and HTTP/2 (#90)
- Optimize Gemini backend TTFT with connection pooling and HTTP/2 (#88)
- Apply base name fallback matching to aliases in model metadata lookup (#84)
## v0.14.0 - 2025-12-04
### Added
- Router-wide global system prompt injection (#82)
### CI
- Replace deprecated actions-rs/toolchain with dtolnay/rust-toolchain
- Add RUSTFLAGS for macOS ARM64 ring build
- Switch to rustls-tls for musl cross-compilation support
## v0.13.0 - 2025-12-04
### Added
- OpenAI `/v1/responses` API support with session management (#49)
- True SSE streaming for the `/v1/responses` API
- Background cleanup task for expired sessions
- Override `/v1/models` response fields via model-metadata.yaml (#75)
### Security
- SecretString for API key storage across all backends (#76)
- Session access control and input validation for Responses API
### Changed
- Immediate mode for SseParser for reduced first-response latency
### Refactored
- String allocation optimizations and error handling standardization
## v0.12.0 - 2025-12-04
### Fixed
- Handle exact hash matches in consistent hash binary search (#72)
- Replace panics with Option returns and implement stats aggregation (#71)
- Remove hardcoded auth requirement from the `/v1/models` endpoint
### Refactored
- Reorganize OpenAI model metadata by family (#74)
- Extract AnthropicStreamTransformer to dedicated module (#73)
- Split backends mod.rs into separate modules (#69)
- Extract embedded tests to separate files (#68)
- Create HttpClientFactory for centralized HTTP client creation (#67)
- Create UrlValidator module with SSRF prevention (#66)
- Extract RequestExecutor to shared common module (#65)
- Extract HeaderBuilder with auth strategies (#64)
- Extract AtomicStatistics to shared common module
## v0.11.0 - 2025-12-03
### Added
- Native Anthropic Claude API backend with extended thinking support
- OpenAI to Claude reasoning parameter conversion
- Flat reasoning_effort parameter for Anthropic
- Claude 4, 4.1, 4.5 model metadata
### Fixed
- Improve health check and model fetching for Anthropic/Gemini backends
- Accept-Encoding fixes for streaming — use the `identity` header and disable compression
## v0.10.0 - 2025-12-03
### Added
- Native Google Gemini API backend support
- OpenAI Images API support for image generation
- Authenticated health checks for OpenAI and API-key backends
- Built-in OpenAI model metadata for the `/v1/models` response
- API key authentication for streaming requests
- Configurable image generation timeout
- Response_format validation for image generation API
### Fixed
- Convert `max_tokens` to `max_completion_tokens` for newer OpenAI models
- Correct URL construction for all API endpoints
- Request body size limits to prevent DoS attacks
### Security
- Remove sensitive data from debug logs
### Refactored
- Unify request retry logic with RequestType enum
## v0.9.0 - 2025-12-02
### Added
- Enhanced rate limiting with token bucket algorithm
- Comprehensive Prometheus metrics and monitoring (#10)
### Security
- Prevent IP spoofing via X-Forwarded-For manipulation
- Prevent header injection vulnerabilities
- Eliminate race condition in token refill
- Protect API keys with SHA-256 hashing
- Prevent memory exhaustion via unbounded bucket growth
- Comprehensive authentication for metrics endpoint
- Cardinality limits and label sanitization to prevent metric explosion DoS
### Fixed
- Implement singleton pattern for metrics to prevent memory leaks
- Improve error handling to prevent panic conditions
- Resolve environment variable race condition in config test
- Fix integration test failures in metrics
## v0.8.0 - 2025-09-09
### Added
- Model ID alias support for metadata sharing (#27)
### Fixed
- Return empty list instead of 503 when all backends are unhealthy (#28)
## v0.7.1 - 2025-09-08
### Fixed
- Improve config path validation for home directory and executable paths (#26)
## v0.7.0 - 2025-09-07
### Added
- Rich metadata support for the `/v1/models` endpoint (#23, #25)
- Enhanced configuration management (#9, #22)
- Advanced load balancing strategies (Weighted, Least-Latency, Consistent-Hash) with enhanced error handling (#21)
### Fixed
- Use streaming timeout configuration from config.yaml instead of hardcoded 25s limit
## v0.6.0 - 2025-09-03
### Fixed
- Use timeout configuration from config.yaml instead of hardcoded values (#19)
### Documentation
- Comprehensive timeout configuration and model documentation updates
## v0.5.0 - 2025-09-02
### Added
- Optional retry configuration with sensible defaults
- Comprehensive integration tests and performance optimizations
- Complete service layer implementation
- Middleware architecture and enhanced backend abstraction
### Fixed
- Handle streaming requests without model field gracefully
- Resolve floating-point precision and timing issues in tests
- Resolve test failures and deadlocks in object pool and SSE parser
- Resolve initial health check race condition
### Refactored
- Split oversized modules into layered architecture
- Extract complex types into type aliases for better readability
## v0.4.0 - 2025-08-25
### Added
- Model-based routing with health monitoring
### Fixed
- Improve health check integration and SSE parsing
## v0.3.0 - 2025-08-25
### Added
- SSE streaming support for real-time chat completions (#5)
- Model aggregation from multiple endpoints (#4)
## v0.2.0 - 2025-08-25
### Added
- Multiple backends support with round-robin load balancing (#1)
## v0.1.0 - 2025-08-24
### Added
- Initial release with OpenAI-compatible endpoints and proxy functionality