# Backend Passthrough Contract
For several backend types, Continuum Router forwards the chat completions request body without backend-specific reshaping once the request reaches the transport layer. This is narrower than an edge-to-upstream byte-equivalence guarantee: earlier handler stages may still modify the payload before `make_http_request`, `make_unix_socket_request`, or `LlamaCppBackend::execute_chat_completion` see it. This document defines which backends offer that transport-layer guarantee, which fields known consumers rely on, and where the boundary is with backends that do transform the request.
## Passthrough Backends
The following backend types do not apply a backend-specific transform in their chat-completions transport path. Once a request has reached that path, all non-standard top-level fields, `extra_body`, and unrecognized keys are forwarded unchanged:
- OpenAI
- Llamacpp (llama-server)
- Mlxcel (reuses `LlamaCppBackend`; see `src/infrastructure/backends/factory/backend_factory.rs`)
- Ollama
- vLLM
- LMStudio
- LocalAI
The passthrough has three equivalent proxy sites plus one factory-backed backend site:
- HTTP path: `src/proxy/backend.rs::make_http_request`, the primary passthrough site. The `payload.clone()` call in the `else` branch (non-Anthropic, non-Gemini) forwards the body unchanged.
- Unix-socket path: the sibling `else` branch in `make_unix_socket_request` within the same file, exercised when `backend.transport` is `UnixSocket`.
- Streaming path: `src/http/streaming/handler.rs`, around the `client.post(&backend_url).json(&current_payload)` call, which forwards `current_payload` verbatim for each attempt in the streaming fallback loop.
- Factory-backed llama.cpp / MLxcel path: `src/infrastructure/backends/llamacpp/backend.rs::LlamaCppBackend::execute_chat_completion`, reached for `BackendTypeConfig::Llamacpp` and `BackendTypeConfig::Mlxcel` via `src/infrastructure/backends/factory/backend_factory.rs`.
None of these sites apply any additional provider-specific transformation. The factory-level note about MLxcel inheriting LlamaCppBackend (see src/infrastructure/backends/factory/backend_factory.rs) is still correct and is now covered explicitly by the factory-backed site above.
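The branch structure shared by these sites can be sketched in Python pseudocode. This is illustrative only: the real sites are Rust, and `outgoing_body` and the stubbed transform functions below are names invented for this sketch, not code from the router.

```python
def anthropic_transform(payload: dict) -> dict:
    ...  # stand-in for src/http/handlers/anthropic/transform.rs

def gemini_transform(payload: dict) -> dict:
    ...  # stand-in for src/infrastructure/backends/gemini/transform.rs

def outgoing_body(backend_type: str, payload: dict) -> dict:
    """Sketch of the per-site branch: transform backends reshape, all others pass through."""
    if backend_type == "anthropic":
        return anthropic_transform(payload)
    if backend_type == "gemini":
        return gemini_transform(payload)
    # Passthrough branch: clone and forward unchanged for OpenAI,
    # llama.cpp, MLxcel, Ollama, vLLM, LMStudio, and LocalAI.
    return dict(payload)

body = {"model": "m", "messages": [], "some_unknown_field": 1}
assert outgoing_body("vllm", body) == body
```

The key property of the `else` branch is that it never inspects the payload: unknown keys survive because the body is cloned wholesale rather than rebuilt field by field.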
## Known consumers
The following fields are used in production and must survive the passthrough unchanged:
- `chat_template_kwargs` — Jinja template parameters for llama.cpp / MLxcel (e.g., `{"enable_thinking": false, "preserve_thinking": true}`). Required for Qwen3-family thinking-mode control.
- `thinking_budget_tokens` — per-request thinking-budget cap accepted by llama.cpp. Required for Qwen3 thinking mode via llama-server.
- `extra_body` — object of additional parameters passed by OpenAI client libraries (e.g., the Python `openai` SDK's `extra_body` kwarg). Used to forward vLLM-specific settings such as `{"skip_special_tokens": false}`.
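A minimal request body exercising all three fields might look like the following. The field names are the ones listed above; the model name and the specific values are illustrative only.

```python
import json

# Hypothetical chat-completions body; under the passthrough contract,
# every key here must reach the upstream server unchanged once the
# request hits a transport site.
payload = {
    "model": "qwen3-example",  # illustrative model name
    "messages": [{"role": "user", "content": "hello"}],
    # llama.cpp / MLxcel: Jinja chat-template parameters
    "chat_template_kwargs": {"enable_thinking": False, "preserve_thinking": True},
    # llama.cpp: per-request thinking-budget cap
    "thinking_budget_tokens": 2048,
    # vLLM-specific settings forwarded via extra_body
    "extra_body": {"skip_special_tokens": False},
}

body = json.dumps(payload)
assert json.loads(body)["chat_template_kwargs"]["enable_thinking"] is False
```

With the Python `openai` SDK, non-standard top-level fields such as `chat_template_kwargs` are typically supplied through the `extra_body` kwarg, which the SDK merges into the request body before sending.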
## Transform Backends
The following backend types run the request through a provider-specific transformation layer before forwarding it. Non-standard fields may not survive verbatim:
- Anthropic — `src/http/handlers/anthropic/transform.rs`. Among other changes, the `thinking` block (with `budget_tokens`) is transformed into `reasoning_effort` for the downstream OpenAI-compatible call.
- Gemini — `src/infrastructure/backends/gemini/transform.rs`. Gemini-native request reshaping; consult the transform source for the current field mapping.
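The shape of the Anthropic-side transform can be sketched as follows. This is illustrative only: the real mapping lives in `src/http/handlers/anthropic/transform.rs`, and the budget threshold below is hypothetical, not the router's actual cutoff.

```python
def thinking_to_reasoning_effort(body: dict) -> dict:
    """Sketch: replace an Anthropic-style `thinking` block with `reasoning_effort`."""
    out = dict(body)
    thinking = out.pop("thinking", None)  # the raw block does NOT survive passthrough
    if thinking and thinking.get("type") == "enabled":
        budget = thinking.get("budget_tokens", 0)
        # Hypothetical threshold; the router's real bucketing may differ.
        out["reasoning_effort"] = "low" if budget < 4096 else "high"
    return out

transformed = thinking_to_reasoning_effort(
    {"model": "m", "thinking": {"type": "enabled", "budget_tokens": 1024}}
)
assert "thinking" not in transformed
assert transformed["reasoning_effort"] == "low"
```

The point of the sketch is the contrast with the passthrough backends: here the original field is consumed and replaced, so a client relying on `thinking.budget_tokens` reaching the upstream verbatim would be broken by design.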
## Integration Tests
The transport-layer passthrough contract is protected by the following integration tests:
- `tests/llamacpp_passthrough_test.rs` — verifies that `chat_template_kwargs`, `thinking_budget_tokens`, `extra_body`, and arbitrary unknown fields reach the llama-server endpoint unchanged.
- `tests/mlxcel_passthrough_test.rs` — keeps separate coverage for the `Mlxcel` backend type. It includes a proxy-path fixture and a factory-backed `BackendFactory -> LlamaCppBackend::execute_chat_completion` assertion so a future divergence in MLxcel wiring surfaces as a distinct failing test.
- `tests/anthropic_input_test.rs` — includes a negative test (`test_anthropic_thinking_budget_tokens_transforms_to_reasoning_effort`) that confirms `thinking.budget_tokens` is transformed and does not reach the downstream server as a raw `thinking_budget_tokens` field.
If a refactor breaks any of these tests, treat it as a breaking change to a public contract, not a routine test failure. Update this document to reflect the new behavior before merging.
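The property the passthrough tests protect can be stated compactly: the bytes a transport site receives are the bytes the upstream sees. A minimal Python sketch of that assertion style follows; the real tests are Rust integration tests, and `passthrough` here is a stand-in for a transport site, not router code.

```python
import json

def passthrough(body: bytes) -> bytes:
    # Transport-layer contract: the body is forwarded unchanged.
    return body

client_body = json.dumps({
    "model": "m",
    "messages": [],
    "chat_template_kwargs": {"enable_thinking": False},
    "some_future_unknown_field": 1,  # unknown keys must survive too
}).encode()

received = passthrough(client_body)
# Known AND unknown keys arrive exactly as sent.
assert json.loads(received) == json.loads(client_body)
```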
## What the Guarantee Covers
The guarantee applies to the transport sites listed above. It does not mean the upstream server always receives a byte-for-byte copy of the client-submitted JSON body.
Before those passthrough sites run, `chat_completions` can still perform router-level preprocessing:

- Global system prompt injection when `global_prompts` is configured, which modifies the `messages` array.
- `transform_payload_for_openai()` renaming `max_tokens` to `max_completion_tokens` for `o1`, `o3`, and `gpt-5*` model IDs.
- Router-managed `web_search` injection for eligible self-hosted backends when `web_search.enabled` applies, which can add a `tools` entry before the upstream call.
At the transport layer itself, no additional top-level JSON keys are injected and no passthrough fields are filtered or renamed.
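The `max_tokens` rename described above can be sketched as follows. This is a simplified approximation: the real logic is `transform_payload_for_openai()` in the Rust router, and the prefix match below may not mirror its exact model-ID matching.

```python
def rename_max_tokens(payload: dict, model: str) -> dict:
    """Sketch: rename max_tokens for reasoning-model IDs before the upstream call."""
    out = dict(payload)
    # Hypothetical simplification of the o1 / o3 / gpt-5* model-ID check.
    if model.startswith(("o1", "o3", "gpt-5")) and "max_tokens" in out:
        out["max_completion_tokens"] = out.pop("max_tokens")
    return out

p = rename_max_tokens({"max_tokens": 256}, "gpt-5-mini")
assert p == {"max_completion_tokens": 256}
```

This is exactly the kind of preprocessing the transport-layer guarantee does not rule out: it happens before the passthrough sites run, so the upstream body can legitimately differ from the client-submitted one.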
## Related Documentation
- Reasoning Effort — how `reasoning_effort` / `thinking.budget_tokens` is mapped across providers
- Model Fallback — fallback chain configuration
- Architecture Overview — main architecture guide