API Reference¶
Continuum Router provides a comprehensive OpenAI-compatible API with additional administrative endpoints for monitoring and management. This reference describes all available endpoints, request/response formats, and error handling.
Table of Contents¶
- Overview
- Authentication
- Core API Endpoints
- Anthropic Native API
- Admin Endpoints
- Configuration Management API
- Error Handling
- Rate Limiting
- Streaming
- Examples
Overview¶
Base URL¶
Content Type¶
All requests and responses use application/json unless otherwise specified.
OpenAI Compatibility¶
Continuum Router is fully compatible with OpenAI API v1, supporting:
- Chat completions with streaming
- Text completions
- Embeddings (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002)
- Image generation (DALL-E, gpt-image-1)
- Image editing/inpainting (DALL-E 2, gpt-image-1)
- Image variations (DALL-E 2)
- Files API (upload, list, retrieve, delete)
- File resolution in chat completions (image_file references)
- Model listing
- Error response formats
Authentication¶
Continuum Router supports API key authentication with configurable enforcement modes.
Authentication Modes¶
The router supports two authentication modes for API endpoints:
| Mode | Behavior |
|---|---|
permissive (default) | Requests without API key are allowed. Requests with valid API keys are authenticated and can access user-specific features. |
blocking | Only authenticated requests are processed. Requests without valid API key receive 401 Unauthorized. |
Configuration¶
api_keys:
  # Authentication mode: "permissive" (default) or "blocking"
  mode: blocking
  # API key definitions
  api_keys:
    - key: "${API_KEY_1}"
      id: "key-production-1"
      user_id: "user-admin"
      organization_id: "org-main"
      scopes: [read, write, files, admin]
Protected Endpoints (when mode is blocking)¶
- /v1/chat/completions
- /v1/completions
- /v1/responses
- /v1/images/generations
- /v1/images/edits
- /v1/images/variations
- /v1/models
- /v1/embeddings
Note: Health endpoints (/health, /healthz) are always accessible without authentication. Admin, Files, and Metrics endpoints have separate authentication mechanisms.
Making Authenticated Requests¶
Include the API key in the Authorization header:
POST /v1/chat/completions HTTP/1.1
Authorization: Bearer sk-your-api-key
Content-Type: application/json
{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}
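The same request as a curl command (assuming the router listens on localhost:8080, as in the other examples in this document):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'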
Authentication Errors¶
When authentication fails, the API returns:
{
"error": {
"message": "Missing or invalid Authorization header. Expected: Bearer <api_key>",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
Status Codes:
401 Unauthorized: Missing or invalid API key
Core API Endpoints¶
Health Check¶
Check the health status of the router service.
Response:
Status Codes:
200: Service is healthy
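For example, a quick check against a locally running router (the /health path is the one referenced in the authentication notes above):

curl http://localhost:8080/health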
List Models¶
Retrieve all available models from all healthy backends.
Response:
{
"object": "list",
"data": [
{
"id": "gpt-4",
"object": "model",
"created": 1677610602,
"owned_by": "openai-compatible",
"permission": [],
"root": "gpt-4",
"parent": null
},
{
"id": "llama2:7b",
"object": "model",
"created": 1677610602,
"owned_by": "local-ollama",
"permission": [],
"root": "llama2:7b",
"parent": null
}
]
}
Status Codes:
- 200: Models retrieved successfully
- 503: All backends are unhealthy
Features:
- Model Aggregation: Combines models from all healthy backends
- Deduplication: Removes duplicate models across backends
- Caching: Results cached for 5 minutes by default
- Health Awareness: Only includes models from healthy backends
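Example request (assuming the standard OpenAI-compatible /v1/models path; add the Authorization header when the router runs in blocking mode):

curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer your-api-key"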
Get Single Model¶
Retrieve information about a specific model, including its availability status and optional rich metadata.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier (e.g., "gpt-4", "llama2:7b") |
Response (Basic):
{
"id": "gpt-4",
"object": "model",
"created": 1677610602,
"owned_by": "openai",
"available": true
}
Response (With Extended Metadata):
When model metadata is available in model-metadata.yaml, the response includes additional fields:
{
"id": "gpt-4o",
"object": "model",
"created": 1704067200,
"owned_by": "openai",
"available": true,
"supported_methods": ["chat.completions"],
"features": ["chat", "vision", "audio", "code"],
"max_tokens": 16384,
"metadata": {
"display_name": "GPT-4o",
"developer": "OpenAI",
"summary": "Multimodal optimized model with text, image, and audio capabilities.",
"knowledge_cutoff": "2023-10",
"relative_speed": 4,
"pricing": {
"input_tokens": 2.50,
"output_tokens": 10.0
},
"limits": {
"context_window": 128000,
"max_output": 16384
}
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
id | string | Model identifier |
object | string | Object type (always "model") |
created | integer | Unix timestamp when model was created |
owned_by | string | Organization that owns/provides the model |
available | boolean | Whether the model can currently be used (true if at least one healthy backend provides it) |
supported_methods | array | (Optional) API methods this model supports (e.g., ["chat.completions"], ["images.generations"]) |
features | array | (Optional) Model capabilities (e.g., ["chat", "vision", "function_calling"]) |
max_tokens | integer | (Optional) Maximum output tokens for this model |
metadata | object | (Optional) Rich metadata object with detailed model information |
Metadata Object Fields:
| Field | Type | Description |
|---|---|---|
display_name | string | Human-readable display name for the model |
developer | string | Developer or organization that created the model |
summary | string | Brief summary describing the model's capabilities |
knowledge_cutoff | string | Knowledge cutoff date (e.g., "2025-01") |
relative_speed | integer | Relative speed indicator (1-5, where 1 is slowest and 5 is fastest) |
pricing | object | Pricing information (input_tokens, output_tokens per 1M tokens) |
limits | object | Model limits (context_window, max_output in tokens) |
Supported Methods Mapping:
The supported_methods field is derived from model capabilities:
| Capability | API Method |
|---|---|
chat, vision, code, reasoning, audio, video, function_calling, tool | chat.completions |
embedding | embeddings |
image_generation | images.generations |
image_edit | images.edits |
image_variation | images.variations |
moderation | moderations |
Status Codes:
- 200: Model found and information returned
- 404: Model does not exist in any configured backend
Features:
- OpenAI-Compatible: Response format matches OpenAI API with additional extension fields
- Health-Aware: The available field reflects real-time backend health status
- Privacy: Does not expose internal backend information
- Backward Compatible: Extended fields are optional and only present when metadata is available
- Rich Metadata: Provides comprehensive model information for informed model selection
Model Availability Algorithm:
The available field is determined by the following algorithm:
| Condition | Result |
|---|---|
| Health checker enabled + At least one backend providing the model is healthy | true |
| Health checker enabled + All backends are unhealthy | false |
| Health checker disabled + Model's backend list is non-empty | true |
| Health checker disabled + Model's backend list is empty | false |
The algorithm uses short-circuit evaluation: it returns true as soon as the first healthy backend is found, avoiding unnecessary health checks for remaining backends. This optimizes performance when backends are generally healthy.
Performance Optimization:
- Fast Path: If the model cache is valid, lookup is O(n) where n = number of models
- Slow Path: If cache is empty/expired, triggers one-time aggregation with singleflight protection to prevent cache stampede
- Stale-While-Revalidate: Cache serves stale data while refreshing in background
Example Request:
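A representative request, assuming the OpenAI-compatible /v1/models/{model} path:

curl http://localhost:8080/v1/models/gpt-4o \
  -H "Authorization: Bearer your-api-key"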
Example Response (Model Available with Full Metadata):
{
"id": "gpt-4o",
"object": "model",
"created": 1704067200,
"owned_by": "openai",
"available": true,
"supported_methods": ["chat.completions"],
"features": ["chat", "vision", "audio", "code"],
"max_tokens": 16384,
"metadata": {
"display_name": "GPT-4o",
"developer": "OpenAI",
"summary": "Multimodal optimized model with text, image, and audio capabilities; faster and cheaper than GPT-4 Turbo.",
"knowledge_cutoff": "2023-10",
"relative_speed": 4,
"pricing": {
"input_tokens": 2.50,
"output_tokens": 10.0
},
"limits": {
"context_window": 128000,
"max_output": 16384
}
}
}
Example Response (Model Without Extended Metadata):
For models without metadata in model-metadata.yaml, only the basic fields are returned:
{
"id": "custom-model",
"object": "model",
"created": 1677610602,
"owned_by": "local",
"available": true
}
Example Response (Model Exists but Unavailable):
{
"id": "gpt-4o",
"object": "model",
"created": 1704067200,
"owned_by": "openai",
"available": false,
"supported_methods": ["chat.completions"],
"features": ["chat", "vision", "audio", "code"],
"max_tokens": 16384,
"metadata": {
"display_name": "GPT-4o",
"developer": "OpenAI"
}
}
Chat Completions¶
Generate chat completions using the OpenAI Chat API format.
Request Body:
{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Explain quantum computing in simple terms."
}
],
"temperature": 0.7,
"max_tokens": 150,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"stream": false,
"stop": null,
"logit_bias": {},
"user": "user123"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier (must be available on at least one healthy backend) |
messages | array | Yes | Array of message objects with role and content |
temperature | number | No | Sampling temperature (0.0 to 2.0, default: 1.0) |
max_tokens | integer | No | Maximum tokens to generate |
top_p | number | No | Nucleus sampling parameter (0.0 to 1.0) |
frequency_penalty | number | No | Frequency penalty (-2.0 to 2.0) |
presence_penalty | number | No | Presence penalty (-2.0 to 2.0) |
stream | boolean | No | Enable streaming response (default: false) |
stop | string/array | No | Stop sequences |
logit_bias | object | No | Token logit bias |
user | string | No | User identifier for tracking |
reasoning_effort | string | No | Reasoning effort level for reasoning-capable models. Valid values vary by backend (see below). Supported by O-series (o1, o3, o4) and GPT-5.2 thinking models (OpenAI), Gemini thinking models (Gemini). |
reasoning | object | No | Alternative nested format: {"effort": "high"}. Automatically normalized to reasoning_effort. |
Valid reasoning_effort values by backend:
| Backend | Effort Levels | Token Conversion | Notes |
|---|---|---|---|
| OpenAI O-series (o1, o3, o4-mini) | low, medium, high | Native | Standard reasoning models |
| OpenAI GPT-5.2 thinking (gpt-5.2, gpt-5.2-pro) | low, medium, high, xhigh | Native | xhigh only for GPT-5.2 family |
| Anthropic Claude (opus, sonnet-4) | none, minimal, low, medium, high | → budget_tokens (1K-32K) | Extended thinking |
| Gemini (2.x-flash, 2.5-pro, 3-*) | none*, minimal, low, medium, high | Native | *none only for Flash |
| Generic/llama.cpp | Any | Pass-through | Backend handles validation |
For detailed conversion tables and backend-specific behavior, see Reasoning Effort Architecture.
The router automatically validates values and applies intelligent fallbacks. Notably, xhigh is automatically downgraded to high for all backends except GPT-5.2 family models, allowing clients to always request xhigh without compatibility concerns.
Response (Non-streaming):
{
"id": "chatcmpl-123456789",
"object": "chat.completion",
"created": 1677652288,
"model": "gpt-3.5-turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing is a revolutionary computing paradigm that harnesses quantum mechanical phenomena..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 150,
"total_tokens": 175
}
}
Response (Streaming): When stream: true, the response uses Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"role":"assistant","content":""},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"content":"Quantum"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"content":" computing"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}
data: [DONE]
Status Codes:
- 200: Completion generated successfully
- 400: Invalid request format or parameters
- 404: Model not found on any healthy backend
- 502: Backend connection error
- 504: Request timeout
- 503: All backends unhealthy
Features:
- Model-Based Routing: Automatically routes to backends serving the requested model
- Load Balancing: Distributes load across healthy backends
- Streaming Support: Real-time response streaming via SSE
- Error Recovery: Automatic retry on transient failures
- Request Deduplication: Prevents duplicate processing of identical requests
- Reasoning Parameter Normalization: Automatically normalizes the nested reasoning format to the flat reasoning_effort format; removes reasoning parameters for models that don't support them
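For example, a request that sets reasoning_effort directly (a sketch; the model name and API key are placeholders):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "o3",
    "messages": [{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    "reasoning_effort": "high"
  }'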
Responses API¶
Generate responses using OpenAI's Responses API format. This endpoint provides an alternative interface to Chat Completions, internally converting requests to the Chat Completions format for backend processing.
Request Body:
{
"model": "gpt-4o",
"input": "Explain quantum computing in simple terms.",
"instructions": "You are a helpful assistant that explains complex topics simply.",
"max_output_tokens": 1000,
"temperature": 0.7,
"reasoning": {
"effort": "high"
}
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier (must be available on at least one healthy backend) |
input | string/array | Yes | The input text or array of input items |
instructions | string | No | System instructions for the model (converted to system message) |
max_output_tokens | integer | No | Maximum tokens to generate |
temperature | number | No | Sampling temperature (0.0 to 2.0) |
top_p | number | No | Nucleus sampling parameter (0.0 to 1.0) |
stream | boolean | No | Enable streaming response (default: false) |
include_reasoning | boolean | No | Include reasoning content in the response |
reasoning | object | No | Reasoning configuration with nested format (see below) |
tools | array | No | List of tools available for the model (flat format) |
tool_choice | string/object | No | Controls tool usage |
previous_response_id | string | No | Reference to a previous response for multi-turn conversations |
Tool Definition Format:
The Responses API uses a flat tool format where function properties are at the same level as the type field. This differs from the Chat Completions API which uses a nested function object.
{
"type": "function",
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
},
"strict": true
}
| Field | Type | Required | Description |
|---|---|---|---|
type | string | Yes | Must be "function" |
name | string | Yes | Name of the function |
description | string | No | Description of what the function does |
parameters | object | No | JSON Schema for function parameters |
strict | boolean | No | Enable strict parameter validation |
Other supported tool types include code_interpreter, file_search, web_search, and computer_use.
Multi-Modal Input Types:
When input is an array of items, each item can be a message containing multi-modal content parts. The following content part types are supported:
| Type | Description |
|---|---|
text | Plain text content |
input_text | Text content (alternative format for Responses API) |
input_file | File content (PDF, images, or other files) |
input_image | Image content |
image_url | Image from URL or base64 data |
Input File Format:
The input_file content part supports three input methods:
{
"type": "input_file",
"filename": "document.pdf",
"file_data": "data:application/pdf;base64,JVBERi0xLjQ..."
}
| Field | Type | Description |
|---|---|---|
filename | string | Optional filename for the file |
file_data | string | Base64 data URL (e.g., data:application/pdf;base64,...) |
file_url | string | External URL to the file (validated for SSRF) |
file_id | string | Reference to a file uploaded via Files API |
Input Image Format:
| Field | Type | Description |
|---|---|---|
image_url | string | Image URL or base64 data URL |
detail | string | Image detail level: low, high, or auto (default) |
Multi-Modal Request Example:
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": [
{
"type": "message",
"role": "user",
"content": [
{"type": "input_text", "text": "What does this document say?"},
{"type": "input_file", "filename": "report.pdf", "file_data": "data:application/pdf;base64,..."}
]
}
]
}'
Security Notes:
- External URLs in file_url are validated to prevent SSRF attacks
- Private IP addresses and localhost URLs are rejected
- Only HTTPS URLs are recommended for external files
- The file_id field references files uploaded via the Files API. Files are resolved and converted to base64 before sending to backends. File ownership is verified and a 10MB size limit applies for file injection
Reasoning Parameter:
The reasoning parameter controls the reasoning effort level for reasoning-capable models. It uses a nested format:
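For example (the same nested form shown in the request body above):

{
  "reasoning": {
    "effort": "high"
  }
}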
Valid effort values:
| Value | Description |
|---|---|
low | Minimal reasoning effort, faster responses |
medium | Balanced reasoning effort |
high | Maximum standard reasoning effort |
xhigh | Extended reasoning (GPT-5.2 family only) |
The router automatically converts this nested format to the flat reasoning_effort format used by Chat Completions backends. Invalid effort levels are rejected with a 400 Bad Request error.
Example Request with Reasoning:
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "o1",
"input": "Solve this complex mathematical problem step by step.",
"reasoning": {
"effort": "high"
}
}'
Response:
{
"id": "resp_abc123",
"object": "response",
"created_at": 1699000000,
"model": "o1",
"output": [
{
"type": "message",
"id": "msg_001",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Let me solve this step by step..."
}
]
}
],
"usage": {
"input_tokens": 25,
"output_tokens": 150
},
"status": "completed"
}
Status Codes:
- 200: Response generated successfully
- 400: Invalid request format, parameters, or invalid reasoning effort value
- 404: Model not found on any healthy backend
- 502: Backend connection error
- 504: Request timeout
- 503: All backends unhealthy
Features:
- Smart Routing: Routes requests based on backend capabilities - native pass-through for OpenAI/Azure, automatic conversion for others
- Native Pass-through: For OpenAI and Azure OpenAI backends, requests are forwarded directly to the /v1/responses endpoint, preserving all native features
- Automatic Conversion: For other backends (Anthropic, Gemini, vLLM, Ollama, etc.), converts Responses API format to their native format
- Reasoning Support: Full support for the reasoning parameter with type-safe validation
- Multi-Backend Support: Works with OpenAI, Anthropic, Gemini, Ollama, vLLM, and other backends
- Streaming Support: Real-time response streaming via SSE when stream: true
- Session Management: Supports multi-turn conversations via previous_response_id
Routing Strategy:
The router automatically determines the best strategy for each backend:
| Backend Type | Strategy | Description |
|---|---|---|
| OpenAI | Pass-through | Native Responses API support - requests forwarded directly |
| Azure OpenAI | Pass-through | Native Responses API support - requests forwarded directly |
| Anthropic | Native Convert | Converted to native Anthropic Messages API format with full PDF/image support |
| Gemini | Convert | Converted to Gemini generateContent API format |
| vLLM | Convert | Converted to Chat Completions format |
| Ollama | Convert | Converted to Chat Completions format |
| LlamaCpp | Convert | Converted to Chat Completions format |
| Generic | Convert | Converted to Chat Completions format |
Pass-through Benefits:
When using OpenAI or Azure OpenAI backends with pass-through mode:
- Native PDF file support (Chat Completions only supports images)
- Preserved reasoning state between turns for better performance
- Access to built-in tools (web_search, file_search, etc.)
- Better cache utilization (40-80% improvement per OpenAI documentation)
- Full compatibility with the latest OpenAI Responses API features
Anthropic Native Conversion Benefits:
When using Anthropic (Claude) backends with native conversion:
- Native PDF file support via Anthropic's document understanding
- Image file support with automatic format detection
- Extended thinking support for Claude 3+ models
- SSRF protection for external file URLs
- Media type whitelisting for security
Image Generation¶
Generate images using OpenAI's DALL-E, GPT Image models, or Google's Nano Banana (Gemini) models.
Request Body:
{
"model": "dall-e-3",
"prompt": "A serene Japanese garden with cherry blossoms",
"n": 1,
"size": "1024x1024",
"quality": "standard",
"response_format": "url"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Image model: dall-e-2, dall-e-3, gpt-image-1, gpt-image-1.5, gpt-image-1-mini, nano-banana, or nano-banana-pro |
prompt | string | Yes | Description of the image to generate |
n | integer | No | Number of images (1-10, varies by model) |
size | string | No | Image size (varies by model, see below) |
quality | string | No | Image quality (varies by model, see below) |
style | string | No | Image style: vivid or natural (DALL-E 3 only) |
response_format | string | No | Response format: url or b64_json |
output_format | string | No | Output file format: png, jpeg, webp (GPT Image models only, default: png) |
output_compression | integer | No | Compression level 0-100 for jpeg/webp (GPT Image models only) |
background | string | No | Background: transparent, opaque, auto (GPT Image models only) |
stream | boolean | No | Enable streaming for partial images (GPT Image models only, default: false) |
partial_images | integer | No | Number of partial images 0-3 during streaming (GPT Image models only) |
user | string | No | User identifier for tracking |
Model-specific constraints:
| Model | Sizes | n | Quality | Notes |
|---|---|---|---|---|
dall-e-2 | 256x256, 512x512, 1024x1024 | 1-10 | N/A | Classic DALL-E 2 |
dall-e-3 | 1024x1024, 1792x1024, 1024x1792 | 1 | standard, hd | High quality with prompt revision |
gpt-image-1 | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | Latest GPT Image model, supports streaming |
gpt-image-1.5 | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | 4x faster, better text rendering |
gpt-image-1-mini | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | Cost-effective option |
nano-banana | 256x256 to 1024x1024 | 1-4 | N/A | Gemini 2.5 Flash Image (fast) |
nano-banana-pro | 256x256 to 4096x4096 | 1-4 | N/A | Gemini 2.0 Flash Image (advanced, up to 4K) |
Quality Parameter (GPT Image Models):
For backward compatibility, standard maps to medium and hd maps to high when using GPT Image models.
| Quality | Description |
|---|---|
low | Fast generation with lower quality |
medium | Balanced quality and speed (default) |
high | Best quality, slower generation |
auto | Model selects optimal quality |
Output Format Options (GPT Image Models):
| Format | Description | Supports Transparency |
|---|---|---|
png | Lossless format (default) | Yes |
jpeg | Lossy format, smaller file size | No |
webp | Modern format, good compression | Yes |
Note: Transparent background (background: "transparent") requires png or webp format.
Nano Banana (Gemini) Models:
Nano Banana provides access to Google's Gemini image generation capabilities through an OpenAI-compatible interface:
- nano-banana: Maps to Gemini 2.5 Flash Image - fast, general-purpose image generation
- nano-banana-pro: Maps to Gemini 2.0 Flash Image - advanced model with high-resolution support (up to 4K)
Nano Banana Size Mapping:
The router automatically converts OpenAI-style size parameters to Gemini's aspectRatio and imageSize format:
| OpenAI Size | Gemini aspectRatio | Gemini imageSize | Notes |
|---|---|---|---|
256x256 | 1:1 | 1K | Falls back to Gemini minimum |
512x512 | 1:1 | 1K | Falls back to Gemini minimum |
1024x1024 | 1:1 | 1K | Default |
1536x1024 | 3:2 | 1K | Landscape (new) |
1024x1536 | 2:3 | 1K | Portrait (new) |
1024x1792 | 9:16 | 1K | Tall portrait |
1792x1024 | 16:9 | 1K | Wide landscape |
2048x2048 | 1:1 | 2K | Pro only |
4096x4096 | 1:1 | 4K | Pro only |
auto | 1:1 | 1K | Default fallback |
The conversion sends the following Gemini API structure:
{
"contents": [{"parts": [{"text": "Your prompt"}]}],
"generationConfig": {
"imageConfig": {
"aspectRatio": "3:2",
"imageSize": "1K"
}
}
}
Example Nano Banana Request:
{
"model": "nano-banana",
"prompt": "A white siamese cat with blue eyes, photorealistic",
"n": 1,
"size": "1024x1024",
"response_format": "b64_json"
}
Response:
{
"created": 1677652288,
"data": [
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/...",
"revised_prompt": "A tranquil Japanese garden featuring..."
}
]
}
Response (with b64_json):
{
"created": 1677652288,
"data": [
{
"b64_json": "/9j/4AAQSkZJRgABAQAA...",
"revised_prompt": "A tranquil Japanese garden featuring..."
}
]
}
Nano Banana Response Notes:
- When using response_format: "url" with Nano Banana, the image is returned as a data URL (data:image/png;base64,...) since Gemini's native API returns inline base64 data
- The revised_prompt field contains any text response from Gemini describing the generated image
Streaming Image Generation (GPT Image Models):
When stream: true is specified for GPT Image models, the response will be streamed as Server-Sent Events (SSE):
Example Streaming Request:
{
"model": "gpt-image-1",
"prompt": "A beautiful sunset over mountains",
"stream": true,
"partial_images": 2,
"response_format": "b64_json"
}
Streaming Response Format:
data: {"type":"image_generation.partial_image","partial_image_index":0,"b64_json":"...","created":1702345678}
data: {"type":"image_generation.partial_image","partial_image_index":1,"b64_json":"...","created":1702345679}
data: {"type":"image_generation.complete","b64_json":"...","created":1702345680}
data: {"type":"image_generation.usage","usage":{"input_tokens":25,"output_tokens":1024}}
data: {"type":"done"}
SSE Event Types:
| Event Type | Description |
|---|---|
image_generation.partial_image | Intermediate image during generation |
image_generation.complete | Final complete image |
image_generation.usage | Token usage information (for cost tracking) |
done | Stream completion marker |
Example GPT Image Request with New Options:
{
"model": "gpt-image-1.5",
"prompt": "A white cat with blue eyes, photorealistic",
"size": "auto",
"quality": "high",
"output_format": "webp",
"output_compression": 85,
"background": "transparent",
"response_format": "b64_json"
}
Status Codes:
- 200: Image(s) generated successfully
- 400: Invalid request (e.g., invalid size for model, n > 1 for DALL-E 3)
- 401: Invalid API key
- 429: Rate limit exceeded
- 500: Backend error
- 503: Gemini backend unavailable (for Nano Banana models)
Timeout Configuration: Image generation requests use a configurable timeout (default: 3 minutes). See timeouts.request.image_generation in configuration.
Image Edit (Inpainting)¶
Edit existing images using OpenAI's inpainting capabilities. This endpoint allows you to modify specific regions of an image based on a text prompt and optional mask. Supports GPT Image models and DALL-E 2.
Request Parameters (multipart/form-data):
| Parameter | Type | Required | Description |
|---|---|---|---|
image | file | Yes | The source image to edit (PNG, < 4MB, square) |
prompt | string | Yes | Description of the desired edit |
mask | file | No | Mask image indicating edit regions (PNG, same dimensions as image) |
model | string | No | Model to use (default: gpt-image-1) |
n | integer | No | Number of images to generate (1-10, default: 1) |
size | string | No | Output size (model-dependent, default: 1024x1024) |
response_format | string | No | Response format: url or b64_json (default: url) |
user | string | No | Unique user identifier for tracking |
Supported Models and Sizes:
| Model | Sizes | Notes |
|---|---|---|
gpt-image-1 | 1024x1024, 1536x1024, 1024x1536, auto | Latest GPT Image model (recommended) |
gpt-image-1-mini | 1024x1024, 1536x1024, 1024x1536, auto | Cost-optimized version |
gpt-image-1.5 | 1024x1024, 1536x1024, 1024x1536, auto | Newest with improved instruction following |
dall-e-2 | 256x256, 512x512, 1024x1024 | Legacy DALL-E 2 model |
Note: DALL-E 3 and Gemini (nano-banana) do NOT support image editing via this endpoint. Gemini uses semantic masking via natural language, which is incompatible with OpenAI's mask-based editing format.
Image Requirements:
- Format: PNG only
- Size: Less than 4MB
- Dimensions: Must be square (width equals height)
Mask Requirements:
- Format: PNG with alpha channel (RGBA)
- Dimensions: Must match the source image exactly
- Transparent areas: Indicate regions to edit/generate
- Opaque areas: Indicate regions to preserve
Example Request:
curl -X POST http://localhost:8080/v1/images/edits \
-F "image=@source_image.png" \
-F "mask=@mask.png" \
-F "prompt=A sunlit indoor lounge area with a pool containing a flamingo" \
-F "n=1" \
-F "size=1024x1024" \
-F "response_format=url"
Example Request (without mask):
curl -X POST http://localhost:8080/v1/images/edits \
-F "image=@source_image.png" \
-F "prompt=Add a sunset in the background" \
-F "n=1" \
-F "size=512x512"
Response:
{
"created": 1677652288,
"data": [
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
}
]
}
Response (with b64_json):
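The shape mirrors the b64_json response shown for image generation; the value below is illustrative:

{
  "created": 1677652288,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUg..."
    }
  ]
}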
Status Codes:
- 200: Image(s) edited successfully
- 400: Invalid request (e.g., non-square image, invalid size, missing required field)
- 401: Invalid API key
- 503: OpenAI backend unavailable
Error Examples:
Non-square image:
{
"error": {
"message": "Image must be square (800x600 is not square)",
"type": "invalid_request_error",
"param": "image",
"code": "image_not_square"
}
}
Mask dimension mismatch:
{
"error": {
"message": "Mask dimensions (256x256) do not match image dimensions (512x512)",
"type": "invalid_request_error",
"param": "mask",
"code": "dimension_mismatch"
}
}
Unsupported model:
{
"error": {
"message": "Model 'dall-e-3' does not support image editing. Supported models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5, dall-e-2. Note: dall-e-3 does NOT support image editing.",
"type": "invalid_request_error",
"param": "model",
"code": "unsupported_model"
}
}
Notes:
- Supported models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5, dall-e-2
- DALL-E 3 does NOT support image editing via API
- Gemini (nano-banana) is NOT supported - uses different editing approach (semantic masking)
- When no mask is provided, the entire image may be modified
- The source image should have transparent regions if editing without a mask
- Request timeout uses the image generation timeout configuration
Image Variations¶
Generate variations of an existing image using OpenAI's DALL-E 2 model.
Form Fields:
| Parameter | Type | Required | Description |
|---|---|---|---|
image | file | Yes | Source image for variations (PNG, < 4MB, must be square) |
model | string | No | Model to use (default: dall-e-2) |
n | integer | No | Number of variations to generate (1-10, default: 1) |
size | string | No | Output size: 256x256, 512x512, 1024x1024 (default: 1024x1024) |
response_format | string | No | Response format: url or b64_json (default: url) |
user | string | No | User identifier for tracking |
Example Request:
curl -X POST http://localhost:8080/v1/images/variations \
-F "image=@source_image.png" \
-F "model=dall-e-2" \
-F "n=2" \
-F "size=512x512" \
-F "response_format=url"
Response:
{
"created": 1677652288,
"data": [
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
},
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
}
]
}
Response (with b64_json):
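As with generation, each item carries a b64_json field instead of a URL (illustrative values, two items for n=2):

{
  "created": 1677652288,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUg..."
    },
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUg..."
    }
  ]
}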
Model Support:
| Model | Variations Support | Notes |
|---|---|---|
dall-e-2 | Yes (native) | Full support, 1-10 variations |
dall-e-3 | No | Not supported by OpenAI API |
gpt-image-1 | No | Not supported |
nano-banana | No | Gemini does not support variations API |
nano-banana-pro | No | Gemini does not support variations API |
Image Requirements:
- Format: PNG only
- Size: Less than 4MB
- Dimensions: Must be square (width == height)
- Supported input sizes: Any square dimensions (will be processed by the model)
Error Scenarios:
| Error | Status | Description |
|---|---|---|
| Image not PNG | 400 | Only PNG format is supported |
| Image not square | 400 | Image dimensions must be equal |
| Image too large | 400 | Image exceeds 4MB size limit |
| Model not supported | 400 | Requested model doesn't support variations |
| Missing image | 400 | Image field is required |
| Invalid n value | 400 | n must be between 1 and 10 |
| Invalid size | 400 | Size must be one of the supported values |
Status Codes:
- 200: Variation(s) generated successfully
- 400: Invalid request (invalid format, non-square image, unsupported model)
- 401: Invalid API key
- 429: Rate limit exceeded
- 500: Backend error
- 503: Backend unavailable
Text Completions¶
Generate text completions using the OpenAI Completions API format.
Request Body:
{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Once upon a time in a distant galaxy",
"max_tokens": 100,
"temperature": 0.7,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"stream": false,
"stop": null,
"logit_bias": {},
"user": "user123"
}
Response:
{
"id": "cmpl-123456789",
"object": "text_completion",
"created": 1677652288,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": ", there lived a young explorer named Zara who dreamed of discovering new worlds...",
"index": 0,
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 90,
"total_tokens": 100
}
}
Status Codes: Same as Chat Completions
Embeddings¶
Generate embeddings for input text using the OpenAI Embeddings API format.
Request Body:
{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog",
"encoding_format": "float",
"dimensions": 512,
"user": "user123"
}
Request Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | ID of the model to use (e.g., text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002) |
input | string or array | Yes | Input text to embed. Can be a single string, array of strings, or array of token arrays |
encoding_format | string | No | Format to return the embeddings: float or base64 (default: float) |
dimensions | integer | No | Number of dimensions for the output embeddings (only supported for text-embedding-3-* models) |
user | string | No | Unique identifier for the end-user |
Input Format Examples:
Single string:
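For example, mirroring the request body above:

{
  "model": "text-embedding-3-small",
  "input": "The quick brown fox jumps over the lazy dog"
}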
Array of strings:
{
"model": "text-embedding-3-large",
"input": [
"First text to embed",
"Second text to embed",
"Third text to embed"
]
}
Token arrays:
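Illustrative only; the integers below are placeholder token IDs, not a real tokenization:

{
  "model": "text-embedding-ada-002",
  "input": [[1212, 318, 257, 1332], [3666, 1438, 318, 4466]]
}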
Response:
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023064255, -0.009327292, 0.0045318254, ...]
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 8,
"total_tokens": 8
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
object | string | Always "list" |
data | array | Array of embedding objects |
data[].object | string | Always "embedding" |
data[].index | integer | Index of the embedding in the input array |
data[].embedding | array | Embedding vector (array of floats) |
model | string | Model used to generate the embedding |
usage | object | Token usage information |
usage.prompt_tokens | integer | Number of tokens in the input |
usage.total_tokens | integer | Total tokens used (same as prompt_tokens for embeddings) |
Supported Models:
| Backend | Model | Notes |
|---|---|---|
| OpenAI | text-embedding-3-small | 1536 dimensions by default, supports dimensions parameter |
| OpenAI | text-embedding-3-large | 3072 dimensions by default, supports dimensions parameter |
| OpenAI | text-embedding-ada-002 | 1536 dimensions, legacy model |
| Gemini | text-embedding-004 | Via OpenAI-compatible endpoint |
| Self-hosted | bge-m3 | 1024 dimensions, 100+ languages, 8192 context. Supports dense, sparse, and ColBERT retrieval |
| Self-hosted | bge-large-en-v1.5 | 1024 dimensions, English-only, 512 context |
| Self-hosted | multilingual-e5-large | 1024 dimensions, 100+ languages, 514 context |
| vLLM | Deployment-specific | Depends on deployed model |
| llama.cpp | Deployment-specific | Native /v1/embeddings support |
| TEI | Deployment-specific | Hugging Face Text Embeddings Inference server |
| Ollama | Deployment-specific | Via Ollama embedding models |
Status Codes:
- 200: Embeddings generated successfully
- 400: Invalid request (missing model/input, invalid dimensions)
- 401: Invalid API key
- 404: Model not found or doesn't support embeddings
- 429: Rate limit exceeded
- 500: Backend error
- 503: Backend unavailable
Features:
- Multiple Input Formats: Supports single string, array of strings, or token arrays
- Dimension Control: For text-embedding-3 models, specify custom dimensions for reduced vector size
- Backend Agnostic: Routes to appropriate backend based on model
- Load Balancing: Applies configured load balancing strategy
- Error Handling: Provides detailed error messages for invalid requests
Example Request:
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog",
"encoding_format": "float"
}'
Example with Multiple Inputs:
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "text-embedding-3-large",
"input": [
"First document text",
"Second document text",
"Third document text"
]
}'
Example with Custom Dimensions:
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "text-embedding-3-small",
"input": "Sample text",
"dimensions": 512
}'
Rerank¶
Rerank documents based on their relevance to a query using the Cohere-compatible Rerank API. This is commonly used as a second-stage retrieval step after initial vector search to improve accuracy.
Request Body:
{
"model": "bge-reranker-v2-m3",
"query": "What is machine learning?",
"documents": ["Document 1 content", "Document 2 content", "Document 3 content"],
"top_n": 3,
"return_documents": false
}
Request Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | ID of the reranking model to use (e.g., bge-reranker-v2-m3, rerank-english-v3.0, jina-reranker-v2-base-multilingual) |
query | string | Yes | The search query to compare documents against |
documents | array | Yes | List of documents to rerank. Can be an array of strings or an array of objects with a text field |
top_n | integer | No | Number of top results to return. If not specified, returns all documents ranked |
return_documents | boolean | No | Whether to return the document text in the response (default: false) |
max_chunks_per_doc | integer | No | Maximum number of chunks to process per document for long document handling |
Document Format Options:
Simple string array:
{
"model": "bge-reranker-v2-m3",
"query": "What is deep learning?",
"documents": [
"Deep learning uses neural networks with multiple layers",
"Machine learning is a subset of artificial intelligence",
"Natural language processing deals with text understanding"
]
}
Structured documents with text field:
{
"model": "rerank-english-v3.0",
"query": "What is deep learning?",
"documents": [
{"text": "Deep learning uses neural networks with multiple layers"},
{"text": "Machine learning is a subset of artificial intelligence"}
]
}
Response:
{
"results": [
{
"index": 0,
"relevance_score": 0.95
},
{
"index": 2,
"relevance_score": 0.72
},
{
"index": 1,
"relevance_score": 0.45
}
],
"model": "bge-reranker-v2-m3",
"id": "rerank-abc123",
"usage": {
"prompt_tokens": 150,
"total_tokens": 150
}
}
Response with Documents (when return_documents: true):
{
"results": [
{
"index": 0,
"relevance_score": 0.95,
"document": {
"text": "Deep learning uses neural networks with multiple layers"
}
}
],
"model": "bge-reranker-v2-m3"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
results | array | List of reranked results ordered by relevance score (highest first) |
results[].index | integer | The index of the document in the original input list |
results[].relevance_score | number | Relevance score (typically 0.0 to 1.0, higher is more relevant) |
results[].document | object | The document text (only present if return_documents was true) |
model | string | The model used for reranking |
id | string | Unique identifier for this request (optional) |
usage | object | Token usage information (optional) |
Supported Backends:
| Backend | Endpoint | Notes |
|---|---|---|
| vLLM | /v1/rerank | Cohere-compatible, supports BGE, Jina rerankers |
| llama.cpp | /v1/rerank | Requires --reranking flag at startup |
| Hugging Face TEI | /rerank | Text Embeddings Inference server |
| Cohere API | /v1/rerank | Native Cohere rerank endpoint |
| Jina AI | /v1/rerank | Native Jina rerank endpoint |
Status Codes:
- 200: Documents reranked successfully
- 400: Invalid request (missing model/query/documents, empty documents array)
- 401: Invalid API key
- 404: Model not found or doesn't support reranking
- 429: Rate limit exceeded
- 500: Backend error
- 503: Backend unavailable
Use Cases:
- Two-stage retrieval: Use vector search to retrieve candidates, then rerank for higher precision
- RAG systems: Improve context quality by reranking retrieved documents before LLM processing
- Search result refinement: Reorder search results based on semantic relevance
Example Request:
curl -X POST http://localhost:8080/v1/rerank \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "bge-reranker-v2-m3",
"query": "What are the benefits of renewable energy?",
"documents": [
"Solar panels convert sunlight into electricity",
"Wind turbines generate power from wind",
"Coal is a fossil fuel used for electricity"
],
"top_n": 2
}'
Sparse Embeddings¶
Generate sparse embeddings for input text using the TEI/Jina-compatible Sparse Embedding API. Sparse embeddings (e.g., SPLADE) preserve lexical information through explicit term weights, complementing dense embeddings for hybrid search.
Request Body:
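For example (mirroring the example request further below):

{
  "model": "naver/splade-v3",
  "input": "What are the benefits of sparse embeddings for search?"
}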
Request Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | ID of the sparse embedding model to use (e.g., naver/splade-v3, naver/splade-cocondenser-ensembledistil) |
input | string or array | Yes | Input text to embed. Can be a single string or an array of strings |
Input Format Examples:
Single string:
Array of strings:
{
"model": "naver/splade-v3",
"input": [
"First text to embed",
"Second text to embed",
"Third text to embed"
]
}
Response:
{
"data": [
{
"index": 0,
"sparse_embedding": {
"indices": [123, 456, 789, 1024, 2048],
"values": [0.5, 0.3, 0.1, 0.8, 0.2]
}
}
],
"model": "naver/splade-v3",
"usage": {
"prompt_tokens": 8,
"total_tokens": 8
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
data | array | Array of sparse embedding objects |
data[].index | integer | Index of the embedding in the input array |
data[].sparse_embedding | object | The sparse embedding vector |
data[].sparse_embedding.indices | array | Indices of non-zero elements in the vocabulary |
data[].sparse_embedding.values | array | Values at the corresponding indices |
model | string | Model used to generate the embedding (optional) |
usage | object | Token usage information (optional) |
Understanding Sparse Vectors:
A sparse vector only stores non-zero values along with their vocabulary indices. For example:
- indices: [123, 456, 789] - positions in the vocabulary
- values: [0.5, 0.3, 0.1] - weights for those terms
This is memory-efficient for high-dimensional vectors with few non-zero elements (typically 100-500 non-zero values out of 30,000+ vocabulary size).
Supported Backends:
| Backend | Endpoint | Notes |
|---|---|---|
| vLLM | /embed_sparse | Supports SPLADE models via OpenAI-compatible server |
| Hugging Face TEI | /embed_sparse | Requires --pooling splade flag |
| Jina AI | Native | Native sparse embedding support |
Status Codes:
- 200: Sparse embeddings generated successfully
- 400: Invalid request (missing model/input, empty input)
- 401: Invalid API key
- 404: Model not found or doesn't support sparse embeddings
- 429: Rate limit exceeded
- 500: Backend error
- 503: Backend unavailable
Use Cases:
- Hybrid search: Combine dense (semantic) and sparse (lexical) retrieval for better results
- Keyword matching: Exact term matching with learned weights
- Domain-specific retrieval: Better handling of specialized terminology and rare words
- Cross-lingual retrieval: Some SPLADE models support multilingual sparse retrieval
Example Request:
curl -X POST http://localhost:8080/embed_sparse \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "naver/splade-v3",
"input": "What are the benefits of sparse embeddings for search?"
}'
Example with Multiple Inputs:
curl -X POST http://localhost:8080/embed_sparse \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "naver/splade-v3",
"input": [
"First query text",
"Second query text"
]
}'
Files API¶
The Files API allows you to upload, manage, and use files in chat completions. Uploaded files can be referenced in messages using the image_file content type, and the router automatically resolves these references by injecting the file content.
Upload File¶
Upload a file for use in chat completions.
Form Fields:
| Field | Type | Required | Description |
|---|---|---|---|
file | file | Yes | The file to upload |
purpose | string | Yes | Purpose of the file: vision, assistants, fine-tune, batch, user_data, evals |
Example:
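A representative upload, assuming the standard OpenAI-compatible /v1/files path:

curl -X POST http://localhost:8080/v1/files \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@image.png" \
  -F "purpose=vision"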
Response:
{
"id": "file-abc123def456",
"object": "file",
"bytes": 12345,
"created_at": 1699061776,
"filename": "image.png",
"purpose": "vision"
}
Status Codes:
- 200: File uploaded successfully
- 400: Invalid request (missing file, invalid purpose)
- 413: File too large (exceeds configured max_file_size)
List Files¶
Retrieve a list of uploaded files.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
purpose | string | No | Filter by purpose |
Response:
{
"object": "list",
"data": [
{
"id": "file-abc123def456",
"object": "file",
"bytes": 12345,
"created_at": 1699061776,
"filename": "image.png",
"purpose": "vision"
}
]
}
Get File Metadata¶
Retrieve metadata for a specific file.
Response:
{
"id": "file-abc123def456",
"object": "file",
"bytes": 12345,
"created_at": 1699061776,
"filename": "image.png",
"purpose": "vision"
}
Status Codes:
- 200: File metadata retrieved
- 404: File not found
Download File Content¶
Download the content of an uploaded file.
Response: Binary file content with appropriate Content-Type header.
Status Codes:
- 200: File content returned
- 404: File not found
Delete File¶
Delete an uploaded file.
Response:
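A minimal sketch of the expected shape, following the OpenAI Files API deletion convention (actual fields may differ):

{
  "id": "file-abc123def456",
  "object": "file",
  "deleted": true
}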
Status Codes:
- 200: File deleted successfully
- 404: File not found
File Resolution in Chat Completions¶
The router automatically resolves file references in chat completion requests. When a message contains an image_file content block, the router:
- Validates the file ID format
- Loads the file content from storage
- Converts the file to a base64 data URL
- Replaces the image_file block with an image_url block
Supported File Types¶
File resolution in chat completions supports the following file types:
| File Type | MIME Type | Support |
|---|---|---|
| PNG | image/png | All backends |
| JPEG | image/jpeg | All backends |
| GIF | image/gif | All backends |
| WebP | image/webp | All backends |
| PDF | application/pdf | OpenAI, Anthropic |
| Plain Text | text/plain | Anthropic |
Note: PDF and plain text support is available for Anthropic backends (and PDF for OpenAI). The file transformers automatically convert document files to the appropriate format for each backend (OpenAI uses file blocks for PDF, Anthropic uses document blocks for both PDF and plain text). Non-image and non-document files will return a 400 Bad Request error with a helpful message indicating the supported file types.
Request with File Reference:
{
"model": "gpt-4-vision-preview",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_file", "image_file": {"file_id": "file-abc123def456"}}
]
}
]
}
Transformed Request (sent to backend):
{
"model": "gpt-4-vision-preview",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
]
}
]
}
File Resolution Errors:
| Error | Status | Description |
|---|---|---|
| Invalid file ID format | 400 | File ID must start with file- |
| File not found | 404 | Referenced file does not exist |
| Too many file references | 400 | Request contains more than 20 file references |
| Resolution timeout | 504 | File resolution took longer than 30 seconds |
Supported MIME Types:
- image/png - All backends
- image/jpeg - All backends
- image/gif - All backends
- image/webp - All backends
- application/pdf - OpenAI, Anthropic (max 32MB, 100 pages)
- text/plain - Anthropic (max 32MB)
Anthropic Native API¶
Continuum Router provides native Anthropic API endpoints that allow clients to use Anthropic's API format directly while still benefiting from the router's load balancing, failover, and multi-backend routing capabilities.
Messages API¶
Send messages using Anthropic's native API format.
Headers:
| Header | Required | Description |
|---|---|---|
x-api-key | Yes* | API key for authentication (required in blocking mode) |
anthropic-version | No | API version (e.g., 2023-06-01). Forwarded to native Anthropic backends |
anthropic-beta | No | Beta features (e.g., prompt-caching-2024-07-31). Forwarded to native Anthropic backends |
Request Body:
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "Hello, Claude!"
}
],
"stream": false
}
Request Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier |
max_tokens | integer | Yes | Maximum tokens to generate |
messages | array | Yes | Array of message objects |
system | string/array | No | System prompt (string or array of content blocks) |
stream | boolean | No | Enable streaming (default: false) |
temperature | number | No | Sampling temperature (0-1) |
top_p | number | No | Nucleus sampling parameter |
top_k | integer | No | Top-k sampling parameter |
stop_sequences | array | No | Stop sequences |
metadata | object | No | Request metadata |
tools | array | No | Tool definitions for function calling |
tool_choice | object | No | Tool choice configuration |
Response (non-streaming):
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! How can I help you today?"
}
],
"model": "claude-sonnet-4-20250514",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 12,
"output_tokens": 15,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0
}
}
Backend Routing:
The router automatically transforms requests based on the target backend:
| Backend Type | Transformation |
|---|---|
| Anthropic | Pass-through with native API format |
| Gemini | Direct transformation to Gemini format |
| OpenAI/vLLM/Ollama | Transform to OpenAI format, then transform response back |
Token Counting¶
Count the number of tokens in a message request payload.
Request Body:
{
"model": "claude-sonnet-4-20250514",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"system": "You are a helpful assistant."
}
Response:
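A minimal sketch following Anthropic's count_tokens response shape (the exact count depends on the model's tokenizer):

{
  "input_tokens": 19
}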
Tiered Token Counting¶
Token counting uses different strategies based on the backend type:
| Backend Type | Strategy | Accuracy |
|---|---|---|
| Anthropic | Native /v1/messages/count_tokens API proxy | Exact |
| llama.cpp | Backend /tokenize endpoint proxy | Exact |
| vLLM | Backend /tokenize endpoint proxy | Exact |
| Others | Character-based estimation (~4 chars/token) | Approximate |
Note: For backends that don't support native tokenization, the router uses a character-based estimation of approximately 4 characters per token. This provides a reasonable approximation for planning purposes but may not match the exact token count used by the model.
Models List¶
List available models in Anthropic API format.
Response:
{
"data": [
{
"id": "claude-sonnet-4-20250514",
"type": "model",
"display_name": "Claude Sonnet 4",
"created_at": "2025-05-14T00:00:00Z"
},
{
"id": "claude-opus-4-20250514",
"type": "model",
"display_name": "Claude Opus 4",
"created_at": "2025-05-14T00:00:00Z"
}
],
"has_more": false,
"first_id": "claude-sonnet-4-20250514",
"last_id": "claude-opus-4-20250514"
}
Claude Code Compatibility¶
The Anthropic Native API includes full compatibility with Claude Code and other Anthropic API clients that require advanced features.
Prompt Caching¶
Prompt caching is fully supported through the cache_control field on content blocks:
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"system": [
{
"type": "text",
"text": "You are a helpful coding assistant with extensive knowledge...",
"cache_control": {"type": "ephemeral"}
}
],
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Previous context...",
"cache_control": {"type": "ephemeral"}
},
{
"type": "text",
"text": "Current question"
}
]
}
]
}
Supported cache_control locations:
- System prompt text blocks
- User message text blocks
- User message image blocks
- Tool definitions
- Tool use blocks
- Tool result blocks
Beta Features Header¶
The anthropic-beta header is forwarded to native Anthropic backends, enabling beta features:
curl -X POST http://localhost:8080/anthropic/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "anthropic-beta: prompt-caching-2024-07-31,interleaved-thinking-2025-05-14" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello"}]
}'
Cache Usage in Streaming¶
When streaming with native Anthropic backends, cache usage information is included in the message_start event:
{
"type": "message_start",
"message": {
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [],
"model": "claude-sonnet-4-20250514",
"usage": {
"input_tokens": 2159,
"cache_creation_input_tokens": 2048,
"cache_read_input_tokens": 0
}
}
}
Interleaved Thinking¶
Extended thinking with interleaved output is supported in streaming mode. When the model produces thinking and text content alternately, the streaming events properly represent this interleaved structure with appropriate content_block_start, content_block_delta, and content_block_stop events for each block.
Admin Endpoints¶
Backend Status¶
Get detailed status information about all configured backends.
Response:
{
"backends": [
{
"name": "local-ollama",
"url": "http://localhost:11434",
"is_healthy": true,
"consecutive_failures": 0,
"consecutive_successes": 15,
"last_check": "2024-01-15T10:30:45Z",
"last_error": null,
"response_time_ms": 45,
"models": ["llama2", "mistral", "codellama"],
"weight": 1,
"total_requests": 150,
"failed_requests": 2
},
{
"name": "openai-compatible",
"url": "https://api.openai.com",
"is_healthy": false,
"consecutive_failures": 3,
"consecutive_successes": 0,
"last_check": "2024-01-15T10:29:30Z",
"last_error": "Connection timeout after 5s",
"response_time_ms": null,
"models": [],
"weight": 1,
"total_requests": 45,
"failed_requests": 8
}
],
"healthy_count": 1,
"total_count": 2,
"summary": {
"total_models": 3,
"total_requests": 195,
"total_failures": 10,
"average_response_time_ms": 45
}
}
Fields:
| Field | Type | Description |
|---|---|---|
name | string | Backend identifier from configuration |
url | string | Backend base URL |
is_healthy | boolean | Current health status |
consecutive_failures | integer | Sequential failed health checks |
consecutive_successes | integer | Sequential successful health checks |
last_check | string | ISO timestamp of last health check |
last_error | string/null | Last error message if unhealthy |
response_time_ms | integer/null | Last health check response time |
models | array | Available models from this backend |
weight | integer | Load balancing weight |
total_requests | integer | Total requests routed to this backend |
failed_requests | integer | Failed requests to this backend |
Status Codes:
200: Backend status retrieved successfully
Service Health¶
Get overall service health and component status.
Response:
{
"status": "healthy",
"version": "1.0.0",
"uptime": "2h 15m 30s",
"timestamp": "2024-01-15T10:30:45Z",
"services": {
"backend_service": {
"status": "healthy",
"message": "All backends operational",
"healthy_backends": 2,
"total_backends": 2
},
"model_service": {
"status": "healthy",
"message": "Model cache operational",
"cached_models": 15,
"cache_hit_rate": 0.95,
"last_refresh": "2024-01-15T10:25:00Z"
},
"proxy_service": {
"status": "healthy",
"message": "Request routing operational",
"total_requests": 1250,
"failed_requests": 12,
"average_latency_ms": 85
},
"health_service": {
"status": "healthy",
"message": "Health monitoring active",
"check_interval": "30s",
"last_check": "2024-01-15T10:30:00Z"
}
},
"metrics": {
"requests_per_second": 5.2,
"error_rate": 0.008,
"memory_usage_mb": 125,
"cpu_usage_percent": 15.5
}
}
Status Values:
healthy: Service operating normally
degraded: Service operating with reduced functionality
unhealthy: Service experiencing issues
Status Codes:
200: Service health retrieved successfully
503: Service is unhealthy
Configuration Summary¶
Get current configuration summary including hot reload status.
Response:
{
"server": {
"bind_address": "0.0.0.0:8080",
"workers": 4,
"connection_pool_size": 100
},
"backends": {
"count": 3,
"names": ["openai", "local-ollama", "gemini"]
},
"health_checks": {
"interval": "30s",
"timeout": "10s",
"unhealthy_threshold": 3,
"healthy_threshold": 2
},
"rate_limiting": {
"enabled": false
},
"circuit_breaker": {
"enabled": true
},
"selection_strategy": "RoundRobin",
"hot_reload": {
"available": true,
"note": "Configuration changes will be automatically detected and applied"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
server | object | Server configuration (bind_address, workers, connection_pool_size) |
backends | object | Backend configuration summary (count, names) |
health_checks | object | Health check settings |
rate_limiting | object | Rate limiting status |
circuit_breaker | object | Circuit breaker status |
selection_strategy | string | Current load balancing strategy |
hot_reload | object | Hot reload availability and status |
Status Codes:
200: Configuration summary retrieved successfully
Note: Sensitive information (API keys, etc.) is automatically redacted from the response.
Hot Reload Status¶
Get detailed information about hot reload functionality and configuration item classification.
Response:
{
"enabled": true,
"description": "Hot reload is enabled. Configuration file changes are automatically detected and applied.",
"capabilities": {
"immediate_update": {
"description": "Changes applied immediately without service interruption",
"items": [
"logging.level",
"rate_limiting.*",
"circuit_breaker.*",
"retry.*",
"global_prompts.*"
]
},
"gradual_update": {
"description": "Existing connections maintained, new connections use new config",
"items": [
"backends.*",
"health_checks.*",
"timeouts.*"
]
},
"requires_restart": {
"description": "Changes logged as warnings, restart required to take effect",
"items": [
"server.bind_address",
"server.workers"
]
}
}
}
Fields:
| Field | Type | Description |
|---|---|---|
enabled | boolean | Whether hot reload is enabled |
description | string | Human-readable description of hot reload status |
capabilities | object | Configuration item classification by hot reload capability |
capabilities.immediate_update | object | Items that update immediately without disruption |
capabilities.gradual_update | object | Items that apply to new connections only |
capabilities.requires_restart | object | Items that require server restart |
Configuration Item Classification:
Immediate Update (no service interruption):
- logging.level - Log level changes apply immediately
- rate_limiting.* - Rate limiting settings update in real-time
- circuit_breaker.* - Circuit breaker thresholds and timeouts
- retry.* - Retry policies and backoff strategies
- global_prompts.* - Global system prompt injection settings
Gradual Update (existing connections maintained):
- backends.* - Backend add/remove/modify (new requests use updated pool)
- health_checks.* - Health check intervals and thresholds
- timeouts.* - Timeout values for new requests
Requires Restart (logged as warnings):
- server.bind_address - TCP bind address
- server.workers - Worker thread count
Status Codes:
200: Hot reload status retrieved successfully
Example Usage:
# Check if hot reload is enabled
curl http://localhost:8080/admin/config/hot-reload-status | jq '.enabled'
# List items that support immediate update
curl http://localhost:8080/admin/config/hot-reload-status | jq '.capabilities.immediate_update.items'
Configuration Management API¶
The Configuration Management API enables viewing and modifying router configuration at runtime without requiring a server restart. This provides operational flexibility for adjusting behavior, adding backends, and fine-tuning settings in production environments.
Overview¶
Key capabilities:
- Runtime Configuration: View and modify configuration without server restart
- Hot Reload Support: Changes to supported settings apply immediately
- Validation: Validate configuration changes before applying
- History & Rollback: Track configuration changes and rollback to previous versions
- Export/Import: Backup and restore configurations across environments
- Security: Sensitive information (API keys, passwords, tokens) is automatically masked
Configuration Query APIs¶
Get Full Configuration¶
Returns the complete current configuration with sensitive information masked for security.
Response:
{
"server": {
"bind_address": "0.0.0.0:8080",
"workers": 4,
"connection_pool_size": 100
},
"backends": [
{
"name": "openai",
"url": "https://api.openai.com",
"api_key": "sk-****...**",
"weight": 1,
"models": ["gpt-4", "gpt-3.5-turbo"]
},
{
"name": "local-ollama",
"url": "http://localhost:11434",
"weight": 1,
"models": []
}
],
"health_checks": {
"interval": "30s",
"timeout": "10s",
"unhealthy_threshold": 3,
"healthy_threshold": 2
},
"logging": {
"level": "info",
"format": "json"
},
"retry": {
"max_attempts": 3,
"backoff": "exponential",
"initial_delay_ms": 100
},
"timeouts": {
"connect": "5s",
"request": "60s"
},
"rate_limiting": {
"enabled": false
},
"circuit_breaker": {
"enabled": true,
"failure_threshold": 5,
"recovery_timeout": "30s"
}
}
Notes:
- API keys, passwords, and tokens are masked (e.g., sk-****...**)
- All configuration sections are included in the response
- Use /admin/config/{section} for individual section details
Status Codes:
200: Configuration retrieved successfully
List Configuration Sections¶
Returns a list of all available configuration sections.
Response:
{
"sections": [
"server",
"backends",
"health_checks",
"logging",
"retry",
"timeouts",
"rate_limiting",
"circuit_breaker",
"global_prompts",
"admin",
"fallback",
"files",
"api_keys",
"metrics",
"routing"
],
"total": 15
}
Status Codes:
200: Section list retrieved successfully
Get Configuration Section¶
Returns the configuration for a specific section with hot reload capability information.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section name |
Example Request:
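For example, fetching the logging section via the same /admin/config/{section} path used by the update examples later in this document:
# Retrieve the logging section; replace "logging" with any section name
curl http://localhost:8080/admin/config/logging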
Response:
{
"section": "logging",
"config": {
"level": "info",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"hot_reload_capability": "immediate_update",
"description": "Changes to this section apply immediately without service interruption"
}
Hot Reload Capability Values:
| Value | Description |
|---|---|
immediate_update | Changes apply immediately without service interruption |
gradual_update | Existing connections maintained, new connections use new config |
requires_restart | Server restart required for changes to take effect |
Status Codes:
200: Section configuration retrieved successfully
404: Invalid section name
Error Response:
{
"error": {
"message": "Configuration section 'invalid_section' not found",
"type": "not_found",
"code": 404,
"details": {
"requested_section": "invalid_section",
"available_sections": ["server", "backends", "logging", "..."]
}
}
}
Get Configuration Schema¶
Returns the JSON Schema for configuration validation. Useful for client-side validation before submitting changes.
Response:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"server": {
"type": "object",
"properties": {
"bind_address": {
"type": "string",
"pattern": "^[0-9.]+:[0-9]+$",
"description": "Server bind address in host:port format"
},
"workers": {
"type": "integer",
"minimum": 1,
"maximum": 256,
"description": "Number of worker threads"
},
"connection_pool_size": {
"type": "integer",
"minimum": 1,
"maximum": 10000,
"description": "HTTP connection pool size per backend"
}
},
"required": ["bind_address"]
},
"backends": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"minLength": 1,
"description": "Unique backend identifier"
},
"url": {
"type": "string",
"format": "uri",
"description": "Backend base URL"
},
"weight": {
"type": "integer",
"minimum": 0,
"maximum": 100,
"default": 1,
"description": "Load balancing weight"
},
"models": {
"type": "array",
"items": {"type": "string"},
"description": "Explicit model list (optional)"
}
},
"required": ["name", "url"]
}
},
"logging": {
"type": "object",
"properties": {
"level": {
"type": "string",
"enum": ["trace", "debug", "info", "warn", "error"],
"description": "Log level"
},
"format": {
"type": "string",
"enum": ["json", "text", "pretty"],
"description": "Log output format"
}
}
}
}
}
Status Codes:
200: Schema retrieved successfully
Configuration Modification APIs¶
Replace Configuration Section¶
Replaces an entire configuration section. Triggers validation and hot reload if applicable.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section name |
Request Body: Complete section configuration object.
Example Request:
curl -X PUT http://localhost:8080/admin/config/logging \
-H "Content-Type: application/json" \
-d '{
"level": "debug",
"format": "json",
"output": "stdout",
"include_timestamps": true
}'
Response:
{
"success": true,
"section": "logging",
"hot_reload_applied": true,
"message": "Configuration updated and applied immediately",
"previous": {
"level": "info",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"current": {
"level": "debug",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"version": 15
}
Status Codes:
200: Configuration updated successfully
400: Invalid configuration format or validation error
404: Invalid section name
Partial Update Configuration Section¶
Performs a partial update using JSON merge patch semantics. Only specified fields are updated; unspecified fields retain their current values.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section name |
Request Body: Partial configuration object with fields to update.
Example Request:
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Content-Type: application/json" \
-d '{
"level": "warn"
}'
Response:
{
"success": true,
"section": "logging",
"hot_reload_applied": true,
"message": "Configuration partially updated and applied",
"changes": {
"level": {
"from": "info",
"to": "warn"
}
},
"current": {
"level": "warn",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"version": 16
}
Merge Behavior:
- Scalar values are replaced
- Objects are merged recursively
- Arrays are replaced entirely (not merged)
- null values remove the field (if optional)
Status Codes:
200: Configuration updated successfully
400: Invalid configuration format or validation error
404: Invalid section name
Validate Configuration¶
Validates configuration without applying changes. Supports dry_run mode for testing configuration changes safely.
Request Body:
{
"section": "backends",
"config": {
"name": "new-backend",
"url": "http://localhost:8000",
"weight": 2
},
"dry_run": true
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section to validate |
config | object | Yes | Configuration to validate |
dry_run | boolean | No | If true, only validate without preparing for apply (default: true) |
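A sketch of a validation call follows; the POST /admin/config/validate path is an assumption and may differ in your deployment.
# Assumed path; submits the section, config, and dry_run flag as JSON
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Content-Type: application/json" \
  -d '{
    "section": "backends",
    "config": {"name": "new-backend", "url": "http://localhost:8000", "weight": 2},
    "dry_run": true
  }'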
Response (Valid):
{
"valid": true,
"section": "backends",
"warnings": [
"Backend 'new-backend' has no explicit model list; models will be auto-discovered"
],
"info": {
"hot_reload_capability": "gradual_update",
"estimated_impact": "New requests may be routed to this backend after apply"
}
}
Response (Invalid):
{
"valid": false,
"section": "backends",
"errors": [
{
"field": "url",
"message": "Invalid URL format: missing scheme",
"value": "localhost:8000"
},
{
"field": "weight",
"message": "Weight must be between 0 and 100",
"value": 150
}
],
"warnings": []
}
Status Codes:
200: Validation completed (check the valid field for the result)
400: Invalid request format
Apply Pending Changes¶
Applies pending configuration changes immediately. Triggers hot reload for applicable settings.
Request Body (optional):
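An illustrative body built from the parameters below; both fields are optional.
{
  "sections": ["logging", "rate_limiting"],
  "force": false
}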
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
sections | array | No | Specific sections to apply (default: all pending) |
force | boolean | No | Force apply even if warnings exist (default: false) |
Response:
{
"success": true,
"applied_sections": ["logging", "rate_limiting"],
"results": {
"logging": {
"status": "applied",
"hot_reload": "immediate_update"
},
"rate_limiting": {
"status": "applied",
"hot_reload": "immediate_update"
}
},
"version": 17,
"timestamp": "2024-01-15T10:45:30Z"
}
Status Codes:
200: Changes applied successfully
400: No pending changes or validation errors
409: Conflict with concurrent modification
Configuration Save/Restore APIs¶
Export Configuration¶
Exports the current configuration in the specified format.
Request Body:
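An illustrative body using the parameters below (all fields are optional; defaults are listed in the table).
{
  "format": "yaml",
  "include_sensitive": false,
  "sections": ["server", "backends", "logging"]
}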
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
format | string | No | Export format: yaml, json, or toml (default: yaml) |
include_sensitive | boolean | No | Include sensitive data unmasked (requires elevated permissions, default: false) |
sections | array | No | Specific sections to export (default: all) |
Response (format: json):
{
"format": "json",
"content": "{\"server\":{\"bind_address\":\"0.0.0.0:8080\",...}}",
"sections_exported": ["server", "backends", "logging"],
"exported_at": "2024-01-15T10:45:30Z",
"version": 17,
"checksum": "sha256:a1b2c3d4..."
}
Response (format: yaml):
{
"format": "yaml",
"content": "server:\n bind_address: \"0.0.0.0:8080\"\n workers: 4\n...",
"sections_exported": ["server", "backends", "logging"],
"exported_at": "2024-01-15T10:45:30Z",
"version": 17,
"checksum": "sha256:a1b2c3d4..."
}
Status Codes:
200: Export successful
400: Invalid format specified
403: Elevated permissions required for include_sensitive: true
Import Configuration¶
Imports configuration from the provided content.
Request Body:
{
"format": "yaml",
"content": "server:\n bind_address: \"0.0.0.0:8080\"\n workers: 8\nlogging:\n level: debug",
"dry_run": true,
"merge": false
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
format | string | Yes | Content format: yaml, json, or toml |
content | string | Yes | Configuration content to import |
dry_run | boolean | No | Validate without applying (default: false) |
merge | boolean | No | Merge with existing config vs replace (default: false) |
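A sketch of an import call; the POST /admin/config/import path is assumed by analogy with /admin/config/export and should be verified for your deployment.
# Assumed path; dry_run=true previews the changes without applying them
curl -X POST http://localhost:8080/admin/config/import \
  -H "Content-Type: application/json" \
  -d '{
    "format": "yaml",
    "content": "logging:\n  level: debug",
    "dry_run": true,
    "merge": true
  }'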
Response (dry_run: true):
{
"valid": true,
"dry_run": true,
"changes_preview": {
"server": {
"workers": {"from": 4, "to": 8}
},
"logging": {
"level": {"from": "info", "to": "debug"}
}
},
"sections_affected": ["server", "logging"],
"warnings": [
"server.workers change requires restart to take effect"
]
}
Response (dry_run: false):
{
"success": true,
"imported_sections": ["server", "logging"],
"hot_reload_results": {
"logging": "applied_immediately",
"server": "requires_restart"
},
"version": 18,
"timestamp": "2024-01-15T10:50:00Z"
}
Status Codes:
200: Import successful (or dry_run validation passed)
400: Invalid format or content parsing error
422: Configuration validation failed
Get Configuration History¶
Retrieves the history of configuration changes.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
limit | integer | No | Maximum entries to return (default: 20, max: 100) |
offset | integer | No | Number of entries to skip (default: 0) |
section | string | No | Filter by section name |
Example Request:
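For example (assuming the history endpoint lives under /admin/config/history; adjust if your deployment differs):
# Assumed path; limit maps to the query parameter documented above
curl "http://localhost:8080/admin/config/history?limit=10"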
Response:
{
"history": [
{
"version": 18,
"timestamp": "2024-01-15T10:50:00Z",
"sections_changed": ["logging"],
"source": "api",
"user": "admin",
"changes": {
"logging": {
"level": {"from": "info", "to": "debug"}
}
}
},
{
"version": 17,
"timestamp": "2024-01-15T09:30:00Z",
"sections_changed": ["backends"],
"source": "file_reload",
"user": null,
"changes": {
"backends": {
"added": ["new-backend"],
"modified": [],
"removed": []
}
}
},
{
"version": 16,
"timestamp": "2024-01-14T15:20:00Z",
"sections_changed": ["rate_limiting"],
"source": "api",
"user": "admin",
"changes": {
"rate_limiting": {
"enabled": {"from": false, "to": true}
}
}
}
],
"total": 18,
"limit": 10,
"offset": 0
}
Source Values:
| Source | Description |
|---|---|
api | Changed via Configuration Management API |
file_reload | Changed via configuration file hot reload |
startup | Initial configuration at server startup |
rollback | Restored from previous version |
Status Codes:
200: History retrieved successfully
Rollback Configuration¶
Rolls back to a previous configuration version.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
version | integer | Yes | Version number to rollback to |
Request Body (optional):
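An illustrative body using the parameters below; omit it entirely to roll back all changed sections immediately.
{
  "dry_run": false,
  "sections": ["logging", "backends"]
}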
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
dry_run | boolean | No | Preview changes without applying (default: false) |
sections | array | No | Specific sections to rollback (default: all changed sections) |
Response:
{
"success": true,
"rolled_back_from": 18,
"rolled_back_to": 15,
"sections_restored": ["logging", "backends"],
"changes": {
"logging": {
"level": {"from": "debug", "to": "info"}
},
"backends": {
"removed": ["new-backend"]
}
},
"new_version": 19,
"timestamp": "2024-01-15T11:00:00Z"
}
Status Codes:
200: Rollback successful
400: Validation error for target configuration
404: Version not found in history
Backend Management APIs¶
These endpoints provide convenient shortcuts for managing backends without modifying the full backends configuration section.
Add Backend¶
Dynamically adds a new backend to the router.
Request Body:
{
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 2,
"models": ["llama2", "codellama"],
"api_key": null,
"health_check_path": "/api/tags"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique backend identifier |
url | string | Yes | Backend base URL |
weight | integer | No | Load balancing weight (default: 1) |
models | array | No | Explicit model list (empty for auto-discovery) |
api_key | string | No | API key for authentication |
health_check_path | string | No | Custom health check endpoint |
Response:
{
"success": true,
"backend": {
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 2,
"models": ["llama2", "codellama"],
"is_healthy": null,
"status": "pending_health_check"
},
"message": "Backend added successfully. Health check scheduled.",
"config_version": 20
}
Status Codes:
200: Backend added successfully
400: Invalid backend configuration
409: Backend with this name already exists
Get Backend Configuration¶
Retrieves the configuration for a specific backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Response:
{
"name": "local-ollama",
"url": "http://localhost:11434",
"weight": 1,
"models": ["llama2", "mistral", "codellama"],
"api_key": null,
"health_check_path": "/api/tags",
"is_healthy": true,
"consecutive_failures": 0,
"consecutive_successes": 25,
"last_check": "2024-01-15T10:55:00Z",
"total_requests": 1250,
"failed_requests": 3
}
Status Codes:
200: Backend configuration retrieved
404: Backend not found
Update Backend Configuration¶
Updates the configuration for an existing backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Request Body:
{
"url": "http://localhost:11434",
"weight": 3,
"models": ["llama2", "mistral", "codellama", "phi"],
"api_key": null
}
Response:
{
"success": true,
"backend": {
"name": "local-ollama",
"url": "http://localhost:11434",
"weight": 3,
"models": ["llama2", "mistral", "codellama", "phi"]
},
"changes": {
"weight": {"from": 1, "to": 3},
"models": {"added": ["phi"], "removed": []}
},
"config_version": 21
}
Status Codes:
200: Backend updated successfully
400: Invalid configuration
404: Backend not found
Delete Backend¶
Removes a backend from the router.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
drain | boolean | No | Wait for active requests to complete (default: true) |
timeout | integer | No | Drain timeout in seconds (default: 30) |
Example Request:
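For example, removing a backend named old-backend with draining enabled; the /admin/backends/{name} path is an assumption that mirrors the add-backend endpoint.
# Assumed path; drain and timeout are the query parameters documented above
curl -X DELETE "http://localhost:8080/admin/backends/old-backend?drain=true&timeout=30"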
Response:
{
"success": true,
"deleted_backend": "old-backend",
"drained": true,
"active_requests_completed": 5,
"config_version": 22,
"message": "Backend removed from rotation"
}
Status Codes:
200: Backend deleted successfully
404: Backend not found
409: Cannot delete last remaining backend
Update Backend Weight¶
Updates only the load balancing weight for a backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Request Body:
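An illustrative body setting the new weight (matching the response example below):
{
  "weight": 5
}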
Response:
{
"success": true,
"backend": "local-ollama",
"weight": {
"from": 1,
"to": 5
},
"config_version": 23
}
Status Codes:
200: Weight updated successfully
400: Invalid weight value
404: Backend not found
Update Backend Models¶
Updates only the model list for a backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Request Body:
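An illustrative body matching the response example below, adding two models to the existing list:
{
  "models": ["phi", "gemma"],
  "mode": "add"
}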
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
models | array | Yes | Model list |
mode | string | No | Update mode: replace, add, or remove (default: replace) |
Response:
{
"success": true,
"backend": "local-ollama",
"models": {
"previous": ["llama2", "mistral", "codellama"],
"current": ["llama2", "mistral", "codellama", "phi", "gemma"],
"added": ["phi", "gemma"],
"removed": []
},
"config_version": 24
}
Status Codes:
200: Models updated successfully
400: Invalid model list
404: Backend not found
Configuration API Examples¶
Get Full Configuration¶
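A minimal example, assuming the full configuration is served at GET /admin/config (the same prefix used by the section endpoints):
# Assumed path; sensitive values are returned masked
curl http://localhost:8080/admin/config | jq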
Update Logging Level¶
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Content-Type: application/json" \
-d '{"level": "debug"}'
Add a New Backend¶
curl -X POST http://localhost:8080/admin/backends \
-H "Content-Type: application/json" \
-d '{
"name": "remote-ollama",
"url": "http://192.168.1.50:11434",
"weight": 2,
"models": ["llama2", "mistral"]
}'
Export Configuration as JSON¶
curl -X POST http://localhost:8080/admin/config/export \
-H "Content-Type: application/json" \
-d '{"format": "json"}' | jq -r '.content' > config-backup.json
View Configuration History¶
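Assuming the history endpoint is GET /admin/config/history (an assumed path, as noted earlier):
# Assumed path; show which sections changed in the ten most recent versions
curl "http://localhost:8080/admin/config/history?limit=10" | jq '.history[].sections_changed'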
Configuration API Error Responses¶
All Configuration Management API errors follow the standard error format:
{
"error": {
"message": "Human-readable error description",
"type": "error_type_identifier",
"code": 400,
"details": {
"additional": "context information"
}
}
}
Configuration-Specific Error Types:
| Type | HTTP Code | Description |
|---|---|---|
config_validation_error | 400 | Configuration validation failed |
config_section_not_found | 404 | Requested configuration section does not exist |
config_version_not_found | 404 | Requested version not found in history |
config_conflict | 409 | Concurrent modification conflict |
config_permission_denied | 403 | Insufficient permissions for operation |
config_parse_error | 422 | Failed to parse configuration content |
Example Validation Error:
{
"error": {
"message": "Configuration validation failed",
"type": "config_validation_error",
"code": 400,
"details": {
"section": "backends",
"errors": [
{
"field": "url",
"message": "URL must include scheme (http:// or https://)",
"value": "localhost:8000"
}
]
}
}
}
Example Conflict Error:
{
"error": {
"message": "Configuration was modified by another request",
"type": "config_conflict",
"code": 409,
"details": {
"expected_version": 15,
"current_version": 16,
"conflicting_sections": ["backends"]
}
}
}
Error Handling¶
Error Response Format¶
All errors follow a consistent JSON structure:
{
"error": {
"message": "Human-readable error description",
"type": "error_type_identifier",
"code": 404,
"details": {
"additional": "context information"
}
}
}
Error Types¶
| Type | HTTP Code | Description |
|---|---|---|
bad_request | 400 | Invalid request format or parameters |
unauthorized | 401 | Authentication required or API key invalid |
forbidden | 403 | Access denied (insufficient permissions) |
model_not_found | 404 | Requested model not available |
rate_limit_exceeded | 429 | Rate limit exceeded (future feature) |
internal_error | 500 | Router internal error |
bad_gateway | 502 | Backend connection/response error |
service_unavailable | 503 | All backends unhealthy |
gateway_timeout | 504 | Backend request timeout |
Example Error Responses¶
Model Not Found:
{
"error": {
"message": "Model 'invalid-model' not found on any healthy backend",
"type": "model_not_found",
"code": 404,
"details": {
"requested_model": "invalid-model",
"available_models": ["gpt-4", "gpt-3.5-turbo", "llama2"]
}
}
}
Backend Error:
{
"error": {
"message": "Failed to connect to backend 'local-ollama'",
"type": "bad_gateway",
"code": 502,
"details": {
"backend": "local-ollama",
"backend_error": "Connection refused"
}
}
}
Service Unavailable:
{
"error": {
"message": "All backends are currently unhealthy",
"type": "service_unavailable",
"code": 503,
"details": {
"healthy_backends": 0,
"total_backends": 3
}
}
}
Rate Limiting¶
Note: Rate limiting is not currently implemented but is planned for future releases.
Future rate limiting will support:
- Per-IP rate limiting
- Per-API-key rate limiting
- Model-specific rate limiting
- Sliding window algorithms
- Rate limit headers in responses
Streaming¶
Server-Sent Events (SSE)¶
When stream: true is specified, responses are sent as Server-Sent Events with:
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
SSE Format¶
data: {"id":"chatcmpl-123","object":"chat.completion.chunk",...}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk",...}
data: [DONE]
SSE Compatibility¶
The router supports multiple SSE formats for maximum compatibility:
- Standard Format: data: {...}
- Spaced Format: data: {...}
- Mixed Line Endings: Handles \r\n, \n, and \r
- Empty Lines: Properly processes chunk separators
Connection Management¶
- Keep-Alive: Connections are kept open during streaming
- Timeouts: 5-minute timeout for long-running requests
- Error Handling: Partial responses include error information
- Client Disconnection: Gracefully handles client disconnects
Examples¶
Basic Chat Completion¶
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'
Streaming Chat Completion¶
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Write a short story"}
],
"stream": true,
"max_tokens": 200
}'
Text Completion with Parameters¶
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo-instruct",
"prompt": "The future of AI is",
"max_tokens": 50,
"temperature": 0.8,
"top_p": 0.9
}'
Check Backend Status¶
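A sketch, assuming backend status is exposed at GET /admin/backends (the POST form of this path adds a backend; verify the exact status path for your deployment):
# Assumed path; prints each backend's name and health flag
curl http://localhost:8080/admin/backends | jq '.backends[] | {name, is_healthy}'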
Monitor Service Health¶
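A sketch, assuming the detailed service health report is exposed under the admin prefix (for example /admin/health); adjust the path to your deployment:
# Assumed path for the detailed health report with per-component status
curl http://localhost:8080/admin/health | jq '.status, .services'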
List Available Models¶
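The models endpoint uses the standard OpenAI-compatible path:
# List models aggregated from all healthy backends
curl http://localhost:8080/v1/models | jq '.data[].id'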
Python Client Example¶
import requests
import json
# Configure the client
BASE_URL = "http://localhost:8080"
def chat_completion(messages, model="gpt-3.5-turbo", stream=False):
"""Send a chat completion request"""
response = requests.post(
f"{BASE_URL}/v1/chat/completions",
headers={"Content-Type": "application/json"},
json={
"model": model,
"messages": messages,
"stream": stream,
"temperature": 0.7
},
stream=stream
)
if stream:
# Handle streaming response
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
data = line[6:] # Remove 'data: ' prefix
if data == '[DONE]':
break
try:
chunk = json.loads(data)
content = chunk['choices'][0]['delta'].get('content', '')
if content:
print(content, end='', flush=True)
except json.JSONDecodeError:
continue
print() # New line after streaming
else:
# Handle non-streaming response
result = response.json()
return result['choices'][0]['message']['content']
# Example usage
messages = [
{"role": "user", "content": "Explain machine learning in simple terms"}
]
print("Streaming response:")
chat_completion(messages, stream=True)
print("\nNon-streaming response:")
response = chat_completion(messages, stream=False)
print(response)
JavaScript/Node.js Client Example¶
// Uses the built-in fetch and Web Streams APIs available in Node.js 18+
const BASE_URL = 'http://localhost:8080';
async function chatCompletion(messages, options = {}) {
const response = await fetch(`${BASE_URL}/v1/chat/completions`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: options.model || 'gpt-3.5-turbo',
messages: messages,
stream: options.stream || false,
temperature: options.temperature || 0.7,
...options
})
});
if (options.stream) {
// Handle streaming response
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') return;
try {
const parsed = JSON.parse(data);
const content = parsed.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
} catch (e) {
// Ignore JSON parse errors
}
}
}
}
console.log(); // New line
} else {
const result = await response.json();
return result.choices[0].message.content;
}
}
// Example usage (wrapped in an async function so await is valid in CommonJS)
async function main() {
  const messages = [
    { role: 'user', content: 'What is the meaning of life?' }
  ];

  // Streaming
  console.log('Streaming response:');
  await chatCompletion(messages, { stream: true });

  // Non-streaming
  console.log('\nNon-streaming response:');
  const response = await chatCompletion(messages);
  console.log(response);
}

main();
This API reference provides comprehensive documentation for integrating with Continuum Router. The router maintains full OpenAI API compatibility while adding powerful multi-backend routing and management capabilities.