
API Reference

Continuum Router provides a comprehensive OpenAI-compatible API with additional administrative endpoints for monitoring and management. This document describes all available endpoints, request/response formats, and error handling.

Overview

Base URL

http://localhost:8080

Content Type

All requests and responses use application/json unless otherwise specified.

OpenAI Compatibility

Continuum Router is fully compatible with OpenAI API v1, supporting:

  • Chat completions with streaming
  • Text completions
  • Image generation (DALL-E, gpt-image-1)
  • Image editing/inpainting (DALL-E 2, gpt-image-1)
  • Image variations (DALL-E 2)
  • Files API (upload, list, retrieve, delete)
  • File resolution in chat completions (image_file references)
  • Model listing
  • Error response formats

Authentication

Continuum Router supports API key authentication with configurable enforcement modes.

Authentication Modes

The router supports two authentication modes for API endpoints:

| Mode | Behavior |
|------|----------|
| permissive (default) | Requests without an API key are allowed. Requests with valid API keys are authenticated and can access user-specific features. |
| blocking | Only authenticated requests are processed. Requests without a valid API key receive 401 Unauthorized. |

Configuration

api_keys:
  # Authentication mode: "permissive" (default) or "blocking"
  mode: blocking

  # API key definitions
  api_keys:
    - key: "${API_KEY_1}"
      id: "key-production-1"
      user_id: "user-admin"
      organization_id: "org-main"
      scopes: [read, write, files, admin]

Protected Endpoints (when mode is blocking)

  • /v1/chat/completions
  • /v1/completions
  • /v1/responses
  • /v1/images/generations
  • /v1/images/edits
  • /v1/images/variations
  • /v1/models

Note: Health endpoints (/health, /healthz) are always accessible without authentication. Admin, Files, and Metrics endpoints have separate authentication mechanisms.

Making Authenticated Requests

Include the API key in the Authorization header:

POST /v1/chat/completions HTTP/1.1
Authorization: Bearer sk-your-api-key
Content-Type: application/json

{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Hello"}]
}
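
The same request in Python, as a minimal sketch (assuming the requests library and a placeholder key configured under api_keys):

import requests

# Placeholder key; substitute a key defined in your api_keys configuration
API_KEY = "sk-your-api-key"

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(response.json())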

Authentication Errors

When authentication fails, the API returns:

{
  "error": {
    "message": "Missing or invalid Authorization header. Expected: Bearer <api_key>",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

Status Codes:

  • 401 Unauthorized: Missing or invalid API key

Core API Endpoints

Health Check

Check the health status of the router service.

GET /health

Response:

{
  "status": "ok",
  "service": "continuum-router"
}

Status Codes:

  • 200: Service is healthy

List Models

Retrieve all available models from all healthy backends.

GET /v1/models

Response:

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai-compatible",
      "permission": [],
      "root": "gpt-4",
      "parent": null
    },
    {
      "id": "llama2:7b",
      "object": "model", 
      "created": 1677610602,
      "owned_by": "local-ollama",
      "permission": [],
      "root": "llama2:7b",
      "parent": null
    }
  ]
}

Status Codes:

  • 200: Models retrieved successfully
  • 503: All backends are unhealthy

Features:

  • Model Aggregation: Combines models from all healthy backends
  • Deduplication: Removes duplicate models across backends
  • Caching: Results cached for 5 minutes by default
  • Health Awareness: Only includes models from healthy backends

Chat Completions

Generate chat completions using the OpenAI Chat API format.

POST /v1/chat/completions

Request Body:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user", 
      "content": "Explain quantum computing in simple terms."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150,
  "top_p": 1.0,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0,
  "stream": false,
  "stop": null,
  "logit_bias": {},
  "user": "user123"
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model identifier (must be available on at least one healthy backend) |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Sampling temperature (0.0 to 2.0, default: 1.0) |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling parameter (0.0 to 1.0) |
| frequency_penalty | number | No | Frequency penalty (-2.0 to 2.0) |
| presence_penalty | number | No | Presence penalty (-2.0 to 2.0) |
| stream | boolean | No | Enable streaming response (default: false) |
| stop | string/array | No | Stop sequences |
| logit_bias | object | No | Token logit bias |
| user | string | No | User identifier for tracking |

Response (Non-streaming):

{
  "id": "chatcmpl-123456789",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is a revolutionary computing paradigm that harnesses quantum mechanical phenomena..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}

Response (Streaming): When stream: true, the response uses Server-Sent Events (SSE):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"role":"assistant","content":""},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"content":"Quantum"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"content":" computing"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]

Status Codes:

  • 200: Completion generated successfully
  • 400: Invalid request format or parameters
  • 404: Model not found on any healthy backend
  • 502: Backend connection error
  • 503: All backends unhealthy
  • 504: Request timeout

Features:

  • Model-Based Routing: Automatically routes to backends serving the requested model
  • Load Balancing: Distributes load across healthy backends
  • Streaming Support: Real-time response streaming via SSE
  • Error Recovery: Automatic retry on transient failures
  • Request Deduplication: Prevents duplicate processing of identical requests

Image Generation

Generate images using OpenAI's DALL-E, GPT Image models, or Google's Nano Banana (Gemini) models.

POST /v1/images/generations

Request Body:

{
  "model": "dall-e-3",
  "prompt": "A serene Japanese garden with cherry blossoms",
  "n": 1,
  "size": "1024x1024",
  "quality": "standard",
  "response_format": "url"
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Image model: dall-e-2, dall-e-3, gpt-image-1, gpt-image-1.5, gpt-image-1-mini, nano-banana, or nano-banana-pro |
| prompt | string | Yes | Description of the image to generate |
| n | integer | No | Number of images (1-10, varies by model) |
| size | string | No | Image size (varies by model, see below) |
| quality | string | No | Image quality (varies by model, see below) |
| style | string | No | Image style: vivid or natural (DALL-E 3 only) |
| response_format | string | No | Response format: url or b64_json |
| output_format | string | No | Output file format: png, jpeg, webp (GPT Image models only, default: png) |
| output_compression | integer | No | Compression level 0-100 for jpeg/webp (GPT Image models only) |
| background | string | No | Background: transparent, opaque, auto (GPT Image models only) |
| stream | boolean | No | Enable streaming for partial images (GPT Image models only, default: false) |
| partial_images | integer | No | Number of partial images 0-3 during streaming (GPT Image models only) |
| user | string | No | User identifier for tracking |

Model-specific constraints:

| Model | Sizes | n | Quality | Notes |
|-------|-------|---|---------|-------|
| dall-e-2 | 256x256, 512x512, 1024x1024 | 1-10 | N/A | Classic DALL-E 2 |
| dall-e-3 | 1024x1024, 1792x1024, 1024x1792 | 1 | standard, hd | High quality with prompt revision |
| gpt-image-1 | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | Latest GPT Image model, supports streaming |
| gpt-image-1.5 | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | 4x faster, better text rendering |
| gpt-image-1-mini | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | Cost-effective option |
| nano-banana | 256x256 to 1024x1024 | 1-4 | N/A | Gemini 2.5 Flash Image (fast) |
| nano-banana-pro | 256x256 to 4096x4096 | 1-4 | N/A | Gemini 2.0 Flash Image (advanced, up to 4K) |

Quality Parameter (GPT Image Models):

For backward compatibility, standard maps to medium and hd maps to high when using GPT Image models.

| Quality | Description |
|---------|-------------|
| low | Fast generation with lower quality |
| medium | Balanced quality and speed (default) |
| high | Best quality, slower generation |
| auto | Model selects optimal quality |

Output Format Options (GPT Image Models):

| Format | Description | Supports Transparency |
|--------|-------------|-----------------------|
| png | Lossless format (default) | Yes |
| jpeg | Lossy format, smaller file size | No |
| webp | Modern format, good compression | Yes |

Note: Transparent background (background: "transparent") requires png or webp format.

Nano Banana (Gemini) Models:

Nano Banana provides access to Google's Gemini image generation capabilities through an OpenAI-compatible interface:

  • nano-banana: Maps to Gemini 2.5 Flash Image - fast, general-purpose image generation
  • nano-banana-pro: Maps to Gemini 2.0 Flash Image - advanced model with high-resolution support (up to 4K)

Nano Banana Size Mapping:

The router automatically converts OpenAI-style size parameters to Gemini's aspectRatio and imageSize format:

| OpenAI Size | Gemini aspectRatio | Gemini imageSize | Notes |
|-------------|--------------------|------------------|-------|
| 256x256 | 1:1 | 1K | Falls back to Gemini minimum |
| 512x512 | 1:1 | 1K | Falls back to Gemini minimum |
| 1024x1024 | 1:1 | 1K | Default |
| 1536x1024 | 3:2 | 1K | Landscape (new) |
| 1024x1536 | 2:3 | 1K | Portrait (new) |
| 1024x1792 | 9:16 | 1K | Tall portrait |
| 1792x1024 | 16:9 | 1K | Wide landscape |
| 2048x2048 | 1:1 | 2K | Pro only |
| 4096x4096 | 1:1 | 4K | Pro only |
| auto | 1:1 | 1K | Default fallback |

The conversion sends the following Gemini API structure:

{
  "contents": [{"parts": [{"text": "Your prompt"}]}],
  "generationConfig": {
    "imageConfig": {
      "aspectRatio": "3:2",
      "imageSize": "1K"
    }
  }
}
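
A minimal Python sketch of this size conversion, based on the mapping table above (the function name and fallback behavior are illustrative, not the router's actual implementation):

# Mapping from OpenAI-style sizes to Gemini aspectRatio/imageSize,
# as documented in the table above. Unknown sizes fall back to the
# 1:1 / 1K default, mirroring the "auto" row.
SIZE_MAP = {
    "256x256": ("1:1", "1K"),
    "512x512": ("1:1", "1K"),
    "1024x1024": ("1:1", "1K"),
    "1536x1024": ("3:2", "1K"),
    "1024x1536": ("2:3", "1K"),
    "1024x1792": ("9:16", "1K"),
    "1792x1024": ("16:9", "1K"),
    "2048x2048": ("1:1", "2K"),  # nano-banana-pro only
    "4096x4096": ("1:1", "4K"),  # nano-banana-pro only
}

def to_gemini_image_config(size: str) -> dict:
    """Build the imageConfig object sent inside generationConfig."""
    aspect_ratio, image_size = SIZE_MAP.get(size, ("1:1", "1K"))
    return {"aspectRatio": aspect_ratio, "imageSize": image_size}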

Example Nano Banana Request:

{
  "model": "nano-banana",
  "prompt": "A white siamese cat with blue eyes, photorealistic",
  "n": 1,
  "size": "1024x1024",
  "response_format": "b64_json"
}

Response:

{
  "created": 1677652288,
  "data": [
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/...",
      "revised_prompt": "A tranquil Japanese garden featuring..."
    }
  ]
}

Response (with b64_json):

{
  "created": 1677652288,
  "data": [
    {
      "b64_json": "/9j/4AAQSkZJRgABAQAA...",
      "revised_prompt": "A tranquil Japanese garden featuring..."
    }
  ]
}

Nano Banana Response Notes:

  • When using response_format: "url" with Nano Banana, the image is returned as a data URL (data:image/png;base64,...) since Gemini's native API returns inline base64 data
  • The revised_prompt field contains any text response from Gemini describing the generated image

Streaming Image Generation (GPT Image Models):

When stream: true is specified for GPT Image models, the response will be streamed as Server-Sent Events (SSE):

Example Streaming Request:

{
  "model": "gpt-image-1",
  "prompt": "A beautiful sunset over mountains",
  "stream": true,
  "partial_images": 2,
  "response_format": "b64_json"
}

Streaming Response Format:

data: {"type":"image_generation.partial_image","partial_image_index":0,"b64_json":"...","created":1702345678}

data: {"type":"image_generation.partial_image","partial_image_index":1,"b64_json":"...","created":1702345679}

data: {"type":"image_generation.complete","b64_json":"...","created":1702345680}

data: {"type":"image_generation.usage","usage":{"input_tokens":25,"output_tokens":1024}}

data: {"type":"done"}

SSE Event Types:

| Event Type | Description |
|------------|-------------|
| image_generation.partial_image | Intermediate image during generation |
| image_generation.complete | Final complete image |
| image_generation.usage | Token usage information (for cost tracking) |
| done | Stream completion marker |
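
As a sketch, a Python client could consume these events and save the final image like this (assuming the requests library; event shapes follow the examples above):

import base64
import json
import requests

response = requests.post(
    "http://localhost:8080/v1/images/generations",
    json={
        "model": "gpt-image-1",
        "prompt": "A beautiful sunset over mountains",
        "stream": True,
        "partial_images": 2,
        "response_format": "b64_json",
    },
    stream=True,
)

for line in response.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue  # skip separators and non-data lines
    event = json.loads(line[len(b"data: "):])
    if event.get("type") == "image_generation.complete":
        # Decode the final image and write it to disk
        with open("image.png", "wb") as f:
            f.write(base64.b64decode(event["b64_json"]))
    elif event.get("type") == "done":
        break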

Example GPT Image Request with New Options:

{
  "model": "gpt-image-1.5",
  "prompt": "A white cat with blue eyes, photorealistic",
  "size": "auto",
  "quality": "high",
  "output_format": "webp",
  "output_compression": 85,
  "background": "transparent",
  "response_format": "b64_json"
}

Status Codes:

  • 200: Image(s) generated successfully
  • 400: Invalid request (e.g., invalid size for model, n > 1 for DALL-E 3)
  • 401: Invalid API key
  • 429: Rate limit exceeded
  • 500: Backend error
  • 503: Gemini backend unavailable (for Nano Banana models)

Timeout Configuration: Image generation requests use a configurable timeout (default: 3 minutes). See timeouts.request.image_generation in configuration.


Image Edit (Inpainting)

Edit existing images using OpenAI's inpainting capabilities. This endpoint allows you to modify specific regions of an image based on a text prompt and optional mask. Supports GPT Image models and DALL-E 2.

POST /v1/images/edits
Content-Type: multipart/form-data

Request Parameters (multipart/form-data):

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| image | file | Yes | The source image to edit (PNG, < 4MB, square) |
| prompt | string | Yes | Description of the desired edit |
| mask | file | No | Mask image indicating edit regions (PNG, same dimensions as image) |
| model | string | No | Model to use (default: gpt-image-1) |
| n | integer | No | Number of images to generate (1-10, default: 1) |
| size | string | No | Output size (model-dependent, default: 1024x1024) |
| response_format | string | No | Response format: url or b64_json (default: url) |
| user | string | No | Unique user identifier for tracking |

Supported Models and Sizes:

| Model | Sizes | Notes |
|-------|-------|-------|
| gpt-image-1 | 1024x1024, 1536x1024, 1024x1536, auto | Latest GPT Image model (recommended) |
| gpt-image-1-mini | 1024x1024, 1536x1024, 1024x1536, auto | Cost-optimized version |
| gpt-image-1.5 | 1024x1024, 1536x1024, 1024x1536, auto | Newest with improved instruction following |
| dall-e-2 | 256x256, 512x512, 1024x1024 | Legacy DALL-E 2 model |

Note: DALL-E 3 and Gemini (nano-banana) do NOT support image editing via this endpoint. Gemini uses semantic masking via natural language, which is incompatible with OpenAI's mask-based editing format.

Image Requirements:

  • Format: PNG only
  • Size: Less than 4MB
  • Dimensions: Must be square (width equals height)

Mask Requirements:

  • Format: PNG with alpha channel (RGBA)
  • Dimensions: Must match the source image exactly
  • Transparent areas: Indicate regions to edit/generate
  • Opaque areas: Indicate regions to preserve
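
For example, a mask that opens the top half of an image for editing could be built with Pillow (a sketch; Pillow is an assumed client-side dependency, not part of the router):

from PIL import Image

# Load the source image and create a fully opaque RGBA mask of the
# same dimensions (opaque pixels are preserved).
source = Image.open("source_image.png").convert("RGBA")
mask = Image.new("RGBA", source.size, (0, 0, 0, 255))

# Paste a fully transparent block over the top half, marking it editable.
width, height = source.size
transparent = Image.new("RGBA", (width, height // 2), (0, 0, 0, 0))
mask.paste(transparent, (0, 0))
mask.save("mask.png")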

Example Request:

curl -X POST http://localhost:8080/v1/images/edits \
  -F "image=@source_image.png" \
  -F "mask=@mask.png" \
  -F "prompt=A sunlit indoor lounge area with a pool containing a flamingo" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "response_format=url"

Example Request (without mask):

curl -X POST http://localhost:8080/v1/images/edits \
  -F "image=@source_image.png" \
  -F "prompt=Add a sunset in the background" \
  -F "n=1" \
  -F "size=512x512"

Response:

{
  "created": 1677652288,
  "data": [
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
    }
  ]
}

Response (with b64_json):

{
  "created": 1677652288,
  "data": [
    {
      "b64_json": "/9j/4AAQSkZJRgABAQAA..."
    }
  ]
}

Status Codes:

  • 200: Image(s) edited successfully
  • 400: Invalid request (e.g., non-square image, invalid size, missing required field)
  • 401: Invalid API key
  • 503: OpenAI backend unavailable

Error Examples:

Non-square image:

{
  "error": {
    "message": "Image must be square (800x600 is not square)",
    "type": "invalid_request_error",
    "param": "image",
    "code": "image_not_square"
  }
}

Mask dimension mismatch:

{
  "error": {
    "message": "Mask dimensions (256x256) do not match image dimensions (512x512)",
    "type": "invalid_request_error",
    "param": "mask",
    "code": "dimension_mismatch"
  }
}

Unsupported model:

{
  "error": {
    "message": "Model 'dall-e-3' does not support image editing. Supported models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5, dall-e-2. Note: dall-e-3 does NOT support image editing.",
    "type": "invalid_request_error",
    "param": "model",
    "code": "unsupported_model"
  }
}

Notes:

  • Supported models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5, dall-e-2
  • DALL-E 3 does NOT support image editing via API
  • Gemini (nano-banana) is NOT supported - uses different editing approach (semantic masking)
  • When no mask is provided, the entire image may be modified
  • The source image should have transparent regions if editing without a mask
  • Request timeout uses the image generation timeout configuration

Image Variations

Generate variations of an existing image using OpenAI's DALL-E 2 model.

POST /v1/images/variations
Content-Type: multipart/form-data

Form Fields:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| image | file | Yes | Source image for variations (PNG, < 4MB, must be square) |
| model | string | No | Model to use (default: dall-e-2) |
| n | integer | No | Number of variations to generate (1-10, default: 1) |
| size | string | No | Output size: 256x256, 512x512, 1024x1024 (default: 1024x1024) |
| response_format | string | No | Response format: url or b64_json (default: url) |
| user | string | No | User identifier for tracking |

Example Request:

curl -X POST http://localhost:8080/v1/images/variations \
  -F "image=@source_image.png" \
  -F "model=dall-e-2" \
  -F "n=2" \
  -F "size=512x512" \
  -F "response_format=url"

Response:

{
  "created": 1677652288,
  "data": [
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
    },
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
    }
  ]
}

Response (with b64_json):

{
  "created": 1677652288,
  "data": [
    {
      "b64_json": "/9j/4AAQSkZJRgABAQAA..."
    }
  ]
}

Model Support:

| Model | Variations Support | Notes |
|-------|--------------------|-------|
| dall-e-2 | Yes (native) | Full support, 1-10 variations |
| dall-e-3 | No | Not supported by OpenAI API |
| gpt-image-1 | No | Not supported |
| nano-banana | No | Gemini does not support variations API |
| nano-banana-pro | No | Gemini does not support variations API |

Image Requirements:

  • Format: PNG only
  • Size: Less than 4MB
  • Dimensions: Must be square (width == height)
  • Supported input sizes: Any square dimensions (will be processed by the model)

Error Scenarios:

| Error | Status | Description |
|-------|--------|-------------|
| Image not PNG | 400 | Only PNG format is supported |
| Image not square | 400 | Image dimensions must be equal |
| Image too large | 400 | Image exceeds 4MB size limit |
| Model not supported | 400 | Requested model doesn't support variations |
| Missing image | 400 | Image field is required |
| Invalid n value | 400 | n must be between 1 and 10 |
| Invalid size | 400 | Size must be one of the supported values |

Status Codes:

  • 200: Variation(s) generated successfully
  • 400: Invalid request (invalid format, non-square image, unsupported model)
  • 401: Invalid API key
  • 429: Rate limit exceeded
  • 500: Backend error
  • 503: Backend unavailable

Text Completions

Generate text completions using the OpenAI Completions API format.

POST /v1/completions

Request Body:

{
  "model": "gpt-3.5-turbo-instruct",
  "prompt": "Once upon a time in a distant galaxy",
  "max_tokens": 100,
  "temperature": 0.7,
  "top_p": 1.0,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0,
  "stream": false,
  "stop": null,
  "logit_bias": {},
  "user": "user123"
}

Response:

{
  "id": "cmpl-123456789",
  "object": "text_completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": ", there lived a young explorer named Zara who dreamed of discovering new worlds...",
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 90,
    "total_tokens": 100
  }
}

Status Codes: Same as Chat Completions


Files API

The Files API allows you to upload, manage, and use files in chat completions. Uploaded files can be referenced in messages using the image_file content type, and the router automatically resolves these references by injecting the file content.

Upload File

Upload a file for use in chat completions.

POST /v1/files
Content-Type: multipart/form-data

Form Fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| file | file | Yes | The file to upload |
| purpose | string | Yes | Purpose of the file: vision, assistants, fine-tune, batch |

Example:

curl -X POST http://localhost:8080/v1/files \
  -F "file=@image.png" \
  -F "purpose=vision"

Response:

{
  "id": "file-abc123def456",
  "object": "file",
  "bytes": 12345,
  "created_at": 1699061776,
  "filename": "image.png",
  "purpose": "vision"
}

Status Codes:

  • 200: File uploaded successfully
  • 400: Invalid request (missing file, invalid purpose)
  • 413: File too large (exceeds the configured max_file_size limit)

List Files

Retrieve a list of uploaded files.

GET /v1/files
GET /v1/files?purpose=vision

Query Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| purpose | string | No | Filter by purpose |

Response:

{
  "object": "list",
  "data": [
    {
      "id": "file-abc123def456",
      "object": "file",
      "bytes": 12345,
      "created_at": 1699061776,
      "filename": "image.png",
      "purpose": "vision"
    }
  ]
}


Get File Metadata

Retrieve metadata for a specific file.

GET /v1/files/{file_id}

Response:

{
  "id": "file-abc123def456",
  "object": "file",
  "bytes": 12345,
  "created_at": 1699061776,
  "filename": "image.png",
  "purpose": "vision"
}

Status Codes:

  • 200: File metadata retrieved
  • 404: File not found

Download File Content

Download the content of an uploaded file.

GET /v1/files/{file_id}/content

Response: Binary file content with appropriate Content-Type header.

Status Codes:

  • 200: File content returned
  • 404: File not found

Delete File

Delete an uploaded file.

DELETE /v1/files/{file_id}

Response:

{
  "id": "file-abc123def456",
  "object": "file",
  "deleted": true
}

Status Codes:

  • 200: File deleted successfully
  • 404: File not found

File Resolution in Chat Completions

The router automatically resolves file references in chat completion requests. When a message contains an image_file content block, the router:

  1. Validates the file ID format
  2. Loads the file content from storage
  3. Converts the file to a base64 data URL
  4. Replaces the image_file block with an image_url block
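
A simplified Python sketch of this transformation (illustrative only; the router performs these steps internally, and load_file is a hypothetical storage helper):

import base64

def resolve_image_file(block: dict, load_file) -> dict:
    """Convert an image_file content block to an image_url block.

    load_file is assumed to return (content_bytes, mime_type)
    for a stored file ID.
    """
    file_id = block["image_file"]["file_id"]
    if not file_id.startswith("file-"):
        raise ValueError("File ID must start with 'file-'")
    content, mime_type = load_file(file_id)
    data_url = f"data:{mime_type};base64,{base64.b64encode(content).decode()}"
    return {"type": "image_url", "image_url": {"url": data_url}}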

Request with File Reference:

{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_file", "image_file": {"file_id": "file-abc123def456"}}
      ]
    }
  ]
}

Transformed Request (sent to backend):

{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ]
}

File Resolution Errors:

| Error | Status | Description |
|-------|--------|-------------|
| Invalid file ID format | 400 | File ID must start with file- |
| File not found | 404 | Referenced file does not exist |
| Too many file references | 400 | Request contains more than 20 file references |
| Resolution timeout | 504 | File resolution took longer than 30 seconds |

Supported MIME Types for Image Files:

  • image/png
  • image/jpeg
  • image/gif
  • image/webp

Admin Endpoints

Backend Status

Get detailed status information about all configured backends.

GET /admin/backends

Response:

{
  "backends": [
    {
      "name": "local-ollama",
      "url": "http://localhost:11434",
      "is_healthy": true,
      "consecutive_failures": 0,
      "consecutive_successes": 15,
      "last_check": "2024-01-15T10:30:45Z",
      "last_error": null,
      "response_time_ms": 45,
      "models": ["llama2", "mistral", "codellama"],
      "weight": 1,
      "total_requests": 150,
      "failed_requests": 2
    },
    {
      "name": "openai-compatible",
      "url": "https://api.openai.com",
      "is_healthy": false,
      "consecutive_failures": 3,
      "consecutive_successes": 0,
      "last_check": "2024-01-15T10:29:30Z",
      "last_error": "Connection timeout after 5s",
      "response_time_ms": null,
      "models": [],
      "weight": 1,
      "total_requests": 45,
      "failed_requests": 8
    }
  ],
  "healthy_count": 1,
  "total_count": 2,
  "summary": {
    "total_models": 3,
    "total_requests": 195,
    "total_failures": 10,
    "average_response_time_ms": 45
  }
}

Fields:

| Field | Type | Description |
|-------|------|-------------|
| name | string | Backend identifier from configuration |
| url | string | Backend base URL |
| is_healthy | boolean | Current health status |
| consecutive_failures | integer | Sequential failed health checks |
| consecutive_successes | integer | Sequential successful health checks |
| last_check | string | ISO timestamp of last health check |
| last_error | string/null | Last error message if unhealthy |
| response_time_ms | integer/null | Last health check response time |
| models | array | Available models from this backend |
| weight | integer | Load balancing weight |
| total_requests | integer | Total requests routed to this backend |
| failed_requests | integer | Failed requests to this backend |

Status Codes:

  • 200: Backend status retrieved successfully

Service Health

Get overall service health and component status.

GET /admin/health

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": "2h 15m 30s",
  "timestamp": "2024-01-15T10:30:45Z",
  "services": {
    "backend_service": {
      "status": "healthy",
      "message": "All backends operational",
      "healthy_backends": 2,
      "total_backends": 2
    },
    "model_service": {
      "status": "healthy",
      "message": "Model cache operational",
      "cached_models": 15,
      "cache_hit_rate": 0.95,
      "last_refresh": "2024-01-15T10:25:00Z"
    },
    "proxy_service": {
      "status": "healthy",
      "message": "Request routing operational",
      "total_requests": 1250,
      "failed_requests": 12,
      "average_latency_ms": 85
    },
    "health_service": {
      "status": "healthy",
      "message": "Health monitoring active",
      "check_interval": "30s",
      "last_check": "2024-01-15T10:30:00Z"
    }
  },
  "metrics": {
    "requests_per_second": 5.2,
    "error_rate": 0.008,
    "memory_usage_mb": 125,
    "cpu_usage_percent": 15.5
  }
}

Status Values:

  • healthy: Service operating normally
  • degraded: Service operating with reduced functionality
  • unhealthy: Service experiencing issues

Status Codes:

  • 200: Service health retrieved successfully
  • 503: Service is unhealthy

Configuration Summary

Get current configuration summary including hot reload status.

GET /admin/config

Response:

{
  "server": {
    "bind_address": "0.0.0.0:8080",
    "workers": 4,
    "connection_pool_size": 100
  },
  "backends": {
    "count": 3,
    "names": ["openai", "local-ollama", "gemini"]
  },
  "health_checks": {
    "interval": "30s",
    "timeout": "10s",
    "unhealthy_threshold": 3,
    "healthy_threshold": 2
  },
  "rate_limiting": {
    "enabled": false
  },
  "circuit_breaker": {
    "enabled": true
  },
  "selection_strategy": "RoundRobin",
  "hot_reload": {
    "available": true,
    "note": "Configuration changes will be automatically detected and applied"
  }
}

Fields:

| Field | Type | Description |
|-------|------|-------------|
| server | object | Server configuration (bind_address, workers, connection_pool_size) |
| backends | object | Backend configuration summary (count, names) |
| health_checks | object | Health check settings |
| rate_limiting | object | Rate limiting status |
| circuit_breaker | object | Circuit breaker status |
| selection_strategy | string | Current load balancing strategy |
| hot_reload | object | Hot reload availability and status |

Status Codes:

  • 200: Configuration summary retrieved successfully

Note: Sensitive information (API keys, etc.) is automatically redacted from the response.


Hot Reload Status

Get detailed information about hot reload functionality and configuration item classification.

GET /admin/config/hot-reload-status

Response:

{
  "enabled": true,
  "description": "Hot reload is enabled. Configuration file changes are automatically detected and applied.",
  "capabilities": {
    "immediate_update": {
      "description": "Changes applied immediately without service interruption",
      "items": [
        "logging.level",
        "rate_limiting.*",
        "circuit_breaker.*",
        "retry.*",
        "global_prompts.*"
      ]
    },
    "gradual_update": {
      "description": "Existing connections maintained, new connections use new config",
      "items": [
        "backends.*",
        "health_checks.*",
        "timeouts.*"
      ]
    },
    "requires_restart": {
      "description": "Changes logged as warnings, restart required to take effect",
      "items": [
        "server.bind_address",
        "server.workers"
      ]
    }
  }
}

Fields:

| Field | Type | Description |
|-------|------|-------------|
| enabled | boolean | Whether hot reload is enabled |
| description | string | Human-readable description of hot reload status |
| capabilities | object | Configuration item classification by hot reload capability |
| capabilities.immediate_update | object | Items that update immediately without disruption |
| capabilities.gradual_update | object | Items that apply to new connections only |
| capabilities.requires_restart | object | Items that require server restart |

Configuration Item Classification:

Immediate Update (no service interruption):

  • logging.level - Log level changes apply immediately
  • rate_limiting.* - Rate limiting settings update in real-time
  • circuit_breaker.* - Circuit breaker thresholds and timeouts
  • retry.* - Retry policies and backoff strategies
  • global_prompts.* - Global system prompt injection settings

Gradual Update (existing connections maintained):

  • backends.* - Backend add/remove/modify (new requests use updated pool)
  • health_checks.* - Health check intervals and thresholds
  • timeouts.* - Timeout values for new requests

Requires Restart (logged as warnings):

  • server.bind_address - TCP bind address
  • server.workers - Worker thread count

Status Codes:

  • 200: Hot reload status retrieved successfully

Example Usage:

# Check if hot reload is enabled
curl http://localhost:8080/admin/config/hot-reload-status | jq '.enabled'

# List items that support immediate update
curl http://localhost:8080/admin/config/hot-reload-status | jq '.capabilities.immediate_update.items'


Configuration Management API

The Configuration Management API enables viewing and modifying router configuration at runtime without requiring a server restart. This provides operational flexibility for adjusting behavior, adding backends, and fine-tuning settings in production environments.

Overview

Key capabilities:

  • Runtime Configuration: View and modify configuration without server restart
  • Hot Reload Support: Changes to supported settings apply immediately
  • Validation: Validate configuration changes before applying
  • History & Rollback: Track configuration changes and rollback to previous versions
  • Export/Import: Backup and restore configurations across environments
  • Security: Sensitive information (API keys, passwords, tokens) is automatically masked


Configuration Query APIs

Get Full Configuration

Returns the complete current configuration with sensitive information masked for security.

GET /admin/config/full

Response:

{
  "server": {
    "bind_address": "0.0.0.0:8080",
    "workers": 4,
    "connection_pool_size": 100
  },
  "backends": [
    {
      "name": "openai",
      "url": "https://api.openai.com",
      "api_key": "sk-****...**",
      "weight": 1,
      "models": ["gpt-4", "gpt-3.5-turbo"]
    },
    {
      "name": "local-ollama",
      "url": "http://localhost:11434",
      "weight": 1,
      "models": []
    }
  ],
  "health_checks": {
    "interval": "30s",
    "timeout": "10s",
    "unhealthy_threshold": 3,
    "healthy_threshold": 2
  },
  "logging": {
    "level": "info",
    "format": "json"
  },
  "retry": {
    "max_attempts": 3,
    "backoff": "exponential",
    "initial_delay_ms": 100
  },
  "timeouts": {
    "connect": "5s",
    "request": "60s"
  },
  "rate_limiting": {
    "enabled": false
  },
  "circuit_breaker": {
    "enabled": true,
    "failure_threshold": 5,
    "recovery_timeout": "30s"
  }
}

Notes:

  • API keys, passwords, and tokens are masked (e.g., sk-****...**)
  • All configuration sections are included in the response
  • Use /admin/config/{section} for individual section details

Status Codes:

  • 200: Configuration retrieved successfully

List Configuration Sections

Returns a list of all available configuration sections.

GET /admin/config/sections

Response:

{
  "sections": [
    "server",
    "backends",
    "health_checks",
    "logging",
    "retry",
    "timeouts",
    "rate_limiting",
    "circuit_breaker",
    "global_prompts",
    "admin",
    "fallback",
    "files",
    "api_keys",
    "metrics",
    "routing"
  ],
  "total": 15
}

Status Codes:

  • 200: Section list retrieved successfully

Get Configuration Section

Returns the configuration for a specific section with hot reload capability information.

GET /admin/config/{section}

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| section | string | Yes | Configuration section name |

Example Request:

curl http://localhost:8080/admin/config/logging

Response:

{
  "section": "logging",
  "config": {
    "level": "info",
    "format": "json",
    "output": "stdout",
    "include_timestamps": true
  },
  "hot_reload_capability": "immediate_update",
  "description": "Changes to this section apply immediately without service interruption"
}

Hot Reload Capability Values:

| Value | Description |
|-------|-------------|
| immediate_update | Changes apply immediately without service interruption |
| gradual_update | Existing connections maintained, new connections use new config |
| requires_restart | Server restart required for changes to take effect |

Status Codes:

  • 200: Section configuration retrieved successfully
  • 404: Invalid section name

Error Response:

{
  "error": {
    "message": "Configuration section 'invalid_section' not found",
    "type": "not_found",
    "code": 404,
    "details": {
      "requested_section": "invalid_section",
      "available_sections": ["server", "backends", "logging", "..."]
    }
  }
}


Get Configuration Schema

Returns the JSON Schema for configuration validation. Useful for client-side validation before submitting changes.

GET /admin/config/schema

Response:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "server": {
      "type": "object",
      "properties": {
        "bind_address": {
          "type": "string",
          "pattern": "^[0-9.]+:[0-9]+$",
          "description": "Server bind address in host:port format"
        },
        "workers": {
          "type": "integer",
          "minimum": 1,
          "maximum": 256,
          "description": "Number of worker threads"
        },
        "connection_pool_size": {
          "type": "integer",
          "minimum": 1,
          "maximum": 10000,
          "description": "HTTP connection pool size per backend"
        }
      },
      "required": ["bind_address"]
    },
    "backends": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "minLength": 1,
            "description": "Unique backend identifier"
          },
          "url": {
            "type": "string",
            "format": "uri",
            "description": "Backend base URL"
          },
          "weight": {
            "type": "integer",
            "minimum": 0,
            "maximum": 100,
            "default": 1,
            "description": "Load balancing weight"
          },
          "models": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Explicit model list (optional)"
          }
        },
        "required": ["name", "url"]
      }
    },
    "logging": {
      "type": "object",
      "properties": {
        "level": {
          "type": "string",
          "enum": ["trace", "debug", "info", "warn", "error"],
          "description": "Log level"
        },
        "format": {
          "type": "string",
          "enum": ["json", "text", "pretty"],
          "description": "Log output format"
        }
      }
    }
  }
}

Status Codes:

  • 200: Schema retrieved successfully

Configuration Modification APIs

Replace Configuration Section

Replaces an entire configuration section. Triggers validation and hot reload if applicable.

PUT /admin/config/{section}

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| section | string | Yes | Configuration section name |

Request Body: Complete section configuration object.

Example Request:

curl -X PUT http://localhost:8080/admin/config/logging \
  -H "Content-Type: application/json" \
  -d '{
    "level": "debug",
    "format": "json",
    "output": "stdout",
    "include_timestamps": true
  }'

Response:

{
  "success": true,
  "section": "logging",
  "hot_reload_applied": true,
  "message": "Configuration updated and applied immediately",
  "previous": {
    "level": "info",
    "format": "json",
    "output": "stdout",
    "include_timestamps": true
  },
  "current": {
    "level": "debug",
    "format": "json",
    "output": "stdout",
    "include_timestamps": true
  },
  "version": 15
}

Status Codes:

  • 200: Configuration updated successfully
  • 400: Invalid configuration format or validation error
  • 404: Invalid section name

Partial Update Configuration Section

Performs a partial update using JSON merge patch semantics. Only specified fields are updated; unspecified fields retain their current values.

PATCH /admin/config/{section}

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| section | string | Yes | Configuration section name |

Request Body: Partial configuration object with fields to update.

Example Request:

curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Content-Type: application/json" \
  -d '{
    "level": "warn"
  }'

Response:

{
  "success": true,
  "section": "logging",
  "hot_reload_applied": true,
  "message": "Configuration partially updated and applied",
  "changes": {
    "level": {
      "from": "info",
      "to": "warn"
    }
  },
  "current": {
    "level": "warn",
    "format": "json",
    "output": "stdout",
    "include_timestamps": true
  },
  "version": 16
}

Merge Behavior:

  • Scalar values are replaced
  • Objects are merged recursively
  • Arrays are replaced entirely (not merged)
  • null values remove the field (if optional)
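
A sketch of these merge semantics in Python (essentially RFC 7386 JSON merge patch; illustrative, not the router's internal code):

def merge_patch(target: dict, patch: dict) -> dict:
    """Apply JSON merge patch semantics as described above."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the field
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_patch(result[key], value)  # objects merge recursively
        else:
            result[key] = value  # scalars and arrays are replaced
    return result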

Status Codes:

  • 200: Configuration updated successfully
  • 400: Invalid configuration format or validation error
  • 404: Invalid section name

Validate Configuration

Validates configuration without applying changes. Supports dry_run mode for testing configuration changes safely.

POST /admin/config/validate

Request Body:

{
  "section": "backends",
  "config": {
    "name": "new-backend",
    "url": "http://localhost:8000",
    "weight": 2
  },
  "dry_run": true
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| section | string | Yes | Configuration section to validate |
| config | object | Yes | Configuration to validate |
| dry_run | boolean | No | If true, only validate without preparing for apply (default: true) |

Response (Valid):

{
  "valid": true,
  "section": "backends",
  "warnings": [
    "Backend 'new-backend' has no explicit model list; models will be auto-discovered"
  ],
  "info": {
    "hot_reload_capability": "gradual_update",
    "estimated_impact": "New requests may be routed to this backend after apply"
  }
}

Response (Invalid):

{
  "valid": false,
  "section": "backends",
  "errors": [
    {
      "field": "url",
      "message": "Invalid URL format: missing scheme",
      "value": "localhost:8000"
    },
    {
      "field": "weight",
      "message": "Weight must be between 0 and 100",
      "value": 150
    }
  ],
  "warnings": []
}

Status Codes:

  • 200: Validation completed (check valid field for result)
  • 400: Invalid request format
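
A typical validate-before-apply flow might look like this in Python (a sketch assuming the requests library; the backend payload is illustrative):

import requests

BASE = "http://localhost:8080"
backend = {"name": "new-backend", "url": "http://localhost:8000", "weight": 2}

# 1. Validate the change without applying it
check = requests.post(
    f"{BASE}/admin/config/validate",
    json={"section": "backends", "config": backend, "dry_run": True},
).json()

if check["valid"]:
    # 2. Surface any warnings, then apply via the backend management API
    for warning in check.get("warnings", []):
        print("warning:", warning)
    requests.post(f"{BASE}/admin/backends", json=backend)
else:
    for error in check["errors"]:
        print(f"{error['field']}: {error['message']}")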

Apply Pending Changes

Applies pending configuration changes immediately. Triggers hot reload for applicable settings.

POST /admin/config/apply

Request Body (optional):

{
  "sections": ["logging", "rate_limiting"],
  "force": false
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| sections | array | No | Specific sections to apply (default: all pending) |
| force | boolean | No | Force apply even if warnings exist (default: false) |

Response:

{
  "success": true,
  "applied_sections": ["logging", "rate_limiting"],
  "results": {
    "logging": {
      "status": "applied",
      "hot_reload": "immediate_update"
    },
    "rate_limiting": {
      "status": "applied",
      "hot_reload": "immediate_update"
    }
  },
  "version": 17,
  "timestamp": "2024-01-15T10:45:30Z"
}

Status Codes:

  • 200: Changes applied successfully
  • 400: No pending changes or validation errors
  • 409: Conflict with concurrent modification

Configuration Save/Restore APIs

Export Configuration

Exports the current configuration in the specified format.

POST /admin/config/export

Request Body:

{
  "format": "yaml",
  "include_sensitive": false,
  "sections": ["server", "backends", "logging"]
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| format | string | No | Export format: yaml, json, or toml (default: yaml) |
| include_sensitive | boolean | No | Include sensitive data unmasked (requires elevated permissions, default: false) |
| sections | array | No | Specific sections to export (default: all) |

Response (format: json):

{
  "format": "json",
  "content": "{\"server\":{\"bind_address\":\"0.0.0.0:8080\",...}}",
  "sections_exported": ["server", "backends", "logging"],
  "exported_at": "2024-01-15T10:45:30Z",
  "version": 17,
  "checksum": "sha256:a1b2c3d4..."
}

Response (format: yaml):

{
  "format": "yaml",
  "content": "server:\n  bind_address: \"0.0.0.0:8080\"\n  workers: 4\n...",
  "sections_exported": ["server", "backends", "logging"],
  "exported_at": "2024-01-15T10:45:30Z",
  "version": 17,
  "checksum": "sha256:a1b2c3d4..."
}

Status Codes:

  • 200: Export successful
  • 400: Invalid format specified
  • 403: Elevated permissions required for include_sensitive: true

Import Configuration

Imports configuration from the provided content.

POST /admin/config/import

Request Body:

{
  "format": "yaml",
  "content": "server:\n  bind_address: \"0.0.0.0:8080\"\n  workers: 8\nlogging:\n  level: debug",
  "dry_run": true,
  "merge": false
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| format | string | Yes | Content format: yaml, json, or toml |
| content | string | Yes | Configuration content to import |
| dry_run | boolean | No | Validate without applying (default: false) |
| merge | boolean | No | Merge with existing config vs replace (default: false) |

Response (dry_run: true):

{
  "valid": true,
  "dry_run": true,
  "changes_preview": {
    "server": {
      "workers": {"from": 4, "to": 8}
    },
    "logging": {
      "level": {"from": "info", "to": "debug"}
    }
  },
  "sections_affected": ["server", "logging"],
  "warnings": [
    "server.workers change requires restart to take effect"
  ]
}

Response (dry_run: false):

{
  "success": true,
  "imported_sections": ["server", "logging"],
  "hot_reload_results": {
    "logging": "applied_immediately",
    "server": "requires_restart"
  },
  "version": 18,
  "timestamp": "2024-01-15T10:50:00Z"
}

Status Codes:

  • 200: Import successful (or dry_run validation passed)
  • 400: Invalid format or content parsing error
  • 422: Configuration validation failed

Get Configuration History

Retrieves the history of configuration changes.

GET /admin/config/history

Query Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| limit | integer | No | Maximum entries to return (default: 20, max: 100) |
| offset | integer | No | Number of entries to skip (default: 0) |
| section | string | No | Filter by section name |

Example Request:

curl "http://localhost:8080/admin/config/history?limit=10&section=logging"

Response:

{
  "history": [
    {
      "version": 18,
      "timestamp": "2024-01-15T10:50:00Z",
      "sections_changed": ["logging"],
      "source": "api",
      "user": "admin",
      "changes": {
        "logging": {
          "level": {"from": "info", "to": "debug"}
        }
      }
    },
    {
      "version": 17,
      "timestamp": "2024-01-15T09:30:00Z",
      "sections_changed": ["backends"],
      "source": "file_reload",
      "user": null,
      "changes": {
        "backends": {
          "added": ["new-backend"],
          "modified": [],
          "removed": []
        }
      }
    },
    {
      "version": 16,
      "timestamp": "2024-01-14T15:20:00Z",
      "sections_changed": ["rate_limiting"],
      "source": "api",
      "user": "admin",
      "changes": {
        "rate_limiting": {
          "enabled": {"from": false, "to": true}
        }
      }
    }
  ],
  "total": 18,
  "limit": 10,
  "offset": 0
}

Source Values:

| Source | Description |
|--------|-------------|
| api | Changed via Configuration Management API |
| file_reload | Changed via configuration file hot reload |
| startup | Initial configuration at server startup |
| rollback | Restored from previous version |

Status Codes:

  • 200: History retrieved successfully

Rollback Configuration

Rolls back to a previous configuration version.

POST /admin/config/rollback/{version}

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| version | integer | Yes | Version number to rollback to |

Request Body (optional):

{
  "dry_run": false,
  "sections": ["logging", "backends"]
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| dry_run | boolean | No | Preview changes without applying (default: false) |
| sections | array | No | Specific sections to rollback (default: all changed sections) |

Response:

{
  "success": true,
  "rolled_back_from": 18,
  "rolled_back_to": 15,
  "sections_restored": ["logging", "backends"],
  "changes": {
    "logging": {
      "level": {"from": "debug", "to": "info"}
    },
    "backends": {
      "removed": ["new-backend"]
    }
  },
  "new_version": 19,
  "timestamp": "2024-01-15T11:00:00Z"
}

Status Codes:

  • 200: Rollback successful
  • 400: Validation error for target configuration
  • 404: Version not found in history
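
For example, previewing a rollback before applying it (a Python sketch using the requests library; the target version and the shape of the dry-run response are assumptions based on the fields documented above):

import requests

BASE = "http://localhost:8080"

# Preview what rolling back to version 15 would change
preview = requests.post(
    f"{BASE}/admin/config/rollback/15",
    json={"dry_run": True},
).json()
print(preview.get("changes"))

# Apply the rollback once the preview looks right
requests.post(f"{BASE}/admin/config/rollback/15", json={"dry_run": False})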

Backend Management APIs

These endpoints provide convenient shortcuts for managing backends without modifying the full backends configuration section.

Add Backend

Dynamically adds a new backend to the router.

POST /admin/backends

Request Body:

{
  "name": "new-ollama",
  "url": "http://192.168.1.100:11434",
  "weight": 2,
  "models": ["llama2", "codellama"],
  "api_key": null,
  "health_check_path": "/api/tags"
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| name | string | Yes | Unique backend identifier |
| url | string | Yes | Backend base URL |
| weight | integer | No | Load balancing weight (default: 1) |
| models | array | No | Explicit model list (empty for auto-discovery) |
| api_key | string | No | API key for authentication |
| health_check_path | string | No | Custom health check endpoint |

Response:

{
  "success": true,
  "backend": {
    "name": "new-ollama",
    "url": "http://192.168.1.100:11434",
    "weight": 2,
    "models": ["llama2", "codellama"],
    "is_healthy": null,
    "status": "pending_health_check"
  },
  "message": "Backend added successfully. Health check scheduled.",
  "config_version": 20
}

Status Codes:

  • 200: Backend added successfully
  • 400: Invalid backend configuration
  • 409: Backend with this name already exists

Get Backend Configuration

Retrieves the configuration for a specific backend.

GET /admin/backends/{name}

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| name | string | Yes | Backend identifier |

Response:

{
  "name": "local-ollama",
  "url": "http://localhost:11434",
  "weight": 1,
  "models": ["llama2", "mistral", "codellama"],
  "api_key": null,
  "health_check_path": "/api/tags",
  "is_healthy": true,
  "consecutive_failures": 0,
  "consecutive_successes": 25,
  "last_check": "2024-01-15T10:55:00Z",
  "total_requests": 1250,
  "failed_requests": 3
}

Status Codes:

  • 200: Backend configuration retrieved
  • 404: Backend not found

Update Backend Configuration

Updates the configuration for an existing backend.

PUT /admin/backends/{name}

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| name | string | Yes | Backend identifier |

Request Body:

{
  "url": "http://localhost:11434",
  "weight": 3,
  "models": ["llama2", "mistral", "codellama", "phi"],
  "api_key": null
}

Response:

{
  "success": true,
  "backend": {
    "name": "local-ollama",
    "url": "http://localhost:11434",
    "weight": 3,
    "models": ["llama2", "mistral", "codellama", "phi"]
  },
  "changes": {
    "weight": {"from": 1, "to": 3},
    "models": {"added": ["phi"], "removed": []}
  },
  "config_version": 21
}

Status Codes:

  • 200: Backend updated successfully
  • 400: Invalid configuration
  • 404: Backend not found

Delete Backend

Removes a backend from the router.

DELETE /admin/backends/{name}

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| name | string | Yes | Backend identifier |

Query Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| drain | boolean | No | Wait for active requests to complete (default: true) |
| timeout | integer | No | Drain timeout in seconds (default: 30) |

Example Request:

curl -X DELETE "http://localhost:8080/admin/backends/old-backend?drain=true&timeout=60"

Response:

{
  "success": true,
  "deleted_backend": "old-backend",
  "drained": true,
  "active_requests_completed": 5,
  "config_version": 22,
  "message": "Backend removed from rotation"
}

Status Codes:

  • 200: Backend deleted successfully
  • 404: Backend not found
  • 409: Cannot delete last remaining backend

Update Backend Weight

Updates only the load balancing weight for a backend.

PUT /admin/backends/{name}/weight

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| name | string | Yes | Backend identifier |

Request Body:

{
  "weight": 5
}

Response:

{
  "success": true,
  "backend": "local-ollama",
  "weight": {
    "from": 1,
    "to": 5
  },
  "config_version": 23
}

Status Codes:

  • 200: Weight updated successfully
  • 400: Invalid weight value
  • 404: Backend not found

Update Backend Models

Updates only the model list for a backend.

PUT /admin/backends/{name}/models

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| name | string | Yes | Backend identifier |

Request Body:

{
  "models": ["llama2", "mistral", "codellama", "phi", "gemma"],
  "mode": "replace"
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| models | array | Yes | Model list |
| mode | string | No | Update mode: replace, add, or remove (default: replace) |

Response:

{
  "success": true,
  "backend": "local-ollama",
  "models": {
    "previous": ["llama2", "mistral", "codellama"],
    "current": ["llama2", "mistral", "codellama", "phi", "gemma"],
    "added": ["phi", "gemma"],
    "removed": []
  },
  "config_version": 24
}

Status Codes:

  • 200: Models updated successfully
  • 400: Invalid model list
  • 404: Backend not found

Configuration API Examples

Get Full Configuration

curl http://localhost:8080/admin/config/full | jq

Update Logging Level

curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Content-Type: application/json" \
  -d '{"level": "debug"}'

Add a New Backend

curl -X POST http://localhost:8080/admin/backends \
  -H "Content-Type: application/json" \
  -d '{
    "name": "remote-ollama",
    "url": "http://192.168.1.50:11434",
    "weight": 2,
    "models": ["llama2", "mistral"]
  }'

Export Configuration as JSON

curl -X POST http://localhost:8080/admin/config/export \
  -H "Content-Type: application/json" \
  -d '{"format": "json"}' | jq -r '.content' > config-backup.json

View Configuration History

curl "http://localhost:8080/admin/config/history?limit=5" | jq '.history'

Configuration API Error Responses

All Configuration Management API errors follow the standard error format:

{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type_identifier",
    "code": 400,
    "details": {
      "additional": "context information"
    }
  }
}

Configuration-Specific Error Types:

| Type | HTTP Code | Description |
|------|-----------|-------------|
| config_validation_error | 400 | Configuration validation failed |
| config_section_not_found | 404 | Requested configuration section does not exist |
| config_version_not_found | 404 | Requested version not found in history |
| config_conflict | 409 | Concurrent modification conflict |
| config_permission_denied | 403 | Insufficient permissions for operation |
| config_parse_error | 422 | Failed to parse configuration content |

Example Validation Error:

{
  "error": {
    "message": "Configuration validation failed",
    "type": "config_validation_error",
    "code": 400,
    "details": {
      "section": "backends",
      "errors": [
        {
          "field": "url",
          "message": "URL must include scheme (http:// or https://)",
          "value": "localhost:8000"
        }
      ]
    }
  }
}

Example Conflict Error:

{
  "error": {
    "message": "Configuration was modified by another request",
    "type": "config_conflict",
    "code": 409,
    "details": {
      "expected_version": 15,
      "current_version": 16,
      "conflicting_sections": ["backends"]
    }
  }
}
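
One way for a client to handle this conflict is to re-read the section and retry the patch (a sketch; the retry policy is up to the client):

import requests

BASE = "http://localhost:8080"

def patch_with_retry(section: str, patch: dict, attempts: int = 3):
    """PATCH a config section, retrying when a concurrent writer wins (409)."""
    for attempt in range(attempts):
        resp = requests.patch(f"{BASE}/admin/config/{section}", json=patch)
        if resp.status_code != 409:
            resp.raise_for_status()
            return resp.json()
        # Another request modified the section first. Re-read the current
        # state so the intended change can be re-validated, then retry.
        current = requests.get(f"{BASE}/admin/config/{section}").json()
        print(f"conflict on attempt {attempt + 1}, current state: {current}")
    raise RuntimeError(f"could not update {section}: persistent conflicts")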


Error Handling

Error Response Format

All errors follow a consistent JSON structure:

{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type_identifier",
    "code": 404,
    "details": {
      "additional": "context information"
    }
  }
}

Error Types

| Type | HTTP Code | Description |
|------|-----------|-------------|
| bad_request | 400 | Invalid request format or parameters |
| unauthorized | 401 | Authentication required or failed |
| forbidden | 403 | Access denied (e.g., insufficient permissions) |
| model_not_found | 404 | Requested model not available |
| rate_limit_exceeded | 429 | Rate limit exceeded (future feature) |
| internal_error | 500 | Router internal error |
| bad_gateway | 502 | Backend connection/response error |
| service_unavailable | 503 | All backends unhealthy |
| gateway_timeout | 504 | Backend request timeout |

Example Error Responses

Model Not Found:

{
  "error": {
    "message": "Model 'invalid-model' not found on any healthy backend",
    "type": "model_not_found", 
    "code": 404,
    "details": {
      "requested_model": "invalid-model",
      "available_models": ["gpt-4", "gpt-3.5-turbo", "llama2"]
    }
  }
}

Backend Error:

{
  "error": {
    "message": "Failed to connect to backend 'local-ollama'",
    "type": "bad_gateway",
    "code": 502,
    "details": {
      "backend": "local-ollama", 
      "backend_error": "Connection refused"
    }
  }
}

Service Unavailable:

{
  "error": {
    "message": "All backends are currently unhealthy",
    "type": "service_unavailable",
    "code": 503,
    "details": {
      "healthy_backends": 0,
      "total_backends": 3
    }
  }
}

Rate Limiting

Note: Rate limiting is not currently implemented but is planned for future releases.

Future rate limiting will support:

  • Per-IP rate limiting
  • Per-API-key rate limiting
  • Model-specific rate limiting
  • Sliding window algorithms
  • Rate limit headers in responses

Streaming

Server-Sent Events (SSE)

When stream: true is specified, responses are sent as Server-Sent Events with:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

SSE Format

data: {"id":"chatcmpl-123","object":"chat.completion.chunk",...}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk",...}

data: [DONE]

SSE Compatibility

The router supports multiple SSE formats for maximum compatibility:

  • Standard Format: data:{...} (no space after the colon)
  • Spaced Format: data: {...} (space after the colon)
  • Mixed Line Endings: Handles \r\n, \n, and \r
  • Empty Lines: Properly processes chunk separators
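
A tolerant client-side parser for these variants might normalize lines like this (a Python sketch):

def parse_sse_line(line: str):
    """Return the payload of an SSE data line, or None.

    Accepts both "data:{...}" and "data: {...}", strips any trailing
    \r left over from \r\n line endings, and returns None for
    comments, empty separator lines, and the [DONE] terminator.
    """
    line = line.rstrip("\r\n")
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].lstrip()
    return None if payload == "[DONE]" else payload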

Connection Management

  • Keep-Alive: Connections are kept open during streaming
  • Timeouts: 5-minute timeout for long-running requests
  • Error Handling: Partial responses include error information
  • Client Disconnection: Gracefully handles client disconnects

Examples

Basic Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Streaming Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Write a short story"}
    ],
    "stream": true,
    "max_tokens": 200
  }'

Text Completion with Parameters

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "The future of AI is",
    "max_tokens": 50,
    "temperature": 0.8,
    "top_p": 0.9
  }'

Check Backend Status

curl http://localhost:8080/admin/backends | jq

Monitor Service Health

curl http://localhost:8080/admin/health | jq '.services'

List Available Models

curl http://localhost:8080/v1/models | jq '.data[].id'

Python Client Example

import requests
import json

# Configure the client
BASE_URL = "http://localhost:8080"

def chat_completion(messages, model="gpt-3.5-turbo", stream=False):
    """Send a chat completion request"""
    response = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers={"Content-Type": "application/json"},
        json={
            "model": model,
            "messages": messages,
            "stream": stream,
            "temperature": 0.7
        },
        stream=stream
    )

    if stream:
        # Handle streaming response
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]  # Remove 'data: ' prefix
                    if data == '[DONE]':
                        break
                    try:
                        chunk = json.loads(data)
                        content = chunk['choices'][0]['delta'].get('content', '')
                        if content:
                            print(content, end='', flush=True)
                    except json.JSONDecodeError:
                        continue
        print()  # New line after streaming
    else:
        # Handle non-streaming response
        result = response.json()
        return result['choices'][0]['message']['content']

# Example usage
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

print("Streaming response:")
chat_completion(messages, stream=True)

print("\nNon-streaming response:")
response = chat_completion(messages, stream=False)
print(response)

JavaScript/Node.js Client Example

// Uses the built-in fetch API and web streams (Node.js 18+),
// so response.body.getReader() works without extra dependencies.

const BASE_URL = 'http://localhost:8080';

async function chatCompletion(messages, options = {}) {
    const response = await fetch(`${BASE_URL}/v1/chat/completions`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({
            model: options.model || 'gpt-3.5-turbo',
            messages: messages,
            stream: options.stream || false,
            temperature: options.temperature || 0.7,
            ...options
        })
    });

    if (options.stream) {
        // Handle streaming response
        const reader = response.body.getReader();
        const decoder = new TextDecoder();

        while (true) {
            const { done, value } = await reader.read();
            if (done) break;

            const chunk = decoder.decode(value);
            const lines = chunk.split('\n');

            for (const line of lines) {
                if (line.startsWith('data: ')) {
                    const data = line.slice(6);
                    if (data === '[DONE]') return;

                    try {
                        const parsed = JSON.parse(data);
                        const content = parsed.choices[0]?.delta?.content;
                        if (content) {
                            process.stdout.write(content);
                        }
                    } catch (e) {
                        // Ignore JSON parse errors
                    }
                }
            }
        }
        console.log(); // New line
    } else {
        const result = await response.json();
        return result.choices[0].message.content;
    }
}

// Example usage (wrapped in an async function so it also runs in
// CommonJS modules, where top-level await is unavailable)
async function main() {
    const messages = [
        { role: 'user', content: 'What is the meaning of life?' }
    ];

    // Streaming
    console.log('Streaming response:');
    await chatCompletion(messages, { stream: true });

    // Non-streaming
    console.log('\nNon-streaming response:');
    const response = await chatCompletion(messages);
    console.log(response);
}

main();

This API reference provides comprehensive documentation for integrating with Continuum Router. The router maintains full OpenAI API compatibility while adding powerful multi-backend routing and management capabilities.