API Reference¶
Continuum Router provides a comprehensive OpenAI-compatible API with additional administrative endpoints for monitoring and management. This document describes all available endpoints, request/response formats, and error handling.
Table of Contents¶
- Overview
- Authentication
- Core API Endpoints
- Admin Endpoints
- Configuration Management API
- Error Handling
- Rate Limiting
- Streaming
- Examples
Overview¶
Base URL¶
All endpoints are served relative to the router's base URL. The examples in this document assume the default of http://localhost:8080.
Content Type¶
All requests and responses use application/json unless otherwise specified.
OpenAI Compatibility¶
Continuum Router is fully compatible with OpenAI API v1, supporting:
- Chat completions with streaming
- Text completions
- Image generation (DALL-E, gpt-image-1)
- Image editing/inpainting (DALL-E 2, gpt-image-1)
- Image variations (DALL-E 2)
- Files API (upload, list, retrieve, delete)
- File resolution in chat completions (image_file references)
- Model listing
- Error response formats
Authentication¶
Continuum Router supports API key authentication with configurable enforcement modes.
Authentication Modes¶
The router supports two authentication modes for API endpoints:
| Mode | Behavior |
|---|---|
permissive (default) | Requests without API key are allowed. Requests with valid API keys are authenticated and can access user-specific features. |
blocking | Only authenticated requests are processed. Requests without valid API key receive 401 Unauthorized. |
Configuration¶
api_keys:
  # Authentication mode: "permissive" (default) or "blocking"
  mode: blocking

  # API key definitions
  api_keys:
    - key: "${API_KEY_1}"
      id: "key-production-1"
      user_id: "user-admin"
      organization_id: "org-main"
      scopes: [read, write, files, admin]
Protected Endpoints (when mode is blocking)¶
/v1/chat/completions
/v1/completions
/v1/responses
/v1/images/generations
/v1/images/edits
/v1/images/variations
/v1/models
Note: Health endpoints (/health, /healthz) are always accessible without authentication. Admin, Files, and Metrics endpoints have separate authentication mechanisms.
Making Authenticated Requests¶
Include the API key in the Authorization header:
POST /v1/chat/completions HTTP/1.1
Authorization: Bearer sk-your-api-key
Content-Type: application/json
{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}
Authentication Errors¶
When authentication fails, the API returns:
{
"error": {
"message": "Missing or invalid Authorization header. Expected: Bearer <api_key>",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
Status Codes:
401 Unauthorized: Missing or invalid API key
Core API Endpoints¶
Health Check¶
Check the health status of the router service.
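Example Request (health endpoints never require authentication):
curl http://localhost:8080/health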
Response:
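The exact payload is not specified in this document; an illustrative healthy response, reusing the status values from the Service Health endpoint below:
{
  "status": "healthy"
}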
Status Codes:
200: Service is healthy
List Models¶
Retrieve all available models from all healthy backends.
Response:
{
"object": "list",
"data": [
{
"id": "gpt-4",
"object": "model",
"created": 1677610602,
"owned_by": "openai-compatible",
"permission": [],
"root": "gpt-4",
"parent": null
},
{
"id": "llama2:7b",
"object": "model",
"created": 1677610602,
"owned_by": "local-ollama",
"permission": [],
"root": "llama2:7b",
"parent": null
}
]
}
Status Codes:
200: Models retrieved successfully
503: All backends are unhealthy
Features:
- Model Aggregation: Combines models from all healthy backends
- Deduplication: Removes duplicate models across backends
- Caching: Results cached for 5 minutes by default
- Health Awareness: Only includes models from healthy backends
Chat Completions¶
Generate chat completions using the OpenAI Chat API format.
Request Body:
{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Explain quantum computing in simple terms."
}
],
"temperature": 0.7,
"max_tokens": 150,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"stream": false,
"stop": null,
"logit_bias": {},
"user": "user123"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier (must be available on at least one healthy backend) |
messages | array | Yes | Array of message objects with role and content |
temperature | number | No | Sampling temperature (0.0 to 2.0, default: 1.0) |
max_tokens | integer | No | Maximum tokens to generate |
top_p | number | No | Nucleus sampling parameter (0.0 to 1.0) |
frequency_penalty | number | No | Frequency penalty (-2.0 to 2.0) |
presence_penalty | number | No | Presence penalty (-2.0 to 2.0) |
stream | boolean | No | Enable streaming response (default: false) |
stop | string/array | No | Stop sequences |
logit_bias | object | No | Token logit bias |
user | string | No | User identifier for tracking |
Response (Non-streaming):
{
"id": "chatcmpl-123456789",
"object": "chat.completion",
"created": 1677652288,
"model": "gpt-3.5-turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing is a revolutionary computing paradigm that harnesses quantum mechanical phenomena..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 150,
"total_tokens": 175
}
}
Response (Streaming): When stream: true, the response uses Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"role":"assistant","content":""},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"content":"Quantum"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{"content":" computing"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}
data: [DONE]
Status Codes:
200: Completion generated successfully
400: Invalid request format or parameters
404: Model not found on any healthy backend
502: Backend connection error
503: All backends unhealthy
504: Request timeout
Features:
- Model-Based Routing: Automatically routes to backends serving the requested model
- Load Balancing: Distributes load across healthy backends
- Streaming Support: Real-time response streaming via SSE
- Error Recovery: Automatic retry on transient failures
- Request Deduplication: Prevents duplicate processing of identical requests
Image Generation¶
Generate images using OpenAI's DALL-E, GPT Image models, or Google's Nano Banana (Gemini) models.
Request Body:
{
"model": "dall-e-3",
"prompt": "A serene Japanese garden with cherry blossoms",
"n": 1,
"size": "1024x1024",
"quality": "standard",
"response_format": "url"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Image model: dall-e-2, dall-e-3, gpt-image-1, gpt-image-1.5, gpt-image-1-mini, nano-banana, or nano-banana-pro |
prompt | string | Yes | Description of the image to generate |
n | integer | No | Number of images (1-10, varies by model) |
size | string | No | Image size (varies by model, see below) |
quality | string | No | Image quality (varies by model, see below) |
style | string | No | Image style: vivid or natural (DALL-E 3 only) |
response_format | string | No | Response format: url or b64_json |
output_format | string | No | Output file format: png, jpeg, webp (GPT Image models only, default: png) |
output_compression | integer | No | Compression level 0-100 for jpeg/webp (GPT Image models only) |
background | string | No | Background: transparent, opaque, auto (GPT Image models only) |
stream | boolean | No | Enable streaming for partial images (GPT Image models only, default: false) |
partial_images | integer | No | Number of partial images 0-3 during streaming (GPT Image models only) |
user | string | No | User identifier for tracking |
Model-specific constraints:
| Model | Sizes | n | Quality | Notes |
|---|---|---|---|---|
dall-e-2 | 256x256, 512x512, 1024x1024 | 1-10 | N/A | Classic DALL-E 2 |
dall-e-3 | 1024x1024, 1792x1024, 1024x1792 | 1 | standard, hd | High quality with prompt revision |
gpt-image-1 | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | Latest GPT Image model, supports streaming |
gpt-image-1.5 | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | 4x faster, better text rendering |
gpt-image-1-mini | 1024x1024, 1536x1024, 1024x1536, auto | 1 | low, medium, high, auto | Cost-effective option |
nano-banana | 256x256 to 1024x1024 | 1-4 | N/A | Gemini 2.5 Flash Image (fast) |
nano-banana-pro | 256x256 to 4096x4096 | 1-4 | N/A | Gemini 2.0 Flash Image (advanced, up to 4K) |
Quality Parameter (GPT Image Models):
For backward compatibility, standard maps to medium and hd maps to high when using GPT Image models.
| Quality | Description |
|---|---|
low | Fast generation with lower quality |
medium | Balanced quality and speed (default) |
high | Best quality, slower generation |
auto | Model selects optimal quality |
Output Format Options (GPT Image Models):
| Format | Description | Supports Transparency |
|---|---|---|
png | Lossless format (default) | Yes |
jpeg | Lossy format, smaller file size | No |
webp | Modern format, good compression | Yes |
Note: Transparent background (background: "transparent") requires png or webp format.
Nano Banana (Gemini) Models:
Nano Banana provides access to Google's Gemini image generation capabilities through an OpenAI-compatible interface:
nano-banana: Maps to Gemini 2.5 Flash Image - fast, general-purpose image generation
nano-banana-pro: Maps to Gemini 2.0 Flash Image - advanced model with high-resolution support (up to 4K)
Nano Banana Size Mapping:
The router automatically converts OpenAI-style size parameters to Gemini's aspectRatio and imageSize format:
| OpenAI Size | Gemini aspectRatio | Gemini imageSize | Notes |
|---|---|---|---|
256x256 | 1:1 | 1K | Falls back to Gemini minimum |
512x512 | 1:1 | 1K | Falls back to Gemini minimum |
1024x1024 | 1:1 | 1K | Default |
1536x1024 | 3:2 | 1K | Landscape (new) |
1024x1536 | 2:3 | 1K | Portrait (new) |
1024x1792 | 9:16 | 1K | Tall portrait |
1792x1024 | 16:9 | 1K | Wide landscape |
2048x2048 | 1:1 | 2K | Pro only |
4096x4096 | 1:1 | 4K | Pro only |
auto | 1:1 | 1K | Default fallback |
The conversion sends the following Gemini API structure:
{
"contents": [{"parts": [{"text": "Your prompt"}]}],
"generationConfig": {
"imageConfig": {
"aspectRatio": "3:2",
"imageSize": "1K"
}
}
}
Example Nano Banana Request:
{
"model": "nano-banana",
"prompt": "A white siamese cat with blue eyes, photorealistic",
"n": 1,
"size": "1024x1024",
"response_format": "b64_json"
}
Response:
{
"created": 1677652288,
"data": [
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/...",
"revised_prompt": "A tranquil Japanese garden featuring..."
}
]
}
Response (with b64_json):
{
"created": 1677652288,
"data": [
{
"b64_json": "/9j/4AAQSkZJRgABAQAA...",
"revised_prompt": "A tranquil Japanese garden featuring..."
}
]
}
Nano Banana Response Notes:
- When using response_format: "url" with Nano Banana, the image is returned as a data URL (data:image/png;base64,...), since Gemini's native API returns inline base64 data
- The revised_prompt field contains any text response from Gemini describing the generated image
Streaming Image Generation (GPT Image Models):
When stream: true is specified for GPT Image models, the response will be streamed as Server-Sent Events (SSE):
Example Streaming Request:
{
"model": "gpt-image-1",
"prompt": "A beautiful sunset over mountains",
"stream": true,
"partial_images": 2,
"response_format": "b64_json"
}
Streaming Response Format:
data: {"type":"image_generation.partial_image","partial_image_index":0,"b64_json":"...","created":1702345678}
data: {"type":"image_generation.partial_image","partial_image_index":1,"b64_json":"...","created":1702345679}
data: {"type":"image_generation.complete","b64_json":"...","created":1702345680}
data: {"type":"image_generation.usage","usage":{"input_tokens":25,"output_tokens":1024}}
data: {"type":"done"}
SSE Event Types:
| Event Type | Description |
|---|---|
image_generation.partial_image | Intermediate image during generation |
image_generation.complete | Final complete image |
image_generation.usage | Token usage information (for cost tracking) |
done | Stream completion marker |
Example GPT Image Request with New Options:
{
"model": "gpt-image-1.5",
"prompt": "A white cat with blue eyes, photorealistic",
"size": "auto",
"quality": "high",
"output_format": "webp",
"output_compression": 85,
"background": "transparent",
"response_format": "b64_json"
}
Status Codes:
200: Image(s) generated successfully
400: Invalid request (e.g., invalid size for model, n > 1 for DALL-E 3)
401: Invalid API key
429: Rate limit exceeded
500: Backend error
503: Gemini backend unavailable (for Nano Banana models)
Timeout Configuration: Image generation requests use a configurable timeout (default: 3 minutes). See timeouts.request.image_generation in configuration.
Image Edit (Inpainting)¶
Edit existing images using OpenAI's inpainting capabilities. This endpoint allows you to modify specific regions of an image based on a text prompt and optional mask. Supports GPT Image models and DALL-E 2.
Request Parameters (multipart/form-data):
| Parameter | Type | Required | Description |
|---|---|---|---|
image | file | Yes | The source image to edit (PNG, < 4MB, square) |
prompt | string | Yes | Description of the desired edit |
mask | file | No | Mask image indicating edit regions (PNG, same dimensions as image) |
model | string | No | Model to use (default: gpt-image-1) |
n | integer | No | Number of images to generate (1-10, default: 1) |
size | string | No | Output size (model-dependent, default: 1024x1024) |
response_format | string | No | Response format: url or b64_json (default: url) |
user | string | No | Unique user identifier for tracking |
Supported Models and Sizes:
| Model | Sizes | Notes |
|---|---|---|
gpt-image-1 | 1024x1024, 1536x1024, 1024x1536, auto | Latest GPT Image model (recommended) |
gpt-image-1-mini | 1024x1024, 1536x1024, 1024x1536, auto | Cost-optimized version |
gpt-image-1.5 | 1024x1024, 1536x1024, 1024x1536, auto | Newest with improved instruction following |
dall-e-2 | 256x256, 512x512, 1024x1024 | Legacy DALL-E 2 model |
Note: DALL-E 3 and Gemini (nano-banana) do NOT support image editing via this endpoint. Gemini uses semantic masking via natural language, which is incompatible with OpenAI's mask-based editing format.
Image Requirements:
- Format: PNG only
- Size: Less than 4MB
- Dimensions: Must be square (width equals height)
Mask Requirements:
- Format: PNG with alpha channel (RGBA)
- Dimensions: Must match the source image exactly
- Transparent areas: Indicate regions to edit/generate
- Opaque areas: Indicate regions to preserve
Example Request:
curl -X POST http://localhost:8080/v1/images/edits \
-F "image=@source_image.png" \
-F "mask=@mask.png" \
-F "prompt=A sunlit indoor lounge area with a pool containing a flamingo" \
-F "n=1" \
-F "size=1024x1024" \
-F "response_format=url"
Example Request (without mask):
curl -X POST http://localhost:8080/v1/images/edits \
-F "image=@source_image.png" \
-F "prompt=Add a sunset in the background" \
-F "n=1" \
-F "size=512x512"
Response:
{
"created": 1677652288,
"data": [
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
}
]
}
Response (with b64_json):
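The shape mirrors the URL response, with base64 image data in place of the URL (illustrative):
{
  "created": 1677652288,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAA..."
    }
  ]
}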
Status Codes:
200: Image(s) edited successfully
400: Invalid request (e.g., non-square image, invalid size, missing required field)
401: Invalid API key
503: OpenAI backend unavailable
Error Examples:
Non-square image:
{
"error": {
"message": "Image must be square (800x600 is not square)",
"type": "invalid_request_error",
"param": "image",
"code": "image_not_square"
}
}
Mask dimension mismatch:
{
"error": {
"message": "Mask dimensions (256x256) do not match image dimensions (512x512)",
"type": "invalid_request_error",
"param": "mask",
"code": "dimension_mismatch"
}
}
Unsupported model:
{
"error": {
"message": "Model 'dall-e-3' does not support image editing. Supported models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5, dall-e-2. Note: dall-e-3 does NOT support image editing.",
"type": "invalid_request_error",
"param": "model",
"code": "unsupported_model"
}
}
Notes:
- Supported models: gpt-image-1, gpt-image-1-mini, gpt-image-1.5, dall-e-2
- DALL-E 3 does NOT support image editing via API
- Gemini (nano-banana) is NOT supported - uses different editing approach (semantic masking)
- When no mask is provided, the entire image may be modified
- The source image should have transparent regions if editing without a mask
- Request timeout uses the image generation timeout configuration
Image Variations¶
Generate variations of an existing image using OpenAI's DALL-E 2 model.
Form Fields:
| Parameter | Type | Required | Description |
|---|---|---|---|
image | file | Yes | Source image for variations (PNG, < 4MB, must be square) |
model | string | No | Model to use (default: dall-e-2) |
n | integer | No | Number of variations to generate (1-10, default: 1) |
size | string | No | Output size: 256x256, 512x512, 1024x1024 (default: 1024x1024) |
response_format | string | No | Response format: url or b64_json (default: url) |
user | string | No | User identifier for tracking |
Example Request:
curl -X POST http://localhost:8080/v1/images/variations \
-F "image=@source_image.png" \
-F "model=dall-e-2" \
-F "n=2" \
-F "size=512x512" \
-F "response_format=url"
Response:
{
"created": 1677652288,
"data": [
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
},
{
"url": "https://oaidalleapiprodscus.blob.core.windows.net/..."
}
]
}
Response (with b64_json):
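As with edits, base64 image data replaces the URLs (illustrative, for n=2):
{
  "created": 1677652288,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAA..."
    },
    {
      "b64_json": "iVBORw0KGgoAAA..."
    }
  ]
}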
Model Support:
| Model | Variations Support | Notes |
|---|---|---|
dall-e-2 | Yes (native) | Full support, 1-10 variations |
dall-e-3 | No | Not supported by OpenAI API |
gpt-image-1 | No | Not supported |
nano-banana | No | Gemini does not support variations API |
nano-banana-pro | No | Gemini does not support variations API |
Image Requirements:
- Format: PNG only
- Size: Less than 4MB
- Dimensions: Must be square (width == height)
- Supported input sizes: Any square dimensions (will be processed by the model)
Error Scenarios:
| Error | Status | Description |
|---|---|---|
| Image not PNG | 400 | Only PNG format is supported |
| Image not square | 400 | Image dimensions must be equal |
| Image too large | 400 | Image exceeds 4MB size limit |
| Model not supported | 400 | Requested model doesn't support variations |
| Missing image | 400 | Image field is required |
| Invalid n value | 400 | n must be between 1 and 10 |
| Invalid size | 400 | Size must be one of the supported values |
Status Codes:
200: Variation(s) generated successfully
400: Invalid request (invalid format, non-square image, unsupported model)
401: Invalid API key
429: Rate limit exceeded
500: Backend error
503: Backend unavailable
Text Completions¶
Generate text completions using the OpenAI Completions API format.
Request Body:
{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Once upon a time in a distant galaxy",
"max_tokens": 100,
"temperature": 0.7,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"stream": false,
"stop": null,
"logit_bias": {},
"user": "user123"
}
Response:
{
"id": "cmpl-123456789",
"object": "text_completion",
"created": 1677652288,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": ", there lived a young explorer named Zara who dreamed of discovering new worlds...",
"index": 0,
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 90,
"total_tokens": 100
}
}
Status Codes: Same as Chat Completions
Files API¶
The Files API allows you to upload, manage, and use files in chat completions. Uploaded files can be referenced in messages using the image_file content type, and the router automatically resolves these references by injecting the file content.
Upload File¶
Upload a file for use in chat completions.
Form Fields:
| Field | Type | Required | Description |
|---|---|---|---|
file | file | Yes | The file to upload |
purpose | string | Yes | Purpose of the file: vision, assistants, fine-tune, batch |
Example:
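A minimal upload sketch using curl, assuming the standard OpenAI Files API path /v1/files:
curl -X POST http://localhost:8080/v1/files \
  -F "file=@image.png" \
  -F "purpose=vision"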
Response:
{
"id": "file-abc123def456",
"object": "file",
"bytes": 12345,
"created_at": 1699061776,
"filename": "image.png",
"purpose": "vision"
}
Status Codes:
200: File uploaded successfully
400: Invalid request (missing file, invalid purpose)
413: File too large (exceeds the configured maximum file size)
List Files¶
Retrieve a list of uploaded files.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
purpose | string | No | Filter by purpose |
Response:
{
"object": "list",
"data": [
{
"id": "file-abc123def456",
"object": "file",
"bytes": 12345,
"created_at": 1699061776,
"filename": "image.png",
"purpose": "vision"
}
]
}
Get File Metadata¶
Retrieve metadata for a specific file.
Response:
{
"id": "file-abc123def456",
"object": "file",
"bytes": 12345,
"created_at": 1699061776,
"filename": "image.png",
"purpose": "vision"
}
Status Codes:
200: File metadata retrieved
404: File not found
Download File Content¶
Download the content of an uploaded file.
Response: Binary file content with appropriate Content-Type header.
Status Codes:
200: File content returned
404: File not found
Delete File¶
Delete an uploaded file.
Response:
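An OpenAI-style deletion acknowledgment (illustrative):
{
  "id": "file-abc123def456",
  "object": "file",
  "deleted": true
}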
Status Codes:
200: File deleted successfully
404: File not found
File Resolution in Chat Completions¶
The router automatically resolves file references in chat completion requests. When a message contains an image_file content block, the router:
- Validates the file ID format
- Loads the file content from storage
- Converts the file to a base64 data URL
- Replaces the image_file block with an image_url block
Request with File Reference:
{
"model": "gpt-4-vision-preview",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_file", "image_file": {"file_id": "file-abc123def456"}}
]
}
]
}
Transformed Request (sent to backend):
{
"model": "gpt-4-vision-preview",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
]
}
]
}
File Resolution Errors:
| Error | Status | Description |
|---|---|---|
| Invalid file ID format | 400 | File ID must start with file- |
| File not found | 404 | Referenced file does not exist |
| Too many file references | 400 | Request contains more than 20 file references |
| Resolution timeout | 504 | File resolution took longer than 30 seconds |
Supported MIME Types for Image Files:
image/png
image/jpeg
image/gif
image/webp
Admin Endpoints¶
Backend Status¶
Get detailed status information about all configured backends.
Response:
{
"backends": [
{
"name": "local-ollama",
"url": "http://localhost:11434",
"is_healthy": true,
"consecutive_failures": 0,
"consecutive_successes": 15,
"last_check": "2024-01-15T10:30:45Z",
"last_error": null,
"response_time_ms": 45,
"models": ["llama2", "mistral", "codellama"],
"weight": 1,
"total_requests": 150,
"failed_requests": 2
},
{
"name": "openai-compatible",
"url": "https://api.openai.com",
"is_healthy": false,
"consecutive_failures": 3,
"consecutive_successes": 0,
"last_check": "2024-01-15T10:29:30Z",
"last_error": "Connection timeout after 5s",
"response_time_ms": null,
"models": [],
"weight": 1,
"total_requests": 45,
"failed_requests": 8
}
],
"healthy_count": 1,
"total_count": 2,
"summary": {
"total_models": 3,
"total_requests": 195,
"total_failures": 10,
"average_response_time_ms": 45
}
}
Fields:
| Field | Type | Description |
|---|---|---|
name | string | Backend identifier from configuration |
url | string | Backend base URL |
is_healthy | boolean | Current health status |
consecutive_failures | integer | Sequential failed health checks |
consecutive_successes | integer | Sequential successful health checks |
last_check | string | ISO timestamp of last health check |
last_error | string/null | Last error message if unhealthy |
response_time_ms | integer/null | Last health check response time |
models | array | Available models from this backend |
weight | integer | Load balancing weight |
total_requests | integer | Total requests routed to this backend |
failed_requests | integer | Failed requests to this backend |
Status Codes:
200: Backend status retrieved successfully
Service Health¶
Get overall service health and component status.
Response:
{
"status": "healthy",
"version": "1.0.0",
"uptime": "2h 15m 30s",
"timestamp": "2024-01-15T10:30:45Z",
"services": {
"backend_service": {
"status": "healthy",
"message": "All backends operational",
"healthy_backends": 2,
"total_backends": 2
},
"model_service": {
"status": "healthy",
"message": "Model cache operational",
"cached_models": 15,
"cache_hit_rate": 0.95,
"last_refresh": "2024-01-15T10:25:00Z"
},
"proxy_service": {
"status": "healthy",
"message": "Request routing operational",
"total_requests": 1250,
"failed_requests": 12,
"average_latency_ms": 85
},
"health_service": {
"status": "healthy",
"message": "Health monitoring active",
"check_interval": "30s",
"last_check": "2024-01-15T10:30:00Z"
}
},
"metrics": {
"requests_per_second": 5.2,
"error_rate": 0.008,
"memory_usage_mb": 125,
"cpu_usage_percent": 15.5
}
}
Status Values:
healthy: Service operating normally
degraded: Service operating with reduced functionality
unhealthy: Service experiencing issues
Status Codes:
200: Service health retrieved successfully
503: Service is unhealthy
Configuration Summary¶
Get current configuration summary including hot reload status.
Response:
{
"server": {
"bind_address": "0.0.0.0:8080",
"workers": 4,
"connection_pool_size": 100
},
"backends": {
"count": 3,
"names": ["openai", "local-ollama", "gemini"]
},
"health_checks": {
"interval": "30s",
"timeout": "10s",
"unhealthy_threshold": 3,
"healthy_threshold": 2
},
"rate_limiting": {
"enabled": false
},
"circuit_breaker": {
"enabled": true
},
"selection_strategy": "RoundRobin",
"hot_reload": {
"available": true,
"note": "Configuration changes will be automatically detected and applied"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
server | object | Server configuration (bind_address, workers, connection_pool_size) |
backends | object | Backend configuration summary (count, names) |
health_checks | object | Health check settings |
rate_limiting | object | Rate limiting status |
circuit_breaker | object | Circuit breaker status |
selection_strategy | string | Current load balancing strategy |
hot_reload | object | Hot reload availability and status |
Status Codes:
200: Configuration summary retrieved successfully
Note: Sensitive information (API keys, etc.) is automatically redacted from the response.
Hot Reload Status¶
Get detailed information about hot reload functionality and configuration item classification.
Response:
{
"enabled": true,
"description": "Hot reload is enabled. Configuration file changes are automatically detected and applied.",
"capabilities": {
"immediate_update": {
"description": "Changes applied immediately without service interruption",
"items": [
"logging.level",
"rate_limiting.*",
"circuit_breaker.*",
"retry.*",
"global_prompts.*"
]
},
"gradual_update": {
"description": "Existing connections maintained, new connections use new config",
"items": [
"backends.*",
"health_checks.*",
"timeouts.*"
]
},
"requires_restart": {
"description": "Changes logged as warnings, restart required to take effect",
"items": [
"server.bind_address",
"server.workers"
]
}
}
}
Fields:
| Field | Type | Description |
|---|---|---|
enabled | boolean | Whether hot reload is enabled |
description | string | Human-readable description of hot reload status |
capabilities | object | Configuration item classification by hot reload capability |
capabilities.immediate_update | object | Items that update immediately without disruption |
capabilities.gradual_update | object | Items that apply to new connections only |
capabilities.requires_restart | object | Items that require server restart |
Configuration Item Classification:
Immediate Update (no service interruption):
- logging.level - Log level changes apply immediately
- rate_limiting.* - Rate limiting settings update in real-time
- circuit_breaker.* - Circuit breaker thresholds and timeouts
- retry.* - Retry policies and backoff strategies
- global_prompts.* - Global system prompt injection settings
Gradual Update (existing connections maintained):
- backends.* - Backend add/remove/modify (new requests use updated pool)
- health_checks.* - Health check intervals and thresholds
- timeouts.* - Timeout values for new requests
Requires Restart (logged as warnings):
- server.bind_address - TCP bind address
- server.workers - Worker thread count
Status Codes:
200: Hot reload status retrieved successfully
Example Usage:
# Check if hot reload is enabled
curl http://localhost:8080/admin/config/hot-reload-status | jq '.enabled'
# List items that support immediate update
curl http://localhost:8080/admin/config/hot-reload-status | jq '.capabilities.immediate_update.items'
Configuration Management API¶
The Configuration Management API enables viewing and modifying router configuration at runtime without requiring a server restart. This provides operational flexibility for adjusting behavior, adding backends, and fine-tuning settings in production environments.
Overview¶
Key capabilities:
- Runtime Configuration: View and modify configuration without server restart
- Hot Reload Support: Changes to supported settings apply immediately
- Validation: Validate configuration changes before applying
- History & Rollback: Track configuration changes and rollback to previous versions
- Export/Import: Backup and restore configurations across environments
- Security: Sensitive information (API keys, passwords, tokens) is automatically masked
Configuration Query APIs¶
Get Full Configuration¶
Returns the complete current configuration with sensitive information masked for security.
Response:
{
"server": {
"bind_address": "0.0.0.0:8080",
"workers": 4,
"connection_pool_size": 100
},
"backends": [
{
"name": "openai",
"url": "https://api.openai.com",
"api_key": "sk-****...**",
"weight": 1,
"models": ["gpt-4", "gpt-3.5-turbo"]
},
{
"name": "local-ollama",
"url": "http://localhost:11434",
"weight": 1,
"models": []
}
],
"health_checks": {
"interval": "30s",
"timeout": "10s",
"unhealthy_threshold": 3,
"healthy_threshold": 2
},
"logging": {
"level": "info",
"format": "json"
},
"retry": {
"max_attempts": 3,
"backoff": "exponential",
"initial_delay_ms": 100
},
"timeouts": {
"connect": "5s",
"request": "60s"
},
"rate_limiting": {
"enabled": false
},
"circuit_breaker": {
"enabled": true,
"failure_threshold": 5,
"recovery_timeout": "30s"
}
}
Notes:
- API keys, passwords, and tokens are masked (e.g., sk-****...**)
- All configuration sections are included in the response
- Use /admin/config/{section} for individual section details
Status Codes:
200: Configuration retrieved successfully
List Configuration Sections¶
Returns a list of all available configuration sections.
Response:
{
"sections": [
"server",
"backends",
"health_checks",
"logging",
"retry",
"timeouts",
"rate_limiting",
"circuit_breaker",
"global_prompts",
"admin",
"fallback",
"files",
"api_keys",
"metrics",
"routing"
],
"total": 15
}
Status Codes:
200: Section list retrieved successfully
Get Configuration Section¶
Returns the configuration for a specific section with hot reload capability information.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section name |
Example Request:
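For example, fetching the logging section (the same path used by the PUT/PATCH examples later in this document):
curl http://localhost:8080/admin/config/logging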
Response:
{
"section": "logging",
"config": {
"level": "info",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"hot_reload_capability": "immediate_update",
"description": "Changes to this section apply immediately without service interruption"
}
Hot Reload Capability Values:
| Value | Description |
|---|---|
immediate_update | Changes apply immediately without service interruption |
gradual_update | Existing connections maintained, new connections use new config |
requires_restart | Server restart required for changes to take effect |
Status Codes:
200: Section configuration retrieved successfully
404: Invalid section name
Error Response:
{
"error": {
"message": "Configuration section 'invalid_section' not found",
"type": "not_found",
"code": 404,
"details": {
"requested_section": "invalid_section",
"available_sections": ["server", "backends", "logging", "..."]
}
}
}
Get Configuration Schema¶
Returns the JSON Schema for configuration validation. Useful for client-side validation before submitting changes.
Response:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"server": {
"type": "object",
"properties": {
"bind_address": {
"type": "string",
"pattern": "^[0-9.]+:[0-9]+$",
"description": "Server bind address in host:port format"
},
"workers": {
"type": "integer",
"minimum": 1,
"maximum": 256,
"description": "Number of worker threads"
},
"connection_pool_size": {
"type": "integer",
"minimum": 1,
"maximum": 10000,
"description": "HTTP connection pool size per backend"
}
},
"required": ["bind_address"]
},
"backends": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"minLength": 1,
"description": "Unique backend identifier"
},
"url": {
"type": "string",
"format": "uri",
"description": "Backend base URL"
},
"weight": {
"type": "integer",
"minimum": 0,
"maximum": 100,
"default": 1,
"description": "Load balancing weight"
},
"models": {
"type": "array",
"items": {"type": "string"},
"description": "Explicit model list (optional)"
}
},
"required": ["name", "url"]
}
},
"logging": {
"type": "object",
"properties": {
"level": {
"type": "string",
"enum": ["trace", "debug", "info", "warn", "error"],
"description": "Log level"
},
"format": {
"type": "string",
"enum": ["json", "text", "pretty"],
"description": "Log output format"
}
}
}
}
}
Status Codes:
200: Schema retrieved successfully
Configuration Modification APIs¶
Replace Configuration Section¶
Replaces an entire configuration section. Triggers validation and hot reload if applicable.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section name |
Request Body: Complete section configuration object.
Example Request:
curl -X PUT http://localhost:8080/admin/config/logging \
-H "Content-Type: application/json" \
-d '{
"level": "debug",
"format": "json",
"output": "stdout",
"include_timestamps": true
}'
Response:
{
"success": true,
"section": "logging",
"hot_reload_applied": true,
"message": "Configuration updated and applied immediately",
"previous": {
"level": "info",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"current": {
"level": "debug",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"version": 15
}
Status Codes:
200: Configuration updated successfully
400: Invalid configuration format or validation error
404: Invalid section name
Partial Update Configuration Section¶
Performs a partial update using JSON merge patch semantics. Only specified fields are updated; unspecified fields retain their current values.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section name |
Request Body: Partial configuration object with fields to update.
Example Request:
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Content-Type: application/json" \
-d '{
"level": "warn"
}'
Response:
{
"success": true,
"section": "logging",
"hot_reload_applied": true,
"message": "Configuration partially updated and applied",
"changes": {
"level": {
"from": "info",
"to": "warn"
}
},
"current": {
"level": "warn",
"format": "json",
"output": "stdout",
"include_timestamps": true
},
"version": 16
}
Merge Behavior:
- Scalar values are replaced
- Objects are merged recursively
- Arrays are replaced entirely (not merged)
- null values remove the field (if optional)
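For example, sending {"level": null} to the logging section removes the optional level field, while sending an array-valued field replaces the stored array wholesale rather than appending to it.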
Status Codes:
200: Configuration updated successfully
400: Invalid configuration format or validation error
404: Invalid section name
Validate Configuration¶
Validates configuration without applying changes. Supports dry_run mode for testing configuration changes safely.
Request Body:
{
"section": "backends",
"config": {
"name": "new-backend",
"url": "http://localhost:8000",
"weight": 2
},
"dry_run": true
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
section | string | Yes | Configuration section to validate |
config | object | Yes | Configuration to validate |
dry_run | boolean | No | If true, only validate without preparing for apply (default: true) |
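A sketch of a validation call, assuming the endpoint is exposed at /admin/config/validate (the path is not stated elsewhere in this document):
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Content-Type: application/json" \
  -d '{
    "section": "backends",
    "config": {"name": "new-backend", "url": "http://localhost:8000", "weight": 2},
    "dry_run": true
  }'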
Response (Valid):
{
"valid": true,
"section": "backends",
"warnings": [
"Backend 'new-backend' has no explicit model list; models will be auto-discovered"
],
"info": {
"hot_reload_capability": "gradual_update",
"estimated_impact": "New requests may be routed to this backend after apply"
}
}
Response (Invalid):
{
"valid": false,
"section": "backends",
"errors": [
{
"field": "url",
"message": "Invalid URL format: missing scheme",
"value": "localhost:8000"
},
{
"field": "weight",
"message": "Weight must be between 0 and 100",
"value": 150
}
],
"warnings": []
}
Status Codes:
200: Validation completed (check the valid field for the result)
400: Invalid request format
Apply Pending Changes¶
Applies pending configuration changes immediately. Triggers hot reload for applicable settings.
Request Body (optional):
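For example, to apply only specific pending sections (fields as defined in the table below):
{
  "sections": ["logging", "rate_limiting"],
  "force": false
}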
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
sections | array | No | Specific sections to apply (default: all pending) |
force | boolean | No | Force apply even if warnings exist (default: false) |
Response:
{
"success": true,
"applied_sections": ["logging", "rate_limiting"],
"results": {
"logging": {
"status": "applied",
"hot_reload": "immediate_update"
},
"rate_limiting": {
"status": "applied",
"hot_reload": "immediate_update"
}
},
"version": 17,
"timestamp": "2024-01-15T10:45:30Z"
}
Status Codes:
200: Changes applied successfully
400: No pending changes or validation errors
409: Conflict with concurrent modification
Configuration Save/Restore APIs¶
Export Configuration¶
Exports the current configuration in the specified format.
Request Body:
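For example, a masked YAML export of selected sections (fields as defined in the table below):
{
  "format": "yaml",
  "include_sensitive": false,
  "sections": ["server", "backends", "logging"]
}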
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
format | string | No | Export format: yaml, json, or toml (default: yaml) |
include_sensitive | boolean | No | Include sensitive data unmasked (requires elevated permissions, default: false) |
sections | array | No | Specific sections to export (default: all) |
Response (format: json):
{
"format": "json",
"content": "{\"server\":{\"bind_address\":\"0.0.0.0:8080\",...}}",
"sections_exported": ["server", "backends", "logging"],
"exported_at": "2024-01-15T10:45:30Z",
"version": 17,
"checksum": "sha256:a1b2c3d4..."
}
Response (format: yaml):
{
"format": "yaml",
"content": "server:\n bind_address: \"0.0.0.0:8080\"\n workers: 4\n...",
"sections_exported": ["server", "backends", "logging"],
"exported_at": "2024-01-15T10:45:30Z",
"version": 17,
"checksum": "sha256:a1b2c3d4..."
}
Status Codes:
200: Export successful
400: Invalid format specified
403: Elevated permissions required for include_sensitive: true
Import Configuration¶
Imports configuration from the provided content.
Request Body:
{
"format": "yaml",
"content": "server:\n bind_address: \"0.0.0.0:8080\"\n workers: 8\nlogging:\n level: debug",
"dry_run": true,
"merge": false
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
format | string | Yes | Content format: yaml, json, or toml |
content | string | Yes | Configuration content to import |
dry_run | boolean | No | Validate without applying (default: false) |
merge | boolean | No | Merge with existing config vs replace (default: false) |
Response (dry_run: true):
{
"valid": true,
"dry_run": true,
"changes_preview": {
"server": {
"workers": {"from": 4, "to": 8}
},
"logging": {
"level": {"from": "info", "to": "debug"}
}
},
"sections_affected": ["server", "logging"],
"warnings": [
"server.workers change requires restart to take effect"
]
}
Response (dry_run: false):
{
"success": true,
"imported_sections": ["server", "logging"],
"hot_reload_results": {
"logging": "applied_immediately",
"server": "requires_restart"
},
"version": 18,
"timestamp": "2024-01-15T10:50:00Z"
}
Status Codes:
200: Import successful (or dry_run validation passed)
400: Invalid format or content parsing error
422: Configuration validation failed
Get Configuration History¶
Retrieves the history of configuration changes.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
limit | integer | No | Maximum entries to return (default: 20, max: 100) |
offset | integer | No | Number of entries to skip (default: 0) |
section | string | No | Filter by section name |
Example Request:
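A sketch assuming the history endpoint is exposed at /admin/config/history (the path is not stated explicitly in this document):
curl "http://localhost:8080/admin/config/history?limit=10&offset=0"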
Response:
{
"history": [
{
"version": 18,
"timestamp": "2024-01-15T10:50:00Z",
"sections_changed": ["logging"],
"source": "api",
"user": "admin",
"changes": {
"logging": {
"level": {"from": "info", "to": "debug"}
}
}
},
{
"version": 17,
"timestamp": "2024-01-15T09:30:00Z",
"sections_changed": ["backends"],
"source": "file_reload",
"user": null,
"changes": {
"backends": {
"added": ["new-backend"],
"modified": [],
"removed": []
}
}
},
{
"version": 16,
"timestamp": "2024-01-14T15:20:00Z",
"sections_changed": ["rate_limiting"],
"source": "api",
"user": "admin",
"changes": {
"rate_limiting": {
"enabled": {"from": false, "to": true}
}
}
}
],
"total": 18,
"limit": 10,
"offset": 0
}
Source Values:
| Source | Description |
|---|---|
api | Changed via Configuration Management API |
file_reload | Changed via configuration file hot reload |
startup | Initial configuration at server startup |
rollback | Restored from previous version |
Status Codes:
200: History retrieved successfully
Rollback Configuration¶
Rolls back to a previous configuration version.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
version | integer | Yes | Version number to rollback to |
Request Body (optional):
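For example, rolling back only selected sections (fields as defined in the table below):
{
  "dry_run": false,
  "sections": ["logging", "backends"]
}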
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
dry_run | boolean | No | Preview changes without applying (default: false) |
sections | array | No | Specific sections to rollback (default: all changed sections) |
Response:
{
"success": true,
"rolled_back_from": 18,
"rolled_back_to": 15,
"sections_restored": ["logging", "backends"],
"changes": {
"logging": {
"level": {"from": "debug", "to": "info"}
},
"backends": {
"removed": ["new-backend"]
}
},
"new_version": 19,
"timestamp": "2024-01-15T11:00:00Z"
}
Status Codes:
200: Rollback successful
400: Validation error for target configuration
404: Version not found in history
Backend Management APIs¶
These endpoints provide convenient shortcuts for managing backends without modifying the full backends configuration section.
Add Backend¶
Dynamically adds a new backend to the router.
Request Body:
{
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 2,
"models": ["llama2", "codellama"],
"api_key": null,
"health_check_path": "/api/tags"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique backend identifier |
url | string | Yes | Backend base URL |
weight | integer | No | Load balancing weight (default: 1) |
models | array | No | Explicit model list (empty for auto-discovery) |
api_key | string | No | API key for authentication |
health_check_path | string | No | Custom health check endpoint |
Response:
{
"success": true,
"backend": {
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 2,
"models": ["llama2", "codellama"],
"is_healthy": null,
"status": "pending_health_check"
},
"message": "Backend added successfully. Health check scheduled.",
"config_version": 20
}
Status Codes:
200: Backend added successfully
400: Invalid backend configuration
409: Backend with this name already exists
Get Backend Configuration¶
Retrieves the configuration for a specific backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Response:
{
"name": "local-ollama",
"url": "http://localhost:11434",
"weight": 1,
"models": ["llama2", "mistral", "codellama"],
"api_key": null,
"health_check_path": "/api/tags",
"is_healthy": true,
"consecutive_failures": 0,
"consecutive_successes": 25,
"last_check": "2024-01-15T10:55:00Z",
"total_requests": 1250,
"failed_requests": 3
}
Status Codes:
200: Backend configuration retrieved
404: Backend not found
Update Backend Configuration¶
Updates the configuration for an existing backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Request Body:
{
"url": "http://localhost:11434",
"weight": 3,
"models": ["llama2", "mistral", "codellama", "phi"],
"api_key": null
}
Response:
{
"success": true,
"backend": {
"name": "local-ollama",
"url": "http://localhost:11434",
"weight": 3,
"models": ["llama2", "mistral", "codellama", "phi"]
},
"changes": {
"weight": {"from": 1, "to": 3},
"models": {"added": ["phi"], "removed": []}
},
"config_version": 21
}
Status Codes:
200: Backend updated successfully
400: Invalid configuration
404: Backend not found
Delete Backend¶
Removes a backend from the router.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
drain | boolean | No | Wait for active requests to complete (default: true) |
timeout | integer | No | Drain timeout in seconds (default: 30) |
Example Request:
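A sketch assuming per-backend paths of the form /admin/backends/{name}, extending the Add Backend path shown above:
curl -X DELETE "http://localhost:8080/admin/backends/old-backend?drain=true&timeout=30"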
Response:
{
"success": true,
"deleted_backend": "old-backend",
"drained": true,
"active_requests_completed": 5,
"config_version": 22,
"message": "Backend removed from rotation"
}
Status Codes:
200: Backend deleted successfully
404: Backend not found
409: Cannot delete last remaining backend
Update Backend Weight¶
Updates only the load balancing weight for a backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Request Body:
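For example, raising the weight from 1 to 5 (matching the response below):
{
  "weight": 5
}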
Response:
{
"success": true,
"backend": "local-ollama",
"weight": {
"from": 1,
"to": 5
},
"config_version": 23
}
Status Codes:
200: Weight updated successfully
400: Invalid weight value
404: Backend not found
Update Backend Models¶
Updates only the model list for a backend.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Backend identifier |
Request Body:
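For example, adding two models to the existing list (matching the response below; fields as defined in the table that follows):
{
  "models": ["phi", "gemma"],
  "mode": "add"
}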
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
models | array | Yes | Model list |
mode | string | No | Update mode: replace, add, or remove (default: replace) |
Response:
{
"success": true,
"backend": "local-ollama",
"models": {
"previous": ["llama2", "mistral", "codellama"],
"current": ["llama2", "mistral", "codellama", "phi", "gemma"],
"added": ["phi", "gemma"],
"removed": []
},
"config_version": 24
}
Status Codes:
200: Models updated successfully
400: Invalid model list
404: Backend not found
Configuration API Examples¶
Get Full Configuration¶
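A sketch assuming the full configuration is served at /admin/config, the base of the per-section paths shown earlier:
curl http://localhost:8080/admin/config | jq '.backends'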
Update Logging Level¶
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Content-Type: application/json" \
-d '{"level": "debug"}'
Add a New Backend¶
curl -X POST http://localhost:8080/admin/backends \
-H "Content-Type: application/json" \
-d '{
"name": "remote-ollama",
"url": "http://192.168.1.50:11434",
"weight": 2,
"models": ["llama2", "mistral"]
}'
Export Configuration as JSON¶
curl -X POST http://localhost:8080/admin/config/export \
-H "Content-Type: application/json" \
-d '{"format": "json"}' | jq -r '.content' > config-backup.json
View Configuration History¶
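Assuming the same /admin/config/history path as in Get Configuration History:
curl "http://localhost:8080/admin/config/history?limit=20&section=backends"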
Configuration API Error Responses¶
All Configuration Management API errors follow the standard error format:
{
"error": {
"message": "Human-readable error description",
"type": "error_type_identifier",
"code": 400,
"details": {
"additional": "context information"
}
}
}
Configuration-Specific Error Types:
| Type | HTTP Code | Description |
|---|---|---|
config_validation_error | 400 | Configuration validation failed |
config_section_not_found | 404 | Requested configuration section does not exist |
config_version_not_found | 404 | Requested version not found in history |
config_conflict | 409 | Concurrent modification conflict |
config_permission_denied | 403 | Insufficient permissions for operation |
config_parse_error | 422 | Failed to parse configuration content |
Example Validation Error:
{
"error": {
"message": "Configuration validation failed",
"type": "config_validation_error",
"code": 400,
"details": {
"section": "backends",
"errors": [
{
"field": "url",
"message": "URL must include scheme (http:// or https://)",
"value": "localhost:8000"
}
]
}
}
}
Example Conflict Error:
{
"error": {
"message": "Configuration was modified by another request",
"type": "config_conflict",
"code": 409,
"details": {
"expected_version": 15,
"current_version": 16,
"conflicting_sections": ["backends"]
}
}
}
Error Handling¶
Error Response Format¶
All errors follow a consistent JSON structure:
{
"error": {
"message": "Human-readable error description",
"type": "error_type_identifier",
"code": 404,
"details": {
"additional": "context information"
}
}
}
Error Types¶
| Type | HTTP Code | Description |
|---|---|---|
bad_request | 400 | Invalid request format or parameters |
unauthorized | 401 | Authentication required (future feature) |
forbidden | 403 | Access denied (future feature) |
model_not_found | 404 | Requested model not available |
rate_limit_exceeded | 429 | Rate limit exceeded (future feature) |
internal_error | 500 | Router internal error |
bad_gateway | 502 | Backend connection/response error |
service_unavailable | 503 | All backends unhealthy |
gateway_timeout | 504 | Backend request timeout |
Example Error Responses¶
Model Not Found:
{
"error": {
"message": "Model 'invalid-model' not found on any healthy backend",
"type": "model_not_found",
"code": 404,
"details": {
"requested_model": "invalid-model",
"available_models": ["gpt-4", "gpt-3.5-turbo", "llama2"]
}
}
}
Backend Error:
{
"error": {
"message": "Failed to connect to backend 'local-ollama'",
"type": "bad_gateway",
"code": 502,
"details": {
"backend": "local-ollama",
"backend_error": "Connection refused"
}
}
}
Service Unavailable:
{
"error": {
"message": "All backends are currently unhealthy",
"type": "service_unavailable",
"code": 503,
"details": {
"healthy_backends": 0,
"total_backends": 3
}
}
}
Rate Limiting¶
Note: Rate limiting is not currently implemented but is planned for future releases.
Future rate limiting will support:
- Per-IP rate limiting
- Per-API-key rate limiting
- Model-specific rate limiting
- Sliding window algorithms
- Rate limit headers in responses
Streaming¶
Server-Sent Events (SSE)¶
When stream: true is specified, responses are sent as Server-Sent Events with:
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
SSE Format¶
data: {"id":"chatcmpl-123","object":"chat.completion.chunk",...}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk",...}
data: [DONE]
SSE Compatibility¶
The router supports multiple SSE formats for maximum compatibility:
- Standard Format: data:{...} (no space after the colon)
- Spaced Format: data: {...} (space after the colon)
- Mixed Line Endings: Handles \r\n, \n, and \r
- Empty Lines: Properly processes chunk separators
Connection Management¶
- Keep-Alive: Connections are kept open during streaming
- Timeouts: 5-minute timeout for long-running requests
- Error Handling: Partial responses include error information
- Client Disconnection: Gracefully handles client disconnects
Examples¶
Basic Chat Completion¶
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'
Streaming Chat Completion¶
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Write a short story"}
],
"stream": true,
"max_tokens": 200
}'
Text Completion with Parameters¶
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo-instruct",
"prompt": "The future of AI is",
"max_tokens": 50,
"temperature": 0.8,
"top_p": 0.9
}'
Check Backend Status¶
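A sketch assuming backend status is exposed at /admin/backends, the same base path as the Backend Management APIs:
curl http://localhost:8080/admin/backends | jq '.healthy_count'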
Monitor Service Health¶
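A sketch assuming the service health endpoint lives at /admin/health (the path is not stated explicitly in this document):
curl http://localhost:8080/admin/health | jq '.status'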
List Available Models¶
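The models endpoint is part of the OpenAI-compatible surface:
curl http://localhost:8080/v1/models | jq '.data[].id'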
Python Client Example¶
import requests
import json
# Configure the client
BASE_URL = "http://localhost:8080"
def chat_completion(messages, model="gpt-3.5-turbo", stream=False):
    """Send a chat completion request"""
    response = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers={"Content-Type": "application/json"},
        json={
            "model": model,
            "messages": messages,
            "stream": stream,
            "temperature": 0.7
        },
        stream=stream
    )

    if stream:
        # Handle streaming response
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]  # Remove 'data: ' prefix
                    if data == '[DONE]':
                        break
                    try:
                        chunk = json.loads(data)
                        content = chunk['choices'][0]['delta'].get('content', '')
                        if content:
                            print(content, end='', flush=True)
                    except json.JSONDecodeError:
                        continue
        print()  # New line after streaming
    else:
        # Handle non-streaming response
        result = response.json()
        return result['choices'][0]['message']['content']
# Example usage
messages = [
{"role": "user", "content": "Explain machine learning in simple terms"}
]
print("Streaming response:")
chat_completion(messages, stream=True)
print("\nNon-streaming response:")
response = chat_completion(messages, stream=False)
print(response)
JavaScript/Node.js Client Example¶
// Uses the built-in fetch API available in Node.js 18+ (response.body is a web ReadableStream)
const BASE_URL = 'http://localhost:8080';
async function chatCompletion(messages, options = {}) {
  const response = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: options.model || 'gpt-3.5-turbo',
      messages: messages,
      stream: options.stream || false,
      temperature: options.temperature || 0.7,
      ...options
    })
  });

  if (options.stream) {
    // Handle streaming response
    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // stream: true flushes multi-byte characters correctly across chunk boundaries
      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') return;

          try {
            const parsed = JSON.parse(data);
            const content = parsed.choices[0]?.delta?.content;
            if (content) {
              process.stdout.write(content);
            }
          } catch (e) {
            // Ignore JSON parse errors for partial chunks
          }
        }
      }
    }
    console.log(); // New line
  } else {
    const result = await response.json();
    return result.choices[0].message.content;
  }
}
// Example usage (wrapped in an async function, since top-level await is unavailable in CommonJS)
async function main() {
  const messages = [
    { role: 'user', content: 'What is the meaning of life?' }
  ];

  // Streaming
  console.log('Streaming response:');
  await chatCompletion(messages, { stream: true });

  // Non-streaming
  console.log('\nNon-streaming response:');
  const response = await chatCompletion(messages);
  console.log(response);
}

main();
This API reference provides comprehensive documentation for integrating with Continuum Router. The router maintains full OpenAI API compatibility while adding powerful multi-backend routing and management capabilities.