Admin REST API Reference¶

This document covers the Continuum Router Admin REST API for developers building configuration control applications. The Configuration Management API supports viewing, modifying, and managing configuration at runtime; most sections can be changed without restarting the server.

Table of Contents¶

Overview
Authentication
Base URL and Headers
Configuration Query APIs
Configuration Modification APIs
Configuration Save/Restore APIs
Backend Management APIs
API Key Management APIs
Statistics APIs
Response Cache Admin APIs
KV Cache Index Admin APIs
Smart Routing Admin APIs
Guardrail Admin APIs
Data Models
Hot Reload Behavior
Error Handling
Client SDK Examples
Best Practices
Security Considerations

Overview¶

The Admin REST API provides programmatic access to Continuum Router's configuration system, enabling:

Real-time Configuration Viewing: Retrieve current configuration with automatic sensitive data masking
Dynamic Configuration Updates: Modify configuration sections without server restart
Configuration Versioning: Track changes in a bounded version history with rollback support
Backend Management: Add, remove, and modify backends dynamically
Export/Import: Save and restore configurations in multiple formats (YAML, JSON, TOML)

Key Features¶

Feature	Description
Hot Reload	Changes applied immediately or gradually depending on the section; some sections (such as server) require a restart
Sensitive Masking	API keys, passwords, and tokens automatically masked in responses
Validation	All changes validated before application with dry-run support
Audit Logging	All modifications logged for security and compliance
History Tracking	Up to 100 configuration versions maintained for rollback (configurable via admin.max_history_entries)

Authentication¶

All Admin API endpoints require authentication through the admin auth system configured under admin.auth, unless its method is set to none.

Authentication Methods¶

1. Bearer Token¶

Authorization: Bearer <admin-token>

curl -H "Authorization: Bearer your-admin-token" \
  http://localhost:8080/admin/config/full

2. Basic Authentication¶

Authorization: Basic <base64(username:password)>

curl -u admin:password http://localhost:8080/admin/config/full

3. API Key Header¶

X-API-Key: <admin-api-key>

curl -H "X-API-Key: your-admin-key" http://localhost:8080/admin/config/full
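When building a client in code rather than with curl, the three methods reduce to one header each. The sketch below is a hypothetical helper (admin_auth_headers is not part of any router SDK); the method names mirror the admin.auth.method options in the configuration shown next, and basic auth uses the same base64("username:password") encoding that curl -u produces.

```python
import base64
from typing import Dict


def admin_auth_headers(method: str, *, token: str = "", username: str = "",
                       password: str = "", api_key: str = "") -> Dict[str, str]:
    """Return the header that carries admin credentials for one auth method.

    Hypothetical helper; method names mirror admin.auth.method.
    """
    if method == "bearer_token":
        return {"Authorization": f"Bearer {token}"}
    if method == "basic":
        # Basic auth is base64("username:password"), as curl -u builds it.
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if method == "api_key":
        return {"X-API-Key": api_key}
    if method == "none":
        return {}
    raise ValueError(f"unknown admin auth method: {method!r}")
```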

Configuration¶

Configure admin authentication in config.yaml:

admin:
  auth:
    method: bearer_token  # Options: none, bearer_token, basic, api_key
    token: "${ADMIN_TOKEN}"  # Environment variable supported
    # For basic auth:
    # username: admin
    # password: "${ADMIN_PASSWORD}"

  # IP whitelist (optional)
  ip_whitelist:
    - "127.0.0.1"
    - "10.0.0.0/8"

  # Configurable limits
  max_history_entries: 100
  max_backend_name_length: 256
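The ip_whitelist above mixes a single address with a CIDR range. The following sketch shows one plausible way such entries are matched, assuming a bare address matches only itself (treated as /32 or /128) and a CIDR entry matches any address inside the network; the router's actual matching code is not shown in this reference. An empty list returns False here, whereas an unconfigured whitelist presumably skips the check entirely.

```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, whitelist: Iterable[str]) -> bool:
    """Check a client address against ip_whitelist-style entries (a sketch).

    Assumption: bare addresses are exact matches; CIDR entries match any
    address in the network. IPv4 entries never match IPv6 clients.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        # strict=False tolerates entries such as "10.1.2.3/8" with host bits set.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False
```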

Base URL and Headers¶

Base URL¶

http://localhost:8080/admin

Common Request Headers¶

Content-Type: application/json
Accept: application/json
Authorization: Bearer <token>

Common Response Headers¶

Content-Type: application/json
X-Request-Id: <unique-request-id>

Configuration Query APIs¶

Get Full Configuration¶

Retrieve the complete configuration with sensitive information masked.

GET /admin/config/full

Response¶

{
  "config": {
    "server": {
      "bind_address": "0.0.0.0:8080",
      "workers": 4
    },
    "backends": [
      {
        "name": "openai",
        "url": "https://api.openai.com",
        "api_key": "sk-***abcd",
        "weight": 1
      }
    ],
    "logging": {
      "level": "info"
    },
    "rate_limiting": {
      "enabled": true,
      "requests_per_minute": 100
    }
  },
  "hot_reload_enabled": true,
  "last_modified": "2025-12-13T10:30:00Z"
}

Example¶

curl -s http://localhost:8080/admin/config/full \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

List Configuration Sections¶

Get all available configuration sections with their hot reload capabilities.

GET /admin/config/sections

Response¶

{
  "sections": [
    {
      "name": "server",
      "description": "Server configuration including bind address and workers",
      "hot_reload_capability": "requires_restart"
    },
    {
      "name": "backends",
      "description": "Backend server configurations",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "logging",
      "description": "Logging configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "rate_limiting",
      "description": "Rate limiting configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "circuit_breaker",
      "description": "Circuit breaker configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "retry",
      "description": "Retry policy configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "timeouts",
      "description": "Timeout configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "health_checks",
      "description": "Health check configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "global_prompts",
      "description": "Global prompt injection configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "fallback",
      "description": "Model fallback configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "files",
      "description": "Files API configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "api_keys",
      "description": "API keys configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "metrics",
      "description": "Metrics and monitoring configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "admin",
      "description": "Admin API configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "routing",
      "description": "Request routing configuration",
      "hot_reload_capability": "gradual"
    }
  ]
}

Example¶

curl -s http://localhost:8080/admin/config/sections \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.sections[].name'
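Before editing a section it is worth knowing whether the change will take effect immediately, gradually, or only after a restart. This sketch groups the section names from the response above by hot_reload_capability; sections_by_capability is a hypothetical helper that works on the decoded JSON body.

```python
from collections import defaultdict
from typing import Dict, List


def sections_by_capability(sections_response: dict) -> Dict[str, List[str]]:
    """Group section names from GET /admin/config/sections by hot_reload_capability."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for section in sections_response["sections"]:
        groups[section["hot_reload_capability"]].append(section["name"])
    return dict(groups)
```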

Get Section Configuration¶

Retrieve configuration for a specific section.

GET /admin/config/{section}

Path Parameters¶

Parameter	Type	Required	Description
`section`	string	Yes	Section name (see list above)

Response¶

{
  "section": "logging",
  "config": {
    "level": "info",
    "format": "json",
    "file": "/var/log/continuum-router.log"
  },
  "hot_reload_capability": "immediate",
  "description": "Logging configuration"
}

Example¶

# Get logging configuration
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Get backends configuration
curl -s http://localhost:8080/admin/config/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Configuration Schema¶

Retrieve JSON Schema for configuration validation.

GET /admin/config/schema

Query Parameters¶

Parameter	Type	Required	Description
`section`	string	No	Get schema for specific section only

Response¶

{
  "schema": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
      "server": {
        "type": "object",
        "properties": {
          "bind_address": {
            "type": "string",
            "pattern": "^[^:]+:[0-9]+$",
            "description": "Server bind address in host:port format"
          },
          "workers": {
            "type": "integer",
            "minimum": 1,
            "description": "Number of worker threads"
          }
        }
      },
      "logging": {
        "type": "object",
        "properties": {
          "level": {
            "type": "string",
            "enum": ["trace", "debug", "info", "warn", "error"]
          }
        }
      }
    }
  }
}

Example¶

# Get full schema
curl -s http://localhost:8080/admin/config/schema \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Get schema for specific section
curl -s "http://localhost:8080/admin/config/schema?section=logging" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Configuration Modification APIs¶

Replace Section Configuration¶

Replace entire section configuration with new values.

PUT /admin/config/{section}

Request Body¶

{
  "config": {
    "level": "debug",
    "format": "json"
  }
}

Response¶

{
  "success": true,
  "message": "Configuration updated successfully",
  "version": 5,
  "hot_reload_capability": "immediate",
  "applied": true,
  "warnings": []
}

Example¶

# Update logging level to debug
curl -X PUT http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "level": "debug"
    }
  }'

Partial Update Section¶

Apply partial updates using JSON merge patch semantics.

PATCH /admin/config/{section}

Request Body¶

{
  "config": {
    "level": "warn"
  }
}

Only specified fields are updated; other fields remain unchanged.

Response¶

{
  "success": true,
  "message": "Configuration partially updated",
  "version": 6,
  "hot_reload_capability": "immediate",
  "applied": true,
  "merged_config": {
    "level": "warn",
    "format": "json",
    "file": "/var/log/continuum-router.log"
  }
}

Example¶

# Update only rate limit value
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "requests_per_minute": 200
    }
  }'
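JSON merge patch is defined in RFC 7386: objects merge recursively, a null value removes a key, and any non-object value (including an array) replaces the target wholesale. The sketch below implements that RFC and reproduces the merged_config in the response above. Whether the router honors null-as-delete exactly as the RFC specifies is not stated here, so treat that part as an assumption; note in particular that arrays are replaced, not appended.

```python
import copy
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7386 JSON merge patch and return a new value.

    Objects merge recursively, None (JSON null) deletes a key, and any
    non-object patch value replaces the target outright.
    """
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)
        else:
            result[name] = merge_patch(result.get(name), value)
    return result
```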

Validate Configuration¶

Validate configuration changes without applying them.

POST /admin/config/validate

Request Body¶

{
  "section": "server",
  "config": {
    "bind_address": "0.0.0.0:9090",
    "workers": 8
  },
  "dry_run": true
}

Response (Valid)¶

{
  "valid": true,
  "errors": [],
  "warnings": [
    {
      "field": "bind_address",
      "message": "Changing bind_address requires server restart"
    }
  ],
  "hot_reload_capability": "requires_restart"
}

Response (Invalid)¶

{
  "valid": false,
  "errors": [
    {
      "field": "workers",
      "message": "workers must be greater than 0",
      "code": "VALIDATION_ERROR"
    }
  ],
  "warnings": []
}

Example¶

# Validate before applying
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "section": "rate_limiting",
    "config": {
      "enabled": true,
      "requests_per_minute": 500
    }
  }'

Apply Configuration¶

Apply pending configuration changes immediately (trigger hot reload).

POST /admin/config/apply

Request Body¶

{
  "sections": ["logging", "rate_limiting"],
  "force": false
}

Field	Type	Required	Description
`sections`	array	No	Specific sections to apply (default: all pending)
`force`	boolean	No	Force apply even with warnings (default: false)

Response¶

{
  "success": true,
  "applied_sections": ["logging", "rate_limiting"],
  "version": 7,
  "results": {
    "logging": {
      "status": "applied",
      "hot_reload_type": "immediate"
    },
    "rate_limiting": {
      "status": "applied",
      "hot_reload_type": "immediate"
    }
  }
}

Example¶

curl -X POST http://localhost:8080/admin/config/apply \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sections": ["logging"]
  }'
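Validate and apply are usually chained: stop on errors, decide deliberately about warnings (apply only proceeds past them with force), and remember that requires_restart sections do not take effect until the router restarts. The sketch below encodes that as a hypothetical client-side policy (apply_plan is not a router function); it reads the validate response shapes documented above.

```python
from typing import Any, Dict, List


def apply_plan(validation: Dict[str, Any], force: bool = False) -> List[str]:
    """Turn a POST /admin/config/validate response into next steps.

    Hypothetical client policy: raise on errors, require force to proceed
    past warnings, and flag sections that need a restart.
    """
    if not validation.get("valid", False):
        messages = [e["message"] for e in validation.get("errors", [])]
        raise ValueError("validation failed: " + "; ".join(messages))
    steps = []
    if validation.get("warnings") and not force:
        steps.append("review warnings or resend with force=true")
    else:
        steps.append("POST /admin/config/apply")
    if validation.get("hot_reload_capability") == "requires_restart":
        steps.append("restart the router for the change to take effect")
    return steps
```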

Configuration Save/Restore APIs¶

Export Configuration¶

Export current configuration in specified format.

POST /admin/config/export

Request Body¶

{
  "format": "yaml",
  "sections": ["server", "backends", "logging"],
  "include_sensitive": false,
  "include_defaults": true
}

Field	Type	Required	Description
`format`	string	Yes	Output format: `yaml`, `json`, or `toml`
`sections`	array	No	Sections to export (default: all)
`include_sensitive`	boolean	No	Include unmasked sensitive data (default: false)
`include_defaults`	boolean	No	Include default values (default: true)

Response¶

{
  "format": "yaml",
  "content": "server:\n  bind_address: \"0.0.0.0:8080\"\n  workers: 4\n\nbackends:\n  - name: openai\n    url: https://api.openai.com\n    api_key: \"sk-***abcd\"\n",
  "exported_at": "2025-12-13T10:30:00Z",
  "sections_exported": ["server", "backends", "logging"]
}

Example¶

# Export as YAML
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "yaml"}' | jq -r '.content' > config-backup.yaml

# Export as JSON
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "json"}' | jq -r '.content' > config-backup.json

# Export specific sections
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "yaml",
    "sections": ["backends", "rate_limiting"]
  }'

Import Configuration¶

Import and apply configuration from content.

POST /admin/config/import

Request Body¶

{
  "format": "yaml",
  "content": "logging:\n  level: info\n  format: json\n",
  "apply": true,
  "dry_run": false,
  "merge": true
}

Field	Type	Required	Description
`format`	string	Yes	Content format: `yaml`, `json`, or `toml`
`content`	string	Yes	Configuration content (max 1MB)
`apply`	boolean	No	Apply after validation (default: true)
`dry_run`	boolean	No	Validate only without applying (default: false)
`merge`	boolean	No	Merge with existing config (default: false)

Response¶

{
  "success": true,
  "message": "Configuration imported and applied",
  "version": 8,
  "validation": {
    "valid": true,
    "errors": [],
    "warnings": []
  },
  "sections_imported": ["logging"],
  "applied": true
}

Example¶

# Import from file (jq builds the JSON body and escapes the YAML content)
jq -Rsc '{format: "yaml", content: ., apply: true}' config-backup.yaml | \
  curl -X POST http://localhost:8080/admin/config/import \
    -H "Authorization: Bearer $ADMIN_TOKEN" \
    -H "Content-Type: application/json" \
    --data-binary @-

# Dry run import
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "yaml",
    "content": "logging:\n  level: debug\n",
    "dry_run": true
  }'
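The shell example relies on jq -Rs to turn a file into a JSON string. In Python, json.dumps performs the same escaping of newlines and quotes. The sketch below builds the import body from a file and checks the documented size limit before sending; build_import_payload is a hypothetical helper, and reading "max 1MB" as 1 MiB (1,048,576 bytes) is an assumption.

```python
import json
from pathlib import Path

MAX_IMPORT_BYTES = 1024 * 1024  # assumption: "max 1MB" read as 1 MiB


def build_import_payload(path: str, fmt: str = "yaml", *, apply: bool = True,
                         dry_run: bool = False, merge: bool = False) -> str:
    """Read a config file and wrap it in a POST /admin/config/import body."""
    if fmt not in ("yaml", "json", "toml"):
        raise ValueError(f"unsupported format: {fmt}")
    content = Path(path).read_text(encoding="utf-8")
    if len(content.encode("utf-8")) > MAX_IMPORT_BYTES:
        raise ValueError("content exceeds the 1MB import limit")
    # json.dumps escapes newlines and quotes inside the content string.
    return json.dumps({"format": fmt, "content": content, "apply": apply,
                       "dry_run": dry_run, "merge": merge})
```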

Get Configuration History¶

View configuration change history.

GET /admin/config/history

Query Parameters¶

Parameter	Type	Required	Description
`limit`	integer	No	Number of entries to return (default: 20, max: 100)
`offset`	integer	No	Number of entries to skip (default: 0)
`section`	string	No	Filter by section name

Response¶

{
  "history": [
    {
      "version": 8,
      "timestamp": "2025-12-13T10:30:00Z",
      "sections_changed": ["logging"],
      "source": "api",
      "user": "admin",
      "description": "Updated logging level to debug",
      "rollback_available": true
    },
    {
      "version": 7,
      "timestamp": "2025-12-13T10:25:00Z",
      "sections_changed": ["rate_limiting"],
      "source": "api",
      "user": "admin",
      "description": "Increased rate limit to 200 rpm",
      "rollback_available": true
    },
    {
      "version": 6,
      "timestamp": "2025-12-13T09:00:00Z",
      "sections_changed": ["backends"],
      "source": "file_reload",
      "user": "system",
      "description": "Configuration file changed",
      "rollback_available": true
    }
  ],
  "total_entries": 8,
  "current_version": 8
}

Example¶

# Get recent history
curl -s http://localhost:8080/admin/config/history \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Get history for specific section
curl -s "http://localhost:8080/admin/config/history?section=backends&limit=10" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq
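Walking the full history means paging with limit and offset, with limit capped at 100. In the sketch below, fetch is a hypothetical callable that performs GET /admin/config/history with the given query parameters and returns the decoded JSON; the loop stops on an empty page or once offset reaches total_entries.

```python
from typing import Any, Callable, Dict, Iterator, Optional

FetchPage = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_history(fetch: FetchPage, section: Optional[str] = None,
                 page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every history entry by walking limit/offset pages."""
    limit = max(1, min(page_size, 100))  # the API caps limit at 100
    offset = 0
    while True:
        params: Dict[str, Any] = {"limit": limit, "offset": offset}
        if section is not None:
            params["section"] = section
        page = fetch(params)
        entries = page.get("history", [])
        yield from entries
        offset += len(entries)
        total = page.get("total_entries")
        if not entries or (total is not None and offset >= total):
            return
```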

Rollback Configuration¶

Rollback to a previous configuration version.

POST /admin/config/rollback/{version}

Path Parameters¶

Parameter	Type	Required	Description
`version`	integer	Yes	Version number to rollback to

Request Body¶

{
  "sections": ["logging", "rate_limiting"],
  "dry_run": false
}

Field	Type	Required	Description
`sections`	array	No	Specific sections to rollback (default: all changed)
`dry_run`	boolean	No	Preview without applying (default: false)

Response¶

{
  "success": true,
  "message": "Rolled back to version 5",
  "previous_version": 8,
  "new_version": 9,
  "sections_rolled_back": ["logging", "rate_limiting"],
  "changes": {
    "logging": {
      "level": {
        "from": "debug",
        "to": "info"
      }
    }
  }
}

Example¶

# Rollback to version 5
curl -X POST http://localhost:8080/admin/config/rollback/5 \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

# Preview rollback (dry run)
curl -X POST http://localhost:8080/admin/config/rollback/5 \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dry_run": true}'
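The changes object in the rollback response lists each field as {"from": ..., "to": ...}. To review a rollback offline, for example by comparing a section from two exports, a client can compute the same shape. The sketch compares top-level fields only, and reporting a field missing on one side as None is an assumption rather than documented behavior.

```python
from typing import Any, Dict


def section_diff(current: Dict[str, Any], target: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Top-level field diff in the {"field": {"from": ..., "to": ...}} shape."""
    changes = {}
    for field in sorted(set(current) | set(target)):
        before, after = current.get(field), target.get(field)
        if before != after:
            changes[field] = {"from": before, "to": after}
    return changes
```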

Backend Management APIs¶

Add Backend¶

Add a new backend dynamically.

POST /admin/backends

Request Body¶

{
  "name": "new-ollama",
  "url": "http://192.168.1.100:11434",
  "weight": 1,
  "models": ["llama3.2", "mistral"],
  "api_key": "optional-key",
  "enabled": true,
  "health_check": {
    "enabled": true,
    "path": "/v1/models"
  }
}

Field	Type	Required	Description
`name`	string	Yes	Unique backend name using letters, digits, -, and _, up to admin.max_backend_name_length characters (256 in the example configuration)
`type`	string	No	Backend type: `openai`, `azure`, `vllm`, `ollama`, `anthropic`, `gemini`, `llamacpp`, `generic`. Default: `generic` (auto-detect)
`url`	string	Yes	Backend URL (http:// or https://)
`weight`	integer	No	Load balancing weight (default: 1)
`models`	array	No	List of models served by this backend
`api_key`	string	No	API key for backend authentication
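`enabled`	boolean	No	Whether backend is enabled (default: true)
`health_check`	object	No	Health check settings for the backend (the example above sets enabled and path)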

Backend Type Auto-Detection¶

When type is omitted or set to generic, the router probes the backend's /v1/models endpoint to detect the backend type. Auto-detection currently recognizes:

llama.cpp: Identified by owned_by: "llamacpp" or llama.cpp-specific metadata fields

llama.cpp backends can therefore be added without explicit type configuration:

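# llama.cpp backend - type auto-detected.
# llama-server listens on 8080 by default, which collides with the router,
# so this example assumes it was started with --port 8081.
curl -X POST http://localhost:8080/admin/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "local-llama",
    "url": "http://localhost:8081"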
  }'
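The owned_by signal is easy to check yourself before adding a backend. This sketch inspects an OpenAI-style /v1/models body (models listed under data) and only models the owned_by: "llamacpp" check; the router's other llama.cpp-specific metadata checks are not reproduced, and looks_like_llamacpp is a hypothetical name.

```python
from typing import Any, Dict


def looks_like_llamacpp(models_response: Dict[str, Any]) -> bool:
    """Return True if any model in a /v1/models body reports owned_by == "llamacpp".

    Only the owned_by signal is modelled; other metadata checks are omitted.
    """
    return any(model.get("owned_by") == "llamacpp"
               for model in models_response.get("data", []))
```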

Response¶

For the new-ollama request body shown at the start of this section:

{
  "success": true,
  "message": "Backend 'new-ollama' added successfully",
  "backend": {
    "name": "new-ollama",
    "url": "http://192.168.1.100:11434",
    "weight": 1,
    "models": ["llama3.2", "mistral"],
    "enabled": true,
    "health_status": "unknown"
  }
}

Example¶

curl -X POST http://localhost:8080/admin/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "new-backend",
    "url": "http://192.168.1.100:11434",
    "weight": 2,
    "models": ["llama3.2"]
  }'
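Catching an invalid name or URL before the request saves a round trip. The sketch below checks the rules from the field table: names limited to letters, digits, - and _ (ASCII is assumed), a length cap defaulting to the 256 used in the example configuration, and an absolute http:// or https:// URL. backend_problems is a hypothetical helper; the server remains the authority.

```python
import re
from typing import List
from urllib.parse import urlparse

NAME_RE = re.compile(r"[A-Za-z0-9_-]+")  # assumption: ASCII alphanumerics only


def backend_problems(name: str, url: str, max_name_length: int = 256) -> List[str]:
    """Pre-check a POST /admin/backends body against the documented rules."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("name must use only letters, digits, '-' and '_'")
    if len(name) > max_name_length:
        problems.append(f"name exceeds {max_name_length} characters")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http:// or https:// URL")
    return problems
```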

Get Backend¶

Get configuration for a specific backend.

GET /admin/backends/{name}

Response¶

{
  "name": "openai",
  "url": "https://api.openai.com",
  "api_key": "sk-***abcd",
  "weight": 1,
  "models": ["gpt-4", "gpt-3.5-turbo"],
  "enabled": true,
  "health_status": "healthy",
  "stats": {
    "total_requests": 1250,
    "failed_requests": 12,
    "average_latency_ms": 150,
    "last_used": "2025-12-13T10:29:55Z"
  }
}

Example¶

curl -s http://localhost:8080/admin/backends/openai \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Update Backend¶

Update backend configuration.

PUT /admin/backends/{name}

Request Body¶

{
  "url": "https://api.openai.com",
  "weight": 2,
  "models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
  "enabled": true
}

Response¶

{
  "success": true,
  "message": "Backend 'openai' updated successfully",
  "backend": {
    "name": "openai",
    "url": "https://api.openai.com",
    "weight": 2,
    "models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
    "enabled": true
  }
}

Example¶

curl -X PUT http://localhost:8080/admin/backends/openai \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "weight": 3,
    "models": ["gpt-4", "gpt-4-turbo"]
  }'

Delete Backend¶

Remove a backend from the router.

DELETE /admin/backends/{name}

Query Parameters¶

Parameter	Type	Required	Description
`force`	boolean	No	Force delete even if backend has active connections

Response¶

{
  "success": true,
  "message": "Backend 'old-backend' removed successfully",
  "removed_backend": "old-backend"
}

Notes¶

Deleting the last backend is allowed; the router can operate with zero backends configured. After the last backend is deleted:
- /v1/models returns an empty list
- Requests that need routing return 503 Service Unavailable ("No backends available")
- New backends can be added via POST /admin/backends

Example¶

curl -X DELETE http://localhost:8080/admin/backends/old-backend \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Force delete
curl -X DELETE "http://localhost:8080/admin/backends/old-backend?force=true" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Update Backend Weight¶

Update only the backend weight for load balancing.

PUT /admin/backends/{name}/weight

Request Body¶

{
  "weight": 5
}

Response¶

{
  "success": true,
  "message": "Backend 'openai' weight updated to 5",
  "previous_weight": 2,
  "new_weight": 5
}

Example¶

curl -X PUT http://localhost:8080/admin/backends/openai/weight \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"weight": 5}'
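This reference does not spell out the load-balancing algorithm. Assuming backends are picked in proportion to their weights, each backend's expected share of traffic is weight / sum(weights), which the sketch below computes (expected_shares is a hypothetical helper).

```python
from typing import Dict


def expected_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Traffic fraction per backend, ASSUMING proportional weighted selection."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one backend needs a positive weight")
    return {name: w / total for name, w in weights.items()}
```

Under that assumption, raising openai to 5 next to a second backend at weight 1 would send roughly 83% of requests to openai.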

Update Backend Models¶

Update the model list for a backend.

PUT /admin/backends/{name}/models

Request Body¶

{
  "models": ["gpt-4", "gpt-4-turbo", "gpt-4o", "gpt-3.5-turbo"],
  "append": false
}

Field	Type	Required	Description
`models`	array	Yes	List of model names
`append`	boolean	No	If true, append to the existing list; if false (default), replace it

Response¶

{
  "success": true,
  "message": "Backend 'openai' models updated",
  "models": ["gpt-4", "gpt-4-turbo", "gpt-4o", "gpt-3.5-turbo"]
}

Example¶

# Replace models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4", "gpt-4o"]}'

# Append models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4.5-turbo"], "append": true}'
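The difference between the two calls above can be stated as a small function. In this sketch, append=False replaces the list and append=True adds new names after the existing ones; skipping names that are already present is an assumption, since this reference does not say how duplicates are handled. updated_models is a hypothetical name.

```python
from typing import List


def updated_models(existing: List[str], models: List[str], append: bool = False) -> List[str]:
    """Model list after PUT /admin/backends/{name}/models.

    Assumption: on append, names already in the list are not duplicated.
    """
    if not append:
        return list(models)
    result = list(existing)
    for model in models:
        if model not in result:
            result.append(model)
    return result
```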

API Key Management APIs¶

The API Key Management APIs let you issue, inspect, update, rotate, enable, disable, and revoke per-user API keys at runtime. All eight endpoints are mounted under /admin/api-keys and require the same admin authentication as the rest of the Admin API.

These endpoints operate on the same key store that authenticates incoming client requests. A key created here is immediately usable by a client through the Authorization: Bearer <key> header, subject to the configured authentication mode (see Authentication Mode and Client Usage below).

API Key Object¶

Each API key is described by an ApiKeyConfig record. The fields below are configurable inline in config.yaml, in an external keys file, or through the create/update endpoints.

Field	Type	Description
`key`	string	The secret key value. Generated cryptographically (format `sk-<base64url>`) when not supplied. Never returned in full except once at creation or rotation; elsewhere it is masked.
`id`	string	Unique identifier for the key (1–128 chars). Used in every `/admin/api-keys/{id}` path.
`user_id`	string	Associated user identifier (1–128 chars). Surfaced in per-user usage statistics.
`organization_id`	string	Associated organization identifier (1–128 chars).
`name`	string or absent	Optional human-readable label (max 256 chars).
`description`	string or absent	Optional notes about the key (max 1024 chars).
`scopes`	array of strings	Permissions granted to the key. Common values: `read`, `write`, `files`, `admin`. Must contain at least one scope: on creation, an omitted list defaults to `["read", "write"]` and an empty list is rejected with 400.
`rate_limit`	integer or absent	Optional per-key rate limit in requests per minute. Overrides the global limit for this key.
`enabled`	boolean	Whether the key is active. A disabled key fails authentication even before expiry is checked.
`created_at`	string (ISO 8601)	Creation timestamp.
`expires_at`	string (ISO 8601) or absent	Optional expiration timestamp. A key past this instant is automatically invalid regardless of `enabled`.
`annotations`	object (string to string) or absent	Free-form metadata map. Recommended canonical keys: `email`, `uuid`, `owner`, `team`, `environment`. An operator-configured allowlist of annotation keys is exported as labels on the `api_key_info` Prometheus metric (values are sanitized).
`allowed_backends`	array of strings or absent	Per-key backend allow-list. When non-empty, requests authenticated with this key may only route to backends whose name appears here. Empty or absent means no restriction. Matching is exact and case-sensitive; unservable requests are rejected with `403 Forbidden`.

A key is considered valid when it is enabled and not past expires_at. The listing endpoint reports active, expired, and disabled counts derived from these rules.
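The validity rule can be written down directly against the ApiKeyConfig fields. The sketch checks enabled first, mirroring the statement that a disabled key fails before expiry is checked, and treats a key as expired only once the current time is strictly past expires_at. Counting a key that is both disabled and expired as "disabled" in the summary is an assumption; key_status and summarize are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional


def key_status(key: Dict[str, Any], now: Optional[datetime] = None) -> str:
    """Classify a key as "active", "expired", or "disabled".

    Assumption: a disabled key counts as disabled even if it has also expired.
    """
    now = now or datetime.now(timezone.utc)
    if not key.get("enabled", True):
        return "disabled"
    expires_at = key.get("expires_at")
    if expires_at is not None:
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11.
        expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
        if now > expiry:
            return "expired"
    return "active"


def summarize(keys: Iterable[Dict[str, Any]], now: Optional[datetime] = None) -> Dict[str, int]:
    """Count keys per status, in the shape of the listing endpoint's summary."""
    counts = {"total": 0, "active": 0, "expired": 0, "disabled": 0}
    for key in keys:
        counts["total"] += 1
        counts[key_status(key, now)] += 1
    return counts
```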

Key Masking¶

The full key value is returned exactly once: in the response to POST /admin/api-keys (creation) and POST /admin/api-keys/{id}/rotate (rotation). Every other response returns a masked_key of the form sk-***abcd, preserving the sk- prefix and the last four characters. Logs always use the masked form.
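Client tooling that logs or displays keys should use the same masked form. This sketch produces sk-***abcd from a full key. What happens for keys without the sk- prefix or with four or fewer characters after it is not documented; here the prefix is simply omitted and short bodies are fully hidden, which is an assumption. mask_key is a hypothetical helper.

```python
def mask_key(key: str) -> str:
    """Mask a key as sk-***abcd: keep the "sk-" prefix and the last four chars.

    Assumption: keys without the prefix keep no prefix, and bodies of four
    characters or fewer are hidden entirely.
    """
    prefix = "sk-" if key.startswith("sk-") else ""
    body = key[len(prefix):]
    if len(body) <= 4:
        return prefix + "***"
    return f"{prefix}***{body[-4:]}"
```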

Authentication Mode and Client Usage¶

The api_keys.mode setting controls how the router treats client requests that lack a valid key:

Mode	Behavior
`permissive` (default)	Requests with a valid key are authenticated and attributed; requests without a key are still allowed through. Use this for incremental rollout.
`blocking`	Every API request must carry a valid key. Requests without one receive `401 Unauthorized`.

Set the mode in config.yaml:

api_keys:
  mode: blocking            # "permissive" (default) | "blocking"
  persistence_file: ~/.config/continuum-router/runtime-keys.yaml
  api_keys:
    - key: "sk-prod-..."
      id: "key-1"
      user_id: "user-1"
      organization_id: "org-1"
      scopes: ["read", "write"]

A client authenticates by sending the issued key as a bearer token:

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-the-issued-key-value" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

The mode setting hot-reloads: switching between permissive and blocking takes effect without a restart.

Persistence and Hot Reload¶

Keys created or modified through these endpoints live in the in-memory key store. When api_keys.persistence_file is set, runtime changes are written to that file (tilde expansion is supported) and restored on the next startup, so admin-created keys survive restarts. Without persistence_file, runtime keys are in-memory only and lost on restart. Keys loaded from inline config or api_keys_file are read-only sources and are reloaded on config hot-reload.

List API Keys¶

GET /admin/api-keys

Returns every API key with its value masked, plus a summary of active, expired, and disabled counts.

Response¶

{
  "keys": [
    {
      "id": "key-1",
      "masked_key": "sk-***A1aB",
      "user_id": "user-1",
      "organization_id": "org-1",
      "name": "Production key",
      "scopes": ["read", "write"],
      "rate_limit": 600,
      "is_active": true,
      "expires_at": null,
      "created_at": "2026-03-05T10:30:00Z",
      "is_expired": false,
      "allowed_backends": ["openai", "anthropic"]
    }
  ],
  "summary": {
    "total": 1,
    "active": 1,
    "expired": 0,
    "disabled": 0
  }
}

Example¶

curl -s http://localhost:8080/admin/api-keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Create API Key¶

POST /admin/api-keys

Creates a new API key. If key is omitted, the router generates a cryptographically random value. The full key value is returned only in this response.
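If you prefer to supply your own key value, it can follow the documented sk-<base64url> shape. The sketch uses secrets.token_urlsafe, which returns unpadded base64url text from a cryptographically secure source; the 32-byte entropy default is an assumption, as the router's own generated length is not specified here.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Generate a key in the sk-<base64url> shape (num_bytes is an assumption)."""
    # token_urlsafe() encodes num_bytes random bytes as unpadded base64url.
    return "sk-" + secrets.token_urlsafe(num_bytes)
```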

Request Body¶

{
  "id": "key-acme-1",
  "user_id": "user-acme",
  "organization_id": "org-acme",
  "name": "Acme integration",
  "description": "Server-to-server key for the Acme integration",
  "scopes": ["read", "write"],
  "rate_limit": 600,
  "enabled": true,
  "expires_at": "2027-01-01T00:00:00Z",
  "allowed_backends": ["openai"]
}

Field	Type	Required	Description
`id`	string	Yes	Unique key identifier (1–128 chars).
`user_id`	string	Yes	Associated user identifier (must be non-empty).
`organization_id`	string	Yes	Associated organization identifier (must be non-empty).
`key`	string	No	Custom key value. A new value is generated when omitted.
`name`	string	No	Human-readable label (max 256 chars).
`description`	string	No	Notes about the key (max 1024 chars).
`scopes`	array	No	Permissions; defaults to `["read", "write"]`. Must contain at least one scope.
`rate_limit`	integer	No	Per-key rate limit in requests per minute.
`enabled`	boolean	No	Whether the key is active; defaults to `true`.
`expires_at`	string (ISO 8601)	No	Expiration timestamp.
`allowed_backends`	array	No	Per-key backend allow-list. Empty or omitted means unrestricted.

Response¶

Returns 201 Created. The key field is the full value and is shown only here.

{
  "key": "sk-G7q2...full-value...A1aB",
  "masked_key": "sk-***A1aB",
  "id": "key-acme-1",
  "user_id": "user-acme",
  "organization_id": "org-acme",
  "name": "Acme integration",
  "scopes": ["read", "write"],
  "rate_limit": 600,
  "enabled": true,
  "created_at": "2026-03-05T10:30:00Z",
  "expires_at": "2027-01-01T00:00:00Z",
  "allowed_backends": ["openai"]
}

Error Responses¶

400 Bad Request: empty user_id/organization_id, no scopes, or a name/description over the length limit.
409 Conflict: a key with the same id already exists.
507 Insufficient Storage: the maximum key count (10,000) has been reached.

Example¶

curl -X POST http://localhost:8080/admin/api-keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "key-acme-1",
    "user_id": "user-acme",
    "organization_id": "org-acme",
    "scopes": ["read", "write"],
    "rate_limit": 600
  }'
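The 400 cases listed above can be caught before the request is sent. This sketch mirrors the documented limits: id, user_id, and organization_id of 1 to 128 characters, a non-empty scopes list when one is supplied, name up to 256 characters, and description up to 1024. create_key_problems is a hypothetical helper, and the server's validation remains authoritative.

```python
from typing import Any, Dict, List


def create_key_problems(body: Dict[str, Any]) -> List[str]:
    """Client-side pre-check mirroring the documented 400 rules for POST /admin/api-keys."""
    problems = []
    for field in ("id", "user_id", "organization_id"):
        value = body.get(field, "")
        if not isinstance(value, str) or not 1 <= len(value) <= 128:
            problems.append(f"{field} must be 1-128 characters")
    # An omitted scopes list falls back to the default; an empty one is rejected.
    if "scopes" in body and not body["scopes"]:
        problems.append("scopes must contain at least one scope")
    if len(body.get("name") or "") > 256:
        problems.append("name must be at most 256 characters")
    if len(body.get("description") or "") > 1024:
        problems.append("description must be at most 1024 characters")
    return problems
```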

Get API Key¶

GET /admin/api-keys/{id}

Returns a single key by id, with its value masked.

Response¶

{
  "id": "key-acme-1",
  "masked_key": "sk-***A1aB",
  "user_id": "user-acme",
  "organization_id": "org-acme",
  "name": "Acme integration",
  "scopes": ["read", "write"],
  "rate_limit": 600,
  "is_active": true,
  "created_at": "2026-03-05T10:30:00Z",
  "expires_at": "2027-01-01T00:00:00Z",
  "is_expired": false,
  "is_valid": true,
  "allowed_backends": ["openai"]
}

The is_active, is_expired, and is_valid fields are computed: is_valid is true only when the key is active and not expired.

Error Responses¶

404 Not Found: no key with the given id.

Example¶

curl -s http://localhost:8080/admin/api-keys/key-acme-1 \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Update API Key¶

PUT /admin/api-keys/{id}

Updates one or more properties of an existing key. Only the fields present in the body are changed; omitted fields are left untouched. The key value itself is not changed by this endpoint (use Rotate for that).

Request Body¶

{
  "name": "Acme integration (renamed)",
  "scopes": ["read"],
  "rate_limit": 300,
  "enabled": true,
  "expires_at": "2027-06-01T00:00:00Z",
  "allowed_backends": ["openai", "anthropic"]
}

Field	Type	Description
`name`	string	New label.
`scopes`	array	Replacement scope list.
`rate_limit`	integer	New per-key rate limit.
`enabled`	boolean	Enable or disable the key.
`expires_at`	string (ISO 8601)	New expiration timestamp.
`allowed_backends`	array	Backend allow-list. Omitting the field (or sending `null`) leaves it unchanged; an empty array clears all restrictions; a non-empty array replaces the list.

Response¶

{
  "success": true,
  "action": "update",
  "key": {
    "id": "key-acme-1",
    "masked_key": "sk-***A1aB",
    "user_id": "user-acme",
    "organization_id": "org-acme",
    "name": "Acme integration (renamed)",
    "scopes": ["read"],
    "rate_limit": 300,
    "is_active": true,
    "created_at": "2026-03-05T10:30:00Z",
    "expires_at": "2027-06-01T00:00:00Z",
    "is_valid": true,
    "allowed_backends": ["openai", "anthropic"]
  }
}

Error Responses¶

404 Not Found: no key with the given id.

Example¶

curl -X PUT http://localhost:8080/admin/api-keys/key-acme-1 \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit": 300, "scopes": ["read"]}'

Delete API Key¶

DELETE /admin/api-keys/{id}

Permanently revokes and removes a key. After deletion, any client still presenting the old value fails authentication. This action is irreversible.

Response¶

{
  "success": true,
  "action": "delete",
  "id": "key-acme-1"
}

Error Responses¶

404 Not Found: no key with the given id.

Example¶

curl -X DELETE http://localhost:8080/admin/api-keys/key-acme-1 \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Rotate API Key¶

POST /admin/api-keys/{id}/rotate

Generates a new secret value for an existing key while preserving its id and all other properties. The previous value stops working immediately. The new value is returned only in this response.

Response¶

{
  "success": true,
  "action": "rotate",
  "id": "key-acme-1",
  "new_key": "sk-Hq9z...new-full-value...B2",
  "masked_key": "sk-***B2cD",
  "warning": "Store this key securely. It will not be shown again."
}

Error Responses¶

404 Not Found: no key with the given id.

Example¶

curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/rotate \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Enable API Key¶

POST /admin/api-keys/{id}/enable

Marks a key as active. A re-enabled key authenticates again, provided it has not expired.

Response¶

{
  "success": true,
  "action": "enable",
  "id": "key-acme-1"
}

Error Responses¶

404 Not Found: no key with the given id.

Example¶

curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/enable \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Disable API Key¶

POST /admin/api-keys/{id}/disable

Marks a key as inactive without deleting it. A disabled key fails authentication but keeps its configuration, so it can be re-enabled later. Use this for a reversible suspension instead of Delete.

Response¶

{
  "success": true,
  "action": "disable",
  "id": "key-acme-1"
}

Error Responses¶

404 Not Found: no key with the given id.

Example¶

curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/disable \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Statistics APIs¶

The Statistics APIs expose aggregated request metrics collected by the StatsCollector. All endpoints are mounted under /admin/stats and share the same authentication as the rest of the Admin API. Alongside the overall, per-model, and per-backend breakdowns, the collector also tracks per-API-key and per-user usage (see Per-API-Key and Per-User Statistics).

Stats collection is enabled by default. It can be configured or disabled via the admin.stats section in your YAML config:

admin:
  stats:
    enabled: true                # Enable/disable collection (default: true)
    retention_window: 24h        # Ring-buffer retention for windowed queries (default: 24h)
    token_tracking: true         # Parse response bodies for token usage (default: true)
    persistence:
      enabled: true              # Enable stats persistence across restarts (default: true)
      path: ./data/stats.json    # File path for the snapshot (default: ./data/stats.json)
      snapshot_interval: 5m      # How often to write periodic snapshots (default: 5m)
      max_age: 7d                # Discard snapshots older than this on startup (default: 7d)

The retention_window and token_tracking settings support hot-reload: changes are applied immediately without a restart.

Stats Persistence¶

When the persistence subsection is present and enabled is true, the router saves a statistics snapshot to disk periodically and restores it on startup. This ensures that request counters, per-model breakdowns, and the latency ring buffer survive restarts.

How it works:

On startup, the router reads the snapshot file and restores all counters and ring-buffer records. Uptime resets to zero on each restart.
A background task writes a new snapshot every snapshot_interval. Writes are atomic (temp file + rename) to prevent corruption.
On graceful shutdown (SIGTERM/SIGINT), a final snapshot is saved before the process exits.
If the snapshot file is missing, corrupted, or older than max_age, the router starts with fresh counters and logs a warning or info message.
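The "temp file + rename" write and the "start fresh on a missing or corrupted file" read described above can be illustrated with the following Python sketch of the same pattern. It is not the router's implementation; the JSON payload shape and function names are hypothetical, and `os.replace` is atomic on POSIX filesystems when source and destination share a directory.

```python
import json
import os
import tempfile


def write_snapshot_atomically(path: str, snapshot: dict) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Temp file in the same directory, so the rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".stats-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(snapshot, f)
            f.flush()
            os.fsync(f.fileno())   # make sure bytes hit disk before the rename
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        # On failure, remove the partial temp file; the previous snapshot survives.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_snapshot(path: str):
    """Return the snapshot, or None if missing or corrupted (start fresh)."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```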

Supported duration formats for snapshot_interval and max_age:

Format	Example	Meaning
`Xs`	`30s`	30 seconds
`Xm`	`5m`	5 minutes
`Xh`	`1h`	1 hour
`Xd`	`7d`	7 days

Set max_age to "0" or "" to disable staleness checks (always restore regardless of age).
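Tooling that validates a config before deploying it may want to parse these durations the same way. The sketch below accepts only the single-unit forms in the table (a bare integer followed by `s`, `m`, `h`, or `d`) and treats `"0"` or `""` as "no staleness check", which is the documented meaning for max_age. Compound forms such as `1h30m` are rejected here because the table does not list them; that restriction is an assumption of the sketch.

```python
import re
from datetime import timedelta
from typing import Optional

_SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_DURATION = re.compile(r"(\d+)([smhd])")


def parse_duration(value: str) -> Optional[timedelta]:
    """Parse 30s / 5m / 1h / 7d; "0" or "" return None (staleness check disabled)."""
    value = value.strip()
    if value in ("", "0"):
        return None
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(seconds=int(amount) * _SECONDS_PER_UNIT[unit])


def snapshot_is_stale(age: timedelta, max_age: str) -> bool:
    """A snapshot is discarded only when it is strictly older than max_age."""
    limit = parse_duration(max_age)
    return limit is not None and age > limit
```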

Get Full Statistics¶

GET /admin/stats

Returns overall, per-model, and per-backend statistics.

Query Parameters¶

Parameter	Type	Description
`window`	string	Optional time window filter. Accepted formats: `30m`, `1h`, `24h`, `7d`. Omit for all-time totals.

Response¶

{
  "uptime_seconds": 3600,
  "window": "all",
  "overall": {
    "total_requests": 1500,
    "successful_requests": 1480,
    "failed_requests": 20,
    "avg_latency_ms": 145.3,
    "p50_latency_ms": 120.0,
    "p95_latency_ms": 380.0,
    "p99_latency_ms": 750.0,
    "total_prompt_tokens": 450000,
    "total_completion_tokens": 180000,
    "total_tokens": 630000,
    "tokens_per_sec_avg": 87.4
  },
  "models": [
    {
      "model_id": "gpt-4",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "total_prompt_tokens": 270000,
      "total_completion_tokens": 108000,
      "total_tokens": 378000,
      "avg_latency_ms": 160.2,
      "avg_tokens_per_sec": 92.1,
      "last_used": "2026-03-05T10:30:00Z"
    }
  ],
  "backends": [
    {
      "backend_name": "openai",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "avg_latency_ms": 160.2,
      "health_status": "healthy"
    }
  ]
}

Example¶

# All-time statistics
curl -s http://localhost:8080/admin/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Last hour only
curl -s "http://localhost:8080/admin/stats?window=1h" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq
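The same call from Python, using only the standard library, might look like the sketch below. It builds the request (with the optional window filter) and derives an error rate per backend from failed_requests / total_requests. The helper names are hypothetical, and the error-rate metric is a client-side derivation, not a field the API returns.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def stats_request(base_url: str, token: str, window: Optional[str] = None) -> urllib.request.Request:
    """Build GET /admin/stats, optionally filtered by window (e.g. "1h")."""
    query = ("?" + urllib.parse.urlencode({"window": window})) if window else ""
    return urllib.request.Request(
        f"{base_url}/admin/stats{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def error_rates(stats: dict) -> dict:
    """failed_requests / total_requests, overall and per backend (0.0 when idle)."""
    def rate(entry: dict) -> float:
        total = entry["total_requests"]
        return entry["failed_requests"] / total if total else 0.0

    result = {"overall": rate(stats["overall"])}
    for backend in stats.get("backends", []):
        result[backend["backend_name"]] = rate(backend)
    return result


# Usage against a running router:
#   with urllib.request.urlopen(stats_request("http://localhost:8080", token, "1h")) as resp:
#       print(error_rates(json.load(resp)))
```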

Get Per-Model Statistics¶

GET /admin/stats/models

Returns only the per-model breakdown (subset of the full stats response).

Response¶

{
  "models": [
    {
      "model_id": "gpt-4",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "total_prompt_tokens": 270000,
      "total_completion_tokens": 108000,
      "total_tokens": 378000,
      "avg_latency_ms": 160.2,
      "avg_tokens_per_sec": 92.1,
      "last_used": "2026-03-05T10:30:00Z"
    }
  ]
}

Models are sorted by total_requests in descending order.

Example¶

curl -s http://localhost:8080/admin/stats/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.models[].model_id'

Get Per-Backend Statistics¶

GET /admin/stats/backends

Returns only the per-backend breakdown. The health_status field is populated from the health checker ("healthy", "unhealthy", or "unknown" when health checks are disabled).

Response¶

{
  "backends": [
    {
      "backend_name": "openai",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "avg_latency_ms": 160.2,
      "health_status": "healthy"
    }
  ]
}

Backends are sorted by total_requests in descending order.

Example¶

curl -s http://localhost:8080/admin/stats/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Per-API-Key and Per-User Statistics¶

These endpoints break down usage by the API key that authenticated each request and by the user attached to that key. They sit beside Get Per-Model Statistics and Get Per-Backend Statistics: same collector, a different grouping dimension.

Identifier and Bucketing Semantics¶

Coverage: these statistics cover every inference surface, namely /v1/chat/completions, /anthropic/v1/messages, and the OpenAI Responses API (/v1/responses, including its pass-through, Chat-Completions-conversion, and Anthropic-conversion strategies). Successful non-streaming requests carry full token usage. Streaming requests are recorded at connect time: request counts and per-key/per-user attribution are captured, but token totals are omitted because they are only known once the stream completes.
api_key_id is a derived, non-reversible identifier, never a raw key. It is the same value used as the api_key_id Prometheus label, and it corresponds to the issued key's id. The per-user endpoints key on the user_id attached to the matched key. The derived api_key_id requires the metrics feature to be compiled in; without it, per-key attribution collapses to the "anonymous" bucket (per-user attribution is unaffected, since it reads the key's user_id directly).
Requests with no key (or no associated user) are bucketed under "anonymous".
Each dimension has a cardinality cap of 1000 distinct identifiers (excluding the reserved buckets). Once the cap is reached, further new identifiers are folded into an "unknown" overflow bucket so their usage is still counted in aggregate.
The window query parameter is accepted and echoed back in the response for consistency with GET /admin/stats, but the per-key and per-user aggregates are all-time totals, exactly like GET /admin/stats/models. The identifier is resolved off the request hot path, so it is not present on the windowed ring-buffer records used for time-filtered latency percentiles.
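To make the bucketing rules concrete, here is a small in-memory model of one dimension: requests without an identifier go to "anonymous", and once 1000 distinct real identifiers are tracked, further new identifiers fold into "unknown" while already-tracked ones keep their own bucket. This is an illustration, not the StatsCollector's code; the class name is hypothetical, and counting the cap only over non-reserved buckets follows the "excluding the reserved buckets" rule above.

```python
from collections import defaultdict
from typing import Optional

ANONYMOUS = "anonymous"
OVERFLOW = "unknown"
RESERVED = {ANONYMOUS, OVERFLOW}


class BoundedCounter:
    """Per-identifier request counts with a cardinality cap."""

    def __init__(self, cap: int = 1000):
        self.cap = cap
        self.counts = defaultdict(int)
        self._distinct = 0  # tracked identifiers, excluding reserved buckets

    def record(self, identifier: Optional[str]) -> str:
        if not identifier:
            bucket = ANONYMOUS                      # no key / no user attached
        elif identifier in self.counts or self._distinct < self.cap:
            bucket = identifier                     # known, or room for a new one
        else:
            bucket = OVERFLOW                       # cap reached: fold into "unknown"
        if bucket not in self.counts and bucket not in RESERVED:
            self._distinct += 1
        self.counts[bucket] += 1
        return bucket
```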

The ApiKeyStats and UserStats objects share the same shape:

Field	Type	Description
`api_key_id` / `user_id`	string	The derived key identifier or the user identifier.
`total_requests`	integer	Total requests attributed to this identifier.
`successful_requests`	integer	Requests that completed successfully.
`failed_requests`	integer	Requests that failed.
`total_prompt_tokens`	integer	Prompt tokens consumed.
`total_completion_tokens`	integer	Completion tokens produced.
`total_tokens`	integer	Sum of prompt and completion tokens.
`avg_latency_ms`	number	Average latency in milliseconds.
`avg_tokens_per_sec`	number	Average generation throughput in tokens per second.
`last_used`	string (ISO 8601) or null	Timestamp of the most recent request, or `null` if never used.

Get Per-API-Key Statistics¶

GET /admin/stats/api-keys

Returns one entry per tracked API key, sorted by total_requests in descending order.

Query Parameters¶

Parameter	Type	Description
`window`	string	Accepted and echoed in the `window` field, but does not filter the all-time aggregates.

Response¶

{
  "window": "all",
  "api_keys": [
    {
      "api_key_id": "k_3f9a1c",
      "total_requests": 1200,
      "successful_requests": 1185,
      "failed_requests": 15,
      "total_prompt_tokens": 360000,
      "total_completion_tokens": 144000,
      "total_tokens": 504000,
      "avg_latency_ms": 152.7,
      "avg_tokens_per_sec": 88.3,
      "last_used": "2026-03-05T10:30:00Z"
    },
    {
      "api_key_id": "anonymous",
      "total_requests": 80,
      "successful_requests": 80,
      "failed_requests": 0,
      "total_prompt_tokens": 12000,
      "total_completion_tokens": 4800,
      "total_tokens": 16800,
      "avg_latency_ms": 131.0,
      "avg_tokens_per_sec": 90.1,
      "last_used": "2026-03-05T10:28:00Z"
    }
  ]
}

Example¶

curl -s http://localhost:8080/admin/stats/api-keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# The window param is accepted and echoed but does not change the aggregates
curl -s "http://localhost:8080/admin/stats/api-keys?window=24h" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.window'

Get Per-API-Key Statistics by ID¶

GET /admin/stats/api-keys/{id}

Returns the stats for a single api_key_id (the derived identifier returned by the list endpoint, not a raw key). Returns 404 Not Found when the identifier has no recorded usage.

Response¶

{
  "window": "all",
  "api_key": {
    "api_key_id": "k_3f9a1c",
    "total_requests": 1200,
    "successful_requests": 1185,
    "failed_requests": 15,
    "total_prompt_tokens": 360000,
    "total_completion_tokens": 144000,
    "total_tokens": 504000,
    "avg_latency_ms": 152.7,
    "avg_tokens_per_sec": 88.3,
    "last_used": "2026-03-05T10:30:00Z"
  }
}

Example¶

curl -s http://localhost:8080/admin/stats/api-keys/k_3f9a1c \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Statistics¶

GET /admin/stats/users

Returns one entry per tracked user identifier (the user_id attached to the matched key), sorted by total_requests in descending order. Same fields and bucketing rules as the per-API-key endpoint.

Response¶

{
  "window": "all",
  "users": [
    {
      "user_id": "user-acme",
      "total_requests": 1200,
      "successful_requests": 1185,
      "failed_requests": 15,
      "total_prompt_tokens": 360000,
      "total_completion_tokens": 144000,
      "total_tokens": 504000,
      "avg_latency_ms": 152.7,
      "avg_tokens_per_sec": 88.3,
      "last_used": "2026-03-05T10:30:00Z"
    }
  ]
}

Example¶

curl -s http://localhost:8080/admin/stats/users \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Statistics by ID¶

GET /admin/stats/users/{user_id}

Returns the stats for a single user_id. Returns 404 Not Found when the identifier has no recorded usage.

Response¶

{
  "window": "all",
  "user": {
    "user_id": "user-acme",
    "total_requests": 1200,
    "successful_requests": 1185,
    "failed_requests": 15,
    "total_prompt_tokens": 360000,
    "total_completion_tokens": 144000,
    "total_tokens": 504000,
    "avg_latency_ms": 152.7,
    "avg_tokens_per_sec": 88.3,
    "last_used": "2026-03-05T10:30:00Z"
  }
}

Example¶

curl -s http://localhost:8080/admin/stats/users/user-acme \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Per-Model Breakdown and Usage Time Series¶

These per-identifier drill-downs power dashboard widgets: a per-model breakdown (a "tokens by model" donut) and a daily usage trend (a usage-over-time chart). They are tracked as two independent dimensions, not a per-(identifier, model, date) cube, so cardinality stays bounded.

Scope and semantics carry over from Per-API-Key and Per-User Statistics:

Only token and request totals are tracked. There is no cost field; the dashboard derives cost from tokens against its own pricing table.
api_key_id is the derived, non-reversible identifier (never a raw key); user_id is the user attached to the matched key. Unknown identifiers return 200 OK with an empty array, matching the list endpoints rather than returning 404.
Each new dimension has its own cardinality cap. Overflow is folded into an aggregate "unknown" bucket that is excluded from per-identifier reads, and requests whose model cannot be identified are recorded under the model label "unknown".
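Since the API returns tokens but no cost, a dashboard has to do both derivations itself. The sketch below computes donut slices (each model's share of total_tokens) and an estimated cost from a models array. The PRICING table and its rates are placeholders invented for the example, not actual list prices; models missing from the table (including "unknown") are simply skipped.

```python
from typing import Dict, List, Tuple

# Hypothetical pricing table: (input, output) USD per 1,000 tokens.
# Placeholder values for illustration only.
PRICING: Dict[str, Tuple[float, float]] = {
    "claude-haiku-4-5": (0.001, 0.005),
}


def donut_slices(models: List[dict]) -> Dict[str, float]:
    """Share of total_tokens per model_id, for a "tokens by model" chart."""
    grand_total = sum(m["total_tokens"] for m in models)
    if grand_total == 0:
        return {}
    return {m["model_id"]: m["total_tokens"] / grand_total for m in models}


def estimated_cost(models: List[dict], pricing: Dict[str, Tuple[float, float]] = PRICING) -> float:
    """Sum prompt and completion token costs for models that have a price."""
    cost = 0.0
    for m in models:
        rates = pricing.get(m["model_id"])
        if rates is None:
            continue  # unpriced models (including "unknown") contribute nothing
        input_rate, output_rate = rates
        cost += m["total_prompt_tokens"] / 1000 * input_rate
        cost += m["total_completion_tokens"] / 1000 * output_rate
    return cost
```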

Get Per-API-Key Model Breakdown¶

GET /admin/stats/api-keys/{id}/models

Returns the per-model breakdown for a single api_key_id as a models array of the same ModelStats objects used by GET /admin/stats/models (model id, request counts, prompt/completion/total tokens, average latency, average tokens-per-second, last used), sorted by total_requests descending. The window query parameter is accepted and echoed but does not filter these all-time aggregates.

Response¶

{
  "api_key_id": "k_3f9a1c",
  "window": "all",
  "models": [
    {
      "model_id": "claude-haiku-4-5",
      "total_requests": 2,
      "successful_requests": 2,
      "failed_requests": 0,
      "total_prompt_tokens": 374,
      "total_completion_tokens": 8,
      "total_tokens": 382,
      "avg_latency_ms": 975.0,
      "avg_tokens_per_sec": 195.9,
      "last_used": "2026-06-18T22:11:54Z"
    }
  ]
}

Example¶

curl -s http://localhost:8080/admin/stats/api-keys/k_3f9a1c/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Model Breakdown¶

GET /admin/stats/users/{user_id}/models

Same shape as the per-API-key model breakdown, grouped by user_id.

Response¶

{
  "user_id": "user-acme",
  "window": "all",
  "models": [
    {
      "model_id": "claude-haiku-4-5",
      "total_requests": 2,
      "successful_requests": 2,
      "failed_requests": 0,
      "total_prompt_tokens": 374,
      "total_completion_tokens": 8,
      "total_tokens": 382,
      "avg_latency_ms": 975.0,
      "avg_tokens_per_sec": 195.9,
      "last_used": "2026-06-18T22:11:54Z"
    }
  ]
}

Example¶

curl -s http://localhost:8080/admin/stats/users/user-acme/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-API-Key Usage Time Series¶

GET /admin/stats/api-keys/{id}/series?from=&to=&interval=day

Returns a daily usage series for a single api_key_id, one point per UTC calendar day, sorted ascending by date. Buckets are retained for series_retention_days (default 30); the periodic snapshot task prunes older days and the read path filters them out, so the series never returns days beyond the retention window.
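Because points are keyed by UTC calendar day, a request at 23:59 local time can land on the next day's point depending on the client's time zone. The sketch below shows UTC-day bucketing and a retention cutoff over raw events. It is a local model, not the router's code, and it assumes the retention window includes today plus the preceding series_retention_days - 1 days; the event shape (`ts_ms`, `prompt_tokens`, `completion_tokens`) is hypothetical. Events without token fields, such as streaming requests, still count as requests.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Dict, List


def utc_day(ts_ms: int) -> str:
    """Map a Unix-millis timestamp to its UTC calendar day, e.g. "2026-06-17"."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date().isoformat()


def daily_series(events: List[dict], today: date, retention_days: int = 30) -> List[dict]:
    """One point per UTC day, ascending; days outside retention are dropped."""
    oldest = (today - timedelta(days=retention_days - 1)).isoformat()
    buckets: Dict[str, dict] = {}
    for event in events:
        day = utc_day(event["ts_ms"])
        if day < oldest:  # ISO dates compare correctly as strings
            continue
        point = buckets.setdefault(day, {"date": day, "total_requests": 0, "prompt_tokens": 0,
                                         "completion_tokens": 0, "total_tokens": 0})
        point["total_requests"] += 1
        point["prompt_tokens"] += event.get("prompt_tokens", 0)
        point["completion_tokens"] += event.get("completion_tokens", 0)
        point["total_tokens"] = point["prompt_tokens"] + point["completion_tokens"]
    return [buckets[d] for d in sorted(buckets)]
```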

Query Parameters¶

Parameter	Type	Description
`from`	string	Inclusive lower bound, as a Unix-millis integer or an RFC 3339 timestamp. Defaults to 30 days ago.
`to`	string	Exclusive upper bound, same formats as `from`. Defaults to now.
`interval`	string	Bucket granularity. Only `day` is supported; any other value returns `400 Bad Request`. Defaults to `day`.

An inverted range (from >= to) also returns 400 Bad Request.
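Clients can catch the documented 400 cases before sending a request. The sketch below resolves from and to (Unix millis or RFC 3339, with the documented defaults of 30 days ago and now), rejects any interval other than day, and rejects an inverted range. BadRequest is a hypothetical stand-in for the server's 400 response; rejecting unparseable or offset-less timestamps is an assumption of the sketch, since the reference only lists the interval and inverted-range cases.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


class BadRequest(ValueError):
    """Stands in for a 400 Bad Request response."""


def parse_bound(raw: Optional[str], default: datetime) -> datetime:
    """Accept Unix milliseconds or an RFC 3339 timestamp; empty means default."""
    if raw is None or raw == "":
        return default
    if raw.lstrip("-").isdigit():
        return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
    try:
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError as exc:
        raise BadRequest(f"invalid timestamp: {raw!r}") from exc
    if parsed.tzinfo is None:
        raise BadRequest("RFC 3339 timestamps must carry an offset")
    return parsed


def validate_series_query(from_: Optional[str] = None, to: Optional[str] = None,
                          interval: Optional[str] = None,
                          now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    now = now or datetime.now(timezone.utc)
    if (interval or "day") != "day":
        raise BadRequest("only interval=day is supported")
    end = parse_bound(to, now)                            # exclusive upper bound
    start = parse_bound(from_, now - timedelta(days=30))  # inclusive lower bound
    if start >= end:
        raise BadRequest("from must be earlier than to")
    return start, end
```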

Response¶

{
  "api_key_id": "k_3f9a1c",
  "interval": "day",
  "series": [
    { "date": "2026-06-17", "total_requests": 12, "prompt_tokens": 3600, "completion_tokens": 1440, "total_tokens": 5040 },
    { "date": "2026-06-18", "total_requests": 8, "prompt_tokens": 2400, "completion_tokens": 960, "total_tokens": 3360 }
  ]
}

Example¶

curl -s "http://localhost:8080/admin/stats/api-keys/k_3f9a1c/series?from=2026-06-01T00:00:00Z&to=2026-06-30T00:00:00Z&interval=day" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Usage Time Series¶

GET /admin/stats/users/{user_id}/series?from=&to=&interval=day

Same shape and parameters as the per-API-key series, grouped by user_id.

Response¶

{
  "user_id": "user-acme",
  "interval": "day",
  "series": [
    { "date": "2026-06-17", "total_requests": 12, "prompt_tokens": 3600, "completion_tokens": 1440, "total_tokens": 5040 },
    { "date": "2026-06-18", "total_requests": 8, "prompt_tokens": 2400, "completion_tokens": 960, "total_tokens": 3360 }
  ]
}

Example¶

curl -s "http://localhost:8080/admin/stats/users/user-acme/series?from=2026-06-01T00:00:00Z&to=2026-06-30T00:00:00Z" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Reset Statistics¶

POST /admin/stats/reset

Resets all counters, per-model records, per-backend records, the per-API-key and per-user records (including their per-model breakdowns and daily time-series buckets), and the latency ring buffer. This action is irreversible.

Response¶

{
  "success": true,
  "action": "reset",
  "message": "Statistics counters have been reset"
}

Example¶

curl -X POST http://localhost:8080/admin/stats/reset \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Persistent Metrics Log API¶

The Persistent Metrics Log API exposes recent Prometheus registry history persisted to a local store (default: SQLite). See the Persistent Metrics Log guide for storage layout, retention math, and configuration.

Get Metrics History¶

GET /admin/metrics/history?metric=<name>&from=<ts>&to=<ts>&limit=<n>

Returns historical samples for the requested metric over the half-open time window [from, to).

Query Parameters¶

Parameter	Required	Default	Notes
`metric`	yes	—	Metric family name, e.g. `http_requests_total`.
`from`	no	now − 24h	Unix milliseconds (int) or RFC 3339 timestamp.
`to`	no	now	Unix milliseconds (int) or RFC 3339 timestamp.
`limit`	no	10,000	Cap on returned rows. Hard ceiling 100,000.
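A client-side helper for this endpoint can apply the documented defaults, keep the window half-open, and stay under the row ceiling. This sketch clamps limit to 100,000 on the client; whether the server clamps or rejects larger values is not stated here, so the clamping is an assumption. The function names are hypothetical.

```python
import time
import urllib.parse
from typing import Optional

DEFAULT_LIMIT = 10_000
MAX_LIMIT = 100_000
DAY_MS = 24 * 60 * 60 * 1000


def history_url(base_url: str, metric: str, from_ms: Optional[int] = None,
                to_ms: Optional[int] = None, limit: Optional[int] = None,
                now_ms: Optional[int] = None) -> str:
    if not metric:
        raise ValueError("metric is required")
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    to_ms = now_ms if to_ms is None else to_ms                # default: now
    from_ms = now_ms - DAY_MS if from_ms is None else from_ms  # default: now - 24h
    if to_ms <= from_ms:
        raise ValueError("time range must be positive (from < to)")
    limit = min(limit or DEFAULT_LIMIT, MAX_LIMIT)
    query = urllib.parse.urlencode({"metric": metric, "from": from_ms, "to": to_ms, "limit": limit})
    return f"{base_url}/admin/metrics/history?{query}"


def in_window(ts_ms: int, from_ms: int, to_ms: int) -> bool:
    """Half-open [from, to): a sample exactly at `to` belongs to the next window."""
    return from_ms <= ts_ms < to_ms
```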

Response¶

{
  "metric": "http_requests_total",
  "from_ms": 1715385600000,
  "to_ms": 1715472000000,
  "row_count": 2,
  "limit": 10000,
  "samples": [
    {
      "ts_ms": 1715385600000,
      "labels": {"backend": "openai", "endpoint": "/v1/chat/completions"},
      "value": 42.0,
      "kind": "counter"
    }
  ]
}

Histograms and summaries return multiple kind rows per family; see the Persistent Metrics Log guide.

Error Responses¶

400 Bad Request: metric is missing or oversized, or the time range is non-positive (to is not later than from).
404 Not Found: persistence is disabled (metrics.persistence.enabled: false).
500 Internal Server Error: storage error.
503 Service Unavailable: the metrics-persistence feature was not compiled in.

Example¶

curl -s 'http://localhost:8080/admin/metrics/history?metric=http_requests_total&limit=100' \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

Response Cache Admin APIs¶

The Response Cache Admin APIs expose statistics and invalidation operations for the response cache. All endpoints are mounted under /admin/response-cache and require the same authentication as the rest of the Admin API.

Response caching is configured in the response_cache section of your YAML config. See the Response Cache Configuration guide for full configuration details.

Get Response Cache Statistics¶

GET /admin/response-cache/stats

Returns current response cache statistics, including hit/miss counts, memory usage, and a configuration summary.

Response¶

{
  "enabled": true,
  "backend_type": "memory",
  "entries": 42,
  "capacity": 1000,
  "requests": {
    "hit": 120,
    "miss": 80,
    "skip": 15,
    "total": 215
  },
  "hit_rate": "0.6000",
  "evictions": 3,
  "size_bytes": 1048576,
  "config": {
    "backend": "memory",
    "ttl": "5m",
    "capacity": 1000,
    "max_response_size": 1048576,
    "max_stream_buffer_size": 10485760
  }
}

When using the Redis backend (backend: redis), the response includes an additional redis object:

{
  "enabled": true,
  "backend_type": "redis",
  "entries": 42,
  "capacity": 1000,
  "requests": { "hit": 120, "miss": 80, "skip": 15, "total": 215 },
  "hit_rate": "0.6000",
  "evictions": 3,
  "size_bytes": 1048576,
  "config": { "backend": "redis", "ttl": "5m", "capacity": 1000, "max_response_size": 1048576, "max_stream_buffer_size": 10485760 },
  "redis": {
    "connections": { "active": 3, "idle": 5 },
    "errors": { "connection": 0, "timeout": 0, "other": 0, "total": 0 },
    "fallback_active": false
  }
}

When response caching is disabled (response_cache.enabled: false or the section is absent), enabled is false, entries and capacity are 0, and config is null.

Response Fields¶

Field	Type	Description
`enabled`	boolean	Whether response caching is active
`backend_type`	string	Active cache backend: `"memory"` or `"redis"`
`entries`	integer	Current number of cached entries
`capacity`	integer	Maximum cache capacity (LRU limit)
`requests.hit`	integer	Requests served from cache
`requests.miss`	integer	Cache misses (backend was called, entry stored)
`requests.skip`	integer	Non-cacheable requests (e.g., temperature > 0)
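`requests.total`	integer	Total lookups (hit + miss + skip), including skipped non-cacheable requests
`hit_rate`	string	Rolling cache hit rate as a decimal string (e.g., `"0.6000"`); skipped requests are not counted, so the example works out to 120 / (120 + 80)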
`evictions`	integer	Total LRU evictions since startup
`size_bytes`	integer	Approximate memory usage of cached entries in bytes
`config`	object or null	Active configuration summary; `null` when disabled
`redis`	object or absent	Redis-specific stats (only present when `backend_type` is `"redis"`)
`redis.connections.active`	integer	Active connections in the Redis pool
`redis.connections.idle`	integer	Idle connections in the Redis pool
`redis.errors.connection`	integer	Redis connection errors since startup
`redis.errors.timeout`	integer	Redis command timeout errors since startup
`redis.errors.other`	integer	Other Redis errors since startup
`redis.errors.total`	integer	Total Redis errors since startup
`redis.fallback_active`	boolean	Whether the in-memory fallback is currently active
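A monitoring script can reproduce hit_rate from the raw counters and surface the Redis fallback flag. The formula hit / (hit + miss) is inferred from the example figures (120 / 200 = 0.6000) rather than stated by the reference, and returning "0.0000" when there are no lookups is an assumption; the helper names are hypothetical.

```python
def hit_rate(requests: dict) -> str:
    """hit / (hit + miss), four decimals; skipped requests are excluded."""
    lookups = requests["hit"] + requests["miss"]
    return f"{requests['hit'] / lookups:.4f}" if lookups else "0.0000"


def summarize_cache(stats: dict) -> dict:
    """Condense a /admin/response-cache/stats payload for alerting."""
    req = stats["requests"]
    return {
        "hit_rate": hit_rate(req),
        # Share of traffic that was never eligible for caching.
        "skip_share": req["skip"] / req["total"] if req["total"] else 0.0,
        # Only present for the Redis backend; absent means no fallback.
        "redis_fallback": bool(stats.get("redis", {}).get("fallback_active", False)),
    }
```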

Example¶

curl -s http://localhost:8080/admin/response-cache/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Invalidate Response Cache¶

POST /admin/response-cache/invalidate

Clears cache entries. Only full cache invalidation via clear_all: true is supported; targeted invalidation by model or tenant is not available.

Request Body¶

{
  "clear_all": true,
  "model": "gpt-4",
  "tenant_id": "tenant-abc"
}

Field	Type	Required	Description
`clear_all`	boolean	No	When `true`, clears the entire cache. Defaults to `false`.
`model`	string	No	Accepted but currently ignored; only `clear_all` is honored. Must not exceed 256 characters.
`tenant_id`	string	No	Accepted but currently ignored; only `clear_all` is honored. Must not exceed 256 characters.

Response (clear_all: true)¶

{
  "success": true,
  "action": "clear_all",
  "cleared_entries": 42
}

Response (clear_all: false or omitted)¶

{
  "success": true,
  "action": "noop",
  "message": "Targeted invalidation by model/tenant_id is not yet supported. Use clear_all: true to clear the entire cache."
}

Response (cache disabled)¶

{
  "success": false,
  "error": "Response cache is not enabled"
}

Example¶

# Clear entire cache
curl -X POST http://localhost:8080/admin/response-cache/invalidate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"clear_all": true}'

KV Cache Index Admin APIs¶

The KV Cache Index Admin APIs expose statistics, per-backend state, and a clear operation for the KV cache index subsystem. All endpoints are mounted under /admin/kv-index and require the same authentication as the rest of the Admin API.

The KV cache index tracks which backends hold cached KV data for specific token prefixes, enabling KV-aware routing. It is configured in the kv_cache_index section of your YAML config.

Get KV Cache Index Statistics¶

GET /admin/kv-index/stats

Returns overall KV cache index statistics, including index size, event source connection status, and routing decision counts.

Response¶

{
  "enabled": true,
  "config": {
    "backend": "memory",
    "max_entries": 100000,
    "entry_ttl_seconds": 600,
    "event_sources_count": 2,
    "scoring": {
      "overlap_weight": 0.6,
      "load_weight": 0.3,
      "health_weight": 0.1,
      "min_overlap_threshold": 0.3
    }
  },
  "index": {
    "prefix_count": 45,
    "entry_count": 120,
    "total_hits": 3842,
    "total_evictions": 12
  },
  "event_sources": [
    {
      "backend_name": "vllm-1",
      "connected": true,
      "events_received": 2100,
      "events_dropped": 0,
      "last_event_at": "2025-03-12T10:45:00Z",
      "reconnect_count": 0
    }
  ],
  "routing_decisions": {
    "kv_aware": 980,
    "fallback": 120,
    "total": 1100
  },
  "query_latency_count": 1100,
  "overlap_score_count": 980
}

When the KV cache index is disabled (kv_cache_index.enabled: false or the section is absent), enabled is false, config is null, and all counters are 0.

Response Fields¶

Field	Type	Description
`enabled`	boolean	Whether the KV cache index is active
`config`	object or null	Active configuration summary; `null` when disabled
`config.backend`	string	Index backend: `"memory"` or `"redis"`
`config.max_entries`	integer	Maximum tracked prefix hash entries
`config.entry_ttl_seconds`	integer	TTL for index entries in seconds
`config.event_sources_count`	integer	Number of configured event sources
`config.scoring`	object	Scoring weight configuration
`index.prefix_count`	integer	Number of distinct prefix hashes tracked
`index.entry_count`	integer	Total (prefix, backend) pairs tracked
`index.total_hits`	integer	Total cache hit recordings since startup
`index.total_evictions`	integer	Total cache eviction recordings since startup
`event_sources`	array	Status of each event source consumer
`event_sources[].backend_name`	string	Backend identifier for the event source
`event_sources[].connected`	boolean	Whether the consumer is currently connected
`event_sources[].events_received`	integer	Total events received from this source
`event_sources[].events_dropped`	integer	Events dropped due to backpressure
`event_sources[].last_event_at`	string or null	ISO 8601 timestamp of the most recent event
`event_sources[].reconnect_count`	integer	Number of reconnect attempts since startup
`routing_decisions.kv_aware`	integer	Requests routed using KV-aware selection
`routing_decisions.fallback`	integer	Requests that fell back to the default strategy
`routing_decisions.total`	integer	Total routing decisions made
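One way to watch whether KV-aware routing is effective is to track the share of decisions that did not fall back, alongside any event sources that are disconnected or dropping events. The sketch below reads only fields listed in the table; the report shape and function name are hypothetical.

```python
def kv_routing_report(stats: dict) -> dict:
    """Summarize a /admin/kv-index/stats payload."""
    if not stats.get("enabled"):
        return {"enabled": False}
    decisions = stats["routing_decisions"]
    total = decisions["total"]
    sources = stats.get("event_sources", [])
    return {
        "enabled": True,
        # Fraction of routing decisions that used KV-aware selection.
        "kv_aware_share": decisions["kv_aware"] / total if total else 0.0,
        "disconnected": [s["backend_name"] for s in sources if not s["connected"]],
        "dropping_events": [s["backend_name"] for s in sources if s["events_dropped"] > 0],
    }
```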

Example¶

curl -s http://localhost:8080/admin/kv-index/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-Backend KV Cache State¶

GET /admin/kv-index/backends

Returns per-backend KV cache event statistics, including events received, processed, dropped, connection status, and index event counts.

Response (enabled)¶

{
  "enabled": true,
  "backends": [
    {
      "backend_name": "vllm-1",
      "connection": {
        "connected": true,
        "reconnect_count": 0,
        "last_event_at": "2025-03-12T10:45:00Z"
      },
      "events": {
        "received": 2100,
        "dropped": 0,
        "index_created": 1950,
        "index_evicted": 150
      }
    },
    {
      "backend_name": "vllm-2",
      "connection": {
        "connected": false,
        "reconnect_count": 3,
        "last_event_at": null
      },
      "events": {
        "received": 0,
        "dropped": 0,
        "index_created": 0,
        "index_evicted": 0
      },
      "configured_endpoint": "ws://vllm-2:8000/v1/kv_events"
    }
  ]
}

Backends that appear in kv_cache_index.event_sources but have no active consumer yet are included with connected: false and a configured_endpoint field.

Response (disabled)¶

{
  "enabled": false,
  "backends": []
}

Response Fields¶

Field	Type	Description
`enabled`	boolean	Whether the KV cache index is active
`backends[].backend_name`	string	Backend identifier
`backends[].connection.connected`	boolean	Whether the event stream consumer is connected
`backends[].connection.reconnect_count`	integer	Reconnect attempts since startup
`backends[].connection.last_event_at`	string or null	ISO 8601 timestamp of the most recent event
`backends[].events.received`	integer	Total events received from this backend
`backends[].events.dropped`	integer	Events dropped due to backpressure
`backends[].events.index_created`	integer	Index entries created from events
`backends[].events.index_evicted`	integer	Index entries evicted from events
`backends[].configured_endpoint`	string	Configured endpoint URL (only present for inactive sources)

Example¶

curl -s http://localhost:8080/admin/kv-index/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Clear KV Cache Index¶

POST /admin/kv-index/clear

Clears all entries from the KV cache index. Intended for debugging and testing. In production the index rebuilds automatically from incoming KV events.

Response (success)¶

{
  "success": true,
  "entries_before_clear": 120,
  "cleared_entries": 45
}

entries_before_clear is the total (prefix, backend) pair count before clearing. cleared_entries is the number of prefix hash buckets removed. For the Redis backend, cleared_entries counts the number of Redis keys deleted; because each key has a TTL, any remaining keys expire automatically.

Response (disabled)¶

{
  "success": false,
  "error": "KV cache index is not enabled"
}

Example¶

curl -X POST http://localhost:8080/admin/kv-index/clear \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Smart Routing Admin APIs¶

The Smart Routing Admin APIs expose the model tier registry, letting you inspect which tier and domain profile the router assigns to each model, and update profiles at runtime without a restart. All endpoints are mounted under /admin/smart-routing and require the same authentication as the rest of the Admin API.

Smart routing is enabled by setting smart_routing.enabled: true in your YAML config. When disabled, the list endpoint still responds but reports "enabled": false and returns an empty profile list.

List Model Profiles¶

GET /admin/smart-routing/model-profiles

Returns all explicitly configured profiles plus any auto-inferred profiles that have been cached since startup.

Response¶

{
  "enabled": true,
  "default_tier": 2,
  "total": 3,
  "profiles": [
    {
      "model_id": "gpt-4o",
      "tier": 1,
      "tier_name": "flagship",
      "domains": ["general", "code", "reasoning"],
      "cost_per_1k_input_tokens": 0.005,
      "cost_per_1k_output_tokens": 0.015,
      "source": "explicit_exact"
    },
    {
      "model_id": "llama-3-8b-q4_K_M",
      "tier": 3,
      "tier_name": "lightweight",
      "domains": ["general"],
      "cost_per_1k_input_tokens": null,
      "cost_per_1k_output_tokens": null,
      "source": "explicit_pattern"
    }
  ]
}

When smart routing is disabled, enabled is false, profiles is [], and total is 0.

Response Fields¶

Field	Type	Description
`enabled`	boolean	Whether smart routing is active
`default_tier`	integer	Tier assigned when no profile matches (1, 2, or 3)
`total`	integer	Number of profiles returned
`profiles[].model_id`	string	The model identifier
`profiles[].tier`	integer	Numeric tier: 1 = Flagship, 2 = Standard, 3 = Lightweight
`profiles[].tier_name`	string	Human-readable tier name
`profiles[].domains`	array of strings	Domain specialization tags
`profiles[].cost_per_1k_input_tokens`	number or null	Input token cost per 1,000 tokens
`profiles[].cost_per_1k_output_tokens`	number or null	Output token cost per 1,000 tokens
`profiles[].source`	string	How the profile was resolved (see below)

source values:

Value	Meaning
`explicit_exact`	Profile was configured by exact model name
`explicit_pattern`	Profile was matched by a glob pattern
`auto_inferred`	Profile was inferred from pricing, capabilities, or name heuristics
`default`	No match found; default tier was used
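The order of the `source` values appears to follow the lookup order: exact names first, then patterns, then auto-inference, then the default tier. The sketch below approximates that order, which is useful for checking a profile list offline (for example, in CI before a PUT). This is a model built on assumptions, not the router's implementation:

- Exact matches are assumed to beat patterns.
- The first matching pattern wins.
- Python's `fnmatch.fnmatchcase` stands in for the router's glob matcher.
- Auto-inference is skipped, so unmatched models fall straight through to `default`.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional


def resolve_profile(
    model_id: str,
    profiles: List[Dict[str, Any]],
    default_tier: int = 2,
) -> Dict[str, Any]:
    """Approximate profile resolution for offline checks (assumed precedence)."""
    # 1. Exact model name -> "explicit_exact" (assumed to take priority)
    for p in profiles:
        if p.get("model") == model_id:
            return {"model_id": model_id, "tier": p["tier"], "source": "explicit_exact"}
    # 2. First matching glob -> "explicit_pattern" (fnmatch is a stand-in matcher)
    for p in profiles:
        pattern: Optional[str] = p.get("model_pattern")
        if pattern and fnmatchcase(model_id, pattern):
            return {"model_id": model_id, "tier": p["tier"], "source": "explicit_pattern"}
    # 3. No match -> "default"; the real router may auto-infer a profile instead
    return {"model_id": model_id, "tier": default_tier, "source": "default"}
```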

Example¶

curl http://localhost:8080/admin/smart-routing/model-profiles \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Get Model Profile¶

GET /admin/smart-routing/model-profiles/{model}

Returns the resolved profile for a specific model. When no explicit profile matches, the router auto-infers one: if the model has metadata in model-metadata.yaml, inference uses the pricing and capability information from that file; otherwise, name heuristics apply.

Path Parameters¶

Parameter	Description
`model`	Model identifier (max 256 characters)

Response¶

{
  "model_id": "gemini-1.5-flash",
  "tier": 3,
  "tier_name": "lightweight",
  "domains": ["general"],
  "cost_per_1k_input_tokens": null,
  "cost_per_1k_output_tokens": null,
  "source": "auto_inferred"
}

Example¶

curl http://localhost:8080/admin/smart-routing/model-profiles/gpt-4o \
  -H "Authorization: Bearer $ADMIN_TOKEN"
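Model IDs often contain characters that are not safe in a URL path segment, such as the `/` in Hugging Face style names. The minimal sketch below builds this URL, enforces the documented 256-character limit, and percent-encodes the ID. The helper name is hypothetical. Whether the router accepts an encoded `%2F` inside `{model}` is an assumption you should check against your deployment.

```python
from urllib.parse import quote

MAX_MODEL_ID_LEN = 256  # documented limit for the {model} path parameter


def model_profile_url(base_url: str, model: str) -> str:
    """Build the GET URL for one model profile (hypothetical helper).

    quote(..., safe="") encodes every reserved character, including "/",
    so the model ID stays a single path segment.
    """
    if not model:
        raise ValueError("model must be non-empty")
    if len(model) > MAX_MODEL_ID_LEN:
        raise ValueError(f"model id exceeds {MAX_MODEL_ID_LEN} characters")
    return f"{base_url.rstrip('/')}/admin/smart-routing/model-profiles/{quote(model, safe='')}"
```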

Update Model Profiles¶

PUT /admin/smart-routing/model-profiles

Replaces the entire model profile list. The registry reloads immediately, and the inferred-profile cache is cleared so that subsequent requests are re-evaluated against the new profiles. If the router's internal config update channel (config_sender) is available, the change is also propagated to the in-memory configuration.

Request Body¶

{
  "default_tier": 2,
  "model_profiles": [
    {
      "model": "gpt-4o",
      "tier": 1,
      "domains": ["general", "code", "reasoning"],
      "cost_per_1k_input_tokens": 0.005,
      "cost_per_1k_output_tokens": 0.015
    },
    {
      "model_pattern": "*-q4_K_M",
      "tier": 3,
      "domains": ["general"]
    }
  ]
}

Each entry must include either model (exact name) or model_pattern (glob). Entries with neither are rejected with 400 Bad Request. default_tier is optional; if omitted, the current default is preserved.

Request Fields¶

Field	Type	Required	Description
`model_profiles`	array	Yes	Profile list; replaces existing configuration
`model_profiles[].model`	string	Conditional	Exact model name (max 200 chars)
`model_profiles[].model_pattern`	string	Conditional	Glob pattern such as `*-q4_K_M` (max 200 chars)
`model_profiles[].tier`	integer	Yes	1 (Flagship), 2 (Standard), or 3 (Lightweight)
`model_profiles[].domains`	array of strings	No	Domain tags: `general`, `code`, `reasoning`, `creative`, `multilingual`, `vision`
`model_profiles[].cost_per_1k_input_tokens`	number	No	Input cost per 1,000 tokens
`model_profiles[].cost_per_1k_output_tokens`	number	No	Output cost per 1,000 tokens
`default_tier`	integer	No	Fallback tier (1, 2, or 3) when no profile matches; if omitted, the current default is preserved

Response¶

{
  "status": "updated",
  "profiles_count": 2,
  "default_tier": 2
}

Example¶

curl -X PUT http://localhost:8080/admin/smart-routing/model-profiles \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_profiles": [
      {"model": "gpt-4o", "tier": 1, "domains": ["general", "code"]},
      {"model_pattern": "*-mini", "tier": 3, "domains": ["general"]}
    ]
  }'
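A PUT replaces the whole profile list, so it is better to catch a bad entry before sending it. The client-side check below mirrors the constraints in the Request Fields table:

- Each entry has `model` or `model_pattern`.
- Names are at most 200 characters.
- Tiers are 1 to 3.
- Domain tags come from the listed set.

The function name is hypothetical, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List

ALLOWED_DOMAINS = {"general", "code", "reasoning", "creative", "multilingual", "vision"}
MAX_NAME_LEN = 200  # documented limit for model and model_pattern


def validate_profiles_body(body: Dict[str, Any]) -> List[str]:
    """Return problems found in a PUT /admin/smart-routing/model-profiles body."""
    profiles = body.get("model_profiles")
    if not isinstance(profiles, list):
        return ["model_profiles must be an array"]
    problems: List[str] = []
    default_tier = body.get("default_tier")
    if default_tier is not None and default_tier not in (1, 2, 3):
        problems.append("default_tier must be 1, 2, or 3")
    for i, p in enumerate(profiles):
        # Entries with neither field are rejected by the server with 400.
        if not p.get("model") and not p.get("model_pattern"):
            problems.append(f"[{i}] needs model or model_pattern")
        for key in ("model", "model_pattern"):
            value = p.get(key)
            if value is not None and len(value) > MAX_NAME_LEN:
                problems.append(f"[{i}] {key} exceeds {MAX_NAME_LEN} chars")
        if p.get("tier") not in (1, 2, 3):
            problems.append(f"[{i}] tier must be 1, 2, or 3")
        unknown = set(p.get("domains") or []) - ALLOWED_DOMAINS
        if unknown:
            problems.append(f"[{i}] unknown domains: {sorted(unknown)}")
    return problems
```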

Smart Routing Status¶

GET /admin/smart-routing/status

Returns overall smart routing status including enabled state, load state, classifier method, and policy count.

Response¶

{
  "enabled": true,
  "virtual_model": "auto",
  "intercept_all": false,
  "default_tier": 2,
  "classifier_method": "rule",
  "has_llm_classifier": false,
  "load_state": "normal",
  "load_monitoring_enabled": false,
  "debug_headers": false,
  "policy_count": 5,
  "profile_count": 3
}

Smart Routing Stats¶

GET /admin/smart-routing/stats

Returns aggregated routing statistics including profile count, policy count, and LLM classifier cache info.

Classify (Diagnostic)¶

POST /admin/smart-routing/classify

Classify a request without routing it. Useful for debugging classification behavior.

Request¶

{
  "payload": {
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }
}

Response¶

{
  "complexity": "trivial",
  "domain": "general",
  "confidence": 0.95,
  "classifier_type": "rule_based",
  "required_capabilities": [],
  "reasoning": null,
  "signals": [
    {"name": "message_length", "strength": 0.1, "influences": "complexity"}
  ]
}

Simulate (Diagnostic)¶

POST /admin/smart-routing/simulate

Simulate the full routing pipeline (classification + policy evaluation + model selection + load state) without actually forwarding the request. Returns the complete routing decision chain.

Request¶

Same request body as the classify endpoint.

Response¶

{
  "routed": true,
  "target_model": "gpt-4o-mini",
  "classification": {
    "complexity": "simple",
    "domain": "general",
    "confidence": 0.92,
    "classifier_type": "rule_based"
  },
  "policy": {
    "name": "trivial_to_lightweight",
    "tier": 3,
    "prefer_domains": [],
    "require_capabilities": []
  },
  "load_state": "normal",
  "classification_duration_ms": 0.05,
  "available_models": 5
}
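For dashboards or CLI output, you can condense the simulate response into a single line. The sketch below uses only the fields shown in the example response above. The function name is hypothetical. When `routed` is `false`, it simply reports that fact and makes no assumption about the reason.

```python
from typing import Any, Dict


def explain_simulation(sim: Dict[str, Any]) -> str:
    """Summarize a POST /admin/smart-routing/simulate response in one line."""
    cls = sim.get("classification") or {}
    summary = (
        f"{cls.get('complexity', '?')}/{cls.get('domain', '?')} "
        f"(confidence {cls.get('confidence', 0):.2f})"
    )
    if not sim.get("routed"):
        # No target model was selected; report the classification only.
        return f"not routed; classified as {summary}"
    policy = sim.get("policy") or {}
    return (
        f"{summary} -> policy {policy.get('name', '?')} "
        f"(tier {policy.get('tier', '?')}) -> {sim.get('target_model')} "
        f"[load {sim.get('load_state', '?')}]"
    )
```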

List Routing Policies¶

GET /admin/smart-routing/policies

Returns the currently active routing policies with their conditions and targets.

Update Routing Policies¶

PUT /admin/smart-routing/policies

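Hot-reloads the routing policies at runtime, without a restart. The request body carries the new routing_policies list along with the virtual_model and intercept_all settings.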

Request¶

{
  "routing_policies": [
    {
      "name": "all_to_flagship",
      "when": {},
      "route_to": {"tier": 1}
    }
  ],
  "virtual_model": "auto",
  "intercept_all": false
}

Load State¶

GET /admin/smart-routing/load-state

Returns the current load state with assessment details.

Response¶

{
  "enabled": true,
  "state": "normal",
  "max_tier": null,
  "prefer_quantized": false,
  "reject_expert": false
}

Cache Stats¶

GET /admin/smart-routing/cache/stats

Returns LLM classifier cache statistics.

Response¶

{
  "available": true,
  "entries": 42,
  "capacity": 10000,
  "ttl_seconds": 300
}

Clear Cache¶

POST /admin/smart-routing/cache/clear

Clear all entries from the LLM classifier cache.

Response¶

{
  "status": "cleared",
  "entries_removed": 42
}

Guardrail Admin APIs¶

The Guardrail Admin APIs let you inspect and adjust the content-safety guardrail policy at runtime without a restart. Changes propagate through the same hot-reload config channel that the running GuardrailService subscribes to, so a mode switch, an enabled toggle, a threshold change, or a route override takes effect on the live request path immediately. All endpoints are mounted under /admin/guardrails and require the same authentication and audit logging as the rest of the Admin API.

The guardrail provider set itself is defined in the configuration file; these endpoints toggle and tune the existing providers and the global/per-route policy. They do not create or remove providers.

Get Guardrail Policy¶

GET /admin/guardrails

Returns the effective guardrail policy and a status summary. Secrets (the bypass_api_keys list) are masked. service_active is false when guardrails were disabled at startup and no service is running; in that case the returned policy is the configured policy but no checks execute.

Response¶

{
  "enabled": true,
  "mode": "enforce",
  "service_active": true,
  "registered_providers": ["openai-moderation", "llama-guard"],
  "policy": {
    "enabled": true,
    "mode": "enforce",
    "timeout_ms": 2000,
    "on_error": "fail_open",
    "block_behavior": "content_filter",
    "streaming_mode": "buffer_full",
    "providers": [ ... ],
    "routes": { ... },
    "bypass_api_keys": ["su...(24 chars)"],
    "allow": { "exact": [], "regex": [] },
    "deny": { "exact": [], "regex": [] }
  }
}

Example¶

curl http://localhost:8080/admin/guardrails \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Update Guardrail Policy¶

PATCH /admin/guardrails

Partially updates the global guardrail policy. Every field is optional; only the provided fields change. Providers and per-route overrides are managed through their own endpoints below. The candidate policy is validated before it is applied; an invalid change (e.g. timeout_ms: 0, or enabling enforce mode with no providers) returns 400 and leaves the running policy unchanged.

Request Body¶

Field	Type	Description
`enabled`	boolean	Toggle guardrails on/off globally
`mode`	string	`monitor` or `enforce`
`timeout_ms`	integer	Global guardrail timeout in milliseconds
`on_error`	string	`fail_open` or `fail_closed`
`block_behavior`	string	`content_filter`, `error`, or `refusal_message`
`streaming_mode`	string	`buffer_full`, `chunked`, or `passthrough`
`streaming_chunk_size`	integer	`chunked`: characters of new text to accumulate before each incremental check (default `200`)
`streaming_context_size`	integer	`chunked`: trailing characters carried into each check for cross-boundary context (default `50`)
`streaming_stream_first`	boolean	`chunked`: emit each window before checking it (`true`) or check before emitting (`false`, default)
`allow`	object	Replace the global allow list (`{ "exact": [], "regex": [] }`)
`deny`	object	Replace the global deny list
`bypass_api_keys`	array	Replace the bypass API key list
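The three `streaming_*` fields above matter only in `chunked` mode. The sketch below is our own rough mental model, not the router's code, and the exact boundary handling is an assumption:

- Each incremental check sees the last `streaming_context_size` characters of already-checked text plus the next `streaming_chunk_size` characters of new text.
- Any leftover text is checked once when the stream ends.
- `streaming_stream_first` would only change whether a window is emitted before or after its check, not the contents of the window.

```python
from typing import List


class ChunkedWindower:
    """Illustrative model of chunked-mode guardrail windows (not router code).

    Assumptions: a check fires each time chunk_size new characters have
    accumulated, each window is prefixed with the last context_size characters
    already checked, and leftover text is checked when the stream ends.
    """

    def __init__(self, chunk_size: int = 200, context_size: int = 50) -> None:
        if chunk_size <= 0 or context_size < 0:
            raise ValueError("chunk_size must be > 0 and context_size >= 0")
        self.chunk_size = chunk_size
        self.context_size = context_size
        self._pending = ""  # new text not yet covered by a check
        self._tail = ""     # trailing context carried into the next window

    def feed(self, delta: str) -> List[str]:
        """Add streamed text; return the windows that are now ready to check."""
        self._pending += delta
        windows: List[str] = []
        while len(self._pending) >= self.chunk_size:
            new = self._pending[: self.chunk_size]
            self._pending = self._pending[self.chunk_size :]
            windows.append(self._window(new))
        return windows

    def finish(self) -> List[str]:
        """End of stream: check whatever text is left, if any."""
        if not self._pending:
            return []
        new, self._pending = self._pending, ""
        return [self._window(new)]

    def _window(self, new: str) -> str:
        window = self._tail + new
        # Keep only the last context_size characters (a -0 slice would keep all).
        self._tail = window[-self.context_size :] if self.context_size else ""
        return window
```

With the defaults, a 450-character response produces three checks: characters 0 to 199, then 150 to 399, then 350 to 449 at end of stream.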

Response¶

{
  "status": "updated",
  "enabled": true,
  "mode": "enforce"
}

Example¶

curl -X PATCH http://localhost:8080/admin/guardrails \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode": "enforce"}'
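An invalid PATCH returns 400 without changing anything, so a client-side pre-check is optional. It does give faster feedback in a UI, though. The minimal sketch below mirrors only the enum values in the request table and the documented `timeout_ms: 0` rule. The rule that streaming sizes must be positive (or non-negative) is our assumption. The server still performs full validation, for example rejecting enforce mode when no providers are configured.

```python
from typing import Any, Dict, List

# Allowed values copied from the PATCH /admin/guardrails request table.
ENUM_FIELDS = {
    "mode": {"monitor", "enforce"},
    "on_error": {"fail_open", "fail_closed"},
    "block_behavior": {"content_filter", "error", "refusal_message"},
    "streaming_mode": {"buffer_full", "chunked", "passthrough"},
}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_guardrail_patch(patch: Dict[str, Any]) -> List[str]:
    """Catch obvious mistakes before sending PATCH /admin/guardrails."""
    problems: List[str] = []
    for field, allowed in ENUM_FIELDS.items():
        if field in patch:
            value = patch[field]
            if not isinstance(value, str) or value not in allowed:
                problems.append(f"{field} must be one of {sorted(allowed)}")
    # timeout_ms: 0 is documented as invalid; requiring > 0 for chunk size is assumed.
    for field in ("timeout_ms", "streaming_chunk_size"):
        if field in patch and not (_is_int(patch[field]) and patch[field] > 0):
            problems.append(f"{field} must be a positive integer")
    # Assumed: a context size of 0 (no carried-over text) is allowed.
    ctx = patch.get("streaming_context_size", 0)
    if not (_is_int(ctx) and ctx >= 0):
        problems.append("streaming_context_size must be a non-negative integer")
    for field in ("enabled", "streaming_stream_first"):
        if field in patch and not isinstance(patch[field], bool):
            problems.append(f"{field} must be a boolean")
    return problems
```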

Update Guardrail Provider¶

PUT /admin/guardrails/providers/{name}

Updates the runtime settings of a single configured provider. All fields are optional. Returns 404 if no provider with the given name is configured.

Request Body¶

Field	Type	Description
`enabled`	boolean	Enable or disable this provider
`category_thresholds`	object	Replace the per-category score thresholds (`{ "violence": 0.8 }`)
`timeout_ms`	integer or null	Per-provider timeout override in milliseconds; `null` clears the override
`on_error`	string or null	Per-provider error policy override (`fail_open` or `fail_closed`); `null` clears the override

Response¶

{
  "status": "updated",
  "provider": "llama-guard",
  "enabled": false
}

Example¶

curl -X PUT http://localhost:8080/admin/guardrails/providers/llama-guard \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
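For `timeout_ms` and `on_error`, sending `null` and omitting the field mean different things. `null` clears the override, while omitting the field leaves the current value alone. Python's `None` serializes to `null`, so a body builder needs a separate sentinel to represent "not provided". The minimal sketch below uses a hypothetical helper name.

```python
import json
from typing import Any, Dict

_UNSET = object()  # sentinel meaning "leave this field out of the request"


def provider_update_body(
    enabled: Any = _UNSET,
    category_thresholds: Any = _UNSET,
    timeout_ms: Any = _UNSET,
    on_error: Any = _UNSET,
) -> Dict[str, Any]:
    """Build a PUT /admin/guardrails/providers/{name} body (hypothetical helper).

    Omitted arguments are left out, so the server keeps their current values;
    passing None for timeout_ms or on_error sends JSON null, clearing the override.
    """
    fields = {
        "enabled": enabled,
        "category_thresholds": category_thresholds,
        "timeout_ms": timeout_ms,
        "on_error": on_error,
    }
    return {k: v for k, v in fields.items() if v is not _UNSET}


# Disable the provider and clear its timeout override in a single request.
payload = json.dumps(provider_update_body(enabled=False, timeout_ms=None))
```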

Set Guardrail Route Override¶

PUT /admin/guardrails/routes/{route}

Creates or replaces the per-route guardrail override for the given route. The request body is a route override object; any omitted field inherits the global policy.

Request Body¶

Field	Type	Description
`mode`	string	Override the operating mode for this route
`enabled`	boolean	Override whether guardrails run for this route
`providers`	array	Restrict this route to a subset of provider names
`category_thresholds`	object	Per-route category thresholds
`allow`	object	Route-specific allow list
`deny`	object	Route-specific deny list

Response¶

{
  "status": "updated",
  "route": "gpt-4o"
}

Example¶

curl -X PUT http://localhost:8080/admin/guardrails/routes/gpt-4o \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode": "monitor"}'

Delete Guardrail Route Override¶

DELETE /admin/guardrails/routes/{route}

Removes the per-route override so that the route falls back to the global policy. Returns 404 if no override is configured for the route.

Response¶

{
  "status": "deleted",
  "route": "gpt-4o"
}

Test Guardrails (Dry Run)¶

POST /admin/guardrails/test

Diagnostic endpoint for threshold tuning. Runs every registered provider against the supplied sample text and returns each provider's verdict plus the aggregated most-severe-wins verdict. The dry run ignores the global mode and the bypass list so the raw provider output is visible; disabled providers (and those that do not apply to the requested stage) are reported as skipped. Returns 400 when no guardrail service is active.

Request Body¶

Field	Type	Description
`text`	string	The sample text to evaluate (required)
`stage`	string	`input` (default) or `output`
`model`	string	Optional model identifier for the evaluation context
`route`	string	Optional route name for the evaluation context

Response¶

{
  "stage": "input",
  "providers": [
    {
      "provider": "openai-moderation",
      "skipped": false,
      "verdict": { "verdict": "allow" }
    },
    {
      "provider": "llama-guard",
      "skipped": false,
      "verdict": {
        "verdict": "block",
        "category": "violence",
        "score": 0.97,
        "reason": "..."
      }
    }
  ],
  "aggregated": {
    "verdict": "block",
    "category": "violence",
    "score": 0.97,
    "reason": "..."
  }
}

Example¶

curl -X POST http://localhost:8080/admin/guardrails/test \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "sample prompt to evaluate", "stage": "input"}'
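When tuning thresholds, the per-provider entries are usually more useful than the aggregated verdict. The small sketch below extracts every non-skipped provider that returned `block`, along with its category and score, from a response shaped like the one above. It only looks for the `block` value shown in the example. If your providers return other verdict values, extend the check accordingly. The function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def blocking_providers(result: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """List (provider, category, score) for each non-skipped provider that blocked.

    Input is a POST /admin/guardrails/test response body; compare each score
    with the category_thresholds you have configured for that provider.
    """
    hits: List[Tuple[str, str, float]] = []
    for entry in result.get("providers", []):
        verdict = entry.get("verdict") or {}  # skipped entries may omit a verdict
        if entry.get("skipped") or verdict.get("verdict") != "block":
            continue
        hits.append((entry["provider"], verdict.get("category", ""), float(verdict.get("score", 0.0))))
    return hits
```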

Data Models¶

Configuration Sections¶

Section	Description	Hot Reload
`server`	Bind address, workers, connection pool	Requires restart
`backends`	Backend URLs, weights, models	Gradual
`health_checks`	Intervals, thresholds	Gradual
`logging`	Log level, format, output	Immediate
`retry`	Max attempts, delays, backoff	Immediate
`timeouts`	Connect, request, idle timeouts	Gradual
`rate_limiting`	Limits, storage, whitelist	Immediate
`circuit_breaker`	Thresholds, recovery time	Immediate
`global_prompts`	System prompt injection	Immediate
`fallback`	Fallback chains, policies	Gradual
`files`	Files API settings	Gradual
`api_keys`	API key configuration	Immediate
`metrics`	Prometheus, labels	Gradual
`admin`	Admin API settings	Gradual
`admin.stats`	Stats collection settings	Immediate
`routing`	Model routing rules	Gradual
`smart_routing`	Model tier registry and profiles	Immediate

Backend Object¶

{
  "name": "string",
  "url": "string (http:// or https://)",
  "api_key": "string (optional, masked in responses)",
  "weight": "integer (1-100)",
  "models": ["string"],
  "enabled": "boolean",
  "health_check": {
    "enabled": "boolean",
    "path": "string",
    "interval": "string (duration)"
  }
}
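The Backend Object lists two checkable constraints: an `http://` or `https://` URL, and a weight from 1 to 100. The minimal pre-flight check below validates a backend dict before calling POST /admin/backends. Treating `name` as required and defaulting a missing weight to 1 (the default used in the SDK examples) are assumptions. The function name is hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def check_backend(backend: Dict[str, Any]) -> List[str]:
    """Check a backend dict against the Backend Object constraints (hypothetical helper)."""
    problems: List[str] = []
    if not backend.get("name"):  # assumed required
        problems.append("name is required")
    parsed = urlparse(backend.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http:// or https:// URL")
    weight = backend.get("weight", 1)  # assumed default, matching the SDK examples
    if isinstance(weight, bool) or not isinstance(weight, int) or not 1 <= weight <= 100:
        problems.append("weight must be an integer between 1 and 100")
    models = backend.get("models", [])
    if not isinstance(models, list) or not all(isinstance(m, str) for m in models):
        problems.append("models must be a list of strings")
    return problems
```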

History Entry Object¶

{
  "version": "integer",
  "timestamp": "string (ISO 8601)",
  "sections_changed": ["string"],
  "source": "string (api|file_reload|initial|rollback)",
  "user": "string",
  "description": "string (optional)",
  "rollback_available": "boolean"
}

Validation Result Object¶

{
  "valid": "boolean",
  "errors": [
    {
      "field": "string",
      "message": "string",
      "code": "string"
    }
  ],
  "warnings": [
    {
      "field": "string",
      "message": "string"
    }
  ]
}

Hot Reload Behavior¶

Update Types¶

Type	Behavior	Sections
Immediate	Applied instantly, no disruption	logging, rate_limiting, circuit_breaker, retry, global_prompts, api_keys, admin.stats, smart_routing
Gradual	Existing connections maintained, new connections use new config	backends, health_checks, timeouts, fallback, files, metrics, admin, routing
Requires Restart	Logged as a warning; takes effect only after a server restart	server.bind_address, server.workers

Example Workflow¶

# 1. Check current configuration
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# 2. Validate change
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"section": "logging", "config": {"level": "debug"}}'

# 3. Apply change (immediate effect)
curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"level": "debug"}}'

# 4. Verify change
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.config.level'

Error Handling¶

Error Response Format¶

{
  "error_code": "string",
  "message": "string",
  "details": {}
}

Error Codes¶

Code	HTTP Status	Description
`VALIDATION_ERROR`	400	Configuration validation failed
`INVALID_SECTION`	400	Unknown configuration section
`PARSE_ERROR`	400	Failed to parse configuration content
`SECTION_NOT_FOUND`	404	Section not found
`VERSION_NOT_FOUND`	404	History version not found
`BACKEND_NOT_FOUND`	404	Backend not found
`BACKEND_EXISTS`	409	Backend with name already exists
`CONTENT_TOO_LARGE`	413	Configuration content exceeds 1MB limit
`INTERNAL_ERROR`	500	Internal server error

Error Examples¶

// Validation Error
{
  "error_code": "VALIDATION_ERROR",
  "message": "Configuration validation failed",
  "details": {
    "errors": [
      {"field": "workers", "message": "workers must be greater than 0"}
    ]
  }
}

// Section Not Found
{
  "error_code": "SECTION_NOT_FOUND",
  "message": "Configuration section 'invalid' not found",
  "details": {
    "available_sections": ["server", "backends", "logging", "..."]
  }
}

// Backend Exists
{
  "error_code": "BACKEND_EXISTS",
  "message": "Backend 'openai' already exists",
  "details": {
    "existing_backend": "openai"
  }
}
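The SDK examples surface failures through `raise_for_status()` or a generic `Error`, which discards `error_code` and `details`. Below is a sketch of a structured exception built from the error format above. The names `AdminApiError` and `to_admin_error` are hypothetical. The fallback for the `{"success": false, "error": ...}` shape used by the KV cache index endpoints, including its placeholder `UNKNOWN` code, is our own convention.

```python
from typing import Any, Dict, Optional


class AdminApiError(Exception):
    """Structured error for failed Admin API calls (hypothetical)."""

    def __init__(self, status: int, error_code: str, message: str,
                 details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(f"{status} {error_code}: {message}")
        self.status = status
        self.error_code = error_code
        self.message = message
        self.details = details or {}


def to_admin_error(status: int, body: Any) -> AdminApiError:
    """Map a decoded error body to AdminApiError."""
    if isinstance(body, dict) and "error_code" in body:
        # Documented shape: {"error_code", "message", "details"}
        return AdminApiError(status, body["error_code"], body.get("message", ""), body.get("details"))
    if isinstance(body, dict) and "error" in body:
        # Alternate shape, e.g. {"success": false, "error": "..."}; code is our placeholder
        return AdminApiError(status, "UNKNOWN", str(body["error"]))
    return AdminApiError(status, "UNKNOWN", str(body))
```

In the Python client, you could call `to_admin_error(resp.status_code, resp.json())` in place of `raise_for_status()` whenever `resp.ok` is false, then raise the result.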

Client SDK Examples¶

Python¶

import requests
from typing import Optional, Dict, Any, List
from dataclasses import dataclass


@dataclass
class ContinuumAdminClient:
    """Continuum Router Admin API Client"""

    base_url: str
    token: str

    def __post_init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json"
        })

    # Configuration Query APIs

    def get_full_config(self) -> Dict[str, Any]:
        """Get full configuration with masked sensitive data"""
        resp = self.session.get(f"{self.base_url}/admin/config/full")
        resp.raise_for_status()
        return resp.json()

    def get_sections(self) -> List[Dict[str, Any]]:
        """Get all configuration sections"""
        resp = self.session.get(f"{self.base_url}/admin/config/sections")
        resp.raise_for_status()
        return resp.json()["sections"]

    def get_section(self, section: str) -> Dict[str, Any]:
        """Get configuration for a specific section"""
        resp = self.session.get(f"{self.base_url}/admin/config/{section}")
        resp.raise_for_status()
        return resp.json()

    def get_schema(self, section: Optional[str] = None) -> Dict[str, Any]:
        """Get JSON schema for validation"""
        params = {"section": section} if section else {}
        resp = self.session.get(
            f"{self.base_url}/admin/config/schema",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Modification APIs

    def update_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Replace section configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def patch_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Partial update section configuration"""
        resp = self.session.patch(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def validate_config(
        self,
        section: str,
        config: Dict[str, Any],
        dry_run: bool = True
    ) -> Dict[str, Any]:
        """Validate configuration without applying"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/validate",
            json={"section": section, "config": config, "dry_run": dry_run}
        )
        resp.raise_for_status()
        return resp.json()

    def apply_config(
        self,
        sections: Optional[List[str]] = None,
        force: bool = False
    ) -> Dict[str, Any]:
        """Apply pending configuration changes"""
        body = {"force": force}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/apply",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Save/Restore APIs

    def export_config(
        self,
        format: str = "yaml",
        sections: Optional[List[str]] = None,
        include_sensitive: bool = False
    ) -> str:
        """Export configuration in specified format"""
        body = {"format": format, "include_sensitive": include_sensitive}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/export",
            json=body
        )
        resp.raise_for_status()
        return resp.json()["content"]

    def import_config(
        self,
        content: str,
        format: str = "yaml",
        apply: bool = True,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Import configuration from content"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/import",
            json={
                "format": format,
                "content": content,
                "apply": apply,
                "dry_run": dry_run
            }
        )
        resp.raise_for_status()
        return resp.json()

    def get_history(
        self,
        limit: int = 20,
        offset: int = 0,
        section: Optional[str] = None
    ) -> Dict[str, Any]:
        """Get configuration change history"""
        params = {"limit": limit, "offset": offset}
        if section:
            params["section"] = section
        resp = self.session.get(
            f"{self.base_url}/admin/config/history",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def rollback(
        self,
        version: int,
        sections: Optional[List[str]] = None,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Rollback to a previous version"""
        body = {"dry_run": dry_run}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/rollback/{version}",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Backend Management APIs

    def list_backends(self) -> List[Dict[str, Any]]:
        """List all backends"""
        resp = self.session.get(f"{self.base_url}/admin/backends")
        resp.raise_for_status()
        return resp.json()["backends"]

    def get_backend(self, name: str) -> Dict[str, Any]:
        """Get backend configuration"""
        resp = self.session.get(f"{self.base_url}/admin/backends/{name}")
        resp.raise_for_status()
        return resp.json()

    def add_backend(
        self,
        name: str,
        url: str,
        weight: int = 1,
        models: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """Add a new backend"""
        body = {"name": name, "url": url, "weight": weight}
        if models:
            body["models"] = models
        resp = self.session.post(
            f"{self.base_url}/admin/backends",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend(self, name: str, **kwargs) -> Dict[str, Any]:
        """Update backend configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}",
            json=kwargs
        )
        resp.raise_for_status()
        return resp.json()

    def delete_backend(self, name: str, force: bool = False) -> Dict[str, Any]:
        """Delete a backend"""
        params = {"force": str(force).lower()} if force else {}
        resp = self.session.delete(
            f"{self.base_url}/admin/backends/{name}",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_weight(self, name: str, weight: int) -> Dict[str, Any]:
        """Update backend weight"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/weight",
            json={"weight": weight}
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_models(
        self,
        name: str,
        models: List[str],
        append: bool = False
    ) -> Dict[str, Any]:
        """Update backend models"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/models",
            json={"models": models, "append": append}
        )
        resp.raise_for_status()
        return resp.json()


# Usage Example
if __name__ == "__main__":
    client = ContinuumAdminClient(
        base_url="http://localhost:8080",
        token="your-admin-token"
    )

    # Get current logging config
    logging_config = client.get_section("logging")
    print(f"Current log level: {logging_config['config']['level']}")

    # Update logging level
    result = client.patch_section("logging", {"level": "debug"})
    print(f"Updated: {result['success']}")

    # Add a new backend
    client.add_backend(
        name="new-ollama",
        url="http://192.168.1.100:11434",
        weight=2,
        models=["llama3.2", "mistral"]
    )

    # Export configuration backup
    backup = client.export_config(format="yaml")
    with open("config-backup.yaml", "w") as f:
        f.write(backup)

JavaScript/TypeScript¶

interface ConfigSection {
  name: string;
  config: Record<string, any>;
  hot_reload_capability: 'immediate' | 'gradual' | 'requires_restart';
}

interface HistoryEntry {
  version: number;
  timestamp: string;
  sections_changed: string[];
  source: string;
  user: string;
}

interface Backend {
  name: string;
  url: string;
  weight: number;
  models: string[];
  enabled: boolean;
  health_status: string;
}

class ContinuumAdminClient {
  private baseUrl: string;
  private token: string;

  constructor(baseUrl: string, token: string) {
    this.baseUrl = baseUrl;
    this.token = token;
  }

  private async request<T>(
    method: string,
    path: string,
    body?: any,
    params?: Record<string, string>
  ): Promise<T> {
    const url = new URL(`${this.baseUrl}${path}`);
    if (params) {
      Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
    }

    const response = await fetch(url.toString(), {
      method,
      headers: {
        'Authorization': `Bearer ${this.token}`,
        'Content-Type': 'application/json',
      },
      body: body ? JSON.stringify(body) : undefined,
    });

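    if (!response.ok) {
      // Error bodies normally follow { error_code, message, details }; fall back
      // to the status code when the body is empty or not valid JSON.
      const error = await response.json().catch(() => ({}));
      throw new Error(error.message || `HTTP ${response.status}`);
    }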

    return response.json();
  }

  // Configuration Query APIs

  async getFullConfig(): Promise<any> {
    return this.request('GET', '/admin/config/full');
  }

  async getSections(): Promise<ConfigSection[]> {
    const result = await this.request<{ sections: ConfigSection[] }>(
      'GET', '/admin/config/sections'
    );
    return result.sections;
  }

  async getSection(section: string): Promise<ConfigSection> {
    return this.request('GET', `/admin/config/${section}`);
  }

  async getSchema(section?: string): Promise<any> {
    const params = section ? { section } : undefined;
    return this.request('GET', '/admin/config/schema', undefined, params);
  }

  // Configuration Modification APIs

  async updateSection(section: string, config: Record<string, any>): Promise<any> {
    return this.request('PUT', `/admin/config/${section}`, { config });
  }

  async patchSection(section: string, config: Record<string, any>): Promise<any> {
    return this.request('PATCH', `/admin/config/${section}`, { config });
  }

  async validateConfig(
    section: string,
    config: Record<string, any>,
    dryRun: boolean = true
  ): Promise<any> {
    return this.request('POST', '/admin/config/validate', {
      section,
      config,
      dry_run: dryRun,
    });
  }

  async applyConfig(sections?: string[], force: boolean = false): Promise<any> {
    return this.request('POST', '/admin/config/apply', { sections, force });
  }

  // Configuration Save/Restore APIs

  async exportConfig(
    format: 'yaml' | 'json' | 'toml' = 'yaml',
    sections?: string[],
    includeSensitive: boolean = false
  ): Promise<string> {
    const result = await this.request<{ content: string }>(
      'POST', '/admin/config/export',
      { format, sections, include_sensitive: includeSensitive }
    );
    return result.content;
  }

  async importConfig(
    content: string,
    format: 'yaml' | 'json' | 'toml' = 'yaml',
    apply: boolean = true,
    dryRun: boolean = false
  ): Promise<any> {
    return this.request('POST', '/admin/config/import', {
      format,
      content,
      apply,
      dry_run: dryRun,
    });
  }

  async getHistory(
    limit: number = 20,
    offset: number = 0,
    section?: string
  ): Promise<{ history: HistoryEntry[]; total_entries: number }> {
    const params: Record<string, string> = {
      limit: limit.toString(),
      offset: offset.toString(),
    };
    if (section) params.section = section;
    return this.request('GET', '/admin/config/history', undefined, params);
  }

  async rollback(
    version: number,
    sections?: string[],
    dryRun: boolean = false
  ): Promise<any> {
    return this.request('POST', `/admin/config/rollback/${version}`, {
      sections,
      dry_run: dryRun,
    });
  }

  // Backend Management APIs

  async listBackends(): Promise<Backend[]> {
    const result = await this.request<{ backends: Backend[] }>(
      'GET', '/admin/backends'
    );
    return result.backends;
  }

  async getBackend(name: string): Promise<Backend> {
    return this.request('GET', `/admin/backends/${name}`);
  }

  async addBackend(
    name: string,
    url: string,
    weight: number = 1,
    models?: string[]
  ): Promise<any> {
    return this.request('POST', '/admin/backends', {
      name,
      url,
      weight,
      models,
    });
  }

  async updateBackend(name: string, updates: Partial<Backend>): Promise<any> {
    return this.request('PUT', `/admin/backends/${name}`, updates);
  }

  async deleteBackend(name: string, force: boolean = false): Promise<any> {
    const params = force ? { force: 'true' } : undefined;
    return this.request('DELETE', `/admin/backends/${name}`, undefined, params);
  }

  async updateBackendWeight(name: string, weight: number): Promise<any> {
    return this.request('PUT', `/admin/backends/${name}/weight`, { weight });
  }

  async updateBackendModels(
    name: string,
    models: string[],
    append: boolean = false
  ): Promise<any> {
    return this.request('PUT', `/admin/backends/${name}/models`, {
      models,
      append,
    });
  }
}

// Usage Example
async function main() {
  const client = new ContinuumAdminClient(
    'http://localhost:8080',
    'your-admin-token'
  );

  // Get current logging config
  const loggingConfig = await client.getSection('logging');
  console.log(`Current log level: ${loggingConfig.config.level}`);

  // Update logging level
  const result = await client.patchSection('logging', { level: 'debug' });
  console.log(`Updated: ${result.success}`);

  // Add a new backend
  await client.addBackend('new-ollama', 'http://192.168.1.100:11434', 2, [
    'llama3.2',
    'mistral',
  ]);

  // Export configuration backup
  const backup = await client.exportConfig('yaml');
  console.log('Configuration exported');
}

main().catch(console.error);

Go¶

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "net/url"
    "time"
)

type ContinuumAdminClient struct {
    BaseURL string
    Token   string
    client  *http.Client
}

func NewClient(baseURL, token string) *ContinuumAdminClient {
    return &ContinuumAdminClient{
        BaseURL: baseURL,
        Token:   token,
        // A timeout stops a hung admin call from blocking forever (30s is an arbitrary default).
        client: &http.Client{Timeout: 30 * time.Second},
    }
}

func (c *ContinuumAdminClient) request(method, path string, body interface{}) (map[string]interface{}, error) {
    var reqBody io.Reader
    if body != nil {
        jsonBody, err := json.Marshal(body)
        if err != nil {
            return nil, err
        }
        reqBody = bytes.NewReader(jsonBody)
    }

    req, err := http.NewRequest(method, c.BaseURL+path, reqBody)
    if err != nil {
        return nil, err
    }

    req.Header.Set("Authorization", "Bearer "+c.Token)
    req.Header.Set("Content-Type", "application/json")

    resp, err := c.client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    // Read the raw body first so that non-JSON error responses still report the HTTP status.
    raw, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }
    if resp.StatusCode >= 400 {
        return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, raw)
    }

    var result map[string]interface{}
    if err := json.Unmarshal(raw, &result); err != nil {
        return nil, err
    }
    return result, nil
}

// GetFullConfig retrieves the full configuration
func (c *ContinuumAdminClient) GetFullConfig() (map[string]interface{}, error) {
    return c.request("GET", "/admin/config/full", nil)
}

// GetSection retrieves a specific configuration section
func (c *ContinuumAdminClient) GetSection(section string) (map[string]interface{}, error) {
    return c.request("GET", "/admin/config/"+section, nil)
}

// PatchSection partially updates a configuration section
func (c *ContinuumAdminClient) PatchSection(section string, config map[string]interface{}) (map[string]interface{}, error) {
    return c.request("PATCH", "/admin/config/"+section, map[string]interface{}{
        "config": config,
    })
}

// AddBackend adds a new backend
func (c *ContinuumAdminClient) AddBackend(name, backendURL string, weight int, models []string) (map[string]interface{}, error) {
    return c.request("POST", "/admin/backends", map[string]interface{}{
        "name":   name,
        "url":    backendURL,
        "weight": weight,
        "models": models,
    })
}

// ExportConfig exports configuration in the specified format
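func (c *ContinuumAdminClient) ExportConfig(format string) (string, error) {
    result, err := c.request("POST", "/admin/config/export", map[string]interface{}{
        "format": format,
    })
    if err != nil {
        return "", err
    }
    // Check the type instead of asserting blindly, which would panic on an unexpected response.
    content, ok := result["content"].(string)
    if !ok {
        return "", fmt.Errorf("export response has no string \"content\" field")
    }
    return content, nil
}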

// GetHistory retrieves configuration change history
func (c *ContinuumAdminClient) GetHistory(limit int) (map[string]interface{}, error) {
    // Build only the query string; request() already prepends BaseURL.
    q := url.Values{}
    q.Set("limit", fmt.Sprintf("%d", limit))
    return c.request("GET", "/admin/config/history?"+q.Encode(), nil)
}

func main() {
    client := NewClient("http://localhost:8080", "your-admin-token")

    // Get current logging config
    config, err := client.GetSection("logging")
    if err != nil {
        fmt.Println("get logging config:", err)
        return
    }
    fmt.Printf("Current config: %v\n", config)

    // Update logging level
    result, err := client.PatchSection("logging", map[string]interface{}{
        "level": "debug",
    })
    if err != nil {
        fmt.Println("update logging level:", err)
        return
    }
    fmt.Printf("Update result: %v\n", result)

    // Add a new backend
    if _, err := client.AddBackend("new-ollama", "http://192.168.1.100:11434", 2, []string{"llama3.2"}); err != nil {
        fmt.Println("add backend:", err)
        return
    }

    // Export configuration
    backup, err := client.ExportConfig("yaml")
    if err != nil {
        fmt.Println("export config:", err)
        return
    }
    fmt.Println("Configuration exported")
    fmt.Println(backup)
}

Best Practices¶

1. Always Validate Before Applying¶

# Step 1: Validate
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"section": "logging", "config": {"level": "debug"}}'

# Step 2: Apply only if valid
curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"level": "debug"}}'
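The two-step flow above can be wrapped in a small helper so that the PATCH is never sent when validation fails. This is a minimal sketch: `validate_then_patch` is an illustrative name, `send(method, path, body)` is an assumed callable that performs the HTTP request and returns the decoded JSON body, and the sketch assumes the validate response carries a boolean `valid` field (check the validate endpoint's documented response and adjust the field name if it differs).

```python
from typing import Any, Callable, Dict

# send(method, path, body) -> decoded JSON response; supplied by the caller (assumption).
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def validate_then_patch(send: Send, section: str, changes: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a partial section update and apply it only if validation passes."""
    # Step 1: ask the server to validate the change without applying it.
    verdict = send("POST", "/admin/config/validate", {"section": section, "config": changes})
    # Assumption: the validate response exposes a boolean "valid" field.
    if not verdict.get("valid", False):
        raise ValueError(f"validation failed for {section!r}: {verdict}")
    # Step 2: apply exactly the same change as a partial update.
    return send("PATCH", f"/admin/config/{section}", {"config": changes})
```

Passing the transport in as a function keeps the helper independent of any particular HTTP library and makes it easy to exercise with a fake in tests.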

2. Use Dry Run for Imports¶

# Preview import changes
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "yaml",
    "content": "...",
    "dry_run": true
  }'
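The `content` field must carry the whole YAML document as a single JSON string, so its newlines and quotes have to be escaped. Building the body with a JSON library avoids hand-escaping. This sketch only constructs the request body (`build_import_body` is an illustrative name) and leaves sending it to your HTTP client.

```python
import json


def build_import_body(yaml_text: str, dry_run: bool = True) -> str:
    """Return a JSON request body for POST /admin/config/import."""
    # json.dumps escapes newlines, quotes, and backslashes inside the YAML text.
    return json.dumps({"format": "yaml", "content": yaml_text, "dry_run": dry_run})
```

For example, `build_import_body(open("config.yaml").read())` produces a preview request; pass `dry_run=False` only after reviewing the preview.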

3. Regular Configuration Backups¶

#!/bin/bash
# Daily backup script
set -euo pipefail
DATE=$(date +%Y%m%d)
curl -sf -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "yaml"}' | jq -r '.content' > "config-backup-$DATE.yaml"
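A daily job also accumulates old files. The following is a minimal Python sketch of the same backup step with a retention policy added. It assumes the export `content` has already been fetched, reuses the `config-backup-YYYYMMDD.yaml` naming from the script, and its `keep` default of 14 is an arbitrary choice; `write_backup` is an illustrative name.

```python
import datetime
from pathlib import Path


def write_backup(content: str, backup_dir: Path, today: datetime.date, keep: int = 14) -> Path:
    """Write an exported config to a dated file and prune the oldest backups."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"config-backup-{today:%Y%m%d}.yaml"
    target.write_text(content, encoding="utf-8")

    # YYYYMMDD names sort chronologically, so a plain sort lists the oldest first.
    backups = sorted(backup_dir.glob("config-backup-*.yaml"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```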

4. Monitor Configuration History¶

# Check recent changes
curl -s "http://localhost:8080/admin/config/history?limit=5" \
  -H "Authorization: Bearer $TOKEN" | jq '.history[] | {version, timestamp, sections_changed}'
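The same projection can be done in Python for a monitoring job. This sketch assumes the response shape implied by the `jq` filter above: a `history` array whose entries have `version`, `timestamp`, and `sections_changed`. The optional section filter and the name `summarize_history` are illustrative additions.

```python
from typing import Any, Dict, List, Optional


def summarize_history(response: Dict[str, Any], section: Optional[str] = None) -> List[Dict[str, Any]]:
    """Project history entries to version, timestamp, and sections_changed."""
    summary = []
    for entry in response.get("history", []):
        changed = entry.get("sections_changed", [])
        # Optionally keep only entries that touched the given section.
        if section is not None and section not in changed:
            continue
        summary.append({
            "version": entry.get("version"),
            "timestamp": entry.get("timestamp"),
            "sections_changed": changed,
        })
    return summary
```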

5. Use Partial Updates (PATCH) for Minimal Changes¶

# Only update what's needed
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"requests_per_minute": 200}}'
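When the desired state comes from a file or template, the smallest PATCH body can be computed by diffing it against the current section. This is a minimal sketch that assumes PATCH merges nested objects key by key (verify this for the sections you change). It only adds or changes keys and cannot express deletions; `minimal_patch` is an illustrative name.

```python
from typing import Any, Dict


def minimal_patch(current: Dict[str, Any], desired: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the keys in `desired` whose values differ from `current`."""
    patch: Dict[str, Any] = {}
    for key, value in desired.items():
        old = current.get(key)
        if isinstance(value, dict) and isinstance(old, dict):
            # Recurse so unchanged nested keys stay out of the request body.
            nested = minimal_patch(old, value)
            if nested:
                patch[key] = nested
        elif key not in current or old != value:
            patch[key] = value
    return patch
```

The result goes into the request as `{"config": minimal_patch(current_section, desired_section)}`. An empty result means no request is needed.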

6. Test Configuration Changes in Staging First¶

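# Example: test configuration in staging before production
# (uses the Python ContinuumAdminClient from Client SDK Examples;
#  the environment variable names are illustrative)
import os

staging_client = ContinuumAdminClient("http://staging:8080", os.environ["STAGING_ADMIN_TOKEN"])
production_client = ContinuumAdminClient("http://production:8080", os.environ["PROD_ADMIN_TOKEN"])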

# Apply to staging first
staging_client.patch_section("rate_limiting", {"requests_per_minute": 500})

# Verify in staging
staging_config = staging_client.get_section("rate_limiting")
assert staging_config["config"]["requests_per_minute"] == 500

# Then apply to production
production_client.patch_section("rate_limiting", {"requests_per_minute": 500})

Security Considerations¶

1. Sensitive Data Handling¶

All API responses automatically mask sensitive fields (API keys, passwords, tokens)
Use include_sensitive: true in export only when absolutely necessary
Audit logs record when sensitive data is accessed

2. Authentication Best Practices¶

admin:
  auth:
    method: bearer_token
    token: "${ADMIN_TOKEN}"  # Use environment variables

  # Restrict access by IP
  ip_whitelist:
    - "10.0.0.0/8"      # Internal network only
    - "192.168.1.0/24"  # Office network
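To check which client addresses an `ip_whitelist` entry admits, you can test CIDR membership with Python's standard `ipaddress` module. This sketch only illustrates CIDR semantics and is not the router's own matching code. `is_allowed` is an illustrative name.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any CIDR range in allowlist."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False tolerates entries such as "10.1.2.3/8" that have host bits set.
    # An IPv4 address is never "in" an IPv6 network, so mixed lists are safe.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowlist)
```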

3. Audit Logging¶

All configuration changes are logged with:

Timestamp
User/source
Changed sections
Previous and new values (sensitive data masked)

4. Rate Limiting Admin Endpoints¶

Consider rate limiting admin endpoints to prevent abuse:

admin:
  rate_limit:
    requests_per_minute: 60
    burst: 10
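A common way to combine `requests_per_minute` with `burst` is a token bucket. Tokens refill at `requests_per_minute / 60` per second up to a capacity of `burst`, and each request spends one token. This sketch assumes that model only to show how the two settings interact; the router's actual limiter may use a different algorithm. The injectable `clock` exists to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Token bucket: steady refill rate plus a burst capacity."""

    def __init__(self, requests_per_minute: float, burst: int,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.rate = requests_per_minute / 60.0  # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)              # start with a full bucket
        self.clock = clock or time.monotonic
        self.last = self.clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `requests_per_minute: 60` and `burst: 10`, a client can send 10 requests at once and then roughly one per second after that.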

5. Backup Before Major Changes¶

# Always back up before major changes
backup=$(curl -sf -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "yaml"}' | jq -r '.content')

# Make changes...

# Restore if needed (jq builds the JSON body and escapes the YAML content)
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg content "$backup" '{format: "yaml", content: $content}')"

Prompt File Management APIs¶

The Prompt File Management API allows you to manage system prompts stored in external Markdown files. This enables centralized management of system prompts without modifying the main configuration file.

List All Prompts¶

Get a list of all configured prompts with their sources and content.

GET /admin/config/prompts

Response¶

{
  "prompts": [
    {
      "id": "default",
      "prompt_type": "default",
      "source": "file",
      "file_path": "prompts/system.md",
      "content": "# System Prompt\n\nYou are a helpful assistant...",
      "loaded": true,
      "size_bytes": 1024
    },
    {
      "id": "anthropic",
      "prompt_type": "backend",
      "source": "file",
      "file_path": "prompts/anthropic.md",
      "content": "# Anthropic-specific prompt...",
      "loaded": true,
      "size_bytes": 512
    },
    {
      "id": "gpt-4",
      "prompt_type": "model",
      "source": "inline",
      "content": "You are GPT-4...",
      "size_bytes": 256
    }
  ],
  "total": 3,
  "prompts_directory": "./prompts"
}

Example¶

curl -s http://localhost:8080/admin/config/prompts \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq
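A response like the one above can be scanned for file-backed prompts that did not load, which is useful after a deployment or a manual edit. This is a minimal sketch that assumes `loaded` is `false` or absent for a file-backed prompt that failed to load. Inline prompts are skipped because, as in the sample, they carry no `loaded` field. `unloaded_prompt_files` is an illustrative name.

```python
from typing import Any, Dict, List


def unloaded_prompt_files(response: Dict[str, Any]) -> List[str]:
    """Return file paths of file-backed prompts that are not marked as loaded."""
    return [
        p.get("file_path", p.get("id", "?"))
        for p in response.get("prompts", [])
        # Inline prompts have no file to load, so only check file-backed ones.
        if p.get("source") == "file" and not p.get("loaded", False)
    ]
```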

Get Prompt File¶

Get the content of a specific prompt file.

GET /admin/config/prompts/{path}

Path Parameters¶

Parameter	Type	Required	Description
`path`	string	Yes	Relative path to the prompt file, as reported in `file_path` by the list endpoint (e.g. `prompts/system.md`)

Response¶

{
  "path": "prompts/system.md",
  "content": "# System Prompt\n\nYou are a helpful assistant that follows company policies...",
  "size_bytes": 1024,
  "modified_at": 1702468200
}

Example¶

curl -s http://localhost:8080/admin/config/prompts/prompts/system.md \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq
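Because `{path}` is appended to the URL with its slashes intact (as in `prompts/system.md` above), a client should percent-encode each segment while keeping `/` as the separator. This minimal sketch uses the standard library. `prompt_url` is an illustrative helper name.

```python
from urllib.parse import quote


def prompt_url(base_url: str, path: str) -> str:
    """Build the GET/PUT URL for a prompt file, keeping '/' between segments."""
    # safe="/" leaves separators alone but encodes spaces, '?', '#', and similar.
    return f"{base_url.rstrip('/')}/admin/config/prompts/{quote(path.lstrip('/'), safe='/')}"
```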

Update Prompt File¶

Create or update a prompt file with new content.

PUT /admin/config/prompts/{path}

Request Body¶

{
  "content": "# Updated System Prompt\n\nYou are a helpful assistant that follows all company policies.\n\n## Security Guidelines\n\n- Never reveal internal system details\n- Follow data privacy regulations"
}

Response¶

{
  "success": true,
  "path": "prompts/system.md",
  "size_bytes": 245,
  "message": "Prompt file updated successfully"
}

Example¶

curl -X PUT http://localhost:8080/admin/config/prompts/prompts/system.md \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "# System Prompt\n\nYou are a helpful assistant."
  }'
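To upload a prompt that you edit locally, read the Markdown file and send its text as `content`. This sketch also checks the 1MB file size limit listed under Security Considerations before uploading. Treating 1MB as 1,048,576 bytes of UTF-8 is an assumption, so a file right at the boundary may still be rejected by the server. `prompt_update_body` is an illustrative name.

```python
import json
from pathlib import Path

# Assumption: the documented 1MB limit means 1,048,576 bytes of UTF-8 text.
MAX_PROMPT_BYTES = 1024 * 1024


def prompt_update_body(local_file: Path) -> str:
    """Return the JSON body for PUT /admin/config/prompts/{path}."""
    text = local_file.read_text(encoding="utf-8")
    size = len(text.encode("utf-8"))
    if size > MAX_PROMPT_BYTES:
        raise ValueError(f"{local_file} is {size} bytes; limit is {MAX_PROMPT_BYTES}")
    # json.dumps escapes the Markdown's newlines and quotes for the request body.
    return json.dumps({"content": text})
```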

Reload Prompt Files¶

Reload all prompt files from disk. Use this after editing prompt files directly on disk so the router picks up the changes.

POST /admin/config/prompts/reload

Response¶

{
  "success": true,
  "reloaded_count": 3,
  "reloaded": [
    "prompts/system.md",
    "prompts/anthropic.md",
    "prompts/gpt4.md"
  ],
  "errors": [],
  "message": "Successfully reloaded 3 prompt file(s)"
}

Example¶

curl -X POST http://localhost:8080/admin/config/prompts/reload \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq
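The reload response reports both `reloaded` and `errors`, so scripts should inspect both fields rather than relying on `success` alone. In this minimal sketch, `check_reload` is an illustrative name. The example above does not show the structure of individual `errors` entries, so they are reported as-is.

```python
from typing import Any, Dict, List


def check_reload(response: Dict[str, Any], expected: int) -> List[str]:
    """Raise if the reload reported errors or fewer files than expected."""
    errors = response.get("errors", [])
    if not response.get("success", False) or errors:
        raise RuntimeError(f"prompt reload failed: {errors or response.get('message')}")
    reloaded = response.get("reloaded", [])
    if len(reloaded) < expected:
        raise RuntimeError(f"expected {expected} prompt files, reloaded {len(reloaded)}")
    return reloaded
```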

Configuration Example¶

To use external prompt files, configure global_prompts in your config file:

global_prompts:
  # Directory containing prompt files (relative to config directory)
  prompts_dir: "./prompts"

  # Default prompt from external file
  default_file: "system.md"

  # Or inline prompt (default_file takes precedence if both specified)
  # default: "You are a helpful assistant."

  # Backend-specific prompts
  backends:
    anthropic:
      prompt_file: "anthropic-system.md"
    openai:
      prompt: "OpenAI-specific inline prompt"

  # Model-specific prompts
  models:
    gpt-4:
      prompt_file: "gpt4-system.md"
    claude-3-opus:
      prompt_file: "claude-opus-system.md"

  merge_strategy: prepend
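The comments above state two rules: `prompts_dir` is resolved relative to the config directory, and `default_file` takes precedence over an inline `default`. A small resolver can encode them. This illustrative sketch covers only those two rules. It assumes `default_file` is resolved inside `prompts_dir` and that a missing `prompts_dir` means the config directory itself. It does not model backend-specific or model-specific selection, nor `merge_strategy`.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


def resolve_default_prompt(cfg: Dict[str, Any], config_dir: Path) -> Optional[Tuple[str, str]]:
    """Return ("file", path) or ("inline", text) for the global default prompt."""
    # prompts_dir is relative to the config directory; "." when unset is an assumption.
    prompts_dir = config_dir / cfg.get("prompts_dir", ".")
    if cfg.get("default_file"):
        # default_file takes precedence when both keys are present.
        return ("file", str((prompts_dir / cfg["default_file"]).resolve()))
    if cfg.get("default"):
        return ("inline", cfg["default"])
    return None
```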

Security Considerations¶

Path Traversal Protection: All paths are validated to prevent directory traversal attacks (e.g., ../../../etc/passwd)
File Size Limits: Prompt files are limited to 1MB maximum
Relative Paths Only: Prompt files must be within the configured prompts_dir or config directory
Authentication Required: All prompt management endpoints require admin authentication
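The path traversal and relative-path rules amount to resolving the requested path against the allowed root and rejecting anything that escapes it. The sketch below shows that check in Python (3.9+ for `Path.is_relative_to`). It illustrates the rule and is not the router's own implementation. `safe_prompt_path` is an illustrative name.

```python
from pathlib import Path


def safe_prompt_path(root: Path, requested: str) -> Path:
    """Resolve `requested` under `root`, rejecting absolute or escaping paths."""
    if Path(requested).is_absolute():
        raise ValueError(f"absolute paths are not allowed: {requested}")
    root = root.resolve()
    # resolve() collapses ".." and follows symlinks before the containment check.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes {root}: {requested}")
    return candidate
```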

Appendix: Quick Reference¶

Configuration Sections¶

Section	Hot Reload	Description
`server`	Restart	Bind address, workers
`backends`	Gradual	Backend URLs, weights
`health_checks`	Gradual	Health monitoring
`logging`	Immediate	Log level, format
`retry`	Immediate	Retry policies
`timeouts`	Gradual	Request timeouts
`rate_limiting`	Immediate	Rate limits
`circuit_breaker`	Immediate	Circuit breaker
`global_prompts`	Immediate	System prompts
`fallback`	Gradual	Model fallback
`files`	Gradual	Files API
`api_keys`	Immediate	API keys
`metrics`	Gradual	Prometheus metrics
`admin`	Gradual	Admin settings
`admin.stats`	Immediate	Stats collection settings
`routing`	Gradual	Routing rules
`prefix_routing`	Immediate	Prefix-aware KV cache routing
`response_cache`	Immediate	Response cache settings
`kv_cache_index`	Restart	KV cache index backend and event sources
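Before scripting a change, it helps to know whether it will take effect without a restart. The dictionary below transcribes the table. `needs_restart` is an illustrative helper, not part of any SDK.

```python
from typing import Iterable, List

# Transcribed from the table above: section -> hot reload behaviour.
HOT_RELOAD = {
    "server": "restart", "backends": "gradual", "health_checks": "gradual",
    "logging": "immediate", "retry": "immediate", "timeouts": "gradual",
    "rate_limiting": "immediate", "circuit_breaker": "immediate",
    "global_prompts": "immediate", "fallback": "gradual", "files": "gradual",
    "api_keys": "immediate", "metrics": "gradual", "admin": "gradual",
    "admin.stats": "immediate", "routing": "gradual",
    "prefix_routing": "immediate", "response_cache": "immediate",
    "kv_cache_index": "restart",
}


def needs_restart(sections: Iterable[str]) -> List[str]:
    """Return the given sections that only take effect after a restart."""
    sections = list(sections)  # allow generators, which can only be iterated once
    unknown = [s for s in sections if s not in HOT_RELOAD]
    if unknown:
        raise KeyError(f"unknown sections: {unknown}")
    return [s for s in sections if HOT_RELOAD[s] == "restart"]
```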

HTTP Status Codes¶

Code	Meaning
200	Success
400	Bad Request (validation error)
401	Unauthorized
403	Forbidden
404	Not Found
409	Conflict
413	Payload Too Large
500	Internal Server Error

Common curl Commands¶

# Get full config
curl -s http://localhost:8080/admin/config/full -H "Authorization: Bearer $TOKEN"

# Update logging level
curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"config": {"level": "debug"}}'

# Add backend
curl -X POST http://localhost:8080/admin/backends \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"name": "new", "url": "http://host:port", "weight": 1}'

# Export config
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"format": "yaml"}'

# View history
curl -s http://localhost:8080/admin/config/history -H "Authorization: Bearer $TOKEN"

# Rollback
curl -X POST http://localhost:8080/admin/config/rollback/5 \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{}'

# List API keys (masked)
curl -s http://localhost:8080/admin/api-keys -H "Authorization: Bearer $TOKEN"

# Create an API key (full value returned once)
curl -X POST http://localhost:8080/admin/api-keys \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"id": "key-1", "user_id": "user-1", "organization_id": "org-1", "scopes": ["read", "write"]}'

# Rotate an API key
curl -X POST http://localhost:8080/admin/api-keys/key-1/rotate -H "Authorization: Bearer $TOKEN"

# Disable / enable an API key
curl -X POST http://localhost:8080/admin/api-keys/key-1/disable -H "Authorization: Bearer $TOKEN"
curl -X POST http://localhost:8080/admin/api-keys/key-1/enable -H "Authorization: Bearer $TOKEN"

# Revoke an API key
curl -X DELETE http://localhost:8080/admin/api-keys/key-1 -H "Authorization: Bearer $TOKEN"

# Per-API-key and per-user usage statistics
curl -s http://localhost:8080/admin/stats/api-keys -H "Authorization: Bearer $TOKEN"
curl -s http://localhost:8080/admin/stats/users -H "Authorization: Bearer $TOKEN"