Admin REST API Reference

This document covers the Continuum Router Admin REST API for developers building configuration control applications. The Configuration Management API supports runtime configuration viewing, modification, and management without server restarts.

Overview

The Admin REST API provides programmatic access to Continuum Router's configuration system, enabling:

  • Real-time Configuration Viewing: Retrieve current configuration with automatic sensitive data masking
  • Dynamic Configuration Updates: Modify configuration sections without server restart
  • Configuration Versioning: Track changes with full history and rollback capabilities
  • Backend Management: Add, remove, and modify backends dynamically
  • Export/Import: Save and restore configurations in multiple formats (YAML, JSON, TOML)

Key Features

| Feature | Description |
|---|---|
| Hot Reload | Changes applied immediately or gradually based on section type |
| Sensitive Masking | API keys, passwords, and tokens automatically masked in responses |
| Validation | All changes validated before application with dry-run support |
| Audit Logging | All modifications logged for security and compliance |
| History Tracking | Up to 100 configuration versions maintained for rollback |

Authentication

All Admin API endpoints require authentication via the Admin Auth system.

Authentication Methods

1. Bearer Token

Authorization: Bearer <admin-token>

curl -H "Authorization: Bearer your-admin-token" \
  http://localhost:8080/admin/config/full

2. Basic Authentication

Authorization: Basic <base64(username:password)>

curl -u admin:password http://localhost:8080/admin/config/full
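
For clients that do not build the header automatically the way curl -u does, the Basic value can be computed by hand. The following is a minimal Python sketch of the base64(username:password) encoding shown above; the function name basic_auth_header is illustrative and not part of the router.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value for an HTTP Basic "Authorization" header."""
    # Basic auth encodes "username:password" as base64.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Same credentials as the curl example above.
    print(basic_auth_header("admin", "password"))
```

Base64 is an encoding, not encryption, so Basic authentication should only be used over a trusted or TLS-protected connection.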

3. API Key Header

X-API-Key: <admin-api-key>

curl -H "X-API-Key: your-admin-key" http://localhost:8080/admin/config/full

Configuration

Configure admin authentication in config.yaml:

admin:
  auth:
    method: bearer_token  # Options: none, bearer_token, basic, api_key
    token: "${ADMIN_TOKEN}"  # Environment variable supported
    # For basic auth:
    # username: admin
    # password: "${ADMIN_PASSWORD}"

  # IP whitelist (optional)
  ip_whitelist:
    - "127.0.0.1"
    - "10.0.0.0/8"

  # Configurable limits
  max_history_entries: 100
  max_backend_name_length: 256
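
The ip_whitelist entries mix single addresses and CIDR ranges. As a rough illustration of how such a list can be matched against a client address, here is a small Python sketch using the standard ipaddress module. It only models the matching step; how the router itself evaluates the whitelist (including what happens when no whitelist is configured) is not specified here, and the function name ip_allowed is hypothetical.

```python
import ipaddress


def ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """Return True if client_ip matches any whitelist entry.

    Entries may be single addresses ("127.0.0.1") or CIDR ranges
    ("10.0.0.0/8"). This is an assumed matching rule, not the
    router's actual implementation.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        # A bare address is parsed as a single-host network (/32 or /128).
        network = ipaddress.ip_network(entry, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False


if __name__ == "__main__":
    whitelist = ["127.0.0.1", "10.0.0.0/8"]
    for ip in ("127.0.0.1", "10.20.30.40", "192.168.1.5"):
        print(ip, ip_allowed(ip, whitelist))
```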

Base URL and Headers

Base URL

http://localhost:8080/admin

Common Request Headers

Content-Type: application/json
Accept: application/json
Authorization: Bearer <token>

Common Response Headers

Content-Type: application/json
X-Request-Id: <unique-request-id>

Configuration Query APIs

Get Full Configuration

Retrieve the complete configuration with sensitive information masked.

GET /admin/config/full

Response

{
  "config": {
    "server": {
      "bind_address": "0.0.0.0:8080",
      "workers": 4
    },
    "backends": [
      {
        "name": "openai",
        "url": "https://api.openai.com",
        "api_key": "sk-***abcd",
        "weight": 1
      }
    ],
    "logging": {
      "level": "info"
    },
    "rate_limiting": {
      "enabled": true,
      "requests_per_minute": 100
    }
  },
  "hot_reload_enabled": true,
  "last_modified": "2025-12-13T10:30:00Z"
}

Example

curl -s http://localhost:8080/admin/config/full \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

List Configuration Sections

Get all available configuration sections with their hot reload capabilities.

GET /admin/config/sections

Response

{
  "sections": [
    {
      "name": "server",
      "description": "Server configuration including bind address and workers",
      "hot_reload_capability": "requires_restart"
    },
    {
      "name": "backends",
      "description": "Backend server configurations",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "logging",
      "description": "Logging configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "rate_limiting",
      "description": "Rate limiting configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "circuit_breaker",
      "description": "Circuit breaker configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "retry",
      "description": "Retry policy configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "timeouts",
      "description": "Timeout configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "health_checks",
      "description": "Health check configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "global_prompts",
      "description": "Global prompt injection configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "fallback",
      "description": "Model fallback configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "files",
      "description": "Files API configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "api_keys",
      "description": "API keys configuration",
      "hot_reload_capability": "immediate"
    },
    {
      "name": "metrics",
      "description": "Metrics and monitoring configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "admin",
      "description": "Admin API configuration",
      "hot_reload_capability": "gradual"
    },
    {
      "name": "routing",
      "description": "Request routing configuration",
      "hot_reload_capability": "gradual"
    }
  ]
}

Example

curl -s http://localhost:8080/admin/config/sections \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.sections[].name'
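
A client that needs to know which sections can change without a restart can group the response by hot_reload_capability. The sketch below assumes the response shape shown above and is only an illustration; group_by_capability is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_capability(response: dict) -> dict[str, list[str]]:
    """Group section names by their hot_reload_capability value."""
    groups: dict[str, list[str]] = defaultdict(list)
    for section in response.get("sections", []):
        groups[section["hot_reload_capability"]].append(section["name"])
    return dict(groups)


if __name__ == "__main__":
    # A shortened copy of the sample response above.
    sample = {
        "sections": [
            {"name": "server", "hot_reload_capability": "requires_restart"},
            {"name": "backends", "hot_reload_capability": "gradual"},
            {"name": "logging", "hot_reload_capability": "immediate"},
            {"name": "rate_limiting", "hot_reload_capability": "immediate"},
        ]
    }
    print(group_by_capability(sample))
```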

Get Section Configuration

Retrieve configuration for a specific section.

GET /admin/config/{section}

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| section | string | Yes | Section name (see list above) |

Response

{
  "section": "logging",
  "config": {
    "level": "info",
    "format": "json",
    "file": "/var/log/continuum-router.log"
  },
  "hot_reload_capability": "immediate",
  "description": "Logging configuration"
}

Example

# Get logging configuration
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Get backends configuration
curl -s http://localhost:8080/admin/config/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Configuration Schema

Retrieve JSON Schema for configuration validation.

GET /admin/config/schema

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| section | string | No | Get schema for specific section only |

Response

{
  "schema": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
      "server": {
        "type": "object",
        "properties": {
          "bind_address": {
            "type": "string",
            "pattern": "^[^:]+:[0-9]+$",
            "description": "Server bind address in host:port format"
          },
          "workers": {
            "type": "integer",
            "minimum": 1,
            "description": "Number of worker threads"
          }
        }
      },
      "logging": {
        "type": "object",
        "properties": {
          "level": {
            "type": "string",
            "enum": ["trace", "debug", "info", "warn", "error"]
          }
        }
      }
    }
  }
}

Example

# Get full schema
curl -s http://localhost:8080/admin/config/schema \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Get schema for specific section
curl -s "http://localhost:8080/admin/config/schema?section=logging" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Configuration Modification APIs

Replace Section Configuration

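Replace a section's entire configuration with new values. Because the whole section is replaced, include every field you want to keep; to change individual fields only, use PATCH instead.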

PUT /admin/config/{section}

Request Body

{
  "config": {
    "level": "debug",
    "format": "json"
  }
}

Response

{
  "success": true,
  "message": "Configuration updated successfully",
  "version": 5,
  "hot_reload_capability": "immediate",
  "applied": true,
  "warnings": []
}

Example

# Replace the logging section with a debug-level configuration
curl -X PUT http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "level": "debug"
    }
  }'

Partial Update Section

Apply partial updates using JSON merge patch semantics.

PATCH /admin/config/{section}

Request Body

{
  "config": {
    "level": "warn"
  }
}

Only specified fields are updated; other fields remain unchanged.

Response

{
  "success": true,
  "message": "Configuration partially updated",
  "version": 6,
  "hot_reload_capability": "immediate",
  "applied": true,
  "merged_config": {
    "level": "warn",
    "format": "json",
    "file": "/var/log/continuum-router.log"
  }
}
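
To make the merge behavior concrete, the following Python sketch implements JSON merge patch as defined in RFC 7396 and reproduces the merged_config shown above. It is an illustration of the semantics, not the router's code. In RFC 7396 a null value removes the key; whether the router honors that for every section is an assumption.

```python
def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7396) and return the merged value.

    The input objects are not modified.
    """
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in the patch means "remove this key".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    current = {
        "level": "info",
        "format": "json",
        "file": "/var/log/continuum-router.log",
    }
    print(merge_patch(current, {"level": "warn"}))
```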

Example

# Update only the rate limit value
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "requests_per_minute": 200
    }
  }'

Validate Configuration

Validate configuration changes without applying them.

POST /admin/config/validate

Request Body

{
  "section": "server",
  "config": {
    "bind_address": "0.0.0.0:9090",
    "workers": 8
  },
  "dry_run": true
}

Response (Valid)

{
  "valid": true,
  "errors": [],
  "warnings": [
    {
      "field": "bind_address",
      "message": "Changing bind_address requires server restart"
    }
  ],
  "hot_reload_capability": "requires_restart"
}

Response (Invalid)

{
  "valid": false,
  "errors": [
    {
      "field": "workers",
      "message": "workers must be greater than 0",
      "code": "VALIDATION_ERROR"
    }
  ],
  "warnings": []
}

Example

# Validate before applying
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "section": "rate_limiting",
    "config": {
      "enabled": true,
      "requests_per_minute": 500
    }
  }'

Apply Configuration

Apply pending configuration changes immediately (trigger hot reload).

POST /admin/config/apply

Request Body

{
  "sections": ["logging", "rate_limiting"],
  "force": false
}

| Field | Type | Required | Description |
|---|---|---|---|
| sections | array | No | Specific sections to apply (default: all pending) |
| force | boolean | No | Force apply even with warnings (default: false) |

Response

{
  "success": true,
  "applied_sections": ["logging", "rate_limiting"],
  "version": 7,
  "results": {
    "logging": {
      "status": "applied",
      "hot_reload_type": "immediate"
    },
    "rate_limiting": {
      "status": "applied",
      "hot_reload_type": "immediate"
    }
  }
}

Example

curl -X POST http://localhost:8080/admin/config/apply \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sections": ["logging"]
  }'

Configuration Save/Restore APIs

Export Configuration

Export current configuration in specified format.

POST /admin/config/export

Request Body

{
  "format": "yaml",
  "sections": ["server", "backends", "logging"],
  "include_sensitive": false,
  "include_defaults": true
}

| Field | Type | Required | Description |
|---|---|---|---|
| format | string | Yes | Output format: yaml, json, or toml |
| sections | array | No | Sections to export (default: all) |
| include_sensitive | boolean | No | Include unmasked sensitive data (default: false) |
| include_defaults | boolean | No | Include default values (default: true) |

Response

{
  "format": "yaml",
  "content": "server:\n  bind_address: \"0.0.0.0:8080\"\n  workers: 4\n\nbackends:\n  - name: openai\n    url: https://api.openai.com\n    api_key: \"sk-***abcd\"\n",
  "exported_at": "2025-12-13T10:30:00Z",
  "sections_exported": ["server", "backends", "logging"]
}

Example

# Export as YAML
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "yaml"}' | jq -r '.content' > config-backup.yaml

# Export as JSON
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "json"}' | jq -r '.content' > config-backup.json

# Export specific sections
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "yaml",
    "sections": ["backends", "rate_limiting"]
  }'

Import Configuration

Import and apply configuration from content.

POST /admin/config/import

Request Body

{
  "format": "yaml",
  "content": "logging:\n  level: info\n  format: json\n",
  "apply": true,
  "dry_run": false,
  "merge": true
}

| Field | Type | Required | Description |
|---|---|---|---|
| format | string | Yes | Content format: yaml, json, or toml |
| content | string | Yes | Configuration content (max 1MB) |
| apply | boolean | No | Apply after validation (default: true) |
| dry_run | boolean | No | Validate only without applying (default: false) |
| merge | boolean | No | Merge with existing config (default: false) |

Response

{
  "success": true,
  "message": "Configuration imported and applied",
  "version": 8,
  "validation": {
    "valid": true,
    "errors": [],
    "warnings": []
  },
  "sections_imported": ["logging"],
  "applied": true
}

Example

# Import from file
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"format\": \"yaml\",
    \"content\": $(jq -Rs . < config-backup.yaml),
    \"apply\": true
  }"

# Dry run import
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "yaml",
    "content": "logging:\n  level: debug\n",
    "dry_run": true
  }'
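
The shell example above relies on jq to escape the file content as a JSON string. The same request body can be built in Python with json.dumps, which handles newlines and quotes in the configuration text. This is a sketch under assumptions: build_import_request is a hypothetical helper, and the 1MB limit is interpreted here as 1,048,576 bytes of UTF-8, which the source does not specify.

```python
import json

MAX_CONTENT_BYTES = 1024 * 1024  # assumed reading of the documented "max 1MB"


def build_import_request(content: str, fmt: str = "yaml", *,
                         apply: bool = True, dry_run: bool = False,
                         merge: bool = False) -> str:
    """Return the JSON body for POST /admin/config/import."""
    if fmt not in ("yaml", "json", "toml"):
        raise ValueError(f"unsupported format: {fmt}")
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds the documented 1MB limit")
    return json.dumps({
        "format": fmt,
        "content": content,  # json.dumps escapes newlines and quotes
        "apply": apply,
        "dry_run": dry_run,
        "merge": merge,
    })


if __name__ == "__main__":
    print(build_import_request("logging:\n  level: debug\n", dry_run=True))
```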

Get Configuration History

View configuration change history.

GET /admin/config/history

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Number of entries to return (default: 20, max: 100) |
| offset | integer | No | Number of entries to skip (default: 0) |
| section | string | No | Filter by section name |

Response

{
  "history": [
    {
      "version": 8,
      "timestamp": "2025-12-13T10:30:00Z",
      "sections_changed": ["logging"],
      "source": "api",
      "user": "admin",
      "description": "Updated logging level to debug",
      "rollback_available": true
    },
    {
      "version": 7,
      "timestamp": "2025-12-13T10:25:00Z",
      "sections_changed": ["rate_limiting"],
      "source": "api",
      "user": "admin",
      "description": "Increased rate limit to 200 rpm",
      "rollback_available": true
    },
    {
      "version": 6,
      "timestamp": "2025-12-13T09:00:00Z",
      "sections_changed": ["backends"],
      "source": "file_reload",
      "user": "system",
      "description": "Configuration file changed",
      "rollback_available": true
    }
  ],
  "total_entries": 8,
  "current_version": 8
}

Example

# Get recent history
curl -s http://localhost:8080/admin/config/history \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Get history for specific section
curl -s "http://localhost:8080/admin/config/history?section=backends&limit=10" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Rollback Configuration

Rollback to a previous configuration version.

POST /admin/config/rollback/{version}

Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| version | integer | Yes | Version number to rollback to |

Request Body

{
  "sections": ["logging", "rate_limiting"],
  "dry_run": false
}

| Field | Type | Required | Description |
|---|---|---|---|
| sections | array | No | Specific sections to rollback (default: all changed) |
| dry_run | boolean | No | Preview without applying (default: false) |

Response

{
  "success": true,
  "message": "Rolled back to version 5",
  "previous_version": 8,
  "new_version": 9,
  "sections_rolled_back": ["logging", "rate_limiting"],
  "changes": {
    "logging": {
      "level": {
        "from": "debug",
        "to": "info"
      }
    }
  }
}
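
The changes object in the response pairs each modified field with its value before and after the rollback. The Python sketch below shows one way such a from/to summary can be derived for a flat section. It is illustrative only: diff_section is a hypothetical name, nested values are compared as a whole, and representing a missing field as None is an assumption.

```python
def diff_section(current: dict, target: dict) -> dict:
    """Describe field-level changes when moving from current to target."""
    changes = {}
    for key in sorted(set(current) | set(target)):
        old, new = current.get(key), target.get(key)
        if old != new:
            changes[key] = {"from": old, "to": new}
    return changes


if __name__ == "__main__":
    now = {"level": "debug", "format": "json"}
    version_5 = {"level": "info", "format": "json"}
    print(diff_section(now, version_5))
```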

Example

# Rollback to version 5
curl -X POST http://localhost:8080/admin/config/rollback/5 \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

# Preview rollback (dry run)
curl -X POST http://localhost:8080/admin/config/rollback/5 \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dry_run": true}'

Backend Management APIs

Add Backend

Add a new backend dynamically.

POST /admin/backends

Request Body

{
  "name": "new-ollama",
  "url": "http://192.168.1.100:11434",
  "weight": 1,
  "models": ["llama3.2", "mistral"],
  "api_key": "optional-key",
  "enabled": true,
  "health_check": {
    "enabled": true,
    "path": "/v1/models"
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique backend name (alphanumeric, -, _) |
| type | string | No | Backend type: openai, azure, vllm, ollama, anthropic, gemini, llamacpp, generic. Default: generic (auto-detect) |
| url | string | Yes | Backend URL (http:// or https://) |
| weight | integer | No | Load balancing weight (default: 1) |
| models | array | No | List of models served by this backend |
| api_key | string | No | API key for backend authentication |
| enabled | boolean | No | Whether backend is enabled (default: true) |
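
The name and url constraints in the table can be checked on the client before sending the request. The following Python sketch is one possible pre-flight check. It assumes "alphanumeric" means ASCII letters and digits and reuses the max_backend_name_length value of 256 from the sample configuration; the router's own validation may differ, and validate_backend is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")  # assumed ASCII-only
MAX_NAME_LENGTH = 256  # max_backend_name_length from the sample config


def validate_backend(payload: dict) -> list[str]:
    """Return a list of problems found in an Add Backend payload."""
    errors = []
    name = payload.get("name", "")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        errors.append("name must use only letters, digits, '-' and '_'")
    elif len(name) > MAX_NAME_LENGTH:
        errors.append(f"name must be at most {MAX_NAME_LENGTH} characters")
    url = urlparse(str(payload.get("url", "")))
    if url.scheme not in ("http", "https") or not url.netloc:
        errors.append("url must start with http:// or https://")
    return errors


if __name__ == "__main__":
    print(validate_backend({"name": "new-ollama",
                            "url": "http://192.168.1.100:11434"}))
    print(validate_backend({"name": "bad name", "url": "ftp://example"}))
```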

Backend Type Auto-Detection

When type is not specified or is set to generic, the router probes the backend's /v1/models endpoint to detect the backend type. Auto-detection currently supports:

  • llama.cpp: Identified by owned_by: "llamacpp" or llama.cpp-specific metadata fields

llama.cpp backends can therefore be added without explicit type configuration:

# llama.cpp backend - type auto-detected (set url to your llama.cpp server address)
curl -X POST http://localhost:8080/admin/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "local-llama",
    "url": "http://localhost:8080"
  }'
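
As a rough picture of the detection rule described above, the Python sketch below inspects a /v1/models response for owned_by: "llamacpp". It assumes the OpenAI-style response shape with a top-level data list, and it does not model the llama.cpp-specific metadata fields, which are not enumerated here. detect_backend_type is a hypothetical name, not the router's function.

```python
def detect_backend_type(models_response: dict) -> str:
    """Guess the backend type from a /v1/models response body."""
    for model in models_response.get("data", []):
        if model.get("owned_by") == "llamacpp":
            return "llamacpp"
    # Nothing recognizable: keep the default type.
    return "generic"


if __name__ == "__main__":
    # Hypothetical response body; the model id is made up for illustration.
    sample = {"data": [{"id": "example-model", "owned_by": "llamacpp"}]}
    print(detect_backend_type(sample))
```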

Response

{
  "success": true,
  "message": "Backend 'new-ollama' added successfully",
  "backend": {
    "name": "new-ollama",
    "url": "http://192.168.1.100:11434",
    "weight": 1,
    "models": ["llama3.2", "mistral"],
    "enabled": true,
    "health_status": "unknown"
  }
}

Example

curl -X POST http://localhost:8080/admin/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "new-backend",
    "url": "http://192.168.1.100:11434",
    "weight": 2,
    "models": ["llama3.2"]
  }'

Get Backend

Get configuration for a specific backend.

GET /admin/backends/{name}

Response

{
  "name": "openai",
  "url": "https://api.openai.com",
  "api_key": "sk-***abcd",
  "weight": 1,
  "models": ["gpt-4", "gpt-3.5-turbo"],
  "enabled": true,
  "health_status": "healthy",
  "stats": {
    "total_requests": 1250,
    "failed_requests": 12,
    "average_latency_ms": 150,
    "last_used": "2025-12-13T10:29:55Z"
  }
}

Example

curl -s http://localhost:8080/admin/backends/openai \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Update Backend

Update backend configuration.

PUT /admin/backends/{name}

Request Body

{
  "url": "https://api.openai.com",
  "weight": 2,
  "models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
  "enabled": true
}

Response

{
  "success": true,
  "message": "Backend 'openai' updated successfully",
  "backend": {
    "name": "openai",
    "url": "https://api.openai.com",
    "weight": 2,
    "models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
    "enabled": true
  }
}

Example

curl -X PUT http://localhost:8080/admin/backends/openai \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "weight": 3,
    "models": ["gpt-4", "gpt-4-turbo"]
  }'

Delete Backend

Remove a backend from the router.

DELETE /admin/backends/{name}

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| force | boolean | No | Force delete even if backend has active connections |

Response

{
  "success": true,
  "message": "Backend 'old-backend' removed successfully",
  "removed_backend": "old-backend"
}

Notes

  • Deleting the last backend is allowed: The router can operate with zero backends configured. When the last backend is deleted:
    • /v1/models returns an empty list
    • Routing requests return 503 "No backends available"
    • New backends can be added via POST /admin/backends

Example

curl -X DELETE http://localhost:8080/admin/backends/old-backend \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Force delete
curl -X DELETE "http://localhost:8080/admin/backends/old-backend?force=true" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Update Backend Weight

Update only the backend weight for load balancing.

PUT /admin/backends/{name}/weight

Request Body

{
  "weight": 5
}

Response

{
  "success": true,
  "message": "Backend 'openai' weight updated to 5",
  "previous_weight": 2,
  "new_weight": 5
}

Example

curl -X PUT http://localhost:8080/admin/backends/openai/weight \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"weight": 5}'

Update Backend Models

Update the model list for a backend.

PUT /admin/backends/{name}/models

Request Body

{
  "models": ["gpt-4", "gpt-4-turbo", "gpt-4o", "gpt-3.5-turbo"],
  "append": false
}

| Field | Type | Required | Description |
|---|---|---|---|
| models | array | Yes | List of model names |
| append | boolean | No | Append to existing list (default: false, replaces) |

Response

{
  "success": true,
  "message": "Backend 'openai' models updated",
  "models": ["gpt-4", "gpt-4-turbo", "gpt-4o", "gpt-3.5-turbo"]
}

Example

# Replace models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4", "gpt-4o"]}'

# Append models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4.5-turbo"], "append": true}'

API Key Management APIs

The API Key Management APIs let you issue, inspect, update, rotate, enable, disable, and revoke per-user API keys at runtime. All eight endpoints are mounted under /admin/api-keys and require the same admin authentication as the rest of the Admin API.

These endpoints operate on the same key store that authenticates incoming client requests. A key created here is immediately usable by a client through the Authorization: Bearer <key> header, subject to the configured authentication mode (see Authentication Mode and Client Usage below).

API Key Object

Each API key is described by an ApiKeyConfig record. The fields below are configurable inline in config.yaml, in an external keys file, or through the create/update endpoints.

| Field | Type | Description |
|---|---|---|
| key | string | The secret key value. Generated cryptographically (format sk-<base64url>) when not supplied. Never returned in full except once at creation or rotation; elsewhere it is masked. |
| id | string | Unique identifier for the key (1–128 chars). Used in every /admin/api-keys/{id} path. |
| user_id | string | Associated user identifier (1–128 chars). Surfaced in per-user usage statistics. |
| organization_id | string | Associated organization identifier (1–128 chars). |
| name | string or absent | Optional human-readable label (max 256 chars). |
| description | string or absent | Optional notes about the key (max 1024 chars). |
| scopes | array of strings | Permissions granted to the key. Common values: read, write, files, admin. At least one scope is required when creating a key. |
| rate_limit | integer or absent | Optional per-key rate limit in requests per minute. Overrides the global limit for this key. |
| enabled | boolean | Whether the key is active. A disabled key fails authentication even before expiry is checked. |
| created_at | string (ISO 8601) | Creation timestamp. |
| expires_at | string (ISO 8601) or absent | Optional expiration timestamp. A key past this instant is automatically invalid regardless of enabled. |
| annotations | object (string to string) or absent | Free-form metadata map. Recommended canonical keys: email, uuid, owner, team, environment. An operator-configured allowlist of annotation keys is exported as labels on the api_key_info Prometheus metric (values are sanitized). |
| allowed_backends | array of strings or absent | Per-key backend allow-list. When non-empty, requests authenticated with this key may only route to backends whose name appears here. Empty or absent means no restriction. Matching is exact and case-sensitive; unservable requests are rejected with 403 Forbidden. |

A key is considered valid when it is enabled and not past expires_at. The listing endpoint reports active, expired, and disabled counts derived from these rules.
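
The validity rule can be expressed in a few lines. The Python sketch below derives is_active, is_expired, and is_valid from a key record, following the rule stated above. It is an illustration rather than the router's implementation; key_status is a hypothetical helper, and treating a timestamp exactly equal to expires_at as not yet expired is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2027-01-01T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def key_status(key: dict, now: Optional[datetime] = None) -> dict:
    """Compute the derived status flags for an API key record."""
    now = now or datetime.now(timezone.utc)
    expires_at = key.get("expires_at")
    is_expired = expires_at is not None and now > parse_timestamp(expires_at)
    is_active = bool(key.get("enabled", True))
    return {
        "is_active": is_active,
        "is_expired": is_expired,
        # Valid only when enabled and not past expires_at.
        "is_valid": is_active and not is_expired,
    }


if __name__ == "__main__":
    record = {"enabled": True, "expires_at": "2027-01-01T00:00:00Z"}
    print(key_status(record))
```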

Key Masking

The full key value is returned exactly once: in the response to POST /admin/api-keys (creation) and POST /admin/api-keys/{id}/rotate (rotation). Every other response returns a masked_key of the form sk-***abcd, preserving the sk- prefix and the last four characters. Logs always use the masked form.
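
The masked form can be reproduced as follows. This Python sketch keeps the sk- prefix and the last four characters, as described above. How the router masks very short keys or custom keys without the sk- prefix is not documented, so those branches are assumptions, and mask_key is a hypothetical name.

```python
def mask_key(key: str) -> str:
    """Return a masked form of an API key, e.g. sk-***abcd."""
    prefix = "sk-"
    has_prefix = key.startswith(prefix)
    body = key[len(prefix):] if has_prefix else key
    shown_prefix = prefix if has_prefix else ""
    if len(body) <= 4:
        # Assumption: never reveal the whole secret for very short keys.
        return shown_prefix + "***"
    return f"{shown_prefix}***{body[-4:]}"


if __name__ == "__main__":
    print(mask_key("sk-1234567890abcd"))
```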

Authentication Mode and Client Usage

The api_keys.mode setting controls how the router treats client requests that lack a valid key:

| Mode | Behavior |
|---|---|
| permissive (default) | Requests with a valid key are authenticated and attributed; requests without a key are still allowed through. Use this for incremental rollout. |
| blocking | Every API request must carry a valid key. Requests without one receive 401 Unauthorized. |

Set the mode in config.yaml:

api_keys:
  mode: blocking            # "permissive" (default) | "blocking"
  persistence_file: ~/.config/continuum-router/runtime-keys.yaml
  api_keys:
    - key: "sk-prod-..."
      id: "key-1"
      user_id: "user-1"
      organization_id: "org-1"
      scopes: ["read", "write"]

A client authenticates by sending the issued key as a bearer token:

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-the-issued-key-value" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

The mode setting hot-reloads: switching between permissive and blocking takes effect without a restart.
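
The difference between the two modes comes down to what happens when a request lacks a valid key. The Python sketch below models only that decision, returning the rejection status or None when the request is admitted. It is a simplification for illustration; rejection_status is a hypothetical name, and the router applies further checks (scopes, rate limits, backend allow-lists) that are not modeled.

```python
from typing import Optional


def rejection_status(mode: str, has_valid_key: bool) -> Optional[int]:
    """Return the HTTP status used to reject a request, or None if admitted."""
    if has_valid_key:
        return None  # authenticated and attributed in both modes
    if mode == "permissive":
        return None  # allowed through without attribution
    if mode == "blocking":
        return 401   # 401 Unauthorized
    raise ValueError(f"unknown api_keys.mode: {mode}")


if __name__ == "__main__":
    for mode in ("permissive", "blocking"):
        print(mode, rejection_status(mode, has_valid_key=False))
```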

Persistence and Hot Reload

Keys created or modified through these endpoints live in the in-memory key store. When api_keys.persistence_file is set, runtime changes are written to that file (tilde expansion is supported) and restored on the next startup, so admin-created keys survive restarts. Without persistence_file, runtime keys exist only in memory and are lost on restart. Keys loaded from inline config or from api_keys_file come from read-only sources and are reloaded on config hot-reload.

List API Keys

GET /admin/api-keys

Returns every API key with its value masked, plus a summary of active, expired, and disabled counts.

Response

{
  "keys": [
    {
      "id": "key-1",
      "masked_key": "sk-***A1aB",
      "user_id": "user-1",
      "organization_id": "org-1",
      "name": "Production key",
      "scopes": ["read", "write"],
      "rate_limit": 600,
      "is_active": true,
      "expires_at": null,
      "created_at": "2026-03-05T10:30:00Z",
      "is_expired": false,
      "allowed_backends": ["openai", "anthropic"]
    }
  ],
  "summary": {
    "total": 1,
    "active": 1,
    "expired": 0,
    "disabled": 0
  }
}

Example

curl -s http://localhost:8080/admin/api-keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Create API Key

POST /admin/api-keys

Creates a new API key. If key is omitted, the router generates a cryptographically random value. The full key value is returned only in this response.

Request Body

{
  "id": "key-acme-1",
  "user_id": "user-acme",
  "organization_id": "org-acme",
  "name": "Acme integration",
  "description": "Server-to-server key for the Acme integration",
  "scopes": ["read", "write"],
  "rate_limit": 600,
  "enabled": true,
  "expires_at": "2027-01-01T00:00:00Z",
  "allowed_backends": ["openai"]
}

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique key identifier (1–128 chars). |
| user_id | string | Yes | Associated user identifier (must be non-empty). |
| organization_id | string | Yes | Associated organization identifier (must be non-empty). |
| key | string | No | Custom key value. A new value is generated when omitted. |
| name | string | No | Human-readable label (max 256 chars). |
| description | string | No | Notes about the key (max 1024 chars). |
| scopes | array | No | Permissions; defaults to ["read", "write"] when omitted. If supplied, it must contain at least one scope. |
| rate_limit | integer | No | Per-key rate limit in requests per minute. |
| enabled | boolean | No | Whether the key is active; defaults to true. |
| expires_at | string (ISO 8601) | No | Expiration timestamp. |
| allowed_backends | array | No | Per-key backend allow-list. Empty or omitted means unrestricted. |

Response

Returns 201 Created. The key field is the full value and is shown only here.

{
  "key": "sk-G7q2...full-value...A1aB",
  "masked_key": "sk-***A1aB",
  "id": "key-acme-1",
  "user_id": "user-acme",
  "organization_id": "org-acme",
  "name": "Acme integration",
  "scopes": ["read", "write"],
  "rate_limit": 600,
  "enabled": true,
  "created_at": "2026-03-05T10:30:00Z",
  "expires_at": "2027-01-01T00:00:00Z",
  "allowed_backends": ["openai"]
}

Error Responses

  • 400 Bad Request: empty user_id/organization_id, no scopes, or a name/description over the length limit.
  • 409 Conflict: a key with the same id already exists.
  • 507 Insufficient Storage: the maximum key count (10,000) has been reached.
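
Most 400 responses can be avoided by checking the request body before sending it. The Python sketch below mirrors the documented field rules (length limits, non-empty identifiers, at least one scope). It is a client-side convenience under those stated rules, not the server's validator, and validate_create_request is a hypothetical helper.

```python
def validate_create_request(body: dict) -> list[str]:
    """Return a list of problems in a Create API Key request body."""
    errors = []
    for field in ("id", "user_id", "organization_id"):
        value = body.get(field)
        if not isinstance(value, str) or not 1 <= len(value) <= 128:
            errors.append(f"{field} must be a string of 1-128 characters")
    # scopes defaults to ["read", "write"] when omitted.
    scopes = body.get("scopes", ["read", "write"])
    if not isinstance(scopes, list) or len(scopes) == 0:
        errors.append("scopes must contain at least one scope")
    for field, limit in (("name", 256), ("description", 1024)):
        value = body.get(field)
        if value is not None and len(value) > limit:
            errors.append(f"{field} must be at most {limit} characters")
    return errors


if __name__ == "__main__":
    print(validate_create_request({
        "id": "key-acme-1",
        "user_id": "user-acme",
        "organization_id": "org-acme",
        "scopes": ["read", "write"],
        "rate_limit": 600,
    }))
```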

Example

curl -X POST http://localhost:8080/admin/api-keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "key-acme-1",
    "user_id": "user-acme",
    "organization_id": "org-acme",
    "scopes": ["read", "write"],
    "rate_limit": 600
  }'

Get API Key

GET /admin/api-keys/{id}

Returns a single key by id, with its value masked.

Response

{
  "id": "key-acme-1",
  "masked_key": "sk-***A1aB",
  "user_id": "user-acme",
  "organization_id": "org-acme",
  "name": "Acme integration",
  "scopes": ["read", "write"],
  "rate_limit": 600,
  "is_active": true,
  "created_at": "2026-03-05T10:30:00Z",
  "expires_at": "2027-01-01T00:00:00Z",
  "is_expired": false,
  "is_valid": true,
  "allowed_backends": ["openai"]
}

The is_active, is_expired, and is_valid fields are computed: is_valid is true only when the key is active and not expired.

Error Responses

  • 404 Not Found: no key with the given id.

Example

curl -s http://localhost:8080/admin/api-keys/key-acme-1 \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Update API Key

PUT /admin/api-keys/{id}

Updates one or more properties of an existing key. Only the fields present in the body are changed; omitted fields are left untouched. The key value itself is not changed by this endpoint (use Rotate for that).

Request Body

{
  "name": "Acme integration (renamed)",
  "scopes": ["read"],
  "rate_limit": 300,
  "enabled": true,
  "expires_at": "2027-06-01T00:00:00Z",
  "allowed_backends": ["openai", "anthropic"]
}

| Field | Type | Description |
|---|---|---|
| name | string | New label. |
| scopes | array | Replacement scope list. |
| rate_limit | integer | New per-key rate limit. |
| enabled | boolean | Enable or disable the key. |
| expires_at | string (ISO 8601) | New expiration timestamp. |
| allowed_backends | array | Backend allow-list. A null or omitted value leaves it unchanged; an empty array clears all restrictions; a non-empty array replaces the list. |
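
The three cases for allowed_backends are easy to confuse, so here is a small Python sketch of the update rule together with the routing check it feeds into (exact, case-sensitive matching, with an empty or absent list meaning unrestricted). Both function names are hypothetical and the code only illustrates the documented semantics.

```python
from typing import Optional


def update_allowed_backends(current: Optional[list],
                            new: Optional[list]) -> Optional[list]:
    """Apply the allowed_backends update rule from the table above."""
    if new is None:
        return current  # null or omitted: leave unchanged
    return list(new)    # [] clears restrictions; otherwise replace


def backend_permitted(allowed: Optional[list], backend: str) -> bool:
    """Return True if a key with this allow-list may route to backend."""
    if not allowed:
        return True  # empty or absent means no restriction
    return backend in allowed  # exact, case-sensitive match


if __name__ == "__main__":
    allowed = update_allowed_backends(["openai"], ["openai", "anthropic"])
    print(allowed, backend_permitted(allowed, "anthropic"))
```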

Response

{
  "success": true,
  "action": "update",
  "key": {
    "id": "key-acme-1",
    "masked_key": "sk-***A1aB",
    "user_id": "user-acme",
    "organization_id": "org-acme",
    "name": "Acme integration (renamed)",
    "scopes": ["read"],
    "rate_limit": 300,
    "is_active": true,
    "created_at": "2026-03-05T10:30:00Z",
    "expires_at": "2027-06-01T00:00:00Z",
    "is_valid": true,
    "allowed_backends": ["openai", "anthropic"]
  }
}

Error Responses

  • 404 Not Found: no key with the given id.

Example

curl -X PUT http://localhost:8080/admin/api-keys/key-acme-1 \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit": 300, "scopes": ["read"]}'

Delete API Key

DELETE /admin/api-keys/{id}

Permanently revokes and removes a key. After deletion, any client still presenting the old value fails authentication. This action is irreversible.

Response

{
  "success": true,
  "action": "delete",
  "id": "key-acme-1"
}

Error Responses

  • 404 Not Found: no key with the given id.

Example

curl -X DELETE http://localhost:8080/admin/api-keys/key-acme-1 \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Rotate API Key

POST /admin/api-keys/{id}/rotate

Generates a new secret value for an existing key while preserving its id and all other properties. The previous value stops working immediately. The new value is returned only in this response.

Response

{
  "success": true,
  "action": "rotate",
  "id": "key-acme-1",
  "new_key": "sk-Hq9z...new-full-value...B2",
  "masked_key": "sk-***B2cD",
  "warning": "Store this key securely. It will not be shown again."
}

Error Responses

  • 404 Not Found: no key with the given id.

Example

curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/rotate \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Enable API Key

POST /admin/api-keys/{id}/enable

Marks a key as active. A re-enabled key authenticates again, provided it has not expired.

Response

{
  "success": true,
  "action": "enable",
  "id": "key-acme-1"
}

Error Responses

  • 404 Not Found: no key with the given id.

Example

curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/enable \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Disable API Key

POST /admin/api-keys/{id}/disable

Marks a key as inactive without deleting it. A disabled key fails authentication but keeps its configuration, so it can be re-enabled later. Use this for a reversible suspension instead of Delete.

Response

{
  "success": true,
  "action": "disable",
  "id": "key-acme-1"
}

Error Responses

  • 404 Not Found: no key with the given id.

Example

curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/disable \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Statistics APIs

The Statistics APIs expose aggregated request metrics collected by the StatsCollector. All endpoints are mounted under /admin/stats and share the same authentication as the rest of the Admin API. Alongside the overall, per-model, and per-backend breakdowns, the collector also tracks per-API-key and per-user usage (see Per-API-Key and Per-User Statistics).

Stats collection is enabled by default. It can be configured or disabled via the admin.stats section in your YAML config:

admin:
  stats:
    enabled: true                # Enable/disable collection (default: true)
    retention_window: 24h        # Ring-buffer retention for windowed queries (default: 24h)
    token_tracking: true         # Parse response bodies for token usage (default: true)
    persistence:
      enabled: true              # Enable stats persistence across restarts (default: true)
      path: ./data/stats.json    # File path for the snapshot (default: ./data/stats.json)
      snapshot_interval: 5m      # How often to write periodic snapshots (default: 5m)
      max_age: 7d                # Discard snapshots older than this on startup (default: 7d)

The retention_window and token_tracking settings support hot-reload: changes are applied immediately without a restart.

Stats Persistence

When the persistence subsection is present and enabled is true, the router saves a statistics snapshot to disk periodically and restores it on startup. This ensures that request counters, per-model breakdowns, and the latency ring buffer survive restarts.

How it works:

  • On startup, the router reads the snapshot file and restores all counters and ring-buffer records. Uptime resets to zero on each restart.
  • A background task writes a new snapshot every snapshot_interval. Writes are atomic (temp file + rename) to prevent corruption.
  • On graceful shutdown (SIGTERM/SIGINT), a final snapshot is saved before the process exits.
  • If the snapshot file is missing, corrupted, or older than max_age, the router starts with fresh counters and logs a warning or info message.
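The write and restore rules above follow a common pattern: write to a temporary file, rename it over the target, and fall back to fresh state when the file cannot be trusted. The sketch below shows that pattern in Python. It is not the router's code: the function names are hypothetical, and judging staleness by the file's modification time is an assumption (the router may read a timestamp stored inside the snapshot instead).

```python
import json
import os
import tempfile
import time
from typing import Optional


def write_snapshot(path: str, snapshot: dict) -> None:
    """Write the snapshot atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_snapshot(path: str, max_age_seconds: Optional[float],
                  now: Optional[float] = None) -> Optional[dict]:
    """Return the snapshot, or None when it is missing, corrupted, or too old."""
    now = time.time() if now is None else now
    try:
        # Assumption: age is measured from the file's modification time.
        age = now - os.path.getmtime(path)
        if max_age_seconds is not None and age > max_age_seconds:
            return None
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, ValueError):
        # Missing file or invalid JSON: the caller starts with fresh counters.
        return None


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "data", "stats.json")
    write_snapshot(target, {"total_requests": 1500})
    print(load_snapshot(target, max_age_seconds=7 * 24 * 3600))
```

Creating the temporary file in the target's own directory matters, because a rename is only atomic within a single filesystem.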

Supported duration formats for snapshot_interval and max_age:

Format Example Meaning
Xs 30s 30 seconds
Xm 5m 5 minutes
Xh 1h 1 hour
Xd 7d 7 days

Set max_age to "0" or "" to disable staleness checks (always restore regardless of age).
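As an illustration of the duration grammar in the table, here is a small Python parser. It is a sketch under stated assumptions: parse_duration is a hypothetical name, returning None for "0" and "" models the "staleness checks disabled" case that the text describes for max_age, and anything outside the four listed suffixes is rejected.

```python
import re
from datetime import timedelta
from typing import Optional

# Suffixes from the table above, mapped to timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_PATTERN = re.compile(r"(\d+)([smhd])")


def parse_duration(text: str) -> Optional[timedelta]:
    """Parse values such as "30s", "5m", "1h", or "7d".

    Returns None for "0" or "", which stands for "no limit" in this sketch.
    """
    text = text.strip()
    if text in ("", "0"):
        return None
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_duration("5m"))
print(parse_duration("7d"))
print(parse_duration("0"))
```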

Get Full Statistics

GET /admin/stats

Returns overall, per-model, and per-backend statistics.

Query Parameters

Parameter Type Description
window string Optional time window filter. Accepted formats: 30m, 1h, 24h, 7d. Omit for all-time totals.

Response

{
  "uptime_seconds": 3600,
  "window": "all",
  "overall": {
    "total_requests": 1500,
    "successful_requests": 1480,
    "failed_requests": 20,
    "avg_latency_ms": 145.3,
    "p50_latency_ms": 120.0,
    "p95_latency_ms": 380.0,
    "p99_latency_ms": 750.0,
    "total_prompt_tokens": 450000,
    "total_completion_tokens": 180000,
    "total_tokens": 630000,
    "tokens_per_sec_avg": 87.4
  },
  "models": [
    {
      "model_id": "gpt-4",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "total_prompt_tokens": 270000,
      "total_completion_tokens": 108000,
      "total_tokens": 378000,
      "avg_latency_ms": 160.2,
      "avg_tokens_per_sec": 92.1,
      "last_used": "2026-03-05T10:30:00Z"
    }
  ],
  "backends": [
    {
      "backend_name": "openai",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "avg_latency_ms": 160.2,
      "health_status": "healthy"
    }
  ]
}

Example

# All-time statistics
curl -s http://localhost:8080/admin/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# Last hour only
curl -s "http://localhost:8080/admin/stats?window=1h" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-Model Statistics

GET /admin/stats/models

Returns only the per-model breakdown (subset of the full stats response).

Response

{
  "models": [
    {
      "model_id": "gpt-4",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "total_prompt_tokens": 270000,
      "total_completion_tokens": 108000,
      "total_tokens": 378000,
      "avg_latency_ms": 160.2,
      "avg_tokens_per_sec": 92.1,
      "last_used": "2026-03-05T10:30:00Z"
    }
  ]
}

Models are sorted by total_requests in descending order.

Example

curl -s http://localhost:8080/admin/stats/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.models[].model_id'

Get Per-Backend Statistics

GET /admin/stats/backends

Returns only the per-backend breakdown. The health_status field is populated from the health checker ("healthy", "unhealthy", or "unknown" when health checks are disabled).

Response

{
  "backends": [
    {
      "backend_name": "openai",
      "total_requests": 900,
      "successful_requests": 895,
      "failed_requests": 5,
      "avg_latency_ms": 160.2,
      "health_status": "healthy"
    }
  ]
}

Backends are sorted by total_requests in descending order.

Example

curl -s http://localhost:8080/admin/stats/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Per-API-Key and Per-User Statistics

These endpoints break down usage by the API key that authenticated each request and by the user attached to that key. They sit beside Get Per-Model Statistics and Get Per-Backend Statistics: same collector, a different grouping dimension.

Identifier and Bucketing Semantics

  • Coverage: every inference surface contributes to these statistics, namely /v1/chat/completions, /anthropic/v1/messages, and the OpenAI Responses API (/v1/responses, including its pass-through, Chat-Completions-conversion, and Anthropic-conversion strategies). Successful non-streaming requests carry full token usage. Streaming requests are recorded at connect time, so they contribute request counts and per-key/per-user attribution but no token totals, which are only known once the stream completes.
  • api_key_id is a derived, non-reversible identifier, never a raw key. It is the same value used as the api_key_id Prometheus label, and it corresponds to the issued key's id. The per-user endpoints key on the user_id attached to the matched key. The derived api_key_id requires the metrics feature to be compiled in; without it, per-key attribution collapses to the "anonymous" bucket (per-user attribution is unaffected, since it reads the key's user_id directly).
  • Requests with no key (or no associated user) are bucketed under "anonymous".
  • Each dimension has a cardinality cap of 1000 distinct identifiers (excluding the reserved buckets). Once the cap is reached, further new identifiers are folded into an "unknown" overflow bucket so their usage is still counted in aggregate.
  • The window query parameter is accepted and echoed back in the response for consistency with GET /admin/stats, but the per-key and per-user aggregates are all-time totals, exactly like GET /admin/stats/models. The identifier is resolved off the request hot path, so it is not present on the windowed ring-buffer records used for time-filtered latency percentiles.
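The bucketing rules above (the "anonymous" bucket, the cap of 1000, and the "unknown" overflow bucket) can be sketched as a single lookup function. This is an illustration only: bucket_for and the tracked set are hypothetical names, and the real collector may differ in detail.

```python
from typing import Optional, Set

ANONYMOUS = "anonymous"   # requests with no key or no associated user
OVERFLOW = "unknown"      # identifiers that arrive after the cap is reached
CARDINALITY_CAP = 1000    # distinct identifiers per dimension, reserved buckets excluded


def bucket_for(identifier: Optional[str], tracked: Set[str],
               cap: int = CARDINALITY_CAP) -> str:
    """Return the bucket that a request's usage is recorded under."""
    if not identifier:
        return ANONYMOUS
    if identifier in tracked:
        return identifier
    if len(tracked) >= cap:
        # Cap reached: the usage is still counted, but only in aggregate.
        return OVERFLOW
    tracked.add(identifier)
    return identifier


tracked: Set[str] = set()
print(bucket_for("k_3f9a1c", tracked))          # tracked under its own identifier
print(bucket_for(None, tracked))                # no key, so the anonymous bucket
print(bucket_for("k_new", tracked, cap=1))      # cap already reached, so overflow
```

Identifiers that were admitted before the cap was reached keep their own bucket; only identifiers seen for the first time afterwards are folded into the overflow bucket.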

The ApiKeyStats and UserStats objects share the same shape:

Field Type Description
api_key_id / user_id string The derived key identifier or the user identifier.
total_requests integer Total requests attributed to this identifier.
successful_requests integer Requests that completed successfully.
failed_requests integer Requests that failed.
total_prompt_tokens integer Prompt tokens consumed.
total_completion_tokens integer Completion tokens produced.
total_tokens integer Sum of prompt and completion tokens.
avg_latency_ms number Average latency in milliseconds.
avg_tokens_per_sec number Average generation throughput in tokens per second.
last_used string (ISO 8601) or null Timestamp of the most recent request, or null if never used.

Get Per-API-Key Statistics

GET /admin/stats/api-keys

Returns one entry per tracked API key, sorted by total_requests in descending order.

Query Parameters

Parameter Type Description
window string Accepted and echoed in the window field, but does not filter the all-time aggregates.

Response

{
  "window": "all",
  "api_keys": [
    {
      "api_key_id": "k_3f9a1c",
      "total_requests": 1200,
      "successful_requests": 1185,
      "failed_requests": 15,
      "total_prompt_tokens": 360000,
      "total_completion_tokens": 144000,
      "total_tokens": 504000,
      "avg_latency_ms": 152.7,
      "avg_tokens_per_sec": 88.3,
      "last_used": "2026-03-05T10:30:00Z"
    },
    {
      "api_key_id": "anonymous",
      "total_requests": 80,
      "successful_requests": 80,
      "failed_requests": 0,
      "total_prompt_tokens": 12000,
      "total_completion_tokens": 4800,
      "total_tokens": 16800,
      "avg_latency_ms": 131.0,
      "avg_tokens_per_sec": 90.1,
      "last_used": "2026-03-05T10:28:00Z"
    }
  ]
}

Example

curl -s http://localhost:8080/admin/stats/api-keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# The window param is accepted and echoed but does not change the aggregates
curl -s "http://localhost:8080/admin/stats/api-keys?window=24h" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.window'

Get Per-API-Key Statistics by ID

GET /admin/stats/api-keys/{id}

Returns the stats for a single api_key_id (the derived identifier returned by the list endpoint, not a raw key). Returns 404 Not Found when the identifier has no recorded usage.

Response

{
  "window": "all",
  "api_key": {
    "api_key_id": "k_3f9a1c",
    "total_requests": 1200,
    "successful_requests": 1185,
    "failed_requests": 15,
    "total_prompt_tokens": 360000,
    "total_completion_tokens": 144000,
    "total_tokens": 504000,
    "avg_latency_ms": 152.7,
    "avg_tokens_per_sec": 88.3,
    "last_used": "2026-03-05T10:30:00Z"
  }
}

Example

curl -s http://localhost:8080/admin/stats/api-keys/k_3f9a1c \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Statistics

GET /admin/stats/users

Returns one entry per tracked user identifier (the user_id attached to the matched key), sorted by total_requests in descending order. Same fields and bucketing rules as the per-API-key endpoint.

Response

{
  "window": "all",
  "users": [
    {
      "user_id": "user-acme",
      "total_requests": 1200,
      "successful_requests": 1185,
      "failed_requests": 15,
      "total_prompt_tokens": 360000,
      "total_completion_tokens": 144000,
      "total_tokens": 504000,
      "avg_latency_ms": 152.7,
      "avg_tokens_per_sec": 88.3,
      "last_used": "2026-03-05T10:30:00Z"
    }
  ]
}

Example

curl -s http://localhost:8080/admin/stats/users \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Statistics by ID

GET /admin/stats/users/{user_id}

Returns the stats for a single user_id. Returns 404 Not Found when the identifier has no recorded usage.

Response

{
  "window": "all",
  "user": {
    "user_id": "user-acme",
    "total_requests": 1200,
    "successful_requests": 1185,
    "failed_requests": 15,
    "total_prompt_tokens": 360000,
    "total_completion_tokens": 144000,
    "total_tokens": 504000,
    "avg_latency_ms": 152.7,
    "avg_tokens_per_sec": 88.3,
    "last_used": "2026-03-05T10:30:00Z"
  }
}

Example

curl -s http://localhost:8080/admin/stats/users/user-acme \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Per-Model Breakdown and Usage Time Series

These per-identifier drill-downs power dashboard widgets: a per-model breakdown (a "tokens by model" donut) and a daily usage trend (a usage-over-time chart). They are tracked as two independent dimensions, not a per-(identifier, model, date) cube, so cardinality stays bounded.

Scope and semantics carry over from Per-API-Key and Per-User Statistics:

  • Only token and request totals are tracked. There is no cost field; the dashboard derives cost from tokens against its own pricing table.
  • api_key_id is the derived, non-reversible identifier (never a raw key); user_id is the user attached to the matched key. Unknown identifiers return 200 OK with an empty array, matching the list endpoints rather than returning 404.
  • Each new dimension has its own cardinality cap (folding overflow into an aggregate "unknown" bucket that is excluded from per-identifier reads), and the unknown-model label is "unknown".
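Since the API reports tokens but no cost, a dashboard has to combine a model breakdown with its own price list. The sketch below shows one way to do that. The estimate_costs function is hypothetical, and the prices in PRICING are placeholders invented for the example, not real rates.

```python
from typing import Dict, List, Tuple

# Placeholder prices in USD per 1,000 tokens: (prompt, completion). Not real rates.
PRICING: Dict[str, Tuple[float, float]] = {
    "claude-haiku-4-5": (0.001, 0.005),
}


def estimate_costs(models: List[dict],
                   pricing: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each model_id in a "models" array to an estimated cost."""
    costs: Dict[str, float] = {}
    for entry in models:
        price = pricing.get(entry["model_id"])
        if price is None:
            # No price on file for this model, so it is left out of the estimate.
            continue
        prompt_price, completion_price = price
        costs[entry["model_id"]] = (
            entry["total_prompt_tokens"] / 1000 * prompt_price
            + entry["total_completion_tokens"] / 1000 * completion_price
        )
    return costs


breakdown = [
    {"model_id": "claude-haiku-4-5", "total_prompt_tokens": 374, "total_completion_tokens": 8},
]
print(estimate_costs(breakdown, PRICING))
```

Prompt and completion tokens are priced separately here because the breakdown reports them separately; a flat per-token price would only need total_tokens.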

Get Per-API-Key Model Breakdown

GET /admin/stats/api-keys/{id}/models

Returns the per-model breakdown for a single api_key_id as a models array of the same ModelStats objects used by GET /admin/stats/models (model id, request counts, prompt/completion/total tokens, average latency, average tokens-per-second, last used), sorted by total_requests descending. The window query parameter is accepted and echoed but does not filter these all-time aggregates.

Response

{
  "api_key_id": "k_3f9a1c",
  "window": "all",
  "models": [
    {
      "model_id": "claude-haiku-4-5",
      "total_requests": 2,
      "successful_requests": 2,
      "failed_requests": 0,
      "total_prompt_tokens": 374,
      "total_completion_tokens": 8,
      "total_tokens": 382,
      "avg_latency_ms": 975.0,
      "avg_tokens_per_sec": 195.9,
      "last_used": "2026-06-18T22:11:54Z"
    }
  ]
}

Example

curl -s http://localhost:8080/admin/stats/api-keys/k_3f9a1c/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Model Breakdown

GET /admin/stats/users/{user_id}/models

Same shape as the per-API-key model breakdown, grouped by user_id.

Response

{
  "user_id": "user-acme",
  "window": "all",
  "models": [
    {
      "model_id": "claude-haiku-4-5",
      "total_requests": 2,
      "successful_requests": 2,
      "failed_requests": 0,
      "total_prompt_tokens": 374,
      "total_completion_tokens": 8,
      "total_tokens": 382,
      "avg_latency_ms": 975.0,
      "avg_tokens_per_sec": 195.9,
      "last_used": "2026-06-18T22:11:54Z"
    }
  ]
}

Example

curl -s http://localhost:8080/admin/stats/users/user-acme/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-API-Key Usage Time Series

GET /admin/stats/api-keys/{id}/series?from=&to=&interval=day

Returns a daily usage series for a single api_key_id, one point per UTC calendar day, sorted ascending by date. Buckets are retained for series_retention_days (default 30); the periodic snapshot task prunes older days and the read path filters them out, so the series never returns days beyond the retention window.

Query Parameters

Parameter Type Description
from string Inclusive lower bound, as a Unix-millis integer or an RFC 3339 timestamp. Defaults to 30 days ago.
to string Exclusive upper bound, same formats as from. Defaults to now.
interval string Bucket granularity. Only day is supported; any other value returns 400 Bad Request. Defaults to day.

An empty or inverted range (from >= to) also returns 400 Bad Request.
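A client can apply the same rules before sending a request to avoid a round trip that ends in 400 Bad Request. The following Python sketch mirrors the parameter rules above. The names parse_bound and resolve_series_query are hypothetical, and rejecting timestamps without a UTC offset is an assumption based on RFC 3339 requiring one.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def parse_bound(value: str) -> datetime:
    """Accept a Unix-millis integer or an RFC 3339 timestamp."""
    if value.lstrip("-").isdigit():
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return parsed


def resolve_series_query(from_: Optional[str] = None, to: Optional[str] = None,
                         interval: str = "day",
                         now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the [from, to) range, raising ValueError where the API returns 400."""
    if interval != "day":
        raise ValueError("only interval=day is supported")
    now = now or datetime.now(timezone.utc)
    upper = parse_bound(to) if to else now                        # exclusive, defaults to now
    lower = parse_bound(from_) if from_ else now - timedelta(days=30)  # inclusive
    if lower >= upper:
        raise ValueError("from must be earlier than to")
    return lower, upper


print(resolve_series_query("2026-06-01T00:00:00Z", "2026-06-30T00:00:00Z"))
```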

Response

{
  "api_key_id": "k_3f9a1c",
  "interval": "day",
  "series": [
    { "date": "2026-06-17", "total_requests": 12, "prompt_tokens": 3600, "completion_tokens": 1440, "total_tokens": 5040 },
    { "date": "2026-06-18", "total_requests": 8, "prompt_tokens": 2400, "completion_tokens": 960, "total_tokens": 3360 }
  ]
}

Example

curl -s "http://localhost:8080/admin/stats/api-keys/k_3f9a1c/series?from=2026-06-01T00:00:00Z&to=2026-06-30T00:00:00Z&interval=day" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-User Usage Time Series

GET /admin/stats/users/{user_id}/series?from=&to=&interval=day

Same shape and parameters as the per-API-key series, grouped by user_id.

Response

{
  "user_id": "user-acme",
  "interval": "day",
  "series": [
    { "date": "2026-06-17", "total_requests": 12, "prompt_tokens": 3600, "completion_tokens": 1440, "total_tokens": 5040 },
    { "date": "2026-06-18", "total_requests": 8, "prompt_tokens": 2400, "completion_tokens": 960, "total_tokens": 3360 }
  ]
}

Example

curl -s "http://localhost:8080/admin/stats/users/user-acme/series?from=2026-06-01T00:00:00Z&to=2026-06-30T00:00:00Z" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Reset Statistics

POST /admin/stats/reset

Resets all counters, per-model records, per-backend records, the per-API-key and per-user records (including their per-model breakdowns and daily time-series buckets), and the latency ring buffer. This action is irreversible.

Response

{
  "success": true,
  "action": "reset",
  "message": "Statistics counters have been reset"
}

Example

curl -X POST http://localhost:8080/admin/stats/reset \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Persistent Metrics Log API

The Persistent Metrics Log API exposes recent Prometheus registry history persisted to a local store (default: SQLite). See the Persistent Metrics Log guide for storage layout, retention math, and configuration.

Get Metrics History

GET /admin/metrics/history?metric=<name>&from=<ts>&to=<ts>&limit=<n>

Returns historical samples for metric over a half-open time window [from, to).

Query parameters

Parameter Required Default Notes
metric yes (none) Metric family name, e.g. http_requests_total.
from no now − 24h Unix milliseconds (int) or RFC 3339 timestamp.
to no now Unix milliseconds (int) or RFC 3339 timestamp.
limit no 10,000 Cap on returned rows. Hard ceiling 100,000.
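The parameters above can be assembled into a request URL with a small helper. This is a client-side sketch, not an official client: history_url is a hypothetical name, and clamping limit to the hard ceiling on the client is an assumption, since the text does not say whether the server clamps or rejects larger values.

```python
from typing import Optional
from urllib.parse import urlencode

DEFAULT_LIMIT = 10_000
MAX_LIMIT = 100_000  # hard ceiling from the table above


def history_url(base_url: str, metric: str, from_ms: Optional[int] = None,
                to_ms: Optional[int] = None, limit: int = DEFAULT_LIMIT) -> str:
    """Build a GET /admin/metrics/history URL for the half-open window [from, to)."""
    if not metric:
        raise ValueError("metric is required")
    if from_ms is not None and to_ms is not None and from_ms >= to_ms:
        raise ValueError("the window [from, to) must not be empty")
    params = {"metric": metric}
    if from_ms is not None:
        params["from"] = from_ms
    if to_ms is not None:
        params["to"] = to_ms
    # Assumption: clamp locally instead of relying on server-side behaviour.
    params["limit"] = min(limit, MAX_LIMIT)
    return f"{base_url}/admin/metrics/history?{urlencode(params)}"


print(history_url("http://localhost:8080", "http_requests_total",
                  from_ms=1715385600000, to_ms=1715472000000, limit=100))
```

Omitting from_ms and to_ms leaves the server defaults in effect (the last 24 hours up to now).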

Response

{
  "metric": "http_requests_total",
  "from_ms": 1715385600000,
  "to_ms": 1715472000000,
  "row_count": 2,
  "limit": 10000,
  "samples": [
    {
      "ts_ms": 1715385600000,
      "labels": {"backend": "openai", "endpoint": "/v1/chat/completions"},
      "value": 42.0,
      "kind": "counter"
    }
  ]
}

Histograms and summaries return multiple rows per family, distinguished by their kind value; see the Persistent Metrics Log guide.

Error responses

  • 400 Bad Request: metric is missing or oversized, or the time range is empty or inverted.
  • 404 Not Found: persistence is disabled (metrics.persistence.enabled: false).
  • 500 Internal Server Error: storage error.
  • 503 Service Unavailable: the metrics-persistence feature was not compiled in.

Example

curl -s 'http://localhost:8080/admin/metrics/history?metric=http_requests_total&limit=100' \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

Response Cache Admin APIs

The Response Cache Admin APIs expose statistics and invalidation operations for the response cache. All endpoints are mounted under /admin/response-cache and require the same authentication as the rest of the Admin API.

Response caching is configured in the response_cache section of your YAML config. See the Response Cache Configuration guide for full configuration details.

Get Response Cache Statistics

GET /admin/response-cache/stats

Returns current response cache statistics including hit/miss counts, memory usage, and configuration summary.

Response

{
  "enabled": true,
  "backend_type": "memory",
  "entries": 42,
  "capacity": 1000,
  "requests": {
    "hit": 120,
    "miss": 80,
    "skip": 15,
    "total": 215
  },
  "hit_rate": "0.6000",
  "evictions": 3,
  "size_bytes": 1048576,
  "config": {
    "backend": "memory",
    "ttl": "5m",
    "capacity": 1000,
    "max_response_size": 1048576,
    "max_stream_buffer_size": 10485760
  }
}

When using the Redis backend (backend: redis), the response includes an additional redis object:

{
  "enabled": true,
  "backend_type": "redis",
  "entries": 42,
  "capacity": 1000,
  "requests": { "hit": 120, "miss": 80, "skip": 15, "total": 215 },
  "hit_rate": "0.6000",
  "evictions": 3,
  "size_bytes": 1048576,
  "config": { "backend": "redis", "ttl": "5m", "capacity": 1000, "max_response_size": 1048576, "max_stream_buffer_size": 10485760 },
  "redis": {
    "connections": { "active": 3, "idle": 5 },
    "errors": { "connection": 0, "timeout": 0, "other": 0, "total": 0 },
    "fallback_active": false
  }
}

When response caching is disabled (response_cache.enabled: false or the section is absent), enabled is false, entries and capacity are 0, and config is null.

Response Fields

Field Type Description
enabled boolean Whether response caching is active
backend_type string Active cache backend: "memory" or "redis"
entries integer Current number of cached entries
capacity integer Maximum cache capacity (LRU limit)
requests.hit integer Requests served from cache
requests.miss integer Cache misses (backend was called, entry stored)
requests.skip integer Non-cacheable requests (e.g., temperature > 0)
requests.total integer Total requests seen by the cache layer (hit + miss + skip)
hit_rate string Rolling cache hit rate as a decimal string (e.g., "0.6000")
evictions integer Total LRU evictions since startup
size_bytes integer Approximate memory usage of cached entries in bytes
config object or null Active configuration summary; null when disabled
redis object or absent Redis-specific stats (only present when backend_type is "redis")
redis.connections.active integer Active connections in the Redis pool
redis.connections.idle integer Idle connections in the Redis pool
redis.errors.connection integer Redis connection errors since startup
redis.errors.timeout integer Redis command timeout errors since startup
redis.errors.other integer Other Redis errors since startup
redis.errors.total integer Total Redis errors since startup
redis.fallback_active boolean Whether the in-memory fallback is currently active
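In the sample response, hit_rate matches hits divided by hits plus misses (120 / 200), with skipped requests left out. The sketch below reproduces that arithmetic from the counters. It is an inference from the sample, not a documented formula: because the field is described as a rolling rate, the value the API returns may differ from one computed over lifetime counters.

```python
def hit_rate(requests: dict) -> str:
    """Format hit / (hit + miss) as a four-decimal string, like the hit_rate field."""
    lookups = requests["hit"] + requests["miss"]  # skipped requests are not lookups
    rate = requests["hit"] / lookups if lookups else 0.0
    return f"{rate:.4f}"


sample = {"hit": 120, "miss": 80, "skip": 15, "total": 215}
print(hit_rate(sample))
```

Note that hit_rate is a string in the response, so a client that wants to chart it has to convert it with float() first.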

Example

curl -s http://localhost:8080/admin/response-cache/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Invalidate Response Cache

POST /admin/response-cache/invalidate

Clears cache entries. Only full cache invalidation via clear_all: true is supported; targeted invalidation by model or tenant is not available.

Request Body

{
  "clear_all": true,
  "model": "gpt-4",
  "tenant_id": "tenant-abc"
}

Field Type Required Description
clear_all boolean No When true, clears the entire cache. Defaults to false.
model string No Accepted but currently ignored; only clear_all is honored. Must not exceed 256 characters.
tenant_id string No Accepted but currently ignored; only clear_all is honored. Must not exceed 256 characters.

Response (clear_all: true)

{
  "success": true,
  "action": "clear_all",
  "cleared_entries": 42
}

Response (clear_all: false or omitted)

{
  "success": true,
  "action": "noop",
  "message": "Targeted invalidation by model/tenant_id is not yet supported. Use clear_all: true to clear the entire cache."
}

Response (cache disabled)

{
  "success": false,
  "error": "Response cache is not enabled"
}

Example

# Clear entire cache
curl -X POST http://localhost:8080/admin/response-cache/invalidate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"clear_all": true}'
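One detail of the three response shapes is easy to miss: a request without clear_all: true still returns success: true, but with action "noop", so nothing was cleared. A caller that checks only success would misread that case. The sketch below shows one way to tell the cases apart; describe_invalidation is a hypothetical helper.

```python
def describe_invalidation(response: dict) -> str:
    """Summarise an invalidate response, separating a real clear from a no-op."""
    if not response.get("success"):
        return f"failed: {response.get('error', 'unknown error')}"
    if response.get("action") == "clear_all":
        return f"cleared {response.get('cleared_entries', 0)} entries"
    # success is true, but the action is "noop": the cache was left untouched.
    return "nothing cleared: " + response.get("message", "")


print(describe_invalidation({"success": True, "action": "clear_all", "cleared_entries": 42}))
print(describe_invalidation({"success": True, "action": "noop", "message": "Use clear_all: true."}))
print(describe_invalidation({"success": False, "error": "Response cache is not enabled"}))
```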

KV Cache Index Admin APIs

The KV Cache Index Admin APIs expose statistics, per-backend state, and a clear operation for the KV cache index subsystem. All endpoints are mounted under /admin/kv-index and require the same authentication as the rest of the Admin API.

The KV cache index tracks which backends hold cached KV data for specific token prefixes, enabling KV-aware routing. It is configured in the kv_cache_index section of your YAML config.

Get KV Cache Index Statistics

GET /admin/kv-index/stats

Returns overall KV cache index statistics, including index size, event source connection status, and routing decision counts.

Response

{
  "enabled": true,
  "config": {
    "backend": "memory",
    "max_entries": 100000,
    "entry_ttl_seconds": 600,
    "event_sources_count": 2,
    "scoring": {
      "overlap_weight": 0.6,
      "load_weight": 0.3,
      "health_weight": 0.1,
      "min_overlap_threshold": 0.3
    }
  },
  "index": {
    "prefix_count": 45,
    "entry_count": 120,
    "total_hits": 3842,
    "total_evictions": 12
  },
  "event_sources": [
    {
      "backend_name": "vllm-1",
      "connected": true,
      "events_received": 2100,
      "events_dropped": 0,
      "last_event_at": "2025-03-12T10:45:00Z",
      "reconnect_count": 0
    }
  ],
  "routing_decisions": {
    "kv_aware": 980,
    "fallback": 120,
    "total": 1100
  },
  "query_latency_count": 1100,
  "overlap_score_count": 980
}

When the KV cache index is disabled (kv_cache_index.enabled: false or the section is absent), enabled is false, config is null, and all counters are 0.

Response Fields

Field Type Description
enabled boolean Whether the KV cache index is active
config object or null Active configuration summary; null when disabled
config.backend string Index backend: "memory" or "redis"
config.max_entries integer Maximum tracked prefix hash entries
config.entry_ttl_seconds integer TTL for index entries in seconds
config.event_sources_count integer Number of configured event sources
config.scoring object Scoring weight configuration
index.prefix_count integer Number of distinct prefix hashes tracked
index.entry_count integer Total (prefix, backend) pairs tracked
index.total_hits integer Total cache hit recordings since startup
index.total_evictions integer Total cache eviction recordings since startup
event_sources array Status of each event source consumer
event_sources[].connected boolean Whether the consumer is currently connected
event_sources[].events_received integer Total events received from this source
event_sources[].events_dropped integer Events dropped due to backpressure
event_sources[].reconnect_count integer Number of reconnect attempts since startup
routing_decisions.kv_aware integer Requests routed using KV-aware selection
routing_decisions.fallback integer Requests that fell back to the default strategy
routing_decisions.total integer Total routing decisions made
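Two quick health signals can be derived from this response: the share of requests that were routed KV-aware, and the event sources that are currently disconnected. The following is a minimal sketch for a monitoring script; kv_routing_summary is a hypothetical helper and the derived field names are not part of the API.

```python
def kv_routing_summary(stats: dict) -> dict:
    """Derive the KV-aware routing share and the disconnected event sources."""
    decisions = stats.get("routing_decisions", {})
    total = decisions.get("total", 0)
    kv_aware = decisions.get("kv_aware", 0)
    disconnected = [
        source["backend_name"]
        for source in stats.get("event_sources", [])
        if not source.get("connected")
    ]
    return {
        # Guard against division by zero when no decisions have been made yet.
        "kv_aware_share": kv_aware / total if total else 0.0,
        "disconnected_sources": disconnected,
    }


sample = {
    "routing_decisions": {"kv_aware": 980, "fallback": 120, "total": 1100},
    "event_sources": [{"backend_name": "vllm-1", "connected": True}],
}
print(kv_routing_summary(sample))
```

When the index is disabled all counters are 0, so the guard returns a share of 0.0 rather than raising.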

Example

curl -s http://localhost:8080/admin/kv-index/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Get Per-Backend KV Cache State

GET /admin/kv-index/backends

Returns per-backend KV cache event statistics, including events received, processed, dropped, connection status, and index event counts.

Response (enabled)

{
  "enabled": true,
  "backends": [
    {
      "backend_name": "vllm-1",
      "connection": {
        "connected": true,
        "reconnect_count": 0,
        "last_event_at": "2025-03-12T10:45:00Z"
      },
      "events": {
        "received": 2100,
        "dropped": 0,
        "index_created": 1950,
        "index_evicted": 150
      }
    },
    {
      "backend_name": "vllm-2",
      "connection": {
        "connected": false,
        "reconnect_count": 3,
        "last_event_at": null
      },
      "events": {
        "received": 0,
        "dropped": 0,
        "index_created": 0,
        "index_evicted": 0
      },
      "configured_endpoint": "ws://vllm-2:8000/v1/kv_events"
    }
  ]
}

Backends that appear in kv_cache_index.event_sources but have no active consumer yet are included with connected: false and a configured_endpoint field.

Response (disabled)

{
  "enabled": false,
  "backends": []
}

Response Fields

Field Type Description
enabled boolean Whether the KV cache index is active
backends[].backend_name string Backend identifier
backends[].connection.connected boolean Whether the event stream consumer is connected
backends[].connection.reconnect_count integer Reconnect attempts since startup
backends[].connection.last_event_at string or null ISO 8601 timestamp of the most recent event
backends[].events.received integer Total events received from this backend
backends[].events.dropped integer Events dropped due to backpressure
backends[].events.index_created integer Index entries created from events
backends[].events.index_evicted integer Index entries evicted from events
backends[].configured_endpoint string Configured endpoint URL (only present for inactive sources)

Example

curl -s http://localhost:8080/admin/kv-index/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Clear KV Cache Index

POST /admin/kv-index/clear

Clears all entries from the KV cache index. Intended for debugging and testing. In production the index rebuilds automatically from incoming KV events.

Response (success)

{
  "success": true,
  "entries_before_clear": 120,
  "cleared_entries": 45
}

entries_before_clear is the total (prefix, backend) pair count before clearing. cleared_entries is the number of prefix hash buckets removed. For the Redis backend, cleared_entries counts the number of Redis keys deleted; because each key has a TTL, any remaining keys expire automatically.

Response (disabled)

{
  "success": false,
  "error": "KV cache index is not enabled"
}

Example

curl -X POST http://localhost:8080/admin/kv-index/clear \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Smart Routing Admin APIs

The Smart Routing Admin APIs expose the model tier registry, letting you inspect which tier and domain profile the router assigns to each model, and update profiles at runtime without a restart. All endpoints are mounted under /admin/smart-routing and require the same authentication as the rest of the Admin API.

Smart routing is enabled by setting smart_routing.enabled: true in your YAML config. When disabled, the list endpoint still responds but reports "enabled": false and returns an empty profile list.

List Model Profiles

GET /admin/smart-routing/model-profiles

Returns all explicitly configured profiles plus any auto-inferred profiles that have been cached since startup.

Response

{
  "enabled": true,
  "default_tier": 2,
  "total": 3,
  "profiles": [
    {
      "model_id": "gpt-4o",
      "tier": 1,
      "tier_name": "flagship",
      "domains": ["general", "code", "reasoning"],
      "cost_per_1k_input_tokens": 0.005,
      "cost_per_1k_output_tokens": 0.015,
      "source": "explicit_exact"
    },
    {
      "model_id": "llama-3-8b-q4_K_M",
      "tier": 3,
      "tier_name": "lightweight",
      "domains": ["general"],
      "cost_per_1k_input_tokens": null,
      "cost_per_1k_output_tokens": null,
      "source": "explicit_pattern"
    }
  ]
}

When smart routing is disabled, enabled is false, profiles is [], and total is 0.

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | Whether smart routing is active |
| default_tier | integer | Tier assigned when no profile matches (1, 2, or 3) |
| total | integer | Number of profiles returned |
| profiles[].model_id | string | The model identifier |
| profiles[].tier | integer | Numeric tier: 1 = Flagship, 2 = Standard, 3 = Lightweight |
| profiles[].tier_name | string | Human-readable tier name |
| profiles[].domains | array of strings | Domain specialization tags |
| profiles[].cost_per_1k_input_tokens | number or null | Input token cost per 1,000 tokens |
| profiles[].cost_per_1k_output_tokens | number or null | Output token cost per 1,000 tokens |
| profiles[].source | string | How the profile was resolved (see below) |

source values:

| Value | Meaning |
| --- | --- |
| explicit_exact | Profile was configured by exact model name |
| explicit_pattern | Profile was matched by a glob pattern |
| auto_inferred | Profile was inferred from pricing, capabilities, or name heuristics |
| default | No match found; default tier was used |

Example

curl http://localhost:8080/admin/smart-routing/model-profiles \
  -H "Authorization: Bearer $ADMIN_TOKEN"
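A client will often want this list grouped by tier. The following is a minimal sketch, assuming the response has the shape shown above; `group_profiles_by_tier` is a hypothetical helper name, not part of the router.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_profiles_by_tier(response: Dict[str, Any]) -> Dict[int, List[str]]:
    """Map each tier number to the model IDs reported for it."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    # "profiles" is an empty list when smart routing is disabled.
    for profile in response.get("profiles", []):
        grouped[profile["tier"]].append(profile["model_id"])
    return dict(grouped)


# Sample data copied from the response shown in this section.
sample = {
    "enabled": True,
    "default_tier": 2,
    "total": 2,
    "profiles": [
        {"model_id": "gpt-4o", "tier": 1, "source": "explicit_exact"},
        {"model_id": "llama-3-8b-q4_K_M", "tier": 3, "source": "explicit_pattern"},
    ],
}
print(group_profiles_by_tier(sample))
```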

Get Model Profile

GET /admin/smart-routing/model-profiles/{model}

Returns the resolved profile for a specific model. If the model has metadata in model-metadata.yaml, auto-inference uses pricing and capability information from there. Otherwise, name heuristics apply.

Path Parameters

| Parameter | Description |
| --- | --- |
| model | Model identifier (max 256 characters) |

Response

{
  "model_id": "gemini-1.5-flash",
  "tier": 3,
  "tier_name": "lightweight",
  "domains": ["general"],
  "cost_per_1k_input_tokens": null,
  "cost_per_1k_output_tokens": null,
  "source": "auto_inferred"
}

Example

curl http://localhost:8080/admin/smart-routing/model-profiles/gpt-4o \
  -H "Authorization: Bearer $ADMIN_TOKEN"
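Model identifiers can contain characters that are not safe in a URL path, such as a slash. The sketch below builds the request URL with the identifier percent-encoded and enforces the 256-character limit from the table above. It assumes the router decodes percent-encoded path segments; `model_profile_url` is a hypothetical helper name.

```python
from urllib.parse import quote

MAX_MODEL_ID_LENGTH = 256  # limit stated in the Path Parameters table


def model_profile_url(base_url: str, model: str) -> str:
    """Build the Get Model Profile URL for one model identifier."""
    if not model or len(model) > MAX_MODEL_ID_LENGTH:
        raise ValueError("model must be 1-256 characters")
    # safe="" also encodes "/" so an ID like "org/model" stays one path segment.
    # Assumption: the router decodes percent-encoded path segments.
    encoded = quote(model, safe="")
    return f"{base_url.rstrip('/')}/admin/smart-routing/model-profiles/{encoded}"


print(model_profile_url("http://localhost:8080", "gpt-4o"))
```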

Update Model Profiles

PUT /admin/smart-routing/model-profiles

Replaces all model profile configurations. The registry reloads immediately; the inferred-profile cache is cleared so subsequent requests re-evaluate against the new profiles. If a config_sender is available, the change is also propagated to the in-memory config.

Request Body

{
  "default_tier": 2,
  "model_profiles": [
    {
      "model": "gpt-4o",
      "tier": 1,
      "domains": ["general", "code", "reasoning"],
      "cost_per_1k_input_tokens": 0.005,
      "cost_per_1k_output_tokens": 0.015
    },
    {
      "model_pattern": "*-q4_K_M",
      "tier": 3,
      "domains": ["general"]
    }
  ]
}

Each entry must include either model (exact name) or model_pattern (glob). Entries with neither are rejected with 400 Bad Request. default_tier is optional; if omitted, the current default is preserved.

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model_profiles | array | Yes | Profile list; replaces existing configuration |
| model_profiles[].model | string | Conditional | Exact model name (max 200 chars) |
| model_profiles[].model_pattern | string | Conditional | Glob pattern such as *-q4_K_M (max 200 chars) |
| model_profiles[].tier | integer | Yes | 1 (Flagship), 2 (Standard), or 3 (Lightweight) |
| model_profiles[].domains | array of strings | No | Domain tags: general, code, reasoning, creative, multilingual, vision |
| model_profiles[].cost_per_1k_input_tokens | number | No | Input cost per 1,000 tokens |
| model_profiles[].cost_per_1k_output_tokens | number | No | Output cost per 1,000 tokens |
| default_tier | integer | No | Fallback tier when no profile matches |

Response

{
  "status": "updated",
  "profiles_count": 2,
  "default_tier": 2
}

Example

curl -X PUT http://localhost:8080/admin/smart-routing/model-profiles \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_profiles": [
      {"model": "gpt-4o", "tier": 1, "domains": ["general", "code"]},
      {"model_pattern": "*-mini", "tier": 3, "domains": ["general"]}
    ]
  }'
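Because this endpoint replaces the whole profile list, it can help to check a payload locally before sending it. The sketch below applies only the rules stated in this section (a name or pattern per entry, the 200-character limit, tiers 1 to 3). The server remains the authority on validity, and `check_profile_update` is a hypothetical helper name.

```python
from typing import Any, Dict, List

VALID_TIERS = {1, 2, 3}
MAX_NAME_LENGTH = 200  # limit stated in the Request Fields table


def check_profile_update(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an Update Model Profiles body."""
    profiles = body.get("model_profiles")
    if not isinstance(profiles, list):
        return ["model_profiles is required and must be an array"]
    problems: List[str] = []
    for i, entry in enumerate(profiles):
        name = entry.get("model") or entry.get("model_pattern")
        if not name:
            problems.append(f"model_profiles[{i}]: needs model or model_pattern")
        elif len(name) > MAX_NAME_LENGTH:
            problems.append(f"model_profiles[{i}]: name longer than {MAX_NAME_LENGTH} chars")
        if entry.get("tier") not in VALID_TIERS:
            problems.append(f"model_profiles[{i}]: tier must be 1, 2, or 3")
    # default_tier is optional; only check it when present.
    default_tier = body.get("default_tier")
    if default_tier is not None and default_tier not in VALID_TIERS:
        problems.append("default_tier must be 1, 2, or 3")
    return problems


print(check_profile_update({"model_profiles": [{"tier": 3}]}))
```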

Smart Routing Status

GET /admin/smart-routing/status

Returns overall smart routing status including enabled state, load state, classifier method, and policy count.

Response

{
  "enabled": true,
  "virtual_model": "auto",
  "intercept_all": false,
  "default_tier": 2,
  "classifier_method": "rule",
  "has_llm_classifier": false,
  "load_state": "normal",
  "load_monitoring_enabled": false,
  "debug_headers": false,
  "policy_count": 5,
  "profile_count": 3
}

Smart Routing Stats

GET /admin/smart-routing/stats

Returns aggregated routing statistics including profile count, policy count, and LLM classifier cache info.

Classify (Diagnostic)

POST /admin/smart-routing/classify

Classify a request without routing it. Useful for debugging classification behavior.

Request

{
  "payload": {
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }
}

Response

{
  "complexity": "trivial",
  "domain": "general",
  "confidence": 0.95,
  "classifier_type": "rule_based",
  "required_capabilities": [],
  "reasoning": null,
  "signals": [
    {"name": "message_length", "strength": 0.1, "influences": "complexity"}
  ]
}
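When debugging a classification, the signal with the highest strength is usually the first thing to look at. A minimal sketch, assuming the response shape shown above (`strongest_signal` is a hypothetical helper name):

```python
from typing import Any, Dict, Optional


def strongest_signal(classification: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the signal with the highest strength, or None if there are none."""
    signals = classification.get("signals") or []
    return max(signals, key=lambda s: s["strength"], default=None)


# Sample data in the shape of the classify response; the second signal is invented.
sample = {
    "complexity": "trivial",
    "domain": "general",
    "signals": [
        {"name": "message_length", "strength": 0.1, "influences": "complexity"},
        {"name": "example_signal", "strength": 0.4, "influences": "domain"},
    ],
}
print(strongest_signal(sample))
```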

Simulate (Diagnostic)

POST /admin/smart-routing/simulate

Simulate the full routing pipeline (classification + policy evaluation + model selection + load state) without actually forwarding the request. Returns the complete routing decision chain.

Request

Same as the classify endpoint.

Response

{
  "routed": true,
  "target_model": "gpt-4o-mini",
  "classification": {
    "complexity": "simple",
    "domain": "general",
    "confidence": 0.92,
    "classifier_type": "rule_based"
  },
  "policy": {
    "name": "trivial_to_lightweight",
    "tier": 3,
    "prefer_domains": [],
    "require_capabilities": []
  },
  "load_state": "normal",
  "classification_duration_ms": 0.05,
  "available_models": 5
}
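The decision chain in this response can be condensed into a single log line. The sketch below assumes the field names shown above and assumes that `classification`, `policy`, and `target_model` may be absent when `routed` is false; `describe_simulation` is a hypothetical helper name.

```python
from typing import Any, Dict


def describe_simulation(result: Dict[str, Any]) -> str:
    """Summarize a simulate response as classification -> policy -> model."""
    # Assumption: the remaining fields may be missing when nothing was routed.
    if not result.get("routed"):
        return "not routed"
    c = result["classification"]
    p = result["policy"]
    return (
        f"{c['complexity']}/{c['domain']} -> policy {p['name']} "
        f"(tier {p['tier']}) -> {result['target_model']}"
    )


sample = {
    "routed": True,
    "target_model": "gpt-4o-mini",
    "classification": {"complexity": "simple", "domain": "general"},
    "policy": {"name": "trivial_to_lightweight", "tier": 3},
}
print(describe_simulation(sample))
```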

List Routing Policies

GET /admin/smart-routing/policies

Returns the currently active routing policies with their conditions and targets.

Update Routing Policies

PUT /admin/smart-routing/policies

Hot-reload routing policies at runtime.

Request

{
  "routing_policies": [
    {
      "name": "all_to_flagship",
      "when": {},
      "route_to": {"tier": 1}
    }
  ],
  "virtual_model": "auto",
  "intercept_all": false
}

Load State

GET /admin/smart-routing/load-state

Returns the current load state with assessment details.

Response

{
  "enabled": true,
  "state": "normal",
  "max_tier": null,
  "prefer_quantized": false,
  "reject_expert": false
}

Cache Stats

GET /admin/smart-routing/cache/stats

Returns LLM classifier cache statistics.

Response

{
  "available": true,
  "entries": 42,
  "capacity": 10000,
  "ttl_seconds": 300
}

Clear Cache

POST /admin/smart-routing/cache/clear

Clear all entries from the LLM classifier cache.

Response

{
  "status": "cleared",
  "entries_removed": 42
}

Guardrail Admin APIs

The Guardrail Admin APIs let you inspect and adjust the content-safety guardrail policy at runtime without a restart. Changes propagate through the same hot-reload config channel that the running GuardrailService subscribes to, so a mode switch, an enabled toggle, a threshold change, or a route override takes effect on the live request path immediately. All endpoints are mounted under /admin/guardrails and require the same authentication and audit logging as the rest of the Admin API.

The guardrail provider set itself is defined in the configuration file; these endpoints toggle and tune the existing providers and the global/per-route policy. They do not create or remove providers.

Get Guardrail Policy

GET /admin/guardrails

Returns the effective guardrail policy and a status summary. Secrets (the bypass_api_keys list) are masked. service_active is false when guardrails were disabled at startup and no service is running; in that case the returned policy is the configured policy but no checks execute.

Response

{
  "enabled": true,
  "mode": "enforce",
  "service_active": true,
  "registered_providers": ["openai-moderation", "llama-guard"],
  "policy": {
    "enabled": true,
    "mode": "enforce",
    "timeout_ms": 2000,
    "on_error": "fail_open",
    "block_behavior": "content_filter",
    "streaming_mode": "buffer_full",
    "providers": [ ... ],
    "routes": { ... },
    "bypass_api_keys": ["su...(24 chars)"],
    "allow": { "exact": [], "regex": [] },
    "deny": { "exact": [], "regex": [] }
  }
}

Example

curl http://localhost:8080/admin/guardrails \
  -H "Authorization: Bearer <admin-token>"

Update Guardrail Policy

PATCH /admin/guardrails

Partially updates the global guardrail policy. Every field is optional; only the provided fields change. Providers and per-route overrides are managed through their own endpoints below. The candidate policy is validated before it is applied; an invalid change (e.g. timeout_ms: 0, or enabling enforce mode with no providers) returns 400 and leaves the running policy unchanged.

Request Body

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | Toggle guardrails on/off globally |
| mode | string | monitor or enforce |
| timeout_ms | integer | Global guardrail timeout in milliseconds |
| on_error | string | fail_open or fail_closed |
| block_behavior | string | content_filter, error, or refusal_message |
| streaming_mode | string | buffer_full, chunked, or passthrough |
| streaming_chunk_size | integer | chunked: characters of new text to accumulate before each incremental check (default 200) |
| streaming_context_size | integer | chunked: trailing characters carried into each check for cross-boundary context (default 50) |
| streaming_stream_first | boolean | chunked: emit each window before checking it (true) or check before emitting (false, default) |
| allow | object | Replace the global allow list ({ "exact": [], "regex": [] }) |
| deny | object | Replace the global deny list |
| bypass_api_keys | array | Replace the bypass API key list |

Response

{
  "status": "updated",
  "enabled": true,
  "mode": "enforce"
}

Example

curl -X PATCH http://localhost:8080/admin/guardrails \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"mode": "enforce"}'
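A PATCH body can be checked against the documented field names and enum values before it is sent. The sketch below covers only what the Request Body table states; the server performs the real validation, and `check_guardrail_patch` is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Enum values copied from the Request Body table.
ALLOWED_VALUES = {
    "mode": {"monitor", "enforce"},
    "on_error": {"fail_open", "fail_closed"},
    "block_behavior": {"content_filter", "error", "refusal_message"},
    "streaming_mode": {"buffer_full", "chunked", "passthrough"},
}
KNOWN_FIELDS = set(ALLOWED_VALUES) | {
    "enabled", "timeout_ms", "streaming_chunk_size", "streaming_context_size",
    "streaming_stream_first", "allow", "deny", "bypass_api_keys",
}


def check_guardrail_patch(patch: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a PATCH /admin/guardrails body."""
    problems = [f"unknown field: {name}" for name in patch if name not in KNOWN_FIELDS]
    for name, allowed in ALLOWED_VALUES.items():
        if name in patch and patch[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    # The document names timeout_ms: 0 as invalid; rejecting negatives is an assumption.
    if "timeout_ms" in patch:
        timeout = patch["timeout_ms"]
        if not isinstance(timeout, int) or timeout <= 0:
            problems.append("timeout_ms must be a positive integer")
    return problems


print(check_guardrail_patch({"mode": "block", "timeout_ms": 0}))
```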

Update Guardrail Provider

PUT /admin/guardrails/providers/{name}

Updates the runtime settings of a single configured provider. All fields are optional. Returns 404 if no provider with the given name is configured.

Request Body

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | Enable or disable this provider |
| category_thresholds | object | Replace the per-category score thresholds ({ "violence": 0.8 }) |
| timeout_ms | integer or null | Set or clear the per-provider timeout override |
| on_error | string or null | Set or clear the per-provider error policy override |

Response

{
  "status": "updated",
  "provider": "llama-guard",
  "enabled": false
}

Example

curl -X PUT http://localhost:8080/admin/guardrails/providers/llama-guard \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
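Because `timeout_ms` and `on_error` accept `null` to clear an override, a client needs to distinguish "omit this field" from "send null". One way to do that, sketched here with a sentinel value, assumes that omitted fields leave the provider setting unchanged; `provider_update_body` is a hypothetical helper name.

```python
import json
from typing import Any, Dict

_UNSET = object()  # sentinel meaning "leave this field out of the request"


def provider_update_body(
    enabled: Any = _UNSET,
    timeout_ms: Any = _UNSET,
    on_error: Any = _UNSET,
) -> Dict[str, Any]:
    """Build a provider update body, keeping explicit None values as JSON null."""
    fields = {"enabled": enabled, "timeout_ms": timeout_ms, "on_error": on_error}
    return {name: value for name, value in fields.items() if value is not _UNSET}


# Clear the timeout override and leave everything else untouched.
print(json.dumps(provider_update_body(timeout_ms=None)))  # {"timeout_ms": null}
```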

Set Guardrail Route Override

PUT /admin/guardrails/routes/{route}

Creates or replaces the per-route guardrail override for the given route. The request body is a route override object; any omitted field inherits the global policy.

Request Body

| Field | Type | Description |
| --- | --- | --- |
| mode | string | Override the operating mode for this route |
| enabled | boolean | Override whether guardrails run for this route |
| providers | array | Restrict this route to a subset of provider names |
| category_thresholds | object | Per-route category thresholds |
| allow | object | Route-specific allow list |
| deny | object | Route-specific deny list |

Response

{
  "status": "updated",
  "route": "gpt-4o"
}

Example

curl -X PUT http://localhost:8080/admin/guardrails/routes/gpt-4o \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"mode": "monitor"}'
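The inheritance rule (omitted fields fall back to the global policy) can be pictured as an overlay. The sketch below uses a shallow merge purely for illustration. How the router actually resolves nested values such as `allow`, `deny`, or `category_thresholds` is not specified here, and `effective_route_policy` is a hypothetical helper name.

```python
from typing import Any, Dict


def effective_route_policy(
    global_policy: Dict[str, Any], override: Dict[str, Any]
) -> Dict[str, Any]:
    """Overlay a route override on the global policy (shallow merge, an assumption)."""
    merged = dict(global_policy)  # copy so the global policy is not modified
    # Only fields present in the override replace the global value.
    merged.update(override)
    return merged


global_policy = {"enabled": True, "mode": "enforce", "on_error": "fail_open"}
print(effective_route_policy(global_policy, {"mode": "monitor"}))
```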

Delete Guardrail Route Override

DELETE /admin/guardrails/routes/{route}

Removes the per-route override so that the route falls back to the global policy. Returns 404 if no override is configured for the route.

Response

{
  "status": "deleted",
  "route": "gpt-4o"
}

Test Guardrails (Dry Run)

POST /admin/guardrails/test

Diagnostic endpoint for threshold tuning. Runs every registered provider against the supplied sample text and returns each provider's verdict plus the aggregated most-severe-wins verdict. The dry run ignores the global mode and the bypass list so the raw provider output is visible; disabled providers (and those that do not apply to the requested stage) are reported as skipped. Returns 400 when no guardrail service is active.

Request Body

| Field | Type | Description |
| --- | --- | --- |
| text | string | The sample text to evaluate (required) |
| stage | string | input (default) or output |
| model | string | Optional model identifier for the evaluation context |
| route | string | Optional route name for the evaluation context |

Response

{
  "stage": "input",
  "providers": [
    {
      "provider": "openai-moderation",
      "skipped": false,
      "verdict": { "verdict": "allow" }
    },
    {
      "provider": "llama-guard",
      "skipped": false,
      "verdict": {
        "verdict": "block",
        "category": "violence",
        "score": 0.97,
        "reason": "..."
      }
    }
  ],
  "aggregated": {
    "verdict": "block",
    "category": "violence",
    "score": 0.97,
    "reason": "..."
  }
}

Example

curl -X POST http://localhost:8080/admin/guardrails/test \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"text": "sample prompt to evaluate", "stage": "input"}'
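When tuning thresholds, it is useful to see at a glance which providers blocked the sample and which were skipped. A minimal sketch, assuming the response shape shown above and assuming a skipped provider may carry no verdict object (`summarize_dry_run` is a hypothetical helper name):

```python
from typing import Any, Dict


def summarize_dry_run(result: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a dry-run response into the aggregate verdict and provider lists."""
    blocked, skipped = [], []
    for entry in result.get("providers", []):
        if entry.get("skipped"):
            skipped.append(entry["provider"])
        # A missing or null verdict is treated as "no verdict" (assumption).
        elif (entry.get("verdict") or {}).get("verdict") == "block":
            blocked.append(entry["provider"])
    return {
        "aggregated": result["aggregated"]["verdict"],
        "blocked_by": blocked,
        "skipped": skipped,
    }


sample = {
    "stage": "input",
    "providers": [
        {"provider": "openai-moderation", "skipped": False, "verdict": {"verdict": "allow"}},
        {"provider": "llama-guard", "skipped": False,
         "verdict": {"verdict": "block", "category": "violence", "score": 0.97}},
    ],
    "aggregated": {"verdict": "block", "category": "violence", "score": 0.97},
}
print(summarize_dry_run(sample))
```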

Data Models

Configuration Sections

| Section | Description | Hot Reload |
| --- | --- | --- |
| server | Bind address, workers, connection pool | Requires restart |
| backends | Backend URLs, weights, models | Gradual |
| health_checks | Intervals, thresholds | Gradual |
| logging | Log level, format, output | Immediate |
| retry | Max attempts, delays, backoff | Immediate |
| timeouts | Connect, request, idle timeouts | Gradual |
| rate_limiting | Limits, storage, whitelist | Immediate |
| circuit_breaker | Thresholds, recovery time | Immediate |
| global_prompts | System prompt injection | Immediate |
| fallback | Fallback chains, policies | Gradual |
| files | Files API settings | Gradual |
| api_keys | API key configuration | Immediate |
| metrics | Prometheus, labels | Gradual |
| admin | Admin API settings | Gradual |
| admin.stats | Stats collection settings | Immediate |
| routing | Model routing rules | Gradual |
| smart_routing | Model tier registry and profiles | Immediate |

Backend Object

{
  "name": "string",
  "url": "string (http:// or https://)",
  "api_key": "string (optional, masked in responses)",
  "weight": "integer (1-100)",
  "models": ["string"],
  "enabled": "boolean",
  "health_check": {
    "enabled": "boolean",
    "path": "string",
    "interval": "string (duration)"
  }
}

History Entry Object

{
  "version": "integer",
  "timestamp": "string (ISO 8601)",
  "sections_changed": ["string"],
  "source": "string (api|file_reload|initial|rollback)",
  "user": "string",
  "description": "string (optional)",
  "rollback_available": "boolean"
}
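A common use of history entries is picking a rollback target. The sketch below selects the newest version that reports `rollback_available`, assuming a list of entries in the shape above; `latest_rollback_version` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional


def latest_rollback_version(history: List[Dict[str, Any]]) -> Optional[int]:
    """Return the highest version that can be rolled back to, or None."""
    candidates = [e["version"] for e in history if e.get("rollback_available")]
    return max(candidates, default=None)


# Invented sample entries that follow the History Entry Object shape.
history = [
    {"version": 3, "source": "api", "rollback_available": False},
    {"version": 2, "source": "file_reload", "rollback_available": True},
    {"version": 1, "source": "initial", "rollback_available": True},
]
print(latest_rollback_version(history))
```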

Validation Result Object

{
  "valid": "boolean",
  "errors": [
    {
      "field": "string",
      "message": "string",
      "code": "string"
    }
  ],
  "warnings": [
    {
      "field": "string",
      "message": "string"
    }
  ]
}

Hot Reload Behavior

Update Types

| Type | Behavior | Sections |
| --- | --- | --- |
| Immediate | Applied instantly, no disruption | logging, rate_limiting, circuit_breaker, retry, global_prompts, api_keys, admin.stats, smart_routing |
| Gradual | Existing connections maintained, new connections use new config | backends, health_checks, timeouts, fallback, files, metrics, admin, routing |
| Requires Restart | Logged as warning, requires server restart | server.bind_address, server.workers |
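A client can use these categories to warn an operator before sending a change that only takes effect after a restart. The lookup below is a sketch built from the section tables in this document; resolving a dotted name such as `server.workers` through its parent section is an assumption, and `reload_behavior` is a hypothetical helper name.

```python
# Section-to-behavior mapping copied from the tables in this document.
RELOAD_BEHAVIOR = {
    "server": "requires_restart",
    "backends": "gradual", "health_checks": "gradual", "timeouts": "gradual",
    "fallback": "gradual", "files": "gradual", "metrics": "gradual",
    "admin": "gradual", "routing": "gradual",
    "logging": "immediate", "retry": "immediate", "rate_limiting": "immediate",
    "circuit_breaker": "immediate", "global_prompts": "immediate",
    "api_keys": "immediate", "admin.stats": "immediate", "smart_routing": "immediate",
}


def reload_behavior(section: str) -> str:
    """Look up how a section (or dotted sub-key) is hot-reloaded."""
    name = section
    while name:
        if name in RELOAD_BEHAVIOR:
            return RELOAD_BEHAVIOR[name]
        # Fall back to the parent: "server.workers" -> "server" (assumption).
        name = name.rpartition(".")[0]
    raise KeyError(section)


print(reload_behavior("server.workers"))
```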

Example Workflow

# 1. Check current configuration
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

# 2. Validate change
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"section": "logging", "config": {"level": "debug"}}'

# 3. Apply change (immediate effect)
curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"level": "debug"}}'

# 4. Verify change
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.config.level'

Error Handling

Error Response Format

{
  "error_code": "string",
  "message": "string",
  "details": {}
}

Error Codes

| Code | HTTP Status | Description |
| --- | --- | --- |
| VALIDATION_ERROR | 400 | Configuration validation failed |
| INVALID_SECTION | 400 | Unknown configuration section |
| PARSE_ERROR | 400 | Failed to parse configuration content |
| SECTION_NOT_FOUND | 404 | Section not found |
| VERSION_NOT_FOUND | 404 | History version not found |
| BACKEND_NOT_FOUND | 404 | Backend not found |
| BACKEND_EXISTS | 409 | Backend with name already exists |
| CONTENT_TOO_LARGE | 413 | Configuration content exceeds 1MB limit |
| INTERNAL_ERROR | 500 | Internal server error |

Error Examples

// Validation Error
{
  "error_code": "VALIDATION_ERROR",
  "message": "Configuration validation failed",
  "details": {
    "errors": [
      {"field": "workers", "message": "workers must be greater than 0"}
    ]
  }
}

// Section Not Found
{
  "error_code": "SECTION_NOT_FOUND",
  "message": "Configuration section 'invalid' not found",
  "details": {
    "available_sections": ["server", "backends", "logging", "..."]
  }
}

// Backend Exists
{
  "error_code": "BACKEND_EXISTS",
  "message": "Backend 'openai' already exists",
  "details": {
    "existing_backend": "openai"
  }
}
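On the client side, these bodies map naturally onto an exception type. The sketch below is one possible shape, not part of any official SDK: `AdminApiError` and the `"UNKNOWN"` placeholder code are hypothetical. It also tolerates the `{"success": false, "error": "..."}` shape that some endpoints in this document return.

```python
from typing import Any, Dict


class AdminApiError(Exception):
    """Client-side error for a failed Admin API call (hypothetical type)."""

    def __init__(self, status: int, error_code: str, message: str, details: Dict[str, Any]):
        super().__init__(f"{status} {error_code}: {message}")
        self.status = status
        self.error_code = error_code
        self.message = message
        self.details = details

    @classmethod
    def from_body(cls, status: int, body: Dict[str, Any]) -> "AdminApiError":
        """Build the error from an HTTP status and a decoded JSON body."""
        return cls(
            status=status,
            # "UNKNOWN" is a placeholder used when no error_code is present.
            error_code=body.get("error_code", "UNKNOWN"),
            # Fall back to the {"success": false, "error": "..."} shape.
            message=body.get("message") or body.get("error") or f"HTTP {status}",
            details=body.get("details") or {},
        )


err = AdminApiError.from_body(409, {
    "error_code": "BACKEND_EXISTS",
    "message": "Backend 'openai' already exists",
    "details": {"existing_backend": "openai"},
})
print(err)
```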

Client SDK Examples

Python

import requests
from typing import Optional, Dict, Any, List
from dataclasses import dataclass


@dataclass
class ContinuumAdminClient:
    """Continuum Router Admin API Client"""

    base_url: str
    token: str

    def __post_init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json"
        })

    # Configuration Query APIs

    def get_full_config(self) -> Dict[str, Any]:
        """Get full configuration with masked sensitive data"""
        resp = self.session.get(f"{self.base_url}/admin/config/full")
        resp.raise_for_status()
        return resp.json()

    def get_sections(self) -> List[Dict[str, Any]]:
        """Get all configuration sections"""
        resp = self.session.get(f"{self.base_url}/admin/config/sections")
        resp.raise_for_status()
        return resp.json()["sections"]

    def get_section(self, section: str) -> Dict[str, Any]:
        """Get configuration for a specific section"""
        resp = self.session.get(f"{self.base_url}/admin/config/{section}")
        resp.raise_for_status()
        return resp.json()

    def get_schema(self, section: Optional[str] = None) -> Dict[str, Any]:
        """Get JSON schema for validation"""
        params = {"section": section} if section else {}
        resp = self.session.get(
            f"{self.base_url}/admin/config/schema",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Modification APIs

    def update_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Replace section configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def patch_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Partial update section configuration"""
        resp = self.session.patch(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def validate_config(
        self,
        section: str,
        config: Dict[str, Any],
        dry_run: bool = True
    ) -> Dict[str, Any]:
        """Validate configuration without applying"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/validate",
            json={"section": section, "config": config, "dry_run": dry_run}
        )
        resp.raise_for_status()
        return resp.json()

    def apply_config(
        self,
        sections: Optional[List[str]] = None,
        force: bool = False
    ) -> Dict[str, Any]:
        """Apply pending configuration changes"""
        body = {"force": force}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/apply",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Save/Restore APIs

    def export_config(
        self,
        format: str = "yaml",
        sections: Optional[List[str]] = None,
        include_sensitive: bool = False
    ) -> str:
        """Export configuration in specified format"""
        body = {"format": format, "include_sensitive": include_sensitive}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/export",
            json=body
        )
        resp.raise_for_status()
        return resp.json()["content"]

    def import_config(
        self,
        content: str,
        format: str = "yaml",
        apply: bool = True,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Import configuration from content"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/import",
            json={
                "format": format,
                "content": content,
                "apply": apply,
                "dry_run": dry_run
            }
        )
        resp.raise_for_status()
        return resp.json()

    def get_history(
        self,
        limit: int = 20,
        offset: int = 0,
        section: Optional[str] = None
    ) -> Dict[str, Any]:
        """Get configuration change history"""
        params = {"limit": limit, "offset": offset}
        if section:
            params["section"] = section
        resp = self.session.get(
            f"{self.base_url}/admin/config/history",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def rollback(
        self,
        version: int,
        sections: Optional[List[str]] = None,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Rollback to a previous version"""
        body = {"dry_run": dry_run}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/rollback/{version}",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Backend Management APIs

    def list_backends(self) -> List[Dict[str, Any]]:
        """List all backends"""
        resp = self.session.get(f"{self.base_url}/admin/backends")
        resp.raise_for_status()
        return resp.json()["backends"]

    def get_backend(self, name: str) -> Dict[str, Any]:
        """Get backend configuration"""
        resp = self.session.get(f"{self.base_url}/admin/backends/{name}")
        resp.raise_for_status()
        return resp.json()

    def add_backend(
        self,
        name: str,
        url: str,
        weight: int = 1,
        models: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """Add a new backend"""
        body = {"name": name, "url": url, "weight": weight}
        if models:
            body["models"] = models
        resp = self.session.post(
            f"{self.base_url}/admin/backends",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend(self, name: str, **kwargs) -> Dict[str, Any]:
        """Update backend configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}",
            json=kwargs
        )
        resp.raise_for_status()
        return resp.json()

    def delete_backend(self, name: str, force: bool = False) -> Dict[str, Any]:
        """Delete a backend"""
        params = {"force": str(force).lower()} if force else {}
        resp = self.session.delete(
            f"{self.base_url}/admin/backends/{name}",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_weight(self, name: str, weight: int) -> Dict[str, Any]:
        """Update backend weight"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/weight",
            json={"weight": weight}
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_models(
        self,
        name: str,
        models: List[str],
        append: bool = False
    ) -> Dict[str, Any]:
        """Update backend models"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/models",
            json={"models": models, "append": append}
        )
        resp.raise_for_status()
        return resp.json()


# Usage Example
if __name__ == "__main__":
    client = ContinuumAdminClient(
        base_url="http://localhost:8080",
        token="your-admin-token"
    )

    # Get current logging config
    logging_config = client.get_section("logging")
    print(f"Current log level: {logging_config['config']['level']}")

    # Update logging level
    result = client.patch_section("logging", {"level": "debug"})
    print(f"Updated: {result['success']}")

    # Add a new backend
    client.add_backend(
        name="new-ollama",
        url="http://192.168.1.100:11434",
        weight=2,
        models=["llama3.2", "mistral"]
    )

    # Export configuration backup
    backup = client.export_config(format="yaml")
    with open("config-backup.yaml", "w") as f:
        f.write(backup)

JavaScript/TypeScript

interface ConfigSection {
  name: string;
  config: Record<string, any>;
  hot_reload_capability: 'immediate' | 'gradual' | 'requires_restart';
}

interface HistoryEntry {
  version: number;
  timestamp: string;
  sections_changed: string[];
  source: string;
  user: string;
}

interface Backend {
  name: string;
  url: string;
  weight: number;
  models: string[];
  enabled: boolean;
  health_status: string;
}

class ContinuumAdminClient {
  private baseUrl: string;
  private token: string;

  constructor(baseUrl: string, token: string) {
    this.baseUrl = baseUrl;
    this.token = token;
  }

  private async request<T>(
    method: string,
    path: string,
    body?: any,
    params?: Record<string, string>
  ): Promise<T> {
    const url = new URL(`${this.baseUrl}${path}`);
    if (params) {
      Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
    }

    const response = await fetch(url.toString(), {
      method,
      headers: {
        'Authorization': `Bearer ${this.token}`,
        'Content-Type': 'application/json',
      },
      body: body ? JSON.stringify(body) : undefined,
    });

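    if (!response.ok) {
      // The error body may not be JSON (e.g. a proxy error page), so fall back to the status code.
      const error = await response.json().catch(() => ({}));
      throw new Error(error.message || `HTTP ${response.status}`);
    }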

    return response.json();
  }

  // Configuration Query APIs

  async getFullConfig(): Promise<any> {
    return this.request('GET', '/admin/config/full');
  }

  async getSections(): Promise<ConfigSection[]> {
    const result = await this.request<{ sections: ConfigSection[] }>(
      'GET', '/admin/config/sections'
    );
    return result.sections;
  }

  async getSection(section: string): Promise<ConfigSection> {
    return this.request('GET', `/admin/config/${section}`);
  }

  async getSchema(section?: string): Promise<any> {
    const params = section ? { section } : undefined;
    return this.request('GET', '/admin/config/schema', undefined, params);
  }

  // Configuration Modification APIs

  async updateSection(section: string, config: Record<string, any>): Promise<any> {
    return this.request('PUT', `/admin/config/${section}`, { config });
  }

  async patchSection(section: string, config: Record<string, any>): Promise<any> {
    return this.request('PATCH', `/admin/config/${section}`, { config });
  }

  async validateConfig(
    section: string,
    config: Record<string, any>,
    dryRun: boolean = true
  ): Promise<any> {
    return this.request('POST', '/admin/config/validate', {
      section,
      config,
      dry_run: dryRun,
    });
  }

  async applyConfig(sections?: string[], force: boolean = false): Promise<any> {
    return this.request('POST', '/admin/config/apply', { sections, force });
  }

  // Configuration Save/Restore APIs

  async exportConfig(
    format: 'yaml' | 'json' | 'toml' = 'yaml',
    sections?: string[],
    includeSensitive: boolean = false
  ): Promise<string> {
    const result = await this.request<{ content: string }>(
      'POST', '/admin/config/export',
      { format, sections, include_sensitive: includeSensitive }
    );
    return result.content;
  }

  async importConfig(
    content: string,
    format: 'yaml' | 'json' | 'toml' = 'yaml',
    apply: boolean = true,
    dryRun: boolean = false
  ): Promise<any> {
    return this.request('POST', '/admin/config/import', {
      format,
      content,
      apply,
      dry_run: dryRun,
    });
  }

  async getHistory(
    limit: number = 20,
    offset: number = 0,
    section?: string
  ): Promise<{ history: HistoryEntry[]; total_entries: number }> {
    const params: Record<string, string> = {
      limit: limit.toString(),
      offset: offset.toString(),
    };
    if (section) params.section = section;
    return this.request('GET', '/admin/config/history', undefined, params);
  }

  async rollback(
    version: number,
    sections?: string[],
    dryRun: boolean = false
  ): Promise<any> {
    return this.request('POST', `/admin/config/rollback/${version}`, {
      sections,
      dry_run: dryRun,
    });
  }

  // Backend Management APIs

  async listBackends(): Promise<Backend[]> {
    const result = await this.request<{ backends: Backend[] }>(
      'GET', '/admin/backends'
    );
    return result.backends;
  }

  async getBackend(name: string): Promise<Backend> {
    return this.request('GET', `/admin/backends/${name}`);
  }

  async addBackend(
    name: string,
    url: string,
    weight: number = 1,
    models?: string[]
  ): Promise<any> {
    return this.request('POST', '/admin/backends', {
      name,
      url,
      weight,
      models,
    });
  }

  async updateBackend(name: string, updates: Partial<Backend>): Promise<any> {
    return this.request('PUT', `/admin/backends/${name}`, updates);
  }

  async deleteBackend(name: string, force: boolean = false): Promise<any> {
    const params = force ? { force: 'true' } : undefined;
    return this.request('DELETE', `/admin/backends/${name}`, undefined, params);
  }

  async updateBackendWeight(name: string, weight: number): Promise<any> {
    return this.request('PUT', `/admin/backends/${name}/weight`, { weight });
  }

  async updateBackendModels(
    name: string,
    models: string[],
    append: boolean = false
  ): Promise<any> {
    return this.request('PUT', `/admin/backends/${name}/models`, {
      models,
      append,
    });
  }
}

// Usage Example
async function main() {
  const client = new ContinuumAdminClient(
    'http://localhost:8080',
    'your-admin-token'
  );

  // Get current logging config
  const loggingConfig = await client.getSection('logging');
  console.log(`Current log level: ${loggingConfig.config.level}`);

  // Update logging level
  const result = await client.patchSection('logging', { level: 'debug' });
  console.log(`Updated: ${result.success}`);

  // Add a new backend
  await client.addBackend('new-ollama', 'http://192.168.1.100:11434', 2, [
    'llama3.2',
    'mistral',
  ]);

  // Export configuration backup
  const backup = await client.exportConfig('yaml');
  console.log(`Configuration exported (${backup.length} characters)`);
}

main().catch(console.error);

Go

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "net/url"
)

type ContinuumAdminClient struct {
    BaseURL string
    Token   string
    client  *http.Client
}

func NewClient(baseURL, token string) *ContinuumAdminClient {
    return &ContinuumAdminClient{
        BaseURL: baseURL,
        Token:   token,
        client:  &http.Client{},
    }
}

func (c *ContinuumAdminClient) request(method, path string, body interface{}) (map[string]interface{}, error) {
    var reqBody io.Reader
    if body != nil {
        jsonBody, err := json.Marshal(body)
        if err != nil {
            return nil, err
        }
        reqBody = bytes.NewBuffer(jsonBody)
    }

    req, err := http.NewRequest(method, c.BaseURL+path, reqBody)
    if err != nil {
        return nil, err
    }

    req.Header.Set("Authorization", "Bearer "+c.Token)
    req.Header.Set("Content-Type", "application/json")

    resp, err := c.client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

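    // Read the whole body first so a non-JSON error page still surfaces
    // the HTTP status instead of a JSON decoding error.
    data, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }

    if resp.StatusCode >= 400 {
        return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, data)
    }

    var result map[string]interface{}
    if err := json.Unmarshal(data, &result); err != nil {
        return nil, err
    }

    return result, nil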
}

// GetFullConfig retrieves the full configuration
func (c *ContinuumAdminClient) GetFullConfig() (map[string]interface{}, error) {
    return c.request("GET", "/admin/config/full", nil)
}

// GetSection retrieves a specific configuration section
func (c *ContinuumAdminClient) GetSection(section string) (map[string]interface{}, error) {
    return c.request("GET", "/admin/config/"+section, nil)
}

// PatchSection partially updates a configuration section
func (c *ContinuumAdminClient) PatchSection(section string, config map[string]interface{}) (map[string]interface{}, error) {
    return c.request("PATCH", "/admin/config/"+section, map[string]interface{}{
        "config": config,
    })
}

// AddBackend adds a new backend
func (c *ContinuumAdminClient) AddBackend(name, backendURL string, weight int, models []string) (map[string]interface{}, error) {
    return c.request("POST", "/admin/backends", map[string]interface{}{
        "name":   name,
        "url":    backendURL,
        "weight": weight,
        "models": models,
    })
}

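// ExportConfig exports configuration in the specified format
func (c *ContinuumAdminClient) ExportConfig(format string) (string, error) {
    result, err := c.request("POST", "/admin/config/export", map[string]interface{}{
        "format": format,
    })
    if err != nil {
        return "", err
    }
    // Use the comma-ok form so a missing or non-string "content" field
    // returns an error instead of panicking.
    content, ok := result["content"].(string)
    if !ok {
        return "", fmt.Errorf("export response has no string \"content\" field")
    }
    return content, nil
}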

// GetHistory retrieves configuration change history
func (c *ContinuumAdminClient) GetHistory(limit int) (map[string]interface{}, error) {
    // Build only the query string here; request() prepends BaseURL, so
    // re-parsing BaseURL would duplicate any path prefix it contains.
    q := url.Values{}
    q.Set("limit", fmt.Sprintf("%d", limit))

    return c.request("GET", "/admin/config/history?"+q.Encode(), nil)
}

func main() {
    client := NewClient("http://localhost:8080", "your-admin-token")

    // Get current logging config
    config, err := client.GetSection("logging")
    if err != nil {
        fmt.Println("get section failed:", err)
        return
    }
    fmt.Printf("Current config: %v\n", config)

    // Update logging level
    result, err := client.PatchSection("logging", map[string]interface{}{
        "level": "debug",
    })
    if err != nil {
        fmt.Println("patch section failed:", err)
        return
    }
    fmt.Printf("Update result: %v\n", result)

    // Add a new backend
    if _, err := client.AddBackend("new-ollama", "http://192.168.1.100:11434", 2, []string{"llama3.2"}); err != nil {
        fmt.Println("add backend failed:", err)
        return
    }

    // Export configuration
    backup, err := client.ExportConfig("yaml")
    if err != nil {
        fmt.Println("export failed:", err)
        return
    }
    fmt.Println("Configuration exported")
    fmt.Println(backup)
}

Best Practices

1. Always Validate Before Applying

# Step 1: Validate
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"section": "logging", "config": {"level": "debug"}}'

# Step 2: Apply only if valid
curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"level": "debug"}}'
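
The two steps above can be chained in client code so that the PATCH is never sent after a failed validation. The following is a minimal Python sketch, not part of the official client: `validate_then_apply` and the `send` transport callable are hypothetical names, and the boolean `valid` field in the validation response is an assumption.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def validate_then_apply(
    send: Transport, section: str, config: Dict[str, Any]
) -> Dict[str, Any]:
    """Validate a section change and PATCH it only if validation passes."""
    report = send(
        "POST", "/admin/config/validate", {"section": section, "config": config}
    )
    # Assumption: the validate response carries a boolean "valid" field.
    if not report.get("valid", False):
        raise ValueError(f"validation failed for section {section!r}: {report}")
    return send("PATCH", f"/admin/config/{section}", {"config": config})
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library; in practice it could wrap the `request` method of one of the SDK clients shown earlier.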

2. Use Dry Run for Imports

# Preview import changes
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "yaml",
    "content": "...",
    "dry_run": true
  }'
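
Embedding a whole configuration file in the `content` field by hand is error-prone because quotes and newlines must be JSON-escaped. The sketch below builds the import request body in Python; `build_import_payload` is a hypothetical helper, and the accepted format names are taken from the formats listed in the Overview.

```python
import json


def build_import_payload(content: str, fmt: str = "yaml", dry_run: bool = True) -> str:
    """Return the JSON body for POST /admin/config/import.

    dry_run defaults to True so that a preview is the default behaviour.
    """
    if fmt not in ("yaml", "json", "toml"):
        raise ValueError(f"unsupported format: {fmt!r}")
    # json.dumps escapes quotes and newlines inside the configuration text.
    return json.dumps({"format": fmt, "content": content, "dry_run": dry_run})
```

Defaulting `dry_run` to true means a real import has to be requested explicitly, which matches the preview-first workflow recommended here.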

3. Regular Configuration Backups

#!/bin/bash
# Daily backup script (expects TOKEN to be set in the environment)
DATE=$(date +%Y%m%d)
curl -s -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "yaml"}' | jq -r '.content' > "config-backup-$DATE.yaml"

4. Monitor Configuration History

# Check recent changes
curl -s "http://localhost:8080/admin/config/history?limit=5" \
  -H "Authorization: Bearer $TOKEN" | jq '.history[] | {version, timestamp, sections_changed}'
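
For monitoring beyond a quick look, the history response can be summarized in code. This Python sketch counts how often each section appears in the returned entries; `count_section_changes` is a hypothetical helper, and it assumes `sections_changed` is a list of section names, as the `jq` filter above suggests.

```python
from collections import Counter
from typing import Any, Dict


def count_section_changes(history_response: Dict[str, Any]) -> Counter:
    """Count how many history entries touched each configuration section."""
    counts: Counter = Counter()
    for entry in history_response.get("history", []):
        # Assumption: "sections_changed" is a list of section names.
        counts.update(entry.get("sections_changed", []))
    return counts
```

A section that shows up far more often than expected is a useful starting point when investigating unplanned changes.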

5. Use Partial Updates (PATCH) for Minimal Changes

# Only update what's needed
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"requests_per_minute": 200}}'

6. Test Configuration Changes in Staging First

# Example: Test configuration in staging before production
staging_client = ContinuumAdminClient("http://staging:8080", staging_token)
production_client = ContinuumAdminClient("http://production:8080", prod_token)

# Apply to staging first
staging_client.patch_section("rate_limiting", {"requests_per_minute": 500})

# Verify in staging
staging_config = staging_client.get_section("rate_limiting")
assert staging_config["config"]["requests_per_minute"] == 500

# Then apply to production
production_client.patch_section("rate_limiting", {"requests_per_minute": 500})

Security Considerations

1. Sensitive Data Handling

  • All API responses automatically mask sensitive fields (API keys, passwords, tokens)
  • Use include_sensitive: true in export only when absolutely necessary
  • Audit logs record when sensitive data is accessed

2. Authentication Best Practices

admin:
  auth:
    method: bearer_token
    token: "${ADMIN_TOKEN}"  # Use environment variables

  # Restrict access by IP
  ip_whitelist:
    - "10.0.0.0/8"      # Internal network only
    - "192.168.1.0/24"  # Office network
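
To check in advance which client addresses a whitelist like the one above would admit, the CIDR ranges can be evaluated with Python's standard `ipaddress` module. This is an illustrative sketch only: `ip_allowed` is a hypothetical helper, and the router's own matching logic is not documented here.

```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any CIDR range in whitelist."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in ipaddress.ip_network(cidr) for cidr in whitelist)


ranges = ["10.0.0.0/8", "192.168.1.0/24"]
print(ip_allowed("10.20.30.40", ranges))   # inside 10.0.0.0/8
print(ip_allowed("192.168.2.10", ranges))  # outside both ranges
```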

3. Audit Logging

All configuration changes are logged with:

  • Timestamp
  • User/source
  • Changed sections
  • Previous and new values (sensitive data masked)

4. Rate Limiting Admin Endpoints

Consider rate limiting admin endpoints to prevent abuse:

admin:
  rate_limit:
    requests_per_minute: 60
    burst: 10
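
How `requests_per_minute` and `burst` interact is not specified in this document. One common interpretation is a token bucket: the bucket holds up to `burst` tokens and refills at `requests_per_minute / 60` tokens per second. The Python sketch below illustrates that interpretation only; `TokenBucket` is a hypothetical class and may not match the router's actual algorithm.

```python
class TokenBucket:
    """Token bucket: capacity `burst`, refilled at requests_per_minute / 60 per second."""

    def __init__(self, requests_per_minute: int, burst: int, now: float = 0.0):
        self.rate = requests_per_minute / 60.0  # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this reading, the settings above allow 10 back-to-back requests, after which one further request is admitted per second.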

5. Backup Before Major Changes

# Always backup before major changes
backup=$(curl -s -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "yaml"}' | jq -r '.content')

# Make changes...

# Restore if needed
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"format\": \"yaml\", \"content\": $(printf '%s' "$backup" | jq -Rs .)}"
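
The same backup-then-restore pattern can be wrapped around a change in client code so that the restore runs automatically on failure. This is a minimal Python sketch: `apply_with_backup` and the `send` transport callable are hypothetical, while the export and import request shapes follow the curl commands above.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def apply_with_backup(send: Transport, change: Callable[[], Any]) -> Any:
    """Export the configuration, run `change`, and re-import the backup on failure."""
    backup = send("POST", "/admin/config/export", {"format": "yaml"})["content"]
    try:
        return change()
    except Exception:
        # Restore the exported configuration, then surface the original error.
        send("POST", "/admin/config/import", {"format": "yaml", "content": backup})
        raise
```

The original exception is re-raised after the restore so that callers still see why the change failed.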

Prompt File Management APIs

The Prompt File Management API allows you to manage system prompts stored in external Markdown files. This enables centralized management of system prompts without modifying the main configuration file.

List All Prompts

Get a list of all configured prompts with their sources and content.

GET /admin/config/prompts

Response

{
  "prompts": [
    {
      "id": "default",
      "prompt_type": "default",
      "source": "file",
      "file_path": "prompts/system.md",
      "content": "# System Prompt\n\nYou are a helpful assistant...",
      "loaded": true,
      "size_bytes": 1024
    },
    {
      "id": "anthropic",
      "prompt_type": "backend",
      "source": "file",
      "file_path": "prompts/anthropic.md",
      "content": "# Anthropic-specific prompt...",
      "loaded": true,
      "size_bytes": 512
    },
    {
      "id": "gpt-4",
      "prompt_type": "model",
      "source": "inline",
      "content": "You are GPT-4...",
      "size_bytes": 256
    }
  ],
  "total": 3,
  "prompts_directory": "./prompts"
}

Example

curl -s http://localhost:8080/admin/config/prompts \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq
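
The list response can also be checked programmatically, for example to spot file-backed prompts that did not load. The Python sketch below relies only on the `id`, `source`, and `loaded` fields shown in the response above; `unloaded_file_prompts` is a hypothetical helper, and treating a missing or false `loaded` flag as a load failure is an assumption.

```python
from typing import Any, Dict, List


def unloaded_file_prompts(response: Dict[str, Any]) -> List[str]:
    """Return ids of file-backed prompts whose "loaded" flag is not true."""
    return [
        prompt["id"]
        for prompt in response.get("prompts", [])
        # Inline prompts have no file to load, so only "file" sources are checked.
        if prompt.get("source") == "file" and not prompt.get("loaded", False)
    ]
```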

Get Prompt File

Get the content of a specific prompt file.

GET /admin/config/prompts/{path}

Path Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| path | string | Yes | Relative path to the prompt file |

Response

{
  "path": "prompts/system.md",
  "content": "# System Prompt\n\nYou are a helpful assistant that follows company policies...",
  "size_bytes": 1024,
  "modified_at": 1702468200
}

Example

curl -s http://localhost:8080/admin/config/prompts/prompts/system.md \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Update Prompt File

Create or update a prompt file with new content.

PUT /admin/config/prompts/{path}

Request Body

{
  "content": "# Updated System Prompt\n\nYou are a helpful assistant that follows all company policies.\n\n## Security Guidelines\n\n- Never reveal internal system details\n- Follow data privacy regulations"
}

Response

{
  "success": true,
  "path": "prompts/system.md",
  "size_bytes": 245,
  "message": "Prompt file updated successfully"
}

Example

curl -X PUT http://localhost:8080/admin/config/prompts/prompts/system.md \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "# System Prompt\n\nYou are a helpful assistant."
  }'

Reload Prompt Files

Reload all prompt files from disk. Useful after manual file edits.

POST /admin/config/prompts/reload

Response

{
  "success": true,
  "reloaded_count": 3,
  "reloaded": [
    "prompts/system.md",
    "prompts/anthropic.md",
    "prompts/gpt4.md"
  ],
  "errors": [],
  "message": "Successfully reloaded 3 prompt file(s)"
}

Example

curl -X POST http://localhost:8080/admin/config/prompts/reload \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq

Configuration Example

To use external prompt files, configure global_prompts in your config file:

global_prompts:
  # Directory containing prompt files (relative to config directory)
  prompts_dir: "./prompts"

  # Default prompt from external file
  default_file: "system.md"

  # Or inline prompt (default_file takes precedence if both specified)
  # default: "You are a helpful assistant."

  # Backend-specific prompts
  backends:
    anthropic:
      prompt_file: "anthropic-system.md"
    openai:
      prompt: "OpenAI-specific inline prompt"

  # Model-specific prompts
  models:
    gpt-4:
      prompt_file: "gpt4-system.md"
    claude-3-opus:
      prompt_file: "claude-opus-system.md"

  merge_strategy: prepend

Security Considerations

  • Path Traversal Protection: All paths are validated to prevent directory traversal attacks (e.g., ../../../etc/passwd)
  • File Size Limits: Prompt files are limited to 1MB maximum
  • Relative Paths Only: Prompt files must be within the configured prompts_dir or config directory
  • Authentication Required: All prompt management endpoints require admin authentication
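
Clients that accept prompt paths from users may want to apply similar checks before calling the API. The following Python sketch shows one way to reject traversal attempts and oversized content; `resolve_prompt_path` and `check_prompt_size` are hypothetical helpers, the server's real validation is not shown in this document, and reading "1MB" as 1024 × 1024 bytes is an assumption.

```python
import os

MAX_PROMPT_BYTES = 1024 * 1024  # Assumption: "1MB" means 1024 * 1024 bytes.


def resolve_prompt_path(prompts_dir: str, relative_path: str) -> str:
    """Resolve relative_path inside prompts_dir, rejecting any escape."""
    if os.path.isabs(relative_path):
        raise ValueError("absolute paths are not allowed")
    base = os.path.realpath(prompts_dir)
    # realpath collapses ".." segments and follows symlinks before comparing.
    target = os.path.realpath(os.path.join(base, relative_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the prompts directory")
    return target


def check_prompt_size(content: str) -> None:
    """Raise ValueError if the UTF-8 encoded content exceeds the size limit."""
    if len(content.encode("utf-8")) > MAX_PROMPT_BYTES:
        raise ValueError("prompt content exceeds the size limit")
```

Comparing resolved paths, rather than searching the input for `..`, also covers symlinks that point outside the prompts directory.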

Appendix: Quick Reference

Configuration Sections

| Section | Hot Reload | Description |
|---------|------------|-------------|
| server | Restart | Bind address, workers |
| backends | Gradual | Backend URLs, weights |
| health_checks | Gradual | Health monitoring |
| logging | Immediate | Log level, format |
| retry | Immediate | Retry policies |
| timeouts | Gradual | Request timeouts |
| rate_limiting | Immediate | Rate limits |
| circuit_breaker | Immediate | Circuit breaker |
| global_prompts | Immediate | System prompts |
| fallback | Gradual | Model fallback |
| files | Gradual | Files API |
| api_keys | Immediate | API keys |
| metrics | Gradual | Prometheus metrics |
| admin | Gradual | Admin settings |
| admin.stats | Immediate | Stats collection settings |
| routing | Gradual | Routing rules |
| prefix_routing | Immediate | Prefix-aware KV cache routing |
| response_cache | Immediate | Response cache settings |
| kv_cache_index | Requires restart | KV cache index backend and event sources |
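
The table above can be encoded as a lookup so that tooling can warn when a planned change will not take effect until a restart. This Python sketch is illustrative: `RELOAD_BEHAVIOR` and `group_by_reload_behavior` are hypothetical names, and "Restart" and "Requires restart" are treated as the same category.

```python
from typing import Dict, Iterable, List

# Reload behaviour per section, transcribed from the table above.
RELOAD_BEHAVIOR: Dict[str, str] = {
    "server": "restart",
    "backends": "gradual",
    "health_checks": "gradual",
    "logging": "immediate",
    "retry": "immediate",
    "timeouts": "gradual",
    "rate_limiting": "immediate",
    "circuit_breaker": "immediate",
    "global_prompts": "immediate",
    "fallback": "gradual",
    "files": "gradual",
    "api_keys": "immediate",
    "metrics": "gradual",
    "admin": "gradual",
    "admin.stats": "immediate",
    "routing": "gradual",
    "prefix_routing": "immediate",
    "response_cache": "immediate",
    "kv_cache_index": "restart",
}


def group_by_reload_behavior(sections: Iterable[str]) -> Dict[str, List[str]]:
    """Group section names by reload behaviour; unknown names raise KeyError."""
    groups: Dict[str, List[str]] = {"immediate": [], "gradual": [], "restart": []}
    for section in sections:
        groups[RELOAD_BEHAVIOR[section]].append(section)
    return groups
```

A non-empty `restart` group signals that the change set needs a maintenance window rather than a hot reload.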

HTTP Status Codes

| Code | Meaning |
|------|---------|
| 200 | Success |
| 400 | Bad Request (validation error) |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 409 | Conflict |
| 413 | Payload Too Large |
| 500 | Internal Server Error |
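
Client code can turn these status codes into a single exception type that carries the meaning from the table. A minimal Python sketch follows; `AdminApiError` and `raise_for_status` are hypothetical names rather than part of any official SDK.

```python
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad Request (validation error)",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
    409: "Conflict",
    413: "Payload Too Large",
    500: "Internal Server Error",
}


class AdminApiError(Exception):
    """Raised for any Admin API response with a status of 400 or above."""

    def __init__(self, status: int, detail: str = ""):
        meaning = STATUS_MEANINGS.get(status, "Unexpected status")
        message = f"HTTP {status} {meaning}"
        if detail:
            message += f": {detail}"
        super().__init__(message)
        self.status = status


def raise_for_status(status: int, detail: str = "") -> None:
    """Raise AdminApiError for error statuses; return None otherwise."""
    if status >= 400:
        raise AdminApiError(status, detail)
```

Keeping the numeric status on the exception lets callers branch on specific cases, such as 409, without parsing the message.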

Common curl Commands

# Get full config
curl -s http://localhost:8080/admin/config/full -H "Authorization: Bearer $TOKEN"

# Update logging level
curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"config": {"level": "debug"}}'

# Add backend
curl -X POST http://localhost:8080/admin/backends \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"name": "new", "url": "http://host:port", "weight": 1}'

# Export config
curl -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"format": "yaml"}'

# View history
curl -s http://localhost:8080/admin/config/history -H "Authorization: Bearer $TOKEN"

# Rollback
curl -X POST http://localhost:8080/admin/config/rollback/5 \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{}'

# List API keys (masked)
curl -s http://localhost:8080/admin/api-keys -H "Authorization: Bearer $TOKEN"

# Create an API key (full value returned once)
curl -X POST http://localhost:8080/admin/api-keys \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"id": "key-1", "user_id": "user-1", "organization_id": "org-1", "scopes": ["read", "write"]}'

# Rotate an API key
curl -X POST http://localhost:8080/admin/api-keys/key-1/rotate -H "Authorization: Bearer $TOKEN"

# Disable / enable an API key
curl -X POST http://localhost:8080/admin/api-keys/key-1/disable -H "Authorization: Bearer $TOKEN"
curl -X POST http://localhost:8080/admin/api-keys/key-1/enable -H "Authorization: Bearer $TOKEN"

# Revoke an API key
curl -X DELETE http://localhost:8080/admin/api-keys/key-1 -H "Authorization: Bearer $TOKEN"

# Per-API-key and per-user usage statistics
curl -s http://localhost:8080/admin/stats/api-keys -H "Authorization: Bearer $TOKEN"
curl -s http://localhost:8080/admin/stats/users -H "Authorization: Bearer $TOKEN"