Admin REST API Reference¶
This document covers the Continuum Router Admin REST API for developers building configuration control applications. The Configuration Management API supports runtime configuration viewing, modification, and management without server restarts.
Table of Contents¶
- Overview
- Authentication
- Base URL and Headers
- Configuration Query APIs
- Configuration Modification APIs
- Configuration Save/Restore APIs
- Backend Management APIs
- API Key Management APIs
- Statistics APIs
- Response Cache Admin APIs
- KV Cache Index Admin APIs
- Smart Routing Admin APIs
- Guardrail Admin APIs
- Data Models
- Hot Reload Behavior
- Error Handling
- Client SDK Examples
- Best Practices
- Security Considerations
Overview¶
The Admin REST API provides programmatic access to Continuum Router's configuration system, enabling:
- Real-time Configuration Viewing: Retrieve current configuration with automatic sensitive data masking
- Dynamic Configuration Updates: Modify configuration sections without server restart
- Configuration Versioning: Track changes with full history and rollback capabilities
- Backend Management: Add, remove, and modify backends dynamically
- Export/Import: Save and restore configurations in multiple formats (YAML, JSON, TOML)
Key Features¶
| Feature | Description |
|---|---|
| Hot Reload | Changes applied immediately or gradually based on section type |
| Sensitive Masking | API keys, passwords, and tokens automatically masked in responses |
| Validation | All changes validated before application with dry-run support |
| Audit Logging | All modifications logged for security and compliance |
| History Tracking | Up to 100 configuration versions maintained for rollback |
Authentication¶
All Admin API endpoints require authentication via the Admin Auth system.
Authentication Methods¶
1. Bearer Token¶
2. Basic Authentication¶
3. API Key Header¶
Configuration¶
Configure admin authentication in config.yaml:
admin:
  auth:
    method: bearer_token # Options: none, bearer_token, basic, api_key
    token: "${ADMIN_TOKEN}" # Environment variable supported
    # For basic auth:
    # username: admin
    # password: "${ADMIN_PASSWORD}"
  # IP whitelist (optional)
  ip_whitelist:
    - "127.0.0.1"
    - "10.0.0.0/8"
  # Configurable limits
  max_history_entries: 100
  max_backend_name_length: 256
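A client sends the header that matches the configured admin.auth.method. The following is a minimal sketch of building that header for the bearer_token and basic methods. The function name admin_auth_headers is hypothetical and not part of the router, and the api_key method is left out because this section does not name its header.

```python
import base64


def admin_auth_headers(method, token=None, username=None, password=None):
    """Return request headers for a given admin.auth.method value."""
    if method == "none":
        return {}
    if method == "bearer_token":
        if not token:
            raise ValueError("bearer_token requires a token")
        return {"Authorization": f"Bearer {token}"}
    if method == "basic":
        if username is None or password is None:
            raise ValueError("basic requires username and password")
        # Standard HTTP Basic scheme: base64 of "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    # The api_key header name is not given in this section, so it is not guessed.
    raise ValueError(f"method not covered by this sketch: {method}")


print(admin_auth_headers("bearer_token", token="example-token"))
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.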
Base URL and Headers¶
Base URL¶
Common Request Headers¶
Common Response Headers¶
Configuration Query APIs¶
Get Full Configuration¶
Retrieve the complete configuration with sensitive information masked.
Response¶
{
"config": {
"server": {
"bind_address": "0.0.0.0:8080",
"workers": 4
},
"backends": [
{
"name": "openai",
"url": "https://api.openai.com",
"api_key": "sk-***abcd",
"weight": 1
}
],
"logging": {
"level": "info"
},
"rate_limiting": {
"enabled": true,
"requests_per_minute": 100
}
},
"hot_reload_enabled": true,
"last_modified": "2025-12-13T10:30:00Z"
}
Example¶
List Configuration Sections¶
Get all available configuration sections with their hot reload capabilities.
Response¶
{
"sections": [
{
"name": "server",
"description": "Server configuration including bind address and workers",
"hot_reload_capability": "requires_restart"
},
{
"name": "backends",
"description": "Backend server configurations",
"hot_reload_capability": "gradual"
},
{
"name": "logging",
"description": "Logging configuration",
"hot_reload_capability": "immediate"
},
{
"name": "rate_limiting",
"description": "Rate limiting configuration",
"hot_reload_capability": "immediate"
},
{
"name": "circuit_breaker",
"description": "Circuit breaker configuration",
"hot_reload_capability": "immediate"
},
{
"name": "retry",
"description": "Retry policy configuration",
"hot_reload_capability": "immediate"
},
{
"name": "timeouts",
"description": "Timeout configuration",
"hot_reload_capability": "gradual"
},
{
"name": "health_checks",
"description": "Health check configuration",
"hot_reload_capability": "gradual"
},
{
"name": "global_prompts",
"description": "Global prompt injection configuration",
"hot_reload_capability": "immediate"
},
{
"name": "fallback",
"description": "Model fallback configuration",
"hot_reload_capability": "gradual"
},
{
"name": "files",
"description": "Files API configuration",
"hot_reload_capability": "gradual"
},
{
"name": "api_keys",
"description": "API keys configuration",
"hot_reload_capability": "immediate"
},
{
"name": "metrics",
"description": "Metrics and monitoring configuration",
"hot_reload_capability": "gradual"
},
{
"name": "admin",
"description": "Admin API configuration",
"hot_reload_capability": "gradual"
},
{
"name": "routing",
"description": "Request routing configuration",
"hot_reload_capability": "gradual"
}
]
}
Example¶
curl -s http://localhost:8080/admin/config/sections \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.sections[].name'
Get Section Configuration¶
Retrieve configuration for a specific section.
Path Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| section | string | Yes | Section name (see list above) |
Response¶
{
"section": "logging",
"config": {
"level": "info",
"format": "json",
"file": "/var/log/continuum-router.log"
},
"hot_reload_capability": "immediate",
"description": "Logging configuration"
}
Example¶
# Get logging configuration
curl -s http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Get backends configuration
curl -s http://localhost:8080/admin/config/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Get Configuration Schema¶
Retrieve JSON Schema for configuration validation.
Query Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| section | string | No | Get schema for specific section only |
Response¶
{
"schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"server": {
"type": "object",
"properties": {
"bind_address": {
"type": "string",
"pattern": "^[^:]+:[0-9]+$",
"description": "Server bind address in host:port format"
},
"workers": {
"type": "integer",
"minimum": 1,
"description": "Number of worker threads"
}
}
},
"logging": {
"type": "object",
"properties": {
"level": {
"type": "string",
"enum": ["trace", "debug", "info", "warn", "error"]
}
}
}
}
}
}
Example¶
# Get full schema
curl -s http://localhost:8080/admin/config/schema \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Get schema for specific section
curl -s "http://localhost:8080/admin/config/schema?section=logging" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Configuration Modification APIs¶
Replace Section Configuration¶
Replace entire section configuration with new values.
Request Body¶
Response¶
{
"success": true,
"message": "Configuration updated successfully",
"version": 5,
"hot_reload_capability": "immediate",
"applied": true,
"warnings": []
}
Example¶
# Update logging level to debug
curl -X PUT http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"config": {
"level": "debug"
}
}'
Partial Update Section¶
Apply partial updates using JSON merge patch semantics.
Request Body¶
Only specified fields are updated; other fields remain unchanged.
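To make the merge behaviour concrete, here is a small sketch of JSON merge patch as defined by RFC 7396, applied to the logging section. It assumes the router follows the RFC literally, including the rule that a null value removes a field; that detail is not confirmed by this reference.

```python
def merge_patch(target, patch):
    """Apply an RFC 7396 JSON merge patch and return the merged value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in the patch removes the field (RFC 7396 rule, assumed here).
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


current = {"level": "info", "format": "json", "file": "/var/log/continuum-router.log"}
merged = merge_patch(current, {"level": "warn"})
print(merged)
```

Only level changes; format and file are carried over from the current configuration, which matches the shape of merged_config in the response.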
Response¶
{
"success": true,
"message": "Configuration partially updated",
"version": 6,
"hot_reload_capability": "immediate",
"applied": true,
"merged_config": {
"level": "warn",
"format": "json",
"file": "/var/log/continuum-router.log"
}
}
Example¶
# Update only rate limit value
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"config": {
"requests_per_minute": 200
}
}'
Validate Configuration¶
Validate configuration changes without applying them.
Request Body¶
{
"section": "server",
"config": {
"bind_address": "0.0.0.0:9090",
"workers": 8
},
"dry_run": true
}
Response (Valid)¶
{
"valid": true,
"errors": [],
"warnings": [
{
"field": "bind_address",
"message": "Changing bind_address requires server restart"
}
],
"hot_reload_capability": "requires_restart"
}
Response (Invalid)¶
{
"valid": false,
"errors": [
{
"field": "workers",
"message": "workers must be greater than 0",
"code": "VALIDATION_ERROR"
}
],
"warnings": []
}
Example¶
# Validate before applying
curl -X POST http://localhost:8080/admin/config/validate \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"section": "rate_limiting",
"config": {
"enabled": true,
"requests_per_minute": 500
}
}'
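A client usually inspects the validation result before deciding whether to apply a change. The sketch below classifies an already-parsed response body using only the fields shown above (valid, errors, warnings, hot_reload_capability). The function name and the returned dictionary shape are illustrative assumptions.

```python
def validation_outcome(response):
    """Summarize a parsed /admin/config/validate response body."""
    if not response.get("valid", False):
        notes = [f"{e.get('field')}: {e.get('message')}" for e in response.get("errors", [])]
        return {"apply": False, "restart_needed": False, "notes": notes}
    notes = [f"{w.get('field')}: {w.get('message')}" for w in response.get("warnings", [])]
    # A valid change can still need a restart before it takes effect.
    restart = response.get("hot_reload_capability") == "requires_restart"
    return {"apply": True, "restart_needed": restart, "notes": notes}


valid_response = {
    "valid": True,
    "errors": [],
    "warnings": [{"field": "bind_address",
                  "message": "Changing bind_address requires server restart"}],
    "hot_reload_capability": "requires_restart",
}
print(validation_outcome(valid_response))
```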
Apply Configuration¶
Apply pending configuration changes immediately (trigger hot reload).
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| sections | array | No | Specific sections to apply (default: all pending) |
| force | boolean | No | Force apply even with warnings (default: false) |
Response¶
{
"success": true,
"applied_sections": ["logging", "rate_limiting"],
"version": 7,
"results": {
"logging": {
"status": "applied",
"hot_reload_type": "immediate"
},
"rate_limiting": {
"status": "applied",
"hot_reload_type": "immediate"
}
}
}
Example¶
curl -X POST http://localhost:8080/admin/config/apply \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"sections": ["logging"]
}'
Configuration Save/Restore APIs¶
Export Configuration¶
Export current configuration in specified format.
Request Body¶
{
"format": "yaml",
"sections": ["server", "backends", "logging"],
"include_sensitive": false,
"include_defaults": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| format | string | Yes | Output format: yaml, json, or toml |
| sections | array | No | Sections to export (default: all) |
| include_sensitive | boolean | No | Include unmasked sensitive data (default: false) |
| include_defaults | boolean | No | Include default values (default: true) |
Response¶
{
"format": "yaml",
"content": "server:\n bind_address: \"0.0.0.0:8080\"\n workers: 4\n\nbackends:\n - name: openai\n url: https://api.openai.com\n api_key: \"sk-***abcd\"\n",
"exported_at": "2025-12-13T10:30:00Z",
"sections_exported": ["server", "backends", "logging"]
}
Example¶
# Export as YAML
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"format": "yaml"}' | jq -r '.content' > config-backup.yaml
# Export as JSON
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"format": "json"}' | jq -r '.content' > config-backup.json
# Export specific sections
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "yaml",
"sections": ["backends", "rate_limiting"]
}'
Import Configuration¶
Import and apply configuration from content.
Request Body¶
{
"format": "yaml",
"content": "logging:\n level: info\n format: json\n",
"apply": true,
"dry_run": false,
"merge": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| format | string | Yes | Content format: yaml, json, or toml |
| content | string | Yes | Configuration content (max 1MB) |
| apply | boolean | No | Apply after validation (default: true) |
| dry_run | boolean | No | Validate only without applying (default: false) |
| merge | boolean | No | Merge with existing config (default: false) |
Response¶
{
"success": true,
"message": "Configuration imported and applied",
"version": 8,
"validation": {
"valid": true,
"errors": [],
"warnings": []
},
"sections_imported": ["logging"],
"applied": true
}
Example¶
# Import from file
curl -X POST http://localhost:8080/admin/config/import \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"format\": \"yaml\",
\"content\": $(cat config-backup.yaml | jq -Rs .),
\"apply\": true
}"
# Dry run import
curl -X POST http://localhost:8080/admin/config/import \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "yaml",
"content": "logging:\n level: debug\n",
"dry_run": true
}'
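Because the configuration text travels inside a JSON string, quotes and newlines in it must be escaped. The following sketch builds the import request body with json.dumps instead of shell interpolation. The helper name is hypothetical, and reading the documented "max 1MB" as 1 MiB of UTF-8 bytes is an assumption.

```python
import json

# Assumption: the documented "max 1MB" is read here as 1 MiB of UTF-8 bytes.
MAX_CONTENT_BYTES = 1024 * 1024


def build_import_payload(content, fmt="yaml", apply=True, dry_run=False, merge=False):
    """Return the JSON request body for POST /admin/config/import."""
    if fmt not in ("yaml", "json", "toml"):
        raise ValueError(f"unsupported format: {fmt}")
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds the 1MB limit")
    body = {"format": fmt, "content": content, "apply": apply,
            "dry_run": dry_run, "merge": merge}
    # json.dumps escapes quotes and newlines inside the configuration text.
    return json.dumps(body)


payload = build_import_payload("logging:\n  level: debug\n", dry_run=True)
print(payload)
```

Checking the size on the client side avoids sending a request that the server would reject anyway.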
Get Configuration History¶
View configuration change history.
Query Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Number of entries to return (default: 20, max: 100) |
| offset | integer | No | Number of entries to skip (default: 0) |
| section | string | No | Filter by section name |
Response¶
{
"history": [
{
"version": 8,
"timestamp": "2025-12-13T10:30:00Z",
"sections_changed": ["logging"],
"source": "api",
"user": "admin",
"description": "Updated logging level to debug",
"rollback_available": true
},
{
"version": 7,
"timestamp": "2025-12-13T10:25:00Z",
"sections_changed": ["rate_limiting"],
"source": "api",
"user": "admin",
"description": "Increased rate limit to 200 rpm",
"rollback_available": true
},
{
"version": 6,
"timestamp": "2025-12-13T09:00:00Z",
"sections_changed": ["backends"],
"source": "file_reload",
"user": "system",
"description": "Configuration file changed",
"rollback_available": true
}
],
"total_entries": 8,
"current_version": 8
}
Example¶
# Get recent history
curl -s http://localhost:8080/admin/config/history \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Get history for specific section
curl -s "http://localhost:8080/admin/config/history?section=backends&limit=10" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
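When paging through history from a script, the query string can be built from the three documented parameters. This is a sketch only: the helper name is made up, and clamping limit on the client side is a convenience rather than documented server behaviour.

```python
from urllib.parse import urlencode


def history_url(base_url, limit=20, offset=0, section=None):
    """Build a /admin/config/history URL from the documented query parameters."""
    # The documented maximum for limit is 100; clamp locally to stay within it.
    params = {"limit": max(1, min(limit, 100)), "offset": max(0, offset)}
    if section:
        params["section"] = section
    return f"{base_url}/admin/config/history?{urlencode(params)}"


print(history_url("http://localhost:8080", limit=10, section="backends"))
```

To fetch the next page, increase offset by the limit you used until it reaches total_entries from the response.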
Rollback Configuration¶
Rollback to a previous configuration version.
Path Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| version | integer | Yes | Version number to rollback to |
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| sections | array | No | Specific sections to rollback (default: all changed) |
| dry_run | boolean | No | Preview without applying (default: false) |
Response¶
{
"success": true,
"message": "Rolled back to version 5",
"previous_version": 8,
"new_version": 9,
"sections_rolled_back": ["logging", "rate_limiting"],
"changes": {
"logging": {
"level": {
"from": "debug",
"to": "info"
}
}
}
}
Example¶
# Rollback to version 5
curl -X POST http://localhost:8080/admin/config/rollback/5 \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{}'
# Preview rollback (dry run)
curl -X POST http://localhost:8080/admin/config/rollback/5 \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
Backend Management APIs¶
Add Backend¶
Add a new backend dynamically.
Request Body¶
{
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 1,
"models": ["llama3.2", "mistral"],
"api_key": "optional-key",
"enabled": true,
"health_check": {
"enabled": true,
"path": "/v1/models"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique backend name (alphanumeric, -, _) |
| type | string | No | Backend type: openai, azure, vllm, ollama, anthropic, gemini, llamacpp, generic. Default: generic (auto-detect) |
| url | string | Yes | Backend URL (http:// or https://) |
| weight | integer | No | Load balancing weight (default: 1) |
| models | array | No | List of models served by this backend |
| api_key | string | No | API key for backend authentication |
| enabled | boolean | No | Whether backend is enabled (default: true) |
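The name and url constraints in the table can be checked on the client before sending the request. The sketch below is one possible reading: it treats "alphanumeric" as ASCII letters and digits, and it reuses the max_backend_name_length value of 256 from the admin configuration example as the length cap. Both are assumptions, and the function name is hypothetical.

```python
import re
from urllib.parse import urlparse

NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")  # assumption: ASCII-only names
MAX_NAME_LENGTH = 256  # mirrors max_backend_name_length in the config example


def backend_request_errors(body):
    """Return a list of problems found in an Add Backend request body."""
    errors = []
    name = body.get("name", "")
    if not name or not NAME_PATTERN.fullmatch(name):
        errors.append("name must use only letters, digits, '-' and '_'")
    elif len(name) > MAX_NAME_LENGTH:
        errors.append(f"name must be at most {MAX_NAME_LENGTH} characters")
    parsed = urlparse(body.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("url must start with http:// or https://")
    return errors


print(backend_request_errors({"name": "new-ollama", "url": "http://192.168.1.100:11434"}))
```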
Backend Type Auto-Detection¶
When type is not specified or set to generic, the router automatically probes the backend's /v1/models endpoint to detect the backend type. Auto-detection is currently supported for:
- llama.cpp: Identified by owned_by: "llamacpp" or llama.cpp-specific metadata fields
llama.cpp backends can therefore be added without explicit type configuration:
# llama.cpp backend - type auto-detected
curl -X POST http://localhost:8080/admin/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "local-llama",
"url": "http://localhost:8080"
}'
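The owned_by check described above can be sketched as follows. This is not the router's implementation: it assumes the probe returns an OpenAI-style model list under a data key, and it ignores the llama.cpp-specific metadata fields because this reference does not name them.

```python
def detect_backend_type(models_response):
    """Guess the backend type from a parsed /v1/models response body."""
    # Assumption: models are listed under "data", as in OpenAI-style responses.
    for model in models_response.get("data", []):
        if model.get("owned_by") == "llamacpp":
            return "llamacpp"
    # Anything else stays generic in this sketch.
    return "generic"


probe = {"data": [{"id": "local-model", "owned_by": "llamacpp"}]}
print(detect_backend_type(probe))
```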
Response¶
{
"success": true,
"message": "Backend 'new-ollama' added successfully",
"backend": {
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 1,
"models": ["llama3.2", "mistral"],
"enabled": true,
"health_status": "unknown"
}
}
Example¶
curl -X POST http://localhost:8080/admin/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "new-backend",
"url": "http://192.168.1.100:11434",
"weight": 2,
"models": ["llama3.2"]
}'
Get Backend¶
Get configuration for a specific backend.
Response¶
{
"name": "openai",
"url": "https://api.openai.com",
"api_key": "sk-***abcd",
"weight": 1,
"models": ["gpt-4", "gpt-3.5-turbo"],
"enabled": true,
"health_status": "healthy",
"stats": {
"total_requests": 1250,
"failed_requests": 12,
"average_latency_ms": 150,
"last_used": "2025-12-13T10:29:55Z"
}
}
Example¶
Update Backend¶
Update backend configuration.
Request Body¶
{
"url": "https://api.openai.com",
"weight": 2,
"models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
"enabled": true
}
Response¶
{
"success": true,
"message": "Backend 'openai' updated successfully",
"backend": {
"name": "openai",
"url": "https://api.openai.com",
"weight": 2,
"models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
"enabled": true
}
}
Example¶
curl -X PUT http://localhost:8080/admin/backends/openai \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"weight": 3,
"models": ["gpt-4", "gpt-4-turbo"]
}'
Delete Backend¶
Remove a backend from the router.
Query Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| force | boolean | No | Force delete even if backend has active connections |
Response¶
{
"success": true,
"message": "Backend 'old-backend' removed successfully",
"removed_backend": "old-backend"
}
Notes¶
- Deleting the last backend is allowed: The router can operate with zero backends configured. When the last backend is deleted:
  - /v1/models returns an empty list
  - Routing requests return 503 "No backends available"
  - New backends can be added via POST /admin/backends
Example¶
curl -X DELETE http://localhost:8080/admin/backends/old-backend \
-H "Authorization: Bearer $ADMIN_TOKEN"
# Force delete
curl -X DELETE "http://localhost:8080/admin/backends/old-backend?force=true" \
-H "Authorization: Bearer $ADMIN_TOKEN"
Update Backend Weight¶
Update only the backend weight for load balancing.
Request Body¶
Response¶
{
"success": true,
"message": "Backend 'openai' weight updated to 5",
"previous_weight": 2,
"new_weight": 5
}
Example¶
curl -X PUT http://localhost:8080/admin/backends/openai/weight \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"weight": 5}'
Update Backend Models¶
Update the model list for a backend.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| models | array | Yes | List of model names |
| append | boolean | No | Append to existing list (default: false, replaces) |
Response¶
{
"success": true,
"message": "Backend 'openai' models updated",
"models": ["gpt-4", "gpt-4-turbo", "gpt-4o", "gpt-3.5-turbo"]
}
Example¶
# Replace models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"models": ["gpt-4", "gpt-4o"]}'
# Append models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"models": ["gpt-4.5-turbo"], "append": true}'
API Key Management APIs¶
The API Key Management APIs let you issue, inspect, update, rotate, enable, disable, and revoke per-user API keys at runtime. All eight endpoints are mounted under /admin/api-keys and require the same admin authentication as the rest of the Admin API.
These endpoints operate on the same key store that authenticates incoming client requests. A key created here is immediately usable by a client through the Authorization: Bearer <key> header, subject to the configured authentication mode (see Authentication Mode and Client Usage below).
API Key Object¶
Each API key is described by an ApiKeyConfig record. The fields below are configurable inline in config.yaml, in an external keys file, or through the create/update endpoints.
| Field | Type | Description |
|---|---|---|
| key | string | The secret key value. Generated cryptographically (format sk-<base64url>) when not supplied. Never returned in full except once at creation or rotation; elsewhere it is masked. |
| id | string | Unique identifier for the key (1–128 chars). Used in every /admin/api-keys/{id} path. |
| user_id | string | Associated user identifier (1–128 chars). Surfaced in per-user usage statistics. |
| organization_id | string | Associated organization identifier (1–128 chars). |
| name | string or absent | Optional human-readable label (max 256 chars). |
| description | string or absent | Optional notes about the key (max 1024 chars). |
| scopes | array of strings | Permissions granted to the key. Common values: read, write, files, admin. At least one scope is required when creating a key. |
| rate_limit | integer or absent | Optional per-key rate limit in requests per minute. Overrides the global limit for this key. |
| enabled | boolean | Whether the key is active. A disabled key fails authentication even before expiry is checked. |
| created_at | string (ISO 8601) | Creation timestamp. |
| expires_at | string (ISO 8601) or absent | Optional expiration timestamp. A key past this instant is automatically invalid regardless of enabled. |
| annotations | object (string to string) or absent | Free-form metadata map. Recommended canonical keys: email, uuid, owner, team, environment. An operator-configured allowlist of annotation keys is exported as labels on the api_key_info Prometheus metric (values are sanitized). |
| allowed_backends | array of strings or absent | Per-key backend allow-list. When non-empty, requests authenticated with this key may only route to backends whose name appears here. Empty or absent means no restriction. Matching is exact and case-sensitive; unservable requests are rejected with 403 Forbidden. |
A key is considered valid when it is enabled and not past expires_at. The listing endpoint reports active, expired, and disabled counts derived from these rules.
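The validity rule and the listing summary can be expressed in a few lines. The sketch below is an illustration under stated assumptions: the function names are invented, and counting a key that is both disabled and expired as disabled follows from the note that disabled keys fail before expiry is checked, but the router's exact bucketing is not documented here.

```python
from datetime import datetime, timezone


def key_status(enabled, expires_at=None, now=None):
    """Classify a key as "active", "expired", or "disabled"."""
    now = now or datetime.now(timezone.utc)
    if not enabled:
        # Disabled is checked first, before expiry.
        return "disabled"
    if expires_at is not None and now > expires_at:
        return "expired"
    return "active"


def summarize_keys(keys, now=None):
    """Count keys per status, in the shape of the listing summary."""
    counts = {"total": len(keys), "active": 0, "expired": 0, "disabled": 0}
    for key in keys:
        counts[key_status(key["enabled"], key.get("expires_at"), now)] += 1
    return counts


reference_time = datetime(2026, 3, 5, tzinfo=timezone.utc)
sample = [
    {"enabled": True, "expires_at": None},
    {"enabled": True, "expires_at": datetime(2026, 1, 1, tzinfo=timezone.utc)},
    {"enabled": False, "expires_at": None},
]
print(summarize_keys(sample, reference_time))
```

A key is valid exactly when key_status returns "active".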
Key Masking¶
The full key value is returned exactly once: in the response to POST /admin/api-keys (creation) and POST /admin/api-keys/{id}/rotate (rotation). Every other response returns a masked_key of the form sk-***abcd, preserving the sk- prefix and the last four characters. Logs always use the masked form.
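The masked form described above can be reproduced with a short helper, which is handy when client-side logs should follow the same convention. This is a sketch with two assumptions that are not stated in this reference: custom keys without the sk- prefix are masked without one, and very short secrets are hidden entirely.

```python
def mask_key(key):
    """Return a masked form such as sk-***abcd for a full key value."""
    prefix = "sk-" if key.startswith("sk-") else ""
    secret = key[len(prefix):]
    # Assumption: short secrets are hidden fully instead of exposing most of the value.
    tail = secret[-4:] if len(secret) > 8 else ""
    return f"{prefix}***{tail}"


print(mask_key("sk-G7q2exampleValueA1aB"))
```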
Authentication Mode and Client Usage¶
The api_keys.mode setting controls how the router treats client requests that lack a valid key:
| Mode | Behavior |
|---|---|
| permissive (default) | Requests with a valid key are authenticated and attributed; requests without a key are still allowed through. Use this for incremental rollout. |
| blocking | Every API request must carry a valid key. Requests without one receive 401 Unauthorized. |
Set the mode in config.yaml:
api_keys:
  mode: blocking # "permissive" (default) | "blocking"
  persistence_file: ~/.config/continuum-router/runtime-keys.yaml
  api_keys:
    - key: "sk-prod-..."
      id: "key-1"
      user_id: "user-1"
      organization_id: "org-1"
      scopes: ["read", "write"]
A client authenticates by sending the issued key as a bearer token:
curl http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer sk-the-issued-key-value" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
The mode setting hot-reloads: switching between permissive and blocking takes effect without a restart.
Persistence and Hot Reload¶
Keys created or modified through these endpoints live in the in-memory key store. When api_keys.persistence_file is set, runtime changes are written to that file (tilde expansion is supported) and restored on the next startup, so admin-created keys survive restarts. Without persistence_file, runtime keys are in-memory only and lost on restart. Keys loaded from inline config or api_keys_file are read-only sources and are reloaded on config hot-reload.
List API Keys¶
Returns every API key with its value masked, plus a summary of active, expired, and disabled counts.
Response¶
{
"keys": [
{
"id": "key-1",
"masked_key": "sk-***A1aB",
"user_id": "user-1",
"organization_id": "org-1",
"name": "Production key",
"scopes": ["read", "write"],
"rate_limit": 600,
"is_active": true,
"expires_at": null,
"created_at": "2026-03-05T10:30:00Z",
"is_expired": false,
"allowed_backends": ["openai", "anthropic"]
}
],
"summary": {
"total": 1,
"active": 1,
"expired": 0,
"disabled": 0
}
}
Example¶
Create API Key¶
Creates a new API key. If key is omitted, the router generates a cryptographically random value. The full key value is returned only in this response.
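If you prefer to supply your own key value, something of the same sk-<base64url> shape can be produced with Python's secrets module. This is a sketch, not the router's generator: the 32 bytes of entropy is an assumed length, since this reference does not state how long generated keys are.

```python
import secrets


def generate_api_key(num_bytes=32):
    """Return a random key in the sk-<base64url> shape."""
    # token_urlsafe yields unpadded base64url text from num_bytes random bytes.
    return "sk-" + secrets.token_urlsafe(num_bytes)


print(generate_api_key())
```

The result can be sent as the key field of the create request; store it securely, since later responses only return the masked form.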
Request Body¶
{
"id": "key-acme-1",
"user_id": "user-acme",
"organization_id": "org-acme",
"name": "Acme integration",
"description": "Server-to-server key for the Acme integration",
"scopes": ["read", "write"],
"rate_limit": 600,
"enabled": true,
"expires_at": "2027-01-01T00:00:00Z",
"allowed_backends": ["openai"]
}
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique key identifier (1–128 chars). |
| user_id | string | Yes | Associated user identifier (must be non-empty). |
| organization_id | string | Yes | Associated organization identifier (must be non-empty). |
| key | string | No | Custom key value. A new value is generated when omitted. |
| name | string | No | Human-readable label (max 256 chars). |
| description | string | No | Notes about the key (max 1024 chars). |
| scopes | array | No | Permissions; defaults to ["read", "write"]. Must contain at least one scope. |
| rate_limit | integer | No | Per-key rate limit in requests per minute. |
| enabled | boolean | No | Whether the key is active; defaults to true. |
| expires_at | string (ISO 8601) | No | Expiration timestamp. |
| allowed_backends | array | No | Per-key backend allow-list. Empty or omitted means unrestricted. |
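The effect of allowed_backends on routing can be pictured as a filter over candidate backend names. The sketch below follows the documented rules (exact, case-sensitive matching; empty or absent means unrestricted). The function name is hypothetical, and how the router selects candidates in the first place is outside this sketch.

```python
def permitted_backends(candidates, allowed_backends=None):
    """Filter candidate backend names by a key's allow-list."""
    if not allowed_backends:
        # Empty or absent allow-list: no restriction.
        return list(candidates)
    allowed = set(allowed_backends)
    # Exact, case-sensitive comparison of backend names.
    return [name for name in candidates if name in allowed]


print(permitted_backends(["openai", "anthropic"], ["openai"]))
```

When the filtered list is empty for a restricted key, the request cannot be served and corresponds to the 403 Forbidden case.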
Response¶
Returns 201 Created. The key field is the full value and is shown only here.
{
"key": "sk-G7q2...full-value...A1",
"masked_key": "sk-***A1aB",
"id": "key-acme-1",
"user_id": "user-acme",
"organization_id": "org-acme",
"name": "Acme integration",
"scopes": ["read", "write"],
"rate_limit": 600,
"enabled": true,
"created_at": "2026-03-05T10:30:00Z",
"expires_at": "2027-01-01T00:00:00Z",
"allowed_backends": ["openai"]
}
Error Responses¶
- 400 Bad Request: empty user_id/organization_id, no scopes, or a name/description over the length limit.
- 409 Conflict: a key with the same id already exists.
- 507 Insufficient Storage: the maximum key count (10,000) has been reached.
Example¶
curl -X POST http://localhost:8080/admin/api-keys \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "key-acme-1",
"user_id": "user-acme",
"organization_id": "org-acme",
"scopes": ["read", "write"],
"rate_limit": 600
}'
Get API Key¶
Returns a single key by id, with its value masked.
Response¶
{
"id": "key-acme-1",
"masked_key": "sk-***A1aB",
"user_id": "user-acme",
"organization_id": "org-acme",
"name": "Acme integration",
"scopes": ["read", "write"],
"rate_limit": 600,
"is_active": true,
"created_at": "2026-03-05T10:30:00Z",
"expires_at": "2027-01-01T00:00:00Z",
"is_expired": false,
"is_valid": true,
"allowed_backends": ["openai"]
}
The is_active, is_expired, and is_valid fields are computed: is_valid is true only when the key is active and not expired.
Error Responses¶
- 404 Not Found: no key with the given id.
Example¶
curl -s http://localhost:8080/admin/api-keys/key-acme-1 \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Update API Key¶
Updates one or more properties of an existing key. Only the fields present in the body are changed; omitted fields are left untouched. The key value itself is not changed by this endpoint (use Rotate for that).
Request Body¶
{
"name": "Acme integration (renamed)",
"scopes": ["read"],
"rate_limit": 300,
"enabled": true,
"expires_at": "2027-06-01T00:00:00Z",
"allowed_backends": ["openai", "anthropic"]
}
| Field | Type | Description |
|---|---|---|
| name | string | New label. |
| scopes | array | Replacement scope list. |
| rate_limit | integer | New per-key rate limit. |
| enabled | boolean | Enable or disable the key. |
| expires_at | string (ISO 8601) | New expiration timestamp. |
| allowed_backends | array | Backend allow-list. null (omitted) leaves it unchanged; an empty array clears all restrictions; a non-empty array replaces the list. |
Response¶
{
"success": true,
"action": "update",
"key": {
"id": "key-acme-1",
"masked_key": "sk-***A1aB",
"user_id": "user-acme",
"organization_id": "org-acme",
"name": "Acme integration (renamed)",
"scopes": ["read"],
"rate_limit": 300,
"is_active": true,
"created_at": "2026-03-05T10:30:00Z",
"expires_at": "2027-06-01T00:00:00Z",
"is_valid": true,
"allowed_backends": ["openai", "anthropic"]
}
}
Error Responses¶
- 404 Not Found: no key with the given id.
Example¶
curl -X PUT http://localhost:8080/admin/api-keys/key-acme-1 \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"rate_limit": 300, "scopes": ["read"]}'
Delete API Key¶
Permanently revokes and removes a key. After deletion, any client still presenting the old value fails authentication. This action is irreversible.
Response¶
Error Responses¶
- 404 Not Found: no key with the given id.
Example¶
curl -X DELETE http://localhost:8080/admin/api-keys/key-acme-1 \
-H "Authorization: Bearer $ADMIN_TOKEN"
Rotate API Key¶
Generates a new secret value for an existing key while preserving its id and all other properties. The previous value stops working immediately. The new value is returned only in this response.
Response¶
{
"success": true,
"action": "rotate",
"id": "key-acme-1",
"new_key": "sk-Hq9z...new-full-value...B2",
"masked_key": "sk-***B2cD",
"warning": "Store this key securely. It will not be shown again."
}
Error Responses¶
- 404 Not Found: no key with the given id.
Example¶
curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/rotate \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Enable API Key¶
Marks a key as active. A re-enabled key authenticates again, provided it has not expired.
Response¶
Error Responses¶
- 404 Not Found: no key with the given id.
Example¶
curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/enable \
-H "Authorization: Bearer $ADMIN_TOKEN"
Disable API Key¶
Marks a key as inactive without deleting it. A disabled key fails authentication but keeps its configuration, so it can be re-enabled later. Use this for a reversible suspension instead of Delete.
Response¶
Error Responses¶
- 404 Not Found: no key with the given id.
Example¶
curl -X POST http://localhost:8080/admin/api-keys/key-acme-1/disable \
-H "Authorization: Bearer $ADMIN_TOKEN"
Statistics APIs¶
The Statistics APIs expose aggregated request metrics collected by the StatsCollector. All endpoints are mounted under /admin/stats and share the same authentication as the rest of the Admin API. Alongside the overall, per-model, and per-backend breakdowns, the collector also tracks per-API-key and per-user usage (see Per-API-Key and Per-User Statistics).
Stats collection is enabled by default. It can be configured or disabled via the admin.stats section in your YAML config:
admin:
  stats:
    enabled: true # Enable/disable collection (default: true)
    retention_window: 24h # Ring-buffer retention for windowed queries (default: 24h)
    token_tracking: true # Parse response bodies for token usage (default: true)
    persistence:
      enabled: true # Enable stats persistence across restarts (default: true)
      path: ./data/stats.json # File path for the snapshot (default: ./data/stats.json)
      snapshot_interval: 5m # How often to write periodic snapshots (default: 5m)
      max_age: 7d # Discard snapshots older than this on startup (default: 7d)
The retention_window and token_tracking settings support hot-reload: changes are applied immediately without a restart.
Stats Persistence¶
When the persistence subsection is present and enabled is true, the router saves a statistics snapshot to disk periodically and restores it on startup. This ensures that request counters, per-model breakdowns, and the latency ring buffer survive restarts.
How it works:
- On startup, the router reads the snapshot file and restores all counters and ring-buffer records. Uptime resets to zero on each restart.
- A background task writes a new snapshot every snapshot_interval. Writes are atomic (temp file + rename) to prevent corruption.
- On graceful shutdown (SIGTERM/SIGINT), a final snapshot is saved before the process exits.
- If the snapshot file is missing, corrupted, or older than max_age, the router starts with fresh counters and logs a warning or info message.
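The temp-file-plus-rename pattern mentioned above can be sketched in Python as follows. This is an illustration of the general technique only, not the router's implementation; the function name and the JSON layout of the snapshot are assumptions.

```python
import json
import os
import tempfile


def write_snapshot_atomically(path: str, snapshot: dict) -> None:
    """Write `snapshot` as JSON so readers never observe a half-written file.

    Hypothetical helper: the data goes to a temp file in the same directory,
    is flushed to disk, and is then renamed over the target in one step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".stats-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(snapshot, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is an atomic rename when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Creating the temp file next to the target keeps both on the same filesystem, which is what makes the final rename atomic.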
Supported duration formats for snapshot_interval and max_age:
| Format | Example | Meaning |
|---|---|---|
| Xs | 30s | 30 seconds |
| Xm | 5m | 5 minutes |
| Xh | 1h | 1 hour |
| Xd | 7d | 7 days |
Set max_age to "0" or "" to disable staleness checks (always restore regardless of age).
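To check a value before writing it to the config, a client could parse the formats in the table with something like the sketch below. The function name is hypothetical, and the router's own parser may accept or reject inputs this one does not; the `"0"` and `""` handling mirrors the max_age rule above.

```python
import re
from typing import Optional

# Seconds per unit for the four documented suffixes.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_DURATION_RE = re.compile(r"^(\d+)([smhd])$")


def parse_duration_seconds(value: str) -> Optional[int]:
    """Return the duration in seconds, or None for "0" / "" (check disabled)."""
    text = value.strip()
    if text in ("", "0"):
        return None
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]
```

For example, `parse_duration_seconds("5m")` yields 300 and `parse_duration_seconds("7d")` yields 604800.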
Get Full Statistics¶
Returns overall, per-model, and per-backend statistics.
Query Parameters¶
| Parameter | Type | Description |
|---|---|---|
| window | string | Optional time window filter. Accepted formats: 30m, 1h, 24h, 7d. Omit for all-time totals. |
Response¶
{
"uptime_seconds": 3600,
"window": "all",
"overall": {
"total_requests": 1500,
"successful_requests": 1480,
"failed_requests": 20,
"avg_latency_ms": 145.3,
"p50_latency_ms": 120.0,
"p95_latency_ms": 380.0,
"p99_latency_ms": 750.0,
"total_prompt_tokens": 450000,
"total_completion_tokens": 180000,
"total_tokens": 630000,
"tokens_per_sec_avg": 87.4
},
"models": [
{
"model_id": "gpt-4",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"total_prompt_tokens": 270000,
"total_completion_tokens": 108000,
"total_tokens": 378000,
"avg_latency_ms": 160.2,
"avg_tokens_per_sec": 92.1,
"last_used": "2026-03-05T10:30:00Z"
}
],
"backends": [
{
"backend_name": "openai",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"avg_latency_ms": 160.2,
"health_status": "healthy"
}
]
}
Example¶
# All-time statistics
curl -s http://localhost:8080/admin/stats \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Last hour only
curl -s "http://localhost:8080/admin/stats?window=1h" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
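Once the response is parsed, derived figures such as the error rate or each model's share of tokens are simple to compute. The sketch below assumes only the field names shown in the response above; the function name and output keys are illustrative.

```python
def summarize_stats(stats: dict) -> dict:
    """Derive an error rate and per-model token share from a full stats response."""
    overall = stats["overall"]
    total_requests = overall["total_requests"]
    total_tokens = overall["total_tokens"]
    # Guard against division by zero on a freshly reset collector.
    error_rate = overall["failed_requests"] / total_requests if total_requests else 0.0
    token_share = {
        model["model_id"]: (model["total_tokens"] / total_tokens if total_tokens else 0.0)
        for model in stats.get("models", [])
    }
    return {"error_rate": error_rate, "token_share_by_model": token_share}
```

With the sample response, gpt-4o's entry would not appear because only gpt-4 is listed; gpt-4 accounts for 378000 of 630000 tokens.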
Get Per-Model Statistics¶
Returns only the per-model breakdown (subset of the full stats response).
Response¶
{
"models": [
{
"model_id": "gpt-4",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"total_prompt_tokens": 270000,
"total_completion_tokens": 108000,
"total_tokens": 378000,
"avg_latency_ms": 160.2,
"avg_tokens_per_sec": 92.1,
"last_used": "2026-03-05T10:30:00Z"
}
]
}
Models are sorted by total_requests in descending order.
Example¶
curl -s http://localhost:8080/admin/stats/models \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.models[].model_id'
Get Per-Backend Statistics¶
Returns only the per-backend breakdown. The health_status field is populated from the health checker ("healthy", "unhealthy", or "unknown" when health checks are disabled).
Response¶
{
"backends": [
{
"backend_name": "openai",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"avg_latency_ms": 160.2,
"health_status": "healthy"
}
]
}
Backends are sorted by total_requests in descending order.
Example¶
Per-API-Key and Per-User Statistics¶
These endpoints break down usage by the API key that authenticated each request and by the user attached to that key. They sit beside Get Per-Model Statistics and Get Per-Backend Statistics: same collector, a different grouping dimension.
Identifier and Bucketing Semantics¶
- Coverage: every inference surface contributes to these statistics: /v1/chat/completions, /anthropic/v1/messages, and the OpenAI Responses API (/v1/responses, including its pass-through, Chat-Completions-conversion, and Anthropic-conversion strategies). Successful non-streaming requests carry full token usage; streaming requests are recorded at connect time (request counts and per-key/per-user attribution, with token totals omitted because they are only known once the stream completes).
- api_key_id is a derived, non-reversible identifier, never a raw key. It is the same value used as the api_key_id Prometheus label, and it corresponds to the issued key's id. The per-user endpoints key on the user_id attached to the matched key. The derived api_key_id requires the metrics feature to be compiled in; without it, per-key attribution collapses to the "anonymous" bucket (per-user attribution is unaffected, since it reads the key's user_id directly).
- Requests with no key (or no associated user) are bucketed under "anonymous".
- Each dimension has a cardinality cap of 1000 distinct identifiers (excluding the reserved buckets). Once the cap is reached, further new identifiers are folded into an "unknown" overflow bucket so their usage is still counted in aggregate.
- The window query parameter is accepted and echoed back in the response for consistency with GET /admin/stats, but the per-key and per-user aggregates are all-time totals, exactly like GET /admin/stats/models. The identifier is resolved off the request hot path, so it is not present on the windowed ring-buffer records used for time-filtered latency percentiles.
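The bucketing rules above (an "anonymous" bucket for missing identifiers, and an "unknown" overflow bucket once the cap is reached) can be modelled in a few lines. This is a toy model for reasoning about the behaviour, not the collector's code; the class and method names are made up for the example.

```python
from typing import Dict, Optional

ANONYMOUS = "anonymous"  # reserved bucket: no key or no associated user
UNKNOWN = "unknown"      # reserved bucket: overflow once the cap is reached


class CappedCounter:
    """Counts requests per identifier with a cap on distinct identifiers."""

    def __init__(self, cap: int = 1000) -> None:
        self.cap = cap
        self.counts: Dict[str, int] = {}

    def bucket_for(self, identifier: Optional[str]) -> str:
        if not identifier:
            return ANONYMOUS
        if identifier in self.counts:
            return identifier  # already tracked, keeps its own bucket
        # Reserved buckets do not count toward the cap.
        tracked = sum(1 for key in self.counts if key not in (ANONYMOUS, UNKNOWN))
        return identifier if tracked < self.cap else UNKNOWN

    def record(self, identifier: Optional[str]) -> str:
        bucket = self.bucket_for(identifier)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return bucket
```

Identifiers seen before the cap is hit keep their own bucket afterwards; only new identifiers are folded into the overflow bucket.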
The ApiKeyStats and UserStats objects share the same shape:
| Field | Type | Description |
|---|---|---|
| api_key_id / user_id | string | The derived key identifier or the user identifier. |
| total_requests | integer | Total requests attributed to this identifier. |
| successful_requests | integer | Requests that completed successfully. |
| failed_requests | integer | Requests that failed. |
| total_prompt_tokens | integer | Prompt tokens consumed. |
| total_completion_tokens | integer | Completion tokens produced. |
| total_tokens | integer | Sum of prompt and completion tokens. |
| avg_latency_ms | number | Average latency in milliseconds. |
| avg_tokens_per_sec | number | Average generation throughput in tokens per second. |
| last_used | string (ISO 8601) or null | Timestamp of the most recent request, or null if never used. |
Get Per-API-Key Statistics¶
Returns one entry per tracked API key, sorted by total_requests in descending order.
Query Parameters¶
| Parameter | Type | Description |
|---|---|---|
| window | string | Accepted and echoed in the window field, but does not filter the all-time aggregates. |
Response¶
{
"window": "all",
"api_keys": [
{
"api_key_id": "k_3f9a1c",
"total_requests": 1200,
"successful_requests": 1185,
"failed_requests": 15,
"total_prompt_tokens": 360000,
"total_completion_tokens": 144000,
"total_tokens": 504000,
"avg_latency_ms": 152.7,
"avg_tokens_per_sec": 88.3,
"last_used": "2026-03-05T10:30:00Z"
},
{
"api_key_id": "anonymous",
"total_requests": 80,
"successful_requests": 80,
"failed_requests": 0,
"total_prompt_tokens": 12000,
"total_completion_tokens": 4800,
"total_tokens": 16800,
"avg_latency_ms": 131.0,
"avg_tokens_per_sec": 90.1,
"last_used": "2026-03-05T10:28:00Z"
}
]
}
Example¶
curl -s http://localhost:8080/admin/stats/api-keys \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# The window param is accepted and echoed but does not change the aggregates
curl -s "http://localhost:8080/admin/stats/api-keys?window=24h" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.window'
Get Per-API-Key Statistics by ID¶
Returns the stats for a single api_key_id (the derived identifier returned by the list endpoint, not a raw key). Returns 404 Not Found when the identifier has no recorded usage.
Response¶
{
"window": "all",
"api_key": {
"api_key_id": "k_3f9a1c",
"total_requests": 1200,
"successful_requests": 1185,
"failed_requests": 15,
"total_prompt_tokens": 360000,
"total_completion_tokens": 144000,
"total_tokens": 504000,
"avg_latency_ms": 152.7,
"avg_tokens_per_sec": 88.3,
"last_used": "2026-03-05T10:30:00Z"
}
}
Example¶
curl -s http://localhost:8080/admin/stats/api-keys/k_3f9a1c \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Get Per-User Statistics¶
Returns one entry per tracked user identifier (the user_id attached to the matched key), sorted by total_requests in descending order. Same fields and bucketing rules as the per-API-key endpoint.
Response¶
{
"window": "all",
"users": [
{
"user_id": "user-acme",
"total_requests": 1200,
"successful_requests": 1185,
"failed_requests": 15,
"total_prompt_tokens": 360000,
"total_completion_tokens": 144000,
"total_tokens": 504000,
"avg_latency_ms": 152.7,
"avg_tokens_per_sec": 88.3,
"last_used": "2026-03-05T10:30:00Z"
}
]
}
Example¶
Get Per-User Statistics by ID¶
Returns the stats for a single user_id. Returns 404 Not Found when the identifier has no recorded usage.
Response¶
{
"window": "all",
"user": {
"user_id": "user-acme",
"total_requests": 1200,
"successful_requests": 1185,
"failed_requests": 15,
"total_prompt_tokens": 360000,
"total_completion_tokens": 144000,
"total_tokens": 504000,
"avg_latency_ms": 152.7,
"avg_tokens_per_sec": 88.3,
"last_used": "2026-03-05T10:30:00Z"
}
}
Example¶
curl -s http://localhost:8080/admin/stats/users/user-acme \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Per-Model Breakdown and Usage Time Series¶
These per-identifier drill-downs power dashboard widgets: a per-model breakdown (a "tokens by model" donut) and a daily usage trend (a usage-over-time chart). They are tracked as two independent dimensions, not a per-(identifier, model, date) cube, so cardinality stays bounded.
Scope and semantics carry over from Per-API-Key and Per-User Statistics:
- Only token and request totals are tracked. There is no cost field; the dashboard derives cost from tokens against its own pricing table.
- api_key_id is the derived, non-reversible identifier (never a raw key); user_id is the user attached to the matched key. Unknown identifiers return 200 OK with an empty array, matching the list endpoints rather than returning 404.
- Each new dimension has its own cardinality cap (folding overflow into an aggregate "unknown" bucket that is excluded from per-identifier reads), and the unknown-model label is "unknown".
Get Per-API-Key Model Breakdown¶
Returns the per-model breakdown for a single api_key_id as a models array of the same ModelStats objects used by GET /admin/stats/models (model id, request counts, prompt/completion/total tokens, average latency, average tokens-per-second, last used), sorted by total_requests descending. The window query parameter is accepted and echoed but does not filter these all-time aggregates.
Response¶
{
"api_key_id": "k_3f9a1c",
"window": "all",
"models": [
{
"model_id": "claude-haiku-4-5",
"total_requests": 2,
"successful_requests": 2,
"failed_requests": 0,
"total_prompt_tokens": 374,
"total_completion_tokens": 8,
"total_tokens": 382,
"avg_latency_ms": 975.0,
"avg_tokens_per_sec": 195.9,
"last_used": "2026-06-18T22:11:54Z"
}
]
}
Example¶
curl -s http://localhost:8080/admin/stats/api-keys/k_3f9a1c/models \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Get Per-User Model Breakdown¶
Same shape as the per-API-key model breakdown, grouped by user_id.
Response¶
{
"user_id": "user-acme",
"window": "all",
"models": [
{
"model_id": "claude-haiku-4-5",
"total_requests": 2,
"successful_requests": 2,
"failed_requests": 0,
"total_prompt_tokens": 374,
"total_completion_tokens": 8,
"total_tokens": 382,
"avg_latency_ms": 975.0,
"avg_tokens_per_sec": 195.9,
"last_used": "2026-06-18T22:11:54Z"
}
]
}
Example¶
curl -s http://localhost:8080/admin/stats/users/user-acme/models \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Get Per-API-Key Usage Time Series¶
Returns a daily usage series for a single api_key_id, one point per UTC calendar day, sorted ascending by date. Buckets are retained for series_retention_days (default 30); the periodic snapshot task prunes older days and the read path filters them out, so the series never returns days beyond the retention window.
Query Parameters¶
| Parameter | Type | Description |
|---|---|---|
| from | string | Inclusive lower bound, as a Unix-millis integer or an RFC 3339 timestamp. Defaults to 30 days ago. |
| to | string | Exclusive upper bound, same formats as from. Defaults to now. |
| interval | string | Bucket granularity. Only day is supported; any other value returns 400 Bad Request. Defaults to day. |
An inverted range (from >= to) also returns 400 Bad Request.
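A client can catch both 400 conditions before sending the request. The sketch below builds the query string with Unix-millis bounds; the function names are hypothetical, and the path follows the curl example for this endpoint.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import quote, urlencode


def to_unix_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return int(moment.timestamp() * 1000)


def build_series_url(base_url: str, api_key_id: str,
                     start: Optional[datetime] = None,
                     end: Optional[datetime] = None,
                     interval: str = "day") -> str:
    """Build the series URL, rejecting inputs the API would answer with 400."""
    if interval != "day":
        raise ValueError("only interval=day is supported")
    if start is not None and end is not None and start >= end:
        raise ValueError("from must be earlier than to")
    params = {"interval": interval}
    # Omitted bounds are left out so the server-side defaults apply.
    if start is not None:
        params["from"] = to_unix_millis(start)
    if end is not None:
        params["to"] = to_unix_millis(end)
    path = f"/admin/stats/api-keys/{quote(api_key_id, safe='')}/series"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
```

Sending integers avoids having to percent-encode the `+` and `:` characters that can appear in RFC 3339 timestamps.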
Response¶
{
"api_key_id": "k_3f9a1c",
"interval": "day",
"series": [
{ "date": "2026-06-17", "total_requests": 12, "prompt_tokens": 3600, "completion_tokens": 1440, "total_tokens": 5040 },
{ "date": "2026-06-18", "total_requests": 8, "prompt_tokens": 2400, "completion_tokens": 960, "total_tokens": 3360 }
]
}
Example¶
curl -s "http://localhost:8080/admin/stats/api-keys/k_3f9a1c/series?from=2026-06-01T00:00:00Z&to=2026-06-30T00:00:00Z&interval=day" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
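For a usage-over-time chart it is often convenient to have one point per calendar day. The sketch below pads a series with zero-valued points, on the assumption (not confirmed by this reference) that days without traffic may be absent from the response; the function name is illustrative.

```python
from datetime import date, timedelta
from typing import Dict, List


def fill_missing_days(series: List[Dict], first_day: date, last_day: date) -> List[Dict]:
    """Return one point per day in [first_day, last_day], zero-filling gaps."""
    by_date = {point["date"]: point for point in series}
    filled = []
    day = first_day
    while day <= last_day:
        key = day.isoformat()  # matches the YYYY-MM-DD format of the date field
        filled.append(by_date.get(key, {
            "date": key,
            "total_requests": 0,
            "prompt_tokens": 0,
            "completion_tokens": 0,
            "total_tokens": 0,
        }))
        day += timedelta(days=1)
    return filled
```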
Get Per-User Usage Time Series¶
Same shape and parameters as the per-API-key series, grouped by user_id.
Response¶
{
"user_id": "user-acme",
"interval": "day",
"series": [
{ "date": "2026-06-17", "total_requests": 12, "prompt_tokens": 3600, "completion_tokens": 1440, "total_tokens": 5040 },
{ "date": "2026-06-18", "total_requests": 8, "prompt_tokens": 2400, "completion_tokens": 960, "total_tokens": 3360 }
]
}
Example¶
curl -s "http://localhost:8080/admin/stats/users/user-acme/series?from=2026-06-01T00:00:00Z&to=2026-06-30T00:00:00Z" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Reset Statistics¶
Resets all counters, per-model records, per-backend records, the per-API-key and per-user records (including their per-model breakdowns and daily time-series buckets), and the latency ring buffer. This action is irreversible.
Response¶
Example¶
Persistent Metrics Log API¶
The Persistent Metrics Log API exposes recent Prometheus registry history persisted to a local store (default: SQLite). See the Persistent Metrics Log guide for storage layout, retention math, and configuration.
Get Metrics History¶
Returns historical samples for metric over a half-open time window [from, to).
Query parameters¶
| Parameter | Required | Default | Notes |
|---|---|---|---|
| metric | yes | none | Metric family name, e.g. http_requests_total. |
| from | no | now − 24h | Unix milliseconds (int) or RFC 3339 timestamp. |
| to | no | now | Unix milliseconds (int) or RFC 3339 timestamp. |
| limit | no | 10,000 | Cap on returned rows. Hard ceiling 100,000. |
Response¶
{
"metric": "http_requests_total",
"from_ms": 1715385600000,
"to_ms": 1715472000000,
"row_count": 2,
"limit": 10000,
"samples": [
{
"ts_ms": 1715385600000,
"labels": {"backend": "openai", "endpoint": "/v1/chat/completions"},
"value": 42.0,
"kind": "counter"
}
]
}
Histograms and summaries return multiple kind rows per family — see the Persistent Metrics Log guide.
Error responses¶
- 400 Bad Request: metric missing or oversized, or time range non-positive.
- 404 Not Found: persistence is disabled (metrics.persistence.enabled: false).
- 500 Internal Server Error: storage error.
- 503 Service Unavailable: metrics-persistence feature was not compiled in.
Example¶
curl -s 'http://localhost:8080/admin/metrics/history?metric=http_requests_total&limit=100' \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq .
Response Cache Admin APIs¶
The Response Cache Admin APIs expose statistics and invalidation operations for the response cache. All endpoints are mounted under /admin/response-cache and require the same authentication as the rest of the Admin API.
Response caching is configured in the response_cache section of your YAML config. See the Response Cache Configuration guide for full configuration details.
Get Response Cache Statistics¶
Returns current response cache statistics including hit/miss counts, memory usage, and configuration summary.
Response¶
{
"enabled": true,
"backend_type": "memory",
"entries": 42,
"capacity": 1000,
"requests": {
"hit": 120,
"miss": 80,
"skip": 15,
"total": 215
},
"hit_rate": "0.6000",
"evictions": 3,
"size_bytes": 1048576,
"config": {
"backend": "memory",
"ttl": "5m",
"capacity": 1000,
"max_response_size": 1048576,
"max_stream_buffer_size": 10485760
}
}
When using the Redis backend (backend: redis), the response includes an additional redis object:
{
"enabled": true,
"backend_type": "redis",
"entries": 42,
"capacity": 1000,
"requests": { "hit": 120, "miss": 80, "skip": 15, "total": 215 },
"hit_rate": "0.6000",
"evictions": 3,
"size_bytes": 1048576,
"config": { "backend": "redis", "ttl": "5m", "capacity": 1000, "max_response_size": 1048576, "max_stream_buffer_size": 10485760 },
"redis": {
"connections": { "active": 3, "idle": 5 },
"errors": { "connection": 0, "timeout": 0, "other": 0, "total": 0 },
"fallback_active": false
}
}
When response caching is disabled (response_cache.enabled: false or the section is absent), enabled is false, entries and capacity are 0, and config is null.
Response Fields¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether response caching is active |
| backend_type | string | Active cache backend: "memory" or "redis" |
| entries | integer | Current number of cached entries |
| capacity | integer | Maximum cache capacity (LRU limit) |
| requests.hit | integer | Requests served from cache |
| requests.miss | integer | Cache misses (backend was called, entry stored) |
| requests.skip | integer | Non-cacheable requests (e.g., temperature > 0) |
| requests.total | integer | Total cacheable lookups (hit + miss + skip) |
| hit_rate | string | Rolling cache hit rate as a decimal string (e.g., "0.6000") |
| evictions | integer | Total LRU evictions since startup |
| size_bytes | integer | Approximate memory usage of cached entries in bytes |
| config | object or null | Active configuration summary; null when disabled |
| redis | object or absent | Redis-specific stats (only present when backend_type is "redis") |
| redis.connections.active | integer | Active connections in the Redis pool |
| redis.connections.idle | integer | Idle connections in the Redis pool |
| redis.errors.connection | integer | Redis connection errors since startup |
| redis.errors.timeout | integer | Redis command timeout errors since startup |
| redis.errors.other | integer | Other Redis errors since startup |
| redis.errors.total | integer | Total Redis errors since startup |
| redis.fallback_active | boolean | Whether the in-memory fallback is currently active |
Example¶
curl -s http://localhost:8080/admin/response-cache/stats \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
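Because hit_rate arrives as a decimal string, a monitoring script needs to convert it before comparing it with a threshold. The sketch below also recomputes a rate from the raw counters as hit / (hit + miss); that formula is an assumption that happens to match the sample response, and since the reported value is described as rolling, the two numbers may legitimately differ. The function name and output keys are illustrative.

```python
def cache_summary(stats: dict) -> dict:
    """Turn a response-cache stats payload into numeric health indicators."""
    requests = stats.get("requests", {})
    hits = requests.get("hit", 0)
    misses = requests.get("miss", 0)
    total = requests.get("total", 0)
    lookups = hits + misses
    capacity = stats.get("capacity", 0)
    return {
        # hit_rate is a string such as "0.6000"; default covers a disabled cache.
        "reported_hit_rate": float(stats.get("hit_rate", "0")),
        "computed_hit_rate": hits / lookups if lookups else 0.0,
        "skip_fraction": requests.get("skip", 0) / total if total else 0.0,
        "fill_ratio": stats.get("entries", 0) / capacity if capacity else 0.0,
    }
```

A high skip_fraction suggests that most traffic is not eligible for caching, which limits how much a larger capacity or longer TTL can help.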
Invalidate Response Cache¶
Clears cache entries. Only full cache invalidation via clear_all: true is supported; targeted invalidation by model or tenant is not available.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| clear_all | boolean | No | When true, clears the entire cache. Defaults to false. |
| model | string | No | Accepted but currently ignored; only clear_all is honored. Must not exceed 256 characters. |
| tenant_id | string | No | Accepted but currently ignored; only clear_all is honored. Must not exceed 256 characters. |
Response (clear_all: true)¶
Response (clear_all: false or omitted)¶
{
"success": true,
"action": "noop",
"message": "Targeted invalidation by model/tenant_id is not yet supported. Use clear_all: true to clear the entire cache."
}
Response (cache disabled)¶
Example¶
# Clear entire cache
curl -X POST http://localhost:8080/admin/response-cache/invalidate \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"clear_all": true}'
KV Cache Index Admin APIs¶
The KV Cache Index Admin APIs expose statistics, per-backend state, and a clear operation for the KV cache index subsystem. All endpoints are mounted under /admin/kv-index and require the same authentication as the rest of the Admin API.
The KV cache index tracks which backends hold cached KV data for specific token prefixes, enabling KV-aware routing. It is configured in the kv_cache_index section of your YAML config.
Get KV Cache Index Statistics¶
Returns overall KV cache index statistics, including index size, event source connection status, and routing decision counts.
Response¶
{
"enabled": true,
"config": {
"backend": "memory",
"max_entries": 100000,
"entry_ttl_seconds": 600,
"event_sources_count": 2,
"scoring": {
"overlap_weight": 0.6,
"load_weight": 0.3,
"health_weight": 0.1,
"min_overlap_threshold": 0.3
}
},
"index": {
"prefix_count": 45,
"entry_count": 120,
"total_hits": 3842,
"total_evictions": 12
},
"event_sources": [
{
"backend_name": "vllm-1",
"connected": true,
"events_received": 2100,
"events_dropped": 0,
"last_event_at": "2025-03-12T10:45:00Z",
"reconnect_count": 0
}
],
"routing_decisions": {
"kv_aware": 980,
"fallback": 120,
"total": 1100
},
"query_latency_count": 1100,
"overlap_score_count": 980
}
When the KV cache index is disabled (kv_cache_index.enabled: false or the section is absent), enabled is false, config is null, and all counters are 0.
Response Fields¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether the KV cache index is active |
| config | object or null | Active configuration summary; null when disabled |
| config.backend | string | Index backend: "memory" or "redis" |
| config.max_entries | integer | Maximum tracked prefix hash entries |
| config.entry_ttl_seconds | integer | TTL for index entries in seconds |
| config.event_sources_count | integer | Number of configured event sources |
| config.scoring | object | Scoring weight configuration |
| index.prefix_count | integer | Number of distinct prefix hashes tracked |
| index.entry_count | integer | Total (prefix, backend) pairs tracked |
| index.total_hits | integer | Total cache hit recordings since startup |
| index.total_evictions | integer | Total cache eviction recordings since startup |
| event_sources | array | Status of each event source consumer |
| event_sources[].connected | boolean | Whether the consumer is currently connected |
| event_sources[].events_received | integer | Total events received from this source |
| event_sources[].events_dropped | integer | Events dropped due to backpressure |
| event_sources[].reconnect_count | integer | Number of reconnect attempts since startup |
| routing_decisions.kv_aware | integer | Requests routed using KV-aware selection |
| routing_decisions.fallback | integer | Requests that fell back to the default strategy |
| routing_decisions.total | integer | Total routing decisions made |
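Two quick health checks can be derived from this payload: the share of requests that were routed KV-aware, and which event sources are currently disconnected. The sketch below relies only on the fields in the table above; the function names are illustrative.

```python
from typing import List


def kv_aware_share(stats: dict) -> float:
    """Fraction of routing decisions that used KV-aware selection."""
    if not stats.get("enabled"):
        return 0.0
    decisions = stats.get("routing_decisions") or {}
    total = decisions.get("total", 0)
    return decisions.get("kv_aware", 0) / total if total else 0.0


def disconnected_sources(stats: dict) -> List[str]:
    """Names of event sources whose consumer is not currently connected."""
    return [
        source["backend_name"]
        for source in stats.get("event_sources") or []
        if not source.get("connected")
    ]
```

A falling share combined with a non-empty disconnected list usually points at the event stream rather than at the scoring configuration.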
Example¶
Get Per-Backend KV Cache State¶
Returns per-backend KV cache event statistics, including events received, processed, dropped, connection status, and index event counts.
Response (enabled)¶
{
"enabled": true,
"backends": [
{
"backend_name": "vllm-1",
"connection": {
"connected": true,
"reconnect_count": 0,
"last_event_at": "2025-03-12T10:45:00Z"
},
"events": {
"received": 2100,
"dropped": 0,
"index_created": 1950,
"index_evicted": 150
}
},
{
"backend_name": "vllm-2",
"connection": {
"connected": false,
"reconnect_count": 3,
"last_event_at": null
},
"events": {
"received": 0,
"dropped": 0,
"index_created": 0,
"index_evicted": 0
},
"configured_endpoint": "ws://vllm-2:8000/v1/kv_events"
}
]
}
Backends that appear in kv_cache_index.event_sources but have no active consumer yet are included with connected: false and a configured_endpoint field.
Response (disabled)¶
Response Fields¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether the KV cache index is active |
| backends[].backend_name | string | Backend identifier |
| backends[].connection.connected | boolean | Whether the event stream consumer is connected |
| backends[].connection.reconnect_count | integer | Reconnect attempts since startup |
| backends[].connection.last_event_at | string or null | ISO 8601 timestamp of the most recent event |
| backends[].events.received | integer | Total events received from this backend |
| backends[].events.dropped | integer | Events dropped due to backpressure |
| backends[].events.index_created | integer | Index entries created from events |
| backends[].events.index_evicted | integer | Index entries evicted from events |
| backends[].configured_endpoint | string | Configured endpoint URL (only present for inactive sources) |
Example¶
curl -s http://localhost:8080/admin/kv-index/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Clear KV Cache Index¶
Clears all entries from the KV cache index. Intended for debugging and testing. In production the index rebuilds automatically from incoming KV events.
Response (success)¶
entries_before_clear is the total (prefix, backend) pair count before clearing. cleared_entries is the number of prefix hash buckets removed. For the Redis backend, cleared_entries counts the number of Redis keys deleted; because each key has a TTL, any remaining keys expire automatically.
Response (disabled)¶
Example¶
Smart Routing Admin APIs¶
The Smart Routing Admin APIs expose the model tier registry, letting you inspect which tier and domain profile the router assigns to each model, and update profiles at runtime without a restart. All endpoints are mounted under /admin/smart-routing and require the same authentication as the rest of the Admin API.
Smart routing is enabled by setting smart_routing.enabled: true in your YAML config. When disabled, the list endpoint still responds but reports "enabled": false and returns an empty profile list.
List Model Profiles¶
Returns all explicitly configured profiles plus any auto-inferred profiles that have been cached since startup.
Response¶
{
"enabled": true,
"default_tier": 2,
"total": 3,
"profiles": [
{
"model_id": "gpt-4o",
"tier": 1,
"tier_name": "flagship",
"domains": ["general", "code", "reasoning"],
"cost_per_1k_input_tokens": 0.005,
"cost_per_1k_output_tokens": 0.015,
"source": "explicit_exact"
},
{
"model_id": "llama-3-8b-q4_K_M",
"tier": 3,
"tier_name": "lightweight",
"domains": ["general"],
"cost_per_1k_input_tokens": null,
"cost_per_1k_output_tokens": null,
"source": "explicit_pattern"
}
]
}
When smart routing is disabled, enabled is false, profiles is [], and total is 0.
Response Fields¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether smart routing is active |
| default_tier | integer | Tier assigned when no profile matches (1, 2, or 3) |
| total | integer | Number of profiles returned |
| profiles[].model_id | string | The model identifier |
| profiles[].tier | integer | Numeric tier: 1 = Flagship, 2 = Standard, 3 = Lightweight |
| profiles[].tier_name | string | Human-readable tier name |
| profiles[].domains | array of strings | Domain specialization tags |
| profiles[].cost_per_1k_input_tokens | number or null | Input token cost per 1,000 tokens |
| profiles[].cost_per_1k_output_tokens | number or null | Output token cost per 1,000 tokens |
| profiles[].source | string | How the profile was resolved (see below) |
source values:
| Value | Meaning |
|---|---|
| explicit_exact | Profile was configured by exact model name |
| explicit_pattern | Profile was matched by a glob pattern |
| auto_inferred | Profile was inferred from pricing, capabilities, or name heuristics |
| default | No match found; default tier was used |
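The source values can be read as the outcome of a lookup chain. The sketch below shows one plausible ordering (exact name first, then glob patterns, then the default tier); the precedence, the use of Python's fnmatch for glob matching, and the function name are all assumptions, and the auto_inferred step is left out because its heuristics are not described here.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple


def resolve_tier(model_id: str,
                 exact_profiles: Dict[str, int],
                 pattern_profiles: List[Tuple[str, int]],
                 default_tier: int = 2) -> Tuple[int, str]:
    """Return (tier, source) for a model under the assumed lookup order."""
    if model_id in exact_profiles:
        return exact_profiles[model_id], "explicit_exact"
    for pattern, tier in pattern_profiles:
        # Case-sensitive glob match; the router's glob dialect may differ.
        if fnmatchcase(model_id, pattern):
            return tier, "explicit_pattern"
    # A real registry would try auto-inference before falling back.
    return default_tier, "default"
```

Under these assumptions, `llama-3-8b-q4_K_M` matches the pattern `*-q4_K_M` and resolves as explicit_pattern, in line with the sample response above.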
Example¶
curl http://localhost:8080/admin/smart-routing/model-profiles \
-H "Authorization: Bearer $ADMIN_TOKEN"
Get Model Profile¶
Returns the resolved profile for a specific model. If the model has metadata in model-metadata.yaml, auto-inference uses pricing and capability information from there. Otherwise, name heuristics apply.
Path Parameters¶
| Parameter | Description |
|---|---|
| model | Model identifier (max 256 characters) |
Response¶
{
"model_id": "gemini-1.5-flash",
"tier": 3,
"tier_name": "lightweight",
"domains": ["general"],
"cost_per_1k_input_tokens": null,
"cost_per_1k_output_tokens": null,
"source": "auto_inferred"
}
Example¶
curl http://localhost:8080/admin/smart-routing/model-profiles/gpt-4o \
-H "Authorization: Bearer $ADMIN_TOKEN"
Update Model Profiles¶
Replaces all model profile configurations. The registry reloads immediately; the inferred-profile cache is cleared so subsequent requests re-evaluate against the new profiles. If a config_sender is available, the change is also propagated to the in-memory config.
Request Body¶
{
"default_tier": 2,
"model_profiles": [
{
"model": "gpt-4o",
"tier": 1,
"domains": ["general", "code", "reasoning"],
"cost_per_1k_input_tokens": 0.005,
"cost_per_1k_output_tokens": 0.015
},
{
"model_pattern": "*-q4_K_M",
"tier": 3,
"domains": ["general"]
}
]
}
Each entry must include either model (exact name) or model_pattern (glob). Entries with neither are rejected with 400 Bad Request. default_tier is optional; if omitted, the current default is preserved.
Request Fields¶
| Field | Type | Required | Description |
|---|---|---|---|
| model_profiles | array | Yes | Profile list; replaces existing configuration |
| model_profiles[].model | string | Conditional | Exact model name (max 200 chars) |
| model_profiles[].model_pattern | string | Conditional | Glob pattern such as *-q4_K_M (max 200 chars) |
| model_profiles[].tier | integer | Yes | 1 (Flagship), 2 (Standard), or 3 (Lightweight) |
| model_profiles[].domains | array of strings | No | Domain tags: general, code, reasoning, creative, multilingual, vision |
| model_profiles[].cost_per_1k_input_tokens | number | No | Input cost per 1,000 tokens |
| model_profiles[].cost_per_1k_output_tokens | number | No | Output cost per 1,000 tokens |
| default_tier | integer | No | Fallback tier when no profile matches |
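Because this endpoint replaces the whole profile list, it is worth checking a request body locally before sending it. The sketch below covers only the constraints stated in this section (model or model_pattern required, 200-character limit, tier in 1 to 3); the function name is hypothetical and the server may apply further checks.

```python
from typing import List

ALLOWED_TIERS = (1, 2, 3)
MAX_NAME_LEN = 200  # documented limit for model and model_pattern


def validate_model_profiles(body: dict) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    profiles = body.get("model_profiles")
    if not isinstance(profiles, list):
        return ["model_profiles must be an array"]
    errors = []
    for index, entry in enumerate(profiles):
        name = entry.get("model") or entry.get("model_pattern")
        if not name:
            errors.append(f"model_profiles[{index}]: needs model or model_pattern")
        elif len(name) > MAX_NAME_LEN:
            errors.append(f"model_profiles[{index}]: name exceeds {MAX_NAME_LEN} chars")
        if entry.get("tier") not in ALLOWED_TIERS:
            errors.append(f"model_profiles[{index}]: tier must be 1, 2, or 3")
    default_tier = body.get("default_tier")
    if default_tier is not None and default_tier not in ALLOWED_TIERS:
        errors.append("default_tier must be 1, 2, or 3")
    return errors
```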
Response¶
Example¶
curl -X PUT http://localhost:8080/admin/smart-routing/model-profiles \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model_profiles": [
{"model": "gpt-4o", "tier": 1, "domains": ["general", "code"]},
{"model_pattern": "*-mini", "tier": 3, "domains": ["general"]}
]
}'
Smart Routing Status¶
Returns overall smart routing status including enabled state, load state, classifier method, and policy count.
Response¶
{
"enabled": true,
"virtual_model": "auto",
"intercept_all": false,
"default_tier": 2,
"classifier_method": "rule",
"has_llm_classifier": false,
"load_state": "normal",
"load_monitoring_enabled": false,
"debug_headers": false,
"policy_count": 5,
"profile_count": 3
}
Smart Routing Stats¶
Returns aggregated routing statistics including profile count, policy count, and LLM classifier cache info.
Classify (Diagnostic)¶
Classify a request without routing it. Useful for debugging classification behavior.
Request¶
Response¶
{
"complexity": "trivial",
"domain": "general",
"confidence": 0.95,
"classifier_type": "rule_based",
"required_capabilities": [],
"reasoning": null,
"signals": [
{"name": "message_length", "strength": 0.1, "influences": "complexity"}
]
}
Simulate (Diagnostic)¶
Simulate the full routing pipeline (classification + policy evaluation + model selection + load state) without actually forwarding the request. Returns the complete routing decision chain.
Request¶
Same as the classify endpoint.
Response¶
{
"routed": true,
"target_model": "gpt-4o-mini",
"classification": {
"complexity": "simple",
"domain": "general",
"confidence": 0.92,
"classifier_type": "rule_based"
},
"policy": {
"name": "trivial_to_lightweight",
"tier": 3,
"prefer_domains": [],
"require_capabilities": []
},
"load_state": "normal",
"classification_duration_ms": 0.05,
"available_models": 5
}
List Routing Policies¶
Returns the currently active routing policies with their conditions and targets.
Update Routing Policies¶
Hot-reload routing policies at runtime.
Request¶
{
"routing_policies": [
{
"name": "all_to_flagship",
"when": {},
"route_to": {"tier": 1}
}
],
"virtual_model": "auto",
"intercept_all": false
}
Load State¶
Returns the current load state with assessment details.
Response¶
{
"enabled": true,
"state": "normal",
"max_tier": null,
"prefer_quantized": false,
"reject_expert": false
}
Cache Stats¶
Returns LLM classifier cache statistics.
Response¶
Clear Cache¶
Clear all entries from the LLM classifier cache.
Response¶
Guardrail Admin APIs¶
The Guardrail Admin APIs let you inspect and adjust the content-safety guardrail policy at runtime without a restart. Changes propagate through the same hot-reload config channel that the running GuardrailService subscribes to, so a mode switch, an enabled toggle, a threshold change, or a route override takes effect on the live request path immediately. All endpoints are mounted under /admin/guardrails and require the same authentication and audit logging as the rest of the Admin API.
The guardrail provider set itself is defined in the configuration file; these endpoints toggle and tune the existing providers and the global/per-route policy. They do not create or remove providers.
Get Guardrail Policy¶
Returns the effective guardrail policy and a status summary. Secrets (the bypass_api_keys list) are masked. service_active is false when guardrails were disabled at startup and no service is running; in that case the returned policy is the configured policy but no checks execute.
Response¶
{
"enabled": true,
"mode": "enforce",
"service_active": true,
"registered_providers": ["openai-moderation", "llama-guard"],
"policy": {
"enabled": true,
"mode": "enforce",
"timeout_ms": 2000,
"on_error": "fail_open",
"block_behavior": "content_filter",
"streaming_mode": "buffer_full",
"providers": [ ... ],
"routes": { ... },
"bypass_api_keys": ["su...(24 chars)"],
"allow": { "exact": [], "regex": [] },
"deny": { "exact": [], "regex": [] }
}
}
Example¶
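The read path can be sketched in Python as below. This is only an illustration: it assumes the policy is served by a GET on the /admin/guardrails mount point itself, which this section implies but does not state, and build_policy_request and describe_policy are hypothetical helper names.

```python
import urllib.request


def build_policy_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the request for the guardrail policy.

    Assumption: the policy is read with GET /admin/guardrails.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/guardrails",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def describe_policy(payload: dict) -> str:
    """Summarize the response, separating configured state from live state."""
    mode = payload.get("mode", "unknown")
    if not payload.get("enabled", False):
        return f"guardrails disabled (mode={mode})"
    if not payload.get("service_active", False):
        # The policy is configured, but no service is running, so no checks execute.
        return f"guardrails configured but inactive (mode={mode})"
    providers = ", ".join(payload.get("registered_providers", []))
    return f"guardrails active (mode={mode}, providers={providers})"
```

Passing the request to urllib.request.urlopen and decoding the JSON body yields the payload that describe_policy expects.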
Update Guardrail Policy¶
Partially updates the global guardrail policy. Every field is optional; only the provided fields change. Providers and per-route overrides are managed through their own endpoints below. The candidate policy is validated before it is applied; an invalid change (e.g. timeout_ms: 0, or enabling enforce mode with no providers) returns 400 and leaves the running policy unchanged.
Request Body¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Toggle guardrails on/off globally |
| mode | string | monitor or enforce |
| timeout_ms | integer | Global guardrail timeout in milliseconds |
| on_error | string | fail_open or fail_closed |
| block_behavior | string | content_filter, error, or refusal_message |
| streaming_mode | string | buffer_full, chunked, or passthrough |
| streaming_chunk_size | integer | chunked: characters of new text to accumulate before each incremental check (default 200) |
| streaming_context_size | integer | chunked: trailing characters carried into each check for cross-boundary context (default 50) |
| streaming_stream_first | boolean | chunked: emit each window before checking it (true) or check before emitting (false, default) |
| allow | object | Replace the global allow list ({ "exact": [], "regex": [] }) |
| deny | object | Replace the global deny list |
| bypass_api_keys | array | Replace the bypass API key list |
Response¶
Example¶
curl -X PATCH http://localhost:8080/admin/guardrails \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"mode": "enforce"}'
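Because an invalid change is rejected with 400, a client may want to catch obvious mistakes before sending the PATCH. The following is a minimal sketch of such a pre-check, built only from the field table above. validate_policy_patch is a hypothetical helper, and the server remains the authority: it applies rules (such as refusing enforce mode with no providers) that this sketch does not see.

```python
from typing import Any, Dict, List

# Enumerated values taken from the request body table.
ALLOWED_VALUES = {
    "mode": {"monitor", "enforce"},
    "on_error": {"fail_open", "fail_closed"},
    "block_behavior": {"content_filter", "error", "refusal_message"},
    "streaming_mode": {"buffer_full", "chunked", "passthrough"},
}
INTEGERS = {"timeout_ms", "streaming_chunk_size", "streaming_context_size"}
BOOLEANS = {"enabled", "streaming_stream_first"}
OTHER_FIELDS = {"allow", "deny", "bypass_api_keys"}


def validate_policy_patch(patch: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a PATCH body (empty if none)."""
    known = set(ALLOWED_VALUES) | INTEGERS | BOOLEANS | OTHER_FIELDS
    problems: List[str] = []
    for name, value in patch.items():
        if name not in known:
            problems.append(f"unknown field: {name}")
        elif name in ALLOWED_VALUES:
            if not isinstance(value, str) or value not in ALLOWED_VALUES[name]:
                allowed = ", ".join(sorted(ALLOWED_VALUES[name]))
                problems.append(f"{name} must be one of: {allowed}")
        elif name in INTEGERS:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{name} must be an integer")
            elif name == "timeout_ms" and value <= 0:
                problems.append("timeout_ms must be greater than 0")
        elif name in BOOLEANS and not isinstance(value, bool):
            problems.append(f"{name} must be a boolean")
    return problems
```

An empty result means the body passes the local checks; the request can still be rejected by the server.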
Update Guardrail Provider¶
Updates the runtime settings of a single configured provider. All fields are optional. Returns 404 if no provider with the given name is configured.
Request Body¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Enable or disable this provider |
| category_thresholds | object | Replace the per-category score thresholds ({ "violence": 0.8 }) |
| timeout_ms | integer or null | Set or clear the per-provider timeout override |
| on_error | string or null | Set or clear the per-provider error policy override |
Response¶
Example¶
curl -X PUT http://localhost:8080/admin/guardrails/providers/llama-guard \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"enabled": false}'
Set Guardrail Route Override¶
Creates or replaces the per-route guardrail override for the given route. The request body is a route override object; any omitted field inherits the global policy.
Request Body¶
| Field | Type | Description |
|---|---|---|
| mode | string | Override the operating mode for this route |
| enabled | boolean | Override whether guardrails run for this route |
| providers | array | Restrict this route to a subset of provider names |
| category_thresholds | object | Per-route category thresholds |
| allow | object | Route-specific allow list |
| deny | object | Route-specific deny list |
Response¶
Example¶
curl -X PUT http://localhost:8080/admin/guardrails/routes/gpt-4o \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"mode": "monitor"}'
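The inheritance rule (omitted fields fall back to the global policy) can be pictured with a small merge function. This is a sketch under assumptions: effective_route_policy is a hypothetical name, and it treats every overridden field as a wholesale replacement. Whether the router merges or replaces list-valued fields such as allow and deny is not specified here.

```python
from typing import Any, Dict

# Fields a route override may set, per the request body table.
OVERRIDABLE = ("mode", "enabled", "providers", "category_thresholds", "allow", "deny")


def effective_route_policy(
    global_policy: Dict[str, Any], override: Dict[str, Any]
) -> Dict[str, Any]:
    """Combine the global policy with a route override.

    Fields present in the override win; everything else is inherited.
    The inputs are not modified.
    """
    effective = dict(global_policy)
    for field in OVERRIDABLE:
        if field in override:
            effective[field] = override[field]
    return effective
```

With a global policy in enforce mode and the override {"mode": "monitor"} from the example above, the route runs in monitor mode and keeps every other global setting.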
Delete Guardrail Route Override¶
Removes the per-route override, falling the route back to the global policy. Returns 404 if no override is configured for the route.
Response¶
Test Guardrails (Dry Run)¶
Diagnostic endpoint for threshold tuning. Runs every registered provider against the supplied sample text and returns each provider's verdict plus the aggregated most-severe-wins verdict. The dry run ignores the global mode and the bypass list so the raw provider output is visible; disabled providers (and those that do not apply to the requested stage) are reported as skipped. Returns 400 when no guardrail service is active.
Request Body¶
| Field | Type | Description |
|---|---|---|
| text | string | The sample text to evaluate (required) |
| stage | string | input (default) or output |
| model | string | Optional model identifier for the evaluation context |
| route | string | Optional route name for the evaluation context |
Response¶
{
"stage": "input",
"providers": [
{
"provider": "openai-moderation",
"skipped": false,
"verdict": { "verdict": "allow" }
},
{
"provider": "llama-guard",
"skipped": false,
"verdict": {
"verdict": "block",
"category": "violence",
"score": 0.97,
"reason": "..."
}
}
],
"aggregated": {
"verdict": "block",
"category": "violence",
"score": 0.97,
"reason": "..."
}
}
Example¶
curl -X POST http://localhost:8080/admin/guardrails/test \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"text": "sample prompt to evaluate", "stage": "input"}'
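The most-severe-wins aggregation can be illustrated with a short function that recomputes the aggregated verdict from the per-provider entries. This is a sketch, not the router's implementation: the severity ranking covers only the two verdicts shown in the sample response (allow and block), and breaking ties by the higher score is an assumption.

```python
from typing import Any, Dict, List, Optional

# Assumed ranking; only "allow" and "block" appear in the sample response.
SEVERITY = {"allow": 0, "block": 1}


def aggregate_verdicts(providers: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the most severe verdict among providers that were not skipped."""
    best_key = None
    best_verdict = None
    for entry in providers:
        if entry.get("skipped"):
            continue  # Disabled or non-applicable providers do not vote.
        verdict = entry["verdict"]
        kind = verdict["verdict"]
        if kind not in SEVERITY:
            raise ValueError(f"no severity rank defined for verdict {kind!r}")
        # Rank by severity first, then by score (assumed tie-break).
        key = (SEVERITY[kind], verdict.get("score", 0.0))
        if best_key is None or key > best_key:
            best_key, best_verdict = key, verdict
    return best_verdict
```

Applied to the providers array in the sample response, the function selects the llama-guard block verdict, matching the aggregated object shown there.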
Data Models¶
Configuration Sections¶
| Section | Description | Hot Reload |
|---|---|---|
| server | Bind address, workers, connection pool | Requires restart |
| backends | Backend URLs, weights, models | Gradual |
| health_checks | Intervals, thresholds | Gradual |
| logging | Log level, format, output | Immediate |
| retry | Max attempts, delays, backoff | Immediate |
| timeouts | Connect, request, idle timeouts | Gradual |
| rate_limiting | Limits, storage, whitelist | Immediate |
| circuit_breaker | Thresholds, recovery time | Immediate |
| global_prompts | System prompt injection | Immediate |
| fallback | Fallback chains, policies | Gradual |
| files | Files API settings | Gradual |
| api_keys | API key configuration | Immediate |
| metrics | Prometheus, labels | Gradual |
| admin | Admin API settings | Gradual |
| admin.stats | Stats collection settings | Immediate |
| routing | Model routing rules | Gradual |
| smart_routing | Model tier registry and profiles | Immediate |
Backend Object¶
{
"name": "string",
"url": "string (http:// or https://)",
"api_key": "string (optional, masked in responses)",
"weight": "integer (1-100)",
"models": ["string"],
"enabled": "boolean",
"health_check": {
"enabled": "boolean",
"path": "string",
"interval": "string (duration)"
}
}
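The constraints embedded in the Backend Object (an http:// or https:// URL, a weight from 1 to 100) lend themselves to a client-side check before a backend is submitted. The sketch below is illustrative only; validate_backend is a hypothetical helper, and treating a missing weight as 1 is an assumption borrowed from the SDK examples in this document.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def validate_backend(backend: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a backend object (empty if none)."""
    problems: List[str] = []

    name = backend.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")

    url = backend.get("url")
    parsed = urlparse(url) if isinstance(url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must start with http:// or https://")

    weight = backend.get("weight", 1)  # Assumed default when omitted.
    if isinstance(weight, bool) or not isinstance(weight, int) or not 1 <= weight <= 100:
        problems.append("weight must be an integer between 1 and 100")

    models = backend.get("models", [])
    if not isinstance(models, list) or not all(isinstance(m, str) for m in models):
        problems.append("models must be a list of strings")

    return problems
```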
History Entry Object¶
{
"version": "integer",
"timestamp": "string (ISO 8601)",
"sections_changed": ["string"],
"source": "string (api|file_reload|initial|rollback)",
"user": "string",
"description": "string (optional)",
"rollback_available": "boolean"
}
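One use of the History Entry Object is picking a rollback target. The sketch below selects the highest version that is still marked rollback_available, optionally limited to entries that touched a given section. latest_rollback_version is a hypothetical helper, and it assumes that a higher version number means a more recent change.

```python
from typing import Any, Dict, List, Optional


def latest_rollback_version(
    history: List[Dict[str, Any]], section: Optional[str] = None
) -> Optional[int]:
    """Return the highest rollback-capable version, or None if there is none."""
    candidates = [
        entry["version"]
        for entry in history
        if entry.get("rollback_available")
        and (section is None or section in entry.get("sections_changed", []))
    ]
    return max(candidates) if candidates else None
```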
Validation Result Object¶
{
"valid": "boolean",
"errors": [
{
"field": "string",
"message": "string",
"code": "string"
}
],
"warnings": [
{
"field": "string",
"message": "string"
}
]
}
Hot Reload Behavior¶
Update Types¶
| Type | Behavior | Sections |
|---|---|---|
| Immediate | Applied instantly, no disruption | logging, rate_limiting, circuit_breaker, retry, global_prompts, api_keys, admin.stats, smart_routing |
| Gradual | Existing connections maintained, new connections use new config | backends, health_checks, timeouts, fallback, files, metrics, admin, routing |
| Requires Restart | Logged as warning, requires server restart | server.bind_address, server.workers |
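A client that batches changes may want to know in advance whether any of them needs a restart. The lookup below is a sketch assembled from the Update Types and Configuration Sections tables in this document; update_type and requires_restart are hypothetical helpers, and keys absent from those tables return None rather than a guess.

```python
from typing import Dict, Iterable, Optional

UPDATE_TYPES: Dict[str, str] = {}
for _name in ("logging", "rate_limiting", "circuit_breaker", "retry",
              "global_prompts", "api_keys", "admin.stats", "smart_routing"):
    UPDATE_TYPES[_name] = "immediate"
for _name in ("backends", "health_checks", "timeouts", "fallback",
              "files", "metrics", "admin", "routing"):
    UPDATE_TYPES[_name] = "gradual"
for _name in ("server", "server.bind_address", "server.workers"):
    UPDATE_TYPES[_name] = "requires_restart"


def update_type(key: str) -> Optional[str]:
    """Return 'immediate', 'gradual', 'requires_restart', or None if unlisted."""
    return UPDATE_TYPES.get(key)


def requires_restart(keys: Iterable[str]) -> bool:
    """True if any of the given sections or keys only takes effect after a restart."""
    return any(update_type(key) == "requires_restart" for key in keys)
```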
Example Workflow¶
# 1. Check current configuration
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq
# 2. Validate change
curl -X POST http://localhost:8080/admin/config/validate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"section": "logging", "config": {"level": "debug"}}'
# 3. Apply change (immediate effect)
curl -X PATCH http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"level": "debug"}}'
# 4. Verify change
curl -s http://localhost:8080/admin/config/logging \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.config.level'
Error Handling¶
Error Response Format¶
Error Codes¶
| Code | HTTP Status | Description |
|---|---|---|
| VALIDATION_ERROR | 400 | Configuration validation failed |
| INVALID_SECTION | 400 | Unknown configuration section |
| PARSE_ERROR | 400 | Failed to parse configuration content |
| SECTION_NOT_FOUND | 404 | Section not found |
| VERSION_NOT_FOUND | 404 | History version not found |
| BACKEND_NOT_FOUND | 404 | Backend not found |
| BACKEND_EXISTS | 409 | Backend with name already exists |
| CONTENT_TOO_LARGE | 413 | Configuration content exceeds 1MB limit |
| INTERNAL_ERROR | 500 | Internal server error |
Error Examples¶
// Validation Error
{
"error_code": "VALIDATION_ERROR",
"message": "Configuration validation failed",
"details": {
"errors": [
{"field": "workers", "message": "workers must be greater than 0"}
]
}
}
// Section Not Found
{
"error_code": "SECTION_NOT_FOUND",
"message": "Configuration section 'invalid' not found",
"details": {
"available_sections": ["server", "backends", "logging", "..."]
}
}
// Backend Exists
{
"error_code": "BACKEND_EXISTS",
"message": "Backend 'openai' already exists",
"details": {
"existing_backend": "openai"
}
}
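Clients usually turn these error bodies into typed exceptions so callers can branch on error_code. The sketch below assumes the error_code, message, and details fields shown in the examples above; AdminAPIError and parse_error are hypothetical names, and "UNKNOWN" is a client-side placeholder for bodies that are not JSON, not a code the server returns.

```python
import json
from typing import Any, Dict, Optional


class AdminAPIError(Exception):
    """An Admin API failure, carrying the HTTP status and the parsed error body."""

    def __init__(self, status: int, error_code: str, message: str,
                 details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(f"{status} {error_code}: {message}")
        self.status = status
        self.error_code = error_code
        self.message = message
        self.details = details or {}


def parse_error(status: int, body: str) -> AdminAPIError:
    """Build an AdminAPIError from a response status and raw body text."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None
    if not isinstance(payload, dict):
        # Proxies and gateways may return plain text or HTML instead of JSON.
        return AdminAPIError(status, "UNKNOWN", body.strip() or f"HTTP {status}")
    return AdminAPIError(
        status,
        payload.get("error_code", "UNKNOWN"),
        payload.get("message", f"HTTP {status}"),
        payload.get("details"),
    )
```

A caller can then handle a 409 by checking err.error_code == "BACKEND_EXISTS" instead of matching on message text.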
Client SDK Examples¶
Python¶
import requests
from typing import Optional, Dict, Any, List
from dataclasses import dataclass


@dataclass
class ContinuumAdminClient:
    """Continuum Router Admin API Client"""
    base_url: str
    token: str

    def __post_init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json"
        })

    # Configuration Query APIs
    def get_full_config(self) -> Dict[str, Any]:
        """Get full configuration with masked sensitive data"""
        resp = self.session.get(f"{self.base_url}/admin/config/full")
        resp.raise_for_status()
        return resp.json()

    def get_sections(self) -> List[Dict[str, Any]]:
        """Get all configuration sections"""
        resp = self.session.get(f"{self.base_url}/admin/config/sections")
        resp.raise_for_status()
        return resp.json()["sections"]

    def get_section(self, section: str) -> Dict[str, Any]:
        """Get configuration for a specific section"""
        resp = self.session.get(f"{self.base_url}/admin/config/{section}")
        resp.raise_for_status()
        return resp.json()

    def get_schema(self, section: Optional[str] = None) -> Dict[str, Any]:
        """Get JSON schema for validation"""
        params = {"section": section} if section else {}
        resp = self.session.get(
            f"{self.base_url}/admin/config/schema",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Modification APIs
    def update_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Replace section configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def patch_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Partial update section configuration"""
        resp = self.session.patch(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def validate_config(
        self,
        section: str,
        config: Dict[str, Any],
        dry_run: bool = True
    ) -> Dict[str, Any]:
        """Validate configuration without applying"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/validate",
            json={"section": section, "config": config, "dry_run": dry_run}
        )
        resp.raise_for_status()
        return resp.json()

    def apply_config(
        self,
        sections: Optional[List[str]] = None,
        force: bool = False
    ) -> Dict[str, Any]:
        """Apply pending configuration changes"""
        body = {"force": force}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/apply",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Save/Restore APIs
    def export_config(
        self,
        format: str = "yaml",
        sections: Optional[List[str]] = None,
        include_sensitive: bool = False
    ) -> str:
        """Export configuration in specified format"""
        body = {"format": format, "include_sensitive": include_sensitive}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/export",
            json=body
        )
        resp.raise_for_status()
        return resp.json()["content"]

    def import_config(
        self,
        content: str,
        format: str = "yaml",
        apply: bool = True,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Import configuration from content"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/import",
            json={
                "format": format,
                "content": content,
                "apply": apply,
                "dry_run": dry_run
            }
        )
        resp.raise_for_status()
        return resp.json()

    def get_history(
        self,
        limit: int = 20,
        offset: int = 0,
        section: Optional[str] = None
    ) -> Dict[str, Any]:
        """Get configuration change history"""
        params = {"limit": limit, "offset": offset}
        if section:
            params["section"] = section
        resp = self.session.get(
            f"{self.base_url}/admin/config/history",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def rollback(
        self,
        version: int,
        sections: Optional[List[str]] = None,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Rollback to a previous version"""
        body = {"dry_run": dry_run}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/rollback/{version}",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Backend Management APIs
    def list_backends(self) -> List[Dict[str, Any]]:
        """List all backends"""
        resp = self.session.get(f"{self.base_url}/admin/backends")
        resp.raise_for_status()
        return resp.json()["backends"]

    def get_backend(self, name: str) -> Dict[str, Any]:
        """Get backend configuration"""
        resp = self.session.get(f"{self.base_url}/admin/backends/{name}")
        resp.raise_for_status()
        return resp.json()

    def add_backend(
        self,
        name: str,
        url: str,
        weight: int = 1,
        models: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """Add a new backend"""
        body = {"name": name, "url": url, "weight": weight}
        if models:
            body["models"] = models
        resp = self.session.post(
            f"{self.base_url}/admin/backends",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend(self, name: str, **kwargs) -> Dict[str, Any]:
        """Update backend configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}",
            json=kwargs
        )
        resp.raise_for_status()
        return resp.json()

    def delete_backend(self, name: str, force: bool = False) -> Dict[str, Any]:
        """Delete a backend"""
        params = {"force": str(force).lower()} if force else {}
        resp = self.session.delete(
            f"{self.base_url}/admin/backends/{name}",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_weight(self, name: str, weight: int) -> Dict[str, Any]:
        """Update backend weight"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/weight",
            json={"weight": weight}
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_models(
        self,
        name: str,
        models: List[str],
        append: bool = False
    ) -> Dict[str, Any]:
        """Update backend models"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/models",
            json={"models": models, "append": append}
        )
        resp.raise_for_status()
        return resp.json()


# Usage Example
if __name__ == "__main__":
    client = ContinuumAdminClient(
        base_url="http://localhost:8080",
        token="your-admin-token"
    )

    # Get current logging config
    logging_config = client.get_section("logging")
    print(f"Current log level: {logging_config['config']['level']}")

    # Update logging level
    result = client.patch_section("logging", {"level": "debug"})
    print(f"Updated: {result['success']}")

    # Add a new backend
    client.add_backend(
        name="new-ollama",
        url="http://192.168.1.100:11434",
        weight=2,
        models=["llama3.2", "mistral"]
    )

    # Export configuration backup
    backup = client.export_config(format="yaml")
    with open("config-backup.yaml", "w") as f:
        f.write(backup)
JavaScript/TypeScript¶
interface ConfigSection {
name: string;
config: Record<string, any>;
hot_reload_capability: 'immediate' | 'gradual' | 'requires_restart';
}
interface HistoryEntry {
version: number;
timestamp: string;
sections_changed: string[];
source: string;
user: string;
}
interface Backend {
name: string;
url: string;
weight: number;
models: string[];
enabled: boolean;
health_status: string;
}
class ContinuumAdminClient {
private baseUrl: string;
private token: string;
constructor(baseUrl: string, token: string) {
this.baseUrl = baseUrl;
this.token = token;
}
private async request<T>(
method: string,
path: string,
body?: any,
params?: Record<string, string>
): Promise<T> {
const url = new URL(`${this.baseUrl}${path}`);
if (params) {
Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
}
const response = await fetch(url.toString(), {
method,
headers: {
'Authorization': `Bearer ${this.token}`,
'Content-Type': 'application/json',
},
body: body ? JSON.stringify(body) : undefined,
});
if (!response.ok) {
// Error bodies are not guaranteed to be JSON; fall back to the status code.
const error = await response.json().catch(() => ({}));
throw new Error(error.message || `HTTP ${response.status}`);
}
return response.json();
}
// Configuration Query APIs
async getFullConfig(): Promise<any> {
return this.request('GET', '/admin/config/full');
}
async getSections(): Promise<ConfigSection[]> {
const result = await this.request<{ sections: ConfigSection[] }>(
'GET', '/admin/config/sections'
);
return result.sections;
}
async getSection(section: string): Promise<ConfigSection> {
return this.request('GET', `/admin/config/${section}`);
}
async getSchema(section?: string): Promise<any> {
const params = section ? { section } : undefined;
return this.request('GET', '/admin/config/schema', undefined, params);
}
// Configuration Modification APIs
async updateSection(section: string, config: Record<string, any>): Promise<any> {
return this.request('PUT', `/admin/config/${section}`, { config });
}
async patchSection(section: string, config: Record<string, any>): Promise<any> {
return this.request('PATCH', `/admin/config/${section}`, { config });
}
async validateConfig(
section: string,
config: Record<string, any>,
dryRun: boolean = true
): Promise<any> {
return this.request('POST', '/admin/config/validate', {
section,
config,
dry_run: dryRun,
});
}
async applyConfig(sections?: string[], force: boolean = false): Promise<any> {
return this.request('POST', '/admin/config/apply', { sections, force });
}
// Configuration Save/Restore APIs
async exportConfig(
format: 'yaml' | 'json' | 'toml' = 'yaml',
sections?: string[],
includeSensitive: boolean = false
): Promise<string> {
const result = await this.request<{ content: string }>(
'POST', '/admin/config/export',
{ format, sections, include_sensitive: includeSensitive }
);
return result.content;
}
async importConfig(
content: string,
format: 'yaml' | 'json' | 'toml' = 'yaml',
apply: boolean = true,
dryRun: boolean = false
): Promise<any> {
return this.request('POST', '/admin/config/import', {
format,
content,
apply,
dry_run: dryRun,
});
}
async getHistory(
limit: number = 20,
offset: number = 0,
section?: string
): Promise<{ history: HistoryEntry[]; total_entries: number }> {
const params: Record<string, string> = {
limit: limit.toString(),
offset: offset.toString(),
};
if (section) params.section = section;
return this.request('GET', '/admin/config/history', undefined, params);
}
async rollback(
version: number,
sections?: string[],
dryRun: boolean = false
): Promise<any> {
return this.request('POST', `/admin/config/rollback/${version}`, {
sections,
dry_run: dryRun,
});
}
// Backend Management APIs
async listBackends(): Promise<Backend[]> {
const result = await this.request<{ backends: Backend[] }>(
'GET', '/admin/backends'
);
return result.backends;
}
async getBackend(name: string): Promise<Backend> {
return this.request('GET', `/admin/backends/${name}`);
}
async addBackend(
name: string,
url: string,
weight: number = 1,
models?: string[]
): Promise<any> {
return this.request('POST', '/admin/backends', {
name,
url,
weight,
models,
});
}
async updateBackend(name: string, updates: Partial<Backend>): Promise<any> {
return this.request('PUT', `/admin/backends/${name}`, updates);
}
async deleteBackend(name: string, force: boolean = false): Promise<any> {
const params = force ? { force: 'true' } : undefined;
return this.request('DELETE', `/admin/backends/${name}`, undefined, params);
}
async updateBackendWeight(name: string, weight: number): Promise<any> {
return this.request('PUT', `/admin/backends/${name}/weight`, { weight });
}
async updateBackendModels(
name: string,
models: string[],
append: boolean = false
): Promise<any> {
return this.request('PUT', `/admin/backends/${name}/models`, {
models,
append,
});
}
}
// Usage Example
async function main() {
const client = new ContinuumAdminClient(
'http://localhost:8080',
'your-admin-token'
);
// Get current logging config
const loggingConfig = await client.getSection('logging');
console.log(`Current log level: ${loggingConfig.config.level}`);
// Update logging level
const result = await client.patchSection('logging', { level: 'debug' });
console.log(`Updated: ${result.success}`);
// Add a new backend
await client.addBackend('new-ollama', 'http://192.168.1.100:11434', 2, [
'llama3.2',
'mistral',
]);
// Export configuration backup
const backup = await client.exportConfig('yaml');
console.log('Configuration exported');
}
main().catch(console.error);
Go¶
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"net/http"
"net/url"
)
type ContinuumAdminClient struct {
BaseURL string
Token string
client *http.Client
}
func NewClient(baseURL, token string) *ContinuumAdminClient {
return &ContinuumAdminClient{
BaseURL: baseURL,
Token: token,
client: &http.Client{},
}
}
func (c *ContinuumAdminClient) request(method, path string, body interface{}) (map[string]interface{}, error) {
var reqBody io.Reader
if body != nil {
jsonBody, err := json.Marshal(body)
if err != nil {
return nil, err
}
reqBody = bytes.NewBuffer(jsonBody)
}
req, err := http.NewRequest(method, c.BaseURL+path, reqBody)
if err != nil {
return nil, err
}
req.Header.Set("Authorization", "Bearer "+c.Token)
req.Header.Set("Content-Type", "application/json")
resp, err := c.client.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
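data, err := io.ReadAll(resp.Body)
if err != nil {
return nil, err
}
// Check the status first: error bodies are not guaranteed to be JSON.
if resp.StatusCode >= 400 {
return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, data)
}
var result map[string]interface{}
if err := json.Unmarshal(data, &result); err != nil {
return nil, err
}
return result, nil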
}
// GetFullConfig retrieves the full configuration
func (c *ContinuumAdminClient) GetFullConfig() (map[string]interface{}, error) {
return c.request("GET", "/admin/config/full", nil)
}
// GetSection retrieves a specific configuration section
func (c *ContinuumAdminClient) GetSection(section string) (map[string]interface{}, error) {
return c.request("GET", "/admin/config/"+section, nil)
}
// PatchSection partially updates a configuration section
func (c *ContinuumAdminClient) PatchSection(section string, config map[string]interface{}) (map[string]interface{}, error) {
return c.request("PATCH", "/admin/config/"+section, map[string]interface{}{
"config": config,
})
}
// AddBackend adds a new backend
func (c *ContinuumAdminClient) AddBackend(name, backendURL string, weight int, models []string) (map[string]interface{}, error) {
return c.request("POST", "/admin/backends", map[string]interface{}{
"name": name,
"url": backendURL,
"weight": weight,
"models": models,
})
}
// ExportConfig exports configuration in the specified format
func (c *ContinuumAdminClient) ExportConfig(format string) (string, error) {
result, err := c.request("POST", "/admin/config/export", map[string]interface{}{
"format": format,
})
if err != nil {
return "", err
}
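// Guard the type assertion so a missing or non-string field does not panic.
content, ok := result["content"].(string)
if !ok {
return "", fmt.Errorf("export response has no string \"content\" field")
}
return content, nil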
}
// GetHistory retrieves configuration change history
func (c *ContinuumAdminClient) GetHistory(limit int) (map[string]interface{}, error) {
// Build only the query string; request() already prepends BaseURL.
q := url.Values{}
q.Set("limit", fmt.Sprintf("%d", limit))
return c.request("GET", "/admin/config/history?"+q.Encode(), nil)
}
func main() {
client := NewClient("http://localhost:8080", "your-admin-token")
// Get current logging config
config, err := client.GetSection("logging")
if err != nil {
fmt.Println("get section failed:", err)
return
}
fmt.Printf("Current config: %v\n", config)
// Update logging level
result, err := client.PatchSection("logging", map[string]interface{}{
"level": "debug",
})
if err != nil {
fmt.Println("patch section failed:", err)
return
}
fmt.Printf("Update result: %v\n", result)
// Add a new backend
if _, err := client.AddBackend("new-ollama", "http://192.168.1.100:11434", 2, []string{"llama3.2"}); err != nil {
fmt.Println("add backend failed:", err)
return
}
// Export configuration
backup, err := client.ExportConfig("yaml")
if err != nil {
fmt.Println("export failed:", err)
return
}
fmt.Println("Configuration exported")
fmt.Println(backup)
}
Best Practices¶
1. Always Validate Before Applying¶
# Step 1: Validate
curl -X POST http://localhost:8080/admin/config/validate \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"section": "logging", "config": {"level": "debug"}}'
# Step 2: Apply only if valid
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"config": {"level": "debug"}}'
2. Use Dry Run for Imports¶
# Preview import changes
curl -X POST http://localhost:8080/admin/config/import \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "yaml",
"content": "...",
"dry_run": true
}'
3. Regular Configuration Backups¶
#!/bin/bash
# Daily backup script
DATE=$(date +%Y%m%d)
curl -s -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"format": "yaml"}' | jq -r '.content' > "config-backup-$DATE.yaml"
4. Monitor Configuration History¶
# Check recent changes
curl -s "http://localhost:8080/admin/config/history?limit=5" \
-H "Authorization: Bearer $TOKEN" | jq '.history[] | {version, timestamp, sections_changed}'
5. Use Partial Updates (PATCH) for Minimal Changes¶
# Only update what's needed
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"config": {"requests_per_minute": 200}}'
6. Test Configuration Changes in Staging First¶
# Example: Test configuration in staging before production
staging_client = ContinuumAdminClient("http://staging:8080", staging_token)
production_client = ContinuumAdminClient("http://production:8080", prod_token)
# Apply to staging first
staging_client.patch_section("rate_limiting", {"requests_per_minute": 500})
# Verify in staging
staging_config = staging_client.get_section("rate_limiting")
assert staging_config["config"]["requests_per_minute"] == 500
# Then apply to production
production_client.patch_section("rate_limiting", {"requests_per_minute": 500})
Security Considerations¶
1. Sensitive Data Handling¶
- All API responses automatically mask sensitive fields (API keys, passwords, tokens)
- Use include_sensitive: true in export only when absolutely necessary
- Audit logs record when sensitive data is accessed
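Client-side logs deserve the same care as API responses. The sketch below reproduces the masked shape that appears in the guardrail policy response ("su...(24 chars)") for values a client is about to log. It is only an approximation: mask_secret is a hypothetical helper, and reading the number as the full length of the original value is an assumption, since the router's masking rules are not spelled out here.

```python
def mask_secret(value: str, visible: int = 2) -> str:
    """Mask a secret, keeping a short prefix and the total length.

    Values no longer than the visible prefix are fully starred so that
    nothing of a very short secret leaks.
    """
    if len(value) <= visible:
        return "*" * len(value)
    return f"{value[:visible]}...({len(value)} chars)"
```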
2. Authentication Best Practices¶
admin:
auth:
method: bearer_token
token: "${ADMIN_TOKEN}" # Use environment variables
# Restrict access by IP
ip_whitelist:
- "10.0.0.0/8" # Internal network only
- "192.168.1.0/24" # Office network
3. Audit Logging¶
All configuration changes are logged with:
- Timestamp
- User/source
- Changed sections
- Previous and new values (sensitive data masked)
4. Rate Limiting Admin Endpoints¶
Consider rate limiting admin endpoints to prevent abuse.
5. Backup Before Major Changes¶
# Always backup before major changes
backup=$(curl -s -X POST http://localhost:8080/admin/config/export \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "yaml"}' | jq -r '.content')
# Make changes...
# Restore if needed
curl -X POST http://localhost:8080/admin/config/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"format\": \"yaml\", \"content\": $(echo "$backup" | jq -Rs .)}"
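The same backup-then-restore pattern can be wrapped in a helper so the restore step is not forgotten when a change fails. This sketch assumes a client object with the export_config and import_config methods of the Python SDK example in this document; apply_with_backup is a hypothetical name. Whether an export with masked sensitive values restores cleanly is not stated here, so treat the restore as best effort.

```python
from typing import Any, Callable


def apply_with_backup(client: Any, change: Callable[[], None]) -> str:
    """Export the configuration, run `change`, and restore the export on failure.

    Returns the exported content so the caller can also persist it.
    """
    backup = client.export_config(format="yaml")
    try:
        change()
    except Exception:
        # Put the previous configuration back, then surface the original error.
        client.import_config(content=backup, format="yaml", apply=True)
        raise
    return backup
```

A typical call passes a closure, for example apply_with_backup(client, lambda: client.patch_section("logging", {"level": "debug"})).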
Prompt File Management APIs¶
The Prompt File Management API allows you to manage system prompts stored in external Markdown files. This enables centralized management of system prompts without modifying the main configuration file.
List All Prompts¶
Get a list of all configured prompts with their sources and content.
Response¶
{
"prompts": [
{
"id": "default",
"prompt_type": "default",
"source": "file",
"file_path": "prompts/system.md",
"content": "# System Prompt\n\nYou are a helpful assistant...",
"loaded": true,
"size_bytes": 1024
},
{
"id": "anthropic",
"prompt_type": "backend",
"source": "file",
"file_path": "prompts/anthropic.md",
"content": "# Anthropic-specific prompt...",
"loaded": true,
"size_bytes": 512
},
{
"id": "gpt-4",
"prompt_type": "model",
"source": "inline",
"content": "You are GPT-4...",
"size_bytes": 256
}
],
"total": 3,
"prompts_directory": "./prompts"
}
Example¶
Get Prompt File¶
Get content of a specific prompt file.
Path Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Relative path to the prompt file |
Response¶
{
"path": "prompts/system.md",
"content": "# System Prompt\n\nYou are a helpful assistant that follows company policies...",
"size_bytes": 1024,
"modified_at": 1702468200
}
Example¶
curl -s http://localhost:8080/admin/config/prompts/prompts/system.md \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Update Prompt File¶
Create or update a prompt file with new content.
Request Body¶
{
"content": "# Updated System Prompt\n\nYou are a helpful assistant that follows all company policies.\n\n## Security Guidelines\n\n- Never reveal internal system details\n- Follow data privacy regulations"
}
Response¶
{
"success": true,
"path": "prompts/system.md",
"size_bytes": 245,
"message": "Prompt file updated successfully"
}
Example¶
curl -X PUT http://localhost:8080/admin/config/prompts/prompts/system.md \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"content": "# System Prompt\n\nYou are a helpful assistant."
}'
Reload Prompt Files¶
Reload all prompt files from disk. Useful after manual file edits.
Response¶
{
"success": true,
"reloaded_count": 3,
"reloaded": [
"prompts/system.md",
"prompts/anthropic.md",
"prompts/gpt4.md"
],
"errors": [],
"message": "Successfully reloaded 3 prompt file(s)"
}
Example¶
curl -X POST http://localhost:8080/admin/config/prompts/reload \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Configuration Example¶
To use external prompt files, configure global_prompts in your config file:
global_prompts:
  # Directory containing prompt files (relative to config directory)
  prompts_dir: "./prompts"
  # Default prompt from external file
  default_file: "system.md"
  # Or inline prompt (default_file takes precedence if both specified)
  # default: "You are a helpful assistant."
  # Backend-specific prompts
  backends:
    anthropic:
      prompt_file: "anthropic-system.md"
    openai:
      prompt: "OpenAI-specific inline prompt"
  # Model-specific prompts
  models:
    gpt-4:
      prompt_file: "gpt4-system.md"
    claude-3-opus:
      prompt_file: "claude-opus-system.md"
  merge_strategy: prepend
Security Considerations¶
- Path Traversal Protection: All paths are validated to prevent directory traversal attacks (e.g., ../../../etc/passwd)
- File Size Limits: Prompt files are limited to 1MB maximum
- Relative Paths Only: Prompt files must be within the configured prompts_dir or config directory
- Authentication Required: All prompt management endpoints require admin authentication
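The path and size rules above can be mirrored on the client to fail fast before a request is sent. The following is a minimal sketch of that kind of check, not the router's own validation: resolve_prompt_path and within_size_limit are hypothetical helpers, only the prompts directory (not the config directory) is considered, and 1MB is taken to mean 1024 * 1024 bytes.

```python
from pathlib import Path

MAX_PROMPT_BYTES = 1024 * 1024  # Assumed interpretation of the 1MB limit.


def resolve_prompt_path(prompts_dir: str, relative: str) -> Path:
    """Resolve a relative prompt path, rejecting anything outside prompts_dir."""
    base = Path(prompts_dir).resolve()
    candidate = Path(relative)
    if candidate.is_absolute():
        raise ValueError("absolute paths are not allowed")
    # resolve() collapses ".." segments, so an escape shows up as a
    # resolved path that no longer has the base directory as a parent.
    resolved = (base / candidate).resolve()
    if base not in resolved.parents:
        raise ValueError("path escapes the prompts directory")
    return resolved


def within_size_limit(content: str) -> bool:
    """Check the UTF-8 encoded size of prompt content against the limit."""
    return len(content.encode("utf-8")) <= MAX_PROMPT_BYTES
```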
Appendix: Quick Reference¶
Configuration Sections¶
| Section | Hot Reload | Description |
|---|---|---|
| server | Restart | Bind address, workers |
| backends | Gradual | Backend URLs, weights |
| health_checks | Gradual | Health monitoring |
| logging | Immediate | Log level, format |
| retry | Immediate | Retry policies |
| timeouts | Gradual | Request timeouts |
| rate_limiting | Immediate | Rate limits |
| circuit_breaker | Immediate | Circuit breaker |
| global_prompts | Immediate | System prompts |
| fallback | Gradual | Model fallback |
| files | Gradual | Files API |
| api_keys | Immediate | API keys |
| metrics | Gradual | Prometheus metrics |
| admin | Gradual | Admin settings |
| admin.stats | Immediate | Stats collection settings |
| routing | Gradual | Routing rules |
| prefix_routing | Immediate | Prefix-aware KV cache routing |
| response_cache | Immediate | Response cache settings |
| kv_cache_index | Requires restart | KV cache index backend and event sources |
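A configuration tool can use this table to warn operators before they submit a change that will not take effect until restart. The Python sketch below copies the table into a lookup and groups a set of sections by reload mode. `RELOAD_MODE` and `plan_update` are hypothetical names, and "Requires restart" is normalized to `restart` for simplicity.

```python
# Reload behavior per section, copied from the table above.
RELOAD_MODE = {
    "server": "restart",
    "backends": "gradual",
    "health_checks": "gradual",
    "logging": "immediate",
    "retry": "immediate",
    "timeouts": "gradual",
    "rate_limiting": "immediate",
    "circuit_breaker": "immediate",
    "global_prompts": "immediate",
    "fallback": "gradual",
    "files": "gradual",
    "api_keys": "immediate",
    "metrics": "gradual",
    "admin": "gradual",
    "admin.stats": "immediate",
    "routing": "gradual",
    "prefix_routing": "immediate",
    "response_cache": "immediate",
    "kv_cache_index": "restart",
}


def plan_update(sections):
    """Group section names by how their changes take effect."""
    plan = {"immediate": [], "gradual": [], "restart": []}
    for section in sections:
        mode = RELOAD_MODE.get(section)
        if mode is None:
            raise KeyError(f"unknown configuration section: {section}")
        plan[mode].append(section)
    return plan
```

A caller could refuse to proceed, or ask for confirmation, whenever `plan_update(...)["restart"]` is non-empty.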
HTTP Status Codes¶
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad Request (validation error) |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 409 | Conflict |
| 413 | Payload Too Large |
| 500 | Internal Server Error |
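Client code often collapses these codes into a few actions. The Python sketch below shows one possible grouping. The category names and the function `classify_status` are assumptions made for illustration, not part of the API.

```python
def classify_status(code: int) -> str:
    """Map an Admin API status code to a coarse client-side action."""
    if code == 200:
        return "ok"
    if code in (401, 403):
        # Unauthorized or Forbidden: inspect the admin credentials.
        return "check_credentials"
    if code in (400, 404, 409, 413):
        # Validation error, missing resource, conflict, or oversized payload.
        return "fix_request"
    if code == 500:
        return "server_error"
    # Codes not listed in the table above.
    return "unknown"
```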
Common curl Commands¶
# Get full config
curl -s http://localhost:8080/admin/config/full -H "Authorization: Bearer $TOKEN"
# Update logging level
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"config": {"level": "debug"}}'
# Add backend
curl -X POST http://localhost:8080/admin/backends \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"name": "new", "url": "http://host:port", "weight": 1}'
# Export config
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"format": "yaml"}'
# View history
curl -s http://localhost:8080/admin/config/history -H "Authorization: Bearer $TOKEN"
# Rollback
curl -X POST http://localhost:8080/admin/config/rollback/5 \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{}'
# List API keys (masked)
curl -s http://localhost:8080/admin/api-keys -H "Authorization: Bearer $TOKEN"
# Create an API key (full value returned once)
curl -X POST http://localhost:8080/admin/api-keys \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"id": "key-1", "user_id": "user-1", "organization_id": "org-1", "scopes": ["read", "write"]}'
# Rotate an API key
curl -X POST http://localhost:8080/admin/api-keys/key-1/rotate -H "Authorization: Bearer $TOKEN"
# Disable / enable an API key
curl -X POST http://localhost:8080/admin/api-keys/key-1/disable -H "Authorization: Bearer $TOKEN"
curl -X POST http://localhost:8080/admin/api-keys/key-1/enable -H "Authorization: Bearer $TOKEN"
# Revoke an API key
curl -X DELETE http://localhost:8080/admin/api-keys/key-1 -H "Authorization: Bearer $TOKEN"
# Per-API-key and per-user usage statistics
curl -s http://localhost:8080/admin/stats/api-keys -H "Authorization: Bearer $TOKEN"
curl -s http://localhost:8080/admin/stats/users -H "Authorization: Bearer $TOKEN"
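The same calls can be made from Python with only the standard library. The sketch below builds a request equivalent to the curl commands above without sending it. `build_admin_request` and `BASE_URL` are hypothetical names, and bearer-token authentication is assumed.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the router listens on the address used in the curl examples.
BASE_URL = "http://localhost:8080"


def build_admin_request(
    method: str, path: str, token: str, body: Optional[dict] = None
) -> urllib.request.Request:
    """Build an authenticated Admin API request without sending it."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies need an explicit content type, as in the curl examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

For example, `build_admin_request("PATCH", "/admin/config/logging", token, {"config": {"level": "debug"}})` mirrors the logging update above; pass the result to `urllib.request.urlopen` to send it.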