Admin REST API Reference¶
This document provides a comprehensive guide for developers building configuration control applications using Continuum Router's Admin REST API. The Configuration Management API enables runtime configuration viewing, modification, and management without server restarts.
Table of Contents¶
- Overview
- Authentication
- Base URL and Headers
- Configuration Query APIs
- Configuration Modification APIs
- Configuration Save/Restore APIs
- Backend Management APIs
- Statistics APIs
- Response Cache Admin APIs
- KV Cache Index Admin APIs
- Data Models
- Hot Reload Behavior
- Error Handling
- Client SDK Examples
- Best Practices
- Security Considerations
Overview¶
The Admin REST API provides programmatic access to Continuum Router's configuration system, enabling:
- Real-time Configuration Viewing: Retrieve current configuration with automatic sensitive data masking
- Dynamic Configuration Updates: Modify configuration sections without server restart
- Configuration Versioning: Track changes with full history and rollback capabilities
- Backend Management: Add, remove, and modify backends dynamically
- Export/Import: Save and restore configurations in multiple formats (YAML, JSON, TOML)
Key Features¶
| Feature | Description |
|---|---|
| Hot Reload | Changes applied immediately or gradually based on section type |
| Sensitive Masking | API keys, passwords, and tokens automatically masked in responses |
| Validation | All changes validated before application with dry-run support |
| Audit Logging | All modifications logged for security and compliance |
| History Tracking | Up to 100 configuration versions maintained for rollback |
Authentication¶
All Admin API endpoints require authentication via the Admin Auth system.
Authentication Methods¶
1. Bearer Token¶
2. Basic Authentication¶
3. API Key Header¶
Configuration¶
Configure admin authentication in config.yaml:
admin:
  auth:
    method: bearer_token # Options: none, bearer_token, basic, api_key
    token: "${ADMIN_TOKEN}" # Environment variable supported
    # For basic auth:
    # username: admin
    # password: "${ADMIN_PASSWORD}"
  # IP whitelist (optional)
  ip_whitelist:
    - "127.0.0.1"
    - "10.0.0.0/8"
  # Configurable limits
  max_history_entries: 100
  max_backend_name_length: 256
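A client has to turn the configured auth method into request headers. The following is a minimal sketch, assuming standard HTTP Bearer and Basic schemes; `build_auth_headers` is a hypothetical helper, not part of Continuum Router. The `api_key` method is left out because this section does not state which header name it uses.

```python
import base64


def build_auth_headers(method, token=None, username=None, password=None):
    """Return request headers for the given admin auth method."""
    if method == "none":
        return {}
    if method == "bearer_token":
        if not token:
            raise ValueError("bearer_token requires a token")
        return {"Authorization": f"Bearer {token}"}
    if method == "basic":
        if username is None or password is None:
            raise ValueError("basic requires username and password")
        # HTTP Basic: base64 of "username:password"
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    # api_key is not handled: its header name is not given in this section.
    raise ValueError(f"unsupported auth method: {method}")
```

The returned dictionary can be passed as the headers of whichever HTTP client the application already uses.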
Base URL and Headers¶
Base URL¶
Common Request Headers¶
Common Response Headers¶
Configuration Query APIs¶
Get Full Configuration¶
Retrieve the complete configuration with sensitive information masked.
Response¶
{
"config": {
"server": {
"bind_address": "0.0.0.0:8080",
"workers": 4
},
"backends": [
{
"name": "openai",
"url": "https://api.openai.com",
"api_key": "sk-***abcd",
"weight": 1
}
],
"logging": {
"level": "info"
},
"rate_limiting": {
"enabled": true,
"requests_per_minute": 100
}
},
"hot_reload_enabled": true,
"last_modified": "2025-12-13T10:30:00Z"
}
Example¶
List Configuration Sections¶
Get all available configuration sections with their hot reload capabilities.
Response¶
{
"sections": [
{
"name": "server",
"description": "Server configuration including bind address and workers",
"hot_reload_capability": "requires_restart"
},
{
"name": "backends",
"description": "Backend server configurations",
"hot_reload_capability": "gradual"
},
{
"name": "logging",
"description": "Logging configuration",
"hot_reload_capability": "immediate"
},
{
"name": "rate_limiting",
"description": "Rate limiting configuration",
"hot_reload_capability": "immediate"
},
{
"name": "circuit_breaker",
"description": "Circuit breaker configuration",
"hot_reload_capability": "immediate"
},
{
"name": "retry",
"description": "Retry policy configuration",
"hot_reload_capability": "immediate"
},
{
"name": "timeouts",
"description": "Timeout configuration",
"hot_reload_capability": "gradual"
},
{
"name": "health_checks",
"description": "Health check configuration",
"hot_reload_capability": "gradual"
},
{
"name": "global_prompts",
"description": "Global prompt injection configuration",
"hot_reload_capability": "immediate"
},
{
"name": "fallback",
"description": "Model fallback configuration",
"hot_reload_capability": "gradual"
},
{
"name": "files",
"description": "Files API configuration",
"hot_reload_capability": "gradual"
},
{
"name": "api_keys",
"description": "API keys configuration",
"hot_reload_capability": "immediate"
},
{
"name": "metrics",
"description": "Metrics and monitoring configuration",
"hot_reload_capability": "gradual"
},
{
"name": "admin",
"description": "Admin API configuration",
"hot_reload_capability": "gradual"
},
{
"name": "routing",
"description": "Request routing configuration",
"hot_reload_capability": "gradual"
}
]
}
Example¶
curl -s http://localhost:8080/admin/config/sections \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.sections[].name'
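The sections response lets a client decide up front which edits take effect immediately and which need a restart. A minimal sketch, assuming the response shape shown above; `group_by_capability` is a hypothetical helper and the sample data is an abbreviated copy of that response.

```python
from collections import defaultdict


def group_by_capability(sections_response):
    """Map each hot_reload_capability value to a sorted list of section names."""
    groups = defaultdict(list)
    for section in sections_response["sections"]:
        groups[section["hot_reload_capability"]].append(section["name"])
    return {capability: sorted(names) for capability, names in groups.items()}


# Abbreviated copy of the response shown above.
sample = {
    "sections": [
        {"name": "server", "hot_reload_capability": "requires_restart"},
        {"name": "backends", "hot_reload_capability": "gradual"},
        {"name": "logging", "hot_reload_capability": "immediate"},
        {"name": "retry", "hot_reload_capability": "immediate"},
    ]
}
groups = group_by_capability(sample)
```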
Get Section Configuration¶
Retrieve configuration for a specific section.
Path Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| section | string | Yes | Section name (see list above) |
Response¶
{
"section": "logging",
"config": {
"level": "info",
"format": "json",
"file": "/var/log/continuum-router.log"
},
"hot_reload_capability": "immediate",
"description": "Logging configuration"
}
Example¶
# Get logging configuration
curl -s http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Get backends configuration
curl -s http://localhost:8080/admin/config/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Get Configuration Schema¶
Retrieve JSON Schema for configuration validation.
Query Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| section | string | No | Get schema for specific section only |
Response¶
{
"schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"server": {
"type": "object",
"properties": {
"bind_address": {
"type": "string",
"pattern": "^[^:]+:[0-9]+$",
"description": "Server bind address in host:port format"
},
"workers": {
"type": "integer",
"minimum": 1,
"description": "Number of worker threads"
}
}
},
"logging": {
"type": "object",
"properties": {
"level": {
"type": "string",
"enum": ["trace", "debug", "info", "warn", "error"]
}
}
}
}
}
}
Example¶
# Get full schema
curl -s http://localhost:8080/admin/config/schema \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Get schema for specific section
curl -s "http://localhost:8080/admin/config/schema?section=logging" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Configuration Modification APIs¶
Replace Section Configuration¶
Replace entire section configuration with new values.
Request Body¶
Response¶
{
"success": true,
"message": "Configuration updated successfully",
"version": 5,
"hot_reload_capability": "immediate",
"applied": true,
"warnings": []
}
Example¶
# Update logging level to debug
curl -X PUT http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"config": {
"level": "debug"
}
}'
Partial Update Section¶
Apply partial updates using JSON merge patch semantics.
Request Body¶
Only specified fields are updated; other fields remain unchanged.
Response¶
{
"success": true,
"message": "Configuration partially updated",
"version": 6,
"hot_reload_capability": "immediate",
"applied": true,
"merged_config": {
"level": "warn",
"format": "json",
"file": "/var/log/continuum-router.log"
}
}
Example¶
# Update only rate limit value
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"config": {
"requests_per_minute": 200
}
}'
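To predict what a partial update will produce, the merge can be reproduced client-side. This is a sketch of JSON merge patch as defined in RFC 7386, assuming the router follows that RFC (including `null` meaning "remove the key", which this section does not confirm). `merge_patch` is a hypothetical helper.

```python
def merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch and return the merged value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null removes the key (RFC 7386 behaviour; assumed for the router).
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


current = {"level": "info", "format": "json", "file": "/var/log/continuum-router.log"}
merged = merge_patch(current, {"level": "warn"})
```

With the logging values used above, `merged` keeps `format` and `file` and changes only `level`, which matches the `merged_config` in the sample response.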
Validate Configuration¶
Validate configuration changes without applying them.
Request Body¶
{
"section": "server",
"config": {
"bind_address": "0.0.0.0:9090",
"workers": 8
},
"dry_run": true
}
Response (Valid)¶
{
"valid": true,
"errors": [],
"warnings": [
{
"field": "bind_address",
"message": "Changing bind_address requires server restart"
}
],
"hot_reload_capability": "requires_restart"
}
Response (Invalid)¶
{
"valid": false,
"errors": [
{
"field": "workers",
"message": "workers must be greater than 0",
"code": "VALIDATION_ERROR"
}
],
"warnings": []
}
Example¶
# Validate before applying
curl -X POST http://localhost:8080/admin/config/validate \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"section": "rate_limiting",
"config": {
"enabled": true,
"requests_per_minute": 500
}
}'
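Validation is most useful as a gate in front of a write. The sketch below chains the validate endpoint with a section replace, using the paths shown in this document. The `send(method, path, body)` callable is a hypothetical transport supplied by the caller (it performs the HTTP request and returns decoded JSON), so the flow can be exercised without a running router.

```python
def validate_then_replace(send, section, config):
    """Dry-run validate a section, then replace it only if validation passes."""
    report = send(
        "POST",
        "/admin/config/validate",
        {"section": section, "config": config, "dry_run": True},
    )
    if not report.get("valid"):
        messages = [
            f'{e.get("field")}: {e.get("message")}' for e in report.get("errors", [])
        ]
        raise ValueError("validation failed: " + "; ".join(messages))
    # Validation passed: send the actual replacement.
    return send("PUT", f"/admin/config/{section}", {"config": config})
```

Warnings in the validation report do not stop the write in this sketch; a stricter client could inspect `report["warnings"]` first.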
Apply Configuration¶
Apply pending configuration changes immediately (trigger hot reload).
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| sections | array | No | Specific sections to apply (default: all pending) |
| force | boolean | No | Force apply even with warnings (default: false) |
Response¶
{
"success": true,
"applied_sections": ["logging", "rate_limiting"],
"version": 7,
"results": {
"logging": {
"status": "applied",
"hot_reload_type": "immediate"
},
"rate_limiting": {
"status": "applied",
"hot_reload_type": "immediate"
}
}
}
Example¶
curl -X POST http://localhost:8080/admin/config/apply \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"sections": ["logging"]
}'
Configuration Save/Restore APIs¶
Export Configuration¶
Export current configuration in specified format.
Request Body¶
{
"format": "yaml",
"sections": ["server", "backends", "logging"],
"include_sensitive": false,
"include_defaults": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| format | string | Yes | Output format: yaml, json, or toml |
| sections | array | No | Sections to export (default: all) |
| include_sensitive | boolean | No | Include unmasked sensitive data (default: false) |
| include_defaults | boolean | No | Include default values (default: true) |
Response¶
{
"format": "yaml",
"content": "server:\n bind_address: \"0.0.0.0:8080\"\n workers: 4\n\nbackends:\n - name: openai\n url: https://api.openai.com\n api_key: \"sk-***abcd\"\n",
"exported_at": "2025-12-13T10:30:00Z",
"sections_exported": ["server", "backends", "logging"]
}
Example¶
# Export as YAML
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"format": "yaml"}' | jq -r '.content' > config-backup.yaml
# Export as JSON
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"format": "json"}' | jq -r '.content' > config-backup.json
# Export specific sections
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "yaml",
"sections": ["backends", "rate_limiting"]
}'
Import Configuration¶
Import and apply configuration from content.
Request Body¶
{
"format": "yaml",
"content": "logging:\n level: info\n format: json\n",
"apply": true,
"dry_run": false,
"merge": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| format | string | Yes | Content format: yaml, json, or toml |
| content | string | Yes | Configuration content (max 1MB) |
| apply | boolean | No | Apply after validation (default: true) |
| dry_run | boolean | No | Validate only without applying (default: false) |
| merge | boolean | No | Merge with existing config (default: false) |
Response¶
{
"success": true,
"message": "Configuration imported and applied",
"version": 8,
"validation": {
"valid": true,
"errors": [],
"warnings": []
},
"sections_imported": ["logging"],
"applied": true
}
Example¶
# Import from file
curl -X POST http://localhost:8080/admin/config/import \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"format\": \"yaml\",
\"content\": $(cat config-backup.yaml | jq -Rs .),
\"apply\": true
}"
# Dry run import
curl -X POST http://localhost:8080/admin/config/import \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "yaml",
"content": "logging:\n level: debug\n",
"dry_run": true
}'
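The shell example above relies on `jq -Rs` to escape file content into a JSON string. The same payload can be built programmatically. A minimal sketch, assuming the "max 1MB" limit means 1 MiB of UTF-8 bytes (the exact limit definition is not stated here); `build_import_payload` is a hypothetical helper.

```python
import json

MAX_CONTENT_BYTES = 1024 * 1024  # assumption: "1MB" is interpreted as 1 MiB


def build_import_payload(content, fmt="yaml", apply=True, dry_run=False, merge=False):
    """Return the JSON request body for the import endpoint."""
    if fmt not in ("yaml", "json", "toml"):
        raise ValueError(f"unsupported format: {fmt}")
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds the 1MB import limit")
    # json.dumps handles escaping of newlines and quotes in the content.
    return json.dumps(
        {"format": fmt, "content": content, "apply": apply,
         "dry_run": dry_run, "merge": merge}
    )


payload = build_import_payload("logging:\n  level: debug\n", dry_run=True)
```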
Get Configuration History¶
View configuration change history.
Query Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Number of entries to return (default: 20, max: 100) |
| offset | integer | No | Number of entries to skip (default: 0) |
| section | string | No | Filter by section name |
Response¶
{
"history": [
{
"version": 8,
"timestamp": "2025-12-13T10:30:00Z",
"sections_changed": ["logging"],
"source": "api",
"user": "admin",
"description": "Updated logging level to debug",
"rollback_available": true
},
{
"version": 7,
"timestamp": "2025-12-13T10:25:00Z",
"sections_changed": ["rate_limiting"],
"source": "api",
"user": "admin",
"description": "Increased rate limit to 200 rpm",
"rollback_available": true
},
{
"version": 6,
"timestamp": "2025-12-13T09:00:00Z",
"sections_changed": ["backends"],
"source": "file_reload",
"user": "system",
"description": "Configuration file changed",
"rollback_available": true
}
],
"total_entries": 8,
"current_version": 8
}
Example¶
# Get recent history
curl -s http://localhost:8080/admin/config/history \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Get history for specific section
curl -s "http://localhost:8080/admin/config/history?section=backends&limit=10" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
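Because `limit` is capped at 100, reading the full history means walking pages with `offset`. A minimal sketch, assuming `total_entries` reports the total number of matching entries; `fetch_page(limit, offset)` is a hypothetical callable that performs the request and returns the decoded JSON response.

```python
def iter_history(fetch_page, limit=20):
    """Yield every history entry by walking limit/offset pages."""
    if limit not in range(1, 101):
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        entries = page.get("history", [])
        yield from entries
        offset += len(entries)
        # Stop on an empty page or once every reported entry has been seen.
        if not entries or offset >= page.get("total_entries", 0):
            break
```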
Rollback Configuration¶
Rollback to a previous configuration version.
Path Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| version | integer | Yes | Version number to rollback to |
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| sections | array | No | Specific sections to rollback (default: all changed) |
| dry_run | boolean | No | Preview without applying (default: false) |
Response¶
{
"success": true,
"message": "Rolled back to version 5",
"previous_version": 8,
"new_version": 9,
"sections_rolled_back": ["logging", "rate_limiting"],
"changes": {
"logging": {
"level": {
"from": "debug",
"to": "info"
}
}
}
}
Example¶
# Rollback to version 5
curl -X POST http://localhost:8080/admin/config/rollback/5 \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{}'
# Preview rollback (dry run)
curl -X POST http://localhost:8080/admin/config/rollback/5 \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
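A dry-run rollback is easier to review when the `changes` object is flattened into one line per field. A minimal sketch, assuming `changes` is always nested as section, then field, then a `from`/`to` pair, as in the sample response; `describe_changes` is a hypothetical helper.

```python
def describe_changes(changes):
    """Flatten a rollback 'changes' object into readable one-line summaries."""
    lines = []
    for section, fields in sorted(changes.items()):
        for field, delta in sorted(fields.items()):
            lines.append(f'{section}.{field}: {delta["from"]!r} -> {delta["to"]!r}')
    return lines


summary = describe_changes({"logging": {"level": {"from": "debug", "to": "info"}}})
```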
Backend Management APIs¶
Add Backend¶
Add a new backend dynamically.
Request Body¶
{
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 1,
"models": ["llama3.2", "mistral"],
"api_key": "optional-key",
"enabled": true,
"health_check": {
"enabled": true,
"path": "/v1/models"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique backend name (alphanumeric, -, _) |
| type | string | No | Backend type: openai, azure, vllm, ollama, anthropic, gemini, llamacpp, generic. Default: generic (auto-detect) |
| url | string | Yes | Backend URL (http:// or https://) |
| weight | integer | No | Load balancing weight (default: 1) |
| models | array | No | List of models served by this backend |
| api_key | string | No | API key for backend authentication |
| enabled | boolean | No | Whether backend is enabled (default: true) |
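The constraints in the table can be checked client-side before sending the request. This is a sketch only: it assumes "alphanumeric" means ASCII letters and digits, and it takes the 256-character limit from the `max_backend_name_length` setting shown in the authentication configuration. `backend_errors` is a hypothetical helper; the router's own validation remains authoritative.

```python
import re
from urllib.parse import urlparse

NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def backend_errors(backend, max_name_length=256):
    """Return a list of problems found in an add-backend request body."""
    errors = []
    name = backend.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name must contain only alphanumerics, '-' and '_'")
    elif len(name) > max_name_length:
        errors.append(f"name must be at most {max_name_length} characters")
    parsed = urlparse(backend.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("url must start with http:// or https://")
    return errors
```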
Backend Type Auto-Detection¶
When type is not specified or set to generic, the router automatically probes the backend's /v1/models endpoint to detect the backend type. Currently supports auto-detection of:
- llama.cpp: Identified by owned_by: "llamacpp" or llama.cpp-specific metadata fields
This allows seamless integration of llama.cpp backends without explicit type configuration:
# llama.cpp backend - type auto-detected
curl -X POST http://localhost:8080/admin/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "local-llama",
"url": "http://localhost:8080"
}'
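The detection rule described above can be illustrated with a small classifier. This sketch assumes the backend's /v1/models response is an OpenAI-style object with a `data` array of model entries, and it checks only the `owned_by` value; the llama.cpp-specific metadata fields are not named in this section, so they are not covered. `detect_backend_type` is a hypothetical helper, not the router's implementation.

```python
def detect_backend_type(models_response):
    """Classify a backend from its /v1/models response body."""
    for model in models_response.get("data", []):
        if model.get("owned_by") == "llamacpp":
            return "llamacpp"
    # Nothing llama.cpp-specific found: keep the generic type.
    return "generic"


detected = detect_backend_type(
    {"object": "list", "data": [{"id": "local-model", "owned_by": "llamacpp"}]}
)
```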
Response¶
{
"success": true,
"message": "Backend 'new-ollama' added successfully",
"backend": {
"name": "new-ollama",
"url": "http://192.168.1.100:11434",
"weight": 1,
"models": ["llama3.2", "mistral"],
"enabled": true,
"health_status": "unknown"
}
}
Example¶
curl -X POST http://localhost:8080/admin/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "new-backend",
"url": "http://192.168.1.100:11434",
"weight": 2,
"models": ["llama3.2"]
}'
Get Backend¶
Get configuration for a specific backend.
Response¶
{
"name": "openai",
"url": "https://api.openai.com",
"api_key": "sk-***abcd",
"weight": 1,
"models": ["gpt-4", "gpt-3.5-turbo"],
"enabled": true,
"health_status": "healthy",
"stats": {
"total_requests": 1250,
"failed_requests": 12,
"average_latency_ms": 150,
"last_used": "2025-12-13T10:29:55Z"
}
}
Example¶
Update Backend¶
Update backend configuration.
Request Body¶
{
"url": "https://api.openai.com",
"weight": 2,
"models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
"enabled": true
}
Response¶
{
"success": true,
"message": "Backend 'openai' updated successfully",
"backend": {
"name": "openai",
"url": "https://api.openai.com",
"weight": 2,
"models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
"enabled": true
}
}
Example¶
curl -X PUT http://localhost:8080/admin/backends/openai \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"weight": 3,
"models": ["gpt-4", "gpt-4-turbo"]
}'
Delete Backend¶
Remove a backend from the router.
Query Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| force | boolean | No | Force delete even if backend has active connections |
Response¶
{
"success": true,
"message": "Backend 'old-backend' removed successfully",
"removed_backend": "old-backend"
}
Notes¶
- Deleting the last backend is allowed: The router can operate with zero backends configured. When the last backend is deleted:
  - /v1/models returns an empty list
  - Routing requests return 503 "No backends available"
  - New backends can be added via POST /admin/backends
Example¶
curl -X DELETE http://localhost:8080/admin/backends/old-backend \
-H "Authorization: Bearer $ADMIN_TOKEN"
# Force delete
curl -X DELETE "http://localhost:8080/admin/backends/old-backend?force=true" \
-H "Authorization: Bearer $ADMIN_TOKEN"
Update Backend Weight¶
Update only the backend weight for load balancing.
Request Body¶
Response¶
{
"success": true,
"message": "Backend 'openai' weight updated to 5",
"previous_weight": 2,
"new_weight": 5
}
Example¶
curl -X PUT http://localhost:8080/admin/backends/openai/weight \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"weight": 5}'
Update Backend Models¶
Update the model list for a backend.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| models | array | Yes | List of model names |
| append | boolean | No | Append to existing list (default: false, replaces) |
Response¶
{
"success": true,
"message": "Backend 'openai' models updated",
"models": ["gpt-4", "gpt-4-turbo", "gpt-4o", "gpt-3.5-turbo"]
}
Example¶
# Replace models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"models": ["gpt-4", "gpt-4o"]}'
# Append models
curl -X PUT http://localhost:8080/admin/backends/openai/models \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"models": ["gpt-4.5-turbo"], "append": true}'
Statistics APIs¶
The Statistics APIs expose aggregated request metrics collected by the StatsCollector. All four endpoints are mounted under /admin/stats and share the same authentication as the rest of the Admin API.
Stats collection is enabled by default. It can be configured or disabled via the admin.stats section in your YAML config:
admin:
  stats:
    enabled: true # Enable/disable collection (default: true)
    retention_window: 24h # Ring-buffer retention for windowed queries (default: 24h)
    token_tracking: true # Parse response bodies for token usage (default: true)
    persistence:
      enabled: true # Enable stats persistence across restarts (default: true)
      path: ./data/stats.json # File path for the snapshot (default: ./data/stats.json)
      snapshot_interval: 5m # How often to write periodic snapshots (default: 5m)
      max_age: 7d # Discard snapshots older than this on startup (default: 7d)
The retention_window and token_tracking settings support hot-reload: changes are applied immediately without a restart.
Stats Persistence¶
When the persistence subsection is present and enabled is true, the router saves a statistics snapshot to disk periodically and restores it on startup. This ensures that request counters, per-model breakdowns, and the latency ring buffer survive restarts.
How it works:
- On startup, the router reads the snapshot file and restores all counters and ring-buffer records. Uptime resets to zero on each restart.
- A background task writes a new snapshot every snapshot_interval. Writes are atomic (temp file + rename) to prevent corruption.
- On graceful shutdown (SIGTERM/SIGINT), a final snapshot is saved before the process exits.
- If the snapshot file is missing, corrupted, or older than max_age, the router starts with fresh counters and logs a warning or info message.
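The "temp file + rename" pattern mentioned in the list above is a general technique for crash-safe writes. The sketch below illustrates it in Python for tools that produce or rewrite a snapshot-like file. It is not the router's own code, and `write_snapshot_atomically` is a hypothetical name.

```python
import json
import os
import tempfile


def write_snapshot_atomically(path, snapshot):
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader of `path` sees either the previous complete snapshot or the new one, never a partially written file.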
Supported duration formats for snapshot_interval and max_age:
| Format | Example | Meaning |
|---|---|---|
| Xs | 30s | 30 seconds |
| Xm | 5m | 5 minutes |
| Xh | 1h | 1 hour |
| Xd | 7d | 7 days |
Set max_age to "0" or "" to disable staleness checks (always restore regardless of age).
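Tooling that generates or lints this configuration may need to interpret the duration strings. A minimal sketch, assuming only the four single-unit formats in the table are accepted and that the value is a non-negative integer followed by one unit letter. `parse_duration` is a hypothetical helper, and the `None` result for "0" or "" mirrors the max_age behaviour only.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value):
    """Convert a duration such as '5m' to seconds; '0' and '' mean disabled (None)."""
    if value in ("", "0"):
        return None
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```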
Get Full Statistics¶
Returns overall, per-model, and per-backend statistics.
Query Parameters¶
| Parameter | Type | Description |
|---|---|---|
| window | string | Optional time window filter. Accepted formats: 30m, 1h, 24h, 7d. Omit for all-time totals. |
Response¶
{
"uptime_seconds": 3600,
"window": "all",
"overall": {
"total_requests": 1500,
"successful_requests": 1480,
"failed_requests": 20,
"avg_latency_ms": 145.3,
"p50_latency_ms": 120.0,
"p95_latency_ms": 380.0,
"p99_latency_ms": 750.0,
"total_prompt_tokens": 450000,
"total_completion_tokens": 180000,
"total_tokens": 630000,
"tokens_per_sec_avg": 87.4
},
"models": [
{
"model_id": "gpt-4",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"total_prompt_tokens": 270000,
"total_completion_tokens": 108000,
"total_tokens": 378000,
"avg_latency_ms": 160.2,
"avg_tokens_per_sec": 92.1,
"last_used": "2026-03-05T10:30:00Z"
}
],
"backends": [
{
"backend_name": "openai",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"avg_latency_ms": 160.2,
"health_status": "healthy"
}
]
}
Example¶
# All-time statistics
curl -s http://localhost:8080/admin/stats \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# Last hour only
curl -s "http://localhost:8080/admin/stats?window=1h" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Get Per-Model Statistics¶
Returns only the per-model breakdown (subset of the full stats response).
Response¶
{
"models": [
{
"model_id": "gpt-4",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"total_prompt_tokens": 270000,
"total_completion_tokens": 108000,
"total_tokens": 378000,
"avg_latency_ms": 160.2,
"avg_tokens_per_sec": 92.1,
"last_used": "2026-03-05T10:30:00Z"
}
]
}
Models are sorted by total_requests in descending order.
Example¶
curl -s http://localhost:8080/admin/stats/models \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.models[].model_id'
Get Per-Backend Statistics¶
Returns only the per-backend breakdown. The health_status field is populated from the health checker ("healthy", "unhealthy", or "unknown" when health checks are disabled).
Response¶
{
"backends": [
{
"backend_name": "openai",
"total_requests": 900,
"successful_requests": 895,
"failed_requests": 5,
"avg_latency_ms": 160.2,
"health_status": "healthy"
}
]
}
Backends are sorted by total_requests in descending order.
Example¶
Reset Statistics¶
Resets all counters, per-model records, per-backend records, and the latency ring buffer. This action is irreversible.
Response¶
Example¶
Response Cache Admin APIs¶
The Response Cache Admin APIs expose statistics and invalidation operations for the response cache. All endpoints are mounted under /admin/response-cache and require the same authentication as the rest of the Admin API.
Response caching is configured in the response_cache section of your YAML config. See the Response Cache Configuration guide for full configuration details.
Get Response Cache Statistics¶
Returns current response cache statistics including hit/miss counts, memory usage, and configuration summary.
Response¶
{
"enabled": true,
"backend_type": "memory",
"entries": 42,
"capacity": 1000,
"requests": {
"hit": 120,
"miss": 80,
"skip": 15,
"total": 215
},
"hit_rate": "0.6000",
"evictions": 3,
"size_bytes": 1048576,
"config": {
"backend": "memory",
"ttl": "5m",
"capacity": 1000,
"max_response_size": 1048576,
"max_stream_buffer_size": 10485760
}
}
When using the Redis backend (backend: redis), the response includes an additional redis object:
{
"enabled": true,
"backend_type": "redis",
"entries": 42,
"capacity": 1000,
"requests": { "hit": 120, "miss": 80, "skip": 15, "total": 215 },
"hit_rate": "0.6000",
"evictions": 3,
"size_bytes": 1048576,
"config": { "backend": "redis", "ttl": "5m", "capacity": 1000, "max_response_size": 1048576, "max_stream_buffer_size": 10485760 },
"redis": {
"connections": { "active": 3, "idle": 5 },
"errors": { "connection": 0, "timeout": 0, "other": 0, "total": 0 },
"fallback_active": false
}
}
When response caching is disabled (response_cache.enabled: false or the section is absent), enabled is false, entries and capacity are 0, and config is null.
Response Fields¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether response caching is active |
| backend_type | string | Active cache backend: "memory" or "redis" |
| entries | integer | Current number of cached entries |
| capacity | integer | Maximum cache capacity (LRU limit) |
| requests.hit | integer | Requests served from cache |
| requests.miss | integer | Cache misses (backend was called, entry stored) |
| requests.skip | integer | Non-cacheable requests (e.g., temperature > 0) |
| requests.total | integer | Total requests seen by the cache layer (hit + miss + skip) |
| hit_rate | string | Rolling cache hit rate as a decimal string (e.g., "0.6000") |
| evictions | integer | Total LRU evictions since startup |
| size_bytes | integer | Approximate memory usage of cached entries in bytes |
| config | object or null | Active configuration summary; null when disabled |
| redis | object or absent | Redis-specific stats (only present when backend_type is "redis") |
| redis.connections.active | integer | Active connections in the Redis pool |
| redis.connections.idle | integer | Idle connections in the Redis pool |
| redis.errors.connection | integer | Redis connection errors since startup |
| redis.errors.timeout | integer | Redis command timeout errors since startup |
| redis.errors.other | integer | Other Redis errors since startup |
| redis.errors.total | integer | Total Redis errors since startup |
| redis.fallback_active | boolean | Whether the in-memory fallback is currently active |
Example¶
curl -s http://localhost:8080/admin/response-cache/stats \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
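In the sample response, the `hit_rate` of "0.6000" equals hit / (hit + miss) = 120 / 200, with skipped requests excluded. The sketch below recomputes the rate from the counters under that assumption. The server describes its value as rolling, so it may differ from a figure derived from lifetime counters. `hit_rate` here is a hypothetical helper.

```python
def hit_rate(requests):
    """Recompute the cache hit rate from the 'requests' counters, skips excluded."""
    lookups = requests["hit"] + requests["miss"]
    rate = requests["hit"] / lookups if lookups else 0.0
    # Match the four-decimal string format used in the response.
    return f"{rate:.4f}"


recomputed = hit_rate({"hit": 120, "miss": 80, "skip": 15, "total": 215})
```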
Invalidate Response Cache¶
Clears cache entries. Currently supports full cache invalidation via clear_all: true. Targeted invalidation by model or tenant is reserved for a future release.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| clear_all | boolean | No | When true, clears the entire cache. Defaults to false. |
| model | string | No | Reserved for future targeted invalidation. Must not exceed 256 characters. |
| tenant_id | string | No | Reserved for future targeted invalidation. Must not exceed 256 characters. |
Response (clear_all: true)¶
Response (clear_all: false or omitted)¶
{
"success": true,
"action": "noop",
"message": "Targeted invalidation by model/tenant_id is not yet supported. Use clear_all: true to clear the entire cache."
}
Response (cache disabled)¶
Example¶
# Clear entire cache
curl -X POST http://localhost:8080/admin/response-cache/invalidate \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"clear_all": true}'
KV Cache Index Admin APIs¶
The KV Cache Index Admin APIs expose statistics, per-backend state, and a clear operation for the KV cache index subsystem. All endpoints are mounted under /admin/kv-index and require the same authentication as the rest of the Admin API.
The KV cache index tracks which backends hold cached KV data for specific token prefixes, enabling KV-aware routing. It is configured in the kv_cache_index section of your YAML config.
Get KV Cache Index Statistics¶
Returns overall KV cache index statistics, including index size, event source connection status, and routing decision counts.
Response¶
{
"enabled": true,
"config": {
"backend": "memory",
"max_entries": 100000,
"entry_ttl_seconds": 600,
"event_sources_count": 2,
"scoring": {
"overlap_weight": 0.6,
"load_weight": 0.3,
"health_weight": 0.1,
"min_overlap_threshold": 0.3
}
},
"index": {
"prefix_count": 45,
"entry_count": 120,
"total_hits": 3842,
"total_evictions": 12
},
"event_sources": [
{
"backend_name": "vllm-1",
"connected": true,
"events_received": 2100,
"events_dropped": 0,
"last_event_at": "2025-03-12T10:45:00Z",
"reconnect_count": 0
}
],
"routing_decisions": {
"kv_aware": 980,
"fallback": 120,
"total": 1100
},
"query_latency_count": 1100,
"overlap_score_count": 980
}
When the KV cache index is disabled (kv_cache_index.enabled: false or the section is absent), enabled is false, config is null, and all counters are 0.
Response Fields¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether the KV cache index is active |
| config | object or null | Active configuration summary; null when disabled |
| config.backend | string | Index backend: "memory" or "redis" |
| config.max_entries | integer | Maximum tracked prefix hash entries |
| config.entry_ttl_seconds | integer | TTL for index entries in seconds |
| config.event_sources_count | integer | Number of configured event sources |
| config.scoring | object | Scoring weight configuration |
| index.prefix_count | integer | Number of distinct prefix hashes tracked |
| index.entry_count | integer | Total (prefix, backend) pairs tracked |
| index.total_hits | integer | Total cache hit recordings since startup |
| index.total_evictions | integer | Total cache eviction recordings since startup |
| event_sources | array | Status of each event source consumer |
| event_sources[].connected | boolean | Whether the consumer is currently connected |
| event_sources[].events_received | integer | Total events received from this source |
| event_sources[].events_dropped | integer | Events dropped due to backpressure |
| event_sources[].reconnect_count | integer | Number of reconnect attempts since startup |
| routing_decisions.kv_aware | integer | Requests routed using KV-aware selection |
| routing_decisions.fallback | integer | Requests that fell back to the default strategy |
| routing_decisions.total | integer | Total routing decisions made |
Example¶
Get Per-Backend KV Cache State¶
Returns per-backend KV cache event statistics, including events received, processed, dropped, connection status, and index event counts.
Response (enabled)¶
{
"enabled": true,
"backends": [
{
"backend_name": "vllm-1",
"connection": {
"connected": true,
"reconnect_count": 0,
"last_event_at": "2025-03-12T10:45:00Z"
},
"events": {
"received": 2100,
"dropped": 0,
"index_created": 1950,
"index_evicted": 150
}
},
{
"backend_name": "vllm-2",
"connection": {
"connected": false,
"reconnect_count": 3,
"last_event_at": null
},
"events": {
"received": 0,
"dropped": 0,
"index_created": 0,
"index_evicted": 0
},
"configured_endpoint": "ws://vllm-2:8000/v1/kv_events"
}
]
}
Backends that appear in kv_cache_index.event_sources but have no active consumer yet are included with connected: false and a configured_endpoint field.
Response (disabled)¶
Response Fields¶
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether the KV cache index is active |
| backends[].backend_name | string | Backend identifier |
| backends[].connection.connected | boolean | Whether the event stream consumer is connected |
| backends[].connection.reconnect_count | integer | Reconnect attempts since startup |
| backends[].connection.last_event_at | string or null | ISO 8601 timestamp of the most recent event |
| backends[].events.received | integer | Total events received from this backend |
| backends[].events.dropped | integer | Events dropped due to backpressure |
| backends[].events.index_created | integer | Index entries created from events |
| backends[].events.index_evicted | integer | Index entries evicted from events |
| backends[].configured_endpoint | string | Configured endpoint URL (only present for inactive sources) |
Example¶
curl -s http://localhost:8080/admin/kv-index/backends \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
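A response like the one documented above can be scanned for backends that need attention. The following is a minimal sketch: the function name `find_unhealthy_backends` and the `max_drop_ratio` threshold are assumptions made for illustration, and treating `dropped / received` as the drop ratio is also an assumption about how the two counters relate.

```python
from typing import Any, Dict, List


def find_unhealthy_backends(
    payload: Dict[str, Any],
    max_drop_ratio: float = 0.01,  # arbitrary illustrative threshold
) -> List[str]:
    """Return names of backends that are disconnected or dropping many events."""
    unhealthy = []
    for backend in payload.get("backends", []):
        connected = backend.get("connection", {}).get("connected", False)
        events = backend.get("events", {})
        received = events.get("received", 0)
        dropped = events.get("dropped", 0)
        # Avoid dividing by zero for sources that have not received anything yet.
        drop_ratio = dropped / received if received else 0.0
        if not connected or drop_ratio > max_drop_ratio:
            unhealthy.append(backend["backend_name"])
    return unhealthy
```

Applied to the sample response in this section, the helper would flag only `vllm-2`, because that backend reports `connected: false`.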
Clear KV Cache Index¶
Clears all entries from the KV cache index. Intended for debugging and testing. In production the index rebuilds automatically from incoming KV events.
Response (success)¶
entries_before_clear is the total (prefix, backend) pair count before clearing. cleared_entries is the number of prefix hash buckets removed. For the Redis backend, cleared_entries counts the number of Redis keys deleted; because each key has a TTL, any remaining keys expire automatically.
Response (disabled)¶
Example¶
Data Models¶
Configuration Sections¶
| Section | Description | Hot Reload |
|---|---|---|
| server | Bind address, workers, connection pool | Requires restart |
| backends | Backend URLs, weights, models | Gradual |
| health_checks | Intervals, thresholds | Gradual |
| logging | Log level, format, output | Immediate |
| retry | Max attempts, delays, backoff | Immediate |
| timeouts | Connect, request, idle timeouts | Gradual |
| rate_limiting | Limits, storage, whitelist | Immediate |
| circuit_breaker | Thresholds, recovery time | Immediate |
| global_prompts | System prompt injection | Immediate |
| fallback | Fallback chains, policies | Gradual |
| files | Files API settings | Gradual |
| api_keys | API key configuration | Immediate |
| metrics | Prometheus, labels | Gradual |
| admin | Admin API settings | Gradual |
| admin.stats | Stats collection settings | Immediate |
| routing | Model routing rules | Gradual |
Backend Object¶
{
"name": "string",
"url": "string (http:// or https://)",
"api_key": "string (optional, masked in responses)",
"weight": "integer (1-100)",
"models": ["string"],
"enabled": "boolean",
"health_check": {
"enabled": "boolean",
"path": "string",
"interval": "string (duration)"
}
}
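The constraints noted in the Backend Object (an `http://` or `https://` URL and a weight from 1 to 100) can be checked on the client before a request is sent. This is a hedged sketch: `check_backend` is a hypothetical helper, and treating `name` as required and defaulting `weight` to 1 are assumptions. The server-side validation remains authoritative.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def check_backend(backend: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a backend object (empty if none)."""
    problems = []
    if not backend.get("name"):
        problems.append("name is required")  # assumption: name is mandatory
    parsed = urlparse(str(backend.get("url", "")))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must start with http:// or https://")
    weight = backend.get("weight", 1)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(weight, bool) or not isinstance(weight, int) or not 1 <= weight <= 100:
        problems.append("weight must be an integer from 1 to 100")
    models = backend.get("models", [])
    if not isinstance(models, list) or not all(isinstance(m, str) for m in models):
        problems.append("models must be a list of strings")
    return problems
```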
History Entry Object¶
{
"version": "integer",
"timestamp": "string (ISO 8601)",
"sections_changed": ["string"],
"source": "string (api|file_reload|initial|rollback)",
"user": "string",
"description": "string (optional)",
"rollback_available": "boolean"
}
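History entries in this shape can be filtered to choose a rollback target. The sketch below picks the highest version number that is still marked `rollback_available`, optionally restricted to one section. The helper name is hypothetical, and it assumes that a higher `version` means a more recent change.

```python
from typing import Any, Dict, List, Optional


def latest_rollback_version(
    history: List[Dict[str, Any]],
    section: Optional[str] = None,
) -> Optional[int]:
    """Return the newest version that can be rolled back to, or None."""
    candidates = [
        entry["version"]
        for entry in history
        if entry.get("rollback_available")
        and (section is None or section in entry.get("sections_changed", []))
    ]
    return max(candidates) if candidates else None
```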
Validation Result Object¶
{
"valid": "boolean",
"errors": [
{
"field": "string",
"message": "string",
"code": "string"
}
],
"warnings": [
{
"field": "string",
"message": "string"
}
]
}
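A client may want to turn a Validation Result Object into readable messages and decide whether to proceed. The following minimal sketch uses only the fields shown above; the function names `format_validation` and `should_apply` are hypothetical.

```python
from typing import Any, Dict, List


def format_validation(result: Dict[str, Any]) -> List[str]:
    """Render errors and warnings from a validation result as text lines."""
    lines = []
    for err in result.get("errors", []):
        code = err.get("code", "UNKNOWN")
        lines.append(f"ERROR {err['field']}: {err['message']} [{code}]")
    for warn in result.get("warnings", []):
        lines.append(f"WARN {warn['field']}: {warn['message']}")
    return lines


def should_apply(result: Dict[str, Any]) -> bool:
    """Proceed only when the result is valid and carries no errors."""
    return bool(result.get("valid")) and not result.get("errors")
```

Warnings do not block the change in this sketch; a stricter client could refuse to apply when any warning is present.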
Hot Reload Behavior¶
Update Types¶
| Type | Behavior | Sections |
|---|---|---|
| Immediate | Applied instantly, no disruption | logging, rate_limiting, circuit_breaker, retry, global_prompts, api_keys |
| Gradual | Existing connections maintained, new connections use new config | backends, health_checks, timeouts, fallback, files, metrics, admin, routing |
| Requires Restart | Logged as warning, requires server restart | server.bind_address, server.workers |
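Automation that pushes configuration changes can consult this classification before deciding whether to schedule a restart. The lookup below is a sketch built from the tables in this document; the function name `update_type` and the `"unknown"` fallback are assumptions, and the returned strings are illustrative labels rather than values guaranteed by the API.

```python
# Section-to-behavior mapping transcribed from the tables in this document.
_UPDATE_TYPES = {
    "immediate": [
        "logging", "rate_limiting", "circuit_breaker", "retry",
        "global_prompts", "api_keys", "admin.stats",
    ],
    "gradual": [
        "backends", "health_checks", "timeouts", "fallback",
        "files", "metrics", "admin", "routing",
    ],
    "requires_restart": ["server"],
}
_BY_SECTION = {
    name: kind for kind, names in _UPDATE_TYPES.items() for name in names
}


def update_type(section: str) -> str:
    """Classify a section name such as 'logging' or 'server.workers'."""
    # An exact match wins, so 'admin.stats' is not treated like 'admin'.
    if section in _BY_SECTION:
        return _BY_SECTION[section]
    # Otherwise fall back to the top-level section name.
    top_level = section.split(".", 1)[0]
    return _BY_SECTION.get(top_level, "unknown")
```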
Example Workflow¶
# 1. Check current configuration
curl -s http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
# 2. Validate change
curl -X POST http://localhost:8080/admin/config/validate \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"section": "logging", "config": {"level": "debug"}}'
# 3. Apply change (immediate effect)
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"config": {"level": "debug"}}'
# 4. Verify change
curl -s http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.config.level'
Error Handling¶
Error Response Format¶
Error Codes¶
| Code | HTTP Status | Description |
|---|---|---|
| VALIDATION_ERROR | 400 | Configuration validation failed |
| INVALID_SECTION | 400 | Unknown configuration section |
| PARSE_ERROR | 400 | Failed to parse configuration content |
| SECTION_NOT_FOUND | 404 | Section not found |
| VERSION_NOT_FOUND | 404 | History version not found |
| BACKEND_NOT_FOUND | 404 | Backend not found |
| BACKEND_EXISTS | 409 | Backend with name already exists |
| CONTENT_TOO_LARGE | 413 | Configuration content exceeds 1MB limit |
| INTERNAL_ERROR | 500 | Internal server error |
Error Examples¶
// Validation Error
{
"error_code": "VALIDATION_ERROR",
"message": "Configuration validation failed",
"details": {
"errors": [
{"field": "workers", "message": "workers must be greater than 0"}
]
}
}
// Section Not Found
{
"error_code": "SECTION_NOT_FOUND",
"message": "Configuration section 'invalid' not found",
"details": {
"available_sections": ["server", "backends", "logging", "..."]
}
}
// Backend Exists
{
"error_code": "BACKEND_EXISTS",
"message": "Backend 'openai' already exists",
"details": {
"existing_backend": "openai"
}
}
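Error bodies in the shape shown above can be mapped onto a typed exception so that callers branch on `error_code` instead of parsing message strings. This is a sketch only: `AdminApiError` and `parse_error` are hypothetical names, and the fallback values used for non-JSON bodies are assumptions.

```python
import json
from typing import Any, Dict, Optional


class AdminApiError(Exception):
    """Carries the fields of an Admin API error response."""

    def __init__(self, status: int, error_code: str, message: str,
                 details: Optional[Dict[str, Any]] = None):
        super().__init__(f"{status} {error_code}: {message}")
        self.status = status
        self.error_code = error_code
        self.message = message
        self.details = details or {}


def parse_error(status: int, body: str) -> AdminApiError:
    """Build an AdminApiError from an HTTP status and raw response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}  # body was not JSON (for example, a proxy error page)
    if not isinstance(payload, dict):
        payload = {}
    return AdminApiError(
        status,
        payload.get("error_code", "UNKNOWN"),
        payload.get("message", body or f"HTTP {status}"),
        payload.get("details"),
    )
```

A caller could then, for instance, treat `BACKEND_EXISTS` as a signal to update the backend rather than create it.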
Client SDK Examples¶
Python¶
import requests
from typing import Optional, Dict, Any, List
from dataclasses import dataclass


@dataclass
class ContinuumAdminClient:
    """Continuum Router Admin API Client"""
    base_url: str
    token: str

    def __post_init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json"
        })

    # Configuration Query APIs
    def get_full_config(self) -> Dict[str, Any]:
        """Get full configuration with masked sensitive data"""
        resp = self.session.get(f"{self.base_url}/admin/config/full")
        resp.raise_for_status()
        return resp.json()

    def get_sections(self) -> List[Dict[str, Any]]:
        """Get all configuration sections"""
        resp = self.session.get(f"{self.base_url}/admin/config/sections")
        resp.raise_for_status()
        return resp.json()["sections"]

    def get_section(self, section: str) -> Dict[str, Any]:
        """Get configuration for a specific section"""
        resp = self.session.get(f"{self.base_url}/admin/config/{section}")
        resp.raise_for_status()
        return resp.json()

    def get_schema(self, section: Optional[str] = None) -> Dict[str, Any]:
        """Get JSON schema for validation"""
        params = {"section": section} if section else {}
        resp = self.session.get(
            f"{self.base_url}/admin/config/schema",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Modification APIs
    def update_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Replace section configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def patch_section(self, section: str, config: Dict[str, Any]) -> Dict[str, Any]:
        """Partial update section configuration"""
        resp = self.session.patch(
            f"{self.base_url}/admin/config/{section}",
            json={"config": config}
        )
        resp.raise_for_status()
        return resp.json()

    def validate_config(
        self,
        section: str,
        config: Dict[str, Any],
        dry_run: bool = True
    ) -> Dict[str, Any]:
        """Validate configuration without applying"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/validate",
            json={"section": section, "config": config, "dry_run": dry_run}
        )
        resp.raise_for_status()
        return resp.json()

    def apply_config(
        self,
        sections: Optional[List[str]] = None,
        force: bool = False
    ) -> Dict[str, Any]:
        """Apply pending configuration changes"""
        body = {"force": force}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/apply",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Configuration Save/Restore APIs
    def export_config(
        self,
        format: str = "yaml",
        sections: Optional[List[str]] = None,
        include_sensitive: bool = False
    ) -> str:
        """Export configuration in specified format"""
        body = {"format": format, "include_sensitive": include_sensitive}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/export",
            json=body
        )
        resp.raise_for_status()
        return resp.json()["content"]

    def import_config(
        self,
        content: str,
        format: str = "yaml",
        apply: bool = True,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Import configuration from content"""
        resp = self.session.post(
            f"{self.base_url}/admin/config/import",
            json={
                "format": format,
                "content": content,
                "apply": apply,
                "dry_run": dry_run
            }
        )
        resp.raise_for_status()
        return resp.json()

    def get_history(
        self,
        limit: int = 20,
        offset: int = 0,
        section: Optional[str] = None
    ) -> Dict[str, Any]:
        """Get configuration change history"""
        params = {"limit": limit, "offset": offset}
        if section:
            params["section"] = section
        resp = self.session.get(
            f"{self.base_url}/admin/config/history",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def rollback(
        self,
        version: int,
        sections: Optional[List[str]] = None,
        dry_run: bool = False
    ) -> Dict[str, Any]:
        """Rollback to a previous version"""
        body = {"dry_run": dry_run}
        if sections:
            body["sections"] = sections
        resp = self.session.post(
            f"{self.base_url}/admin/config/rollback/{version}",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    # Backend Management APIs
    def list_backends(self) -> List[Dict[str, Any]]:
        """List all backends"""
        resp = self.session.get(f"{self.base_url}/admin/backends")
        resp.raise_for_status()
        return resp.json()["backends"]

    def get_backend(self, name: str) -> Dict[str, Any]:
        """Get backend configuration"""
        resp = self.session.get(f"{self.base_url}/admin/backends/{name}")
        resp.raise_for_status()
        return resp.json()

    def add_backend(
        self,
        name: str,
        url: str,
        weight: int = 1,
        models: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """Add a new backend"""
        body = {"name": name, "url": url, "weight": weight}
        if models:
            body["models"] = models
        resp = self.session.post(
            f"{self.base_url}/admin/backends",
            json=body
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend(self, name: str, **kwargs) -> Dict[str, Any]:
        """Update backend configuration"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}",
            json=kwargs
        )
        resp.raise_for_status()
        return resp.json()

    def delete_backend(self, name: str, force: bool = False) -> Dict[str, Any]:
        """Delete a backend"""
        params = {"force": str(force).lower()} if force else {}
        resp = self.session.delete(
            f"{self.base_url}/admin/backends/{name}",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_weight(self, name: str, weight: int) -> Dict[str, Any]:
        """Update backend weight"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/weight",
            json={"weight": weight}
        )
        resp.raise_for_status()
        return resp.json()

    def update_backend_models(
        self,
        name: str,
        models: List[str],
        append: bool = False
    ) -> Dict[str, Any]:
        """Update backend models"""
        resp = self.session.put(
            f"{self.base_url}/admin/backends/{name}/models",
            json={"models": models, "append": append}
        )
        resp.raise_for_status()
        return resp.json()


# Usage Example
if __name__ == "__main__":
    client = ContinuumAdminClient(
        base_url="http://localhost:8080",
        token="your-admin-token"
    )
    # Get current logging config
    logging_config = client.get_section("logging")
    print(f"Current log level: {logging_config['config']['level']}")
    # Update logging level
    result = client.patch_section("logging", {"level": "debug"})
    print(f"Updated: {result['success']}")
    # Add a new backend
    client.add_backend(
        name="new-ollama",
        url="http://192.168.1.100:11434",
        weight=2,
        models=["llama3.2", "mistral"]
    )
    # Export configuration backup
    backup = client.export_config(format="yaml")
    with open("config-backup.yaml", "w") as f:
        f.write(backup)
JavaScript/TypeScript¶
interface ConfigSection {
name: string;
config: Record<string, any>;
hot_reload_capability: 'immediate' | 'gradual' | 'requires_restart';
}
interface HistoryEntry {
version: number;
timestamp: string;
sections_changed: string[];
source: string;
user: string;
}
interface Backend {
name: string;
url: string;
weight: number;
models: string[];
enabled: boolean;
health_status: string;
}
class ContinuumAdminClient {
private baseUrl: string;
private token: string;
constructor(baseUrl: string, token: string) {
this.baseUrl = baseUrl;
this.token = token;
}
private async request<T>(
method: string,
path: string,
body?: any,
params?: Record<string, string>
): Promise<T> {
const url = new URL(`${this.baseUrl}${path}`);
if (params) {
Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
}
const response = await fetch(url.toString(), {
method,
headers: {
'Authorization': `Bearer ${this.token}`,
'Content-Type': 'application/json',
},
body: body ? JSON.stringify(body) : undefined,
});
if (!response.ok) {
const error = await response.json();
throw new Error(error.message || `HTTP ${response.status}`);
}
return response.json();
}
// Configuration Query APIs
async getFullConfig(): Promise<any> {
return this.request('GET', '/admin/config/full');
}
async getSections(): Promise<ConfigSection[]> {
const result = await this.request<{ sections: ConfigSection[] }>(
'GET', '/admin/config/sections'
);
return result.sections;
}
async getSection(section: string): Promise<ConfigSection> {
return this.request('GET', `/admin/config/${section}`);
}
async getSchema(section?: string): Promise<any> {
const params = section ? { section } : undefined;
return this.request('GET', '/admin/config/schema', undefined, params);
}
// Configuration Modification APIs
async updateSection(section: string, config: Record<string, any>): Promise<any> {
return this.request('PUT', `/admin/config/${section}`, { config });
}
async patchSection(section: string, config: Record<string, any>): Promise<any> {
return this.request('PATCH', `/admin/config/${section}`, { config });
}
async validateConfig(
section: string,
config: Record<string, any>,
dryRun: boolean = true
): Promise<any> {
return this.request('POST', '/admin/config/validate', {
section,
config,
dry_run: dryRun,
});
}
async applyConfig(sections?: string[], force: boolean = false): Promise<any> {
return this.request('POST', '/admin/config/apply', { sections, force });
}
// Configuration Save/Restore APIs
async exportConfig(
format: 'yaml' | 'json' | 'toml' = 'yaml',
sections?: string[],
includeSensitive: boolean = false
): Promise<string> {
const result = await this.request<{ content: string }>(
'POST', '/admin/config/export',
{ format, sections, include_sensitive: includeSensitive }
);
return result.content;
}
async importConfig(
content: string,
format: 'yaml' | 'json' | 'toml' = 'yaml',
apply: boolean = true,
dryRun: boolean = false
): Promise<any> {
return this.request('POST', '/admin/config/import', {
format,
content,
apply,
dry_run: dryRun,
});
}
async getHistory(
limit: number = 20,
offset: number = 0,
section?: string
): Promise<{ history: HistoryEntry[]; total_entries: number }> {
const params: Record<string, string> = {
limit: limit.toString(),
offset: offset.toString(),
};
if (section) params.section = section;
return this.request('GET', '/admin/config/history', undefined, params);
}
async rollback(
version: number,
sections?: string[],
dryRun: boolean = false
): Promise<any> {
return this.request('POST', `/admin/config/rollback/${version}`, {
sections,
dry_run: dryRun,
});
}
// Backend Management APIs
async listBackends(): Promise<Backend[]> {
const result = await this.request<{ backends: Backend[] }>(
'GET', '/admin/backends'
);
return result.backends;
}
async getBackend(name: string): Promise<Backend> {
return this.request('GET', `/admin/backends/${name}`);
}
async addBackend(
name: string,
url: string,
weight: number = 1,
models?: string[]
): Promise<any> {
return this.request('POST', '/admin/backends', {
name,
url,
weight,
models,
});
}
async updateBackend(name: string, updates: Partial<Backend>): Promise<any> {
return this.request('PUT', `/admin/backends/${name}`, updates);
}
async deleteBackend(name: string, force: boolean = false): Promise<any> {
const params = force ? { force: 'true' } : undefined;
return this.request('DELETE', `/admin/backends/${name}`, undefined, params);
}
async updateBackendWeight(name: string, weight: number): Promise<any> {
return this.request('PUT', `/admin/backends/${name}/weight`, { weight });
}
async updateBackendModels(
name: string,
models: string[],
append: boolean = false
): Promise<any> {
return this.request('PUT', `/admin/backends/${name}/models`, {
models,
append,
});
}
}
// Usage Example
async function main() {
const client = new ContinuumAdminClient(
'http://localhost:8080',
'your-admin-token'
);
// Get current logging config
const loggingConfig = await client.getSection('logging');
console.log(`Current log level: ${loggingConfig.config.level}`);
// Update logging level
const result = await client.patchSection('logging', { level: 'debug' });
console.log(`Updated: ${result.success}`);
// Add a new backend
await client.addBackend('new-ollama', 'http://192.168.1.100:11434', 2, [
'llama3.2',
'mistral',
]);
// Export configuration backup
const backup = await client.exportConfig('yaml');
console.log('Configuration exported');
}
main().catch(console.error);
Go¶
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"net/http"
"net/url"
)
type ContinuumAdminClient struct {
BaseURL string
Token string
client *http.Client
}
func NewClient(baseURL, token string) *ContinuumAdminClient {
return &ContinuumAdminClient{
BaseURL: baseURL,
Token: token,
client: &http.Client{},
}
}
func (c *ContinuumAdminClient) request(method, path string, body interface{}) (map[string]interface{}, error) {
var reqBody io.Reader
if body != nil {
jsonBody, err := json.Marshal(body)
if err != nil {
return nil, err
}
reqBody = bytes.NewBuffer(jsonBody)
}
req, err := http.NewRequest(method, c.BaseURL+path, reqBody)
if err != nil {
return nil, err
}
req.Header.Set("Authorization", "Bearer "+c.Token)
req.Header.Set("Content-Type", "application/json")
resp, err := c.client.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
if resp.StatusCode >= 400 {
// Read the raw body first so non-JSON error responses still report the status code.
msg, _ := io.ReadAll(resp.Body)
return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, msg)
}
var result map[string]interface{}
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return nil, err
}
return result, nil
}
// GetFullConfig retrieves the full configuration
func (c *ContinuumAdminClient) GetFullConfig() (map[string]interface{}, error) {
return c.request("GET", "/admin/config/full", nil)
}
// GetSection retrieves a specific configuration section
func (c *ContinuumAdminClient) GetSection(section string) (map[string]interface{}, error) {
return c.request("GET", "/admin/config/"+section, nil)
}
// PatchSection partially updates a configuration section
func (c *ContinuumAdminClient) PatchSection(section string, config map[string]interface{}) (map[string]interface{}, error) {
return c.request("PATCH", "/admin/config/"+section, map[string]interface{}{
"config": config,
})
}
// AddBackend adds a new backend
func (c *ContinuumAdminClient) AddBackend(name, backendURL string, weight int, models []string) (map[string]interface{}, error) {
return c.request("POST", "/admin/backends", map[string]interface{}{
"name": name,
"url": backendURL,
"weight": weight,
"models": models,
})
}
// ExportConfig exports configuration in the specified format
func (c *ContinuumAdminClient) ExportConfig(format string) (string, error) {
result, err := c.request("POST", "/admin/config/export", map[string]interface{}{
"format": format,
})
if err != nil {
return "", err
}
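// Use a checked type assertion so a missing or non-string field does not panic.
content, ok := result["content"].(string)
if !ok {
return "", fmt.Errorf("export response has no string content field")
}
return content, nil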
}
// GetHistory retrieves configuration change history
func (c *ContinuumAdminClient) GetHistory(limit int) (map[string]interface{}, error) {
// Build only the query string here; request() already prepends BaseURL to the path.
q := url.Values{}
q.Set("limit", fmt.Sprintf("%d", limit))
return c.request("GET", "/admin/config/history?"+q.Encode(), nil)
}
func main() {
client := NewClient("http://localhost:8080", "your-admin-token")
// Get current logging config
config, _ := client.GetSection("logging")
fmt.Printf("Current config: %v\n", config)
// Update logging level
result, _ := client.PatchSection("logging", map[string]interface{}{
"level": "debug",
})
fmt.Printf("Update result: %v\n", result)
// Add a new backend
client.AddBackend("new-ollama", "http://192.168.1.100:11434", 2, []string{"llama3.2"})
// Export configuration
backup, _ := client.ExportConfig("yaml")
fmt.Println("Configuration exported")
fmt.Println(backup)
}
Best Practices¶
1. Always Validate Before Applying¶
# Step 1: Validate
curl -X POST http://localhost:8080/admin/config/validate \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"section": "logging", "config": {"level": "debug"}}'
# Step 2: Apply only if valid
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"config": {"level": "debug"}}'
2. Use Dry Run for Imports¶
# Preview import changes
curl -X POST http://localhost:8080/admin/config/import \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "yaml",
"content": "...",
"dry_run": true
}'
3. Regular Configuration Backups¶
#!/bin/bash
# Daily backup script
DATE=$(date +%Y%m%d)
curl -s -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"format": "yaml"}' | jq -r '.content' > "config-backup-$DATE.yaml"
4. Monitor Configuration History¶
# Check recent changes
curl -s "http://localhost:8080/admin/config/history?limit=5" \
-H "Authorization: Bearer $TOKEN" | jq '.history[] | {version, timestamp, sections_changed}'
5. Use Partial Updates (PATCH) for Minimal Changes¶
# Only update what's needed
curl -X PATCH http://localhost:8080/admin/config/rate_limiting \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"config": {"requests_per_minute": 200}}'
6. Test Configuration Changes in Staging First¶
# Example: Test configuration in staging before production
staging_client = ContinuumAdminClient("http://staging:8080", staging_token)
production_client = ContinuumAdminClient("http://production:8080", prod_token)
# Apply to staging first
staging_client.patch_section("rate_limiting", {"requests_per_minute": 500})
# Verify in staging
staging_config = staging_client.get_section("rate_limiting")
assert staging_config["config"]["requests_per_minute"] == 500
# Then apply to production
production_client.patch_section("rate_limiting", {"requests_per_minute": 500})
Security Considerations¶
1. Sensitive Data Handling¶
- All API responses automatically mask sensitive fields (API keys, passwords, tokens)
- Use include_sensitive: true in export only when absolutely necessary
- Audit logs record when sensitive data is accessed
2. Authentication Best Practices¶
admin:
auth:
method: bearer_token
token: "${ADMIN_TOKEN}" # Use environment variables
# Restrict access by IP
ip_whitelist:
- "10.0.0.0/8" # Internal network only
- "192.168.1.0/24" # Office network
3. Audit Logging¶
All configuration changes are logged with:
- Timestamp
- User/source
- Changed sections
- Previous and new values (sensitive data masked)
4. Rate Limiting Admin Endpoints¶
Consider rate limiting admin endpoints to prevent abuse.
5. Backup Before Major Changes¶
# Always backup before major changes
backup=$(curl -s -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"format": "yaml"}' | jq -r '.content')
# Make changes...
# Restore if needed
curl -X POST http://localhost:8080/admin/config/import \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "{\"format\": \"yaml\", \"content\": $(echo "$backup" | jq -Rs .)}"
Prompt File Management APIs¶
The Prompt File Management API allows you to manage system prompts stored in external Markdown files. This enables centralized management of system prompts without modifying the main configuration file.
List All Prompts¶
Get a list of all configured prompts with their sources and content.
Response¶
{
"prompts": [
{
"id": "default",
"prompt_type": "default",
"source": "file",
"file_path": "prompts/system.md",
"content": "# System Prompt\n\nYou are a helpful assistant...",
"loaded": true,
"size_bytes": 1024
},
{
"id": "anthropic",
"prompt_type": "backend",
"source": "file",
"file_path": "prompts/anthropic.md",
"content": "# Anthropic-specific prompt...",
"loaded": true,
"size_bytes": 512
},
{
"id": "gpt-4",
"prompt_type": "model",
"source": "inline",
"content": "You are GPT-4...",
"size_bytes": 256
}
],
"total": 3,
"prompts_directory": "./prompts"
}
Example¶
Get Prompt File¶
Get content of a specific prompt file.
Path Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Relative path to the prompt file |
Response¶
{
"path": "prompts/system.md",
"content": "# System Prompt\n\nYou are a helpful assistant that follows company policies...",
"size_bytes": 1024,
"modified_at": 1702468200
}
Example¶
curl -s http://localhost:8080/admin/config/prompts/prompts/system.md \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Update Prompt File¶
Create or update a prompt file with new content.
Request Body¶
{
"content": "# Updated System Prompt\n\nYou are a helpful assistant that follows all company policies.\n\n## Security Guidelines\n\n- Never reveal internal system details\n- Follow data privacy regulations"
}
Response¶
{
"success": true,
"path": "prompts/system.md",
"size_bytes": 245,
"message": "Prompt file updated successfully"
}
Example¶
curl -X PUT http://localhost:8080/admin/config/prompts/prompts/system.md \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"content": "# System Prompt\n\nYou are a helpful assistant."
}'
Reload Prompt Files¶
Reload all prompt files from disk. Useful after manual file edits.
Response¶
{
"success": true,
"reloaded_count": 3,
"reloaded": [
"prompts/system.md",
"prompts/anthropic.md",
"prompts/gpt4.md"
],
"errors": [],
"message": "Successfully reloaded 3 prompt file(s)"
}
Example¶
curl -X POST http://localhost:8080/admin/config/prompts/reload \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq
Configuration Example¶
To use external prompt files, configure global_prompts in your config file:
global_prompts:
  # Directory containing prompt files (relative to config directory)
  prompts_dir: "./prompts"
  # Default prompt from external file
  default_file: "system.md"
  # Or inline prompt (default_file takes precedence if both specified)
  # default: "You are a helpful assistant."
  # Backend-specific prompts
  backends:
    anthropic:
      prompt_file: "anthropic-system.md"
    openai:
      prompt: "OpenAI-specific inline prompt"
  # Model-specific prompts
  models:
    gpt-4:
      prompt_file: "gpt4-system.md"
    claude-3-opus:
      prompt_file: "claude-opus-system.md"
  merge_strategy: prepend
Security Considerations¶
- Path Traversal Protection: All paths are validated to prevent directory traversal attacks (e.g., ../../../etc/passwd)
- File Size Limits: Prompt files are limited to 1MB maximum
- Relative Paths Only: Prompt files must be within the configured prompts_dir or config directory
- Authentication Required: All prompt management endpoints require admin authentication
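To make the path traversal check concrete, here is one way such validation is commonly written. It is not the router's actual implementation: `resolve_prompt_path` is a hypothetical helper, and it checks against a single base directory only, whereas the router also accepts paths under the config directory.

```python
from pathlib import Path

MAX_PROMPT_BYTES = 1024 * 1024  # mirrors the 1MB limit stated above


def resolve_prompt_path(prompts_dir: str, relative_path: str) -> Path:
    """Resolve relative_path under prompts_dir, rejecting escapes."""
    base = Path(prompts_dir).resolve()
    candidate = Path(relative_path)
    if candidate.is_absolute():
        raise ValueError("absolute paths are not allowed")
    # resolve() collapses '..' segments (and follows symlinks where they exist).
    resolved = (base / candidate).resolve()
    # The base directory must be an ancestor of the resolved path.
    if base not in resolved.parents:
        raise ValueError("path escapes the prompts directory")
    return resolved


def check_prompt_size(content: str) -> None:
    """Reject content whose UTF-8 encoding exceeds the size limit."""
    if len(content.encode("utf-8")) > MAX_PROMPT_BYTES:
        raise ValueError("prompt file exceeds 1MB")
```

Resolving the path before comparing it matters: a plain string prefix check would accept a path such as `prompts/../secrets.md`.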
Appendix: Quick Reference¶
Configuration Sections¶
| Section | Hot Reload | Description |
|---|---|---|
| server | Restart | Bind address, workers |
| backends | Gradual | Backend URLs, weights |
| health_checks | Gradual | Health monitoring |
| logging | Immediate | Log level, format |
| retry | Immediate | Retry policies |
| timeouts | Gradual | Request timeouts |
| rate_limiting | Immediate | Rate limits |
| circuit_breaker | Immediate | Circuit breaker |
| global_prompts | Immediate | System prompts |
| fallback | Gradual | Model fallback |
| files | Gradual | Files API |
| api_keys | Immediate | API keys |
| metrics | Gradual | Prometheus metrics |
| admin | Gradual | Admin settings |
| admin.stats | Immediate | Stats collection settings |
| routing | Gradual | Routing rules |
| prefix_routing | Immediate | Prefix-aware KV cache routing |
| response_cache | Immediate | Response cache settings |
| kv_cache_index | Requires restart | KV cache index backend and event sources |
HTTP Status Codes¶
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad Request (validation error) |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 409 | Conflict |
| 413 | Payload Too Large |
| 500 | Internal Server Error |
Common curl Commands¶
# Get full config
curl -s http://localhost:8080/admin/config/full -H "Authorization: Bearer $TOKEN"
# Update logging level
curl -X PATCH http://localhost:8080/admin/config/logging \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"config": {"level": "debug"}}'
# Add backend
curl -X POST http://localhost:8080/admin/backends \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"name": "new", "url": "http://host:port", "weight": 1}'
# Export config
curl -X POST http://localhost:8080/admin/config/export \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"format": "yaml"}'
# View history
curl -s http://localhost:8080/admin/config/history -H "Authorization: Bearer $TOKEN"
# Rollback
curl -X POST http://localhost:8080/admin/config/rollback/5 \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{}'