Metrics and Monitoring¶
This document describes the metrics and monitoring capabilities of the Continuum Router.
Table of Contents¶
- Overview
- Quick Start
- Configuration
- Available Metrics
- Integration
- Grafana Dashboard
- Alerting
- Examples
- Best Practices
- Troubleshooting
Overview¶
The Continuum Router provides comprehensive Prometheus-compatible metrics for monitoring system health, performance, and usage patterns. The metrics system is designed to be:
- Lightweight: Minimal performance overhead
- Comprehensive: Covers all critical aspects of the router
- Production-ready: Includes cardinality limits and proper labeling
- Easy to integrate: Works with standard Prometheus/Grafana setups
Quick Start¶
1. Enable Metrics¶
Metrics are enabled by default. The metrics endpoint is available at /metrics, for example http://localhost:8000/metrics when the router listens on port 8000.
2. Configure Prometheus¶
Add the router as a target in your prometheus.yml:
scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['localhost:8000']
    scrape_interval: 15s
3. Import Grafana Dashboard¶
Import the provided dashboard from monitoring/grafana/dashboards/router-overview.json.
Configuration¶
Metrics configuration is done through the main config file:
metrics:
  # Enable/disable metrics collection
  enabled: true
  # Metrics endpoint path
  endpoint: "/metrics"
  # Cardinality limits to prevent metric explosion
  cardinality_limit:
    max_labels_per_metric: 100
    max_unique_label_values: 1000
  # Optional metrics (disabled by default for performance)
  optional_metrics:
    enable_request_body_size: false
    enable_response_body_size: false
    enable_detailed_errors: true
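To illustrate what a limit such as max_unique_label_values guards against, here is a minimal sketch of a label-value cap. It is not the router's implementation: the class name `LabelValueLimiter` and the `"other"` overflow value are assumptions made purely for illustration.

```python
class LabelValueLimiter:
    """Caps the number of distinct values accepted for one label (illustrative only)."""

    def __init__(self, max_unique_values: int, overflow_value: str = "other") -> None:
        self.max_unique_values = max_unique_values
        self.overflow_value = overflow_value  # hypothetical catch-all label value
        self._seen: set[str] = set()

    def normalize(self, value: str) -> str:
        """Return the value unchanged while under the cap, else the overflow value."""
        if value in self._seen:
            return value
        if len(self._seen) < self.max_unique_values:
            self._seen.add(value)
            return value
        return self.overflow_value


limiter = LabelValueLimiter(max_unique_values=2)
labels = [limiter.normalize(v) for v in ["/v1/models", "/v1/chat", "/v1/models", "/v1/embed"]]
print(labels)
```

Once the cap is reached, new values collapse into a single series instead of creating one series each, which is what keeps memory bounded.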
Environment Variables¶
You can also configure metrics using environment variables:
# Enable/disable metrics
METRICS_ENABLED=true
# Change metrics endpoint
METRICS_ENDPOINT=/custom/metrics
# Enable optional metrics
METRICS_ENABLE_BODY_SIZE=true
Available Metrics¶
HTTP Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| http_requests_total | Counter | Total number of HTTP requests | method, endpoint, status |
| http_request_duration_seconds | Histogram | Request latency | method, endpoint |
| http_active_connections | Gauge | Current active connections | - |
| http_request_size_bytes | Histogram | Request body size | method, endpoint |
| http_response_size_bytes | Histogram | Response body size | method, endpoint |
Backend Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| backend_health_status | Gauge | Backend health (1=healthy, 0=unhealthy) | backend_id, backend_url |
| backend_health_check_duration_seconds | Histogram | Health check duration | backend_id |
| backend_health_check_failures_total | Counter | Total health check failures | backend_id, error_type |
| backend_request_latency_seconds | Histogram | Backend request latency | backend_id, endpoint |
| backend_connection_pool_size | Gauge | Connection pool size | backend_id |
| backend_connection_pool_active | Gauge | Active connections in pool | backend_id |
Routing Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| routing_decisions_total | Counter | Total routing decisions | strategy, selected_backend |
| routing_backend_selection_duration_seconds | Histogram | Time to select backend | strategy |
| routing_model_availability | Gauge | Model availability per backend | model, backend_id |
| routing_retries_total | Counter | Total retry attempts | backend_id, reason |
| routing_circuit_breaker_state | Gauge | Circuit breaker state | backend_id |
Model Service Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| model_cache_hits_total | Counter | Model cache hits | operation |
| model_cache_misses_total | Counter | Model cache misses | operation |
| model_refresh_duration_seconds | Histogram | Model list refresh duration | backend_id |
| model_discovery_errors_total | Counter | Model discovery errors | backend_id, error_type |
Streaming Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| streaming_active_connections | Gauge | Active streaming connections | endpoint |
| streaming_events_sent_total | Counter | Total SSE events sent | endpoint, event_type |
| streaming_connection_duration_seconds | Histogram | Streaming connection duration | endpoint |
| streaming_errors_total | Counter | Streaming errors | endpoint, error_type |
Fallback Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| fallback_attempts_total | Counter | Total fallback attempts | original_model, fallback_model, reason |
| fallback_success_total | Counter | Successful fallbacks | original_model, fallback_model |
| fallback_exhausted_total | Counter | Exhausted fallback chains | original_model |
| fallback_cross_provider_total | Counter | Cross-provider fallbacks | from_provider, to_provider |
| fallback_duration_seconds | Histogram | Fallback operation duration | original_model |
Business Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| model_usage_total | Counter | Model usage count | model, backend_id |
| tokens_consumed_total | Counter | Total tokens consumed | model, operation |
Integration¶
Prometheus Configuration¶
Complete Prometheus configuration example:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['router1:8000', 'router2:8000']
    metric_relabel_configs:
      # Drop high-cardinality metrics if needed.
      # Note: dropping the _bucket series disables histogram_quantile() for this metric.
      - source_labels: [__name__]
        regex: 'http_request_duration_seconds_bucket'
        action: drop
Kubernetes Integration¶
For Kubernetes deployments that run the Prometheus Operator, use a ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: continuum-router
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: continuum-router
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
Grafana Dashboard¶
The provided Grafana dashboard includes:
Overview Panel¶
- Request rate and error rate
- P50, P95, P99 latencies
- Active connections
- Backend health status
Backend Performance¶
- Backend-specific latencies
- Health check success rate
- Connection pool utilization
- Circuit breaker status
Model Usage¶
- Model request distribution
- Cache hit rates
- Token consumption
- Model availability matrix
Alerts Overview¶
- Active alerts
- Alert history
- SLO compliance
To import the dashboard:
- Open Grafana
- Go to Dashboards → Import
- Upload monitoring/grafana/dashboards/router-overview.json
- Select your Prometheus data source
- Click Import
Alerting¶
Pre-configured alert rules are available in monitoring/prometheus/alerts.yml:
Critical Alerts¶
- alert: BackendDown
  expr: backend_health_status == 0
  for: 1m
  annotations:
    summary: "Backend {{ $labels.backend_id }} is down"
- alert: HighErrorRate
  # Ratio of 5xx responses to all responses, so the 0.05 threshold means 5%
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "High error rate: {{ $value | humanizePercentage }}"
Warning Alerts¶
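- alert: HighLatency
  # histogram_quantile needs the per-bucket rates, aggregated by the le label
  expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
  for: 5m
  annotations:
    summary: "P95 latency above 1s: {{ $value | humanizeDuration }}"
- alert: LowCacheHitRate
  # Hit rate = hits / (hits + misses)
  expr: sum(rate(model_cache_hits_total[5m])) / (sum(rate(model_cache_hits_total[5m])) + sum(rate(model_cache_misses_total[5m]))) < 0.8
  for: 10m
  annotations:
    summary: "Cache hit rate below 80%: {{ $value | humanizePercentage }}"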
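The HighLatency alert relies on histogram_quantile, which estimates a quantile from cumulative bucket counts rather than from individual request timings. The sketch below is a simplified version of the linear interpolation that Prometheus documents for that function. The bucket boundaries and counts are made-up sample values, not the router's actual bucket layout.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile lies beyond the last finite bucket: report that bound.
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative counts for le=0.1, 0.5, 1.0 and +Inf.
sample_buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
p95 = histogram_quantile(0.95, sample_buckets)
print(f"estimated P95: {p95:.4f}s")
```

Because the estimate is interpolated within a bucket, its precision depends on how finely the buckets are spaced around the alert threshold.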
Examples¶
Query Examples¶
Request Rate by Status¶
P95 Latency by Endpoint¶
Backend Health Overview¶
Model Usage Ranking¶
Error Rate Percentage¶
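The five headings above have no query text here. As a sketch only, the snippet below pairs each heading with a PromQL expression that would fit it and builds a URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The metric and label names come from the tables under Available Metrics. The expressions, the time windows, and the Prometheus address `http://localhost:9090` are assumptions.

```python
from urllib.parse import urlencode

# Candidate PromQL per heading (assumed expressions, not taken from the router docs).
QUERIES = {
    "Request Rate by Status": "sum by (status) (rate(http_requests_total[5m]))",
    "P95 Latency by Endpoint": (
        "histogram_quantile(0.95, sum by (le, endpoint) "
        "(rate(http_request_duration_seconds_bucket[5m])))"
    ),
    "Backend Health Overview": "backend_health_status",
    "Model Usage Ranking": "topk(10, sum by (model) (rate(model_usage_total[1h])))",
    "Error Rate Percentage": (
        '100 * sum(rate(http_requests_total{status=~"5.."}[5m])) '
        "/ sum(rate(http_requests_total[5m]))"
    ),
}


def instant_query_url(prometheus_base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    base = prometheus_base_url.rstrip("/")
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


for title, promql in QUERIES.items():
    print(f"{title}: {instant_query_url('http://localhost:9090', promql)}")
```

The same expressions can be pasted directly into the Prometheus expression browser or a Grafana panel.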
Programmatic Access¶
You can also access metrics programmatically:
import requests
from prometheus_client.parser import text_string_to_metric_families

# Fetch the raw exposition text from the router
response = requests.get('http://localhost:8000/metrics', timeout=5)
response.raise_for_status()

# Parse the text and print the per-endpoint request counters
for family in text_string_to_metric_families(response.text):
    for sample in family.samples:
        if sample.name == 'http_requests_total':
            print(f"Endpoint: {sample.labels['endpoint']}, Count: {sample.value}")
Custom Metrics Collection¶
#!/bin/bash
# Collect metrics every 30 seconds and save to file
while true; do
    timestamp=$(date +%s)
    curl -s http://localhost:8000/metrics > "metrics_${timestamp}.txt"
    sleep 30
done
Best Practices¶
1. Label Cardinality¶
Keep label cardinality low to prevent metric explosion:
# Good: Low cardinality
labels:
  status: "200"          # ~5 possible values
  method: "GET"          # ~7 possible values
# Bad: High cardinality
labels:
  user_id: "12345"       # Unbounded
  request_id: "abc-123"  # Unique per request
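One common way to keep a path-derived label bounded is to collapse identifier-like path segments before using the path as a label value. The following is a minimal sketch of that idea. The function name `endpoint_label`, the `:id` placeholder, and the example paths are hypothetical and do not describe how the router labels its own metrics.

```python
import re

# Hypothetical rule: treat purely numeric or UUID-like path segments as identifiers.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})$"
)


def endpoint_label(path: str) -> str:
    """Collapse identifier-like segments so the label has a bounded set of values."""
    # Drop the query string, then replace each identifier segment with ":id".
    segments = path.split("?", 1)[0].split("/")
    return "/".join(":id" if _ID_SEGMENT.match(s) else s for s in segments)


print(endpoint_label("/v1/users/12345/sessions"))
print(endpoint_label("/v1/jobs/123e4567-e89b-12d3-a456-426614174000?verbose=1"))
```

With this normalization, every user or job maps to the same label value, so the number of series grows with the number of routes rather than the number of requests.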
2. Metric Naming¶
Follow Prometheus naming conventions:
- Use snake_case
- Include units in metric names (_seconds, _bytes) and end counter names with _total
- Use standard prefixes (http_, backend_, model_)
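These conventions are easy to check mechanically. As a rough sketch, the helper below flags names that are not snake_case and counters that lack the _total suffix. The function name `naming_issues` and the exact set of checks are assumptions, not part of the router or of any Prometheus tooling.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def naming_issues(name: str, metric_type: str) -> list[str]:
    """Return a list of naming problems for one metric (empty if none found)."""
    issues = []
    if not SNAKE_CASE.match(name):
        issues.append("name is not snake_case")
    if metric_type == "counter" and not name.endswith("_total"):
        issues.append("counter name should end with _total")
    if metric_type != "counter" and name.endswith("_total"):
        issues.append("_total suffix is reserved for counters")
    return issues


print(naming_issues("http_requests_total", "counter"))
print(naming_issues("httpRequests", "counter"))
```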
3. Dashboard Design¶
- Group related metrics together
- Use appropriate visualization types (gauge for current values, graph for time series)
- Include both absolute values and rates
- Set reasonable refresh intervals (15-30s for real-time, 1-5m for historical)
4. Alert Configuration¶
- Use appropriate evaluation periods (for: 5m to avoid flapping)
- Include context in alert descriptions
- Set up alert routing based on severity
- Test alerts in staging before production
5. Performance Considerations¶
- Disable optional metrics if not needed
- Use recording rules for complex queries
- Implement proper metric retention policies
- Consider using remote storage for long-term retention
6. Security¶
- Protect metrics endpoint if sensitive data is exposed
- Use TLS for Prometheus scraping in production
- Implement authentication for Grafana dashboards
- Audit metric access logs
Troubleshooting¶
Metrics Not Appearing¶
- Check if metrics are enabled in configuration
- Verify the metrics endpoint is accessible
- Check Prometheus target status
- Review router logs for metric initialization errors
High Memory Usage¶
- Review cardinality limits
- Check for unbounded labels
- Reduce histogram buckets if needed
- Enable metric expiration
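When looking for unbounded labels, a quick first step is to count how many series each metric exposes. The sketch below does this with a deliberately simple line-based reading of the Prometheus text exposition format. The sample text, including its endpoint label values, is invented for the example.

```python
from collections import Counter


def series_per_metric(exposition_text: str) -> Counter:
    """Count samples per metric name in Prometheus text exposition format."""
    counts: Counter = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Hypothetical scrape output; in practice, pass the body returned by /metrics.
sample = """\
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/v1/models",status="200"} 12
http_requests_total{method="POST",endpoint="/v1/chat/completions",status="200"} 7
http_active_connections 3
"""
for name, count in series_per_metric(sample).most_common():
    print(name, count)
```

Metrics at the top of this ranking are the first candidates for a label review.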
Incorrect Values¶
- Verify metric types (counter vs gauge)
- Check aggregation functions
- Review label selectors
- Validate time ranges
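A frequent source of incorrect values is reading a counter as if it were a gauge. A counter only ever increases until the process restarts, so the meaningful quantity is its increase over a window, with any drop treated as a reset. The sketch below shows that calculation on made-up samples. It is a simplification: Prometheus's own rate() and increase() also extrapolate to the window boundaries.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across samples, treating any drop as a counter reset to zero."""
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter restarted from zero, so everything seen so far is new.
            increase += current
    return increase


# Hypothetical scrapes of a counter; the drop from 160 to 20 is a restart.
scrapes = [100, 130, 160, 20, 50]
print(counter_increase(scrapes))
```

Subtracting the first raw value from the last would give a negative number here, which is why counters should be queried through rate() or increase() rather than plotted directly.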