Metrics and Monitoring

This document describes the metrics and monitoring capabilities of the Continuum Router.

Table of Contents

  • Overview
  • Quick Start
  • Configuration
  • Available Metrics
  • Integration
  • Grafana Dashboard
  • Alerting
  • Examples
  • Best Practices
  • Troubleshooting
  • Additional Resources

Overview

The Continuum Router provides comprehensive Prometheus-compatible metrics for monitoring system health, performance, and usage patterns. The metrics system is designed to be:

  • Lightweight: Minimal performance overhead
  • Comprehensive: Covers all critical aspects of the router
  • Production-ready: Includes cardinality limits and proper labeling
  • Easy to integrate: Works with standard Prometheus/Grafana setups

Quick Start

1. Enable Metrics

Metrics are enabled by default. The metrics endpoint is available at /metrics:

# View metrics
curl http://localhost:8000/metrics

2. Configure Prometheus

Add the router as a target in your prometheus.yml:

scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['localhost:8000']
    scrape_interval: 15s

3. Import Grafana Dashboard

Import the provided dashboard from monitoring/grafana/dashboards/router-overview.json.

Configuration

Metrics configuration is done through the main config file:

metrics:
  # Enable/disable metrics collection
  enabled: true

  # Metrics endpoint path
  endpoint: "/metrics"

  # Cardinality limits to prevent metric explosion
  cardinality_limit:
    max_labels_per_metric: 100
    max_unique_label_values: 1000

  # Optional metrics (disabled by default for performance)
  optional_metrics:
    enable_request_body_size: false
    enable_response_body_size: false
    enable_detailed_errors: true

Environment Variables

You can also configure metrics using environment variables:

# Enable/disable metrics
METRICS_ENABLED=true

# Change metrics endpoint
METRICS_ENDPOINT=/custom/metrics

# Enable optional metrics
METRICS_ENABLE_BODY_SIZE=true

Available Metrics

HTTP Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| http_requests_total | Counter | Total number of HTTP requests | method, endpoint, status |
| http_request_duration_seconds | Histogram | Request latency | method, endpoint |
| http_active_connections | Gauge | Current active connections | - |
| http_request_size_bytes | Histogram | Request body size | method, endpoint |
| http_response_size_bytes | Histogram | Response body size | method, endpoint |
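
If the optional size histograms are enabled, an approximate average response body size can be derived from the standard _sum and _count series that Prometheus histograms expose; an illustrative query:

rate(http_response_size_bytes_sum[5m]) / rate(http_response_size_bytes_count[5m])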

Backend Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| backend_health_status | Gauge | Backend health (1=healthy, 0=unhealthy) | backend_id, backend_url |
| backend_health_check_duration_seconds | Histogram | Health check duration | backend_id |
| backend_health_check_failures_total | Counter | Total health check failures | backend_id, error_type |
| backend_request_latency_seconds | Histogram | Backend request latency | backend_id, endpoint |
| backend_connection_pool_size | Gauge | Connection pool size | backend_id |
| backend_connection_pool_active | Gauge | Active connections in pool | backend_id |
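
As an illustrative query, connection pool utilization per backend can be derived from the two pool gauges:

backend_connection_pool_active / backend_connection_pool_size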

Routing Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| routing_decisions_total | Counter | Total routing decisions | strategy, selected_backend |
| routing_backend_selection_duration_seconds | Histogram | Time to select backend | strategy |
| routing_model_availability | Gauge | Model availability per backend | model, backend_id |
| routing_retries_total | Counter | Total retry attempts | backend_id, reason |
| routing_circuit_breaker_state | Gauge | Circuit breaker state | backend_id |
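
For example, retry pressure per backend, broken down by reason, can be watched with a query along these lines:

sum(rate(routing_retries_total[5m])) by (backend_id, reason)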

Model Service Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| model_cache_hits_total | Counter | Model cache hits | operation |
| model_cache_misses_total | Counter | Model cache misses | operation |
| model_refresh_duration_seconds | Histogram | Model list refresh duration | backend_id |
| model_discovery_errors_total | Counter | Model discovery errors | backend_id, error_type |
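
A useful derived signal is the overall cache hit ratio; an illustrative query over a 5-minute window:

sum(rate(model_cache_hits_total[5m])) /
(sum(rate(model_cache_hits_total[5m])) + sum(rate(model_cache_misses_total[5m])))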

Streaming Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| streaming_active_connections | Gauge | Active streaming connections | endpoint |
| streaming_events_sent_total | Counter | Total SSE events sent | endpoint, event_type |
| streaming_connection_duration_seconds | Histogram | Streaming connection duration | endpoint |
| streaming_errors_total | Counter | Streaming errors | endpoint, error_type |
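
For example, the streaming error rate per endpoint, broken down by error type:

sum(rate(streaming_errors_total[5m])) by (endpoint, error_type)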

Fallback Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| fallback_attempts_total | Counter | Total fallback attempts | original_model, fallback_model, reason |
| fallback_success_total | Counter | Successful fallbacks | original_model, fallback_model |
| fallback_exhausted_total | Counter | Exhausted fallback chains | original_model |
| fallback_cross_provider_total | Counter | Cross-provider fallbacks | from_provider, to_provider |
| fallback_duration_seconds | Histogram | Fallback operation duration | original_model |
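
The fallback success ratio per original model is another useful derived signal; an illustrative query:

sum(rate(fallback_success_total[5m])) by (original_model) /
sum(rate(fallback_attempts_total[5m])) by (original_model)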

Business Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| model_usage_total | Counter | Model usage count | model, backend_id |
| tokens_consumed_total | Counter | Total tokens consumed | model, operation |
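
For example, token consumption per model over the last hour:

sum(increase(tokens_consumed_total[1h])) by (model)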

Integration

Prometheus Configuration

Complete Prometheus configuration example:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['router1:8000', 'router2:8000']
    metric_relabel_configs:
      # Drop high-cardinality metrics if needed
      - source_labels: [__name__]
        regex: 'http_request_duration_seconds_bucket'
        action: drop
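
Once Prometheus has reloaded this configuration, the built-in up metric shows whether the router targets are being scraped (1 = target reachable; the job label matches the job_name above):

up{job="continuum-router"}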

Kubernetes Integration

For Kubernetes deployments with the Prometheus Operator, use a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: continuum-router
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: continuum-router
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

Grafana Dashboard

The provided Grafana dashboard includes:

Overview Panel

  • Request rate and error rate
  • P50, P95, P99 latencies
  • Active connections
  • Backend health status

Backend Performance

  • Backend-specific latencies
  • Health check success rate
  • Connection pool utilization
  • Circuit breaker status

Model Usage

  • Model request distribution
  • Cache hit rates
  • Token consumption
  • Model availability matrix

Alerts Overview

  • Active alerts
  • Alert history
  • SLO compliance

To import the dashboard:

  1. Open Grafana
  2. Go to Dashboards → Import
  3. Upload monitoring/grafana/dashboards/router-overview.json
  4. Select your Prometheus data source
  5. Click Import

Alerting

Pre-configured alert rules are available in monitoring/prometheus/alerts.yml:

Critical Alerts

- alert: BackendDown
  expr: backend_health_status == 0
  for: 1m
  annotations:
    summary: "Backend {{ $labels.backend_id }} is down"

- alert: HighErrorRate
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "High error rate: {{ $value | humanizePercentage }}"

Warning Alerts

- alert: HighLatency
  expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
  for: 5m
  annotations:
    summary: "P95 latency above 1s: {{ $value | humanizeDuration }}"

- alert: LowCacheHitRate
  expr: rate(model_cache_hits_total[5m]) / (rate(model_cache_hits_total[5m]) + rate(model_cache_misses_total[5m])) < 0.8
  for: 10m
  annotations:
    summary: "Cache hit rate below 80%: {{ $value | humanizePercentage }}"

Examples

Query Examples

Request Rate by Status

sum(rate(http_requests_total[5m])) by (status)

P95 Latency by Endpoint

histogram_quantile(0.95, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (endpoint, le)
)

Backend Health Overview

sum(backend_health_status) by (backend_id)

Model Usage Ranking

topk(10, sum(rate(model_usage_total[1h])) by (model))

Error Rate Percentage

sum(rate(http_requests_total{status=~"5.."}[5m])) / 
sum(rate(http_requests_total[5m])) * 100

Programmatic Access

You can also access metrics programmatically:

import requests
from prometheus_client.parser import text_string_to_metric_families

# Fetch metrics
response = requests.get('http://localhost:8000/metrics')
metrics = text_string_to_metric_families(response.text)

# Process metrics
for family in metrics:
    for sample in family.samples:
        if sample.name == 'http_requests_total':
            print(f"Endpoint: {sample.labels['endpoint']}, Count: {sample.value}")

Custom Metrics Collection

#!/bin/bash
# Collect metrics every 30 seconds and save to file

while true; do
  timestamp=$(date +%s)
  curl -s http://localhost:8000/metrics > "metrics_${timestamp}.txt"
  sleep 30
done

Best Practices

1. Label Cardinality

Keep label cardinality low to prevent metric explosion:

# Good: Low cardinality
labels:
  status: "200"  # ~5 possible values
  method: "GET"  # ~7 possible values

# Bad: High cardinality
labels:
  user_id: "12345"  # Unbounded
  request_id: "abc-123"  # Unique per request
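
If you suspect a metric has already exploded, Prometheus can report how many series each metric name currently holds. This ad-hoc query is expensive, so run it sparingly:

topk(10, count by (__name__)({__name__=~".+"}))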

2. Metric Naming

Follow Prometheus naming conventions:

  • Use snake_case
  • Include units in metric names (_seconds, _bytes) and the _total suffix for counters
  • Use standard prefixes (http_, backend_, model_)

3. Dashboard Design

  • Group related metrics together
  • Use appropriate visualization types (gauge for current values, graph for time series)
  • Include both absolute values and rates
  • Set reasonable refresh intervals (15-30s for real-time, 1-5m for historical)

4. Alert Configuration

  • Use appropriate evaluation periods (for: 5m to avoid flapping)
  • Include context in alert descriptions
  • Set up alert routing based on severity
  • Test alerts in staging before production

5. Performance Considerations

  • Disable optional metrics if not needed
  • Use recording rules for complex queries
  • Implement proper metric retention policies
  • Consider using remote storage for long-term retention

6. Security

  • Protect metrics endpoint if sensitive data is exposed
  • Use TLS for Prometheus scraping in production
  • Implement authentication for Grafana dashboards
  • Audit metric access logs

Troubleshooting

Metrics Not Appearing

  1. Check if metrics are enabled in configuration
  2. Verify the metrics endpoint is accessible
  3. Check Prometheus target status
  4. Review router logs for metric initialization errors

High Memory Usage

  1. Review cardinality limits
  2. Check for unbounded labels (see the example query after this list)
  3. Reduce histogram buckets if needed
  4. Enable metric expiration
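
For step 2, a quick way to see how many distinct values a label has accumulated is to count series grouped by that label; endpoint and http_requests_total are used here purely as examples:

count(count by (endpoint) (http_requests_total))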

Incorrect Values

  1. Verify metric types (counter vs gauge)
  2. Check aggregation functions
  3. Review label selectors
  4. Validate time ranges

Additional Resources