Metrics and Monitoring

This document describes the metrics and monitoring capabilities of the Continuum Router.

Table of Contents

  • Overview
  • Quick Start
  • Configuration
  • Available Metrics
  • Integration
  • Grafana Dashboard
  • Alerting
  • Examples
  • Best Practices
  • Troubleshooting
  • Additional Resources

Overview

The Continuum Router provides comprehensive Prometheus-compatible metrics for monitoring system health, performance, and usage patterns. The metrics system is designed to be:

  • Lightweight: Minimal performance overhead
  • Comprehensive: Covers all critical aspects of the router
  • Production-ready: Includes cardinality limits and proper labeling
  • Easy to integrate: Works with standard Prometheus/Grafana setups

Quick Start

1. Enable Metrics

Metrics are enabled by default. The metrics endpoint is available at /metrics:

# View metrics
curl http://localhost:8000/metrics

2. Configure Prometheus

Add the router as a target in your prometheus.yml:

scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['localhost:8000']
    scrape_interval: 15s

3. Import Grafana Dashboard

Import the provided dashboard from monitoring/grafana/dashboards/router-overview.json.

Configuration

Metrics configuration is done through the main config file:

metrics:
  # Enable/disable metrics collection
  enabled: true

  # Metrics endpoint path
  endpoint: "/metrics"

  # Cardinality limits to prevent metric explosion
  cardinality_limit:
    max_labels_per_metric: 100
    max_unique_label_values: 1000

  # Optional metrics (disabled by default for performance)
  optional_metrics:
    enable_request_body_size: false
    enable_response_body_size: false
    enable_detailed_errors: true

Environment Variables

You can also configure metrics using environment variables:

# Enable/disable metrics
METRICS_ENABLED=true

# Change metrics endpoint
METRICS_ENDPOINT=/custom/metrics

# Enable optional metrics
METRICS_ENABLE_BODY_SIZE=true

Available Metrics

HTTP Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| http_requests_total | Counter | Total number of HTTP requests | method, endpoint, status |
| http_request_duration_seconds | Histogram | Request latency | method, endpoint |
| http_active_connections | Gauge | Current active connections | - |
| http_request_size_bytes | Histogram | Request body size | method, endpoint |
| http_response_size_bytes | Histogram | Response body size | method, endpoint |
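
If the optional size histograms are enabled, an approximate average response body size can be derived from the standard _sum and _count series that Prometheus histograms expose; an illustrative query:

rate(http_response_size_bytes_sum[5m]) / rate(http_response_size_bytes_count[5m])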

Backend Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| backend_health_status | Gauge | Backend health (1=healthy, 0=unhealthy) | backend_id, backend_url |
| backend_health_check_duration_seconds | Histogram | Health check duration | backend_id |
| backend_health_check_failures_total | Counter | Total health check failures | backend_id, error_type |
| backend_request_latency_seconds | Histogram | Backend request latency | backend_id, endpoint |
| backend_connection_pool_size | Gauge | Connection pool size | backend_id |
| backend_connection_pool_active | Gauge | Active connections in pool | backend_id |
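
As an illustrative query, connection pool utilization per backend can be derived from the two pool gauges:

backend_connection_pool_active / backend_connection_pool_size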

Routing Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| routing_decisions_total | Counter | Total routing decisions | strategy, selected_backend |
| routing_backend_selection_duration_seconds | Histogram | Time to select backend | strategy |
| routing_model_availability | Gauge | Model availability per backend | model, backend_id |
| routing_retries_total | Counter | Total retry attempts | backend_id, reason |
| routing_circuit_breaker_state | Gauge | Circuit breaker state | backend_id |
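
For example, retry pressure per backend, broken down by reason, can be watched with a query along these lines:

sum(rate(routing_retries_total[5m])) by (backend_id, reason)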

Model Service Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| model_cache_hits_total | Counter | Model cache hits | operation |
| model_cache_misses_total | Counter | Model cache misses | operation |
| model_refresh_duration_seconds | Histogram | Model list refresh duration | backend_id |
| model_discovery_errors_total | Counter | Model discovery errors | backend_id, error_type |
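
A useful derived signal is the overall cache hit ratio; an illustrative query over a 5-minute window:

sum(rate(model_cache_hits_total[5m])) /
(sum(rate(model_cache_hits_total[5m])) + sum(rate(model_cache_misses_total[5m])))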

Streaming Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| streaming_active_connections | Gauge | Active streaming connections | endpoint |
| streaming_events_sent_total | Counter | Total SSE events sent | endpoint, event_type |
| streaming_connection_duration_seconds | Histogram | Streaming connection duration | endpoint |
| streaming_errors_total | Counter | Streaming errors | endpoint, error_type |
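
For example, the streaming error rate per endpoint, broken down by error type:

sum(rate(streaming_errors_total[5m])) by (endpoint, error_type)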

Fallback Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| fallback_attempts_total | Counter | Total fallback attempts | original_model, fallback_model, reason |
| fallback_success_total | Counter | Successful fallbacks | original_model, fallback_model |
| fallback_exhausted_total | Counter | Exhausted fallback chains | original_model |
| fallback_cross_provider_total | Counter | Cross-provider fallbacks | from_provider, to_provider |
| fallback_duration_seconds | Histogram | Fallback operation duration | original_model |
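
The fallback success ratio per original model is another useful derived signal; an illustrative query:

sum(rate(fallback_success_total[5m])) by (original_model) /
sum(rate(fallback_attempts_total[5m])) by (original_model)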

Business Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| model_usage_total | Counter | Model usage count | model, backend_id |
| tokens_consumed_total | Counter | Total tokens consumed | model, operation |
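
For example, token consumption per model over the last hour:

sum(increase(tokens_consumed_total[1h])) by (model)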

Integration

Prometheus Configuration

Complete Prometheus configuration example:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['router1:8000', 'router2:8000']
    metric_relabel_configs:
      # Drop high-cardinality metrics if needed
      - source_labels: [__name__]
        regex: 'http_request_duration_seconds_bucket'
        action: drop
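
Once Prometheus has reloaded this configuration, the built-in up metric shows whether the router targets are being scraped (1 = target reachable; the job label matches the job_name above):

up{job="continuum-router"}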

Kubernetes Integration

For Kubernetes deployments with the Prometheus Operator, use a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: continuum-router
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: continuum-router
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

Grafana Dashboard

The provided Grafana dashboard includes:

Overview Panel

  • Request rate and error rate
  • P50, P95, P99 latencies
  • Active connections
  • Backend health status

Backend Performance

  • Backend-specific latencies
  • Health check success rate
  • Connection pool utilization
  • Circuit breaker status

Model Usage

  • Model request distribution
  • Cache hit rates
  • Token consumption
  • Model availability matrix

Alerts Overview

  • Active alerts
  • Alert history
  • SLO compliance

To import the dashboard:

  1. Open Grafana
  2. Go to Dashboards → Import
  3. Upload monitoring/grafana/dashboards/router-overview.json
  4. Select your Prometheus data source
  5. Click Import

Alerting

Pre-configured alert rules are available in monitoring/prometheus/alerts.yml:

Critical Alerts

- alert: BackendDown
  expr: backend_health_status == 0
  for: 1m
  annotations:
    summary: "Backend {{ $labels.backend_id }} is down"

- alert: HighErrorRate
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "High error rate: {{ $value | humanizePercentage }}"

Warning Alerts

- alert: HighLatency
  expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
  for: 5m
  annotations:
    summary: "P95 latency above 1s: {{ $value | humanizeDuration }}"

- alert: LowCacheHitRate
  expr: rate(model_cache_hits_total[5m]) / (rate(model_cache_hits_total[5m]) + rate(model_cache_misses_total[5m])) < 0.8
  for: 10m
  annotations:
    summary: "Cache hit rate below 80%: {{ $value | humanizePercentage }}"

Examples

Query Examples

Request Rate by Status

sum(rate(http_requests_total[5m])) by (status)

P95 Latency by Endpoint

histogram_quantile(0.95, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (endpoint, le)
)

Backend Health Overview

sum(backend_health_status) by (backend_id)

Model Usage Ranking

topk(10, sum(rate(model_usage_total[1h])) by (model))

Error Rate Percentage

sum(rate(http_requests_total{status=~"5.."}[5m])) / 
sum(rate(http_requests_total[5m])) * 100

Programmatic Access

You can also access metrics programmatically:

import requests
from prometheus_client.parser import text_string_to_metric_families

# Fetch metrics
response = requests.get('http://localhost:8000/metrics')
metrics = text_string_to_metric_families(response.text)

# Process metrics
for family in metrics:
    for sample in family.samples:
        if sample.name == 'http_requests_total':
            print(f"Endpoint: {sample.labels['endpoint']}, Count: {sample.value}")

Custom Metrics Collection

#!/bin/bash
# Collect metrics every 30 seconds and save to file

while true; do
  timestamp=$(date +%s)
  curl -s http://localhost:8000/metrics > "metrics_${timestamp}.txt"
  sleep 30
done

Best Practices

1. Label Cardinality

Keep label cardinality low to prevent metric explosion:

# Good: Low cardinality
labels:
  status: "200"  # ~5 possible values
  method: "GET"  # ~7 possible values

# Bad: High cardinality
labels:
  user_id: "12345"  # Unbounded
  request_id: "abc-123"  # Unique per request
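
If you suspect a metric has already exploded, Prometheus can report how many series each metric name currently holds. This ad-hoc query is expensive, so run it sparingly:

topk(10, count by (__name__)({__name__=~".+"}))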

2. Metric Naming

Follow Prometheus naming conventions:

  • Use snake_case
  • Include units in metric names (_seconds, _bytes) and the _total suffix for counters
  • Use standard prefixes (http_, backend_, model_)

3. Dashboard Design

  • Group related metrics together
  • Use appropriate visualization types (gauge for current values, graph for time series)
  • Include both absolute values and rates
  • Set reasonable refresh intervals (15-30s for real-time, 1-5m for historical)

4. Alert Configuration

  • Use appropriate evaluation periods (for: 5m to avoid flapping)
  • Include context in alert descriptions
  • Set up alert routing based on severity
  • Test alerts in staging before production

5. Performance Considerations

  • Disable optional metrics if not needed
  • Use recording rules for complex queries
  • Implement proper metric retention policies
  • Consider using remote storage for long-term retention

6. Security

  • Protect metrics endpoint if sensitive data is exposed
  • Use TLS for Prometheus scraping in production
  • Implement authentication for Grafana dashboards
  • Audit metric access logs

Troubleshooting

Metrics Not Appearing

  1. Check if metrics are enabled in configuration
  2. Verify the metrics endpoint is accessible
  3. Check Prometheus target status
  4. Review router logs for metric initialization errors

High Memory Usage

  1. Review cardinality limits
  2. Check for unbounded labels (see the example query after this list)
  3. Reduce histogram buckets if needed
  4. Enable metric expiration
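
For step 2, a quick way to see how many distinct values a label has accumulated is to count series grouped by that label; endpoint and http_requests_total are used here purely as examples:

count(count by (endpoint) (http_requests_total))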

Incorrect Values

  1. Verify metric types (counter vs gauge)
  2. Check aggregation functions
  3. Review label selectors
  4. Validate time ranges

Additional Resources