Metrics and Monitoring¶
This document describes the metrics and monitoring capabilities of the Continuum Router.
Table of Contents¶
- Overview
- Quick Start
- Configuration
- Available Metrics
- Integration
- Grafana Dashboard
- Alerting
- Examples
- Best Practices
Overview¶
The Continuum Router exposes Prometheus-compatible metrics for monitoring system health, performance, and usage patterns. The metrics system is designed to be:
- Lightweight: Minimal performance overhead
- Broad coverage: Covers HTTP, backend, routing, model, streaming, and cache subsystems
- Production-ready: Includes cardinality limits and proper labeling
- Easy to integrate: Works with standard Prometheus/Grafana setups
For metric history that survives restarts without standing up a full Prometheus stack, see the Persistent Metrics Log. It snapshots the registry to a local SQLite store and exposes recent history via GET /admin/metrics/history.
Quick Start¶
1. Enable Metrics¶
Metrics are enabled by default. The metrics endpoint is served at /metrics on the router's HTTP port (localhost:8000 in the examples in this document).
2. Configure Prometheus¶
Add the router as a target in your prometheus.yml:
scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['localhost:8000']
    scrape_interval: 15s
3. Import Grafana Dashboard¶
Import the provided dashboard from monitoring/grafana/dashboards/router-overview.json.
Configuration¶
Metrics configuration is done through the main config file:
metrics:
  # Enable/disable metrics collection
  enabled: true
  # Metrics endpoint path
  endpoint: "/metrics"
  # Cardinality limits to prevent metric explosion
  cardinality_limit:
    max_labels_per_metric: 100
    max_unique_label_values: 1000
  # Optional metrics (disabled by default for performance)
  optional_metrics:
    enable_request_body_size: false
    enable_response_body_size: false
    enable_detailed_errors: true
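To make the effect of max_unique_label_values concrete, here is a minimal sketch of one way such a cap could behave. It is an illustration under assumptions, not the router's implementation: the LabelValueLimiter class, the sanitize method, and the "other" overflow value are hypothetical names chosen for this example.

```python
from collections import defaultdict

OVERFLOW_VALUE = "other"  # hypothetical bucket name, not taken from the router


class LabelValueLimiter:
    """Caps the number of distinct values accepted per label name."""

    def __init__(self, max_unique_label_values=1000):
        self.max_unique = max_unique_label_values
        self._seen = defaultdict(set)

    def sanitize(self, label, value):
        seen = self._seen[label]
        if value in seen:
            return value
        if len(seen) < self.max_unique:
            seen.add(value)
            return value
        # Limit reached: collapse every new value into one shared series.
        return OVERFLOW_VALUE


limiter = LabelValueLimiter(max_unique_label_values=2)
print([limiter.sanitize("endpoint", e) for e in ["/a", "/b", "/c", "/a"]])
# prints ['/a', '/b', 'other', '/a']
```

With a cap like this, values that were already admitted keep their own series, while anything beyond the limit shares a single overflow series, so the total series count stays bounded.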
Environment Variables¶
You can also configure metrics using environment variables:
# Enable/disable metrics
METRICS_ENABLED=true
# Change metrics endpoint
METRICS_ENDPOINT=/custom/metrics
# Enable optional metrics
METRICS_ENABLE_BODY_SIZE=true
Available Metrics¶
HTTP Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| http_requests_total | Counter | Total number of HTTP requests | method, endpoint, status |
| http_request_duration_seconds | Histogram | Request latency | method, endpoint |
| http_active_connections | Gauge | Current active connections | - |
| http_request_size_bytes | Histogram | Request body size | method, endpoint |
| http_response_size_bytes | Histogram | Response body size | method, endpoint |
Backend Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| backend_health_status | Gauge | Backend health (1=healthy, 0=unhealthy) | backend_id, backend_url |
| backend_health_check_duration_seconds | Histogram | Health check duration | backend_id |
| backend_health_check_failures_total | Counter | Total health check failures | backend_id, error_type |
| backend_request_latency_seconds | Histogram | Backend request latency | backend_id, endpoint |
| backend_connection_pool_size | Gauge | Connection pool size | backend_id |
| backend_connection_pool_active | Gauge | Active connections in pool | backend_id |
Routing Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| routing_decisions_total | Counter | Total routing decisions | strategy, selected_backend |
| routing_backend_selection_duration_seconds | Histogram | Time to select backend | strategy |
| routing_model_availability | Gauge | Model availability per backend | model, backend_id |
| routing_retries_total | Counter | Total retry attempts | backend_id, reason |
| routing_circuit_breaker_state | Gauge | Circuit breaker state | backend_id |
Model Service Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| model_cache_hits_total | Counter | Model cache hits | operation |
| model_cache_misses_total | Counter | Model cache misses | operation |
| model_refresh_duration_seconds | Histogram | Model list refresh duration | backend_id |
| model_discovery_errors_total | Counter | Model discovery errors | backend_id, error_type |
Cache Stampede Prevention Metrics¶
These metrics help monitor the cache stampede prevention mechanisms:
| Metric | Type | Description | Labels |
|---|---|---|---|
| model_stale_while_revalidate_total | Counter | Requests that returned stale data while refresh was in progress | - |
| model_coalesced_requests_total | Counter | Requests that waited for ongoing aggregation instead of triggering new one | - |
| model_background_refreshes_total | Counter | Background refresh operations initiated | - |
| model_background_refresh_successes_total | Counter | Successful background refresh operations | - |
| model_background_refresh_failures_total | Counter | Failed background refresh operations | - |
| model_singleflight_lock_acquired_total | Counter | Times the aggregation lock was acquired for singleflight | - |
Understanding Cache Stampede Metrics¶
- High coalesced_requests: Indicates the singleflight pattern is effectively preventing duplicate aggregations
- High stale_while_revalidate: Shows the stale-while-revalidate pattern is returning cached data during refresh
- Low background_refresh_failures: Confirms background refresh is working correctly
- Zero blocking on cache miss: When background_refreshes > 0, requests should rarely block on cache refresh
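The singleflight behavior that the coalesced-requests counter measures can be sketched as follows. This is a minimal asyncio illustration of the general pattern, not the router's code; the SingleFlight class, its coalesced attribute, and the aggregate coroutine are hypothetical names.

```python
import asyncio


class SingleFlight:
    """Lets concurrent callers for the same key share one in-flight call."""

    def __init__(self):
        self._inflight = {}
        self.coalesced = 0  # stand-in for a coalesced-requests counter

    async def do(self, key, fn):
        task = self._inflight.get(key)
        if task is not None:
            # Another caller is already aggregating: wait for its result.
            self.coalesced += 1
            return await task
        task = asyncio.create_task(fn())
        self._inflight[key] = task
        try:
            return await task
        finally:
            self._inflight.pop(key, None)


async def demo():
    calls = 0

    async def aggregate():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulate a slow backend fan-out
        return ["model-a", "model-b"]

    sf = SingleFlight()
    results = await asyncio.gather(*(sf.do("models", aggregate) for _ in range(5)))
    return calls, sf.coalesced, results
```

Running demo() with five concurrent callers triggers a single aggregation; the other four callers are counted as coalesced and receive the same result.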
Streaming Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| streaming_active_connections | Gauge | Active streaming connections | endpoint |
| streaming_events_sent_total | Counter | Total SSE events sent | endpoint, event_type |
| streaming_connection_duration_seconds | Histogram | Streaming connection duration | endpoint |
| streaming_errors_total | Counter | Streaming errors | endpoint, error_type |
Mid-Stream Fallback Metrics¶
These metrics are emitted when the mid-stream fallback feature is enabled (streaming.mid_stream_fallback.enabled: true).
| Metric | Type | Description | Labels |
|---|---|---|---|
| streaming_fallback_total | Counter | Total mid-stream fallback attempts | reason |
| streaming_fallback_success_total | Counter | Successful mid-stream fallback recoveries | original_backend, fallback_backend |
| streaming_fallback_accumulated_tokens | Histogram | Estimated tokens accumulated before fallback | outcome (success, failure) |
Reason Label Values for streaming_fallback_total¶
| Value | Description |
|---|---|
| timeout | Backend inactivity timeout exceeded |
| connection_error | TCP/TLS connection error |
| stream_read_error | Error reading bytes from stream |
| stream_ended_unexpectedly | Stream closed without [DONE] marker |
| too_many_stream_errors | Consecutive error event threshold reached |
| other | Other failure reason |
Key PromQL Queries¶
# Mid-stream fallback rate
rate(streaming_fallback_total[5m])
# Fallback recovery success rate
sum(rate(streaming_fallback_success_total[5m])) /
sum(rate(streaming_fallback_total[5m]))
# Median accumulated tokens at fallback trigger
histogram_quantile(0.5, rate(streaming_fallback_accumulated_tokens_bucket[5m]))
Fallback Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| fallback_attempts_total | Counter | Total fallback attempts | original_model, fallback_model, reason |
| fallback_success_total | Counter | Successful fallbacks | original_model, fallback_model |
| fallback_exhausted_total | Counter | Exhausted fallback chains | original_model |
| fallback_cross_provider_total | Counter | Cross-provider fallbacks | from_provider, to_provider |
| fallback_duration_seconds | Histogram | Fallback operation duration | original_model |
Response Cache Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| continuum_response_cache_requests_total | Counter | Cache lookups by result | result (hit, miss, skip) |
| continuum_response_cache_entries | Gauge | Current number of cached entries | -- |
| continuum_response_cache_size_bytes | Gauge | Approximate cache memory usage | -- |
| continuum_response_cache_evictions_total | Counter | LRU evictions | -- |
| continuum_response_cache_hit_rate | Gauge | Rolling cache hit rate (0.0--1.0) | -- |
| continuum_cache_backend_type | Gauge | Active cache backend (1 = active) | backend (memory, redis) |
Redis Cache Backend Metrics¶
These metrics are populated when the Redis cache backend is active (backend: redis).
| Metric | Type | Description | Labels |
|---|---|---|---|
| continuum_cache_redis_connections_active | Gauge | Active Redis connections in the pool | -- |
| continuum_cache_redis_connections_idle | Gauge | Idle Redis connections in the pool | -- |
| continuum_cache_redis_latency_seconds | Histogram | Redis operation latency | operation (get, set, delete) |
| continuum_cache_redis_errors_total | Counter | Redis errors by type | type (connection, timeout, other) |
| continuum_cache_fallback_active | Gauge | Whether in-memory fallback is active (0 or 1) | -- |
KV Event Consumer Metrics¶
These metrics are populated when vLLM KV event consumers are active (src/infrastructure/kv_index/). All backend label values are sanitized to prevent cardinality explosion.
| Metric | Type | Description | Labels |
|---|---|---|---|
| continuum_kv_event_received_total | Counter | KV cache events received from each backend | backend |
| continuum_kv_event_processed_total | Counter | KV cache events successfully forwarded via channel | backend |
| continuum_kv_event_dropped_total | Counter | KV cache events dropped due to backpressure | backend |
| continuum_kv_consumer_connected | Gauge | Whether the KV event consumer is connected (1 = connected, 0 = disconnected) | backend |
| continuum_kv_consumer_reconnects_total | Counter | Total reconnection attempts for each backend consumer | backend |
Prefix Routing Metrics¶
These metrics track prefix-aware sticky routing decisions and backend distribution.
| Metric | Type | Description | Labels |
|---|---|---|---|
| continuum_prefix_routing_requests_total | Counter | Total prefix routing decisions by strategy type | strategy (prefix_hash, overflow, fallback, unknown) |
| continuum_prefix_routing_backend_distribution | Gauge | In-flight requests per backend (for load balancing) | backend |
| continuum_prefix_routing_prefix_cardinality | Gauge | Approximate number of unique prefix keys seen | -- |
Key PromQL Queries¶
# Prefix routing hit rate (% of requests using prefix hash vs fallback)
sum(rate(continuum_prefix_routing_requests_total{strategy="prefix_hash"}[5m])) /
sum(rate(continuum_prefix_routing_requests_total[5m]))
# Overflow rate (CHWBL load balancing activations)
rate(continuum_prefix_routing_requests_total{strategy="overflow"}[5m])
# Backend load distribution (should be roughly even)
continuum_prefix_routing_backend_distribution
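To show how the prefix_hash and overflow strategy values relate, here is a deliberately simplified sketch. The router's actual algorithm is not specified in this document (the queries above mention CHWBL); this example swaps in plain modulo hashing with a load cap as a stand-in, and pick_backend, prefix_chars, and load_factor are hypothetical names and parameters.

```python
import hashlib
import math


def pick_backend(prompt, backends, in_flight, prefix_chars=64, load_factor=1.25):
    """Return (backend, strategy) for a prompt; all names here are illustrative."""
    if not backends:
        raise ValueError("at least one backend is required")
    # Hash only the leading characters so prompts sharing a prefix map together.
    digest = hashlib.sha256(prompt[:prefix_chars].encode("utf-8")).digest()
    preferred = backends[int.from_bytes(digest[:8], "big") % len(backends)]
    # Bounded load: no backend may exceed load_factor times the mean load.
    total = sum(in_flight.get(b, 0) for b in backends) + 1
    cap = math.ceil(total * load_factor / len(backends))
    if in_flight.get(preferred, 0) < cap:
        return preferred, "prefix_hash"
    # Preferred backend is saturated: spill to the least-loaded one.
    return min(backends, key=lambda b: in_flight.get(b, 0)), "overflow"
```

Under this model, a rising overflow rate means the preferred backends are frequently at their cap, which is the situation the backend distribution gauge helps diagnose.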
KV Cache Index Metrics¶
These metrics track the KV cache index subsystem including index state, query performance, routing decisions, and overlap scoring.
| Metric | Type | Description | Labels |
|---|---|---|---|
| continuum_kv_index_entries | Gauge | Current number of entries in the KV cache index | -- |
| continuum_kv_index_events_total | Counter | KV cache index mutation events (created/evicted) | backend, type (created, evicted) |
| continuum_kv_index_query_latency_seconds | Histogram | Latency of KV index query operations | -- |
| continuum_kv_index_routing_decisions_total | Counter | KV-aware routing decisions by outcome | decision (kv_aware, fallback) |
| continuum_kv_index_overlap_score | Histogram | Distribution of overlap scores for routed requests | -- |
| continuum_kv_index_event_source_status | Gauge | Event source connection status (1 = connected, 0 = disconnected) | backend, status |
Key PromQL Queries¶
# KV-aware routing ratio
sum(rate(continuum_kv_index_routing_decisions_total{decision="kv_aware"}[5m])) /
sum(rate(continuum_kv_index_routing_decisions_total[5m]))
# Median overlap score for routed requests
histogram_quantile(0.5, rate(continuum_kv_index_overlap_score_bucket[5m]))
# KV index query P99 latency
histogram_quantile(0.99, rate(continuum_kv_index_query_latency_seconds_bucket[5m]))
# Event source connection health
continuum_kv_index_event_source_status{status="connected"}
Smart Routing Metrics¶
These metrics cover the smart routing pipeline, including the LLM-based classifier.
Classification and Routing¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| smart_routing_classifications_total | Counter | Total classifications performed | complexity, domain, classifier_type |
| smart_routing_decisions_total | Counter | Total routing decisions made | source_model, target_model, policy, tier |
| smart_routing_classifier_duration_seconds | Histogram | Classifier latency | classifier_type |
| smart_routing_policy_no_match_total | Counter | Requests with no matching policy | - |
| smart_routing_tier_no_model_total | Counter | Policy matched but no model available in tier | tier |
Load Management¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| smart_routing_load_state | Gauge | Current load state: 0=Normal, 1=Warning, 2=Critical | - |
| smart_routing_tier_degradation_total | Counter | Routing degraded due to load | load_state |
| smart_routing_load_transitions_total | Counter | Load state transitions | from_state, to_state |
LLM Classifier¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| smart_routing_llm_classifier_calls_total | Counter | Total LLM classifier invocations | - |
| smart_routing_llm_classifier_cache_hits_total | Counter | Classification results served from cache | - |
| smart_routing_llm_classifier_duration_seconds | Histogram | End-to-end LLM classification latency (buckets: 50ms–5s) | - |
| smart_routing_llm_classifier_fallbacks_total | Counter | Times the LLM result was discarded and rule-based result used | - |
| smart_routing_llm_classifier_parse_errors_total | Counter | Response parse failures before retry | - |
| smart_routing_llm_classifier_retries_total | Counter | Retry attempts after initial parse failure | - |
Aggregate and Operational¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| smart_routing_requests_total | Counter | Total smart-routed requests | source_model, target_model, policy, load_state |
| smart_routing_tier_usage_total | Counter | Tier usage distribution | tier, domain |
| smart_routing_cost_estimate_total | Counter | Estimated cost from tier optimization | tier |
| smart_routing_policy_evaluations_total | Counter | Policy evaluation frequency | policy_name, result |
| smart_routing_model_availability | Gauge | Available models per tier | model, tier |
Key PromQL Queries¶
# Smart routing request rate by policy
sum by (policy) (rate(smart_routing_requests_total[5m]))
# Tier usage distribution
sum by(tier) (rate(smart_routing_tier_usage_total[5m]))
# LLM classifier cache hit rate
rate(smart_routing_llm_classifier_cache_hits_total[5m]) /
rate(smart_routing_llm_classifier_calls_total[5m])
# LLM classifier P95 latency
histogram_quantile(0.95, rate(smart_routing_llm_classifier_duration_seconds_bucket[5m]))
# LLM classifier fallback rate (reliability indicator)
rate(smart_routing_llm_classifier_fallbacks_total[5m]) /
rate(smart_routing_llm_classifier_calls_total[5m])
# Fraction of requests classified by LLM vs rule-based
# (aggregate both sides; otherwise each llm_based series is divided by itself)
sum(rate(smart_routing_classifications_total{classifier_type="llm_based"}[5m])) /
sum(rate(smart_routing_classifications_total[5m]))
# Rate of matched policy evaluations, per policy
sum by(policy_name) (rate(smart_routing_policy_evaluations_total{result="matched"}[5m]))
Business Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| model_usage_total | Counter | Model usage count | model, backend_id |
| tokens_consumed_total | Counter | Total tokens consumed | model, operation |
Guardrail Metrics¶
Exported when guardrails are configured and the metrics feature is enabled. Every guardrail decision is recorded so operators can observe what a policy does (or would do, in monitor mode) before and after enforcement.
| Metric | Type | Description | Labels |
|---|---|---|---|
| guardrail_checks_total | Counter | Per-provider checks by stage and verdict result | stage, provider, result |
| guardrail_blocks_total | Counter | Block verdicts by stage, provider, and category | stage, provider, category |
| guardrail_check_duration_seconds | Histogram | Per-provider check latency in seconds | stage, provider |
| guardrail_errors_total | Counter | Provider errors (timeout / hard failure) | provider, kind |
| guardrail_fail_open_total | Counter | Provider failures resolved fail-open (request allowed) | provider |
| guardrail_fail_closed_total | Counter | Provider failures resolved fail-closed (request blocked) | provider |
| guardrail_verdicts_total | Counter | Aggregated verdict per request after applying mode semantics | stage, mode, result |
Label values:
- stage is input, output, or streaming.
- result is allow, block, transform, or flag.
- kind is timeout or error.
- mode is monitor or enforce. Because guardrail_verdicts_total carries mode, monitor-mode verdicts are visible even though they never gate a request, which is what makes the monitor-then-enforce rollout observable.
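The relationship between mode and the recorded verdict can be sketched like this. It is a minimal illustration of the semantics described above, assuming a block verdict only gates a request in enforce mode; verdicts_total and record_and_decide are hypothetical names, and a plain Counter stands in for the real metric.

```python
from collections import Counter

# Stand-in for guardrail_verdicts_total{stage, mode, result}
verdicts_total = Counter()


def record_and_decide(stage, mode, result):
    """Count the verdict, then report whether the request is actually blocked."""
    verdicts_total[(stage, mode, result)] += 1
    # Monitor mode observes only; enforce mode gates on a block verdict.
    return mode == "enforce" and result == "block"
```

Because the verdict is counted before the mode is consulted, a block verdict in monitor mode shows up in the counter while the request still proceeds.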
Key PromQL Queries¶
# What would be blocked, broken down by category (monitor-mode tuning)
sum by (category) (rate(guardrail_blocks_total[1h]))
# Block rate per stage after enforcement
sum by (stage) (rate(guardrail_verdicts_total{result="block", mode="enforce"}[5m]))
# Provider error rate (timeouts vs hard failures)
sum by (provider, kind) (rate(guardrail_errors_total[5m]))
# P95 guardrail check latency per provider
histogram_quantile(0.95, sum by (le, provider) (rate(guardrail_check_duration_seconds_bucket[5m])))
For the full guardrail guide (concepts, providers, configuration, and the threshold-tuning workflow), see Guardrails.
Per-API-Key LLM Token Usage¶
The router publishes a per-API-key breakdown of LLM token consumption so operators can answer questions like "which key consumed the most completion tokens last hour?" or "how many prompt tokens did team X spend on model Y today?". This data is independent of the legacy aggregate counter and is intended for capacity planning, fair-use enforcement, and (eventually) cost attribution.
Metric Definition¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| llm_tokens_total | Counter | LLM tokens consumed per request | api_key_id, model, backend, kind |
| api_key_info | Gauge (constant 1) | Info-metric exposing configured API-key annotations as labels | api_key_id, plus the configured annotation allowlist |
kind is one of:
- prompt: tokens in the upstream request prompt
- completion: tokens in the upstream response completion
Both OpenAI-compatible (prompt_tokens / completion_tokens) and Anthropic (input_tokens / output_tokens) response shapes are normalized into the same counter. The router also injects stream_options.include_usage=true on OpenAI-compat streaming requests so usage data arrives in the final SSE chunk regardless of client behavior.
api_key_id Derivation¶
api_key_id is never the raw API key. The router derives a stable, non-reversible identifier in this priority order:
- If the request's bearer token matches a configured API-key entry, the entry's id field is used (e.g., key-production-1).
- Otherwise, the router computes SHA-256 over the raw token and uses the first 12 hex characters prefixed with k_ (e.g., k_3f5a7c9b1e2d).
- If no token is presented, the literal value anonymous is used.
All label values flow through the existing CardinalityManager so a runaway/rotating-key attack cannot exhaust Prometheus series.
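The derivation order above can be sketched as follows. This is a minimal illustration, not the router's code: derive_api_key_id and the configured_ids mapping are hypothetical, and UTF-8 encoding of the token before hashing is an assumption the document does not state.

```python
import hashlib


def derive_api_key_id(bearer_token, configured_ids):
    """configured_ids maps raw configured keys to their `id` field (hypothetical shape)."""
    if not bearer_token:
        # No token presented at all.
        return "anonymous"
    if bearer_token in configured_ids:
        # Configured key: use the operator-assigned id.
        return configured_ids[bearer_token]
    # Unknown key: stable, non-reversible identifier from the SHA-256 digest.
    digest = hashlib.sha256(bearer_token.encode("utf-8")).hexdigest()
    return "k_" + digest[:12]
```

The hashed form is stable across requests, so an unconfigured key still maps to one series, and the raw token never appears in a label.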
Annotation Labels and api_key_info¶
Each configured API key may carry an optional free-form annotations: { key: value } map. Operators declare which annotation keys become Prometheus labels via the global metrics.annotation_labels allowlist; everything else stays internal.
Configuration schema (under the existing api_keys block):
api_keys:
  api_keys:
    - key: "${API_KEY_1}"
      id: "key-production-1"
      user_id: "user-admin"
      organization_id: "org-main"
      annotations:
        email: "ops@example.com"
        team: "platform"
        environment: "prod"
        owner: "alice"
metrics:
  enabled: true
  annotation_labels: [email, team]  # Allowlist of label keys
Reserved annotation keys (recommended canonical names, not enforced): email, uuid, owner, team, environment. Operators may add custom keys.
When metrics.annotation_labels is non-empty, the router publishes api_key_info{api_key_id, email, team, ...} = 1 once per known key. Use PromQL joins to project the metadata onto llm_tokens_total without bloating its label set:
# Tokens per email (sums prompt + completion, last 24h)
sum by (email) (
increase(llm_tokens_total[24h])
* on (api_key_id) group_left(email) api_key_info
)
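For readers less familiar with group_left joins, the query above behaves roughly like the following dictionary join. This is only an analogy in Python: tokens_by_annotation is a hypothetical helper, and the inputs are assumed to be already aggregated per api_key_id (the real llm_tokens_total also carries model, backend, and kind).

```python
from collections import defaultdict


def tokens_by_annotation(token_increase, api_key_info, label):
    """Sum token increases per annotation value, joining on api_key_id."""
    totals = defaultdict(float)
    for api_key_id, tokens in token_increase.items():
        info = api_key_info.get(api_key_id)
        if info is None:
            continue  # no matching info series: the sample drops out of the join
        # The info series always has the value 1, so multiplying keeps the count.
        totals[info.get(label, "")] += tokens * 1
    return dict(totals)
```

Keys without an api_key_info series, such as hashed k_ identifiers for unconfigured keys, drop out of the result, which mirrors what the PromQL join does.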
Cardinality and Hot-Reload¶
- api_key_id cardinality is bounded at 1000 by default.
- Hot-reload of API-key annotations is supported via the existing config-reload pipeline. The api_key_info info-metric is republished atomically on every reload; counter values for llm_tokens_total are never reset.
- The label set on api_key_info (i.e., the contents of annotation_labels) is frozen at startup. Adding or removing keys from the allowlist requires a restart; Prometheus does not allow renaming labels on a registered metric.
Example PromQL Queries¶
# Total prompt tokens consumed per API key in the last hour
sum by (api_key_id) (
increase(llm_tokens_total{kind="prompt"}[1h])
)
# Top 10 keys by completion tokens in the last 24h
topk(10,
sum by (api_key_id) (
increase(llm_tokens_total{kind="completion"}[24h])
)
)
# Tokens grouped by team (requires team in annotation_labels)
sum by (team) (
increase(llm_tokens_total[24h])
* on (api_key_id) group_left(team) api_key_info
)
# Combined prompt+completion rate per model (tokens/sec)
sum by (model) (rate(llm_tokens_total[5m]))
# Per-key consumption by backend (useful for cost attribution)
sum by (api_key_id, backend) (
increase(llm_tokens_total[24h])
)
Grafana Panel Example¶
A simple Grafana stat panel showing the top 10 teams by completion tokens over the last 24 hours:
{
  "title": "Top 10 teams by completion tokens (24h)",
  "type": "stat",
  "targets": [
    {
      "expr": "topk(10, sum by (team) (increase(llm_tokens_total{kind=\"completion\"}[24h]) * on (api_key_id) group_left(team) api_key_info))",
      "legendFormat": "{{team}}"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    }
  }
}
For tracking spend trends, pair this with a time-series panel using rate(llm_tokens_total[5m]) grouped by team or model.
Verification Steps¶
After enabling the feature:
- Issue a chat-completion request with a configured API key.
- Scrape /metrics and confirm llm_tokens_total{...} and api_key_info{...} series appear.
- For streaming, verify the counter still increments; usage is captured from the final SSE chunk. The router injects stream_options.include_usage=true automatically for OpenAI-compat backends so this works regardless of client behavior.
- Inspect /metrics cardinality on a typical workload (e.g., curl -s http://localhost:8000/metrics | wc -l) to confirm no regression versus the prior baseline.
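For a per-metric view of the cardinality check in the last step, a small parser over the scraped text can help. The sketch below counts sample lines per metric name in the Prometheus text format; count_series is a hypothetical helper, and it assumes the scrape output has already been saved to a string.

```python
from collections import Counter


def count_series(exposition_text):
    """Count sample lines per metric name in Prometheus text-format output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks plus HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts
```

Comparing count_series output before and after enabling the feature shows which metric families grew, rather than only the total line count.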
Integration¶
Prometheus Configuration¶
Complete Prometheus configuration example:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'continuum-router'
    static_configs:
      - targets: ['router1:8000', 'router2:8000']
    metric_relabel_configs:
      # Drop high-cardinality metrics if needed
      - source_labels: [__name__]
        regex: 'http_request_duration_seconds_bucket'
        action: drop
Kubernetes Integration¶
For Kubernetes deployments, use ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: continuum-router
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: continuum-router
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
Grafana Dashboard¶
The provided Grafana dashboard includes:
Overview Panel¶
- Request rate and error rate
- P50, P95, P99 latencies
- Active connections
- Backend health status
Backend Performance¶
- Backend-specific latencies
- Health check success rate
- Connection pool utilization
- Circuit breaker status
Model Usage¶
- Model request distribution
- Cache hit rates
- Token consumption
- Model availability matrix
Alerts Overview¶
- Active alerts
- Alert history
- SLO compliance
To import the dashboard:
- Open Grafana
- Go to Dashboards → Import
- Upload monitoring/grafana/dashboards/router-overview.json
- Select your Prometheus data source
- Click Import
Alerting¶
Pre-configured alert rules are available in monitoring/prometheus/alerts.yml:
Critical Alerts¶
- alert: BackendDown
  expr: backend_health_status == 0
  for: 1m
  annotations:
    summary: "Backend {{ $labels.backend_id }} is down"
- alert: HighErrorRate
  # Ratio of 5xx responses to all responses, so the 0.05 threshold means 5%
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "High error rate: {{ $value | humanizePercentage }}"
Warning Alerts¶
- alert: HighLatency
  expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
  for: 5m
  annotations:
    summary: "P95 latency above 1s: {{ $value | humanizeDuration }}"
- alert: LowCacheHitRate
  expr: sum(rate(model_cache_hits_total[5m])) / (sum(rate(model_cache_hits_total[5m])) + sum(rate(model_cache_misses_total[5m]))) < 0.8
  for: 10m
  annotations:
    summary: "Cache hit rate below 80%: {{ $value | humanizePercentage }}"
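The latency alert depends on how a quantile is estimated from cumulative histogram buckets. The sketch below reimplements the usual linear-interpolation estimate in plain Python for illustration; it is not Prometheus's source code, and the bucket values in the usage note are made up.

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper bound and end with a float("inf") bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf") or count == prev_count:
                # No finite upper edge (or empty bucket): report the lower edge.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)], the 0.95 quantile falls halfway into the 0.5 to 1.0 bucket, giving roughly 0.75 seconds. The estimate is only as precise as the bucket boundaries, which is worth keeping in mind when choosing alert thresholds.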
Examples¶
Query Examples¶
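Request Rate by Status¶
sum by (status) (rate(http_requests_total[5m]))
P95 Latency by Endpoint¶
histogram_quantile(0.95, sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m])))
Backend Health Overview¶
backend_health_status
Model Usage Ranking¶
topk(10, sum by (model) (rate(model_usage_total[5m])))
Error Rate Percentage¶
100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))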
Programmatic Access¶
You can also access metrics programmatically:
import requests
from prometheus_client.parser import text_string_to_metric_families

# Fetch the raw exposition text from the router
response = requests.get('http://localhost:8000/metrics', timeout=5)
response.raise_for_status()

# Parse the text and print the request counter for each endpoint
for family in text_string_to_metric_families(response.text):
    for sample in family.samples:
        if sample.name == 'http_requests_total':
            print(f"Endpoint: {sample.labels.get('endpoint')}, Count: {sample.value}")
Custom Metrics Collection¶
#!/bin/bash
# Collect metrics every 30 seconds and save to file
while true; do
    timestamp=$(date +%s)
    curl -s http://localhost:8000/metrics > "metrics_${timestamp}.txt"
    sleep 30
done
Best Practices¶
1. Label Cardinality¶
Keep label cardinality low to prevent metric explosion:
# Good: Low cardinality
labels:
  status: "200"  # ~5 possible values
  method: "GET"  # ~7 possible values
# Bad: High cardinality
labels:
  user_id: "12345"  # Unbounded
  request_id: "abc-123"  # Unique per request
2. Metric Naming¶
Follow Prometheus naming conventions:
- Use snake_case
- Include units and the counter suffix in metric names (_seconds, _bytes, _total)
- Use standard prefixes (http_, backend_, model_)
3. Dashboard Design¶
- Group related metrics together
- Use appropriate visualization types (gauge for current values, graph for time series)
- Include both absolute values and rates
- Set reasonable refresh intervals (15-30s for real-time, 1-5m for historical)
4. Alert Configuration¶
- Use appropriate evaluation periods (for: 5m to avoid flapping)
- Include context in alert descriptions
- Set up alert routing based on severity
- Test alerts in staging before production
5. Performance Considerations¶
- Disable optional metrics if not needed
- Use recording rules for complex queries
- Implement proper metric retention policies
- Consider using remote storage for long-term retention
6. Security¶
- Protect metrics endpoint if sensitive data is exposed
- Use TLS for Prometheus scraping in production
- Implement authentication for Grafana dashboards
- Audit metric access logs
Troubleshooting¶
Metrics Not Appearing¶
- Check if metrics are enabled in configuration
- Verify the metrics endpoint is accessible
- Check Prometheus target status
- Review router logs for metric initialization errors
High Memory Usage¶
- Review cardinality limits
- Check for unbounded labels
- Reduce histogram buckets if needed
- Enable metric expiration
Incorrect Values¶
- Verify metric types (counter vs gauge)
- Check aggregation functions
- Review label selectors
- Validate time ranges