# Production Deployment Guide
This guide covers various deployment strategies, configurations, and best practices for running Continuum Router in production environments.
## Table of Contents
- Deployment Options
- Docker Deployment
- Kubernetes Deployment
- Systemd Service
- Cloud Deployments
- High Availability Setup
- Performance Tuning
- Security Hardening
- Monitoring and Observability
- Backup and Recovery
- Troubleshooting
## Deployment Options
| Method | Best For | Pros | Cons |
|---|---|---|---|
| Docker | Single instance, development | Easy setup, portable | Single point of failure |
| Kubernetes | Large scale, auto-scaling | HA, auto-scaling, orchestration | Complex setup |
| Systemd | Bare metal, VMs | Direct control, simple | Manual scaling |
| Docker Swarm | Medium scale | Simple orchestration | Limited features |
| Cloud PaaS | Managed deployment | Low maintenance | Vendor lock-in |
## Docker Deployment
Continuum Router provides two Docker image options:
| Image | Base | Size | Use Case |
|---|---|---|---|
| `lablup/continuum-router:VERSION` | Debian Bookworm | ~50MB | General use, better compatibility |
| `lablup/continuum-router:VERSION-alpine` | Alpine 3.20 | ~10MB | Minimal size, Kubernetes |
### Quick Start with Docker Compose
The fastest way to get started is using Docker Compose:
```bash
# Create a configuration file
curl -fsSL https://raw.githubusercontent.com/lablup/continuum-router/main/config.yaml.example > config.yaml

# Edit config.yaml to add your backends and API keys

# Then start the router
docker compose up -d

# View logs
docker compose logs -f continuum-router
```
### Running with Docker
```bash
# Run with default configuration
docker run -d \
  --name continuum-router \
  -p 8080:8080 \
  -v "$(pwd)/config.yaml:/etc/continuum-router/config.yaml:ro" \
  lablup/continuum-router:latest

# Run the Alpine variant for a smaller image
docker run -d \
  --name continuum-router \
  -p 8080:8080 \
  -v "$(pwd)/config.yaml:/etc/continuum-router/config.yaml:ro" \
  lablup/continuum-router:latest-alpine

# Run with a custom log level
docker run -d \
  --name continuum-router \
  -p 8080:8080 \
  -e RUST_LOG=debug \
  -v "$(pwd)/config.yaml:/etc/continuum-router/config.yaml:ro" \
  lablup/continuum-router:latest
```
### Building Custom Images
Two Dockerfiles are provided in the repository:
- `Dockerfile` - Debian-based image using pre-built binaries
- `Dockerfile.alpine` - Alpine-based image using musl binaries
#### Build from Pre-built Binaries (Recommended)
This method downloads pre-built binaries from GitHub Releases:
```bash
# Build the Debian-based image
docker build --build-arg VERSION=0.21.0 -t continuum-router:0.21.0 .

# Build the Alpine-based image
docker build -f Dockerfile.alpine --build-arg VERSION=0.21.0 -t continuum-router:0.21.0-alpine .

# Multi-platform build with buildx
docker buildx build --platform linux/amd64,linux/arm64 \
  --build-arg VERSION=0.21.0 \
  -t lablup/continuum-router:0.21.0 \
  --push .
```
#### Build from Source
For development or customization, use the multi-stage build:
```dockerfile
# Multi-stage build for optimal size
FROM rust:1.75-slim AS builder

# Install build dependencies
RUN apt-get update && apt-get install -y \
    pkg-config \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Cache dependencies
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo "fn main() {}" > src/main.rs
RUN cargo build --release && rm -rf src

# Build application
COPY . .
RUN touch src/main.rs && cargo build --release

# Runtime image
FROM debian:bookworm-slim

# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd -r -s /bin/false continuum

# Copy binary
COPY --from=builder /app/target/release/continuum-router /usr/local/bin/
RUN chmod +x /usr/local/bin/continuum-router

# Set up directories
RUN mkdir -p /etc/continuum-router /var/log/continuum-router
RUN chown -R continuum:continuum /etc/continuum-router /var/log/continuum-router

USER continuum

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD ["/usr/local/bin/continuum-router", "--health-check"]

ENTRYPOINT ["continuum-router"]
CMD ["--config", "/etc/continuum-router/config.yaml"]
```
### Health Checks
Continuum Router includes a built-in health check command for container orchestration:
```bash
# Check health from within the container
continuum-router --health-check

# Check health with a custom URL
continuum-router --health-check --health-check-url http://localhost:8080/health
```
The health check:
- Returns exit code 0 if the server is healthy
- Returns exit code 1 if the server is unreachable or unhealthy
- Has a 5-second timeout by default
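These exit codes make it straightforward to gate deployment steps on router health. A minimal sketch, assuming the container name used in the examples above, that blocks until the router reports healthy:

```bash
#!/bin/bash
# Wait until the router answers its health check (or give up after ~60s)
for i in $(seq 1 30); do
  if docker exec continuum-router continuum-router --health-check; then
    echo "router is healthy"
    exit 0
  fi
  sleep 2
done
echo "router failed to become healthy" >&2
exit 1
```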
### Docker Compose Production Setup

```yaml
services:
  continuum-router:
    image: lablup/continuum-router:latest
    container_name: continuum-router
    restart: always
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/etc/continuum-router/config.yaml:ro
      - ./logs:/var/log/continuum-router
    environment:
      - RUST_LOG=info
      - RUST_BACKTRACE=1
    networks:
      - llm-network
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 512M
        reservations:
          cpus: '1'
          memory: 256M
    healthcheck:
      test: ["CMD", "continuum-router", "--health-check"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  # Example backend services
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - llm-network
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G

networks:
  llm-network:
    driver: bridge

volumes:
  ollama_data:
```
### Docker Swarm Deployment

```yaml
services:
  continuum-router:
    image: lablup/continuum-router:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    ports:
      - "8080:8080"
    configs:
      - source: router_config
        target: /etc/continuum-router/config.yaml
    networks:
      - llm-overlay
    healthcheck:
      test: ["CMD", "continuum-router", "--health-check"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s

configs:
  router_config:
    external: true

networks:
  llm-overlay:
    driver: overlay
    attachable: true
```
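The stack references an external config, which must exist before deployment. A sketch of creating it and rolling out the stack (the stack and file names below are illustrative):

```bash
# Create the external config from a local file, then deploy the stack
docker config create router_config ./config.yaml
docker stack deploy -c docker-stack.yaml llm

# Verify replica placement and health
docker service ps llm_continuum-router
```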
## Kubernetes Deployment

### Complete Kubernetes Manifests
```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: continuum-router
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: continuum-router-config
  namespace: continuum-router
data:
  config.yaml: |
    server:
      bind_address: "0.0.0.0:8080"
      workers: 4
    backends:
      - url: http://ollama-service:11434
        name: ollama
        weight: 1
    health_checks:
      enabled: true
      interval: 30s
      timeout: 10s
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: continuum-router
  namespace: continuum-router
  labels:
    app: continuum-router
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: continuum-router
  template:
    metadata:
      labels:
        app: continuum-router
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - continuum-router
                topologyKey: kubernetes.io/hostname
      containers:
        - name: continuum-router
          image: ghcr.io/lablup/continuum-router:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: RUST_LOG
              value: "info"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: config
              mountPath: /etc/continuum-router
              readOnly: true
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 30
      volumes:
        - name: config
          configMap:
            name: continuum-router-config
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: continuum-router
  namespace: continuum-router
  labels:
    app: continuum-router
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: continuum-router
---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: continuum-router
  namespace: continuum-router
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: continuum-router
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: continuum-router
  namespace: continuum-router
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: continuum-router
                port:
                  number: 80
  tls:
    - hosts:
        - api.example.com
      secretName: continuum-router-tls
```
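Applying the manifests and verifying the rollout might look like this (file names match the comments above):

```bash
# Apply resources in dependency order
kubectl apply -f namespace.yaml
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml
kubectl apply -f ingress.yaml

# Wait for the rollout to complete and check pod placement
kubectl -n continuum-router rollout status deployment/continuum-router
kubectl -n continuum-router get pods -o wide
```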
### Helm Chart
```yaml
# values.yaml
replicaCount: 3

image:
  repository: ghcr.io/lablup/continuum-router
  pullPolicy: IfNotPresent
  tag: "latest"

service:
  type: LoadBalancer
  port: 80
  targetPort: 8080

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: continuum-router-tls
      hosts:
        - api.example.com

resources:
  limits:
    cpu: 1000m
    memory: 512Mi
  requests:
    cpu: 200m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

config:
  backends:
    - url: http://ollama:11434
      name: ollama
  health_checks:
    enabled: true
    interval: 30s
```
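Installation then follows the usual Helm pattern; the chart path below is illustrative and assumes a local checkout of the chart:

```bash
# Install or upgrade the release with the values above
helm upgrade --install continuum-router ./charts/continuum-router \
  --namespace continuum-router --create-namespace \
  -f values.yaml

# Inspect the deployed release
helm status continuum-router -n continuum-router
```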
## Systemd Service

### Installation Script
```bash
#!/bin/bash
# install-systemd.sh

# Create user
sudo useradd -r -s /bin/false continuum

# Create directories
sudo mkdir -p /etc/continuum-router
sudo mkdir -p /var/log/continuum-router
sudo mkdir -p /opt/continuum-router

# Copy binary
sudo cp continuum-router /usr/local/bin/
sudo chmod +x /usr/local/bin/continuum-router

# Copy configuration
sudo cp config.yaml /etc/continuum-router/

# Set permissions
sudo chown -R continuum:continuum /etc/continuum-router
sudo chown -R continuum:continuum /var/log/continuum-router

# Install service file
sudo cp continuum-router.service /etc/systemd/system/

# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable continuum-router
sudo systemctl start continuum-router
```
### Service File
```ini
# /etc/systemd/system/continuum-router.service
[Unit]
Description=Continuum Router - LLM API Router
Documentation=https://github.com/lablup/backend.ai-continuum
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=continuum
Group=continuum
WorkingDirectory=/opt/continuum-router

# Service execution
ExecStart=/usr/local/bin/continuum-router --config /etc/continuum-router/config.yaml
ExecReload=/bin/kill -USR1 $MAINPID

# Restart configuration
Restart=always
RestartSec=10
TimeoutStopSec=30

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/continuum-router
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
RestrictSUIDSGID=true
PrivateDevices=true
SystemCallFilter=@system-service

# Resource limits
LimitNOFILE=65536
LimitNPROC=4096

# Environment
Environment="RUST_LOG=continuum_router=info"
Environment="RUST_BACKTRACE=1"

[Install]
WantedBy=multi-user.target
```
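After installation, verify that the unit runs and that the hardening directives hold:

```bash
# Confirm the service is active and follow its logs
sudo systemctl status continuum-router
sudo journalctl -u continuum-router -f

# systemd can score the unit's sandboxing (lower exposure is better)
systemd-analyze security continuum-router
```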
## Cloud Deployments

### AWS ECS
```json
{
  "family": "continuum-router",
  "taskRoleArn": "arn:aws:iam::ACCOUNT_ID:role/ecsTaskRole",
  "executionRoleArn": "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "continuum-router",
      "image": "ghcr.io/lablup/continuum-router:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "RUST_LOG",
          "value": "info"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "config",
          "containerPath": "/etc/continuum-router"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/continuum-router",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ],
  "volumes": [
    {
      "name": "config",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-12345678",
        "rootDirectory": "/config"
      }
    }
  ]
}
```
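Registering the task definition and creating a Fargate service might look like the following sketch (the cluster name, subnet, and security group are placeholders for your own VPC setup):

```bash
# Register the task definition from the JSON above
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Create a Fargate service in your VPC
aws ecs create-service \
  --cluster llm-cluster \
  --service-name continuum-router \
  --task-definition continuum-router \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-abc123],assignPublicIp=DISABLED}"
```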
### Google Cloud Run
```yaml
# service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: continuum-router
  annotations:
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "100"
    spec:
      containerConcurrency: 1000
      timeoutSeconds: 300
      containers:
        - image: gcr.io/PROJECT_ID/continuum-router:latest
          ports:
            - containerPort: 8080
          env:
            - name: RUST_LOG
              value: info
          resources:
            limits:
              cpu: "2"
              memory: 2Gi
          livenessProbe:
            httpGet:
              path: /health
            initialDelaySeconds: 10
            periodSeconds: 10
```
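The Knative manifest can be applied as-is with `gcloud run services replace`; the project and region below are placeholders:

```bash
# Apply the service manifest to Cloud Run
gcloud run services replace service.yaml \
  --project PROJECT_ID \
  --region us-central1

# Fetch the service URL once deployed
gcloud run services describe continuum-router \
  --project PROJECT_ID --region us-central1 \
  --format 'value(status.url)'
```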
### Azure Container Instances
```json
{
  "location": "eastus",
  "properties": {
    "containers": [
      {
        "name": "continuum-router",
        "properties": {
          "image": "ghcr.io/lablup/continuum-router:latest",
          "ports": [
            {
              "port": 8080,
              "protocol": "TCP"
            }
          ],
          "resources": {
            "requests": {
              "cpu": 1.0,
              "memoryInGB": 1.5
            }
          },
          "environmentVariables": [
            {
              "name": "RUST_LOG",
              "value": "info"
            }
          ],
          "livenessProbe": {
            "httpGet": {
              "path": "/health",
              "port": 8080
            },
            "initialDelaySeconds": 30,
            "periodSeconds": 10
          }
        }
      }
    ],
    "osType": "Linux",
    "ipAddress": {
      "type": "Public",
      "ports": [
        {
          "port": 8080,
          "protocol": "TCP"
        }
      ]
    }
  }
}
```

Note that ACI does not remap ports, so the public IP must expose the same port the container listens on (8080 here).
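The same container group can be created imperatively with the Azure CLI; a sketch mirroring the JSON above (the resource group name is a placeholder):

```bash
az container create \
  --resource-group my-rg \
  --name continuum-router \
  --image ghcr.io/lablup/continuum-router:latest \
  --os-type Linux \
  --cpu 1 --memory 1.5 \
  --ports 8080 \
  --ip-address Public \
  --environment-variables RUST_LOG=info

# Check state and fetch logs
az container show -g my-rg -n continuum-router --query instanceView.state
az container logs -g my-rg -n continuum-router
```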
## High Availability Setup

### Multi-Region Deployment
```yaml
# Global Load Balancer Configuration
regions:
  - name: us-west
    endpoints:
      - https://us-west-1.api.example.com
      - https://us-west-2.api.example.com
    weight: 33
  - name: eu-central
    endpoints:
      - https://eu-central-1.api.example.com
      - https://eu-central-2.api.example.com
    weight: 33
  - name: asia-pacific
    endpoints:
      - https://ap-southeast-1.api.example.com
      - https://ap-southeast-2.api.example.com
    weight: 34

health_check:
  path: /health
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3

failover:
  primary: us-west
  secondary: eu-central
  tertiary: asia-pacific
```
### Database Replication
```yaml
# PostgreSQL HA Configuration
postgresql:
  primary:
    host: primary.db.example.com
    port: 5432
  replicas:
    - host: replica1.db.example.com
      port: 5432
    - host: replica2.db.example.com
      port: 5432
  pooling:
    max_connections: 100
    connection_timeout: 10s
  failover:
    automatic: true
    promote_timeout: 30s
```
## Performance Tuning

### High-Load Configuration
```yaml
# config-highload.yaml
server:
  bind_address: "0.0.0.0:8080"
  workers: 16                  # 2x CPU cores
  connection_pool_size: 1000   # Large pool for many backends
  keepalive_timeout: 75s       # Match ALB timeout

request:
  timeout: "30s"               # Lower timeout for responsiveness
  max_retries: 1               # Minimal retries
  buffer_size: 65536           # 64KB buffer

health_checks:
  enabled: true
  interval: "60s"              # Less frequent under load
  timeout: "10s"
  parallel: true               # Parallel health checks

cache:
  model_cache_ttl: "900s"      # 15 min cache
  enable_deduplication: true
  max_entries: 10000

rate_limiting:
  enabled: true
  requests_per_minute: 1000
  burst_size: 100

logging:
  level: "warn"
  format: "json"
  buffer_size: 8192
```
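Since the worker count should track the host's core count, it can be derived at deploy time rather than hard-coded. A small sketch, assuming the `workers:` key sits in config.yaml as above:

```bash
# Set workers to 2x the CPU count before starting the router
WORKERS=$(( $(nproc) * 2 ))
sed -i -E "s/^([[:space:]]*workers:).*/\1 ${WORKERS}/" config.yaml
```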
### Memory-Optimized Configuration
```yaml
# config-memory.yaml
server:
  connection_pool_size: 25   # Minimal connections

cache:
  model_cache_ttl: "60s"     # Short TTL
  max_entries: 100           # Limited cache size

request:
  buffer_size: 8192          # 8KB buffer

logging:
  level: "error"
  buffer_size: 1024
```
### CPU-Optimized Configuration
```yaml
# config-cpu.yaml
server:
  workers: 32                # Maximize parallelism

threading:
  tokio_worker_threads: 16
  blocking_threads: 8

request:
  parallel_backend_queries: true
  selection_strategy: LeastLatency   # CPU-efficient routing
```
## Security Hardening

### Network Security
```yaml
# firewall-rules.yaml
ingress:
  - protocol: tcp
    port: 8080
    source: 10.0.0.0/8        # Internal only
  - protocol: tcp
    port: 443
    source: 0.0.0.0/0         # HTTPS from anywhere

egress:
  - protocol: tcp
    port: 443
    destination: 0.0.0.0/0    # HTTPS to anywhere
  - protocol: tcp
    port: 11434
    destination: 10.0.0.0/8   # Backend communication
```
### TLS Configuration
```yaml
# tls-config.yaml
tls:
  enabled: true
  cert_file: /etc/ssl/certs/server.crt
  key_file: /etc/ssl/private/server.key

  # TLS 1.2+ only
  min_version: "1.2"

  # Strong ciphers only
  ciphers:
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

  client_auth:
    enabled: false
    ca_file: /etc/ssl/certs/ca.crt
```
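The handshake can be verified from a client with openssl; the host and port below are illustrative and assume TLS is terminated on the router's bind address:

```bash
# Confirm TLS 1.2 negotiates and inspect the presented certificate
openssl s_client -connect api.example.com:8443 -tls1_2 -brief </dev/null

# Print certificate validity dates
echo | openssl s_client -connect api.example.com:8443 2>/dev/null \
  | openssl x509 -noout -dates
```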
### Authentication
Continuum Router supports API key authentication with configurable enforcement modes.
#### Authentication Modes

| Mode | Description |
|---|---|
| `permissive` (default) | Requests without an API key are allowed. Backward compatible. |
| `blocking` | Only authenticated requests are processed. Recommended for production. |
#### Production Configuration
```yaml
# config.yaml - Production authentication setup
api_keys:
  # Enable blocking mode for mandatory authentication
  mode: blocking

  # Define API keys
  api_keys:
    - key: "${PROD_API_KEY}"   # Use environment variable
      id: "key-production-1"
      user_id: "prod-user"
      organization_id: "prod-org"
      scopes: [read, write, files]
      rate_limit: 1000
      enabled: true

  # Or load from external file for better security
  api_keys_file: "/etc/continuum-router/api-keys.yaml"
```
#### External Key File Format
```yaml
# /etc/continuum-router/api-keys.yaml
keys:
  - key: "sk-prod-xxxxxxxxxxxxx"
    id: "key-external-1"
    user_id: "service-account"
    organization_id: "production"
    scopes: [read, write, files]
    enabled: true
```
#### Making Authenticated Requests
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
```
#### Protected Endpoints (blocking mode)

- `/v1/chat/completions`
- `/v1/completions`
- `/v1/responses`
- `/v1/images/generations`
- `/v1/images/edits`
- `/v1/images/variations`
- `/v1/models`

Note: Health endpoints (`/health`, `/healthz`) are always accessible. Admin, Files, and Metrics endpoints have separate authentication.
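In blocking mode, a quick way to confirm enforcement is to hit a protected endpoint with and without credentials; the unauthenticated request should be rejected (commonly with a 401, though the exact status code depends on the router's error mapping):

```bash
# Should be rejected -- no Authorization header
curl -i http://localhost:8080/v1/models

# Should succeed with a valid key
curl -i -H "Authorization: Bearer sk-your-api-key" http://localhost:8080/v1/models
```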
#### Per-API-Key Rate Limiting
Each API key can have individual rate limits:
```yaml
api_keys:
  mode: blocking
  api_keys:
    - key: "${PREMIUM_KEY}"
      id: "premium-user"
      rate_limit: 5000   # 5000 requests per minute
      scopes: [read, write, files, admin]
    - key: "${STANDARD_KEY}"
      id: "standard-user"
      rate_limit: 100    # 100 requests per minute
      scopes: [read, write]
```
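A per-key limit can be sanity-checked by exceeding it and tallying response codes (rate-limited requests are typically answered with HTTP 429; confirm against the router's actual behavior):

```bash
# Fire 110 requests against a key limited to 100/minute and count the codes
for i in $(seq 1 110); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "Authorization: Bearer ${STANDARD_KEY}" \
    http://localhost:8080/v1/models
done | sort | uniq -c
```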
## Monitoring and Observability

### Prometheus Integration
```yaml
# prometheus-config.yaml
metrics:
  enabled: true
  endpoint: /metrics

  # Cardinality limits
  max_labels_per_metric: 10
  max_unique_label_values: 100

  # Custom metrics
  custom:
    - name: llm_request_duration
      type: histogram
      buckets: [0.1, 0.5, 1, 2, 5, 10, 30, 60]
    - name: backend_errors
      type: counter
      labels: [backend, error_type]
```
### Logging Configuration
```yaml
# logging-config.yaml
logging:
  level: info
  format: json
  outputs:
    - type: stdout
      level: info
    - type: file
      path: /var/log/continuum-router/app.log
      rotation:
        size: 100MB
        count: 10
        compress: true
    - type: syslog
      address: syslog.example.com:514
      facility: local0
  structured_fields:
    service: continuum-router
    environment: production
    version: ${VERSION}
```
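JSON-formatted logs pair well with jq for ad-hoc inspection. A sketch, assuming field names like `level`, `timestamp`, and `message` in the router's JSON output (adjust to the actual schema):

```bash
# Follow warnings and errors from the JSON log in a readable one-line format
tail -f /var/log/continuum-router/app.log \
  | jq -r 'select(.level == "ERROR" or .level == "WARN")
           | "\(.timestamp) \(.level) \(.message)"'
```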
### Tracing
```yaml
# tracing-config.yaml
tracing:
  enabled: true
  exporter:
    type: otlp
    endpoint: http://jaeger:4317
  sampling:
    rate: 0.1   # Sample 10% of requests
  propagation:
    - tracecontext
    - baggage
```
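For local verification of the OTLP exporter, an all-in-one Jaeger works well (recent `jaegertracing/all-in-one` images enable the OTLP receiver by default; older ones need `COLLECTOR_OTLP_ENABLED=true`):

```bash
# Run Jaeger with an OTLP gRPC receiver on 4317 and the UI on 16686
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest

# Traces are then browsable at http://localhost:16686
```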
## Backup and Recovery

### Configuration Backup
```bash
#!/bin/bash
# backup-config.sh

BACKUP_DIR="/backup/continuum-router"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup configuration
tar -czf "$BACKUP_DIR/config_$TIMESTAMP.tar.gz" \
  /etc/continuum-router/

# Backup logs
tar -czf "$BACKUP_DIR/logs_$TIMESTAMP.tar.gz" \
  /var/log/continuum-router/

# Keep only last 30 days
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +30 -delete
```
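A matching restore sketch, assuming the systemd deployment described above: unpack the most recent configuration archive back into place, then restart the service.

```bash
# Restore the newest configuration backup and restart the router
LATEST=$(ls -t /backup/continuum-router/config_*.tar.gz | head -1)
sudo tar -xzf "$LATEST" -C /
sudo systemctl restart continuum-router
```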
### Disaster Recovery Plan

1. Regular Backups
   - Configuration: Daily
   - Logs: Weekly
   - Metrics: Monthly
2. Recovery Objectives
   - RTO: < 1 hour
   - RPO: < 24 hours
3. Recovery Procedures
## Troubleshooting

### Common Issues

#### High Memory Usage
```bash
# Check memory usage
ps aux | grep continuum-router

# Capture a core dump of the running process for offline analysis
gdb -p $(pidof continuum-router)
(gdb) gcore memory.dump
```
#### Connection Issues
```bash
# Check open connections
netstat -an | grep 8080

# Test backend connectivity
curl -I http://backend:11434/v1/models
```
#### Performance Degradation
```bash
# Enable debug logging for the systemd service. The unit does not inherit
# shell environment variables, so set it via a drop-in override:
sudo systemctl edit continuum-router
# Add under [Service]:
#   Environment="RUST_LOG=debug"
sudo systemctl restart continuum-router

# Monitor metrics
curl http://localhost:8080/metrics | grep latency
```