Quick Start

This guide helps you get Continuum Router up and running within a few minutes.

Prerequisites

A running LLM backend (OpenAI API, Ollama, vLLM, LocalAI, LM Studio, etc.)
Network access to the backend endpoint

Installation

Download the Binary

Linux (x86_64)

curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

Linux (ARM64)

curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-linux-arm64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

macOS (ARM64)

curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-arm64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/

macOS (x86_64)

curl -L https://github.com/lablup/continuum-router/releases/latest/download/continuum-router-macos-x86_64.tar.gz | tar -xz
sudo mv continuum-router /usr/local/bin/
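
The download commands differ only in the archive name. The sketch below shows that mapping in Python, assuming `platform.system()` and `platform.machine()` report their usual values (`Linux` or `Darwin`; `x86_64`, `AMD64`, `arm64`, or `aarch64`). The helper name `release_asset` is hypothetical and not part of Continuum Router.

```python
import platform

BASE_URL = "https://github.com/lablup/continuum-router/releases/latest/download"

# Map common platform spellings onto the suffixes used in the archive names.
OS_ALIASES = {"linux": "linux", "darwin": "macos"}
ARCH_ALIASES = {"x86_64": "x86_64", "amd64": "x86_64", "arm64": "arm64", "aarch64": "arm64"}


def release_asset(system=None, machine=None):
    """Return the download URL for the given (or current) OS and CPU architecture."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if system not in OS_ALIASES or machine not in ARCH_ALIASES:
        raise ValueError(f"no prebuilt binary listed for {system}/{machine}")
    return f"{BASE_URL}/continuum-router-{OS_ALIASES[system]}-{ARCH_ALIASES[machine]}.tar.gz"


print(release_asset("Linux", "x86_64"))
```

Calling `release_asset()` without arguments selects the archive for the machine the script runs on.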

Build from Source

# Clone the repository
git clone https://github.com/lablup/continuum-router.git
cd continuum-router

# Build and install
cargo build --release
sudo mv target/release/continuum-router /usr/local/bin/

Configuration

Generate a Default Configuration

continuum-router --generate-config > config.yaml

Basic Configuration Example

backends:
  - url: http://localhost:11434
    name: ollama
    models: ["llama3.2", "qwen3"]

  - url: http://localhost:1234
    name: lm-studio
    models: ["gpt-4", "claude-3"]

selection_strategy: LeastLatency

health_checks:
  enabled: true
  interval: 30s
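
The name `LeastLatency` suggests that, among the backends serving the requested model, the router prefers the one that has been responding fastest. The sketch below illustrates only that idea. It is not the router's implementation, and the `Backend` class, the `latency_ms` field, and the `healthy` flag are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    url: str
    models: list = field(default_factory=list)
    latency_ms: float = float("inf")  # assumed: most recently measured latency
    healthy: bool = True              # assumed: result of the last health check


def pick_least_latency(backends, model):
    """Return the healthy backend with the lowest latency that serves `model`, or None."""
    candidates = [b for b in backends if b.healthy and model in b.models]
    return min(candidates, key=lambda b: b.latency_ms, default=None)


backends = [
    Backend("ollama", "http://localhost:11434", ["llama3.2", "qwen3"], latency_ms=42.0),
    Backend("lm-studio", "http://localhost:1234", ["gpt-4", "claude-3"], latency_ms=15.0),
]
chosen = pick_least_latency(backends, "llama3.2")
print(chosen.name if chosen else "no backend available")  # prints "ollama"
```

Only `ollama` lists `llama3.2`, so it is chosen even though `lm-studio` has the lower latency figure: filtering by model happens before the latency comparison.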

Configuration with Rate Limiting

backends:
  - url: http://localhost:11434
    name: ollama
    models: ["llama3.2", "qwen3"]

selection_strategy: LeastLatency

health_checks:
  enabled: true
  interval: 30s

rate_limiting:
  enabled: true
  storage: memory

  limits:
    per_client:
      requests_per_second: 10
      burst_capacity: 20
    per_backend:
      requests_per_second: 100
      burst_capacity: 200
    global:
      requests_per_second: 1000
      burst_capacity: 2000

  whitelist:
    - "192.168.1.0/24"
    - "10.0.0.1"

  bypass_keys:
    - "admin-key-123"
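
`requests_per_second` and `burst_capacity` are the two parameters of a token bucket, a common way to implement this kind of limit. Whether Continuum Router uses exactly this algorithm is an assumption here. The sketch only shows how the two numbers interact, and how a `whitelist` entry can be matched as either a CIDR range or a single address. The names `TokenBucket` and `is_whitelisted` are hypothetical.

```python
import ipaddress


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now):
        """Consume one token if available; `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def is_whitelisted(client_ip, whitelist):
    """True if client_ip falls inside any whitelist entry (CIDR range or single IP)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist)


# per_client values from the configuration in this section
bucket = TokenBucket(rate=10, capacity=20)
burst = sum(bucket.allow(now=0.0) for _ in range(25))
print(burst)                  # 20: the burst is capped by burst_capacity
print(bucket.allow(now=0.5))  # True: 0.5 s at 10 req/s refills 5 tokens
print(is_whitelisted("192.168.1.42", ["192.168.1.0/24", "10.0.0.1"]))  # True
```

In this model, `burst_capacity` bounds how many requests can arrive at once, while `requests_per_second` bounds the sustained rate after the burst is spent.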

Running the Router

Start the Server

continuum-router --config config.yaml

Verify It Is Running

# Check the health endpoint
curl http://localhost:8080/health

# List available models
curl http://localhost:8080/v1/models
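
Right after startup the router may need a moment before it accepts connections, so scripts often poll the health endpoint instead of calling it once. The sketch below uses only the Python standard library and assumes `/health` answers with HTTP 200 when the router is ready. The response body is not inspected, and `wait_until_healthy` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` completes with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(url, attempts=10, delay=1.0, probe=http_ok, sleep=time.sleep):
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A deployment script could call `wait_until_healthy("http://localhost:8080/health")` and abort if it returns `False`. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running router.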

Using the API

Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
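
The same request can be issued from Python with only the standard library. This sketch assumes the router returns an OpenAI-style response body (`choices[0].message.content`), which this guide does not show. `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8080"  # address used in the examples in this guide


def build_chat_request(model, prompt, base_url=ROUTER_URL):
    """Build a POST request with the same URL, header, and body as the curl example."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt, base_url=ROUTER_URL, timeout=60.0):
    """Send the request and return the reply text (assumed OpenAI-style response shape)."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the router running, `chat("llama3.2", "Hello!")` would return the model's reply as a string, provided the response has the assumed shape.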

Streaming Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
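
With `"stream": true`, OpenAI-compatible servers typically respond with Server-Sent Events: lines of the form `data: {json}` followed by a final `data: [DONE]`. Assuming the router forwards that format unchanged (this guide does not confirm it), the text can be reassembled as sketched below. `iter_stream_text` is a hypothetical helper.

```python
import json


def iter_stream_text(lines):
    """Yield the text fragments carried by OpenAI-style SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}\n',
    b"\n",
    b'data: {"choices": [{"delta": {"content": "Once upon"}}]}\n',
    b'data: {"choices": [{"delta": {"content": " a time"}}]}\n',
    b"data: [DONE]\n",
]
print("".join(iter_stream_text(sample)))  # Once upon a time
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed to `iter_stream_text` directly.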

List Models

curl http://localhost:8080/v1/models

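Next Steps

Installation Guide - Detailed installation instructions for all platforms
Configuration Guide - Complete configuration reference
API Reference - Complete API documentation
Load Balancing - Configuring load balancing strategies
Deployment Guide - Production deployment with Docker, Kubernetes, and systemd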

Troubleshooting

Common Issues

Router Does Not Start

Check that:

The configuration file exists and is valid YAML
The backend URLs are reachable
No other service is using port 8080
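
The port check can be scripted. The sketch below assumes the router listens on port 8080, as in the examples in this guide. `port_in_use` is a hypothetical helper that simply tries to open a TCP connection.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8080)` returns `True` before the router has been started, another process already holds the port.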

No Backends Available

Check that:

The backends are running and healthy
The backend URLs in the configuration are correct
Health checks can reach the backends

Connection Refused Errors

Check that:

The firewall allows connections to the backends
The backends are listening on the configured ports
Network routes are configured correctly
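
These three checks tend to produce different failure modes at the TCP level, which a short script can tell apart. The sketch below assumes the backend URL comes from the `backends` section of the configuration. `diagnose_backend` is a hypothetical helper, and the interpretation attached to each error is a rule of thumb rather than a certainty.

```python
import socket
from urllib.parse import urlsplit


def diagnose_backend(url, timeout=3.0):
    """Try a TCP connection to the backend URL and return a short diagnosis string."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return "reachable"
    except ConnectionRefusedError:
        return "refused: host answered but nothing listens on this port"
    except socket.timeout:
        return "timeout: possibly a firewall or routing problem"
    except socket.gaierror:
        return "dns: hostname could not be resolved"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `diagnose_backend("http://localhost:11434")` reports whether the Ollama backend from the configuration examples accepts connections.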

For further help, see the Error Handling guide or open an issue on GitHub.