Happy Horse Wiki

Deployment Guide

Run Happy Horse 1.0 on your own infrastructure.

Deployment Options

- Local GPU: direct installation on a workstation with an NVIDIA H100 or A100
- Cloud (AWS/GCP): spin up GPU instances on demand
- Docker: containerized deployment for reproducibility

1. Local GPU Setup

# System requirements
# - NVIDIA H100 or A100 (≥48GB VRAM)
# - Ubuntu 22.04+ or similar Linux
# - CUDA 12.0+, Python 3.10+

# Clone and install
git clone https://github.com/happy-horse/happyhorse-1.git
cd happyhorse-1
pip install -r requirements.txt

# Download all weights (~30GB)
bash download_weights.sh

# Verify GPU
python -c "import torch; print(torch.cuda.get_device_name(0))"

# Run inference
python demo_generate.py --prompt "a sunset over the ocean" --duration 5
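The requirement comments above can be turned into a quick pre-flight check before cloning and downloading 30GB of weights. A minimal sketch; the thresholds mirror the comments at the top of this section, and the `check_requirements` helper is our own, not part of the repo:

```python
# Pre-flight check against the documented minimums:
# >=48GB VRAM, CUDA 12.0+, Python 3.10+.
def check_requirements(vram_gb, cuda_version, python_version):
    """Return a list of unmet requirements (empty list means OK)."""
    problems = []
    if vram_gb < 48:
        problems.append(f"need >=48GB VRAM, found {vram_gb}GB")
    if cuda_version < (12, 0):
        problems.append(f"need CUDA 12.0+, found {cuda_version[0]}.{cuda_version[1]}")
    if python_version < (3, 10):
        problems.append(f"need Python 3.10+, found {python_version[0]}.{python_version[1]}")
    return problems

# An A100 40GB fails the VRAM check; an 80GB card passes.
print(check_requirements(40, (12, 1), (3, 10)))  # one problem: VRAM
print(check_requirements(80, (12, 1), (3, 11)))  # []
```

The real values can be read from `torch.cuda.get_device_properties(0).total_memory`, `torch.version.cuda`, and `sys.version_info` on the target machine.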

2. Cloud Deployment (AWS)

# Launch an H100 instance
# Recommended: p5.xlarge (1x H100 80GB)
# AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04)

# SSH into instance
ssh -i your-key.pem ubuntu@<instance-ip>

# Setup (first time only)
git clone https://github.com/happy-horse/happyhorse-1.git
cd happyhorse-1
pip install -r requirements.txt
bash download_weights.sh

# Generate
python demo_generate.py --prompt "your prompt" --duration 5

# Estimated cost: ~$3.50/hour for p5.xlarge

Provider   Instance        GPU             ~Cost/hr
AWS        p5.xlarge       1x H100 80GB    $3.50
GCP        a3-highgpu-1g   1x H100 80GB    $3.70
Azure      ND H100 v5      1x H100 80GB    $3.60
Lambda     gpu_1x_h100     1x H100 80GB    $2.49
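Hourly rates only become meaningful once combined with your throughput. A small sketch for estimating the cost of a batch of clips; the 60-seconds-per-clip generation time is an assumed placeholder, not a measured figure:

```python
def batch_cost(num_clips, seconds_per_clip, hourly_rate):
    """Estimated on-demand cost (USD) for generating num_clips clips."""
    hours = num_clips * seconds_per_clip / 3600
    return round(hours * hourly_rate, 2)

# 100 clips at an assumed 60s of generation time each, on Lambda at $2.49/hr:
print(batch_cost(100, 60, 2.49))  # ~1.67 GPU-hours -> 4.15
```

Remember that on-demand billing usually rounds up partial hours, so short bursty workloads cost more than this linear estimate suggests.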

3. Docker Deployment

# Pull the Docker image (once available)
docker pull happyhorse/happyhorse-1:latest

# Run with GPU support
docker run --gpus all -v "$(pwd)/output":/app/output \
  happyhorse/happyhorse-1:latest \
  --prompt "a cat playing piano" \
  --duration 5 \
  --output /app/output/result.mp4

FP8 Quantization

Happy Horse supports FP8 quantization to reduce VRAM usage. Combining the 8-step distilled checkpoint with FP8 brings usage down to roughly 20GB, so it fits on a single GPU with far less memory than the base model requires, at the cost of a slight quality reduction.

# Use the distilled model with FP8
python demo_generate.py \
  --model distilled \
  --precision fp8 \
  --prompt "your prompt" \
  --duration 5

Mode               VRAM Usage   Quality Impact
Base (FP16)        ~45GB        Full quality
Distilled (FP16)   ~35GB        Minimal loss
Distilled (FP8)    ~20GB        Slight quality reduction
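The VRAM savings follow directly from bytes-per-parameter arithmetic: FP16 stores 2 bytes per weight, FP8 stores 1. A back-of-envelope sketch; the 14B parameter count is a hypothetical illustration, not Happy Horse's actual size, and activations, KV caches, and framework overhead add to the weight footprint shown here:

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights alone (activations etc. add more on top)."""
    return num_params * bytes_per_param / 1e9

# Hypothetical 14B-parameter model: FP16 = 2 bytes/param, FP8 = 1 byte/param.
params = 14e9
print(weight_memory_gb(params, 2))  # 28.0 GB in FP16
print(weight_memory_gb(params, 1))  # 14.0 GB in FP8 -- half the weight footprint
```

This is why FP8 roughly halves the weight memory, while the measured end-to-end numbers in the table shrink by less: the non-weight memory is not quantized.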