# Deployment Guide

Run Happy Horse 1.0 on your own infrastructure.
## Deployment Options

- **Local GPU**: direct installation on a workstation with an H100 or A100
- **Cloud (AWS/GCP)**: spin up GPU instances on demand
- **Docker**: containerized deployment for reproducibility
## 1. Local GPU Setup

```bash
# System requirements:
# - NVIDIA H100 or A100 (≥48GB VRAM)
# - Ubuntu 22.04+ or similar Linux
# - CUDA 12.0+, Python 3.10+

# Clone and install
git clone https://github.com/happy-horse/happyhorse-1.git
cd happyhorse-1
pip install -r requirements.txt

# Download all weights (~30GB)
bash download_weights.sh

# Verify the GPU is visible
python -c "import torch; print(torch.cuda.get_device_name(0))"

# Run inference
python demo_generate.py --prompt "a sunset over the ocean" --duration 5
```

## 2. Cloud Deployment (AWS)
```bash
# Launch an H100 instance
# Recommended: p5.xlarge (1x H100 80GB)
# AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04)

# SSH into the instance
ssh -i your-key.pem ubuntu@<instance-ip>

# Setup (first time only)
git clone https://github.com/happy-horse/happyhorse-1.git
cd happyhorse-1
pip install -r requirements.txt
bash download_weights.sh

# Generate
python demo_generate.py --prompt "your prompt" --duration 5
```

Estimated cost: ~$3.50/hour for p5.xlarge.

| Provider | Instance | GPU | ~Cost/hr |
|---|---|---|---|
| AWS | p5.xlarge | 1x H100 80GB | $3.50 |
| GCP | a3-highgpu-1g | 1x H100 80GB | $3.70 |
| Azure | ND H100 v5 | 1x H100 80GB | $3.60 |
| Lambda | gpu_1x_h100 | 1x H100 80GB | $2.49 |
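Per-clip cost is just the hourly rate times generation time. The sketch below assumes, purely for illustration, about 3 minutes of GPU time per 5-second clip; `cost_per_clip` is a hypothetical helper, not part of the repository.

```python
# Back-of-the-envelope cost per generated clip, from the pricing table above.
# The default of 3 minutes per clip is an illustrative assumption, not a
# measured Happy Horse benchmark.
HOURLY_RATES = {"AWS": 3.50, "GCP": 3.70, "Azure": 3.60, "Lambda": 2.49}

def cost_per_clip(provider: str, minutes_per_clip: float = 3.0) -> float:
    """Return the estimated dollar cost of generating one clip."""
    return round(HOURLY_RATES[provider] * minutes_per_clip / 60.0, 4)

for provider in HOURLY_RATES:
    print(f"{provider}: ${cost_per_clip(provider):.4f} per clip")
```

Remember that billing is per instance-hour, so the real cost driver is how long the instance stays up, not how many clips you generate.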
## 3. Docker Deployment

```bash
# Pull the Docker image (once available)
docker pull happyhorse/happyhorse-1:latest

# Run with GPU support (the -v bind mount path must be absolute)
docker run --gpus all -v "$(pwd)/output:/app/output" \
  happyhorse/happyhorse-1:latest \
  --prompt "a cat playing piano" \
  --duration 5 \
  --output /app/output/result.mp4
```

## FP8 Quantization
Happy Horse supports FP8 quantization to reduce VRAM usage. Combining the 8-step distilled checkpoint with FP8 lets generation fit on a single GPU with less memory, at the cost of a small reduction in quality.
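The VRAM saving follows from simple arithmetic: weight memory scales with bytes per parameter, and FP8 uses one byte where FP16 uses two. A minimal sketch, using a purely illustrative parameter count (the actual Happy Horse model size is not stated here):

```python
# Approximate weight memory by parameter count x bytes per parameter.
# The 13B figure is illustrative only; activations, KV caches, and
# framework overhead add to the totals shown.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

def weight_gib(params_billion: float, precision: str) -> float:
    """Rough weight memory in GiB for a given precision."""
    bytes_total = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return round(bytes_total / 2**30, 1)

print(weight_gib(13, "fp16"))  # weights alone at FP16
print(weight_gib(13, "fp8"))   # halved at FP8
```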
```bash
# Use the distilled model with FP8
python demo_generate.py \
  --model distilled \
  --precision fp8 \
  --prompt "your prompt" \
  --duration 5
```

| Mode | VRAM Usage | Quality Impact |
|---|---|---|
| Base (FP16) | ~45GB | Full quality |
| Distilled (FP16) | ~35GB | Minimal loss |
| Distilled (FP8) | ~20GB | Slight quality reduction |