Happy Horse Wiki

Deployment Guide

Run Happy Horse 1.0 on your own infrastructure.

Deployment Options

- Local GPU: direct installation on a workstation with an NVIDIA H100 or A100
- Cloud (AWS/GCP): spin up GPU instances on demand
- Docker: containerized deployment for reproducibility

1. Local GPU Setup

# System requirements
# - NVIDIA H100 or A100 (≥48GB VRAM)
# - Ubuntu 22.04+ or similar Linux
# - CUDA 12.0+, Python 3.10+

# Clone and install
git clone https://github.com/happy-horse/happyhorse-1.git
cd happyhorse-1
pip install -r requirements.txt

# Download all weights (~30GB)
bash download_weights.sh

# Verify GPU
python -c "import torch; print(torch.cuda.get_device_name(0))"

# Run inference
python demo_generate.py --prompt "a sunset over the ocean" --duration 5
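The requirement comments above can be turned into a quick pre-flight check before cloning and downloading 30GB of weights. A minimal sketch; the thresholds mirror the comments at the top of this section, and the `check_requirements` helper is our own, not part of the repo:

```python
# Pre-flight check against the documented minimums:
# >=48GB VRAM, CUDA 12.0+, Python 3.10+.
def check_requirements(vram_gb, cuda_version, python_version):
    """Return a list of unmet requirements (empty list means OK)."""
    problems = []
    if vram_gb < 48:
        problems.append(f"need >=48GB VRAM, found {vram_gb}GB")
    if cuda_version < (12, 0):
        problems.append(f"need CUDA 12.0+, found {cuda_version[0]}.{cuda_version[1]}")
    if python_version < (3, 10):
        problems.append(f"need Python 3.10+, found {python_version[0]}.{python_version[1]}")
    return problems

# An A100 40GB fails the VRAM check; an 80GB card passes.
print(check_requirements(40, (12, 1), (3, 10)))  # one problem: VRAM
print(check_requirements(80, (12, 1), (3, 11)))  # []
```

The real values can be read from `torch.cuda.get_device_properties(0).total_memory`, `torch.version.cuda`, and `sys.version_info` on the target machine.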

2. Cloud Deployment (AWS)

# Launch an H100 instance
# Recommended: p5.xlarge (1x H100 80GB)
# AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04)

# SSH into instance
ssh -i your-key.pem ubuntu@<instance-ip>

# Setup (first time only)
git clone https://github.com/happy-horse/happyhorse-1.git
cd happyhorse-1
pip install -r requirements.txt
bash download_weights.sh

# Generate
python demo_generate.py --prompt "your prompt" --duration 5

# Estimated cost: ~$3.50/hour for p5.xlarge

Provider   Instance        GPU             ~Cost/hr
AWS        p5.xlarge       1x H100 80GB    $3.50
GCP        a3-highgpu-1g   1x H100 80GB    $3.70
Azure      ND H100 v5      1x H100 80GB    $3.60
Lambda     gpu_1x_h100     1x H100 80GB    $2.49
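Hourly rates only become meaningful once combined with your throughput. A small sketch for estimating the cost of a batch of clips; the 60-seconds-per-clip generation time is an assumed placeholder, not a measured figure:

```python
def batch_cost(num_clips, seconds_per_clip, hourly_rate):
    """Estimated on-demand cost (USD) for generating num_clips clips."""
    hours = num_clips * seconds_per_clip / 3600
    return round(hours * hourly_rate, 2)

# 100 clips at an assumed 60s of generation time each, on Lambda at $2.49/hr:
print(batch_cost(100, 60, 2.49))  # ~1.67 GPU-hours -> 4.15
```

Remember that on-demand billing usually rounds up partial hours, so short bursty workloads cost more than this linear estimate suggests.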

3. Docker Deployment

# Pull the Docker image (once available)
docker pull happyhorse/happyhorse-1:latest

# Run with GPU support
docker run --gpus all -v "$(pwd)/output":/app/output \
  happyhorse/happyhorse-1:latest \
  --prompt "a cat playing piano" \
  --duration 5 \
  --output /app/output/result.mp4

FP8 Quantization

Happy Horse supports FP8 quantization to reduce VRAM usage. Combining the 8-step distilled checkpoint with FP8 brings usage down to roughly 20GB, so it fits on a single GPU with far less memory than the base model requires, at the cost of a slight quality reduction.

# Use the distilled model with FP8
python demo_generate.py \
  --model distilled \
  --precision fp8 \
  --prompt "your prompt" \
  --duration 5

Mode               VRAM Usage   Quality Impact
Base (FP16)        ~45GB        Full quality
Distilled (FP16)   ~35GB        Minimal loss
Distilled (FP8)    ~20GB        Slight quality reduction
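The VRAM savings follow directly from bytes-per-parameter arithmetic: FP16 stores 2 bytes per weight, FP8 stores 1. A back-of-envelope sketch; the 14B parameter count is a hypothetical illustration, not Happy Horse's actual size, and activations, KV caches, and framework overhead add to the weight footprint shown here:

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights alone (activations etc. add more on top)."""
    return num_params * bytes_per_param / 1e9

# Hypothetical 14B-parameter model: FP16 = 2 bytes/param, FP8 = 1 byte/param.
params = 14e9
print(weight_memory_gb(params, 2))  # 28.0 GB in FP16
print(weight_memory_gb(params, 1))  # 14.0 GB in FP8 -- half the weight footprint
```

This is why FP8 roughly halves the weight memory, while the measured end-to-end numbers in the table shrink by less: the non-weight memory is not quantized.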