Happy Horse Wiki

Hardware Requirements

Everything you need to know about running Happy Horse 1.0.

TL;DR: You need an NVIDIA H100 or A100 with at least 48GB VRAM. Consumer GPUs (RTX 4090) cannot run the full model. FP8 quantization may enable A100 40GB usage with quality tradeoffs.

Minimum Requirements

| Requirement | Minimum |
|---|---|
| GPU | NVIDIA H100 80GB or A100 80GB |
| VRAM | ≥48GB (80GB recommended) |
| System RAM | ≥64GB |
| Storage | ~30GB for model weights |
| CUDA | 12.0 or later |
| Python | 3.10 or later |
| OS | Linux (Ubuntu 22.04+ recommended) |
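A small pre-flight check along these lines can catch an under-provisioned machine before a long model download. This is an illustrative sketch, not part of Happy Horse itself; the threshold values mirror the table above.

```python
def meets_minimum(vram_gb: float, ram_gb: float,
                  cuda: tuple[int, int], python: tuple[int, int]) -> list[str]:
    """Return a list of requirement violations (empty list means OK).

    Thresholds from the minimum-requirements table:
    >=48GB VRAM, >=64GB system RAM, CUDA 12.0+, Python 3.10+.
    """
    problems = []
    if vram_gb < 48:
        problems.append(f"VRAM {vram_gb}GB < 48GB minimum")
    if ram_gb < 64:
        problems.append(f"RAM {ram_gb}GB < 64GB minimum")
    if cuda < (12, 0):
        problems.append(f"CUDA {cuda[0]}.{cuda[1]} < 12.0")
    if python < (3, 10):
        problems.append(f"Python {python[0]}.{python[1]} < 3.10")
    return problems
```

For example, an A100 80GB box with CUDA 12.2 and Python 3.11 passes, while a 24GB consumer card fails on the VRAM check.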

GPU Comparison

| GPU | VRAM | Compatible? | Est. Speed | Notes |
|---|---|---|---|---|
| H100 80GB | 80GB | Yes | 38s (1080p) | Recommended; fastest inference |
| H100 NVL | 94GB | Yes | ~35s | Best performance, dual-GPU card |
| A100 80GB | 80GB | Yes | ~55s | Good performance, widely available |
| A100 40GB | 40GB | FP8 only | ~70s | Requires quantization, quality loss |
| L40S | 48GB | FP8 only | ~80s | Budget data center option |
| RTX 4090 | 24GB | No | N/A | Insufficient VRAM even with FP8 |
| RTX 3090 | 24GB | No | N/A | Insufficient VRAM |

Cloud Cost Comparison

On-demand pricing for single H100/A100 instances (as of April 2026).

| Provider | GPU | $/hour | Cost per 1080p video | $/1000 videos |
|---|---|---|---|---|
| Lambda Cloud | H100 | $2.49 | $0.026 | $26 |
| AWS (p5.xlarge) | H100 | $3.50 | $0.037 | $37 |
| GCP | H100 | $3.70 | $0.039 | $39 |
| Azure | H100 | $3.60 | $0.038 | $38 |
| AWS (p4d.xlarge) | A100 | $2.20 | $0.034 | $34 |

* Cost per video is based on the estimated generation time for a 5-second 1080p clip on each GPU (~38s on H100, ~55s on A100). Actual costs may vary with instance startup time and data transfer.
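The per-video figures in the table are straightforward to reproduce: the hourly rate prorated to the generation time. A minimal helper (illustrative only, not part of any provider SDK):

```python
def cost_per_video(dollars_per_hour: float, seconds_per_video: float) -> float:
    """On-demand cost of one clip: hourly rate prorated to generation time."""
    return dollars_per_hour * seconds_per_video / 3600

# Reproducing table rows (H100 at ~38s, A100 at ~55s per 1080p clip):
lambda_h100 = round(cost_per_video(2.49, 38), 3)  # ≈ 0.026
aws_a100 = round(cost_per_video(2.20, 55), 3)     # ≈ 0.034
```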

Cost Optimization Tips

1. Use spot/preemptible instances

   Can save 60-70% on cloud costs. Suitable for batch generation where interruptions are acceptable.

2. Iterate at 256p, finalize at 1080p

   Use the fast 256p mode (~2s) to refine prompts, then generate the final version at full quality.

3. Consider FP8 quantization

   If slight quality loss is acceptable, FP8 reduces VRAM usage to ~20GB, enabling cheaper GPU options.

4. Batch processing

   Generate multiple videos per session to amortize instance startup costs.
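The effect of batching (tip 4) is easy to quantify: the fixed startup time (instance boot plus model load) is billed once but spread over every video in the session. A sketch, where the 120-second startup figure is an assumption for illustration:

```python
def amortized_cost_per_video(dollars_per_hour: float, startup_s: float,
                             gen_s: float, n_videos: int) -> float:
    """Total billed time (startup + n generations) divided across n videos."""
    total_seconds = startup_s + n_videos * gen_s
    return dollars_per_hour * total_seconds / 3600 / n_videos

# H100 at $2.49/hr, 38s per video, assumed 120s startup:
single = amortized_cost_per_video(2.49, 120, 38, 1)      # ≈ $0.109
batched = amortized_cost_per_video(2.49, 120, 38, 100)   # ≈ $0.027
```

With 100 videos per session, the per-video cost converges toward the raw $0.026 generation cost from the table; a single-video session pays the whole startup overhead itself.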