Happy Horse Wiki

Hardware Requirements

Everything you need to know about running Happy Horse 1.0.

TL;DR: You need an NVIDIA H100 or A100 with at least 48GB VRAM. Consumer GPUs (RTX 4090) cannot run the full model. FP8 quantization may enable A100 40GB usage with quality tradeoffs.

Minimum Requirements

| Requirement | Minimum |
|---|---|
| GPU | NVIDIA H100 80GB or A100 80GB |
| VRAM | ≥48GB (80GB recommended) |
| System RAM | ≥64GB |
| Storage | ~30GB for model weights |
| CUDA | 12.0 or later |
| Python | 3.10 or later |
| OS | Linux (Ubuntu 22.04+ recommended) |
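A small pre-flight check along these lines can catch an under-provisioned machine before a long model download. This is an illustrative sketch, not part of Happy Horse itself; the threshold values mirror the table above.

```python
def meets_minimum(vram_gb: float, ram_gb: float,
                  cuda: tuple[int, int], python: tuple[int, int]) -> list[str]:
    """Return a list of requirement violations (empty list means OK).

    Thresholds from the minimum-requirements table:
    >=48GB VRAM, >=64GB system RAM, CUDA 12.0+, Python 3.10+.
    """
    problems = []
    if vram_gb < 48:
        problems.append(f"VRAM {vram_gb}GB < 48GB minimum")
    if ram_gb < 64:
        problems.append(f"RAM {ram_gb}GB < 64GB minimum")
    if cuda < (12, 0):
        problems.append(f"CUDA {cuda[0]}.{cuda[1]} < 12.0")
    if python < (3, 10):
        problems.append(f"Python {python[0]}.{python[1]} < 3.10")
    return problems
```

For example, an A100 80GB box with CUDA 12.2 and Python 3.11 passes, while a 24GB consumer card fails on the VRAM check.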

GPU Comparison

| GPU | VRAM | Compatible? | Est. Speed | Notes |
|---|---|---|---|---|
| H100 80GB | 80GB | Yes | 38s (1080p) | Recommended; fastest inference |
| H100 NVL | 94GB | Yes | ~35s | Best performance, dual-GPU card |
| A100 80GB | 80GB | Yes | ~55s | Good performance, widely available |
| A100 40GB | 40GB | FP8 only | ~70s | Requires quantization, quality loss |
| L40S | 48GB | FP8 only | ~80s | Budget data center option |
| RTX 4090 | 24GB | No | N/A | Insufficient VRAM even with FP8 |
| RTX 3090 | 24GB | No | N/A | Insufficient VRAM |

Cloud Cost Comparison

On-demand pricing for single H100/A100 instances (as of April 2026).

| Provider | GPU | $/hour | Cost per 1080p video | $/1000 videos |
|---|---|---|---|---|
| Lambda Cloud | H100 | $2.49 | $0.026 | $26 |
| AWS (p5.xlarge) | H100 | $3.50 | $0.037 | $37 |
| GCP | H100 | $3.70 | $0.039 | $39 |
| Azure | H100 | $3.60 | $0.038 | $38 |
| AWS (p4d.xlarge) | A100 | $2.20 | $0.034 | $34 |

* Cost per video is based on the estimated generation time for a 5-second 1080p clip on each GPU (~38s on H100, ~55s on A100). Actual costs may vary with instance startup time and data transfer.
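The per-video figures in the table are straightforward to reproduce: the hourly rate prorated to the generation time. A minimal helper (illustrative only, not part of any provider SDK):

```python
def cost_per_video(dollars_per_hour: float, seconds_per_video: float) -> float:
    """On-demand cost of one clip: hourly rate prorated to generation time."""
    return dollars_per_hour * seconds_per_video / 3600

# Reproducing table rows (H100 at ~38s, A100 at ~55s per 1080p clip):
lambda_h100 = round(cost_per_video(2.49, 38), 3)  # ≈ 0.026
aws_a100 = round(cost_per_video(2.20, 55), 3)     # ≈ 0.034
```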

Cost Optimization Tips

1. Use spot/preemptible instances

   Can save 60-70% on cloud costs. Suitable for batch generation where interruptions are acceptable.

2. Iterate at 256p, finalize at 1080p

   Use the fast 256p mode (~2s) to refine prompts, then generate the final version at full quality.

3. Consider FP8 quantization

   If slight quality loss is acceptable, FP8 reduces VRAM usage to ~20GB, enabling cheaper GPU options.

4. Batch processing

   Generate multiple videos per session to amortize instance startup costs.
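The effect of batching (tip 4) is easy to quantify: the fixed startup time (instance boot plus model load) is billed once but spread over every video in the session. A sketch, where the 120-second startup figure is an assumption for illustration:

```python
def amortized_cost_per_video(dollars_per_hour: float, startup_s: float,
                             gen_s: float, n_videos: int) -> float:
    """Total billed time (startup + n generations) divided across n videos."""
    total_seconds = startup_s + n_videos * gen_s
    return dollars_per_hour * total_seconds / 3600 / n_videos

# H100 at $2.49/hr, 38s per video, assumed 120s startup:
single = amortized_cost_per_video(2.49, 120, 38, 1)      # ≈ $0.109
batched = amortized_cost_per_video(2.49, 120, 38, 100)   # ≈ $0.027
```

With 100 videos per session, the per-video cost converges toward the raw $0.026 generation cost from the table; a single-video session pays the whole startup overhead itself.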