Happy Horse Wiki
#1 on AI Video Arena · Elo 1333

Happy Horse AI Video Model

The 15B-parameter model that generates video and audio from text. It beat Seedance 2.0 in blind human tests, and it is slated for a full open-source release.


Weights are not yet released; you can try the model now through the AI Video Arena blind test. The full open-source release is expected around April 10.

Parameters: 15B
AI Video Arena rank: #1
Elo score: 1333
Lip-sync languages: 7
Denoising steps: 8
Max resolution: 1080p

See It in Action

Real outputs from Happy Horse 1.0.

Kid holds out the rest of her cookie, smiles, says "Love you mommy." Cookie offering, sweet smile, little voice.

A cobblestone street after rain, looking dark and glossy, reflecting the yellow streetlamps perfectly.

A candid, handheld camera shot follows a young woman bundled in a thick, charcoal wool coat, speed-walking hunched over down a slushy Manhattan sidewalk at 7:30 AM. Her breath plumes in thick white clouds against the freezing grey air, and her nose is bright red from the cold.

All videos + head-to-head comparisons

What Makes It Special

Unified Transformer

40-layer self-attention network with 4 modality-specific layers on each end and 32 shared layers — single-stream processing with per-head gating for stable training.
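The layer split described above can be sketched as a simple index rule. This is only an illustration: it assumes "on each end" means the first 4 and last 4 of the 40 layers, and the names `TOTAL_LAYERS`, `EDGE_LAYERS`, and `layer_kind` are hypothetical, not taken from the Happy Horse code.

```python
# Illustrative layout only; names are hypothetical, not from the Happy Horse codebase.
TOTAL_LAYERS = 40
EDGE_LAYERS = 4  # modality-specific layers at each end (assumed: first 4 and last 4)


def layer_kind(index: int) -> str:
    """Classify a 0-based layer index as modality-specific or shared."""
    if not 0 <= index < TOTAL_LAYERS:
        raise ValueError(f"layer index out of range: {index}")
    if index < EDGE_LAYERS or index >= TOTAL_LAYERS - EDGE_LAYERS:
        return "modality-specific"
    return "shared"


layout = [layer_kind(i) for i in range(TOTAL_LAYERS)]
shared_count = layout.count("shared")
specific_count = layout.count("modality-specific")
print(shared_count, specific_count)
```

Under this reading, the two counts add up to the stated 40 layers: 32 shared plus 4 + 4 modality-specific.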

Joint Video + Audio

Generates synchronized dialogue, ambient sound, and Foley alongside video frames — no post-production dubbing required.

8-Step DMD-2 Distillation

Reduces denoising to just 8 steps without classifier-free guidance, accelerated further by the in-house MagiCompiler runtime.
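To see why dropping classifier-free guidance matters alongside the step count, the sketch below counts denoiser forward passes per sample. It relies on the general fact that guidance needs both a conditional and an unconditional prediction at every step. The 50-step guided baseline is a hypothetical comparison point, not a figure from this page, and `forward_passes` is an illustrative helper.

```python
def forward_passes(steps: int, uses_cfg: bool) -> int:
    """Count denoiser forward passes for one sample.

    Classifier-free guidance evaluates the model twice per step
    (conditional and unconditional), so it doubles the count.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    return steps * (2 if uses_cfg else 1)


# The distilled setting described above: 8 steps, no guidance.
distilled = forward_passes(steps=8, uses_cfg=False)
# Hypothetical baseline for comparison only: 50 steps with guidance.
baseline = forward_passes(steps=50, uses_cfg=True)
print(distilled, baseline, baseline / distilled)
```

The ratio only reflects model evaluations; it says nothing about the extra speedup attributed to MagiCompiler.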

Multilingual Lip-Sync

Native support for 7 languages, with a Word Error Rate of 14.6%, the lowest among the models in the benchmark table.

1080p Output

5-8 second clips at 1080p in standard aspect ratios (16:9, 9:16) — suitable for social, advertising, and cinematic use cases.

Open & Self-Hostable

Base model, distilled model, super-resolution module, and inference code are slated for open release with commercial-use permission.

Benchmarks

Based on 2,000 human-rated comparisons on the Artificial Analysis Video Arena.

Model             Visual Quality   Text Alignment   Physical Realism   WER (%)
Happy Horse 1.0   4.8              4.18             4.52               14.6%
LTX 2.3           4.76             4.12             4.56               19.23%
OVI 1.1           4.73             4.1              4.41               40.45%
80% win rate vs OVI 1.1
60.9% win rate vs LTX 2.3
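For a rough sense of scale, the win rates above can be converted into rating gaps with the standard Elo expected-score curve. This is a back-of-the-envelope sketch: it assumes the logistic Elo model with a 400-point scale, and it is not a claim about how the arena computes its 1333 figure. `elo_gap` is an illustrative helper.

```python
import math


def elo_gap(win_rate: float) -> float:
    """Rating gap implied by a head-to-head win rate under the standard Elo curve."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


gap_vs_ovi = elo_gap(0.80)   # 80% win rate vs OVI 1.1
gap_vs_ltx = elo_gap(0.609)  # 60.9% win rate vs LTX 2.3
print(round(gap_vs_ovi), round(gap_vs_ltx))
```

Under that model, an 80% win rate corresponds to a gap of roughly 240 points and 60.9% to roughly 77 points.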

Inference Speed

Time to generate a 5-second video clip on a single NVIDIA H100:

256p: 2.0s
540p (with super-resolution): 8.0s
1080p (full quality): 38.4s
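The timings above translate into compute per second of video and an upper bound on throughput. The sketch assumes one H100 generating clips back to back with no loading or queueing overhead; the helper names are illustrative.

```python
CLIP_SECONDS = 5.0
# Generation time in seconds per 5-second clip on one H100, from the figures above.
GENERATION_SECONDS = {"256p": 2.0, "540p": 8.0, "1080p": 38.4}


def compute_per_video_second(gen_seconds: float) -> float:
    """Seconds of GPU time needed per second of generated video."""
    return gen_seconds / CLIP_SECONDS


def clips_per_hour(gen_seconds: float) -> float:
    """Upper bound on clips per hour, assuming back-to-back generation."""
    return 3600.0 / gen_seconds


for resolution, gen_s in GENERATION_SECONDS.items():
    ratio = compute_per_video_second(gen_s)
    rate = clips_per_hour(gen_s)
    print(f"{resolution}: {ratio:.2f}s compute per video second, {rate:.2f} clips/hour")
```

By this arithmetic, only the 256p setting runs faster than real time; 1080p needs several seconds of compute per second of video.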

Explore