Avatar V API is live at $0.05/sec. The highest-quality AI avatar model for developers. Benchmarked against Veo 3.1, Kling O3 Pro, OmniHuman 1.5, and Seedance 2.0 on cross-scene talking-head generation. Avatar V won every category. Research report + API ↓ https://t.co/WDbK1MeAT0
HeyGen Launches Avatar V API for High Fidelity Programmatic Video
HeyGen, an AI video platform, launched the API for its Avatar V engine. This model uses cross-reference-driven animation to produce more natural lip-sync and body motion than the Avatar IV engine. It follows the initial Avatar V model launch by making high-fidelity output programmatically accessible.
- Pricing: $0.05 per second
- API version: v3
- Default engine: Avatar IV
- Animation method: Cross-reference driven
- Supported types: Digital twins, studio avatars, and more
The release shifts HeyGen's focus toward programmatic scale. By benchmarking Avatar V against frontier models such as Kling O3 Pro and Google's Veo 3.1, the company positions its specialized architecture as the stronger choice for presenter-style talking-head video. The move complements the recently released HeyGen CLI for automated production.
You can access the engine via the v3/videos endpoint by setting the engine parameter to avatar_v. Usage is priced at $0.05 per second of generated video. Developers must first check a digital twin's eligibility through the API, as the engine does not yet support arbitrary image inputs.
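As a rough illustration of the opt-in described above, the sketch below builds a request body that selects the Avatar V engine. Only the `avatar_v` value and the idea of an engine parameter come from this article; the key name `engine`, the `avatar_id` and `script` fields, and the helper `build_video_request` are hypothetical placeholders, so check the official v3 reference for the real schema before sending anything.

```python
import json

# Engine value named in the article; everything else below is illustrative.
AVATAR_V_ENGINE = "avatar_v"


def build_video_request(avatar_id: str, script: str,
                        engine: str = AVATAR_V_ENGINE) -> dict:
    """Build a JSON-serializable body for a video generation request.

    `avatar_id` and `script` are hypothetical field names, and the exact
    key used for the engine parameter is assumed to be `engine`.
    """
    if not avatar_id:
        raise ValueError("avatar_id must be non-empty")
    return {
        "avatar_id": avatar_id,  # hypothetical field name
        "script": script,        # hypothetical field name
        "engine": engine,        # explicit opt-in; Avatar IV is the default
    }


body = build_video_request("twin_123", "Hello from Avatar V.")
payload = json.dumps(body)  # string that would be sent as the request body
```

The sketch stops short of the network call on purpose, since the base URL and authentication headers are not given in this article.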
HeyGen
@HeyGen
Still wondering? A few quick answers below.
Avatar V is HeyGen's latest AI video rendering engine designed for high-quality talking-head generation. It uses cross-reference-driven animation to achieve more natural lip-sync and body movement compared to previous models. Unlike the default Avatar IV engine, Avatar V requires an explicit opt-in through the API and is only available for eligible digital twins.
The Avatar V API is priced at $0.05 per second of generated video. This pricing applies to developers using the v3 API to generate avatar content. Avatar V is a premium engine and is not the default selection for video generation requests on the HeyGen developer platform.
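To make the per-second rate concrete, here is a small cost estimator. It assumes billing is strictly linear in the generated duration, with no minimum charge or rounding to whole seconds; the article does not state those details, and `estimate_cost_usd` is a name made up for this example.

```python
from decimal import Decimal

# Rate quoted in the article: $0.05 per second of generated video.
PRICE_PER_SECOND_USD = Decimal("0.05")


def estimate_cost_usd(duration_seconds) -> Decimal:
    """Estimate the charge for a clip, assuming purely linear billing."""
    seconds = Decimal(str(duration_seconds))
    if seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Round to cents for display; real invoices may round differently.
    return (PRICE_PER_SECOND_USD * seconds).quantize(Decimal("0.01"))


print(estimate_cost_usd(90))  # 4.50
```

Under that assumption, a one-minute video comes to $3.00 and a 90-second video to $4.50.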
Avatar V offers better motion and lip-sync quality through cross-reference-driven animation, whereas Avatar IV is the standard default engine. However, Avatar V currently lacks support for arbitrary image inputs, natural-language motion prompts, and expressiveness controls. Developers must also run an eligibility check on each avatar look before they can use the Avatar V engine with it.
To use Avatar V, developers must explicitly set the engine type to avatar_v within the video generation request body. Before calling the rendering endpoint, it is necessary to fetch the specific avatar look details to confirm that avatar_v appears in the supported API engines list. Requesting this engine for an ineligible avatar will result in an error.
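The eligibility check described above can be sketched as a pure function over the avatar look details, once they have been fetched. The key `supported_api_engines` is an assumed name for the "supported API engines list" mentioned here, and both helpers are hypothetical; the real response shape may differ.

```python
from typing import Optional


def supports_avatar_v(look_details: dict) -> bool:
    """Return True if the look lists avatar_v among its supported engines.

    `supported_api_engines` is an assumed key name, not a confirmed one.
    """
    engines = look_details.get("supported_api_engines") or []
    return "avatar_v" in engines


def choose_engine(look_details: dict) -> Optional[str]:
    """Pick the engine value to send, or None to omit the field.

    Omitting the field leaves the platform default (Avatar IV) in place,
    which avoids the error returned for ineligible avatars.
    """
    return "avatar_v" if supports_avatar_v(look_details) else None


eligible = {"supported_api_engines": ["avatar_v"]}
ineligible = {"supported_api_engines": []}
print(choose_engine(eligible))    # avatar_v
print(choose_engine(ineligible))  # None
```

Falling back to the default engine is one possible design; a stricter pipeline could raise instead so that an ineligible avatar is never rendered silently at lower fidelity.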
HeyGen's benchmarks indicate that Avatar V outperformed several major competitors in cross-scene talking-head generation. The model was tested against Google's Veo 3.1, Kling O3 Pro, OmniHuman 1.5, and Seedance 2.0. According to HeyGen's research report, Avatar V won every category evaluated, each covering the quality and realism of AI-generated human presenters.

