3 new models from @xai's Grok creative stack are live on OpenRouter:
- Grok Imagine Image Quality: photoreal image generation and editing
- Grok Imagine Video: short clips from text, image, or reference
- Grok Voice TTS 1.0: 5 voices across 20+ languages
OpenRouter Adds xAI Creative Stack for Unified Video and Voice Generation
OpenRouter integrated xAI's multimodal suite, enabling developers to generate photorealistic images, short video clips, and natural speech through a single API. The update allows for complex creative workflows that combine xAI's generative models with existing reasoning and coding tools on the platform.
The suite comprises three models: Grok Imagine Image Quality for photorealistic visuals, Grok Imagine Video for short clips, and Grok Voice TTS 1.0 for text-to-speech. The launch expands the platform's multimodal capabilities.
- Video length: 1 to 15 seconds
- Video resolution: 480p or 720p
- Voice options: 5 built-in voices
- Language support: 20+ languages
- Video pricing: from $0.05 per second
- Voice pricing: $15 per million characters
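The listed prices make it easy to estimate a job's cost before submitting it. The sketch below is a rough estimator, not an OpenRouter tool: the constant and function names are hypothetical, and it assumes the quoted starting rate of $0.05 per second applies (so video figures are lower bounds, since some configurations may cost more) and that voice is billed per character of input text as counted by Python's `len`.

```python
# Rates quoted in the announcement. Video is a *starting* price, so results
# from min_video_cost are lower bounds, not exact charges.
VIDEO_USD_PER_SECOND = 0.05
VOICE_USD_PER_MILLION_CHARS = 15.0

# Announced clip-length limits for Grok Imagine Video.
MIN_SECONDS, MAX_SECONDS = 1, 15


def min_video_cost(seconds: float) -> float:
    """Lower-bound cost in USD for a clip of the given length."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"clips must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    return seconds * VIDEO_USD_PER_SECOND


def voice_cost(text: str) -> float:
    """Cost in USD to synthesize `text`, assuming per-character billing.

    Assumption: billing counts characters the way len() counts code points;
    the provider's actual counting rule may differ.
    """
    return len(text) * VOICE_USD_PER_MILLION_CHARS / 1_000_000
```

For example, `min_video_cost(15)` returns 0.75, so a maximum-length clip costs at least $0.75, while 10,000 characters of narration come to about $0.15.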
This integration follows the launch of OpenRouter's unified video generation API, signaling a shift toward standardized access for generative media. By hosting these models alongside Grok 4.3's reasoning capabilities, the platform enables developers to build end-to-end creative agents that can reason about a task and execute final asset production.
You can now programmatically generate videos of up to 15 seconds at up to 720p from a text prompt, an image, or up to seven reference images for character consistency. The speech model offers five voices across 20+ languages, with inline tags for controlling pitch and pacing. Access is available via the API, with video starting at $0.05 per second and voice priced at $15 per million characters.
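Because requests that break these limits will presumably be rejected, it can help to check them on the client first. The sketch below encodes only the announced constraints (1 to 15 seconds, 480p or 720p, at most seven reference images, text or image input). `VideoRequest`, its field names, and `validate` are hypothetical illustrations, not OpenRouter's or xAI's actual request schema.

```python
from dataclasses import dataclass, field

# Limits taken from the announcement.
ALLOWED_RESOLUTIONS = {"480p", "720p"}
MAX_REFERENCE_IMAGES = 7
MIN_SECONDS, MAX_SECONDS = 1, 15


@dataclass
class VideoRequest:
    """Hypothetical client-side shape for a Grok Imagine Video job.

    Field names are illustrative only; map them onto the real API's
    parameters when building the actual HTTP request.
    """
    prompt: str
    seconds: int
    resolution: str
    reference_images: list[str] = field(default_factory=list)  # URLs or IDs

    def validate(self) -> None:
        """Raise ValueError if the request exceeds the announced limits."""
        if not self.prompt.strip() and not self.reference_images:
            raise ValueError("provide a text prompt or at least one image")
        if not MIN_SECONDS <= self.seconds <= MAX_SECONDS:
            raise ValueError(f"seconds must be {MIN_SECONDS} to {MAX_SECONDS}")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
        if len(self.reference_images) > MAX_REFERENCE_IMAGES:
            raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
```

Failing fast on the client avoids paying for, or waiting on, a request the service would refuse; the server remains the authority on what is actually accepted.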
Every HeadsUpAI update is written based on its original source and reviewed before it's published.


