🚨 HiDream-O1-Image drops on fal! 🎨 Unified pixel-level transformer. Raw pixels, text and task cues in one token space 🖼️ Long-text layouts, posters and multilingual copy with stronger alignment ✨ Subject-driven shots that keep faces, outfits and IP reads consistent across new scenes
fal Launches HiDream-O1-Image to Unify 2K Generation and Subject Consistency
fal· Updated
fal launched HiDream-O1-Image, a model that handles text-to-image, editing, and subject personalization within a single architecture. By processing raw pixels and text in a unified token space, the model eliminates the need for external components to maintain character consistency or render complex layouts.
- Resolution: Up to 2K
- Pricing: $0.01 per megapixel
- Architecture: Pixel-level Unified Transformer
- Availability: API and Playground
- Native capabilities: Text-to-image, editing, and personalization
The release moves image generation away from fragmented pipelines that rely on separate models for text rendering and character consistency. Handling these tasks in a single model gives stronger alignment on long-text layouts, posters, and multilingual copy. The approach mirrors OpenAI's shift toward single models that handle complex visual reasoning natively.
You can use the model for text-to-image generation, image editing, and subject-driven shots that keep faces and outfits consistent across scenes. It supports outputs up to 2K and is available through the API and Playground. Inference costs $0.01 per megapixel. The launch follows fal's release of its genmedia CLI.
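To make the per-megapixel pricing concrete, here is a minimal sketch of a cost estimate. Only the $0.01 per megapixel rate comes from the announcement. The helper name `estimate_cost_usd`, the definition of a megapixel as 1,000,000 pixels, the reading of "2K" as 2048×2048, and the four-decimal rounding are assumptions for illustration; fal's actual billing may round or count pixels differently.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted in the announcement.
PRICE_PER_MEGAPIXEL_USD = Decimal("0.01")

# Assumption: one megapixel is 1,000,000 pixels (not 1,048,576).
PIXELS_PER_MEGAPIXEL = 1_000_000


def estimate_cost_usd(width: int, height: int, num_images: int = 1) -> Decimal:
    """Hypothetical helper: estimate inference cost for images of a given size."""
    if width <= 0 or height <= 0 or num_images <= 0:
        raise ValueError("width, height, and num_images must be positive")
    megapixels = Decimal(width * height) / PIXELS_PER_MEGAPIXEL
    cost = megapixels * PRICE_PER_MEGAPIXEL_USD * num_images
    # Assumption: round to four decimal places for display only.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Assumption: "2K" is treated here as a 2048x2048 square image.
    for w, h in [(1024, 1024), (2048, 2048)]:
        print(f"{w}x{h}: ${estimate_cost_usd(w, h)}")
```

Under these assumptions, a 2048×2048 image is about 4.19 megapixels, so a single generation comes to roughly four cents.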