fal Launches HiDream-O1-Image to Unify 2K Generation and Subject Consistency

fal

May 10, 2026 · Updated Jun 7, 2026

fal launched HiDream-O1-Image, a model that handles text-to-image, editing, and subject personalization within a single architecture. By processing raw pixels and text in a unified token space, the model eliminates the need for external components to maintain character consistency or render complex layouts.

fal, a generative media infrastructure platform for serverless inference, launched HiDream-O1-Image. The model uses a unified pixel-level transformer, meaning it operates on raw image data directly, to process pixels, text, and task instructions in one token space. This architecture removes the need for a Variational Autoencoder (VAE), the component most diffusion pipelines use to compress images into a latent space.

Resolution: Up to 2K
Pricing: $0.01 per megapixel
Architecture: Pixel-level Unified Transformer
Availability: API and Playground
Native capabilities: Text-to-image, editing, and personalization

This release moves image generation away from fragmented pipelines that rely on separate models for text rendering and character consistency. By unifying these tasks in one model, HiDream-O1-Image achieves stronger alignment for long-text layouts. The approach mirrors the design shift seen at OpenAI, where a single model handles complex visual reasoning natively.

You can use the model for text-to-image generation, image editing, and subject-driven shots that keep faces and outfits consistent across scenes. The model supports high-resolution outputs up to 2K and is available via API. Inference costs $0.01 per megapixel. The launch follows fal's genmedia CLI.
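For readers who want to try the API route, here is a minimal sketch using the `fal_client` Python package and its `subscribe` call. The endpoint ID and the argument names (`prompt`, `image_size`) are assumptions for illustration only, since the article does not list them; check the model page on fal for the real values.

```python
# Hypothetical endpoint ID: the article does not state the real one.
ENDPOINT_ID = "fal-ai/hidream-o1-image"

MAX_SIDE = 2048  # the article's stated 2K ceiling


def build_arguments(prompt, width=1024, height=1024):
    """Assemble a request payload. The field names are assumptions."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if max(width, height) > MAX_SIDE:
        raise ValueError(f"each side must be at most {MAX_SIDE} pixels")
    return {
        "prompt": prompt,
        "image_size": {"width": width, "height": height},
    }


def generate(prompt, width=1024, height=1024):
    """Send the request to fal. Needs `pip install fal-client` and FAL_KEY set."""
    import fal_client  # imported lazily so the payload helper works without it

    return fal_client.subscribe(
        ENDPOINT_ID,
        arguments=build_arguments(prompt, width, height),
    )
```

Calling `generate("a bilingual concert poster")` would submit the job and block until the result is ready, provided the endpoint ID and fields match what fal actually exposes.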

View the full update on fal.ai

fal (@fal) · May 10

🚨 HiDream-O1-Image drops on fal!
🎨 Unified pixel-level transformer. Raw pixels, text and task cues in one token space
🖼️ Long-text layouts, posters and multilingual copy with stronger alignment
✨ Subject-driven shots that keep faces, outfits and IP reads consistent across new scenes

Still wondering? A few quick answers below.

What is HiDream-O1-Image?

HiDream-O1-Image is an 8B-parameter generative image foundation model available on the fal platform. It uses a Pixel-level Unified Transformer architecture to handle text-to-image generation, image editing, and subject personalization within a single native model. This unified approach allows for high-resolution outputs up to 2K without requiring external components or specialized fine-tuning.

How does the HiDream-O1-Image architecture work?

Unlike traditional diffusion models that use a Variational Autoencoder to process images in a compressed latent space, HiDream-O1-Image operates as a unified pixel-level transformer. It processes raw pixels, text prompts, and task cues within a single token space. This design enables stronger alignment for complex layouts, multilingual text rendering, and consistent subject-driven generation across different scenes.
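To make the "single token space" idea concrete, the toy sketch below cuts an image into pixel patches and places them in one sequence alongside a task cue and text tokens. It is only an illustration: the patch size, token layout, and function names are assumptions, not details of HiDream-O1-Image, and a real model would embed each token rather than store raw values.

```python
def patchify(image, patch):
    """Split an H x W grid of (r, g, b) pixels into flattened patch vectors."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch row by row, channel by channel.
            vector = [
                channel
                for y in range(top, top + patch)
                for x in range(left, left + patch)
                for channel in image[y][x]
            ]
            tokens.append(vector)
    return tokens


def build_sequence(task, text_tokens, image, patch):
    """Place the task cue, text tokens, and pixel patches in one sequence."""
    sequence = [("task", task)]
    sequence += [("text", token) for token in text_tokens]
    sequence += [("pixel", vector) for vector in patchify(image, patch)]
    return sequence


# A 4x4 toy image with 2x2 patches yields four pixel tokens.
image = [[(x, y, 0) for x in range(4)] for y in range(4)]
sequence = build_sequence("text-to-image", ["red", "poster"], image, patch=2)
print(len(sequence), [kind for kind, _ in sequence])
```

The point of the sketch is that no separate encoder sits between the image and the sequence: pixel values enter the same token stream as the prompt and the task cue.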

What is the pricing for HiDream-O1-Image on fal?

Inference for HiDream-O1-Image on the fal platform costs $0.01 per megapixel. Users can access the model through serverless inference APIs or a web-based playground. The platform also provides dedicated endpoints for text-to-image with references and for image editing, so developers can integrate these capabilities into their own applications at scale.

Can HiDream-O1-Image maintain character consistency?

Yes, HiDream-O1-Image is designed for subject-driven generation, which keeps faces, outfits, and intellectual property consistent across new scenes. Because it is a natively unified model, it handles this personalization alongside standard generation and editing tasks. This makes it useful for creating cinematic product photos or character-driven content where visual identity must remain stable across multiple outputs.

What are the resolution limits for HiDream-O1-Image?

HiDream-O1-Image supports high-resolution image generation and editing up to 2048x2048 pixels, or 2K. The model is capable of rendering long-text layouts and posters with high fidelity. Because it processes pixels directly through its transformer architecture, it maintains alignment and detail even at these higher resolutions without the artifacts sometimes introduced by external upscaling or compression tools.
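Combining the 2048x2048 ceiling with the $0.01-per-megapixel price gives a quick way to estimate spend. The sketch below assumes one megapixel means 1,000,000 pixels and that billing is linear with no rounding or minimum charge; the article does not say how fal counts megapixels, so treat the figures as approximate.

```python
PRICE_PER_MEGAPIXEL_USD = 0.01  # price quoted in the article


def estimate_cost_usd(width, height, num_images=1):
    """Estimate inference cost, assuming 1 MP = 1,000,000 pixels and no rounding."""
    megapixels = (width * height) / 1_000_000
    return megapixels * PRICE_PER_MEGAPIXEL_USD * num_images


for width, height in [(1024, 1024), (2048, 2048)]:
    print(f"{width}x{height}: ${estimate_cost_usd(width, height):.4f}")
```

Under these assumptions a full 2048x2048 image comes to roughly four cents, and a 1024x1024 image to about one cent.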

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

fal Adds Krea 2 Turbo for Rapid, Aesthetically-Driven Image Generation

fal has integrated Krea AI's Krea 2 Turbo text-to-image model onto its platform. This provides users with a specialized tool for generating high-fidelity images at "turbo speed," particularly useful for creative applications requiring deep aesthetic understanding and consistent style references.
