Many research labs only consider inference efficiency after the fact. Step 3.7 Flash is a 198B sparse MoE VLM designed by @StepFun_ai for inference from the start. 196B language backbone with a 1.8B vision encoder. Built for real-world agent workloads, running at up to 400 tok/sec. Native multimodal understanding and action, reliable tool use, and enhanced web and visual search. Apache 2.0. Try it now → https://t.co/OYqzBUBxqL
Fireworks AI adds Step 3.7 Flash for high-speed agentic reasoning
- Total Parameters: 198B
- Active Parameters: 11B
- Throughput: Up to 400 tokens per second
- Context Window: 256k tokens
- Reasoning Levels: Low, Medium, High
This deployment follows the addition of MiniMax M3 to the platform. Engineered for high-frequency production workloads, the model activates only 11B of its 198B parameters per token. This sparse activation lets it reach up to 400 tokens per second, fast enough for real-time agentic loops.
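To put those figures in perspective, here is a minimal back-of-the-envelope sketch. The parameter counts and the 400 tokens-per-second ceiling come from the spec list; the agent-loop shape (10 steps of 200 output tokens each) is a hypothetical workload chosen only for illustration, and the estimate ignores prompt processing, network latency, and tool-call time.

```python
# Figures quoted in the article.
TOTAL_PARAMS_B = 198.0     # total parameters, in billions
ACTIVE_PARAMS_B = 11.0     # parameters activated per token, in billions
PEAK_TOK_PER_SEC = 400.0   # quoted upper bound on throughput


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def decode_seconds(output_tokens: int, tok_per_sec: float = PEAK_TOK_PER_SEC) -> float:
    """Best-case time to generate `output_tokens` at the given throughput."""
    return output_tokens / tok_per_sec


def loop_decode_seconds(steps: int, tokens_per_step: int,
                        tok_per_sec: float = PEAK_TOK_PER_SEC) -> float:
    """Best-case generation time for a multi-step agent loop (decode only)."""
    return steps * decode_seconds(tokens_per_step, tok_per_sec)


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {share:.1%}")
    # Hypothetical workload: 10 agent steps, 200 output tokens each.
    print(f"One 200-token step: {decode_seconds(200):.2f} s")
    print(f"Ten such steps: {loop_decode_seconds(10, 200):.2f} s")
```

Because 400 tokens per second is stated as an upper bound, real loops should be expected to run slower than these best-case numbers.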
While the model is also available via the Nous Portal integration, the Fireworks deployment offers a 256k-token context window (the maximum amount of text the model can process in a single request). The implementation includes three selectable reasoning levels (low, medium, and high), and the model is released under an Apache 2.0 license.
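A context window is easiest to reason about as a token budget. The sketch below rests on two assumptions not confirmed by the source: that "256k" means 256,000 tokens (it could also mean 262,144), and that prompt and generated output share the same window, as is common for hosted LLM APIs. The function names are hypothetical helpers, not part of any Fireworks SDK.

```python
# Assumption: "256k" is read as 256,000 tokens; the exact limit may differ.
CONTEXT_WINDOW_TOKENS = 256_000


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the prompt plus the requested output fits in the window.

    Assumes prompt and output tokens count against the same limit.
    """
    return prompt_tokens + max_output_tokens <= window


def remaining_output_budget(prompt_tokens: int,
                            window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Tokens left for generation after the prompt, never below zero."""
    return max(0, window - prompt_tokens)


if __name__ == "__main__":
    print(fits_in_context(200_000, 50_000))    # 250,000 tokens requested in total
    print(fits_in_context(250_000, 10_000))    # 260,000 tokens requested in total
    print(remaining_output_budget(250_000))
```

A check like this is mainly useful in long agent loops, where accumulated tool output can quietly push a request past the limit.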
Every HeadsUpAI update is written based on its original source and reviewed before it's published.






