Fireworks AI adds Step 3.7 Flash for high speed agentic reasoning

Fireworks AI

Jun 4, 2026 · Updated Jun 13, 2026

Fireworks AI has deployed Step 3.7 Flash, a 198B-parameter vision-language model designed for rapid inference. The model enables real-time agentic workflows by delivering up to 400 tokens per second with selectable reasoning depths.

Fireworks AI is now hosting Step 3.7 Flash, a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model (an architecture that activates only a subset of parameters for each task). Developed by StepFun, the model pairs a 196B language backbone with a 1.8B vision encoder for native multimodal understanding.

Total Parameters: 198B
Active Parameters: 11B
Throughput: Up to 400 tokens per second
Context Window: 256k tokens
Reasoning Levels: Low, Medium, High

This deployment follows the addition of MiniMax M3 to the platform. Engineered for high-frequency production workloads, the model activates only 11B parameters per token despite its massive total count. This sparse activation lets it reach up to 400 tokens per second, enabling real-time agentic loops.

While also available via the Nous Portal integration, the Fireworks deployment offers a 256k context window (the total information a model processes at once). The implementation includes three selectable reasoning levels—low, medium, and high—and uses an Apache 2.0 license.

View the full update on fireworks.ai

Fireworks AI

@FireworksAI_HQJun 4

Many research labs only consider inference efficiency after the fact. Step 3.7 Flash is a 198B sparse MoE VLM designed by @StepFun_ai for inference from the start. 196B language backbone with a 1.8B vision encoder. Built for real-world agent workloads, running at up to 400 tok/sec. Native multimodal understanding and action, reliable tool use, and enhanced web and visual search. Apache 2.0. Try it now → https://t.co/OYqzBUBxqL

210

View on X

Still wondering? A few quick answers below.

Step 3.7 Flash is a 198B-parameter vision-language model developed by StepFun. It uses a sparse Mixture-of-Experts architecture to activate only 11B parameters per token, allowing for high-speed inference. It is designed for agentic workloads, including coding, tool use, and multimodal reasoning across a 256k context window.

On the Fireworks AI platform, Step 3.7 Flash can reach a throughput of up to 400 tokens per second. This speed is achieved through the model's sparse architecture and Fireworks' optimized inference stack, making it suitable for real-time agentic loops that require rapid, multi-step reasoning and action.

The model features three selectable reasoning levels: low, medium, and high. This allows developers to dynamically adjust the model's cognitive depth based on the complexity of the task. By choosing a level, users can balance the trade-offs between generation speed, operational cost, and the accuracy of complex reasoning.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Fireworks AI →

Keep reading

Artificial Analysis finds Step 3.7 Flash sets a new speed intelligence frontier

Artificial Analysis has released independent benchmarking for StepFun's Step 3.7 Flash, confirming the model delivers over 412 output tokens per second. The results place the open-weights model on the Pareto frontier for speed versus intelligence, showing significant gains in autonomous agentic tasks.

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AIJun 4

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AI now offers NVIDIA Nemotron 3 Ultra, an open model for advanced autonomous agents, with immediate deployment support. This provides developers with optimized infrastructure for long-running agentic tasks that require frontier reasoning and orchestration.

Nous Research Offers Step 3.7 Flash Free for 30 Days via Nous Portal

Nous ResearchMay 29

Nous Research Offers Step 3.7 Flash Free for 30 Days via Nous Portal

Nous Research is providing 30 days of free access to Step 3.7 Flash through the Nous Portal. The integration allows Hermes Agent users to utilize the 196B-parameter MoE model for high-efficiency coding and multimodal tasks without cost during the promotional period.

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark

Google AI StudioMay 22

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark

Gemini 3.5 Flash has ranked first on the APEX-Agents-AA benchmark, outperforming larger frontier models in autonomous task execution. The result confirms that high-speed, low-cost models are now capable of handling complex agentic workflows previously reserved for larger architectures.

What is Step 3.7 Flash?

How fast is Step 3.7 Flash on Fireworks AI?

What are the reasoning levels in Step 3.7 Flash?

Keep reading

Artificial Analysis finds Step 3.7 Flash sets a new speed intelligence frontier

Artificial Analysis finds Step 3.7 Flash sets a new speed intelligence frontier

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Nous Research Offers Step 3.7 Flash Free for 30 Days via Nous Portal

Nous Research Offers Step 3.7 Flash Free for 30 Days via Nous Portal

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark

Keep reading

Artificial Analysis finds Step 3.7 Flash sets a new speed intelligence frontier

Artificial Analysis finds Step 3.7 Flash sets a new speed intelligence frontier

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Nous Research Offers Step 3.7 Flash Free for 30 Days via Nous Portal

Nous Research Offers Step 3.7 Flash Free for 30 Days via Nous Portal

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark