Fireworks AI Expands Azure Foundry Catalog With Frontier Reasoning Models

Fireworks AI

May 15, 2026 · Updated Jun 12, 2026

Fireworks AI added DeepSeek V4 Pro and Kimi K2.6 to Microsoft Azure AI Foundry while expanding provisioned throughput support to the US Data Zone. The update allows enterprise teams to run high-performance open models with guaranteed throughput and data residency within their existing Azure environment.

Fireworks AI, an inference platform for fast model serving (running a trained model to generate outputs), added DeepSeek V4 Pro and Kimi K2.6 to Microsoft Azure AI Foundry. This follows the platform's Day-0 Kimi K2.6 support and earlier DeepSeek V4 Pro hosting. The update also introduces Provisioned Throughput Units (PTUs) to the US Data Zone.

Kimi K2.6 Input: $0.95 per 1M tokens
Kimi K2.6 Output: $4.00 per 1M tokens
DeepSeek V4 Pro Input: $1.75 per 1M tokens
DeepSeek V4 Pro Output: $3.48 per 1M tokens
Availability: Microsoft Azure AI Foundry
Infrastructure: Provisioned Throughput Units (PTUs)

Scaling open models often requires re-architecting for new providers or compromising on governance. By integrating these models into the Azure control plane, organizations bypass separate security reviews. This move builds on the Fireworks AI 15 trillion token milestone where throughput and residency are the primary bottlenecks for enterprise deployment.

You can now deploy DeepSeek V4 Pro or Kimi K2.6 using a single Azure endpoint. Serverless pricing for Kimi K2.6 starts at $0.95 per million input tokens, while DeepSeek V4 Pro costs $1.75 per million. For production workloads, you can request PTU capacity to ensure consistent throughput with US data residency.

View the full update on techcommunity.microsoft.com

Fireworks AI

@FireworksAI_HQMay 14

Most teams can pick frontier models. Fewer can run them at production scale without hitting constraints in latency, throughput, and governance. Fireworks AI on @Azure AI Foundry provides the inference layer for that environment. Learn more: https://t.co/Ym0YrQ5Pmi

213

View on X

Still wondering? A few quick answers below.

It is a managed inference layer that brings high-performance open model inference directly into the Azure ecosystem. It allows developers to run open-weight models natively within Azure using existing enterprise governance frameworks like unified access controls, audit logs, and content filtering without needing to manage separate infrastructure or compliance reviews for new providers.

The catalog now includes Kimi K2.6 and DeepSeek V4 Pro. Kimi K2.6 is a reasoning model from Moonshot AI designed for complex multi-step tasks and long-context agent workflows. DeepSeek V4 Pro is a flagship model optimized for production-scale coding and reasoning tasks, available via both serverless per-token pricing and provisioned throughput endpoints.

For serverless per-token usage, Kimi K2.6 costs 0.95 dollars per million input tokens, 0.16 dollars per million cached tokens, and 4.00 dollars per million output tokens. DeepSeek V4 Pro is priced at 1.75 dollars per million input tokens, 0.15 dollars per million cached tokens, and 3.48 dollars per million output tokens.

Provisioned Throughput Units provide predictable, steady-state performance with guaranteed service level agreements for production workloads. While serverless pricing is ideal for experimentation, these units are designed for sustained loads where performance variability is unacceptable. Enterprise subscriptions already include these quotas, and support has now expanded to the US Data Zone for residency compliance.

The expansion of provisioned throughput unit support to the US Data Zone ensures that data for US-based customers remains within US Azure regions. This is a critical feature for regulated industries and organizations with strict data sovereignty requirements, allowing them to use frontier open models while maintaining standard enterprise security and residency commitments.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Fireworks AI →

Keep reading

Fireworks AI Hosts DeepSeek V4 Pro With 1M Context and MIT License

Fireworks AI added DeepSeek V4-Pro to its inference platform, offering frontier-level coding and reasoning via an open-weight model. The deployment standardizes a 1-million token context window at a price point significantly lower than closed-source competitors.

What is Fireworks AI on Microsoft Foundry?

What new models are available in the Fireworks AI on Foundry catalog?

What is the pricing for Kimi K2.6 and DeepSeek V4 Pro on Azure?

What are Provisioned Throughput Units in Azure AI Foundry?

How does the US Data Zone expansion affect data residency?

Keep reading

Fireworks AI Hosts DeepSeek V4 Pro With 1M Context and MIT License

Fireworks AI Hosts DeepSeek V4 Pro With 1M Context and MIT License

Keep reading

Fireworks AI Hosts DeepSeek V4 Pro With 1M Context and MIT License

Fireworks AI Hosts DeepSeek V4 Pro With 1M Context and MIT License