HeadsUpAI

Fireworks AI Expands Azure Foundry Catalog With Frontier Reasoning Models

Fireworks AI, an inference platform for fast model serving (running a trained model to generate outputs), added DeepSeek V4 Pro and Kimi K2.6 to Microsoft Azure AI Foundry. This follows the platform's Day-0 Kimi K2.6 support and earlier DeepSeek V4 Pro hosting. The update also introduces Provisioned Throughput Units (PTUs) to the US Data Zone.
Kimi K2.6 Input: $0.95 per 1M tokens
Kimi K2.6 Output: $4.00 per 1M tokens
DeepSeek V4 Pro Input: $1.75 per 1M tokens
DeepSeek V4 Pro Output: $3.48 per 1M tokens
Availability: Microsoft Azure AI Foundry
Infrastructure: Provisioned Throughput Units (PTUs)

Scaling open models often requires re-architecting for new providers or compromising on governance. By integrating these models into the Azure control plane, organizations can avoid a separate security review for each new provider. The move builds on Fireworks AI's 15 trillion token milestone and targets throughput and data residency, the primary bottlenecks for enterprise deployment.

You can now deploy DeepSeek V4 Pro or Kimi K2.6 using a single Azure endpoint. Serverless pricing for Kimi K2.6 starts at $0.95 per million input tokens, while DeepSeek V4 Pro costs $1.75 per million input tokens. For production workloads, you can request PTU capacity to get consistent throughput with US data residency.

Fireworks AI (@FireworksAI_HQ) on X:

Most teams can pick frontier models. Fewer can run them at production scale without hitting constraints in latency, throughput, and governance. Fireworks AI on @Azure AI Foundry provides the inference layer for that environment. Learn more: https://t.co/Ym0YrQ5Pmi


Still wondering? A few quick answers below.

Fireworks AI on Azure AI Foundry is a managed inference layer that brings high-performance open model inference directly into the Azure ecosystem. Developers can run open-weight models natively within Azure under existing enterprise governance controls such as unified access controls, audit logs, and content filtering, without managing separate infrastructure or running compliance reviews for new providers.

The catalog now includes Kimi K2.6 and DeepSeek V4 Pro. Kimi K2.6 is a reasoning model from Moonshot AI designed for complex multi-step tasks and long-context agent workflows. DeepSeek V4 Pro is a flagship model optimized for production-scale coding and reasoning tasks, available via both serverless per-token pricing and provisioned throughput endpoints.

For serverless per-token usage, Kimi K2.6 costs $0.95 per million input tokens, $0.16 per million cached tokens, and $4.00 per million output tokens. DeepSeek V4 Pro is priced at $1.75 per million input tokens, $0.15 per million cached tokens, and $3.48 per million output tokens.
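To make these rates concrete, here is a minimal Python sketch that estimates a serverless bill from token counts. The model keys (`kimi-k2.6`, `deepseek-v4-pro`), the `Price` class, and the `estimate_cost` function are hypothetical names chosen for illustration, not Azure or Fireworks AI identifiers. The sketch also assumes that cached tokens are a subset of input tokens and are billed at the cached rate instead of the input rate; actual billing rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Serverless prices in USD per 1M tokens."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


# Rates copied from the article; the dictionary keys are illustrative labels only.
PRICES = {
    "kimi-k2.6": Price(input_per_m=0.95, cached_per_m=0.16, output_per_m=4.00),
    "deepseek-v4-pro": Price(input_per_m=1.75, cached_per_m=0.15, output_per_m=3.48),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of a workload.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cached rate rather than the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * price.input_per_m
        + cached_tokens * price.cached_per_m
        + output_tokens * price.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 10M input tokens and 2M output tokens, no caching.
    for name in PRICES:
        cost = estimate_cost(name, input_tokens=10_000_000, output_tokens=2_000_000)
        print(f"{name}: ${cost:.2f}")
```

For this example workload the estimate comes to $17.50 for Kimi K2.6 and $24.46 for DeepSeek V4 Pro. Because Kimi K2.6 is cheaper on input but more expensive on output, the comparison flips for output-heavy workloads: at these rates the two models cost the same when output tokens are roughly 1.54 times the uncached input tokens.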

Provisioned Throughput Units provide predictable, steady-state performance with guaranteed service level agreements for production workloads. While serverless pricing is ideal for experimentation, these units are designed for sustained loads where performance variability is unacceptable. Enterprise subscriptions already include these quotas, and support has now expanded to the US Data Zone for residency compliance.

The expansion of provisioned throughput unit support to the US Data Zone ensures that data for US-based customers remains within US Azure regions. This is a critical feature for regulated industries and organizations with strict data sovereignty requirements, allowing them to use frontier open models while maintaining standard enterprise security and residency commitments.
