Cohere Launches Command A+ to Bring Frontier Agentic AI to Private Hardware

Cohere

May 20, 2026 · Updated Jun 12, 2026

Cohere released Command A+, a 218-billion parameter open-source model optimized for complex reasoning and multimodal agentic tasks. By achieving high performance on as little as two H100 GPUs, the model allows enterprises to deploy frontier-class agents entirely within their own private infrastructure.

Cohere, an enterprise AI company building models for business search and retrieval, released Command A+ under an Apache 2.0 license. This Mixture-of-Experts (MoE) model (an architecture activating only a fraction of parameters per token) unifies multimodal understanding and tool use, building on MoE speculative decoding research to maximize inference speed.

Context window: 128K tokens
Model size: 218B total, 25B active
Languages: English, Arabic, Bulgarian, and others
Hardware: 2x H100 or 1x B200 (W4A4)
License: Apache 2.0

The release targets the growing demand for sovereign AI, mirroring the company's recent partnership with Aleph Alpha to provide secure alternatives to US-based ecosystems. This follows Cohere's strategic agreements with Indra Group to deploy localized models for government and defense sectors that prioritize full organizational ownership of infrastructure.

You can download the weights from Hugging Face in multiple formats, including a 4-bit version that builds on Cohere's vLLM integration for faster performance. The model supports 48 languages and is available for managed deployment via the Cohere API or Model Vault, following the company's recent acquisition of Reliant AI.

View the full update on cohere.com

Cohere

@cohereMay 20

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all. https://t.co/C1KYnvA8JB

3832.7k

View on X

Still wondering? A few quick answers below.

Command A+ is a large language model designed for enterprise agentic tasks, such as complex reasoning, tool use, and multimodal document processing. It uses a mixture of experts architecture with 218 billion total parameters, though only 25 billion are active per token, allowing it to deliver high performance with significantly reduced hardware requirements.

Yes, Cohere has released Command A+ under the Apache 2.0 license, making it available for both experimentation and production use. The model weights are hosted on Hugging Face in several formats, including 4-bit and 8-bit quantizations, which are compressed versions that maintain high quality while further reducing the computational resources needed for deployment.

Command A+ is engineered for extreme hardware efficiency and can run on as little as two NVIDIA H100 GPUs or a single NVIDIA Blackwell GPU when using 4-bit quantization. This efficiency is achieved through quantization-aware distillation, a training method that ensures the smaller, compressed model maintains the reasoning capabilities and accuracy of the full-precision version.

The model supports 48 world languages and features a new tokenizer that improves efficiency for non-European languages. It requires up to 20 percent fewer tokens to process languages like Arabic, Korean, and Japanese compared to previous versions. This reduction in token count directly lowers inference costs and improves generation speed for global enterprise applications.

Command A+ is optimized for agentic workflows where the AI must autonomously use tools, reason through multi-step problems, and interact with external APIs or databases. It shows significant performance gains in agentic coding and data analysis, outperforming previous models in the series on benchmarks that measure how well an AI can navigate real-world software environments.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Cohere →

Keep reading

Cohere Releases Command A+ W4A4 Weights for Single GPU Serving

Cohere released W4A4 quantized weights for its 218-billion parameter Command A+ model, enabling frontier-class reasoning on a single NVIDIA B200 GPU. By using quantization-aware distillation to maintain performance, the update allows enterprises to deploy massive agentic models with a significantly smaller hardware footprint.

Cohere Releases North Mini Code, a Small Open-Weight Model for Coding

Artificial AnalysisJun 10

Cohere Releases North Mini Code, a Small Open-Weight Model for Coding

Cohere released North Mini Code, a small 30B parameter (3B active) open weights coding model. This model achieves competitive coding performance for its size and speed, positioning it as a focused option in the open-weight ecosystem.

AWS Launches Claude Cowork on Bedrock to Secure Enterprise Agentic Workflows

Amazon Web ServicesApr 22

AWS Launches Claude Cowork on Bedrock to Secure Enterprise Agentic Workflows

AWS launched a public research preview of Claude Cowork and Claude Code Desktop on Amazon Bedrock, allowing organizations to run Anthropic's agentic tools within their own cloud perimeter. By routing inference through Bedrock, companies can deploy autonomous agents to non-technical teams while maintaining strict data residency and using consumption-based billing instead of seat licenses.

Tencent Open Sources Hy3 Preview to Scale Efficient Agentic Reasoning

Tencent HunyuanApr 30

Tencent Open Sources Hy3 Preview to Scale Efficient Agentic Reasoning

Tencent open-sourced Hy3 preview, a 295B-parameter model that activates only 21B parameters per token. The release signals a shift toward high-efficiency reasoning models designed specifically for autonomous agent workflows and complex reasoning tasks.

What is Cohere Command A+?

Is Cohere Command A+ open source?

What are the hardware requirements for running Command A+?

How does Command A+ handle different languages?

What are the agentic capabilities of Command A+?

Keep reading

Cohere Releases Command A+ W4A4 Weights for Single GPU Serving

Cohere Releases Command A+ W4A4 Weights for Single GPU Serving

Cohere Releases North Mini Code, a Small Open-Weight Model for Coding

Cohere Releases North Mini Code, a Small Open-Weight Model for Coding

AWS Launches Claude Cowork on Bedrock to Secure Enterprise Agentic Workflows

AWS Launches Claude Cowork on Bedrock to Secure Enterprise Agentic Workflows

Tencent Open Sources Hy3 Preview to Scale Efficient Agentic Reasoning

Tencent Open Sources Hy3 Preview to Scale Efficient Agentic Reasoning

Keep reading

Cohere Releases Command A+ W4A4 Weights for Single GPU Serving

Cohere Releases Command A+ W4A4 Weights for Single GPU Serving

Cohere Releases North Mini Code, a Small Open-Weight Model for Coding

Cohere Releases North Mini Code, a Small Open-Weight Model for Coding

AWS Launches Claude Cowork on Bedrock to Secure Enterprise Agentic Workflows

AWS Launches Claude Cowork on Bedrock to Secure Enterprise Agentic Workflows

Tencent Open Sources Hy3 Preview to Scale Efficient Agentic Reasoning

Tencent Open Sources Hy3 Preview to Scale Efficient Agentic Reasoning