HeadsUpAI

Cohere Launches Command A+ to Bring Frontier Agentic AI to Private Hardware

Cohere, an enterprise AI company building models for business search and retrieval, released Command A+ under an Apache 2.0 license. This Mixture-of-Experts (MoE) model (an architecture activating only a fraction of parameters per token) unifies multimodal understanding and tool use, building on MoE speculative decoding research to maximize inference speed.
Context window
128K tokens
Model size
218B total, 25B active
Languages
English, Arabic, Bulgarian, and others
Hardware
2x H100 or 1x B200 (W4A4)
License
Apache 2.0

The release targets the growing demand for sovereign AI, mirroring the company's recent partnership with Aleph Alpha to provide secure alternatives to US-based ecosystems. This follows Cohere's strategic agreements with Indra Group to deploy localized models for government and defense sectors that prioritize full organizational ownership of infrastructure.

You can download the weights from Hugging Face in multiple formats, including a 4-bit version that builds on Cohere's vLLM integration for faster performance. The model supports 48 languages and is available for managed deployment via the Cohere API or Model Vault, following the company's recent acquisition of Reliant AI.

Cohere
Cohere
@cohere
X

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all. https://t.co/C1KYnvA8JB

169retweets968likes
View on X

Still wondering? A few quick answers below.

Command A+ is a large language model designed for enterprise agentic tasks, such as complex reasoning, tool use, and multimodal document processing. It uses a mixture of experts architecture with 218 billion total parameters, though only 25 billion are active per token, allowing it to deliver high performance with significantly reduced hardware requirements.

Yes, Cohere has released Command A+ under the Apache 2.0 license, making it available for both experimentation and production use. The model weights are hosted on Hugging Face in several formats, including 4-bit and 8-bit quantizations, which are compressed versions that maintain high quality while further reducing the computational resources needed for deployment.

Command A+ is engineered for extreme hardware efficiency and can run on as little as two NVIDIA H100 GPUs or a single NVIDIA Blackwell GPU when using 4-bit quantization. This efficiency is achieved through quantization-aware distillation, a training method that ensures the smaller, compressed model maintains the reasoning capabilities and accuracy of the full-precision version.

The model supports 48 world languages and features a new tokenizer that improves efficiency for non-European languages. It requires up to 20 percent fewer tokens to process languages like Arabic, Korean, and Japanese compared to previous versions. This reduction in token count directly lowers inference costs and improves generation speed for global enterprise applications.

Command A+ is optimized for agentic workflows where the AI must autonomously use tools, reason through multi-step problems, and interact with external APIs or databases. It shows significant performance gains in agentic coding and data analysis, outperforming previous models in the series on benchmarks that measure how well an AI can navigate real-world software environments.

Share this update