Ollama is an open-source tool for running, managing, and serving large language models locally. HeadsUpAI tracks Ollama across the AI ecosystem and curates every significant update, the latest being "Ollama Adds Moonshot AI’s Kimi K3 to Cloud Model Library" (July 27, 2026), so you get the whole story in a 30-second read.

What's new from Ollama?

The most recent Ollama update is "Ollama Adds Moonshot AI’s Kimi K3 to Cloud Model Library" (July 27, 2026). HeadsUpAI curates every significant Ollama release as a 30-second read — what shipped and why it matters.

What are the latest Ollama updates and releases?

The latest Ollama updates: "Ollama Adds Moonshot AI’s Kimi K3 to Cloud Model Library", "Ollama 0.32.1 Improves Gemma 4 Tool Calling Reliability", "Ollama Adds Support for OpenCode Desktop Coding Agent", "Ollama Expands Cloud Capacity for GLM-5.2 in US and Europe", and "Ollama Accelerates Gemma 4 on Apple Silicon with Multi-Token Prediction". HeadsUpAI has curated 13 Ollama updates over the last 90 days, all of them product updates, listed newest first and presented without hype or bias.

Ollama is an open-source tool for running, managing, and serving large language models locally. On this page you'll find every significant Ollama product update HeadsUpAI has tracked recently, so you can keep up with where Ollama is heading without reading a dozen sources.

How often is Ollama news updated here?

Continuously. HeadsUpAI adds new Ollama updates as they're announced — usually within hours — and the 13 updates currently shown cover the past 90 days, newest first.

Ollama AI News & Updates — Latest Releases & Features

Ollama · Jul 27

Ollama Adds Moonshot AI’s Kimi K3 to Cloud Model Library

Ollama added Moonshot AI’s Kimi K3, a 2.8-trillion-parameter multimodal model, to its cloud platform. The model features a 1-million-token context window and native visual understanding. It is available to Pro and Max subscribers via extra usage credits and integrates with agentic coding tools including Claude Code, OpenCode, Hermes Agent, and OpenClaw.

Ollama · Jul 17

Ollama 0.32.1 Improves Gemma 4 Tool Calling Reliability

Ollama 0.32.1 updates Gemma 4 with improved tool calling reliability for coding agents. The release addresses previous consistency issues, ensuring more dependable performance in autonomous workflows. The 26B model runs via the Pi agent, with an optional MLX engine available for maximum performance on Apple Silicon.

Ollama · Jul 15

Ollama Adds Support for OpenCode Desktop Coding Agent

Ollama now integrates with OpenCode Desktop, enabling the terminal-based coding agent to run with local or cloud models. The integration launches via the ollama launch opencode command and supports file editing, command execution, web fetching, and vision capabilities. Models require a context window of 64k tokens or higher for repository-wide tasks.

Ollama · Jul 8

Ollama Expands Cloud Capacity for GLM-5.2 in US and Europe

Ollama increased cloud capacity for Z.ai’s GLM-5.2 model across US and European regions. The infrastructure update delivers 80 to 120 output tokens per second, significantly outpacing the 30 to 40 tokens per second reported on other providers. The model remains accessible for agentic coding workflows via Claude Code, VS Code, Codex, and Hermes using the ollama launch command.
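To put the throughput figures above in concrete terms, here is a small arithmetic sketch. The 2,000-token response length is a hypothetical value chosen for illustration; only the two tokens-per-second ranges come from the update itself.

```python
# Throughput ranges quoted in the update (output tokens per second).
OLLAMA_CLOUD_TPS = (80, 120)
OTHER_PROVIDERS_TPS = (30, 40)

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 2_000


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` at a steady output rate."""
    return tokens / tokens_per_second


# Fastest and slowest wall-clock times for each range.
ollama_seconds = [seconds_for(RESPONSE_TOKENS, tps) for tps in OLLAMA_CLOUD_TPS]
other_seconds = [seconds_for(RESPONSE_TOKENS, tps) for tps in OTHER_PROVIDERS_TPS]

# Smallest speedup: slowest Ollama rate against fastest competitor rate.
speedup_min = OLLAMA_CLOUD_TPS[0] / OTHER_PROVIDERS_TPS[1]
# Largest speedup: fastest Ollama rate against slowest competitor rate.
speedup_max = OLLAMA_CLOUD_TPS[1] / OTHER_PROVIDERS_TPS[0]

print(f"Ollama cloud: {min(ollama_seconds):.1f}s to {max(ollama_seconds):.1f}s")
print(f"Other providers: {min(other_seconds):.1f}s to {max(other_seconds):.1f}s")
print(f"Speedup: {speedup_min:.1f}x to {speedup_max:.1f}x")
```

Under these assumptions the quoted ranges work out to roughly a 2x to 4x difference in output speed, ignoring time to first token and network latency.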

Ollama · Jul 1

Ollama Accelerates Gemma 4 on Apple Silicon with Multi-Token Prediction

Ollama 0.31 enables multi-token prediction by default for Gemma 4 on Apple Silicon, increasing generation speed by nearly 90%. The engine uses a small draft model to propose tokens, which the main model verifies in a single pass. This optimization, measured at 95.0 tokens per second on an M5 Max, accelerates agentic coding tasks without requiring manual configuration.
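The draft-and-verify idea described above can be sketched in a few lines. This is a generic greedy speculative-decoding toy, not Ollama's implementation: `target_model` and `draft_model` are hypothetical stand-ins that map a token prefix to the next token, and the "single pass" is simulated with one call per position, where a real engine would batch them into one forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is any function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical large model: a deterministic toy rule."""
    return sum(prefix) % 7


def draft_model(prefix: List[int]) -> int:
    """Hypothetical cheap draft: agrees with the target most of the time,
    but is deliberately wrong when the prefix length is a multiple of 4."""
    guess = sum(prefix) % 7
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    # 1. The draft proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))
    # 2. The target scores all k + 1 positions (one batched pass in a real engine).
    verified = [target(prefix + proposed[:i]) for i in range(k + 1)]
    # 3. Keep the longest prefix of proposals the target agrees with,
    #    then append the target's own token at the next position.
    accepted: List[int] = []
    for i in range(k):
        if proposed[i] != verified[i]:
            break
        accepted.append(proposed[i])
    accepted.append(verified[len(accepted)])
    return accepted


def generate_speculative(prompt: List[int], draft: Model, target: Model,
                         n_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    out = list(prompt)
    passes = 0  # number of target verification passes
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, draft, target, k))
        passes += 1
    return out[: len(prompt) + n_tokens], passes


def generate_baseline(prompt: List[int], target: Model, n_tokens: int) -> List[int]:
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))  # one target pass per token
    return out


spec_tokens, spec_passes = generate_speculative([1, 2, 3], draft_model, target_model, 10)
base_tokens = generate_baseline([1, 2, 3], target_model, 10)
print(spec_tokens == base_tokens, spec_passes)
```

Every step yields at least one token chosen by the target, so the output matches what the target would produce alone. The gain comes from needing fewer target passes whenever the draft guesses correctly.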

Ollama · Jun 28

Ollama Adds Ornith-1.0 Family for Agentic Coding Tasks

Ollama adds the Ornith-1.0 family of open-source agentic coding models to its library. The models, ranging from 9B to 397B parameters, feature a 256K context window and use reinforcement learning for self-improving code generation. The integration supports direct execution via ollama run and agentic workflows through ollama launch for Claude Code and Pi.

Ollama · Jun 16

Ollama Hosts Z.ai's GLM-5.2 Coding Model on NVIDIA Blackwell GPUs

Ollama is now hosting Z.ai's GLM-5.2 model on its US-based cloud, powered by NVIDIA Blackwell GPUs. The model features a 1-million-token context window and two reasoning effort levels for long-horizon coding tasks. It is available for immediate use within Claude Code, Codex App, and Hermes Agent via the ollama launch command, with a zero-data-retention privacy policy.

Ollama · Jun 15

Ollama Adds Support for Cline CLI and Parallel Kanban Tasks

Ollama now supports Cline CLI, enabling the autonomous coding agent to launch directly from the terminal. The integration auto-configures local model providers and supports the Kanban feature, which runs multiple coding agents in parallel to handle separate project tasks simultaneously. Sessions launch with a chosen local or cloud model via the ollama launch cline command.

Ollama · Jun 15

Ollama Adds Moonshot AI Kimi K2.7 Code to Cloud Platform

Ollama added Moonshot AI's Kimi K2.7 Code model to its US-hosted cloud on NVIDIA B300 GPUs. The model supports text and image input with a 256K context window. It is available for immediate use within Claude Code, Codex App, and OpenCode via the ollama launch command, with no user data retention or training.

Ollama · Jun 10

Ollama Adds Nous Research's Hermes Desktop for Local Multi-Agent Workflows

Ollama now supports Nous Research's Hermes Desktop, enabling users to run the multi-agent system locally or in the cloud. This integration brings Hermes Desktop's self-improving AI agents and messaging capabilities to Ollama's local model deployment platform. It allows users to manage complex agentic workflows with greater control over their compute environment.

Ollama · Jun 7

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. This 550-billion-parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs for complex agentic tasks.

Ollama · Jun 7

Ollama Adds Google DeepMind's Gemma 4 12B for Local Agentic AI

Ollama has made Google DeepMind's Gemma 4 12B model available for local execution, including support for chat and agentic applications. This expands access to a powerful, open-weight multimodal model optimized for on-device reasoning and coding, enabling private and offline AI workflows on consumer hardware.

Ollama · Jun 7

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama has made the MiniMax M3 model available on its Cloud, providing US-based access with zero data retention. This integration offers a frontier-level, open-weight model for agentic coding and multimodal tasks, featuring a 1-million-token context window. It expands access to advanced AI capabilities for complex, autonomous workflows.

