Cloudflare Launches Agent Memory to Give AI Agents Persistent Long-Term State

Cloudflare

Apr 18, 2026 · Updated Apr 25, 2026

Cloudflare introduced Agent Memory, a managed service that extracts and stores key information from agent conversations to prevent context rot. By moving state management to a dedicated pipeline, agents can recall past decisions and facts across sessions without exhausting their context windows.

Cloudflare launched the private beta of Agent Memory, a managed service that provides AI agents with persistent, retrievable state. The service runs an automated pipeline that extracts facts, events, and instructions from conversation history, extending isolated storage facets, so that agents can recall information across different sessions and users.

Memory types: Facts, Events, Instructions, Tasks
Extraction model: Llama 4 Scout (MoE, 17B active parameters)
Synthesis model: Nemotron 3 (120B MoE)
Availability: Private beta waitlist
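
As a rough illustration of the four memory types listed above, the sketch below models a typed memory record in Python. The class and field names (`Memory`, `profile`, `text`) are hypothetical and are not taken from Cloudflare's schema, which the announcement does not describe.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    """The four memory types named in the announcement."""
    FACT = "fact"
    EVENT = "event"
    INSTRUCTION = "instruction"
    TASK = "task"


@dataclass(frozen=True)
class Memory:
    """One extracted memory. Field names are illustrative assumptions."""
    profile: str       # hypothetical: the agent or team the memory belongs to
    type: MemoryType
    text: str


def by_type(memories: list[Memory], wanted: MemoryType) -> list[Memory]:
    """Return only the memories of one type, preserving their order."""
    return [m for m in memories if m.type is wanted]
```

Typing each record up front is one way an agent could ask for, say, only standing instructions without scanning every stored memory.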

This update addresses context rot, the degradation of model performance as context windows fill up with redundant history. It follows the release of programmable search primitives for external files, whereas Agent Memory focuses on internal conversation state. It also builds on stateful agent harnesses by making knowledge a durable asset.

You can integrate the service via a Cloudflare Worker binding or a REST API to manage memory profiles for individual agents or entire teams. The system uses a hybrid retrieval engine that fuses vector search with full-text search to improve accuracy. Access is currently limited to developers admitted from the private beta waitlist.
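
To make the idea of fusing vector and full-text results concrete, here is a minimal Python sketch using reciprocal rank fusion (RRF), a common technique for merging ranked lists. The announcement as summarized here does not say which fusion method Agent Memory uses, so RRF, the constant `k = 60`, and the memory IDs are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one list ordered by fused score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks closer to the top.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
vector_hits = ["m3", "m1", "m7"]    # semantic similarity order
keyword_hits = ["m1", "m9", "m3"]   # full-text match order

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# "m1" comes first because it ranks near the top of both lists.
```

Rank-based fusion avoids comparing raw scores from two retrievers whose scales differ, which is one reason it is a popular default for hybrid search.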

View the full update on blog.cloudflare.com

Cloudflare

@Cloudflare · Apr 17

Today we're announcing the private beta of Agent Memory, a managed service that extracts information from agent conversations and makes it available when it’s needed, without filling up the context window. https://t.co/tcyjIzpiHd


Still wondering? A few quick answers below.

What is Cloudflare Agent Memory?

Cloudflare Agent Memory is a managed service that provides AI agents with long-term, persistent memory. It extracts key information such as facts, events, and instructions from conversation history and stores it outside the model's context window. This allows agents to recall important details across different sessions and users without filling up the limited space available for active reasoning.

How does the Cloudflare Agent Memory ingestion pipeline work?

When a conversation is ingested, the service runs a multi-stage pipeline to extract and verify information. It uses deterministic ID generation for idempotency and runs parallel extraction passes to capture both broad context and specific details such as names or dates. Verified memories are then classified into types such as facts or tasks and stored in isolated Durable Objects for future retrieval.
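
The deterministic-ID step can be sketched as follows. This is a simplified Python illustration, not Cloudflare's implementation: the choice of SHA-256, the hashed fields (`session_id`, `role`, `content`), and the in-memory `MemoryStore` are all assumptions. The point is only that an ID derived from the content makes re-ingesting the same message a no-op.

```python
import hashlib
import json


def memory_id(session_id: str, role: str, content: str) -> str:
    """Derive a stable ID from the message itself (hypothetical field choice)."""
    # Canonical JSON so the same message always serializes to the same bytes.
    payload = json.dumps(
        {"session": session_id, "role": role, "content": content},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


class MemoryStore:
    """In-memory stand-in for a durable store, keyed by deterministic ID."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def ingest(self, session_id: str, role: str, content: str) -> bool:
        """Store a message once; return False if it was already ingested."""
        key = memory_id(session_id, role, content)
        if key in self._items:
            return False  # duplicate delivery or retry: nothing changes
        self._items[key] = {"session": session_id, "role": role, "content": content}
        return True

    def __len__(self) -> int:
        return len(self._items)
```

With this scheme, a client that retries after a timeout can resend the same conversation without creating duplicate memories, because the second call maps to the same key.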

How is Agent Memory different from Cloudflare AI Search?

While both are primitives for agentic knowledge, they solve distinct problems. AI Search is designed for finding information across unstructured and structured files, such as documentation or databases. Agent Memory is specifically for context recall derived from past interactions and sessions. The two services are designed to work together, allowing an agent to both search files and remember conversations.

Who can access the Cloudflare Agent Memory beta?

Cloudflare Agent Memory is currently in private beta. It is accessible via a binding from any Cloudflare Worker or through a REST API for agents running outside the Workers environment. Developers who want to use the service must join a waitlist to gain early access as Cloudflare continues to refine the extraction pipeline and retrieval quality.

What AI models power Cloudflare Agent Memory?

The service uses a combination of models running on the Workers AI platform. Llama 4 Scout, a mixture-of-experts model with 17 billion active parameters, handles the structured tasks of extraction, verification, and classification. For generating natural-language answers during retrieval, the service uses Nemotron 3, a larger 120-billion-parameter mixture-of-experts model chosen for its stronger reasoning and synthesis quality.


Keep reading

Cloudflare Launches Project Think to Build Durable Serverless AI Agents

Cloudflare introduced Project Think, a preview of its next-generation Agents SDK that provides primitives for long-running, stateful AI agents. The platform enables agents to survive crashes, delegate tasks to sub-agents, and execute code in sandboxed environments while costing nothing when idle.

Claude · Apr 24

Anthropic Launches Native Memory for Claude Managed Agents to Enable Persistent Learning

Anthropic introduced a native memory layer for Claude Managed Agents in public beta, allowing autonomous systems to retain knowledge across multiple sessions. By storing memories as manageable files, the update removes the need for custom state-management infrastructure while giving developers full control over what an agent remembers.

Warp · Jun 4

Warp Introduces Agent Memory for Cross-Tool, Team-Wide AI Agent Learning

Warp is introducing Agent Memory, a shared context system designed to help AI agents retain information across different sessions, tools, and teams. This system enables agents to learn from past interactions and avoid repeating errors, enhancing their efficiency and effectiveness in collaborative development environments.

MiniMax · Jun 2

Cloudflare adds MiniMax M3 with 1M context for agentic coding

Cloudflare has integrated the MiniMax M3 foundation model into its AI Gateway platform. The update provides developers with a high-context, multimodal model specialized for autonomous coding tasks directly within their existing infrastructure.
