Today we're announcing the private beta of Agent Memory, a managed service that extracts information from agent conversations and makes it available when it’s needed, without filling up the context window. https://t.co/tcyjIzpiHd
Cloudflare Launches Agent Memory to Give AI Agents Persistent Long Term State
Cloudflare launched the private beta of Agent Memory, a managed service that provides AI agents with persistent, retrievable state. The service runs an automated pipeline that extracts facts, events, and instructions from conversation history and keeps them in isolated storage. This lets agents remember information across different sessions and users.
- Memory types: Facts, Events, Instructions, Tasks
- Extraction model: Llama 4 Scout (17B MoE)
- Synthesis model: Nemotron 3 (120B MoE)
- Availability: Private beta waitlist
This update addresses context rot, the degradation of model performance as context windows fill up with redundant history. It follows Cloudflare's programmable search primitives, which target external files; Agent Memory focuses instead on internal conversation state. It builds on stateful agent harnesses by making the knowledge an agent accumulates a durable asset.
You can integrate the service via a Cloudflare Worker binding or a REST API to manage memory profiles for individual agents or entire teams. The system uses a hybrid retrieval engine that fuses vector search with full-text search to improve accuracy. Access is limited to a private beta waitlist for developers.
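To make the idea of fusing two retrievers concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a vector-search ranking with a full-text ranking. The announcement does not say which fusion method Agent Memory uses, so the technique, the function name `reciprocalRankFusion`, the constant `k = 60`, and the memory IDs below are all assumptions for illustration.

```typescript
// Reciprocal rank fusion (RRF): merge several ranked lists into one.
// Assumption: this is NOT confirmed as Agent Memory's fusion method.

type RankedList = string[]; // hypothetical memory IDs, best match first

function reciprocalRankFusion(lists: RankedList[], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      // Ranks are 1-based; each list contributes 1 / (k + rank) to the item.
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical results from the two retrievers for the same query.
const vectorHits: RankedList = ["m3", "m1", "m7"];
const fullTextHits: RankedList = ["m1", "m9", "m3"];

const fused = reciprocalRankFusion([vectorHits, fullTextHits]);
console.log(fused);
```

Items that both retrievers return (`m1`, `m3`) rise above items that only one of them found, which is the usual motivation for combining semantic and keyword search.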
Still wondering? A few quick answers below.
Cloudflare Agent Memory is a managed service that provides AI agents with long-term, persistent memory. It extracts key information like facts, events, and instructions from conversation history and stores them outside the model context window. This allows agents to recall important details across different sessions and users without filling up the limited space available for active reasoning.
When a conversation is ingested, the service runs a multi-stage pipeline to extract and verify information. It uses deterministic ID generation for idempotency and runs parallel extraction passes to capture both broad context and specific details like names or dates. Verified memories are then classified into types like facts or tasks and stored in isolated Durable Objects for future retrieval.
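The role of deterministic IDs can be illustrated with a small sketch: if a memory's ID is derived only from its content, ingesting the same conversation twice writes to the same key instead of creating a duplicate. Everything here is hypothetical, including the `MemoryStore` class, the `memoryId` function, the normalization rules, and the profile name. A dependency-free FNV-1a variant stands in for whatever hash the service actually uses; a production system would more likely use a cryptographic hash such as SHA-256.

```typescript
// Sketch of content-derived IDs for idempotent ingestion.
// Assumption: names, normalization, and hash are illustrative only.

type MemoryType = "fact" | "event" | "instruction" | "task";

interface Memory {
  id: string;
  type: MemoryType;
  text: string;
}

// FNV-1a style 32-bit hash over UTF-16 code units (a stand-in hash).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Same profile + type + normalized text always yields the same ID.
function memoryId(profile: string, type: MemoryType, text: string): string {
  const normalized = text.trim().toLowerCase().replace(/\s+/g, " ");
  return "mem_" + fnv1a([profile, type, normalized].join("\u0000"));
}

class MemoryStore {
  private byId = new Map<string, Memory>();

  // Upsert keyed by the deterministic ID, so retries do not duplicate.
  upsert(profile: string, type: MemoryType, text: string): Memory {
    const id = memoryId(profile, type, text);
    const existing = this.byId.get(id);
    if (existing) return existing;
    const memory: Memory = { id, type, text };
    this.byId.set(id, memory);
    return memory;
  }

  get size(): number {
    return this.byId.size;
  }
}

const store = new MemoryStore();
const first = store.upsert("agent-1", "fact", "User prefers dark mode");
// A retried ingestion with cosmetic whitespace and case differences.
const retry = store.upsert("agent-1", "fact", "  user prefers  dark mode ");
console.log(first.id === retry.id, store.size);
```

Because the retry maps to the same ID, the store still holds a single memory after both calls.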
While Agent Memory and AI Search are both primitives for agentic knowledge, they solve distinct problems. AI Search is designed for finding information across unstructured and structured files, such as documentation or databases. Agent Memory is specifically for context recall derived from past interactions and sessions. The two services are designed to work together, allowing an agent to search files and remember conversations simultaneously.
Cloudflare Agent Memory is currently in a private beta phase. It is accessible via a binding from any Cloudflare Worker or through a REST API for agents running outside the Workers environment. Developers who want to use the service must join a waitlist to gain early access as Cloudflare continues to refine the extraction pipeline and retrieval quality.
The service uses a combination of models running on the Workers AI platform. Llama 4 Scout, a 17-billion-parameter mixture-of-experts model, handles the structured tasks of extraction, verification, and classification. For generating natural-language answers during retrieval, the service uses Nemotron 3, a larger 120-billion-parameter model chosen for its stronger reasoning capacity and synthesis quality.




