Anthropic Hardens AI Agents by Backing Human Oversight With Environment Containment

Anthropic

May 26, 2026 · Updated Jun 12, 2026

Anthropic disclosed the technical architecture used to isolate its agentic products, including server-side containers, local sandboxes, and full virtual machines. The company found that human-in-the-loop approvals fail due to user fatigue, necessitating deterministic environment-level boundaries to cap an agent's potential blast radius.

Anthropic detailed the containment strategies that secure its agentic products, backing human-in-the-loop oversight with environment-level controls in the wake of its self-hosted sandbox launch. The architecture uses server-side gVisor containers (gVisor is a user-space application kernel that sandboxes container workloads), OS-level sandboxes, and full virtual machines. These deterministic boundaries cap the blast radius of autonomous actions.

This shift addresses approval fatigue, where users stop scrutinizing agent requests. Anthropic found that model-layer defenses remain probabilistic, which is why the company pairs its safety-principle training with hard technical constraints. Environment-layer isolation allows developers to deploy agents that run unattended without risking the underlying host system.

You can now use these patterns to harden your own deployments, for example by adopting the open-source sandbox runtime. Anthropic also introduced a defensive proxy to prevent data exfiltration through domains that are otherwise considered safe. These containment features are currently integrated into Claude Code and Claude Cowork, with enterprise-grade path allowlists available.
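As a rough illustration of what a path allowlist check can look like, here is a minimal Python sketch. It is not Anthropic's implementation: the function name `is_path_allowed` and the example directories are hypothetical, and it assumes the goal is simply to accept a path only when it resolves inside an approved root.

```python
from pathlib import Path
from typing import Iterable


def is_path_allowed(candidate: str, allowed_roots: Iterable[str]) -> bool:
    """Return True only if `candidate` resolves inside one of `allowed_roots`.

    Hypothetical helper for illustration. Paths are resolved first so that
    `..` segments and symlinks cannot be used to step outside an allowed root.
    """
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        # Accept the root itself or anything nested beneath it.
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

Comparing resolved path components, rather than using a string prefix test, keeps a sibling directory such as `/workspace/project-evil` from matching an allowlisted `/workspace/project`.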

View the full update on anthropic.com

Anthropic (@AnthropicAI) · May 26

New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions. Read more: https://t.co/KfBKW8O9kP

Still wondering? A few quick answers below.

How does Anthropic contain Claude agents across different products?

Anthropic uses three distinct isolation patterns to contain agents based on their capabilities. These include server-side gVisor containers for ephemeral code execution, OS-level sandboxes like Seatbelt or bubblewrap for local developer tools, and full virtual machines for general knowledge work. These deterministic boundaries prevent agents from accessing sensitive files or networks without explicit authorization.

Why is Anthropic moving away from human-in-the-loop approvals for agents?

Anthropic found that human-in-the-loop oversight is fallible due to approval fatigue. Telemetry showed that users approved roughly 93% of permission prompts, leading them to pay less attention over time. To mitigate this, Anthropic is shifting toward environment-level containment that enforces hard access boundaries, allowing agents to work autonomously while keeping the potential blast radius capped.

What security risks did Anthropic identify in Claude Code?

Anthropic discovered vulnerabilities where malicious repositories could execute code before a user accepted a trust prompt. This occurred because the agent parsed project-local configuration files during startup. Additionally, red-teaming showed that direct prompt injection could trick agents into exfiltrating credentials via legitimate API endpoints, which Anthropic fixed using a defensive man-in-the-middle proxy.
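To make the exfiltration point concrete, the sketch below shows why a domain allowlist alone is not enough and what an inspecting proxy might add. It is an assumption-laden illustration in Python, not a description of Anthropic's proxy: `check_outbound`, `Decision`, and the naive byte-substring match are all hypothetical, and a real proxy would also need TLS interception and encoding-aware inspection.

```python
from dataclasses import dataclass
from typing import Iterable, Set
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def check_outbound(url: str, body: bytes,
                   allowed_hosts: Set[str],
                   secrets: Iterable[bytes]) -> Decision:
    """Decide whether an outbound request may leave the sandbox.

    Hypothetical policy: the destination must be allowlisted AND the body
    must not contain any protected secret, so a "legitimate" endpoint
    cannot be used as an exfiltration channel.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host not in allowed_hosts:
        return Decision(False, "host not allowlisted: " + host)
    for secret in secrets:
        # Naive substring match; encoded or split secrets would evade it.
        if secret and secret in body:
            return Decision(False, "body contains a protected secret")
    return Decision(True, "ok")
```

The second check is the part a plain allowlist lacks: a request to an approved host is still refused when its payload carries a value the policy is meant to protect.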

How does the Claude Cowork virtual machine protect user data?

Claude Cowork runs code execution inside a sealed virtual machine using the platform's native hypervisor. This VM has its own kernel and filesystem, ensuring that only the user-selected workspace is visible to the agent. Credentials remain on the host machine and never enter the guest environment, protecting against misaligned model behavior or external prompt injection attacks.
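The source does not say how credentials are kept out of the guest, so the following Python sketch shows only one common pattern, stated as an assumption: the guest sends a placeholder token, and a host-side broker swaps in the real credential for known destinations. `PLACEHOLDER`, `inject_credentials`, and the header layout are hypothetical names chosen for illustration.

```python
from typing import Dict, Mapping

# Hypothetical placeholder value that is the only "credential" the guest sees.
PLACEHOLDER = "GUEST_PLACEHOLDER_TOKEN"


def inject_credentials(host: str,
                       headers: Mapping[str, str],
                       host_secrets: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` prepared for forwarding from the host side.

    `host_secrets` maps lowercase hostnames to real tokens and lives only
    on the host. The guest's original header mapping is never mutated.
    """
    out = dict(headers)
    real = host_secrets.get(host.lower())
    if real is None:
        # Unknown destination: drop any Authorization header entirely.
        out.pop("Authorization", None)
        return out
    if out.get("Authorization") == "Bearer " + PLACEHOLDER:
        # Known destination: swap the placeholder for the real token.
        out["Authorization"] = "Bearer " + real
    return out
```

Under this design, code running in the guest can leak at most the placeholder, which is worthless outside the broker.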

Can enterprise security tools monitor Anthropic's agent sandboxes?

Standard endpoint detection and response software often cannot see inside the isolated virtual machines used by products like Claude Cowork. Because the isolation is so strong, the hypervisor appears as an opaque process to host-based security tools. Anthropic currently provides pull-based event logs to help administrators maintain visibility and compliance for these autonomous agentic workflows.
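"Pull-based" means the administrator's tooling asks for new events rather than having them pushed. The source gives no API details, so the Python sketch below only illustrates the general cursor-polling pattern. `drain_events` and the `fetch_page` callback are hypothetical, and the assumed contract is that `fetch_page(cursor)` returns the events after that cursor plus a new cursor, or an empty list once the caller has caught up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, str]
# Assumed contract: given a cursor (None for "from the start"), return
# (events after that cursor, cursor to use for the next call).
FetchPage = Callable[[Optional[str]], Tuple[List[Event], Optional[str]]]


def drain_events(fetch_page: FetchPage,
                 cursor: Optional[str] = None,
                 max_pages: int = 100) -> Tuple[List[Event], Optional[str]]:
    """Pull pages until the source reports no new events.

    Returns the collected events and the last cursor, which the caller
    should persist so the next poll resumes where this one stopped.
    """
    collected: List[Event] = []
    for _ in range(max_pages):
        events, next_cursor = fetch_page(cursor)
        if not events:
            break  # caught up; keep the current cursor
        collected.extend(events)
        cursor = next_cursor
    return collected, cursor
```

Persisting the returned cursor between polls is what lets a log collector resume after a restart without re-reading or skipping events. The `max_pages` bound keeps a misbehaving source from looping forever.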

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Keep reading

Anthropic virtualizes agent architecture to decouple reasoning from execution environments

Anthropic transitioned its Managed Agents service to a decoupled architecture that separates the model's reasoning from its execution sandbox and session history. This shift treats execution environments as interchangeable resources rather than persistent servers, reducing initial latency by up to 90%. By virtualizing these components, agents can now securely access private infrastructure and maintain state across long-horizon tasks.

Claude · May 19

Anthropic Launches Self Hosted Sandboxes to Run Claude Agents Inside Your Perimeter

Anthropic introduced self-hosted sandboxes and MCP tunnels to allow Claude Managed Agents to execute tools and access data within a company's private infrastructure. This update addresses enterprise security concerns by decoupling the AI's reasoning loop from the sensitive environments where code is run and data is stored.

Guillermo Rauch · May 20

Vercel and Anthropic Secure Autonomous Agents via Firewall Level Credential Injection

Anthropic launched self-hosted sandboxes for Claude Managed Agents alongside a dedicated Vercel Sandbox integration. This architecture keeps sensitive API keys outside the execution environment, allowing autonomous agents to safely interact with private infrastructure without the risk of credential leakage.

Perplexity · May 14

Perplexity Secures Computer Agent With Hardware Isolated Sandboxes and Proxy Tokens

Perplexity detailed the security architecture for its autonomous agent, using hardware-level microVMs to isolate every task. This shift from software to hardware isolation aims to prevent agents from leaking credentials or being hijacked by malicious web content.
