Anthropic: Agent-Friendly Infrastructure Crucial for AI in Biology


Anthropic published a new Science Blog post detailing why AI agents have advanced faster in coding than in biology. The research highlights that biological data infrastructure is often not designed for agents, leading to unreliable performance in scientific tasks. Building deterministic retrieval layers is crucial for agents to navigate scientific data effectively.

Anthropic's new Science Blog post, "Paving the way for agents in biology," argues that AI agents have advanced faster in coding than in biology because biological data infrastructure was not designed for them. Software development, by contrast, already runs on structured digital workflows, which is what makes agentic coding possible (agentic systems being AI systems that autonomously plan, reason, and act).
Benchmark: VirBench
Agent performance (without gget virus): 16.9% to 91.3% mean accuracy
Agent performance (with gget virus): >90% for all agents, peaking at 99.7% for GPT-5.5
Deterministic retrieval layer: gget virus
Models tested: Claude Sonnet 4, Claude Opus 4.7, Biomni OSS, Edison Analysis, GPT-5.2-pro, GPT-5.5

Even frontier models like Claude and GPT struggled to retrieve viral sequence data from NCBI Virus, with mean accuracy as low as 16.9% and high run-to-run variability. Small errors in biological data retrieval can have severe consequences, invalidating downstream analyses. The bottleneck is not only agent reasoning but also the absence of dependable execution layers.

Adding a deterministic retrieval layer, gget virus, developed with NCBI, raised accuracy above 90% for every agent tested, with a peak of 99.7% for GPT-5.5, and largely eliminated run-to-run variability. The result suggests that making biological data infrastructure agent-friendly, with reliable access paths, matters more for scientific agents than raw model power alone. The post appears on Anthropic's Science Blog.

Anthropic (@AnthropicAI) on X:

New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic. How do we build infrastructure agents can use? https://t.co/PQaNQ4GRJZ


Still wondering? A few quick answers below.

The main challenge is that biological data infrastructure, unlike software development environments, is often not designed for autonomous AI agents. It features idiosyncratic formats, scattered databases, and complex, human-centric retrieval processes that agents struggle to navigate reliably.

Anthropic developed VirBench, a benchmark of 120 realistic viral sequence queries across 40 pathogens. State-of-the-art scientific research agents (including Claude and GPT models) were tasked with retrieving data from NCBI Virus, and their results were scored against manually verified ground-truth counts.
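
To make the scoring idea concrete, here is a minimal sketch of how answers could be compared against verified counts and how run-to-run variability could be summarized. It rests on assumptions not stated in the source: that a query counts as correct only when the returned count exactly matches the ground-truth count, and that variability is measured as the standard deviation of per-run accuracy. The function names and all of the data are made up for illustration and are not VirBench results.

```python
from statistics import mean, pstdev


def run_accuracy(predicted, truth):
    """Fraction of queries whose returned count exactly matches the verified count.

    Assumption: exact-match scoring. A missing answer counts as wrong.
    """
    hits = sum(1 for query, expected in truth.items()
               if predicted.get(query) == expected)
    return hits / len(truth)


def summarize(runs, truth):
    """Return (mean accuracy, standard deviation of accuracy) across repeated runs."""
    scores = [run_accuracy(run, truth) for run in runs]
    return mean(scores), pstdev(scores)


# Illustrative, invented data: four queries, three repeated runs of one agent.
truth = {"q1": 120, "q2": 8, "q3": 4510, "q4": 0}
runs = [
    {"q1": 120, "q2": 8, "q3": 4510, "q4": 0},  # every count matches
    {"q1": 120, "q2": 9, "q3": 4510, "q4": 0},  # one count is off by one
    {"q1": 118, "q2": 8, "q3": 4510},           # one wrong count, one unanswered
]

mean_acc, spread = summarize(runs, truth)
print(f"mean accuracy: {mean_acc:.1%}, run-to-run std dev: {spread:.1%}")
```

Under exact-match scoring an off-by-one count is as wrong as a wildly wrong one, which fits the article's point that small retrieval errors can invalidate downstream analyses.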

gget virus is a deterministic retrieval layer developed in collaboration with NCBI. It translates complex, browser-based viral data retrieval workflows into an accurate and reproducible interface. When agents were given access to gget virus, their accuracy rose to nearly 100%, and run-to-run variability was largely eliminated.

Deterministic retrieval ensures that the underlying data access (gene identifiers, schemas, retrieval logic, and data paths) executes the same way every time. That reliability matters because even small errors in scientific data can invalidate downstream analyses, which makes consistent, accurate data access more critical than raw model reasoning power.
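
As a rough illustration of what "deterministic" means here, the sketch below normalizes a query into one canonical form and returns results in a fixed order, so that equivalent requests always yield identical output. This is not the gget virus API or implementation: `VirusQuery`, `retrieve`, the record fields, and the accession strings are all hypothetical, and an in-memory list stands in for a real database.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VirusQuery:
    """Hypothetical structured query; the field names are illustrative only."""
    taxon: str
    host: Optional[str] = None
    min_length: int = 0

    def canonical(self) -> dict:
        # Collapse case and whitespace so equivalent spellings map to one form.
        def norm(text: str) -> str:
            return " ".join(text.lower().split())
        return {
            "taxon": norm(self.taxon),
            "host": norm(self.host) if self.host else None,
            "min_length": int(self.min_length),
        }

    def key(self) -> str:
        # Stable fingerprint of the canonical query, usable for caching or logging.
        payload = json.dumps(self.canonical(), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def retrieve(query: VirusQuery, records: list) -> list:
    """Filter records by the canonical query and return accessions in sorted order."""
    c = query.canonical()
    hits = [
        r for r in records
        if r["taxon"].lower() == c["taxon"]
        and (c["host"] is None or r["host"].lower() == c["host"])
        and r["length"] >= c["min_length"]
    ]
    return sorted(r["accession"] for r in hits)


# Made-up records standing in for a sequence database.
RECORDS = [
    {"accession": "ACC0003", "taxon": "Zika virus", "host": "Homo sapiens", "length": 10800},
    {"accession": "ACC0001", "taxon": "Zika virus", "host": "Homo sapiens", "length": 10750},
    {"accession": "ACC0002", "taxon": "Zika virus", "host": "Aedes aegypti", "length": 10790},
    {"accession": "ACC0004", "taxon": "Dengue virus", "host": "Homo sapiens", "length": 10700},
]

a = VirusQuery("Zika  Virus", host="Homo sapiens")
b = VirusQuery("zika virus", host="homo sapiens")
print(a.key() == b.key(), retrieve(a, RECORDS))
```

The design choice is to keep the agent's job limited to filling in a small structured query, while normalization, filtering, and ordering happen in fixed code, so differently phrased requests for the same data cannot drift apart between runs.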

Every HeadsUpAI update is written based on its original source and reviewed before it's published.
