Algorithmic Research Group Releases s2orc-safety to Standardize 16,806 AI Safety Papers

Algorithmic Research Group

Apr 5, 2026 · Updated Apr 25, 2026

Algorithmic Research Group released s2orc-safety, a curated collection of 16,806 academic papers on AI safety topics such as jailbreaks and red teaming, on Hugging Face. By enriching these papers with normalized metrics and code links, the dataset turns fragmented academic research into a machine-readable knowledge layer for safety engineering.

s2orc-safety is a specialized slice of S2ORC (the Semantic Scholar Open Research Corpus) containing 16,806 academic papers, published on Hugging Face by Algorithmic Research Group. The collection covers safety domains including jailbreaks, prompt injection, red teaming, model security, privacy, robustness, and alignment.

While academic safety research is abundant, it is often difficult to aggregate and compare across different studies. This release enriches each paper with structured fields for safety taxonomy, experimental details, and normalized names for models, datasets, and metrics. This standardization allows teams to programmatically analyze safety trends and reproducibility across thousands of documents.
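As a rough illustration of that kind of programmatic analysis, the sketch below tallies papers by safety category and by reported metric. The records are made up, and the field names `safety_category` and `metrics` are assumptions for illustration only; the dataset's actual schema may differ, so check the dataset card before adapting this.

```python
from collections import Counter

# Hypothetical records: the field names and values are illustrative
# assumptions, not the dataset's documented schema.
papers = [
    {"title": "Paper A", "safety_category": "jailbreaks",
     "metrics": ["attack_success_rate"]},
    {"title": "Paper B", "safety_category": "prompt injection",
     "metrics": ["attack_success_rate", "accuracy"]},
    {"title": "Paper C", "safety_category": "jailbreaks",
     "metrics": ["refusal_rate"]},
]


def count_by_category(records):
    """Count papers per safety taxonomy category."""
    return Counter(r["safety_category"] for r in records)


def count_metric_usage(records):
    """Count how many papers report each normalized metric name."""
    # set() ensures a paper listing a metric twice is counted once.
    return Counter(m for r in records for m in set(r["metrics"]))


category_counts = count_by_category(papers)
metric_counts = count_metric_usage(papers)
print(category_counts.most_common())
print(metric_counts.most_common())
```

Because metric names are normalized, a plain string match is enough to group papers that report the same measure. Without normalization, spelling variants of one metric would land in separate buckets.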

You can access the dataset on Hugging Face to integrate safety research into automated evaluation pipelines or literature reviews. The inclusion of code-link metadata and practicality scores helps engineers identify which safety mitigations have functional implementations ready for testing. The repository was recently updated to ensure the most accurate metadata is available.

View the full update on huggingface.co

Algorithmic Research Group

@algoresearch_ · Apr 4

We're releasing 's2orc-safety' on @huggingface: an AI safety slice of our s2orc-enriched dataset with 16,806 papers across jailbreaks, prompt injection, red teaming, model security, privacy, robustness, alignment, and more. Each paper is enriched with structured fields for reproducibility, safety taxonomy, experimental details, practicality, normalized model/dataset/metric names, code-link metadata, and more. Link below:

