New Framework Automates Agent Skill Extraction From Open-Source GitHub Repositories

DAIR.AI

Mar 18, 2026 · Updated Apr 25, 2026

A research paper proposes a pipeline that mines GitHub repositories for procedural knowledge and converts it into standardized agent skill files. The authors report a 40% gain in knowledge transfer efficiency, with pedagogical quality comparable to human-authored tutorials.

The paper introduces a pipeline that analyzes repository structure, identifies procedural knowledge through dense retrieval, and translates it into a standardized SKILL.md format. A progressive disclosure architecture lets agents discover thousands of skills without context window degradation. The researchers tested the framework on visualization and education repositories built on the Manim mathematical animation engine, including TheoremExplainAgent (a theorem-explanation video generator) and Code2Video (a code-to-animation system).
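The progressive disclosure idea can be illustrated with a minimal Python sketch. It assumes each SKILL.md starts with a small `---` delimited header holding `name` and `description` fields, which this article does not spell out; `SkillStub`, `parse_front_matter`, `build_index`, `index_prompt`, `load_skill`, and the example skill text are all hypothetical names made up for illustration, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class SkillStub:
    """Index entry: the only part of a skill the agent sees up front."""
    name: str
    description: str
    raw: str  # full SKILL.md text (stands in for a file path on disk)


def parse_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (metadata, body).

    Assumption: the header is '---' delimited 'key: value' lines.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat everything as body


def build_index(skill_files: list[str]) -> list[SkillStub]:
    """Read only the metadata of every skill."""
    stubs = []
    for raw in skill_files:
        meta, _ = parse_front_matter(raw)
        stubs.append(
            SkillStub(meta.get("name", "unnamed"), meta.get("description", ""), raw)
        )
    return stubs


def index_prompt(stubs: list[SkillStub]) -> str:
    """One short line per skill: this is what goes into the context window."""
    return "\n".join(f"- {s.name}: {s.description}" for s in stubs)


def load_skill(stubs: list[SkillStub], name: str) -> str:
    """Load the full instructions only once the agent has picked a skill."""
    for stub in stubs:
        if stub.name == name:
            return parse_front_matter(stub.raw)[1]
    raise KeyError(name)


# Hypothetical skill file, written for this example only.
EXAMPLE_SKILL = """---
name: render-manim-scene
description: Render a Manim scene to video from a Python file.
---
1. Define a Scene subclass with a construct() method.
2. Run the manim command-line tool on the file.
"""

stubs = build_index([EXAMPLE_SKILL])
print(index_prompt(stubs))
print(load_skill(stubs, "render-manim-scene"))
```

The design point is that the per-skill cost in the prompt is one line of metadata, so the index can grow to thousands of entries while full instructions are paid for only when a skill is actually selected.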

This addresses a growing bottleneck in agent development: manually authoring skills doesn't scale. Automated extraction from agentic repositories on GitHub offers a path to expand agent capabilities without model retraining.

If you maintain an open-source agentic repository, the framework's structural analysis and dense retrieval approach offers a systematic way to surface and package the procedural knowledge already embedded in your codebase.
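As a rough illustration of how retrieval can surface procedural passages in a repository, the sketch below ranks text chunks by cosine similarity to a query. The article does not say which encoder the paper uses, so `toy_embed` is a hashed bag-of-words placeholder that should be swapped for a learned sentence encoder in any real setup; `rank_chunks`, the sample chunks, and the query are likewise hypothetical. Only the embed-then-rank structure is meant to carry over.

```python
import math
import re
import zlib
from typing import Callable

Vector = list[float]


def toy_embed(text: str, dim: int = 1024) -> Vector:
    """Placeholder encoder: hashed bag-of-words, L2-normalised.

    Assumption: a real pipeline would call a learned dense encoder here.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 gives a stable bucket across runs, unlike built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank_chunks(
    query: str,
    chunks: list[str],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 2,
) -> list[tuple[float, str]]:
    """Return the top_k (score, chunk) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical repository chunks, e.g. paragraphs from a README and a license.
chunks = [
    "To render a scene, install the package, then run the render command on your script.",
    "MIT License. Permission is hereby granted, free of charge.",
    "Changelog: bumped version number.",
]

for score, chunk in rank_chunks("how to install and run the render command", chunks):
    print(f"{score:.2f}  {chunk}")
```

Passing the encoder in as the `embed` argument keeps the ranking logic unchanged when the placeholder is replaced.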

View the full update on arxiv.org

DAIR.AI

@dair_ai · Mar 16

GitHub already has millions of repos full of procedural knowledge. The work introduces a framework for extracting agent skills directly from open-source repos.

The pipeline analyzes repo structure, identifies procedural knowledge through dense retrieval, and translates it into standardized SKILL.md format with a progressive disclosure architecture so agents can discover thousands of skills without context window degradation.

Manually authoring agent skills doesn't scale. Automated extraction achieved 40% gains in knowledge transfer efficiency while matching human-crafted quality. Still early on this, and there is more work needed for self-discovered and self-improving skills to work well at scale. As the agent skill ecosystem grows, mining existing repos could unlock scalable capability acquisition without having to retrain models.

Paper: https://t.co/MAt8Goetcr

Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c



Keep reading

SkillsBench Measures Whether Agent Skills Actually Improve AI Performance

SkillsBench launched as a benchmark of 86 tasks across 11 domains, testing whether agent skills actually improve AI agent performance. Curated human-authored skills raise pass rates by 16.2 percentage points on average, while self-generated skills provide no benefit.

Researchers Reveal Performance Gaps in Agent Skills and Propose Refinement Fix

DAIR.AI · Apr 8

New research finds that AI agent performance gains from domain-specific skills disappear when agents must search through large, noisy collections of 34,000 real-world options. Introducing a query-specific refinement step recovers this lost performance, boosting Claude Opus 4.6 success rates on terminal tasks by nearly 8%.

Google ADK Adds Six Agent Skills for Coding Agents to Its Docs

Richard Seroter · Mar 15

Google's Agent Development Kit now ships six agent skills in its docs repo, loadable into any coding agent via npx skills. The skills cover development lifecycle, evaluation, deployment, observability, and project scaffolding — structured ADK expertise for your coding agent.

GitHub Pilots AI Agent to Proactively Fix Accessibility Issues in Pull Requests

GitHub · Jun 9

GitHub is piloting an experimental general-purpose accessibility agent that has reviewed 3,535 pull requests and achieved a 68% resolution rate. This agent aims to prevent accessibility barriers from the start by automatically identifying and remediating issues in front-end code.