skillgrade allows you to evaluate your agent skills and keep them from regressing over time. Just shipped a couple more versions over the past few days. Give it a try! https://t.co/NPVCKSG7CI
Skillgrade Brings Regression Testing to Agent Skills via Automated Evals
Skillgrade released an open-source CLI that runs automated evals against agent skills, catching regressions when a skill, model, or agent changes. Until now, there was no standard way to verify agent skills hold up across model updates.
Evals are defined in eval.yaml, which scores results with deterministic checks, LLM rubric graders, or a weighted mix of both. Running skillgrade init on a skill directory reads its SKILL.md and generates eval tasks with AI assistance. Three run presets cover smoke tests during development, balanced estimates before merging, and regression detection before shipping. A --ci flag exits non-zero when the pass rate falls below a configurable threshold, making skill quality a build gate.
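To make the weighted grading and the build gate concrete, here is a minimal Python sketch of the idea. It is not Skillgrade's implementation or its eval.yaml schema: the names GraderResult, weighted_score, pass_rate, and ci_exit_code, the 0.5 pass mark, and the 0.8 threshold are all hypothetical, chosen only to illustrate how a deterministic check and a rubric score might be blended and then turned into an exit code.

```python
from dataclasses import dataclass


@dataclass
class GraderResult:
    """One grader's verdict on a single trial (hypothetical structure)."""
    score: float   # normalized to the range 0.0 to 1.0
    weight: float  # relative importance of this grader


def weighted_score(results: list[GraderResult]) -> float:
    """Combine grader scores into one weighted average."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0
    return sum(r.score * r.weight for r in results) / total_weight


def pass_rate(trial_scores: list[float], pass_mark: float = 0.5) -> float:
    """Fraction of trials whose combined score reaches pass_mark."""
    if not trial_scores:
        return 0.0
    return sum(s >= pass_mark for s in trial_scores) / len(trial_scores)


def ci_exit_code(rate: float, threshold: float) -> int:
    """Return 0 when the pass rate meets the threshold, 1 otherwise."""
    return 0 if rate >= threshold else 1


# Two trials, each graded by a deterministic check (weight 3)
# and an LLM rubric grader (weight 1). All values are made up.
trial_scores = [
    weighted_score([GraderResult(1.0, 3), GraderResult(0.0, 1)]),  # 0.75
    weighted_score([GraderResult(0.0, 3), GraderResult(1.0, 1)]),  # 0.25
]
rate = pass_rate(trial_scores)                 # 0.5: one of two trials passes
exit_code = ci_exit_code(rate, threshold=0.8)  # 1: the build would fail
```

In a CI job, a script built along these lines would pass exit_code to sys.exit so the pipeline stops whenever the pass rate drops below the threshold.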
If you maintain agent skills for Claude Code, Gemini, or Codex, Skillgrade is built to close the gap between skills that work in demos and skills that hold up across model updates.
Every HeadsUpAI update is written based on its original source and reviewed before it's published.






