HeadsUpAI

Skillgrade Brings Regression Testing to Agent Skills via Automated Evals


Skillgrade, a CLI tool for testing agent skills, released v0.1.3 this week. It runs an AI agent against tasks defined in a single eval.yaml and scores the results with deterministic checks, LLM rubric graders, or a weighted mix of both. Running skillgrade init on a skill directory reads its SKILL.md and generates eval tasks with AI assistance.
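To make the "weighted mix" idea concrete, here is a minimal Python sketch of how a deterministic check and a rubric score could be combined into one number. It is not Skillgrade's implementation or its eval.yaml schema: the Grader class, the weights, the "DONE" marker, and the stubbed rubric function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Grader:
    """One scoring component (hypothetical); `score` returns a value in [0.0, 1.0]."""
    name: str
    weight: float
    score: Callable[[str], float]


def weighted_score(output: str, graders: list[Grader]) -> float:
    """Combine grader scores into a weighted average in [0.0, 1.0]."""
    total_weight = sum(g.weight for g in graders)
    if total_weight <= 0:
        raise ValueError("graders must have a positive total weight")
    return sum(g.weight * g.score(output) for g in graders) / total_weight


def contains_done(output: str) -> float:
    """Deterministic check: full credit only if the expected marker appears."""
    return 1.0 if "DONE" in output else 0.0


def stub_rubric(output: str) -> float:
    """Stand-in for an LLM rubric grader; a real one would call a model."""
    return 0.5


# Assumed weights: the deterministic check counts for 70%, the rubric for 30%.
graders = [
    Grader("contains_done", weight=0.7, score=contains_done),
    Grader("rubric", weight=0.3, score=stub_rubric),
]
result = weighted_score("task DONE", graders)
print(f"weighted score: {result:.2f}")
```

Normalizing by the total weight keeps the result in the same 0 to 1 range regardless of how the individual weights are chosen.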

Previously, there was no standard way to verify that agent skills still work after a model or agent swap. Three run presets cover smoke tests during development, balanced estimates before merging, and regression detection before shipping. A --ci flag exits non-zero when the pass rate falls below a configurable threshold, making skill quality a build gate.
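The build-gate behavior can be sketched in a few lines of Python. This is an illustration of the general pattern only, not Skillgrade's code: the function names, the trial results, and the 0.9 threshold are assumptions.

```python
def pass_rate(trial_results: list[bool]) -> float:
    """Fraction of trials that passed."""
    if not trial_results:
        raise ValueError("need at least one trial")
    return sum(trial_results) / len(trial_results)


def ci_exit_code(trial_results: list[bool], threshold: float) -> int:
    """Return 0 when the pass rate meets the threshold, 1 when it falls below."""
    return 0 if pass_rate(trial_results) >= threshold else 1


# Hypothetical results from five trials of one eval task: four passes, one failure.
trials = [True, True, False, True, True]
code = ci_exit_code(trials, threshold=0.9)
print(f"pass rate {pass_rate(trials):.0%}, exit code {code}")
```

In a real CI script the returned value would be passed to sys.exit so that the pipeline step fails when the gate is not met.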

If you maintain agent skills for Claude Code, Gemini, or Codex, the gap between skills that work in demos and skills that hold up across model updates is exactly what Skillgrade is built for.

Minko Gechev (@mgechev) on X:

skillgrade allows you to evaluate your agent skills and keep them from regressing over time. Just shipped a couple of more versions over the past few days. Give it a try! https://t.co/NPVCKSG7CI

10 retweets, 63 likes