# agent-benchmark-kit

Automated quality assurance for Claude Code agents using LLM-as-judge evaluation