# agent-benchmark-kit

Automated quality assurance for Claude Code agents using LLM-as-judge evaluation.