# agent-benchmark-kit
Automated quality assurance for Claude Code agents using LLM-as-judge evaluation