affaan-m/everything-claude-code · Software Engineering · Frontend and Design

agent-eval

A lightweight CLI tool for comparing coding agents head-to-head on reproducible tasks. Every "which coding agent is best?" comparison runs on vibes — this tool systematizes it.

SkillJury keeps community verdicts, source metadata, and external repository signals in separate lanes so ranking data never pretends to be a review.

SkillJury verdict: Pending (no approved reviews yet)

Would recommend: Pending (waiting on enough review volume)

Install signal: 2 (weekly or total install activity from catalog data)

Install command
npx skills add https://github.com/affaan-m/everything-claude-code --skill agent-eval
SkillJury does not have enough approved reviews to publish a community verdict yet. Source metadata and repository proof are still available above.
SkillJury Signal Summary

As of Apr 30, 2026, agent-eval has 2 weekly installs and 0 community reviews on SkillJury. Community votes currently stand at 0 upvotes and 0 downvotes. Source: affaan-m/everything-claude-code. Canonical URL: https://skills.sh/affaan-m/everything-claude-code/agent-eval.

Security audits
Gen Agent Trust Hub: FAIL
Socket: PASS
Snyk: WARN
About this skill
A lightweight CLI tool for comparing coding agents head-to-head on reproducible tasks. Every "which coding agent is best?" comparison runs on vibes — this tool systematizes it.

Note: Install agent-eval from its repository after reviewing the source.

Define tasks declaratively. Each task specifies what to do, which files to touch, and how to judge success. Create a tasks/ directory with YAML files, one per task (see the hypothetical sketch after this section). Each agent run gets its own git worktree — no Docker required. This isolation keeps runs reproducible: agents cannot interfere with each other or corrupt the base repo.

Execute agents against your tasks. Each run:

- Creates a fresh git worktree from the specified commit
- Hands the prompt to the agent
- Runs the judge criteria
- Records pass/fail, cost, and time

Then generate a comparison report.

When to use it:

- Comparing coding agents (Claude Code, Aider, Codex, etc.) on your own codebase
- Measuring agent performance before adopting a new tool or model
- Running regression checks when an agent updates its model or tooling
- Producing data-backed agent selection decisions for a team

Tips:

- Start with 3-5 tasks that represent your real workload, not toy examples
- Run at least 3 trials per agent to capture variance — agents are non-deterministic
- Pin the commit in your task YAML so results are reproducible across days/weeks
- Include at least one deterministic judge (tests, build) per task...
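To make the task format concrete, here is a minimal sketch of what one file in tasks/ might look like. The field names (name, commit, prompt, files, judge) are illustrative assumptions, not the tool's documented schema; check the agent-eval repository for the real format.

```yaml
# tasks/fix-empty-config.yaml -- hypothetical schema, for illustration only
name: fix-empty-config
commit: 3f2a1bc            # pin a commit so results stay reproducible
prompt: |
  Fix the crash in src/config.ts when the config file is empty.
files:
  - src/config.ts          # which files the agent is expected to touch
judge:
  - run: npm test          # deterministic judge: the test suite must pass
```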
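The worktree isolation described above can be pictured as ordinary git commands. This is a rough approximation of what one trial does, not the tool's actual implementation; the paths and the judge command are made up.

```sh
# Illustrative sketch of a single trial, using plain git worktree commands.
git worktree add --detach ../trial-1 3f2a1bc   # fresh checkout at the pinned commit
(
  cd ../trial-1
  # ...the agent receives the task prompt and edits files here...
  npm test                                     # deterministic judge
)
git worktree remove --force ../trial-1         # clean up; the base repo is untouched
```

Because every trial starts from the same pinned commit in its own directory, agents never see each other's edits and a failed run cannot dirty your working tree.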

Source description provided by the upstream listing. Community review signal and install context stay separate from this narrative layer.

Community reviews

Latest reviews

No community reviews yet. Be the first to review.

FAQ
What does agent-eval do?

A lightweight CLI tool for comparing coding agents head-to-head on reproducible tasks. Every "which coding agent is best?" comparison runs on vibes — this tool systematizes it.

Is agent-eval good?

agent-eval does not have approved reviews yet, so SkillJury cannot publish a community verdict.

Which AI agents support agent-eval?

agent-eval currently lists compatibility with Claude Code, Codex, and Skills CLI.

Is agent-eval safe to install?

agent-eval has been scanned by the security audit providers tracked on SkillJury. Check the security audits section on this page for detailed results from Gen Agent Trust Hub, Socket.dev, and Snyk.

What are alternatives to agent-eval?

Skills in the same category include grimoire-morpho-blue, conversation-memory, second-brain-ingest, zai-tts.

How do I install agent-eval?

Run the following command to install agent-eval: npx skills add https://github.com/affaan-m/everything-claude-code --skill agent-eval
