
eval

Rank all agent results for a session. Supports metric-based evaluation (run a command), LLM judge (compare diffs), or hybrid.

SkillJury keeps community verdicts, source metadata, and external repository signals in separate lanes so ranking data never pretends to be a review.

SkillJury verdict
Pending

No approved reviews yet

Would recommend
Pending

Waiting on enough review volume

Install signal
791

Weekly or total install activity from catalog data

Install command
npx skills add https://github.com/alirezarezvani/claude-skills --skill eval
SkillJury does not have enough approved reviews to publish a community verdict yet. Source metadata and repository proof are still available above.
SkillJury Signal Summary

As of Apr 30, 2026, eval has 791 weekly installs and 0 community reviews on SkillJury. Community votes currently stand at 0 upvotes and 0 downvotes. Source: alirezarezvani/claude-skills. Canonical URL: https://skills.sh/alirezarezvani/claude-skills/eval.

Security audits
Gen Agent Trust Hub: WARN
Socket: PASS
Snyk: PASS
About this skill
Rank all agent results for a session. Supports metric-based evaluation (run a command), LLM judge (compare diffs), or hybrid.

Metric-based evaluation:
- Run the evaluation command in each agent's worktree.
- Record the output for each agent.
- Present rankings with justification.

LLM judge (example output shown for a content task):
- Get the diff: git diff {base_branch}...{agent_branch}
- Read the agent's result post from .agenthub/board/results/agent-{i}-result.md
- Compare all diffs and rank by:
  - Correctness: does it solve the task?
  - Simplicity: fewer lines changed is better (when equal correctness)
  - Quality: clean execution, good structure, no regressions

Hybrid:
- Run metric evaluation first.
- If top agents are within 10% of each other, use the LLM judge to break ties.
- Present both metric and qualitative rankings.

Then update session state and tell the user:
- Ranked results with the winner highlighted
- Next step: /hub:merge to merge the winner, or /hub:merge {session-id} --agent {winner} to be explicit
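The hybrid rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the skill itself: the names `AgentResult` and `rank_and_flag` are hypothetical, and it assumes the metric evaluation produces a single higher-is-better score per agent.

```python
# Hypothetical sketch of the hybrid ranking rule: rank agents by metric
# score, then flag the top group for an LLM judge tie-break when their
# scores fall within 10% of the leader. Not part of the actual skill.
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    score: float  # higher is better, from the metric evaluation command

def rank_and_flag(results, tolerance=0.10):
    """Return the metric ranking plus agents needing an LLM tie-break."""
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    best = ranked[0].score
    # Agents within `tolerance` of the leader are too close to call on
    # metrics alone, so the LLM judge compares their diffs instead.
    contenders = [r.agent for r in ranked
                  if best - r.score <= tolerance * best]
    return ranked, contenders if len(contenders) > 1 else []

results = [AgentResult("agent-1", 0.92),
           AgentResult("agent-2", 0.88),
           AgentResult("agent-3", 0.60)]
ranked, tiebreak = rank_and_flag(results)
print([r.agent for r in ranked])  # ['agent-1', 'agent-2', 'agent-3']
print(tiebreak)                   # ['agent-1', 'agent-2']
```

If no competitor lands within the tolerance band, the tie-break list is empty and the metric ranking stands on its own.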

Source description provided by the upstream listing. Community review signal and install context stay separate from this narrative layer.

Community reviews

Latest reviews

No community reviews yet. Be the first to review.

FAQ
What does eval do?

Rank all agent results for a session. Supports metric-based evaluation (run a command), LLM judge (compare diffs), or hybrid.

Is eval good?

eval does not have approved reviews yet, so SkillJury cannot publish a community verdict.

Which AI agents support eval?

eval currently lists compatibility with Skills CLI.

Is eval safe to install?

eval has been scanned by security audit providers tracked on SkillJury. Check the security audits section on this page for detailed results from Socket.dev and Snyk.

What are alternatives to eval?

Skills in the same category include grimoire-morpho-blue, conversation-memory, second-brain-ingest, zai-tts.

How do I install eval?

Run the following command to install eval: npx skills add https://github.com/alirezarezvani/claude-skills --skill eval

Related skills

More from alirezarezvani/claude-skills

Alternatives in Software Engineering