affaan-m/everything-claude-code | Software Engineering | Frontend and Design

eval-harness

Formal evaluation framework for Claude Code sessions implementing eval-driven development principles.

SkillJury keeps community verdicts, source metadata, and external repository signals in separate lanes so ranking data never pretends to be a review.

SkillJury verdict: Pending (no approved reviews yet)

Would recommend: Pending (waiting on enough review volume)

Install signal: weekly or total install activity from catalog data

0 review requests

Install command

npx skills add https://github.com/affaan-m/everything-claude-code --skill eval-harness

SkillJury does not have enough approved reviews to publish a community verdict yet. Source metadata and repository proof are still available above.

SkillJury Signal Summary

As of May 1, 2026, eval-harness has 3 weekly installs and 0 community reviews on SkillJury. Community votes currently stand at 0 upvotes and 0 downvotes. Source: affaan-m/everything-claude-code. Canonical URL: https://skills.sh/affaan-m/everything-claude-code/eval-harness.

Security audits

Gen Agent Trust Hub: PASS

Socket: PASS

Snyk: PASS

About this skill

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles. Eval-driven development treats evals as the "unit tests of AI development".

Eval types:
- Capability evals: test if Claude can do something it couldn't before.
- Regression evals: ensure changes don't break existing functionality.

Grader types:
- Code-based: deterministic checks using code.
- Model-based: use Claude to evaluate open-ended outputs.
- Human: flag for manual review.

Metrics:
- pass@k: "At least one success in k attempts".
- pass^k: "All k trials succeed".

Workflow: define the evals, then write code to pass the defined evals. The workflow commands cover three steps:
- Create the eval definition file at .claude/evals/feature-name.md.
- Run the current evals and report status.
- Generate the full eval report.

Evals are stored in the project. Use product evals when behavior quality cannot be captured by unit tests alone.

Highlights:
- Defines capability and regression evals with pass/fail criteria before implementation, treating evals as unit tests for AI-assisted workflows
- Supports three grader types: code-based (deterministic checks via bash/grep), model-based (Claude-as-judge), and human review for manual adjudication
- Tracks reliability with pass@k metrics (success within k attempts) and pass^k (all k trials succeed), with recommended thresholds of pass@3 ≥ 90% for capabilities and pass^3 = 100% for regressions
- Integrates into Claude Code workflow with commands to...

Source description provided by the upstream listing. Community review signal and install context stay separate from this narrative layer.
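The pass@k and pass^k metrics named in the description can be illustrated with a short Python sketch. This is not the skill's own code: the function names, the eval names, and the trial data are hypothetical, and it assumes the recommended thresholds are read as the fraction of evals that satisfy the metric over k recorded trials.

```python
from typing import Callable, Dict, List


def pass_at_k(trials: List[bool], k: int) -> bool:
    """pass@k: at least one success in the first k attempts."""
    if k < 1 or len(trials) < k:
        raise ValueError("need k >= 1 and at least k recorded trials")
    return any(trials[:k])


def pass_all_k(trials: List[bool], k: int) -> bool:
    """pass^k: all of the first k trials succeed."""
    if k < 1 or len(trials) < k:
        raise ValueError("need k >= 1 and at least k recorded trials")
    return all(trials[:k])


def rate(results: Dict[str, List[bool]], k: int,
         metric: Callable[[List[bool], int], bool]) -> float:
    """Fraction of evals for which `metric` holds (an assumed aggregation)."""
    if not results:
        raise ValueError("no eval results")
    return sum(metric(t, k) for t in results.values()) / len(results)


# Hypothetical trial outcomes: one list of pass/fail results per eval.
capability = {
    "parses-config": [False, True, True],
    "handles-empty-input": [True, True, True],
}
regression = {
    "existing-login-flow": [True, True, True],
    "existing-export": [True, False, True],
}

cap_rate = rate(capability, 3, pass_at_k)
reg_rate = rate(regression, 3, pass_all_k)

print(f"capability pass@3: {cap_rate:.0%} (target >= 90%)")
print(f"regression pass^3: {reg_rate:.0%} (target 100%)")
```

With this sample data the capability set meets its target (every eval succeeds at least once in three attempts), while the regression set does not, because one eval fails a single trial. That asymmetry is the point of using pass^k for regressions: one flaky trial is enough to fail the check.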

Community reviews

Latest reviews

No community reviews yet. Be the first to review.

Browse this skill in context

Agents

Claude Code, Skills CLI

Source

affaan-m/everything-claude-code

FAQ

What does eval-harness do?

Formal evaluation framework for Claude Code sessions implementing eval-driven development principles.

Is eval-harness good?

eval-harness does not have approved reviews yet, so SkillJury cannot publish a community verdict.

Which AI agents support eval-harness?

eval-harness currently lists compatibility with Claude Code and Skills CLI.

Is eval-harness safe to install?

eval-harness has been scanned by the security audit providers tracked on SkillJury. Check the security audits section on this page for detailed results from Gen Agent Trust Hub, Socket, and Snyk.

What are alternatives to eval-harness?

Skills in the same category include review-management, conversation-memory, coverage, grimoire-aave.

How do I install eval-harness?

Run the following command to install eval-harness: npx skills add https://github.com/affaan-m/everything-claude-code --skill eval-harness

Related skills

More from affaan-m/everything-claude-code

gateguard

A PreToolUse hook that forces Claude to investigate before editing. Instead of self-evaluation ("are you sure?"), it demands concrete facts. The act of investigation creates awareness that self-evaluation never did.

Software Engineering | Frontend and Design | No reviews yet | Source: affaan-m/everything-claude-code

golang-patterns

Idiomatic Go patterns, best practices, and conventions for building robust applications.

Software Engineering | Frontend and Design | No reviews yet | Source: affaan-m/everything-claude-code

security-review

Comprehensive security checklist and patterns for authentication, input validation, secrets management, and sensitive operations.

Software Engineering | Frontend and Design | No reviews yet | Source: affaan-m/everything-claude-code

backend-patterns

Architectural patterns, API design, and database optimization for Node.js, Express, and Next.js backends.

Software Engineering | Frontend and Design | No reviews yet | Source: affaan-m/everything-claude-code

Alternatives in Software Engineering

review-management

Source details, install context, and public review data are available on the full page.

Software Engineering | Frontend and Design | No reviews yet | Source: eronred/aso-skills

conversation-memory

Persistent memory systems for LLM conversations with tiered storage and intelligent retrieval.

Software Engineering | Frontend and Design | No reviews yet | Source: sickn33/antigravity-awesome-skills

coverage

Map all testable surfaces in the application and identify what's tested vs. what's missing.

Software Engineering | Frontend and Design | No reviews yet | Source: alirezarezvani/claude-skills

grimoire-aave

Query Aave V3 market data, reserve snapshots, and health metrics across supported chains.

Software Engineering | Frontend and Design | No reviews yet | Source: franalgaba/grimoire