
evaluating-llms-harness

davila7/claude-code-templates

lm-evaluation-harness evaluates LLMs across 60+ academic benchmarks using standardized prompts and metrics.

Installs: 188
Install command
npx skills add https://github.com/davila7/claude-code-templates --skill evaluating-llms-harness
Security audits
Gen Agent Trust Hub: PASS
Socket: PASS
Snyk: WARN
Community Reviews

Latest reviews

No community reviews yet. Be the first to review.

FAQ
What does evaluating-llms-harness do?

lm-evaluation-harness evaluates LLMs across 60+ academic benchmarks using standardized prompts and metrics.
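For context, a typical invocation of the underlying harness might look like the sketch below. It assumes the EleutherAI lm-evaluation-harness is installed separately (e.g. via `pip install lm-eval`), that a CUDA GPU is available, and it uses an illustrative model (`EleutherAI/pythia-160m`) and task (`hellaswag`) — substitute your own:

```shell
# Hypothetical example: evaluate a Hugging Face model on one benchmark
# task. Requires `pip install lm-eval` and network access to download
# the model; flags follow the lm-evaluation-harness CLI.
lm_eval --model hf \
  --model_args pretrained=EleutherAI/pythia-160m \
  --tasks hellaswag \
  --device cuda:0 \
  --batch_size 8
```

Results are reported per task using each benchmark's standard metrics (e.g. accuracy), which is what allows comparisons across models.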

Is evaluating-llms-harness good?

evaluating-llms-harness does not have approved reviews yet, so SkillJury cannot publish a community verdict.

What agent does evaluating-llms-harness work with?

evaluating-llms-harness currently lists compatibility with codex, gemini-cli, opencode, cursor, github-copilot, and claude-code.

What are alternatives to evaluating-llms-harness?

Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, iterative-retrieval.

How do I install evaluating-llms-harness?

npx skills add https://github.com/davila7/claude-code-templates --skill evaluating-llms-harness

Related skills

More from davila7/claude-code-templates


Alternatives in Software Engineering