evaluating-code-models
davila7/claude-code-templates
BigCode Evaluation Harness evaluates code generation models across 15+ benchmarks including HumanEval, MBPP, and MultiPL-E (18 languages).
npx skills add https://github.com/davila7/claude-code-templates --skill evaluating-code-models
Is evaluating-code-models good?
evaluating-code-models has no approved reviews yet, so SkillJury cannot publish a community verdict.
What agent does evaluating-code-models work with?
evaluating-code-models is currently listed as compatible with codex, gemini-cli, opencode, cursor, github-copilot, and claude-code.
What are alternatives to evaluating-code-models?
Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, iterative-retrieval.
More from davila7/claude-code-templates
senior-frontend
by davila7/claude-code-templates
Complete toolkit for senior frontend with modern tools and best practices.
market-research-reports
by davila7/claude-code-templates
Market research reports are comprehensive strategic documents that analyze industries, markets, and competitive landscapes to inform business decisions, investment strategies, and strategic planning. This skill generates professional-grade reports of 50+ pages with extensive visual content, modeled after deliverables...
senior-data-engineer
by davila7/claude-code-templates
World-class senior data engineer skill for production-grade AI/ML/Data systems.
senior-devops
by davila7/claude-code-templates
Complete toolkit for senior devops with modern tools and best practices.
Alternatives in Software Engineering
telegram-bot-builder
by sickn33/antigravity-awesome-skills
flutter-app-size
by flutter/skills
Analyzes and optimizes Flutter application size by measuring build artifacts, generating size analysis reports, utilizing Dart DevTools for component breakdown, and implementing specific size reduction strategies such as debug info splitting, resource compression, and platform-specific tree-shaking. Assumes a...
sharp-edges
by trailofbits/skills
Evaluates whether APIs, configurations, and interfaces are resistant to developer misuse. Identifies designs where the "easy path" leads to insecurity.
iterative-retrieval
by affaan-m/everything-claude-code
Solves the "context problem" in multi-agent workflows where subagents don't know what context they need until they start working.