
moe-training

davila7/claude-code-templates

Installs: 181
Install command
npx skills add https://github.com/davila7/claude-code-templates --skill moe-training
Security audits
Gen Agent Trust Hub: PASS
Socket: PASS
Snyk: WARN
About this skill
Use MoE Training when you need to:
- Train larger models with limited compute (5× cost reduction vs dense models)
- Scale model capacity without a proportional compute increase
- Achieve better performance per compute budget than dense models
- Specialize experts for different domains, tasks, or languages
- Reduce inference latency with sparse activation (only 13B of 47B params active in Mixtral)
- Implement SOTA models like Mixtral 8x7B, DeepSeek-V3, Switch Transformers

Notable MoE Models: Mixtral 8x7B (Mistral AI), DeepSeek-V3, Switch Transformers (Google), GLaM (Google), NLLB-MoE (Meta)

Key Components:
- Experts: multiple specialized FFN networks (typically 8-128)
- Router/Gate: learned network that selects which experts to use
- Top-k Routing: activate only k experts per token (k=1 or k=2)
- Load Balancing: ensure even expert utilization

Routing and training techniques covered: Top-1 Routing (Switch Transformer), Top-2 Routing (Mixtral), Expert Choice Routing, Auxiliary Loss, Router Z-Loss (stability)

External resources: DeepSpeed MoE Tutorial, Mixtral Paper, Switch Transformers, HuggingFace MoE Guide, NVIDIA MoE Blog

Bundled reference files:
- references/architectures.md - MoE model architectures (Mixtral, Switch, DeepSeek-V3)
- references/training.md - Advanced training techniques and optimization
- references/inference.md - Production deployment and serving patterns

Source description provided by the upstream skill listing. Community reviews and install context appear in the sections below.

Community Reviews

Latest reviews


No community reviews yet. Be the first to review.

FAQ
What does moe-training do?

moe-training is listed in SkillJury, but the source summary is still sparse.

Is moe-training good?

moe-training does not have approved reviews yet, so SkillJury cannot publish a community verdict.

What agent does moe-training work with?

moe-training currently lists compatibility with codex, gemini-cli, opencode, cursor, github-copilot, claude-code.

What are alternatives to moe-training?

Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, iterative-retrieval.

How do I install moe-training?

npx skills add https://github.com/davila7/claude-code-templates --skill moe-training

Related skills

More from davila7/claude-code-templates


Alternatives in Software Engineering