
sparse-autoencoder-training

davila7/claude-code-templates

SAELens is the primary library for training and analyzing Sparse Autoencoders (SAEs), a technique for decomposing polysemantic neural network activations into sparse, interpretable features. It is based on Anthropic's research on monosemanticity.

Installs148
Install command
npx skills add https://github.com/davila7/claude-code-templates --skill sparse-autoencoder-training
About this skill
GitHub: jbloomAus/SAELens (1,100+ stars)

Why SAEs?

Individual neurons in neural networks are polysemantic: they activate in multiple, semantically distinct contexts. This happens because models use superposition to represent more features than they have neurons, which makes interpretability difficult. SAEs address this by decomposing dense activations into sparse, monosemantic features; typically only a small number of features activate for any given input, and each feature corresponds to an interpretable concept.

Use SAELens when you need to:
- Discover interpretable features in model activations
- Understand what concepts a model has...

Consider alternatives when: ...

Requirements: Python 3.10+, transformer-lens>=2.0.0

How it works

SAEs are trained to reconstruct model activations through a sparse bottleneck.

Loss function: MSE(original, reconstructed) + L1_coefficient × L1(features)

In "Towards Monosemanticity", human evaluators found 70% of SAE features genuinely interpretable. Features discovered include: ...

Browse pre-trained SAE features at neuronpedia.org. For detailed API documentation, tutorials, and advanced usage, see the references/ folder.
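The loss function above (MSE reconstruction error plus an L1 sparsity penalty on the feature activations) can be sketched in plain PyTorch. This is an illustrative implementation under assumed dimensions, not SAELens's actual API; the class name, function name, and hyperparameter values here are hypothetical.

```python
# Minimal SAE sketch: dense activations -> sparse features -> reconstruction.
# Hypothetical names and dimensions, for illustration only (not SAELens code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_sae)  # dense activations -> feature space
        self.decoder = nn.Linear(d_sae, d_model)  # features -> reconstructed activations

    def forward(self, acts: torch.Tensor):
        features = F.relu(self.encoder(acts))     # non-negative feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(acts, recon, features, l1_coefficient: float = 1e-3):
    mse = F.mse_loss(recon, acts)                 # reconstruction term
    l1 = features.abs().sum(dim=-1).mean()        # L1 sparsity penalty on features
    return mse + l1_coefficient * l1

# Example: a batch of 32 activation vectors from a d_model=768 model,
# with an assumed expansion factor of 8 for the SAE dictionary.
sae = SparseAutoencoder(d_model=768, d_sae=768 * 8)
acts = torch.randn(32, 768)
recon, features = sae(acts)
loss = sae_loss(acts, recon, features)
```

In practice the L1 coefficient trades off reconstruction fidelity against sparsity, and SAELens handles activation collection, training loops, and resampling of dead features for you.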

Source description provided by the upstream skill listing. Community reviews and install context appear in the sections below.

Community Reviews

Latest reviews


No community reviews yet. Be the first to review.

FAQ
What does sparse-autoencoder-training do?

SAELens is the primary library for training and analyzing Sparse Autoencoders (SAEs), a technique for decomposing polysemantic neural network activations into sparse, interpretable features. It is based on Anthropic's research on monosemanticity.

Is sparse-autoencoder-training good?

sparse-autoencoder-training does not have approved reviews yet, so SkillJury cannot publish a community verdict.

What agent does sparse-autoencoder-training work with?

sparse-autoencoder-training currently lists compatibility with codex, gemini-cli, opencode, cursor, claude-code, antigravity.

What are alternatives to sparse-autoencoder-training?

Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, iterative-retrieval.

How do I install sparse-autoencoder-training?

npx skills add https://github.com/davila7/claude-code-templates --skill sparse-autoencoder-training

Related skills

More from davila7/claude-code-templates

Alternatives in Software Engineering