Distilling Consistent Features in Sparse Autoencoders

Paper #LLM 🔬 Research | Analyzed: January 3, 2026 06:17

Published: December 31, 2025 17:12


Analysis

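This paper addresses feature redundancy and inconsistency in sparse autoencoders (SAEs), which hinder interpretability and reusability. The authors propose a distillation method, Distilled Matryoshka Sparse Autoencoders (DMSAEs), that extracts a compact, consistent core of useful features. It does so through an iterative distillation cycle that uses gradient × activation to measure each feature's contribution and retains only the most important features. The method is validated on Gemma-2-2B, where it shows improved performance and transferability of the learned features.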
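To make the gradient × activation score concrete, here is a minimal sketch in plain Python. It assumes the per-token feature activations and the gradients of the loss with respect to those activations are already available (in practice they would come from an autograd framework). The function name `grad_x_activation`, the toy numbers, and the choice to aggregate absolute values over tokens are illustrative assumptions, not details taken from the paper.

```python
def grad_x_activation(activations, grads):
    """Score each feature by summing |activation * gradient| over tokens.

    activations[t][i]: activation of feature i on token t.
    grads[t][i]: d(loss)/d(activation of feature i) on token t.
    Aggregating with absolute values is an assumption of this sketch.
    """
    n_features = len(activations[0])
    scores = [0.0] * n_features
    for acts_t, grads_t in zip(activations, grads):
        for i, (a, g) in enumerate(zip(acts_t, grads_t)):
            scores[i] += abs(a * g)
    return scores


# Toy data (hypothetical): 2 tokens, 3 features.
activations = [[0.0, 2.0, 1.0],
               [1.0, 0.0, 3.0]]
grads = [[0.5, -1.0, 0.2],
         [0.1, 0.4, -0.5]]

scores = grad_x_activation(activations, grads)
print(scores)
```

A feature that never fires gets a score of zero on that token no matter how large its gradient is, which is why the product is a more useful importance signal for sparse features than the gradient alone.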


Quote / Source


"DMSAEs run an iterative distillation cycle: train a Matryoshka SAE with a shared core, use gradient X activation to measure each feature's contribution to next-token loss in the most nested reconstruction, and keep only the smallest subset that explains a fixed fraction of the attribution."

ArXiv, December 31, 2025 17:12

* Quoted lawfully under Article 32 of the Copyright Act.
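The selection step in the quote above ("keep only the smallest subset that explains a fixed fraction of the attribution") can be sketched as a greedy pick over sorted scores. This is only an illustration under stated assumptions: the scores are taken to be non-negative, and the function name `smallest_subset`, the toy scores, and the 0.75 threshold are hypothetical, not values from the paper.

```python
def smallest_subset(scores, fraction):
    """Return indices of the fewest features whose scores reach
    `fraction` of the total attribution.

    Assumes non-negative scores; taking features in descending order
    then yields the minimum-size subset.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(scores)
    if total <= 0.0:
        return []
    # Visit features from largest to smallest attribution.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, running = [], 0.0
    for i in order:
        kept.append(i)
        running += scores[i]
        if running >= fraction * total:
            break
    return sorted(kept)


# Toy attribution scores (hypothetical) for four features.
toy_scores = [5.0, 1.0, 3.0, 1.0]
core = smallest_subset(toy_scores, 0.75)
print(core)
```

In an iterative cycle like the one the quote describes, the indices returned here would identify the features retained for the next round of training, while the rest are dropped.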
