Data-Free Pruning of Self-Attention Layers in LLMs

Research #llm 🔬 | Analysis: December 25, 2025 09:28

Published: December 25, 2025 05:00


1 min read

Analysis

This paper introduces Gate-Norm, a method for pruning self-attention layers in large language models (LLMs) without requiring any training data. The core idea revolves around the…


Quote / Source


"Pruning 8-16 attention sublayers yields up to 1.30× higher inference throughput while keeping average zero-shot accuracy within 2% of the unpruned baseline."

ArXiv ML, December 25, 2025 05:00

* Quoted lawfully under Article 32 of the Copyright Act.
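The summary does not say how Gate-Norm decides which sublayers to drop, so the following is only a minimal sketch of the generic mechanism behind the quoted result, under stated assumptions: a pre-norm decoder block in which removing the attention sublayer reduces its residual branch to the identity while the MLP sublayer keeps running. The NumPy toy model, the helper names (`make_layer`, `block`, `forward`), and the hand-picked `pruned` set are hypothetical illustrations, not the paper's code or its selection criterion.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean / unit variance (no learned affine, for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(x, wq, wk, wv, wo):
    # Single-head causal self-attention over a (seq_len, d_model) input.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # mask tokens after the query
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ wo

def mlp(x, w1, w2):
    # Two-layer ReLU feed-forward sublayer.
    return np.maximum(x @ w1, 0.0) @ w2

def make_layer(rng, d_model, d_ff=None):
    # Hypothetical random weights, scaled so activations stay O(1).
    d_ff = d_ff or 4 * d_model
    s = 1.0 / np.sqrt(d_model)
    attn = tuple(rng.standard_normal((d_model, d_model)) * s for _ in range(4))
    ffn = (rng.standard_normal((d_model, d_ff)) * s,
           rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff))
    return {"attn": attn, "mlp": ffn}

def block(x, params, prune_attention):
    # Pre-norm residual block. A pruned attention sublayer contributes nothing,
    # so only its residual (identity) path remains; the MLP sublayer still runs.
    if not prune_attention:
        x = x + attention(layer_norm(x), *params["attn"])
    return x + mlp(layer_norm(x), *params["mlp"])

def forward(x, layers, pruned):
    # `pruned` is the set of layer indices whose attention sublayer is removed.
    for i, params in enumerate(layers):
        x = block(x, params, prune_attention=i in pruned)
    return x

rng = np.random.default_rng(0)
layers = [make_layer(rng, d_model=16) for _ in range(4)]
tokens = rng.standard_normal((5, 16))
full = forward(tokens, layers, pruned=set())
pruned = forward(tokens, layers, pruned={1, 3})  # drop attention in layers 1 and 3
print(full.shape, pruned.shape)
```

Each pruned layer skips the whole attention computation, including the quadratic `q @ k.T` term, which is where the inference savings come from; how much end-to-end throughput that buys depends on the model and hardware. The quoted figure of up to 1.30× is the paper's measurement, not something this toy reproduces.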


