Data-Free Pruning of Self-Attention Layers in LLMs

Research (LLM) | Analyzed: Dec 25, 2025 09:28
Published: Dec 25, 2025 05:00
ArXiv ML

Analysis

This paper introduces Gate-Norm, a method for pruning self-attention layers in large language models (LLMs) without requiring any training data. The core idea revolves around the …
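The summary breaks off before it states Gate-Norm's actual criterion. The following Python sketch therefore only illustrates the general shape of data-free sublayer pruning: score each attention sublayer from its weights alone, drop the k lowest-scoring ones, and let the residual stream pass through unchanged at those layers. It assumes a residual block structure. The scoring rule, the Frobenius norm of W_Q W_K^T, and all function names are hypothetical stand-ins chosen for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Set, Tuple

Matrix = List[List[float]]


def matmul_transpose(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b.T for dense row-major matrices."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def frobenius_norm(m: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return sum(v * v for row in m for v in row) ** 0.5


def score_attention_sublayer(w_q: Matrix, w_k: Matrix) -> float:
    """Hypothetical weight-only importance score: ||W_Q W_K^T||_F.

    Uses no input data. This is an illustrative stand-in, not Gate-Norm itself.
    """
    return frobenius_norm(matmul_transpose(w_q, w_k))


def select_sublayers_to_prune(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k lowest-scoring sublayers, in layer order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k])


def forward(
    x: float,
    layers: Sequence[Tuple[Callable[[float], float], Callable[[float], float]]],
    pruned: Set[int],
) -> float:
    """Toy residual stack in which a pruned attention sublayer becomes the identity."""
    for i, (attn, mlp) in enumerate(layers):
        if i not in pruned:
            x = x + attn(x)  # attention residual branch kept
        x = x + mlp(x)  # MLP sublayer is never pruned in this sketch
    return x


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    w_qs = [eye, [[0.1, 0.0], [0.0, 0.1]], [[2.0, 0.0], [0.0, 2.0]], [[0.5, 0.0], [0.0, 0.0]]]
    scores = [score_attention_sublayer(w_q, eye) for w_q in w_qs]
    pruned = set(select_sublayers_to_prune(scores, k=2))
    print(pruned)
```

On these toy weights the sublayers at indices 1 and 3 score lowest (about 0.14 and 0.5), so they are the ones selected. Selection needs no calibration inputs and touches each weight matrix once, which is the practical appeal of a data-free criterion.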
Reference / Citation
"Pruning 8–16 attention sublayers yields up to 1.30× higher inference throughput while keeping average zero-shot accuracy within 2% of the unpruned baseline."
* Cited for critical analysis under Article 32.
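As a rough sanity check on the quoted throughput figure, Amdahl's law gives the speedup from eliminating a fraction p of per-token runtime as 1/(1 − p). The sketch below makes a hypothetical assumption: every attention sublayer costs the same share of runtime. It ignores effects such as reduced KV-cache traffic. It is an order-of-magnitude model, not a reconstruction of the paper's measurements, and the function names are illustrative.

```python
def amdahl_speedup(removed_fraction: float) -> float:
    """Speedup when `removed_fraction` of per-token runtime is eliminated (Amdahl's law)."""
    if not 0.0 <= removed_fraction < 1.0:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - removed_fraction)


def pruned_attention_speedup(num_pruned: int, num_layers: int, attn_share: float) -> float:
    """Estimated speedup from pruning `num_pruned` of `num_layers` attention sublayers.

    Hypothetical assumption: attention takes `attn_share` of total runtime,
    split evenly across layers, and a pruned sublayer costs nothing.
    """
    return amdahl_speedup(attn_share * num_pruned / num_layers)


def required_removed_fraction(target_speedup: float) -> float:
    """Fraction of runtime that must be eliminated to reach `target_speedup`."""
    return 1.0 - 1.0 / target_speedup


if __name__ == "__main__":
    print(round(required_removed_fraction(1.30), 3))
    for k in (8, 16):
        print(k, round(pruned_attention_speedup(k, num_layers=32, attn_share=0.5), 2))
```

Under this model, a 1.30× throughput gain corresponds to eliminating roughly 23% of runtime (1 − 1/1.30 ≈ 0.231). For a purely illustrative 32-layer model whose attention accounts for half of runtime, pruning 8 or 16 sublayers would give about 1.14× or 1.33×.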