
Data-Free Pruning of Self-Attention Layers in LLMs

Published: Dec 25, 2025 05:00
1 min read
ArXiv ML

Analysis

This paper introduces Gate-Norm, a novel method for pruning self-attention layers in large language models (LLMs) without requiring any training data. The core idea revolves around a data-free importance score, computed directly from each attention sublayer's own weights, which identifies the sublayers that can be removed with minimal impact on downstream accuracy.
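
The summary does not spell out how the Gate-Norm score itself is computed, so the snippet below is only a minimal PyTorch sketch of the general recipe under an assumed scoring rule: rank each attention sublayer by a statistic of its own weights (here, hypothetically, the Frobenius norm of its output projection) and disable the lowest-ranked sublayers so they are skipped at inference time. The names `Block`, `attention_scores`, and `prune_attention` are illustrative, not from the paper.

```python
# Hypothetical sketch of data-free attention-sublayer pruning in the spirit of Gate-Norm.
# ASSUMPTION: the importance score is the Frobenius norm of each attention sublayer's
# output projection; the paper's actual scoring rule may differ.
import torch
import torch.nn as nn


class Block(nn.Module):
    """A minimal pre-norm transformer block: attention sublayer + MLP sublayer."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.ln_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln_mlp = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.attn_enabled = True  # pruning flag: skip the attention sublayer when False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.attn_enabled:
            h = self.ln_attn(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln_mlp(x))
        return x


def attention_scores(blocks: list[Block]) -> list[float]:
    """Data-free importance score per attention sublayer (assumed: output-projection norm)."""
    return [b.attn.out_proj.weight.norm().item() for b in blocks]


def prune_attention(blocks: list[Block], n_prune: int) -> list[int]:
    """Disable the n_prune attention sublayers with the smallest scores."""
    scores = attention_scores(blocks)
    pruned = sorted(range(len(blocks)), key=lambda i: scores[i])[:n_prune]
    for i in pruned:
        blocks[i].attn_enabled = False
    return pruned


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.ModuleList(Block() for _ in range(24))
    pruned = prune_attention(list(model), n_prune=8)  # e.g. drop 8 of 24 attention sublayers
    x = torch.randn(2, 16, 64)
    for block in model:
        x = block(x)
    print("pruned attention sublayers:", sorted(pruned), "output shape:", tuple(x.shape))
```

Skipping a disabled sublayer removes its attention computation entirely, which is where the reported throughput gain would come from; the number of sublayers to drop (8 to 16 in the paper's experiments) is the main knob.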
Reference

Pruning 8–16 attention sublayers yields up to 1.30× higher inference throughput while keeping average zero-shot accuracy within 2% of the unpruned baseline.