Improving LLM Pruning Generalization with Function-Aware Grouping
Published: Dec 28, 2025 17:26 • ArXiv
Analysis
This paper addresses the limited generalization of post-training structured pruning in Large Language Models (LLMs). It proposes a novel framework, Function-Aware Neuron Grouping (FANG), to mitigate calibration bias and improve downstream task accuracy. The core idea is to group neurons by their functional roles and prune each group independently, with calibration tokens that correlate with a group's function weighted more heavily in its importance scores. Adaptive sparsity allocation based on each group's functional complexity is another key contribution. The results show improved downstream accuracy over existing methods, making this a valuable contribution to LLM compression.
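As a rough illustration of the mechanism described above, the sketch below groups neurons by clustering their calibration-time activation profiles and scores each group with token weights biased toward tokens that activate that group strongly. The clustering choice (k-means), the token-weighting proxy, the FLAP/Wanda-style score, and all function names are assumptions for illustration, not the paper's actual FANG procedure.

```python
# Minimal sketch of function-aware grouping and token-weighted pruning scores.
# All names and design choices here are hypothetical illustrations.
import numpy as np
from sklearn.cluster import KMeans


def group_neurons(activations: np.ndarray, n_groups: int) -> np.ndarray:
    """Cluster neurons by their activation profiles over calibration tokens.

    activations: (n_tokens, n_neurons) calibration activations.
    Returns an array of length n_neurons with a group id per neuron.
    """
    profiles = activations.T  # each neuron described by its response across tokens
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(profiles)


def weighted_importance(activations, weights, groups, group_id):
    """Importance of each neuron in one group, weighting calibration tokens
    that correlate with the group's function more heavily."""
    idx = np.where(groups == group_id)[0]
    group_act = activations[:, idx]                 # (n_tokens, |group|)
    # Token weight: how strongly the token activates this group overall
    # (an assumed proxy for "correlated with the group's function").
    token_w = np.abs(group_act).mean(axis=1)
    token_w /= token_w.sum() + 1e-8
    # Weighted activation norm times weight magnitude (FLAP/Wanda-style score).
    act_score = np.sqrt((token_w[:, None] * group_act**2).sum(axis=0))
    return act_score * np.abs(weights[idx]).sum(axis=1)  # weights: (n_neurons, d_out)


def prune_group(scores, group_sparsity):
    """Indices of the lowest-scoring neurons to drop within one group."""
    k = int(len(scores) * group_sparsity)
    return np.argsort(scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(512, 256))   # 512 calibration tokens, 256 neurons
    w = rng.normal(size=(256, 1024))     # per-neuron output weights
    groups = group_neurons(acts, n_groups=4)
    for g in range(4):
        scores = weighted_importance(acts, w, groups, g)
        dropped = prune_group(scores, 0.3)
        print(f"group {g}: prune {len(dropped)}/{(groups == g).sum()} neurons")
```

Pruning each group with its own token weighting, rather than one global calibration statistic, is what lets the score reflect a group's function instead of whatever the calibration set happens to emphasize.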
Key Takeaways
- Proposes Function-Aware Neuron Grouping (FANG) for improved LLM pruning.
- Addresses generalization issues caused by calibration bias.
- Groups neurons based on semantic context and functional roles.
- Achieves state-of-the-art results, outperforming existing pruning methods (FLAP, OBC).
- Improves downstream accuracy while preserving language modeling performance.
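The adaptive sparsity allocation mentioned in the analysis above could, in principle, look like the following sketch: per-group sparsity is set inversely to a complexity proxy (here, the effective rank of the group's activation matrix) and rescaled to meet a global budget. The complexity measure, the inverse-proportional rule, and the function name are assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of adaptive sparsity allocation by functional complexity.
import numpy as np


def allocate_sparsity(group_activations, target_sparsity):
    """group_activations: list of (n_tokens, group_size) arrays, one per group.

    Returns per-group sparsity levels whose size-weighted average matches
    target_sparsity (up to clipping into [0, 1])."""
    complexities = []
    for act in group_activations:
        # Effective rank via the entropy of normalized singular values:
        # an assumed proxy for the group's functional complexity.
        s = np.linalg.svd(act, compute_uv=False)
        p = s / (s.sum() + 1e-8)
        complexities.append(np.exp(-(p * np.log(p + 1e-12)).sum()))
    c = np.asarray(complexities)
    sizes = np.array([a.shape[1] for a in group_activations], dtype=float)
    # More complex groups are pruned less: sparsity inversely proportional to
    # complexity, rescaled so the size-weighted mean hits the global target.
    raw = 1.0 / c
    raw *= target_sparsity * sizes.sum() / (raw * sizes).sum()
    return np.clip(raw, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    groups = [rng.normal(size=(512, n)) for n in (64, 96, 96)]
    print(allocate_sparsity(groups, target_sparsity=0.3))
```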
Reference
“FANG outperforms FLAP and OBC by 1.5%–8.5% in average accuracy under 30% and 40% sparsity.”