Width Pruning in Llama-3: Enhancing Instruction Following by Reducing Factual Knowledge
Analysis
This paper challenges the common understanding of model pruning by demonstrating that width pruning, guided by the Maximum Absolute Weight (MAW) criterion, can selectively improve instruction-following capabilities while degrading performance on tasks requiring factual knowledge. This suggests that pruning can trade factual knowledge for improved instruction adherence and truthfulness, offering a novel lever for model optimization and alignment.
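To make the MAW criterion concrete, below is a minimal PyTorch sketch of width pruning for a GLU-style Llama MLP block. It assumes MAW scores each intermediate channel by the largest absolute weight among that channel's fan-in and fan-out across the gate, up, and down projections; the function names and `keep_ratio` are illustrative, not the paper's actual code.

```python
import torch
import torch.nn as nn

def maw_scores(gate_proj: nn.Linear, up_proj: nn.Linear,
               down_proj: nn.Linear) -> torch.Tensor:
    """Score each intermediate channel by its maximum absolute weight.

    Shapes follow the Llama convention:
      gate_proj, up_proj: [intermediate, hidden]
      down_proj:          [hidden, intermediate]
    """
    score_gate = gate_proj.weight.abs().max(dim=1).values   # [intermediate]
    score_up   = up_proj.weight.abs().max(dim=1).values     # [intermediate]
    score_down = down_proj.weight.abs().max(dim=0).values   # [intermediate]
    return torch.maximum(torch.maximum(score_gate, score_up), score_down)

def prune_mlp_width(gate_proj, up_proj, down_proj, keep_ratio: float = 0.8):
    """Keep the top `keep_ratio` fraction of intermediate channels by MAW score."""
    scores = maw_scores(gate_proj, up_proj, down_proj)
    n_keep = int(scores.numel() * keep_ratio)
    keep = scores.topk(n_keep).indices.sort().values  # preserve channel order

    def slice_linear(lin: nn.Linear, dim: int) -> nn.Linear:
        # dim=0 prunes output channels; dim=1 prunes input channels.
        w = lin.weight.index_select(dim, keep)
        new = nn.Linear(w.shape[1], w.shape[0], bias=lin.bias is not None)
        new.weight.data.copy_(w)
        if lin.bias is not None:
            new.bias.data.copy_(
                lin.bias if dim == 1 else lin.bias.index_select(0, keep))
        return new

    return (slice_linear(gate_proj, 0),
            slice_linear(up_proj, 0),
            slice_linear(down_proj, 1))
```

Because the same channel indices are removed from all three projections, the pruned block remains a valid (narrower) MLP with the original hidden dimension at its input and output.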
Key Takeaways
- Width pruning guided by MAW reveals a dichotomy: factual knowledge degrades while instruction following improves.
- The expansion ratio (MLP intermediate width relative to hidden width) is a critical architectural parameter that modulates these cognitive capabilities; a concrete illustration follows this list.
- An inverse correlation between factual knowledge and truthfulness is observed.
- Pruned configurations offer energy-efficiency gains but may impact latency in single-request scenarios.
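The expansion-ratio takeaway can be illustrated with a small calculation: width pruning removes intermediate MLP channels, which directly lowers this ratio. The dimensions below are Llama-3.2-1B's published configuration (hidden size 2048, intermediate size 8192); the pruned width is an illustrative example, not a configuration reported in the paper.

```python
def expansion_ratio(intermediate_size: int, hidden_size: int) -> float:
    """Expansion ratio = MLP intermediate width / model hidden width."""
    return intermediate_size / hidden_size

# Llama-3.2-1B ships with hidden_size=2048, intermediate_size=8192.
print(expansion_ratio(8192, 2048))         # 4.0 (unpruned)

# Width-pruning 25% of intermediate channels lowers the ratio to 3.0.
print(expansion_ratio(8192 - 2048, 2048))  # 3.0 (hypothetical pruned config)
```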
Reference
“Instruction-following capabilities improve substantially (+46% to +75% in IFEval for Llama-3.2-1B and 3B models).”