Stable LLM RL via Dynamic Vocabulary Pruning
Analysis
Key Takeaways
- Addresses the training-inference mismatch problem in LLM RL.
- Identifies the tail of the token probability distribution as a key source of instability.
- Proposes dynamic vocabulary pruning as a solution to stabilize training.
- Offers a theoretical bound on the optimization bias introduced by pruning.
“The authors propose constraining the RL objective to a dynamically pruned ‘safe’ vocabulary that excludes the extreme tail.”
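To make the idea concrete, here is a minimal sketch of one plausible form of dynamic vocabulary pruning: nucleus-style masking that keeps the smallest set of tokens covering a target probability mass and assigns the extreme tail a logit of negative infinity. The function name, the `keep_mass` threshold, and the masking scheme are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def prune_vocab(logits, keep_mass=0.999):
    """Mask the extreme tail of a token distribution (illustrative sketch).

    Keeps the smallest set of tokens whose cumulative probability
    reaches `keep_mass`; all other tokens get -inf logits, so they
    contribute nothing to a downstream RL objective.
    """
    # Softmax with a max-shift for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by descending probability and find the cutoff.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, keep_mass) + 1]
    # Tail tokens are excluded via -inf logits; kept logits pass through.
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    return masked

logits = np.array([5.0, 3.0, 1.0, -4.0, -9.0])
pruned = prune_vocab(logits)  # the two lowest-probability tokens are masked
```

Restricting the objective to this pruned set trades a small, bounded optimization bias (the excluded mass is at most `1 - keep_mass`) for gradients that never depend on near-zero-probability tail tokens, which is the stability mechanism the summary describes.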