Paper · #llm · 🔬 Research · Analyzed: Jan 3, 2026 19:14

Stable LLM RL via Dynamic Vocabulary Pruning

Published: Dec 28, 2025 21:44
1 min read
ArXiv

Analysis

This paper addresses the instability in Reinforcement Learning (RL) for Large Language Models (LLMs) caused by the mismatch between training and inference probability distributions, particularly in the tail of the token probability distribution. The authors identify that low-probability tokens in the tail contribute significantly to this mismatch and destabilize gradient estimation. Their proposed solution, dynamic vocabulary pruning, offers a way to mitigate this issue by excluding the extreme tail of the vocabulary, leading to more stable training.
Reference

The authors propose constraining the RL objective to a dynamically-pruned "safe" vocabulary that excludes the extreme tail.
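
Below is a minimal sketch of how a pruned "safe" vocabulary could enter a policy-gradient loss. The pruning rule shown (a fixed per-step probability floor `min_prob`) and the plain REINFORCE-style loss are assumptions for illustration; the paper's actual criterion and objective may differ.

```python
import torch
import torch.nn.functional as F

def masked_log_probs(logits: torch.Tensor, min_prob: float = 1e-5) -> torch.Tensor:
    """Log-probabilities over a pruned vocabulary that drops the extreme tail.

    logits: [batch, vocab] raw policy scores.
    """
    probs = F.softmax(logits, dim=-1)
    keep = probs >= min_prob                      # hypothetical rule: probability floor
    pruned = logits.masked_fill(~keep, float("-inf"))
    return F.log_softmax(pruned, dim=-1)          # renormalize over surviving tokens

def reinforce_loss(logits, actions, advantages, min_prob: float = 1e-5):
    """Policy-gradient loss restricted to the pruned ("safe") vocabulary.

    Assumes sampled actions have probability >= min_prob and thus survive the mask.
    """
    logp = masked_log_probs(logits, min_prob)
    chosen = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return -(chosen * advantages).mean()
```

Renormalizing over the surviving tokens keeps the extreme tail out of the gradient estimate, which is the source of instability the paper targets.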

Paper · #LLM · 🔬 Research · Analyzed: Jan 3, 2026 16:22

Width Pruning in Llama-3: Enhancing Instruction Following by Reducing Factual Knowledge

Published: Dec 27, 2025 18:09
1 min read
ArXiv

Analysis

This paper challenges the common understanding of model pruning by demonstrating that width pruning, guided by the Maximum Absolute Weight (MAW) criterion, can selectively improve instruction-following capabilities while degrading performance on tasks requiring factual knowledge. This suggests that pruning can be used to trade off knowledge for improved alignment and truthfulness, offering a novel perspective on model optimization and alignment.
Reference

Instruction-following capabilities improve substantially (+46% to +75% in IFEval for Llama-3.2-1B and 3B models).
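
As a rough illustration of a Maximum Absolute Weight criterion for width pruning, the sketch below scores each intermediate MLP channel by the largest absolute weight touching it and keeps the top fraction. How the paper applies MAW to Llama-3's gated MLP (which projections are scored, and at what ratio) is an assumption here.

```python
import torch

def maw_channel_scores(w_in: torch.Tensor, w_out: torch.Tensor) -> torch.Tensor:
    """Score each intermediate channel by its largest absolute weight.

    w_in:  [intermediate, d_model]  projects into the channel
    w_out: [d_model, intermediate]  projects out of the channel
    """
    return torch.maximum(w_in.abs().amax(dim=1), w_out.abs().amax(dim=0))

def prune_mlp_width(w_in: torch.Tensor, w_out: torch.Tensor, keep_ratio: float = 0.75):
    """Drop the lowest-scoring channels, shrinking the MLP's width."""
    scores = maw_channel_scores(w_in, w_out)
    k = int(keep_ratio * scores.numel())
    keep = scores.topk(k).indices.sort().values   # preserve original channel order
    return w_in[keep, :], w_out[:, keep]
```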

Research · #llm · 🔬 Research · Analyzed: Dec 25, 2025 09:28

Data-Free Pruning of Self-Attention Layers in LLMs

Published: Dec 25, 2025 05:00
1 min read
ArXiv ML

Analysis

This paper introduces Gate-Norm, a novel method for pruning self-attention layers in large language models (LLMs) without requiring any training data. The core idea appears to be a norm-based importance score that ranks attention sublayers using only the model's weights, so the least important sublayers can be removed without any calibration data.
Reference

Pruning 8-16 attention sublayers yields up to 1.30× higher inference throughput while keeping average zero-shot accuracy within 2% of the unpruned baseline.
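
A data-free layer-pruning loop might look like the sketch below: rank attention sublayers by a weight-only norm and remove the lowest-ranked ones. Using the output projection's norm as the score, and the `self_attn.o_proj` module layout, are placeholder assumptions for illustration, not Gate-Norm's actual definition.

```python
import torch
from torch import nn

def attention_sublayer_scores(model: nn.Module) -> dict:
    """Score each attention sublayer from its weights alone (no calibration data)."""
    scores = {}
    for name, module in model.named_modules():
        # Hypothetical layout: Llama-style blocks expose `self_attn.o_proj`.
        if name.endswith("self_attn") and hasattr(module, "o_proj"):
            scores[name] = module.o_proj.weight.norm().item()
    return scores

def sublayers_to_prune(scores: dict, n_prune: int = 8) -> list:
    """Return the n lowest-scoring attention sublayers (the paper prunes 8-16)."""
    return sorted(scores, key=scores.get)[:n_prune]
```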

Research · #LLM Pruning · 🔬 Research · Analyzed: Jan 10, 2026 10:59

OPTIMA: Efficient LLM Pruning with Quadratic Programming

Published: Dec 15, 2025 20:41
1 min read
ArXiv

Analysis

This research introduces OPTIMA, a method for pruning Large Language Models (LLMs) that uses quadratic programming for the reconstruction step of compression. Casting reconstruction as a quadratic program suggests a mathematically principled and potentially efficient approach to model compression.
Reference

OPTIMA utilizes Quadratic Programming Reconstruction for LLM pruning.
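
To make the reconstruction idea concrete, the sketch below shows the unconstrained special case: after choosing which weights of an output channel survive, re-fit them by least squares so the layer's output on calibration activations is preserved. The objective's Hessian is positive semi-definite, so this is a (trivial) quadratic program; OPTIMA's actual formulation and constraints are not reproduced here.

```python
import numpy as np

def reconstruct_column(X: np.ndarray, w: np.ndarray, keep: np.ndarray) -> np.ndarray:
    """Re-fit the surviving weights of one output channel.

    X:    [n, d] calibration activations
    w:    [d]    original weights
    keep: [d]    boolean mask of weights that survive pruning
    """
    target = X @ w                           # layer output we want to preserve
    Xk = X[:, keep]                          # columns of the surviving weights
    # minimize ||Xk v - target||^2  -- a QP whose Hessian is Xk.T @ Xk
    v, *_ = np.linalg.lstsq(Xk, target, rcond=None)
    w_new = np.zeros_like(w)
    w_new[keep] = v
    return w_new
```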

Research · #llm · 📝 Blog · Analyzed: Dec 28, 2025 21:57

Jonathan Frankle: Neural Network Pruning and Training

Published: Apr 10, 2023 21:47
1 min read
Weights & Biases

Analysis

This article summarizes a discussion between Jonathan Frankle and Lukas Biewald on the Gradient Dissent podcast, centered on neural network pruning, training, and the "Lottery Ticket Hypothesis." The conversation likely covers the techniques and challenges of shrinking neural networks while maintaining or improving performance, how to train pruned networks effectively, and the hypothesis itself, which posits that a large, randomly initialized network contains a subnetwork (a "winning ticket") that can match the full network's performance when trained in isolation from its original initialization. Practical applications and recent research advances in the area are probably discussed as well.
Reference

The article doesn't contain a direct quote, but the discussion likely revolves around pruning techniques, training methodologies, and the Lottery Ticket Hypothesis.
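
For reference, a compact sketch of the iterative magnitude pruning with rewinding that underlies the Lottery Ticket Hypothesis is given below; `train` is a stand-in for any training loop, and the pruning fraction and number of rounds are illustrative.

```python
import copy
import torch
from torch import nn

def find_winning_ticket(model: nn.Module, train, prune_frac: float = 0.2, rounds: int = 5):
    """Iterative magnitude pruning with rewinding to the original initialization."""
    init_state = copy.deepcopy(model.state_dict())             # theta_0
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train(model, masks)                                    # placeholder training loop
        with torch.no_grad():
            for n, p in model.named_parameters():
                alive = p[masks[n].bool()].abs()
                if alive.numel() == 0:
                    continue
                thresh = torch.quantile(alive, prune_frac)     # cut the smallest survivors
                masks[n] *= (p.abs() > thresh).float()
            model.load_state_dict(init_state)                  # rewind weights to theta_0
            for n, p in model.named_parameters():
                p.mul_(masks[n])                               # re-apply the sparsity mask
    return model, masks
```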