Stable LLM RL via Dynamic Vocabulary Pruning
Published: Dec 28, 2025 21:44 • 1 min read • ArXiv
Analysis
This paper addresses instability in Reinforcement Learning (RL) for Large Language Models (LLMs) caused by the mismatch between training-time and inference-time token probability distributions. The authors identify the low-probability tokens in the distribution's tail as a major source of this mismatch and of unstable gradient estimates. Their proposed solution, dynamic vocabulary pruning, mitigates the issue by excluding the extreme tail of the vocabulary from the RL objective, leading to more stable training.
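To make the idea concrete, here is a minimal sketch of how such pruning could be applied in a policy-gradient update, assuming a PyTorch-style policy with per-step logits. The thresholding rule and names such as `tail_prob_threshold` and `pruned_log_probs` are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of dynamic vocabulary pruning for a policy-gradient update.
# The thresholding rule and the hyperparameter `tail_prob_threshold` are
# illustrative assumptions, not the paper's exact procedure.
import torch
import torch.nn.functional as F


def pruned_log_probs(logits: torch.Tensor,
                     actions: torch.Tensor,
                     tail_prob_threshold: float = 1e-5) -> torch.Tensor:
    """Log-probs of sampled tokens after dropping the extreme tail of the
    vocabulary and renormalizing over the remaining 'safe' tokens.

    logits:  [batch, vocab]  per-step token logits from the policy
    actions: [batch]         sampled token ids
    """
    probs = F.softmax(logits, dim=-1)
    safe_mask = probs >= tail_prob_threshold  # exclude the extreme tail
    # Always keep the sampled token so its renormalized log-prob is defined.
    safe_mask = safe_mask | F.one_hot(actions, num_classes=logits.size(-1)).bool()
    masked_logits = logits.masked_fill(~safe_mask, float("-inf"))
    log_probs = F.log_softmax(masked_logits, dim=-1)  # renormalized over safe set
    return log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)


def policy_gradient_loss(logits: torch.Tensor,
                         actions: torch.Tensor,
                         advantages: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss computed on the pruned ('safe') vocabulary."""
    logp = pruned_log_probs(logits, actions)
    return -(advantages * logp).mean()
```

The key design point the sketch illustrates is that the pruning is dynamic: the safe set is recomputed from the current per-step distribution rather than fixed in advance, so the excluded tail changes from token to token.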
Key Takeaways
- Addresses the training-inference mismatch problem in LLM RL.
- Identifies the tail of the token probability distribution as a key source of instability.
- Proposes dynamic vocabulary pruning to stabilize training.
- Provides a theoretical bound on the optimization bias introduced by pruning.
Reference
“The authors propose constraining the RL objective to a dynamically-pruned ‘safe’ vocabulary that excludes the extreme tail.”
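One plausible way to write the constrained objective described in the quote, with all notation (policy, safe vocabulary, advantage estimate) assumed for illustration rather than taken from the paper:

```latex
% Illustrative notation (assumed, not quoted from the paper):
% \pi_\theta        -- the policy being trained
% V_t^{\mathrm{safe}} -- the dynamically-pruned "safe" vocabulary at step t
% \hat{A}_t         -- an advantage (or reward) estimate for step t
\begin{align}
  \tilde{\pi}_\theta(y \mid y_{<t})
    &= \frac{\pi_\theta(y \mid y_{<t})}
            {\sum_{y' \in V_t^{\mathrm{safe}}} \pi_\theta(y' \mid y_{<t})},
       \qquad y \in V_t^{\mathrm{safe}}, \\
  \mathcal{J}(\theta)
    &= \mathbb{E}\Big[\sum_t \hat{A}_t \,
       \log \tilde{\pi}_\theta(y_t \mid y_{<t})\Big].
\end{align}
```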