Entropy-Guided Token Dropout for LLMs with Limited Data
Published: Dec 29, 2025 12:35 · 1 min read · ArXiv
Analysis
This paper addresses overfitting in autoregressive language models trained on limited, domain-specific data. It observes that low-entropy tokens are learned too quickly during multi-epoch training, which hinders the model's ability to generalize on high-entropy tokens. The proposed solution, EntroDrop, is a regularization technique that selectively masks low-entropy tokens during training, improving performance and robustness.
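The paper's exact implementation is not reproduced here, but the core mechanism, excluding the lowest-entropy tokens from the training loss, can be sketched roughly as follows. This is a minimal illustration assuming a PyTorch training loop; the function name, thresholding scheme, and drop rate are placeholders rather than the paper's actual choices.

```python
import torch
import torch.nn.functional as F

def entropy_guided_loss(logits, targets, drop_rate=0.2):
    """Cross-entropy loss that drops the lowest-entropy tokens (illustrative sketch).

    logits:    (batch, seq_len, vocab) raw model outputs
    targets:   (batch, seq_len) next-token ids
    drop_rate: fraction of lowest-entropy tokens excluded from the loss
    """
    # Per-token predictive entropy H = -sum(p * log p) over the vocabulary.
    # Computed without gradients: the entropy only selects tokens, it is not optimized.
    with torch.no_grad():
        log_probs = F.log_softmax(logits, dim=-1)
        entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (batch, seq_len)

    # Per-token cross-entropy, no reduction yet.
    token_loss = F.cross_entropy(
        logits.flatten(0, 1), targets.flatten(), reduction="none"
    ).view_as(targets)

    # Keep the (1 - drop_rate) highest-entropy tokens; mask out the rest.
    k = max(1, int((1.0 - drop_rate) * entropy.numel()))
    threshold = torch.topk(entropy.flatten(), k).values.min()
    keep = (entropy >= threshold).float()

    return (token_loss * keep).sum() / keep.sum().clamp(min=1.0)
```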
Key Takeaways
- Addresses overfitting in autoregressive language models trained on limited data.
- Introduces EntroDrop, an entropy-guided token dropout method.
- EntroDrop selectively masks low-entropy tokens to improve generalization.
- Experiments show consistent performance improvements over standard regularization baselines.
- Offers a promising approach for adapting LLMs in data-constrained domains.
Reference
“EntroDrop selectively masks low-entropy tokens during training and employs a curriculum schedule to adjust regularization strength in alignment with training progress.”
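The curriculum schedule referenced in the quote is not detailed here. As one illustrative possibility, the drop rate could start at zero and ramp up as training progresses; the linear shape and hyperparameters below are assumptions for illustration, not the paper's schedule.

```python
def drop_rate_schedule(step, total_steps, max_drop=0.3, warmup_frac=0.1):
    """Illustrative curriculum: no masking during an initial warmup,
    then ramp the drop rate linearly toward max_drop."""
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        return 0.0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_drop * min(1.0, progress)
```

A schedule like this would be plugged into the loss above, e.g. `entropy_guided_loss(logits, targets, drop_rate=drop_rate_schedule(step, total_steps))`, so that regularization strength tracks training progress.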