Focal Loss for LLMs: An Untapped Potential or a Hidden Pitfall?
Analysis
Key Takeaways
- Focal loss is designed to address class imbalance by focusing training on hard examples (see the formula after this list).
- LLM training involves predicting the next token, which can be viewed as a highly imbalanced classification task.
- The effectiveness of focal loss in LLM pretraining remains largely unexplored.
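For reference, the standard focal loss from Lin et al. (2017) down-weights well-classified examples by modulating cross-entropy with the model's confidence on the true class:

$$\mathrm{FL}(p_t) = -\,\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t)$$

where $p_t$ is the predicted probability of the correct class, $\gamma \ge 0$ controls how strongly easy examples are discounted ($\gamma = 0$ recovers plain cross-entropy), and $\alpha_t$ is an optional class-balancing weight.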
“Now I have been thinking that LLM models based on the transformer architecture are essentially an overglorified classifier during training (forced prediction of the next token at every step).”
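To make that connection concrete, here is a minimal sketch, assuming a PyTorch setup, of how focal loss could be swapped in for cross-entropy on the next-token prediction objective. The function name, the choice of γ, and the padding/masking convention are illustrative assumptions, not something taken from the original analysis.

```python
import torch
import torch.nn.functional as F

def focal_loss_next_token(logits, targets, gamma=2.0, ignore_index=-100):
    """Token-level focal loss as a drop-in for cross-entropy in LM pretraining.

    logits:  (batch, seq_len, vocab_size) raw model outputs
    targets: (batch, seq_len) next-token ids, with ignore_index marking padding
    """
    vocab_size = logits.size(-1)
    logits = logits.view(-1, vocab_size)   # flatten to (B*T, V)
    targets = targets.view(-1)             # flatten to (B*T,)

    # Per-token cross-entropy, kept unreduced so each token can be reweighted.
    ce = F.cross_entropy(logits, targets,
                         ignore_index=ignore_index, reduction="none")

    # p_t = model probability of the correct token, recovered from ce = -log(p_t).
    pt = torch.exp(-ce)

    # Focal modulation: confident ("easy") tokens contribute little,
    # low-probability ("hard") tokens dominate the loss.
    loss = ((1.0 - pt) ** gamma) * ce

    # Average only over non-padded positions.
    mask = targets != ignore_index
    return loss[mask].mean() if mask.any() else loss.sum()
```

Whether this reweighting helps or hurts at LLM scale is exactly the open question raised above: down-weighting "easy" tokens might sharpen learning on rare continuations, but it could just as plausibly distort calibration on the high-frequency tokens that dominate natural text.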