Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734
Analysis
Key Takeaways
- WeightWatcher is an open-source tool for analyzing and improving deep neural networks (DNNs).
- The tool is grounded in Heavy-Tailed Self-Regularization (HTSR) theory.
- WeightWatcher can identify underfitting, grokking, and generalization collapse phases during training.
“Charles walks us through WeightWatcher’s ability to detect three distinct learning phases—underfitting, grokking, and generalization collapse—and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned.”
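For context, below is a minimal sketch of how the open-source WeightWatcher library is typically invoked to get per-layer quality metrics. The model choice is arbitrary, and the interpretation of the power-law exponent alpha in the comments reflects general HTSR guidance rather than anything stated in this episode.

```python
# Minimal sketch of WeightWatcher usage (pip install weightwatcher).
# The model choice and the alpha interpretation are illustrative assumptions.
import weightwatcher as ww
import torchvision.models as models

model = models.resnet18()  # any PyTorch (or Keras) model can be analyzed

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()             # per-layer metrics, including the power-law exponent alpha
summary = watcher.get_summary(details)  # aggregate quality metrics for the whole network

# Per HTSR theory, layers with alpha roughly in the 2-6 range are generally
# considered well-trained; larger values suggest underfitting, smaller values
# suggest overfitting of that layer.
print(summary)
print(details)
```

The per-layer `alpha` values in the `details` output are the "layer quality" signal referenced in the quote above: they indicate whether individual layers are underfit, overfit, or well-tuned without requiring access to training or test data.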