Validating Validation Sets
Analysis
This article discusses a method for validating validation sets, particularly when sample sizes are small. The core idea is to resample many different holdout choices and build a histogram of the resulting scores, letting users judge how representative their chosen validation split is. This addresses the concern of whether the validation set is effectively flagging overfitting or whether it is too perfect, which can produce misleading results. The provided GitHub link offers a toy example using MNIST, and the principle is presented as potentially applicable more broadly, pending rigorous review. This is a valuable exploration for improving the reliability of model evaluation, especially in data-scarce scenarios.
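To make the idea concrete, here is a minimal sketch of the resampling approach described above. It is not the author's implementation: it assumes scikit-learn's digits dataset as a small stand-in for MNIST, a logistic regression model, and a hypothetical `holdout_score` helper; the number of resamples and the choice of seed 0 as the "chosen" split are purely illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small stand-in dataset for the MNIST toy example mentioned in the article.
X, y = load_digits(return_X_y=True)

def holdout_score(X, y, seed, test_size=0.2):
    """Fit a model on one random train/holdout split and return validation accuracy."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_tr, y_tr)
    return accuracy_score(y_va, model.predict(X_va))

# Resample many different holdout choices to build a histogram of validation scores.
scores = np.array([holdout_score(X, y, seed) for seed in range(1, 201)])

# Score of the split you actually chose (seed 0 here, purely illustrative).
chosen = holdout_score(X, y, seed=0)

# A p-value-adjacent summary: where the chosen split falls in the resampled distribution.
quantile = (scores < chosen).mean()
print(f"chosen split accuracy: {chosen:.3f}")
print(f"resampled mean +/- std: {scores.mean():.3f} +/- {scores.std():.3f}")
print(f"fraction of resampled splits scoring below the chosen one: {quantile:.2f}")
```

A chosen split sitting far in either tail of the histogram suggests it may be unusually easy or unusually hard, and therefore a poor guide to generalization.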
Key Takeaways
“This exploratory, p-value-adjacent approach to validating the data universe (train and hold out split) resamples different holdout choices many times to create a histogram that shows where your split lies.”