Understanding Deep Learning Algorithms that Leverage Unlabeled Data, Part 1: Self-training
🔬 Research | #llm | Analyzed: Dec 25, 2025 12:34
Published: Feb 24, 2022 08:00
• 1 min read
• Stanford AI | Analysis
This article from Stanford AI introduces a series on leveraging unlabeled data in deep learning, focusing on self-training. It highlights the difficulty of obtaining labeled data and the potential of readily available unlabeled data to approach fully-supervised performance. The article sets the stage for a theoretical analysis of self-training, a major paradigm in semi-supervised learning and domain adaptation, and previews an analysis of self-supervised contrastive learning in Part 2, indicating a broader exploration of unsupervised representation learning. The clear explanation of self-training's core idea, using a pre-existing classifier to generate pseudo-labels, makes the concept accessible.
Key Takeaways
- Deep learning models benefit from large datasets, but labeled data is scarce.
- Self-training leverages unlabeled data by using a pseudo-labeler to generate pseudo-labels.
- This approach can approach fully-supervised performance.
Reference / Citation
"The core idea is to use some pre-existing classifier \(F_{pl}\) (referred to as the “pseudo-labeler”) to make predictions (referred to as “pseudo-labels”) on a large unlabeled dataset, and then retrain a new model with the pseudo-labels."
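The quoted core idea can be sketched in a few lines. Below is a minimal, hedged illustration (not the authors' implementation): a pseudo-labeler \(F_{pl}\) is fit on a small labeled set, predicts pseudo-labels on a larger unlabeled set, and a new model is retrained on those pseudo-labels. The synthetic data and scikit-learn classifier are assumptions for illustration only.

```python
# Minimal self-training sketch with synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Small labeled set: two well-separated Gaussian blobs.
X_labeled = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y_labeled = np.array([0] * 20 + [1] * 20)

# Large unlabeled set drawn from the same distribution.
X_unlabeled = np.vstack([rng.normal(-2, 1, (500, 2)),
                         rng.normal(2, 1, (500, 2))])

# Step 1: train the pseudo-labeler F_pl on the labeled data.
f_pl = LogisticRegression().fit(X_labeled, y_labeled)

# Step 2: use F_pl to predict pseudo-labels on the unlabeled data.
pseudo_labels = f_pl.predict(X_unlabeled)

# Step 3: retrain a new model on the pseudo-labeled data.
new_model = LogisticRegression().fit(X_unlabeled, pseudo_labels)

print(new_model.score(X_unlabeled, pseudo_labels))
```

In practice, self-training variants often keep only high-confidence pseudo-labels or iterate the label/retrain loop; this sketch shows a single round.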