Understanding Deep Learning Algorithms that Leverage Unlabeled Data, Part 1: Self-training

🔬 Research | #llm | Analyzed: Dec 25, 2025 12:34
Published: Feb 24, 2022 08:00
1 min read
Stanford AI

Analysis

This article from Stanford AI introduces a series on leveraging unlabeled data in deep learning, beginning with self-training. It highlights the difficulty of obtaining labeled data and the potential of readily available unlabeled data to approach fully-supervised performance. The article sets the stage for a theoretical analysis of self-training, a central paradigm in semi-supervised learning and domain adaptation, and notes that Part 2 will analyze self-supervised contrastive learning, indicating a broader exploration of unsupervised representation learning. The clear explanation of self-training's core idea (using a pre-existing classifier to generate pseudo-labels on unlabeled data) makes the concept accessible.
Reference / Citation
View Original
"The core idea is to use some pre-existing classifier \(F_{pl}\) (referred to as the “pseudo-labeler”) to make predictions (referred to as “pseudo-labels”) on a large unlabeled dataset, and then retrain a new model with the pseudo-labels."
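The quoted loop can be sketched in a few lines. This is a minimal illustration, not the article's method: the nearest-centroid classifier and all names (`NearestCentroid`, `self_train`, the synthetic data) are assumptions standing in for whatever pseudo-labeler \(F_{pl}\) and student model one actually uses.

```python
import numpy as np

class NearestCentroid:
    """Toy classifier standing in for the pseudo-labeler F_pl (illustrative only)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class: the mean of that class's training points.
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Assign each point to the class whose centroid is nearest.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def self_train(pseudo_labeler, new_model, X_unlabeled):
    """Self-training step: pseudo-label the unlabeled set, then retrain a new model on it."""
    pseudo_labels = pseudo_labeler.predict(X_unlabeled)   # "pseudo-labels"
    return new_model.fit(X_unlabeled, pseudo_labels)      # retrained student

# Synthetic two-cluster data (hypothetical): a small labeled set, a large unlabeled set.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unl = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(3.0, 0.5, (200, 2))])

F_pl = NearestCentroid().fit(X_lab, y_lab)          # pre-existing pseudo-labeler
student = self_train(F_pl, NearestCentroid(), X_unl)  # model retrained on pseudo-labels
```

In practice the pseudo-labeler is a model trained on the labeled data (or on a source domain), the student is typically a fresh network, and pseudo-labels are often filtered by confidence before retraining; the sketch omits those details.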
* Cited for critical analysis under Article 32.