Research · #llm · 🏛️ Official · Analyzed: Dec 28, 2025 21:57

Synthetic Bootstrapped Pretraining

Published: Dec 16, 2025
1 min read
Apple ML

Analysis

This article introduces Synthetic Bootstrapped Pretraining (SBP), a language model pretraining method developed by Apple ML. SBP aims to improve performance by modeling inter-document correlations, which standard pretraining largely ignores. The core recipe is to first learn a model of the relationships between documents in the pretraining corpus, then use that model to generate a larger synthetic corpus, and finally train jointly on the real and synthetic data. This is intended to capture richer cross-document relationships and thereby produce more effective language models. A minimal sketch of the recipe follows.
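
The following Python sketch illustrates the four-step loop described above under stated assumptions: the SeqModel interface, the Jaccard pairing heuristic, and all function names are hypothetical stand-ins for illustration, not Apple's released implementation.

"""Hypothetical sketch of the SBP recipe summarized above.
The pairing heuristic and the SeqModel interface are assumptions."""

import random
from typing import List, Protocol, Tuple


class SeqModel(Protocol):
    """Stand-in interface for any LM that can be tuned, sampled, and pretrained."""

    def finetune(self, inputs: List[str], targets: List[str]) -> None: ...
    def generate(self, prompt: str) -> str: ...
    def pretrain(self, corpus: List[str]) -> None: ...


def jaccard(a: str, b: str) -> float:
    """Toy document similarity; a real pipeline would use embedding search."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)


def mine_pairs(corpus: List[str], threshold: float = 0.3) -> List[Tuple[str, str]]:
    """Step 1: collect (seed, related) document pairs from the real corpus.

    Exhaustive O(n^2) comparison is only for illustration; at pretraining
    scale this would be approximate nearest-neighbour search.
    """
    return [
        (a, b)
        for i, a in enumerate(corpus)
        for b in corpus[i + 1:]
        if jaccard(a, b) >= threshold
    ]


def train_synthesizer(model: SeqModel, pairs: List[Tuple[str, str]]) -> SeqModel:
    """Step 2: teach a model to produce a related document given a seed."""
    model.finetune(inputs=[a for a, _ in pairs], targets=[b for _, b in pairs])
    return model


def synthesize(model: SeqModel, corpus: List[str], n: int) -> List[str]:
    """Step 3: expand the data by sampling new documents from random seed documents."""
    return [model.generate(random.choice(corpus)) for _ in range(n)]


def joint_pretrain(model: SeqModel, real: List[str], synthetic: List[str]) -> SeqModel:
    """Step 4: pretrain on the union of real and synthetic documents."""
    model.pretrain(real + synthetic)
    return model

In use, one would mine pairs from the real corpus, fine-tune a synthesizer on them, sample a synthetic corpus several times larger than the original, and then run standard pretraining on the combined data.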
Reference

While the standard pretraining teaches LMs to learn causal correlations among tokens within a single document, it is not designed to efficiently model the rich, learnable inter-document correlations that can potentially lead to better performance.