Bayesian Transformers for Population Intelligence
Analysis
Key Takeaways
- Proposes Population Bayesian Transformers (B-Trans), which turn a single pre-trained LLM into a distribution over model behaviors.
- Places a Gaussian variational approximation on the normalization-layer biases, inducing stochasticity without full Bayesian training.
- Freezes the sampled noise at the sequence level, so the same perturbation applies to every token of a sequence and temporal consistency is maintained.
- Demonstrates improved performance in zero-shot generation and reinforcement learning tasks by aggregating predictions from multiple model instances.
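
The sampling and aggregation steps in the takeaways above can be illustrated with a small sketch. This is a toy under stated assumptions, not the paper's implementation: the "model" is a single layer normalization followed by a linear readout rather than a transformer, the names `sample_bias`, `generate`, and `population_vote` are hypothetical, and majority voting is only one possible way to aggregate predictions (the summary does not say which rule B-Trans uses).

```python
import math
import random
from collections import Counter


def layer_norm(x, bias, eps=1e-5):
    # Normalize x to zero mean and unit variance, then add the bias.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) + b for v, b in zip(x, bias)]


def sample_bias(mu, sigma, rng):
    # Elementwise draw from a Gaussian N(mu, sigma^2) over the bias vector.
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mu]


def generate(hidden_states, weights, bias):
    # Toy greedy decoding: the SAME bias is used at every step of the
    # sequence, which mirrors freezing the noise at the sequence level.
    tokens = []
    for h in hidden_states:
        z = layer_norm(h, bias)
        logits = [sum(w * v for w, v in zip(row, z)) for row in weights]
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tuple(tokens)


def population_vote(hidden_states, weights, mu, sigma, n_members, seed=0):
    # Each population member gets one bias sample per sequence.
    # Aggregation by majority vote is an assumption made for this sketch.
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_members):
        bias = sample_bias(mu, sigma, rng)
        outputs.append(generate(hidden_states, weights, bias))
    winner, _ = Counter(outputs).most_common(1)[0]
    return winner, outputs


if __name__ == "__main__":
    hidden = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
    readout = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    mean_bias = [0.0, 0.0, 0.0]
    winner, members = population_vote(hidden, readout, mean_bias, 1.0, 7, seed=1)
    print("member outputs:", members)
    print("majority output:", winner)
```

With `sigma` set to zero every member reduces to the deterministic baseline, so the diversity in this sketch comes entirely from the variance of the bias distribution.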
“B-Trans effectively leverage the wisdom of crowds, yielding superior semantic diversity while achieving better task performance compared to deterministic baselines.”