Research Paper · Neural Networks, Optimization, Bayesian Inference · Analyzed: Jan 3, 2026 06:26
Gradient Descent as Implicit EM in Distance-Based Neural Models
Published: Dec 31, 2025 10:56 · 1 min read · ArXiv
Analysis
This paper provides a direct mathematical derivation showing that gradient descent on objectives with log-sum-exp structure over distances or energies implicitly performs Expectation-Maximization (EM). This unifies various learning regimes, including unsupervised mixture modeling, attention mechanisms, and cross-entropy classification, under a single mechanism. The key contribution is the algebraic identity that the gradient with respect to each distance is the negative posterior responsibility. This offers a new perspective on the Bayesian behavior observed in neural networks, suggesting it is a consequence of the objective function's geometry rather than an emergent property.
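To make the identity concrete, here is a minimal derivation sketch, assuming the log-sum-exp convention $L = \log \sum_k \exp(-d_k)$ with uniform mixture weights (the paper's exact priors and normalization may differ):

$$
\frac{\partial L}{\partial d_j} \;=\; -\,\frac{\exp(-d_j)}{\sum_{k}\exp(-d_k)} \;=\; -\,r_j,
\qquad
\nabla_\theta L \;=\; -\sum_{j} r_j \,\nabla_\theta d_j(\theta).
$$

Computing the responsibilities $r_j$ is an E-step, and the responsibility-weighted sum of distance gradients is the corresponding M-step direction, which is why a single gradient update performs both at once.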
Key Takeaways
- Gradient descent on distance/energy-based objectives implicitly performs EM.
- This unifies unsupervised learning, attention, and classification under a single mechanism.
- Bayesian behavior in transformers is a consequence of objective geometry, not an emergent property.
- Optimization and inference are the same process in these models.
Reference
“For any objective with log-sum-exp structure over distances or energies, the gradient with respect to each distance is exactly the negative posterior responsibility of the corresponding component: $\partial L / \partial d_j = -r_j$.”
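As a quick numerical sanity check of the quoted identity (a sketch under the same uniform-weight log-sum-exp assumption as above, not code from the paper), the snippet below compares a finite-difference gradient of $L = \log\sum_k \exp(-d_k)$ against the negative softmax responsibilities, and then uses a hypothetical choice $d_j = \tfrac12\lVert x - \mu_j\rVert^2$ to show that one gradient-ascent step on the centroids is a responsibility-weighted update of the kind EM would produce. Names such as `responsibilities`, `mu`, and the learning rate are illustrative.

```python
import numpy as np

def responsibilities(d):
    """Posterior responsibilities r_j = exp(-d_j) / sum_k exp(-d_k), i.e. softmax(-d)."""
    w = np.exp(-(d - d.min()))              # shift by min(d) for numerical stability
    return w / w.sum()

def objective(d):
    """Log-sum-exp objective L = log sum_k exp(-d_k)."""
    a = -d
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# 1) Check the identity dL/dd_j = -r_j by central finite differences.
d = np.array([0.3, 1.7, 0.9, 2.5])
r = responsibilities(d)
eps = 1e-6
fd = np.array([
    (objective(d + eps * np.eye(d.size)[j]) - objective(d - eps * np.eye(d.size)[j])) / (2 * eps)
    for j in range(d.size)
])
assert np.allclose(fd, -r, atol=1e-6)

# 2) Hypothetical distance-based model: d_j = 0.5 * ||x - mu_j||^2 for centroids mu_j.
#    By the chain rule, dL/dmu_j = (dL/dd_j) * (mu_j - x) = r_j * (x - mu_j),
#    so one gradient-ascent step is a responsibility-weighted (EM-like) pull toward x.
rng = np.random.default_rng(0)
x = rng.normal(size=2)                      # a single data point
mu = rng.normal(size=(4, 2))                # four centroids (illustrative, not from the paper)
d_x = 0.5 * ((x - mu) ** 2).sum(axis=1)
r_x = responsibilities(d_x)
lr = 0.1
mu_updated = mu + lr * r_x[:, None] * (x - mu)
print(r_x, mu_updated)
```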