DeepSeek's mHC: Improving Residual Connections

Published:Jan 2, 2026 15:44
1 min read
r/LocalLLaMA

Analysis

The article highlights DeepSeek's innovation in addressing the limitations of the standard residual connection in deep learning models. By introducing Manifold-Constrained Hyper-Connections (mHC), DeepSeek tackles the instability issues associated with previous attempts to make residual connections more flexible. The core of their solution lies in constraining the learnable matrices to be double stochastic, ensuring signal stability and preventing gradient explosion. The results demonstrate significant improvements in stability and performance compared to baseline models.

Reference

DeepSeek solved the instability by constraining the learnable matrices to be "Double Stochastic" (all elements ≧ 0, rows/cols sum to 1). Mathematically, this forces the operation to act as a weighted average (convex combination). It guarantees that signals are never amplified beyond control, regardless of network depth.