DeepSeek's mHC: Improving Residual Connections

Published: Jan 2, 2026 15:44
1 min read
r/LocalLLaMA

Analysis

The article highlights DeepSeek's innovation in addressing the limitations of the standard residual connection in deep learning models. By introducing Manifold-Constrained Hyper-Connections (mHC), DeepSeek tackles the instability of earlier attempts, such as Hyper-Connections, to make residual connections more flexible. The core of the solution is to constrain the learnable mixing matrices to be doubly stochastic, which keeps signals bounded and prevents exploding gradients. The results show significant improvements in stability and performance over baseline models.
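A short derivation of why the constraint bounds both signals and gradients. It is a sketch under assumed notation, not necessarily the paper's: n residual streams are stacked into a vector x and mixed by one learnable n×n matrix H. It covers only this mixing step; whatever the layer itself adds to the streams falls outside the argument.

```latex
% Doubly stochastic constraint on the (assumed) mixing matrix H.
H_{ij} \ge 0, \qquad \sum_{j} H_{ij} = 1, \qquad \sum_{i} H_{ij} = 1

% Forward pass: each output stream is a convex combination of the inputs.
y_i = \sum_j H_{ij} x_j
\;\Rightarrow\; \min_j x_j \le y_i \le \max_j x_j
\;\Rightarrow\; \|Hx\|_\infty \le \|x\|_\infty

% Backward pass: columns of H sum to 1, so H^\top is also doubly stochastic
% and the incoming gradient g obeys the same bound.
\|H^\top g\|_\infty \le \|g\|_\infty
```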
Reference

DeepSeek solved the instability by constraining the learnable matrices to be "Double Stochastic" (all elements ≥ 0, rows/cols sum to 1). Mathematically, this forces the operation to act as a weighted average (convex combination). It guarantees that signals are never amplified beyond control, regardless of network depth.
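The depth claim can be checked numerically. The Python sketch below composes 64 mixing matrices and reports the induced ∞-norm of the product, i.e. the worst-case amplification of the largest stream value. The doubly stochastic matrices come from a hypothetical helper that forms random convex combinations of permutation matrices. For contrast, the hypothetical `drifted_matrix` is an unconstrained mixer sitting only 0.05 above the identity in every entry. Neither is DeepSeek's actual parameterization.

```python
import random

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def random_doubly_stochastic(n, rng, k=4):
    """Hypothetical helper: a random convex combination of k permutation
    matrices, which is always doubly stochastic (Birkhoff-von Neumann)."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    m = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = list(range(n))
        rng.shuffle(perm)
        for i, j in enumerate(perm):
            m[i][j] += w / total  # one entry per row and per column
    return m

def drifted_matrix(n, eps=0.05):
    """Hypothetical unconstrained mixer: identity plus eps on every entry."""
    return [[(1.0 if i == j else 0.0) + eps for j in range(n)] for i in range(n)]

def inf_norm(m):
    """Induced infinity-norm: worst-case amplification of max |x_i|."""
    return max(sum(abs(v) for v in row) for row in m)

def gain_after_depth(layers):
    """Compose per-layer mixing matrices and return the gain of the product."""
    prod = identity(len(layers[0]))
    for h in layers:
        prod = matmul(h, prod)
    return inf_norm(prod)

rng = random.Random(0)
n, depth = 4, 64
ds_gain = gain_after_depth([random_doubly_stochastic(n, rng) for _ in range(depth)])
free_gain = gain_after_depth([drifted_matrix(n) for _ in range(depth)])
print(f"doubly stochastic: {ds_gain:.6f}")
print(f"unconstrained:     {free_gain:.3e}")
```

With n = 4, each drifted layer has row sums of 1.2, so the unconstrained product grows as 1.2^64 (roughly 1.2 × 10^5), while the doubly stochastic product stays at 1 up to rounding.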

DeepSeek's mHC: Improving the Untouchable Backbone of Deep Learning

Published: Jan 2, 2026 15:40
1 min read
r/singularity

Analysis

The article highlights DeepSeek's innovation in addressing the limitations of residual connections in deep learning models. By introducing Manifold-Constrained Hyper-Connections (mHC), DeepSeek tackles the instability that comes with more flexible information routing between residual streams, yielding significant improvements in stability and performance. The core of the solution is to constrain the learnable matrices to be doubly stochastic, so that signals cannot be amplified uncontrollably. This represents a notable advance in model architecture.
Reference

DeepSeek solved the instability by constraining the learnable matrices to be "Double Stochastic" (all elements ≥ 0, rows/cols sum to 1).
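The reference states the constraint but not how a learnable matrix is kept inside it during training. One standard technique, assumed here rather than taken from the post, is Sinkhorn-Knopp normalization: exponentiate unconstrained parameters so every entry is positive, then alternately rescale rows and columns until both sum to 1. The `logits` values are made up and stand in for learned parameters.

```python
import math

def normalize_rows(m):
    """Rescale each row so it sums to 1."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s for v in row])
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def sinkhorn_knopp(logits, iters=100):
    """Map an unconstrained square matrix to an (approximately) doubly
    stochastic one by Sinkhorn-Knopp normalization."""
    # exp() makes every entry strictly positive, covering "all elements >= 0".
    m = [[math.exp(v) for v in row] for row in logits]
    for _ in range(iters):
        m = normalize_rows(m)                        # rows sum to 1
        m = transpose(normalize_rows(transpose(m)))  # columns sum to 1
    return m

def mix(h, x):
    """Mix n residual streams: y_i = sum_j h[i][j] * x[j]."""
    return [sum(hij * xj for hij, xj in zip(row, x)) for row in h]

# Hypothetical unconstrained parameters standing in for a learnable matrix.
logits = [[0.5, -1.0, 0.2],
          [1.5, 0.0, -0.3],
          [-0.7, 0.9, 0.1]]
H = sinkhorn_knopp(logits)
print("row sums:", [round(sum(row), 6) for row in H])
print("col sums:", [round(sum(col), 6) for col in zip(*H)])

x = [3.0, -2.0, 0.5]
y = mix(H, x)
print("mixed:", y)  # every entry lies between min(x) and max(x)
```

Because the loop ends on a column step, column sums are exact up to rounding, and for strictly positive inputs the row sums converge geometrically. A fixed iteration count keeps the projection a plain unrolled computation, at the cost of being only approximately doubly stochastic.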

Analysis

This paper addresses a key limitation of iterative refinement methods for diffusion models: the instability caused by Classifier-Free Guidance (CFG). The authors identify that CFG's extrapolation pushes the sampling path off the data manifold, causing errors to diverge. As a remedy, they propose Guided Path Sampling (GPS), which uses manifold-constrained interpolation to keep the path stable. The contribution is significant because it offers a more robust way to improve the quality and controllability of diffusion models, particularly in complex scenarios.
Reference

GPS replaces unstable extrapolation with a principled, manifold-constrained interpolation, ensuring the sampling path remains on the data manifold.
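The summary does not give GPS's update rule, so the sketch below is not GPS. It is a toy contrast under stated assumptions: the "data manifold" is the unit sphere, `u` and `c` are hypothetical stand-ins for unconditional and conditional predictions lying on it, and the guidance weight is 7.5. The CFG-style rule u + w(c - u) with w > 1 extrapolates past c and leaves the sphere, whereas spherical linear interpolation (slerp) with t in [0, 1] is a simple manifold-constrained interpolation that never does.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def cfg_extrapolate(u, c, w):
    """CFG-style combination u + w * (c - u); leaves the segment [u, c] when w > 1."""
    return [ui + w * (ci - ui) for ui, ci in zip(u, c)]

def slerp(u, c, t):
    """Spherical linear interpolation: stays on the unit sphere for unit u, c."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, c))))
    theta = math.acos(dot)
    if theta < 1e-12:  # u and c coincide; nothing to interpolate
        return list(u)
    s = math.sin(theta)
    a = math.sin((1 - t) * theta) / s
    b = math.sin(t * theta) / s
    return [a * ui + b * ci for ui, ci in zip(u, c)]

# Toy points on the unit sphere (hypothetical stand-ins for model predictions).
u = normalize([1.0, 0.2, 0.0])
c = normalize([0.8, 0.6, 0.1])

off = cfg_extrapolate(u, c, w=7.5)
on = [slerp(u, c, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(f"extrapolated norm: {norm(off):.3f}")
print("interpolated norms:", [round(norm(p), 6) for p in on])
```

The extrapolated point lands well off the sphere, while every slerp point keeps unit norm and the endpoints reproduce `u` and `c` exactly.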