DeepSeek AI's Engram: A Novel Memory Axis for Sparse LLMs
Analysis
Key Takeaways
“DeepSeek’s new Engram module targets exactly this gap by adding a conditional memory axis that works alongside MoE rather than replacing it.”
“Claude Desktop and other AI agents use MCP (Model Context Protocol) to connect with external services.”
“Interactive heatmaps, choropleth maps...”
“GaMO expands the field of view from existing camera poses, which inherently preserves geometric consistency while providing broader scene coverage.”
“DMSAEs run an iterative distillation cycle: train a Matryoshka SAE with a shared core, use gradient × activation to measure each feature's contribution to next-token loss in the most nested reconstruction, and keep only the smallest subset that explains a fixed fraction of the attribution.”
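The attribution-and-prune step quoted above can be sketched as a toy routine. This is an illustrative assumption, not the paper's implementation: the name `prune_by_attribution`, the `keep_fraction` parameter, and the use of NumPy on precomputed activations and gradients are all hypothetical.

```python
import numpy as np

def prune_by_attribution(acts, grads, keep_fraction=0.9):
    """Keep the smallest feature subset whose gradient-times-activation
    attribution explains `keep_fraction` of the total attribution mass."""
    attr = np.abs(acts * grads).mean(axis=0)      # per-feature attribution
    order = np.argsort(attr)[::-1]                # most important first
    cum = np.cumsum(attr[order])
    n_keep = int(np.searchsorted(cum, keep_fraction * cum[-1])) + 1
    return np.sort(order[:n_keep])                # indices of kept features

# Toy data: 4 tokens x 5 SAE features, with feature activations and the
# gradients of next-token loss w.r.t. those activations.
rng = np.random.default_rng(0)
acts = rng.random((4, 5))
grads = rng.random((4, 5))
kept = prune_by_attribution(acts, grads, keep_fraction=0.5)
```

The cumulative-sum cutoff is what makes the kept subset the smallest one reaching the target fraction; the surviving features would then seed the next distillation round.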
“The main new input is a hypersingular sparse domination principle combined with Bourgain's interpolation method, which provides a flexible mechanism to establish critical-line (and endpoint) estimates.”
“From the measured image- and pupil-plane correlations, we observe position and momentum correlations consistent with an EPR-type entanglement witness.”
“The silhouette score accurately identifies the true number of communities when clusters are well separated and balanced, but it tends to underestimate under strong imbalance or weak separation and to overestimate in sparse networks.”
“The paper demonstrates a proof-of-concept generative surrogate for reconstructing coherent turbulent dynamics between sparse snapshots.”
“Models that anticoncentrate are not trainable on average.”
“The paper provides the first non-vacuous guarantees in high-dimensional sparse MDPs with single-policy concentrability coverage and corruption, showing that learning a near-optimal policy remains possible in regimes where traditional robust offline RL techniques may fail.”
“The proposed framework achieves superior performance over conventional sensing methods with reduced sensing power.”
“The authors obtain accurate ground-state energies for lattices up to 80 × 80 (6400 spins) and train deep Boltzmann machines for a system with 35 × 35 (1225 spins).”
“The paper proposes a novel sparse-penalization framework for high-dimensional Pconf classification.”
“The paper focuses on the real eigenvalues of the non-backtracking matrix and their relation to the non-backtracking Laplacian for node clustering.”
“The method 'combines vision-based frame processing with systematic state-space exploration using graph-structured representations.'”
“The novelty of this work is two-fold: extending the catalogue of known optimal RMRAs and formulating a sub-optimal RMRA that abides by CFEs.”
“RainFusion2.0 can achieve 80% sparsity while achieving an end-to-end speedup of 1.5–1.8x without compromising video quality.”
“The paper derives least squares estimators for the drift, diffusion, and jump-diffusion coefficients and establishes their asymptotic rate of convergence.”
“Targeted interventions on SAE-derived vectors can controllably amplify or suppress specific reasoning behaviors, altering inference trajectories without retraining.”
“LoZA can achieve significant speed-ups both for prefill-intensive (e.g., retrieval-augmented generation) and decode-intensive (e.g., tool-integrated reasoning) cases.”
“SPM layers implement a global linear transformation in $O(nL)$ time with $O(nL)$ parameters, where $L$ is typically constant or $\log_2 n$.”
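The quoted $O(nL)$ scaling can be illustrated with a classic structured linear map of the same flavor. The butterfly construction below is an assumption for illustration (the actual SPM layer is not specified here), and for simplicity it shares one 2×2 block per stage rather than the per-pair blocks an $O(nL)$-parameter layer would carry:

```python
import numpy as np

def butterfly_transform(x, stage_blocks):
    """Mix a length-n vector (n a power of 2) through L butterfly stages.

    Stage s pairs entries at stride 2**s and mixes each pair with a 2x2
    block, so the full linear map costs O(n * L) operations with
    L = log2(n) stages.
    """
    y = np.asarray(x, dtype=float).copy()
    n = len(y)
    for s, (a, b, c, d) in enumerate(stage_blocks):  # one 2x2 block per stage
        stride = 2 ** s
        for i in range(n):
            if i & stride:          # partner index; handled via j below
                continue
            j = i | stride
            y[i], y[j] = a * y[i] + b * y[j], c * y[i] + d * y[j]
    return y

# Identity blocks leave the input unchanged; (1, 1, 1, -1) blocks give an
# unnormalized Walsh-Hadamard transform.
out = butterfly_transform(np.arange(8.0), [(1, 0, 0, 1)] * 3)
```

Despite touching every pair only once per stage, the composed map is globally dense: every output coordinate depends on every input coordinate after $\log_2 n$ stages.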
“The PMTC model simultaneously leverages a characteristics tensor and a return matrix to identify latent asset groups.”
“Our method uses geometry-driven path augmentation, guided by the geometry in the system's invariant density to reconstruct likely trajectories and infer the underlying dynamics without assuming specific parametric models.”
“The HiR framework employs a select-then-rewrite strategy to replay failed attempts as successes based on the constraints that have been satisfied in hindsight.”
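The select-then-rewrite strategy quoted above can be sketched as a toy relabeling function. All names here (`hindsight_rewrite`, the constraint dictionary, the reward convention) are hypothetical illustrations, not HiR's actual interface:

```python
def hindsight_rewrite(trajectory, constraints):
    """Select-then-rewrite relabeling (toy sketch).

    Select: find which constraints the failed trajectory satisfied anyway.
    Rewrite: relabel the episode so those constraints *are* the goal,
    turning the failure into a valid success example for replay.
    """
    satisfied = [name for name, check in constraints.items() if check(trajectory)]
    if not satisfied:
        return None                       # nothing salvageable from this episode
    return {"trajectory": trajectory,
            "relabeled_goal": satisfied,  # achieved constraints become the goal
            "reward": 1.0}                # replayed as a success

# A trajectory that failed the full task but satisfied one constraint.
example = hindsight_rewrite(
    ["pick", "move"],
    {"picked_object": lambda t: "pick" in t,
     "reached_target": lambda t: "place" in t})
```

The design mirrors hindsight experience replay: instead of discarding failures, each one is recycled as supervision for whichever subgoal it did achieve.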
“DSC models the weight update as a residual trajectory within a Star-Shaped Domain, employing a Magnitude-Gated Simplex Interpolation to ensure continuity at the identity.”
“The paper introduces the new flexible class of intrinsic Whittle--Matérn Gaussian random fields obtained as the solution to a stochastic partial differential equation (SPDE).”
“The paper highlights a clustering effect in bank locations, especially at small scales, and uses socio-economic data to model the intensity function.”
“YOLO-Master achieves 42.4% AP at 1.62 ms latency, outperforming YOLOv13-N by +0.8% mAP while running 17.8% faster.”
“The method achieves up to 99.6% safety rate--exceeding full fine-tuning by 7.4 percentage points and approaching RLHF-based methods--while updating only 0.19-0.24% of parameters.”
“Contrastive Learning (CL) induces a more robust feature space for sparse geometry, achieving superior retrieval performance particularly in the 5--10m range.”
“DCEN consistently outperforms state-of-the-art methods in sparse signal recovery, high-dimensional variable selection under strong collinearity, and Magnetic Resonance Imaging (MRI) image reconstruction, achieving superior recovery accuracy and robustness.”
“The Dense Gradient admits a closed-form logit-level formula, enabling efficient GPU implementation.”
“MUSIC accurately learns solutions to complex coupled systems under data-scarce and noisy conditions, consistently outperforming non-sparse formulations.”
“PI-MFM consistently outperforms purely data-driven counterparts, especially with sparse labeled spatiotemporal points, partially observed time domains, or few labeled function pairs.”
“Even advanced search-augmented models like GPT-5.1 (w/ Search) achieve only 15.24% accuracy.”
“The method relies only on a sparse matrix-vector product in which only the vectors change over time.”
“The residual PINN with sinusoidal activations achieves the highest accuracy for both interpolation and extrapolation of RIRs.”
“Our method achieves an average bitrate reduction of 8% compared to the baseline approach.”
“SCaR-3D is a novel 3D scene change detection framework that identifies object-level changes from a dense-view pre-change image sequence and sparse-view post-change images.”
“It exposes the exact moment a network switches from memorization to generalization ("grokking") by monitoring the geometric arrangement of embeddings in real-time.”
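One way to monitor the geometric arrangement of embeddings, as the quote above describes, is to track a scalar geometry statistic over training steps. The effective-rank probe below is an illustrative assumption, not the paper's actual metric:

```python
import numpy as np

def effective_rank(E):
    """Effective rank of an embedding matrix E (tokens x dims): the
    exponential of the entropy of its normalized singular values.

    Values near 1 mean the embeddings are nearly collinear; values near
    min(E.shape) mean they spread across all available directions. A sharp
    change in this curve during training would flag a reorganization of
    embedding geometry, e.g. around a memorization-to-generalization jump.
    """
    s = np.linalg.svd(E - E.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))
```

Logging this value every few steps is cheap (one SVD of the embedding table) and yields a single curve that can be inspected in real time.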
“TEXT achieves the best performance across four datasets among all tested models, including three recently proposed approaches and three MLLMs.”
“The core idea is to exploit encoded pilots (EP), enabling the use of both pilot and parity bits to iteratively refine channel estimates.”
“The method achieves superior reconstruction quality and faster processing compared to other algorithms.”
“The Sparse Differential Transformer (SDT) is proposed to eliminate noise and enhance the model's anti-noise capabilities.”
“The paper uncovers a topological phase transition--driven purely by the finite connectivity structure of the network--that leads to multi-stability.”
“The effectiveness of sparse subnetworks depends more on how much sparsity is applied in each layer than on the exact weights included in the subnetwork.”