NUS Unveils 'DMax': A Breakthrough Paradigm for Ultra-Fast Diffusion Language Models
research · llm · Blog | Analyzed: Apr 10, 2026 22:19
Published: Apr 10, 2026 17:23
1 min read · r/LocalLLaMA Analysis
The National University of Singapore has introduced DMax, an advance for diffusion language models (dLLMs) that accelerates parallel decoding. By reformulating generation as a progressive self-refinement process, the model iteratively corrects its own mistakes at the embedding level. The result is a large gain in tokens per second without a loss in accuracy, a notable step toward more efficient dLLM inference.
Key Takeaways
- DMax introduces 'Soft Parallel Decoding', which lets the model iteratively revise and refine its own outputs in embedding space.
- A new 'On-Policy Uniform Training' strategy unifies masked and uniform dLLMs so the model learns to recover from its own erroneous predictions.
- The approach yields large speedups, reaching 1,338 tokens per second on two H200 GPUs while maintaining high accuracy.
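The core idea, as described, is that decoding starts every position at a mask embedding and repeatedly refines toward token embeddings rather than committing to hard tokens at each step. A minimal NumPy sketch of that loop, with toy stand-ins (the embedding table, mask embedding, and projection `W` here are random placeholders for the dLLM's actual learned weights, and the step count is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, SEQ, STEPS = 50, 16, 8, 4

# Toy stand-ins: in the real model these would be the dLLM's token
# embedding table, its mask embedding, and its transformer denoiser.
token_emb = rng.normal(size=(VOCAB, DIM))
mask_emb = rng.normal(size=(DIM,))
W = rng.normal(size=(DIM, VOCAB)) / np.sqrt(DIM)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Soft parallel decoding (sketch): start every position at the mask
# embedding, then repeatedly (1) predict a per-position token distribution
# and (2) replace each embedding with the probability-weighted mixture of
# token embeddings, so earlier mistakes can still be revised later.
x = np.tile(mask_emb, (SEQ, 1))          # (SEQ, DIM), all positions masked
for _ in range(STEPS):
    probs = softmax(x @ W)               # (SEQ, VOCAB) soft predictions
    x = probs @ token_emb                # soft re-embedding: refine, don't commit

tokens = probs.argmax(axis=-1)           # final hard decision for all positions at once
```

This is only an illustration of the refinement loop's shape under assumed names; the paper's actual mechanism operates inside a trained transformer, not a single linear projection.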
Reference / Citation
"DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings... Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax. Compared with the original LLaDA-2.0-mini, our method improves TPF on GSM8K from 2.04 to 5.47 while preserving accuracy."