
Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663

Published: Dec 26, 2023 20:07
1 min read
Practical AI

Analysis

This article summarizes a Practical AI podcast episode featuring Markus Nagel, a research scientist at Qualcomm AI Research. The discussion centers on Nagel's paper presented at NeurIPS 2023 on quantizing Transformers, which addresses activation outliers introduced by the attention mechanism that make low-bit quantization difficult. The conversation also compares pruning and quantization as approaches to compressing model weights. Finally, the episode surveys other research areas from Qualcomm AI Research, including multitask learning, diffusion models, geometric algebra in transformers, and deductive verification of LLM reasoning, giving a broad overview of the group's current work.
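
The episode stays at the overview level, but the underlying problem is easy to demonstrate. Below is a minimal NumPy sketch, not taken from the episode or the paper, showing why activation outliers hurt quantization: with symmetric per-tensor int8 quantization, a single large value stretches the quantization scale and degrades precision for all the ordinary values. The function name `quantize_int8` and the outlier magnitude are illustrative choices.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: scale set by the max absolute value."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale  # dequantize to measure round-trip error

rng = np.random.default_rng(0)
acts = rng.normal(0, 1, size=4096).astype(np.float32)

# Typical activations quantize with small error...
err_clean = np.abs(acts - quantize_int8(acts)).mean()

# ...but one large outlier stretches the scale, so every other value
# is rounded onto a much coarser grid.
acts_outlier = acts.copy()
acts_outlier[0] = 60.0  # hypothetical outlier magnitude, for illustration only
err_outlier = np.abs(acts_outlier - quantize_int8(acts_outlier)).mean()

print(f"mean abs error without outlier: {err_clean:.4f}")
print(f"mean abs error with outlier:    {err_outlier:.4f}")
```

Running this shows the mean quantization error jumping by roughly an order of magnitude once the outlier is present, which is why removing outliers, rather than just tolerating them, is the paper's goal.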

Reference

Markus’ first paper, Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing, tackles the activation quantization issues introduced by the attention mechanism and proposes how to solve them.
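For context, the paper's central observation is that attention heads often want to "do nothing" (leave the residual stream unchanged), and with a standard softmax they can only approach zero attention weight by driving logits to extreme magnitudes, which produces the outliers. One of the paper's proposed fixes is a clipped softmax that stretches the output range slightly past [0, 1] and clips it back, so a head can reach exactly zero without extreme inputs. Below is a hedged NumPy sketch of that idea; the hyperparameter values for `zeta` and `gamma` and the example scores are illustrative, not the paper's tuned settings.

```python
import numpy as np

def clipped_softmax(scores, zeta=1.0, gamma=-0.03):
    """Stretch softmax outputs by (zeta - gamma), shift by gamma, then clip to [0, 1],
    so small probabilities can land at exactly zero without extreme logits."""
    # Numerically stable softmax over the last axis.
    p = np.exp(scores - scores.max(axis=-1, keepdims=True))
    p = p / p.sum(axis=-1, keepdims=True)
    return np.clip((zeta - gamma) * p + gamma, 0.0, 1.0)

scores = np.array([[4.0, 0.1, -2.0, -2.5]])
print(clipped_softmax(scores))  # the small probabilities snap to exactly 0
```

Note that the clipped rows no longer sum exactly to one; in the paper's framing that relaxation is the point, since it lets a head assign zero weight everywhere it wants to contribute nothing.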