Paper · #llm · 🔬 Research · Analyzed: Jan 3, 2026 17:02

OptRot: Data-Free Rotations Improve LLM Quantization

Published: Dec 30, 2025 10:13
1 min read
ArXiv

Analysis

This paper addresses the challenge of quantizing Large Language Models (LLMs) by introducing OptRot, a method that uses data-free rotations to mitigate weight outliers. This matters because weight outliers inflate quantization error, and efficient quantization is crucial for deploying LLMs on resource-constrained devices. The data-free approach is particularly noteworthy: it avoids the calibration data and computational overhead that data-dependent methods require. The results show that OptRot outperforms existing methods such as Hadamard rotations and more complex data-dependent techniques, especially for weight quantization. The exploration of both the data-free variant (OptRot) and a data-dependent one (OptRot+) gives a nuanced picture of the trade-offs in optimizing for weight versus activation quantization.
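To make the rotation idea concrete, here is a minimal NumPy sketch of how an orthogonal rotation can spread weight outliers before quantization. It uses a random orthogonal matrix and symmetric per-tensor INT4 quantization purely for illustration; it does not reproduce OptRot's actual data-free rotation optimization, and all names and parameters are hypothetical.

```python
import numpy as np

def quantize_int4(w):
    # Symmetric per-tensor INT4: 16 levels in [-8, 7].
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale  # dequantized weights, for measuring error

rng = np.random.default_rng(0)
# Toy weight matrix with a few large outliers; they inflate the
# quantization scale and crush the resolution of typical weights.
W = rng.normal(0, 0.02, size=(256, 256))
W[rng.integers(0, 256, 8), rng.integers(0, 256, 8)] = 0.5

# Random orthogonal rotation R (a stand-in for an optimized rotation).
# Because R is orthogonal, (W @ R) @ R.T recovers W exactly, so the
# rotation can be folded into adjacent layers at no inference cost.
R, _ = np.linalg.qr(rng.normal(size=(256, 256)))

err_plain = np.linalg.norm(quantize_int4(W) - W)
err_rot = np.linalg.norm(quantize_int4(W @ R) @ R.T - W)
print(f"quantization error without rotation: {err_plain:.4f}")
print(f"quantization error with rotation:    {err_rot:.4f}")
```

The rotation mixes the outlier energy across many entries, lowering the maximum absolute weight and hence the quantization scale, which is the effect OptRot optimizes for without any calibration data.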
Reference

OptRot outperforms both Hadamard rotations and more expensive, data-dependent methods like SpinQuant and OSTQuant for weight quantization.

Paper · #llm · 🔬 Research · Analyzed: Jan 3, 2026 16:07

Quantization for Efficient OpenPangu Deployment on Atlas A2

Published: Dec 29, 2025 10:50
1 min read
ArXiv

Analysis

This paper addresses the computational challenges of deploying large language models (LLMs) such as openPangu on Ascend NPUs through low-bit quantization, targeting the Atlas A2 hardware platform specifically. The research is significant because it explores ways to reduce the memory and latency overheads of LLMs, particularly models that produce long Chain-of-Thought reasoning traces. The paper's value lies in demonstrating that INT8 and W4A8 quantization preserve accuracy while improving performance on code generation tasks.
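For intuition, the following is a minimal NumPy sketch of a W8A8-style (INT8 weights and activations) matrix multiply with integer accumulation and float rescaling. The scale granularity (per-output-channel weights, per-tensor activations) and all names are illustrative assumptions, not the paper's Ascend/Atlas kernels.

```python
import numpy as np

def int8_quantize(x, axis=None):
    # Symmetric INT8: map [-max|x|, max|x|] onto [-127, 127].
    scale = np.abs(x).max(axis=axis, keepdims=axis is not None) / 127.0
    return np.round(x / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, size=(64, 64)).astype(np.float32)  # weights
X = rng.normal(0, 1.0, size=(8, 64)).astype(np.float32)    # activations

Wq, w_scale = int8_quantize(W, axis=1)  # per-output-channel weight scales
Xq, x_scale = int8_quantize(X)          # per-tensor activation scale

# Integer matmul with INT32 accumulation, then rescale back to float.
Y_int8 = (Xq.astype(np.int32) @ Wq.T.astype(np.int32)) * (x_scale * w_scale.T)
Y_fp = X @ W.T
rel_err = np.linalg.norm(Y_int8 - Y_fp) / np.linalg.norm(Y_fp)
print(f"relative error of INT8 GEMM vs FP32: {rel_err:.4%}")
```

Running the integer matmul this way is where the prefill speedup comes from: the accumulation happens in cheap INT8/INT32 arithmetic, and only the final rescale touches floating point.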
Reference

INT8 quantization consistently preserves over 90% of the FP16 baseline accuracy and achieves a 1.5x prefill speedup on the Atlas A2.