Llama.cpp Achieves Impressive Performance on M2 Max: 40 Tokens/Second, 0% CPU Usage

Infrastructure | #LLM | Community | Analyzed: Jan 10, 2026 16:08
Published: Jun 4, 2023 17:24
Hacker News

Analysis

This Hacker News post reports a notable performance result for Llama.cpp on Apple's M2 Max: 40 tokens per second with 0% CPU usage while using all 38 GPU cores. If accurate, the figures suggest that inference is offloaded almost entirely to the GPU, leaving the CPU idle.
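To put the headline figure in perspective, the short sketch below converts a throughput in tokens per second into an average per-token latency and an estimated generation time. The function names and the 500-token response length are illustrative assumptions, not part of the original post, and the estimate assumes a steady generation rate with no prompt-processing overhead.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average latency per generated token, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady rate (ignores prompt processing)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# 40 tok/s is the rate quoted in the headline; 500 tokens is a hypothetical
# response length chosen only for illustration.
REPORTED_RATE = 40.0

print(ms_per_token(REPORTED_RATE))              # 25.0
print(seconds_to_generate(500, REPORTED_RATE))  # 12.5
```

At the reported rate, each token takes about 25 ms on average, so a 500-token response would take roughly 12.5 seconds under these assumptions.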
Reference / Citation
"Llama.cpp can do 40 tok/s on M2 Max, 0% CPU usage, using all 38 GPU cores"
Hacker News, Jun 4, 2023 17:24
* Cited for critical analysis under Article 32.