Llama.cpp Achieves Impressive Performance on M2 Max: 40 Tokens/Second, 0% CPU Usage

Infrastructure | #LLM | Community | Analyzed: Jan 10, 2026 16:08

Published: Jun 4, 2023 17:24 | 1 min read

Analysis

This Hacker News post reports a notable performance result for Llama.cpp on Apple's M2 Max. The claim of 40 tokens/second with 0% CPU usage suggests that inference is offloaded almost entirely to the GPU.

Key Takeaways

• Llama.cpp reaches a reported generation rate of 40 tok/s on the M2 Max.
• The run uses all 38 GPU cores for computation.
• Reported CPU utilization is 0%, indicating that the work is offloaded to the GPU.
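
To put the headline rate in more familiar terms, the short sketch below converts a throughput in tokens per second into per-token latency and an approximate time for a full reply. The function names and the 500-token reply length are illustrative assumptions, not anything taken from the original post, and the calculation assumes a steady generation rate that ignores prompt processing time.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average latency per generated token, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady rate (prompt processing ignored)."""
    return num_tokens * ms_per_token(tokens_per_second) / 1000.0


REPORTED_RATE = 40.0  # tok/s, the figure quoted in the headline
REPLY_TOKENS = 500    # hypothetical reply length, chosen only for illustration

print(f"{ms_per_token(REPORTED_RATE):.1f} ms per token")
print(f"{seconds_to_generate(REPLY_TOKENS, REPORTED_RATE):.1f} s for {REPLY_TOKENS} tokens")
```

At the reported 40 tok/s this works out to 25 ms per token, or about 12.5 seconds for a 500-token reply.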

Reference / Citation

"Llama.cpp can do 40 tok/s on M2 Max, 0% CPU usage, using all 38 GPU cores"

Hacker News, Jun 4, 2023 17:24

* Cited for critical analysis under Article 32.

Source: Hacker News
