Infrastructure • #LLM • 👥 Community • Analyzed: Jan 10, 2026 16:08

Llama.cpp Achieves Impressive Performance on M2 Max: 40 Tokens/Second, 0% CPU Usage

Published: Jun 4, 2023 17:24 • 1 min read • Hacker News

Analysis

This Hacker News post reports that Llama.cpp generates about 40 tokens/second on an Apple M2 Max while showing 0% CPU usage and using all 38 GPU cores. If the figures hold, they suggest that inference is offloaded almost entirely to the GPU, leaving the CPU idle during generation.

Key Takeaways

•Llama.cpp reportedly reaches a generation rate of 40 tokens/second (40 tok/s) on the M2 Max.
•The run uses all 38 GPU cores for the computation.
•Reported CPU utilization is 0%, which suggests the work is effectively offloaded to the GPU.
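
To make the headline figure concrete, here is a minimal sketch of how a tokens-per-second rate relates to per-token latency. The function names and the sample run (200 tokens in 5 seconds) are hypothetical and chosen only for illustration; the post gives no details on how the 40 tok/s figure was measured.

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Average generation rate over a run (hypothetical helper)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def ms_per_token(rate_tok_s: float) -> float:
    """Average time spent per generated token, in milliseconds."""
    if rate_tok_s <= 0:
        raise ValueError("rate_tok_s must be positive")
    return 1000.0 / rate_tok_s


# Illustrative numbers only, not taken from the post:
# 200 tokens generated in 5 seconds is a rate of 40 tok/s.
rate = tokens_per_second(200, 5.0)
latency = ms_per_token(rate)
print(f"{rate:.1f} tok/s, {latency:.1f} ms/token")
```

At 40 tok/s, each token takes 25 ms on average.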

Reference

“Llama.cpp can do 40 tok/s on M2 Max, 0% CPU usage, using all 38 GPU cores”

