Infrastructure | #LLM | 👥 Community | Analyzed: Jan 10, 2026 16:08

Llama.cpp Achieves Impressive Performance on M2 Max: 40 Tokens/Second, 0% CPU Usage

Published: Jun 4, 2023 17:24
1 min read
Hacker News

Analysis

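This Hacker News submission reports that Llama.cpp generates 40 tokens per second on an Apple M2 Max while using all 38 of the chip's GPU cores and showing 0% CPU usage. If the measurement is accurate, the near-zero CPU load indicates that token generation is offloaded almost entirely to the GPU rather than split between CPU and GPU. The headline does not state which model size or quantization produced the figure, so the 40 tokens/second number should not be assumed to hold for other models.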
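To put the headline rate in perspective, the sketch below converts a steady decode rate into per-token latency and total generation time. It assumes a constant rate and ignores prompt processing and model loading. The helper names and the 500-token reply length are hypothetical illustrations, not figures from the source.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average milliseconds per generated token at a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate num_tokens, ignoring prompt processing and load time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Headline figure from the submission: 40 tokens/second.
rate = 40.0
print(per_token_latency_ms(rate))    # 25.0 ms per token
print(generation_time_s(500, rate))  # 12.5 s for a hypothetical 500-token reply
```

At 40 tokens/second, each token takes about 25 ms, which is faster than most people read.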
Reference

Llama.cpp can do 40 tok/s on M2 Max, 0% CPU usage, using all 38 GPU cores