Strix Halo Llama-bench Results (GLM-4.5-Air)

Research #llm 📝 Blog | Analyzed: Dec 27, 2025 08:31

Published: Dec 27, 2025 05:16


Analysis

This r/LocalLLaMA post shares llama-bench results for the GLM-4.5-Air model running on a Strix Halo (EVO-X2) system with 128GB of RAM. The poster wants to tune the setup for use with Cline and asks other users to share comparable numbers. The runs cover several configurations of the model, which llama-bench reports as GLM4moe 106B with Q4_K quantization, using ROCm 7.10. Each result row lists model size, parameter count, backend, number of GPU layers (ngl), threads, n_ubatch, type_k, type_v, fa (flash attention), mmap, test type, and tokens per second (t/s).
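For readers who want to compare their own runs against tables like the one described above, the following is a minimal sketch of turning such a table into records. It assumes the results are pasted as a pipe-delimited markdown table with the columns listed above and that t/s is printed as "mean ± stddev"; the function name `parse_bench_table` is hypothetical, and the sample rows are invented placeholders, not figures from the post.

```python
import re


def parse_bench_table(text):
    """Parse a pipe-delimited markdown table into a list of dicts, one per data row."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row made only of dashes and optional colons.
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue
        if header is None:
            header = cells  # first table row holds the column names
            continue
        row = dict(zip(header, cells))
        # Assumption: t/s looks like "mean ± stddev"; split it into two floats.
        if "t/s" in row:
            mean, _, std = row["t/s"].partition("±")
            row["t/s_mean"] = float(mean)
            row["t/s_std"] = float(std) if std.strip() else None
        rows.append(row)
    return rows


# Placeholder rows: every value below is invented for illustration only.
SAMPLE = """
| model | size | params | backend | ngl | threads | n_ubatch | type_k | type_v | fa | mmap | test | t/s |
| --- | ---: | ---: | --- | --: | --: | --: | --: | --: | -: | --: | --: | --: |
| example-model Q4_K | 1.00 GiB | 1.00 B | ROCm | 99 | 8 | 512 | f16 | f16 | 1 | 0 | pp512 | 100.00 ± 1.00 |
| example-model Q4_K | 1.00 GiB | 1.00 B | ROCm | 99 | 8 | 512 | f16 | f16 | 1 | 0 | tg128 | 10.00 ± 0.10 |
"""

rows = parse_bench_table(SAMPLE)
# Pick the row with the highest mean throughput.
best = max(rows, key=lambda r: r["t/s_mean"])
```

Once rows are dictionaries, configurations can be grouped by test type or sorted by `t/s_mean` to see which combination of n_ubatch, type_k, type_v, and fa performs best on a given machine.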

Key Takeaways

• Strix Halo performance with GLM-4.5-Air is being benchmarked.
• The user is seeking optimization advice and comparative data.
• ROCm 7.10 is used as the backend for the benchmarks.

Reference / Citation


"Looking for anyone who has some benchmarks they would like to share. I am trying to optimize my EVO-X2 (Strix Halo) 128GB box using GLM-4.5-Air for use with Cline."

r/LocalLLaMA, Dec 27, 2025 05:16

* Cited for critical analysis under Article 32.
