Blazing-Fast LLM Inference on a Raspberry Pi: Qwen 3.5 Shows Impressive Performance
Tags: infrastructure, llm · Blog · Analyzed: Mar 12, 2026 13:47
Published: Mar 12, 2026 12:56 · 1 min read · Source: r/LocalLLaMA

Analysis
This is exciting news for anyone interested in running generative AI models locally! Progress in optimizing large language model inference on resource-constrained devices like the Raspberry Pi 5 has been rapid, and the reported numbers for Qwen 3.5 are genuinely promising.
Key Takeaways
- Qwen 3.5 (the 35B A3B variant, in a 2-bit quantization) reaches roughly 3.5 tokens per second (t/s) on a 16 GB Raspberry Pi 5 (see the sketch after this list).
- Inference speed varies with model size and Raspberry Pi configuration: the SSD-enabled 8 GB Pi manages around 2.5 t/s on the same quant.
- The project is still being actively optimized, with prompt caching and other tweaks suggesting further performance gains.
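The post does not say which runtime produced these figures. As a rough way to reproduce a tokens-per-second measurement like the one quoted, here is a minimal sketch assuming a 2-bit GGUF quant of the model run through the llama-cpp-python bindings on the Pi; the file name, context size, and thread count are illustrative assumptions, not values from the post.

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical file name; any 2-bit GGUF quant of Qwen3.5 35B A3B would do.
MODEL_PATH = "qwen3.5-35b-a3b-q2_k.gguf"

# Conservative settings for a Raspberry Pi 5: small context window,
# one thread per physical core.
llm = Llama(model_path=MODEL_PATH, n_ctx=2048, n_threads=4)

prompt = "Explain what a mixture-of-experts model is in one paragraph."

start = time.perf_counter()
result = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

# The usage field reports how many tokens were actually generated.
generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} t/s")
```

Note that this measures end-to-end generation time, so prompt processing is included; with the prompt caching mentioned above, repeated runs on the same prompt should report a higher effective t/s.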
Reference / Citation
"2-bit big-ish quants of Qwen3.5 35B A3B: 3.5 t/s on the 16GB Pi, 2.5-ish t/s on the SSD-enabled 8GB Pi."