Granite 4 Small: A Viable Option for Limited VRAM Systems with Large Contexts
Analysis
Key Takeaways
- Granite 4.0 Small (32B total / 9B active parameters) sustains about 7 tokens/s with a 50k-token context on a ThinkPad P15 with 8 GB of VRAM.
- Offloading the MoE expert weights to the CPU frees VRAM for a larger KV cache, which allows a larger context window.
- The hybrid transformer-Mamba architecture helps keep generation speed steady as the context fills.
“due to being a hybrid transformer+mamba model, it stays fast as context fills”
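The memory argument behind these takeaways can be sketched with some rough arithmetic. The snippet below estimates KV cache size at a 50k-token context for two layer layouts: one where every layer uses attention, and one where only a few do and the rest are Mamba layers (which keep a fixed-size state rather than a cache that grows with context). All layer counts, head counts, and dimensions are hypothetical placeholders chosen for illustration, not Granite 4.0 Small's actual configuration, and the formula ignores quantized caches and runtime overhead.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate bytes needed to cache keys and values for n_ctx tokens.

    The leading factor of 2 covers one K tensor and one V tensor per
    attention layer; bytes_per_elem=2 assumes an fp16 cache.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


GIB = 1024 ** 3
N_CTX = 50_000  # context length quoted in the takeaways

# Hypothetical layout 1: every one of 40 layers is an attention layer.
full_attention = kv_cache_bytes(n_attn_layers=40, n_kv_heads=8, head_dim=128, n_ctx=N_CTX)

# Hypothetical layout 2: only 4 of 40 layers use attention; the remaining
# Mamba layers contribute no context-dependent cache in this estimate.
hybrid = kv_cache_bytes(n_attn_layers=4, n_kv_heads=8, head_dim=128, n_ctx=N_CTX)

print(f"all-attention KV cache: {full_attention / GIB:.2f} GiB")
print(f"hybrid KV cache:        {hybrid / GIB:.2f} GiB")
```

Under these assumed numbers the cache shrinks in proportion to the number of attention layers, which is consistent with the claim that a hybrid model leaves more room on an 8 GB card once the expert weights have been moved to the CPU.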