Local Vision LLMs Excel at Reading Image PDFs: Gemma 4 and Qwen 3.5 Showdown

Tags: research, llm · Blog | Analyzed: Apr 10, 2026 01:01
Published: Apr 9, 2026 22:08
1 min read
Zenn LLM

Analysis

This exploration of local Vision Large Language Models (LLMs) shows how far running advanced AI on consumer hardware has come. Using an NVIDIA RTX 5090, the tests find that open-source models such as Gemma 4 and Qwen 3.5 can accurately extract complex financial data from image-based PDFs. The standout performer, gemma4:26b, delivers the fastest throughput of the models tested while keeping VRAM usage low, making advanced document processing accessible without cloud services.
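In practice, processing an image-only PDF with a local vision model means rendering each page to an image and sending it, page by page, to the model. A minimal sketch of one such per-page request, assuming the models are served through Ollama's HTTP API (the `gemma4:26b` tag in the quote below follows Ollama's naming convention; the helper function and prompt here are illustrative, not from the article):

```python
"""Hypothetical sketch: querying a local vision model via Ollama's
HTTP API, one request per rendered PDF page."""
import base64
import json


def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    Ollama accepts vision input as a list of base64-encoded images
    sent alongside the text prompt.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


if __name__ == "__main__":
    # In practice the bytes would come from rendering one PDF page to PNG,
    # e.g. with PyMuPDF: page.get_pixmap(dpi=200).tobytes("png").
    fake_page_png = b"\x89PNG placeholder"
    body = build_vision_request(
        "gemma4:26b",
        "Extract every line item and amount on this financial statement "
        "page as JSON.",
        fake_page_png,
    )
    # Then POST to the local server, once per page:
    # requests.post("http://localhost:11434/api/generate", json=body)
    print(json.dumps(body)[:40])
```

Looping this over all pages is what a run like the 77-page document mentioned below amounts to; low per-request VRAM use is what makes the page-by-page approach practical on a single consumer GPU.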
Reference / Citation
"gemma4:26b (MoE) is the best practical choice, offering the fastest speed and lowest VRAM usage, successfully completing a 77-page document while maintaining high accuracy."
Zenn LLM · Apr 9, 2026 22:08
* Quoted for critical analysis under Article 32 of the Japanese Copyright Act.