MLLMs Struggle with Vertical Japanese Text: New Research Reveals Performance Gaps

Research | #MLLMs | Analyzed: Jan 26, 2026 11:43 | Published: Nov 19, 2025 03:04

Analysis

This research identifies a weakness in how Multimodal Large Language Models (MLLMs) process Japanese documents: existing models read vertically written text less accurately than horizontally written text. The study also shows that training on a synthesized Japanese OCR dataset improves performance on vertical writing, a common layout in Japanese documents.

Key Takeaways

• MLLMs show reduced accuracy on vertically written Japanese text compared to horizontal text.
• A synthetic Japanese OCR dataset was created for both fine-tuning and evaluation.
• Training with the synthesized dataset improves performance on vertical writing.
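
To make the horizontal-versus-vertical comparison concrete, the following is a minimal sketch of how such a gap could be measured. It assumes character error rate (CER) as the metric, which this summary does not confirm the paper uses. The function names `edit_distance` and `cer_by_orientation` are hypothetical, and the sample strings are toy data invented for illustration, not results from the study.

```python
from collections import defaultdict


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer_by_orientation(samples):
    """Aggregate CER per orientation label.

    `samples` is an iterable of (orientation, reference, prediction) tuples.
    CER here is total edit distance divided by total reference characters.
    """
    errors = defaultdict(int)
    chars = defaultdict(int)
    for orientation, reference, prediction in samples:
        errors[orientation] += edit_distance(reference, prediction)
        chars[orientation] += len(reference)
    return {k: errors[k] / chars[k] for k in chars if chars[k] > 0}


# Toy data (assumption): the same sentence rendered in both orientations,
# with one misread character in the hypothetical vertical transcription.
samples = [
    ("horizontal", "吾輩は猫である", "吾輩は猫である"),
    ("vertical", "吾輩は猫である", "吾輩は描である"),
]
scores = cer_by_orientation(samples)
```

A higher score for the `"vertical"` key than for `"horizontal"` would correspond to the kind of gap the takeaways describe. Running the same comparison before and after fine-tuning would show whether the gap narrows.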

Reference / Citation


"Using these datasets, we demonstrate that the existing MLLMs perform worse on vertically written Japanese text than on horizontally written Japanese text."

arXiv, Nov 19, 2025 03:04

* Cited for critical analysis under Article 32.
