MLLMs Struggle with Vertical Japanese Text: New Research Reveals Performance Gaps
Research | #MLLMs | Source: ArXiv
Published: Nov 19, 2025 03:04 | Analyzed: Jan 26, 2026 11:43
Analysis
This research identifies a weakness of Multimodal Large Language Models (MLLMs) on Japanese documents: they perform worse on vertically written text than on horizontally written text. The authors build a synthetic Japanese OCR dataset and show that training on it improves performance on vertical writing, a common layout in Japanese documents.
Key Takeaways
- MLLMs show reduced accuracy on vertically written Japanese text compared to horizontal text.
- A synthetic Japanese OCR dataset was created for both fine-tuning and evaluation.
- Training with the synthesized dataset improves performance on vertical writing.
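To make the horizontal-versus-vertical comparison concrete, here is a minimal sketch of how such a gap could be measured. It assumes character error rate (CER) as the metric, which this summary does not confirm the paper uses. The function names and the toy strings are hypothetical illustrations, not data or results from the study.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference char
                            curr[j - 1] + 1,      # insert a hypothesis char
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(pairs):
    """Corpus-level CER: total edits divided by total reference characters."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return edits / chars if chars else 0.0


# Hypothetical (reference, model output) pairs, made up for illustration only.
horizontal_pairs = [("日本語の文章", "日本語の文章")]
vertical_pairs = [("日本語の文章", "日本語文章")]  # one character dropped

print(f"horizontal CER: {cer(horizontal_pairs):.3f}")
print(f"vertical CER:   {cer(vertical_pairs):.3f}")
```

Working at the character level suits Japanese, which has no whitespace word boundaries. A real evaluation would run the same model on matched horizontal and vertical renderings of the same text so that layout is the only variable.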
Reference / Citation
"Using these datasets, we demonstrate that the existing MLLMs perform worse on vertically written Japanese text than on horizontally written Japanese text."