LVLMs Struggle with Instruction Following After Fine-tuning

Research Paper · Tags: Large Vision-Language Models (LVLMs), Instruction Following, Fine-tuning · 🔬 Research · Analyzed: Jan 3, 2026 18:39
Published: Dec 29, 2025 16:12
ArXiv

Analysis

This paper addresses a critical issue in the development of Large Vision-Language Models (LVLMs): the degradation of instruction-following capability after fine-tuning. Fine-tuned models often lose their ability to adhere to instructions, a core strength inherited from the underlying Large Language Model (LLM). The study's contribution is twofold: it quantifies this decline and investigates its causes, in particular the role of output format specification in the fine-tuning data. These findings offer practical guidance for improving LVLM training methodologies.
Reference / Citation
"LVLMs trained with datasets, including instructions on output format, tend to follow instructions more accurately than models that do not."
— ArXiv, Dec 29, 2025 16:12
* Cited for critical analysis under Article 32.