LVLMs Struggle with Instruction Following After Fine-tuning

Published: Dec 29, 2025 16:12
ArXiv

Analysis

This paper addresses a critical issue in the development of Large Vision-Language Models (LVLMs): the degradation of instruction-following capabilities after fine-tuning. Models lose their ability to adhere to instructions, a core capability inherited from the underlying Large Language Model (LLM). The study's importance lies in its quantitative demonstration of this decline and its investigation into the causes, specifically whether the fine-tuning data explicitly specifies the expected output format. This research provides valuable insights for improving LVLM training methodologies.

Reference

LVLMs trained on datasets whose instructions specify the expected output format tend to follow instructions more accurately than models trained without such specifications.
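The finding above can be illustrated with a minimal sketch of data augmentation: appending an explicit output-format instruction to each fine-tuning sample. The field names (`image`, `instruction`, `response`) and format strings are hypothetical, not taken from the paper.

```python
# Minimal sketch (hypothetical field names): augmenting visual fine-tuning
# samples with an explicit output-format specification, in line with the
# paper's finding that such instructions help preserve instruction following.

FORMAT_SPECS = {
    "json": "Respond with a single JSON object.",
    "list": "Respond as a bulleted list.",
    "short": "Respond in one sentence.",
}

def add_format_instruction(sample: dict, fmt: str) -> dict:
    """Return a copy of the sample whose instruction also specifies the output format."""
    spec = FORMAT_SPECS[fmt]
    return {**sample, "instruction": f'{sample["instruction"]} {spec}'}

sample = {
    "image": "cat.jpg",
    "instruction": "Describe the image.",
    "response": '{"subject": "cat"}',
}
augmented = add_format_instruction(sample, "json")
print(augmented["instruction"])
# → Describe the image. Respond with a single JSON object.
```

A real pipeline would pick the format spec to match each sample's ground-truth response; the sketch only shows the shape of the augmentation.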