LVLMs Struggle with Instruction Following After Fine-tuning
Analysis
This paper addresses a critical issue in the development of Large Vision-Language Models (LVLMs): the degradation of instruction-following ability, a core capability inherited from the underlying Large Language Model (LLM), after fine-tuning. The study quantifies this decline and investigates its causes, in particular the effect of specifying the output format during fine-tuning, and offers practical insights for improving LVLM training methodologies.
Key Takeaways
- LVLMs often lose instruction-following ability after fine-tuning with common datasets.
- Specifying output format during fine-tuning improves instruction following.
- Including output format instructions in training data can mitigate the decline in instruction-following abilities (a minimal sketch of this augmentation follows the quote below).
“LVLMs trained with datasets, including instructions on output format, tend to follow instructions more accurately than models that do not.”
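The mitigation described above, appending explicit output-format instructions to fine-tuning examples, can be illustrated with a short sketch. This is not the authors' implementation: the record fields (`question`, `answer`), the format templates, and the `add_format_instruction` helper are hypothetical, chosen only to show the shape of the augmentation.

```python
# A minimal sketch, assuming fine-tuning records are dicts with
# "question" and "answer" fields. Each record gets an explicit
# output-format instruction appended to the question, and the
# reference answer is rewritten to satisfy that format, so the
# model keeps practicing format adherence during fine-tuning.

import random

# Hypothetical format constraints, each paired with a function
# that rewrites the target answer to match the constraint.
FORMAT_SPECS = [
    ("Answer with a single word or phrase.",
     lambda ans: ans.split(".")[0].strip()),
    ("Answer in one complete sentence.",
     lambda ans: ans if ans.endswith(".") else ans + "."),
]

def add_format_instruction(record: dict) -> dict:
    """Append an output-format instruction to the question and
    adjust the reference answer accordingly."""
    spec, rewrite = random.choice(FORMAT_SPECS)
    return {
        "question": f"{record['question']} {spec}",
        "answer": rewrite(record["answer"]),
    }

if __name__ == "__main__":
    sample = {"question": "What color is the car?",
              "answer": "The car is red."}
    print(add_format_instruction(sample))
```

Applied across a fine-tuning set, an augmentation of this kind keeps output-format instructions present in the training distribution, which is the condition the paper associates with better instruction following.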