Self-Rewarded Multimodal Reasoning Improves LLM Coherence
Analysis
This paper addresses the critical issue of reasoning coherence in Multimodal LLMs (MLLMs): existing methods often optimize final-answer accuracy while neglecting the reliability of the intermediate reasoning process. SR-MCR offers a label-free alternative, using self-referential cues to guide the reasoning process, which improves both answer accuracy and reasoning coherence. A critic-free GRPO (Group Relative Policy Optimization) objective and a confidence-aware cooling mechanism further improve training stability and performance. The results show state-of-the-art accuracy on visual benchmarks among open-source models of comparable size.
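The paper's exact objective is not reproduced here, but the two training mechanisms named above can be sketched. Below is a minimal PyTorch sketch: `grpo_advantages` follows the standard critic-free GRPO recipe (normalize each sampled response's reward against its sampling group instead of using a learned value network), while `confidence_cooling` is an assumption about how a confidence-aware cooling mechanism might gate updates; the function names, the sigmoid gate, and the `tau` threshold are invented for illustration, and the KL penalty commonly added to GRPO is omitted.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Critic-free GRPO: each response's reward is normalized against the
    other responses sampled for the same prompt, so no value network is needed.
    rewards: (num_prompts, group_size) scalar reward per sampled response.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def confidence_cooling(advantages: torch.Tensor,
                       token_logprobs: torch.Tensor,
                       tau: float = 0.5,
                       sharpness: float = 0.1) -> torch.Tensor:
    """Hypothetical confidence-aware cooling: damp the advantage when the
    model's own confidence in a response (geometric-mean token probability)
    is low, so noisy self-generated rewards drive smaller updates.
    token_logprobs: (num_prompts, group_size, seq_len).
    """
    confidence = token_logprobs.mean(dim=-1).exp()        # in (0, 1]
    gate = torch.sigmoid((confidence - tau) / sharpness)  # soft cutoff at tau
    return advantages * gate

def grpo_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              advantages: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate without a critic; the group-relative
    advantage of each response is broadcast over its tokens.
    logp_new / logp_old: (batch, seq_len); advantages: (batch, 1).
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Normalizing within the sampling group is what makes the objective critic-free: the group itself serves as the baseline that a value network would otherwise estimate.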
Key Takeaways
- SR-MCR is a novel, label-free framework for aligning reasoning in MLLMs.
- It uses self-referential cues to provide fine-grained, process-level guidance (see the sketch after this list).
- The approach improves both answer accuracy and reasoning coherence.
- SR-MCR-7B achieves state-of-the-art performance (81.4% average accuracy) among open-source models of comparable size on visual benchmarks.
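To make the second bullet concrete, here is one hypothetical form a self-referential cue could take: reward a reasoning trace by how much it raises the model's own probability of its final answer relative to answering directly, with no gold labels involved. The function name, the two log-probability inputs, and the clamping range are illustrative assumptions, not the paper's definition of the cue.

```python
import torch

def self_referential_reward(
    answer_logprob_with_cot: torch.Tensor,  # (batch,) log P(answer | image, question, own reasoning)
    answer_logprob_direct: torch.Tensor,    # (batch,) log P(answer | image, question)
) -> torch.Tensor:
    """Hypothetical self-referential cue: a reasoning trace earns reward
    when conditioning on it makes the model's own final answer more likely
    than answering directly, i.e., the chain of thought supports the answer.
    Clamping bounds the signal so outlier traces do not dominate training.
    """
    return (answer_logprob_with_cot - answer_logprob_direct).clamp(min=-5.0, max=5.0)
```

A cue of this shape is label-free because both log-probabilities come from the model itself, which matches the framework's stated goal of process-level guidance without ground-truth annotations.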
“SR-MCR improves both answer accuracy and reasoning coherence across a broad set of visual benchmarks; among open-source models of comparable size, SR-MCR-7B achieves state-of-the-art performance with an average accuracy of 81.4%.”