Research Paper · Vision-Language Models, Agentic Reasoning, Reinforcement Learning
Analyzed: Jan 3, 2026 15:38
SenseNova-MARS: Agentic Reasoning with Tools via RL
Published: Dec 30, 2025 16:31 · ArXiv
Analysis
This paper introduces SenseNova-MARS, a framework that equips Vision-Language Models (VLMs) with agentic reasoning and tool-use capabilities, specifically the ability to invoke search and image-manipulation tools during reasoning. Its key contributions are a reinforcement-learning (RL) training approach and the new HR-MMSearch benchmark. The paper claims state-of-the-art results, surpassing even proprietary models on certain benchmarks. The planned release of code, models, and datasets further supports reproducibility and follow-up research.
Key Takeaways
- SenseNova-MARS is a novel framework for agentic VLMs.
- It uses RL to integrate visual reasoning with tool use (search, image crop).
- It introduces the HR-MMSearch benchmark.
- It achieves state-of-the-art performance, surpassing proprietary models.
- Code, models, and datasets will be released.
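To make the "agentic tool use" idea concrete, the loop below sketches how a VLM policy might interleave reasoning with search and image-crop tool calls. This is an illustrative assumption, not the paper's implementation: the tool names, the `stub_policy`, and the simulated tool results are all hypothetical stand-ins for the learned VLM and real tool backends.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str    # one of "search", "crop", "answer" (illustrative tool set)
    argument: str  # query text, crop region, or final answer

def stub_policy(question, observations):
    # Stand-in for the RL-trained VLM policy: search once, crop once, then answer.
    if not observations:
        return Step("search", question)
    if len(observations) == 1:
        return Step("crop", "region-of-interest")
    return Step("answer", "final answer")

def run_episode(question, policy, max_steps=5):
    """Roll out one tool-use episode; returns the trajectory and the answer."""
    observations, trajectory = [], []
    for _ in range(max_steps):
        step = policy(question, observations)
        trajectory.append(step)
        if step.action == "answer":
            return trajectory, step.argument
        # Simulated tool output; real tools would return search hits or cropped images.
        observations.append(f"{step.action}-result:{step.argument}")
    return trajectory, None  # budget exhausted without an answer

trajectory, answer = run_episode("What is on the sign?", stub_policy)
print([s.action for s in trajectory])  # → ['search', 'crop', 'answer']
```

In an RL setup like the one the paper describes, whole trajectories of this form would be scored (e.g. by answer correctness) and used to update the policy, so the model learns *when* to search or crop rather than following a fixed script.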
Reference
“SenseNova-MARS achieves state-of-the-art performance on open-source search and fine-grained image understanding benchmarks. Specifically, on search-oriented benchmarks, SenseNova-MARS-8B scores 67.84 on MMSearch and 41.64 on HR-MMSearch, surpassing proprietary models such as Gemini-3-Flash and GPT-5.”