RSAgent: Agentic MLLM for Text-Guided Segmentation

Paper | #MLLM, Computer Vision, Segmentation | 🔬 Research | Analyzed: Jan 3, 2026 17:05
Published: Dec 30, 2025 06:50
ArXiv

Analysis

This paper introduces RSAgent, an agentic multimodal large language model (MLLM) for text-guided object segmentation. Its key innovation is a multi-turn workflow: rather than predicting a mask in a single pass, the model invokes tools over several turns and uses their feedback to verify, refocus, and refine the segmentation mask. This addresses a limitation of one-shot methods, which have no opportunity to verify or revise an initial prediction. The paper reports state-of-the-art results on the ReasonSeg and RefCOCOg benchmarks.
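To make the multi-turn idea concrete, here is a minimal Python sketch of a propose, segment, verify loop. It is an illustration only, not the paper's implementation: the function names (`refine_segmentation`, `toy_propose`, `toy_segment`, `toy_verify`), the box-shaped tool input, the 4x4 grid, and the area-based check are all hypothetical stand-ins for whatever tools and feedback signals RSAgent actually uses.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); hypothetical tool input
Mask = List[List[int]]           # binary mask as rows of 0/1


def refine_segmentation(
    propose: Callable[[List[str]], Box],
    segment: Callable[[Box], Mask],
    verify: Callable[[Mask], Tuple[bool, str]],
    max_turns: int = 4,
) -> Tuple[Mask, int]:
    """Run up to max_turns rounds of propose -> segment -> verify.

    Returns the last mask and the number of turns used.
    """
    feedback: List[str] = []
    mask: Mask = []
    for turn in range(1, max_turns + 1):
        box = propose(feedback)   # refocus using feedback gathered so far
        mask = segment(box)       # tool invocation (hypothetical)
        ok, note = verify(mask)   # check the current mask
        if ok:
            return mask, turn
        feedback.append(note)     # carry the critique into the next turn
    return mask, max_turns


# Toy stand-ins on a 4x4 grid, purely for illustration.
def toy_propose(feedback: List[str]) -> Box:
    shrink = len(feedback)  # tighten the box once per negative note
    return (shrink, shrink, 4 - shrink, 4 - shrink)


def toy_segment(box: Box) -> Mask:
    x0, y0, x1, y1 = box
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(4)]
        for y in range(4)
    ]


def toy_verify(mask: Mask) -> Tuple[bool, str]:
    area = sum(sum(row) for row in mask)
    # Hypothetical acceptance rule: the target covers at most 4 pixels.
    return area <= 4, "mask too large, refocus on a tighter region"


mask, turns = refine_segmentation(toy_propose, toy_segment, toy_verify)
```

In this toy run the first proposal covers the whole grid and is rejected, and the second, tighter proposal is accepted. A one-shot method corresponds to `max_turns=1`, where the rejected first mask would simply be returned.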
Reference / Citation
"RSAgent achieves a zero-shot performance of 66.5% gIoU on ReasonSeg test, improving over Seg-Zero-7B by 9%, and reaches 81.5% cIoU on RefCOCOg, demonstrating state-of-the-art performance."
ArXiv, Dec 30, 2025 06:50
* Cited for critical analysis under Article 32.
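
The quoted figures use two different IoU aggregates. As these metrics are commonly defined in referring and reasoning segmentation work, gIoU is the mean of per-image IoUs, while cIoU is the cumulative intersection divided by the cumulative union over the whole dataset. The sketch below assumes the paper follows those common definitions; the function names and the two toy images are made up for illustration.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersection and union pixels for one prediction/ground-truth pair."""
    inter = 0
    union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def giou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Mean of per-image IoUs: every image counts equally."""
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        # Assumed convention: two empty masks count as a perfect match.
        ious.append(1.0 if union == 0 else inter / union)
    return sum(ious) / len(ious)


def ciou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Cumulative intersection over cumulative union: large objects dominate."""
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
    return 1.0 if total_union == 0 else total_inter / total_union


# Toy data: a small object segmented poorly, a large object segmented perfectly.
preds = [[[1, 0], [0, 0]], [[1, 1, 1, 1], [1, 1, 1, 1]]]
gts = [[[1, 1], [0, 0]], [[1, 1, 1, 1], [1, 1, 1, 1]]]

g = giou(preds, gts)  # (0.5 + 1.0) / 2 = 0.75
c = ciou(preds, gts)  # (1 + 8) / (2 + 8) = 0.9
```

Because cIoU weights each image by its object size, the two numbers can diverge on the same predictions, which is why scores reported in gIoU and cIoU are not directly comparable.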