RSAgent: Agentic MLLM for Text-Guided Segmentation
Published: Dec 30, 2025 06:50
arXiv
Analysis
This paper introduces RSAgent, an agentic multimodal large language model (MLLM) for text-guided object segmentation. Its key idea is to work over multiple turns. Rather than emitting a mask in a single pass, the model invokes tools, receives feedback on the result, and iteratively refines the segmentation mask. One-shot methods commit to their first prediction; the multi-turn loop lets RSAgent verify a candidate mask, refocus on a different region, and refine it. The authors report state-of-the-art results on several benchmarks, including ReasonSeg and RefCOCOg.
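The multi-turn loop described above can be sketched in Python. This is a minimal, hypothetical illustration of a generic propose, tool-call, verify cycle, not RSAgent's implementation; the summary does not describe the paper's actual tools, prompts, or stopping rule. The names `refine_loop`, `toy_propose`, `toy_segment`, and `toy_verify`, the box-prompted segmentation tool, and the 0.9 acceptance threshold are all assumptions made for illustration. The toy verifier also cheats by comparing against a known target, standing in for whatever feedback signal a real agent would use.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1); x1 and y1 are exclusive
Mask = FrozenSet[Tuple[int, int]]  # set of (x, y) pixels labelled foreground


@dataclass
class Turn:
    box: Box
    mask: Mask
    score: float


def refine_loop(
    propose: Callable[[List[Turn]], Box],
    segment: Callable[[Box], Mask],
    verify: Callable[[Mask], float],
    accept: float = 0.9,
    max_turns: int = 5,
) -> List[Turn]:
    """Hypothetical propose -> tool call -> verify loop (not RSAgent's code)."""
    history: List[Turn] = []
    for _ in range(max_turns):
        box = propose(history)   # the policy can read every earlier turn (feedback)
        mask = segment(box)      # tool invocation: a hypothetical box-prompted segmenter
        score = verify(mask)     # check the candidate mask before committing to it
        history.append(Turn(box, mask, score))
        if score >= accept:      # good enough: stop early
            break
    return history


# --- Toy environment (all hypothetical) ---
TARGET: Mask = frozenset((x, y) for x in range(2, 5) for y in range(2, 5))
DISTRACTOR: Mask = frozenset((x, y) for x in range(7, 9) for y in range(7, 9))
FOREGROUND: Mask = TARGET | DISTRACTOR


def toy_segment(box: Box) -> Mask:
    # Return every foreground pixel that falls inside the box.
    x0, y0, x1, y1 = box
    return frozenset(p for p in FOREGROUND if x0 <= p[0] < x1 and y0 <= p[1] < y1)


def toy_verify(mask: Mask) -> float:
    # Stand-in critic: IoU against the known target (a real agent has no oracle).
    union = mask | TARGET
    return len(mask & TARGET) / len(union) if union else 0.0


def toy_propose(history: List[Turn]) -> Box:
    if not history:
        return (0, 0, 10, 10)                 # first turn: look at the whole image
    x0, y0, x1, y1 = history[-1].box          # refocus: keep the upper-left quadrant
    return (x0, y0, (x0 + x1) // 2, (y0 + y1) // 2)


turns = refine_loop(toy_propose, toy_segment, toy_verify)
for t in turns:
    print(t.box, len(t.mask), round(t.score, 3))
```

Running it prints `(0, 0, 10, 10) 13 0.692` and then `(0, 0, 5, 5) 9 1.0`: the whole-image first attempt also captures the distractor, and the refocused second turn isolates the target, passes verification, and ends the loop.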
Key Takeaways
- RSAgent uses an agentic multimodal LLM for text-guided segmentation.
- It works over multiple turns, invoking tools and using their feedback to refine the mask iteratively.
- This addresses the limitations of one-shot segmentation approaches, which produce a mask in a single pass without verification or refinement.
- RSAgent reports state-of-the-art results on multiple benchmarks, including ReasonSeg and RefCOCOg.
Reference
“RSAgent achieves a zero-shot performance of 66.5% gIoU on ReasonSeg test, improving over Seg-Zero-7B by 9%, and reaches 81.5% cIoU on RefCOCOg, demonstrating state-of-the-art performance.”
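The quoted figures use different metrics on different datasets (gIoU on ReasonSeg, cIoU on RefCOCOg), so 66.5% and 81.5% are not directly comparable. As these metrics are commonly defined in the reasoning-segmentation literature, gIoU averages the per-image IoU, while cIoU divides the total intersection by the total union across the whole dataset, which lets large objects dominate. The sketch below stores masks as pixel sets; the function names and the empty-mask convention are assumptions, and the benchmarks' official evaluation code may handle such edge cases differently.

```python
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]


def giou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """gIoU: mean of the per-sample IoU, so every image counts equally."""
    ious = []
    for pred, gt in zip(preds, gts):
        union = len(pred | gt)
        # Assumption: an empty prediction on an empty target counts as IoU = 1.
        ious.append(len(pred & gt) / union if union else 1.0)
    return sum(ious) / len(ious)


def ciou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """cIoU: cumulative intersection over cumulative union across all samples."""
    inter = sum(len(pred & gt) for pred, gt in zip(preds, gts))
    union = sum(len(pred | gt) for pred, gt in zip(preds, gts))
    return inter / union if union else 1.0


# Toy data: one large object segmented perfectly, one small object missed.
big: Mask = {(x, y) for x in range(10) for y in range(10)}   # 100 pixels
small: Mask = {(20, 20), (20, 21), (21, 20), (21, 21)}      # 4 pixels
preds = [big, set()]
gts = [big, small]

print(giou(preds, gts))            # 0.5
print(round(ciou(preds, gts), 3))  # 0.962
```

With one large object segmented perfectly and one small object missed entirely, gIoU is 0.5 while cIoU is 100/104, about 0.962, showing how cIoU rewards getting large objects right.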