
Analysis

This paper addresses the challenge of decision ambiguity in Change Detection Visual Question Answering (CDVQA), where models struggle to distinguish the correct answer from strong distractors. The authors propose a novel reinforcement learning framework, DARFT, which targets this issue by focusing on Decision-Ambiguous Samples (DAS). This is a valuable contribution because it moves beyond simply improving overall accuracy to target a specific failure mode, potentially leading to more robust and reliable CDVQA models, especially in few-shot settings.
Reference

DARFT suppresses strong distractors and sharpens decision boundaries without additional supervision.
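
The excerpt does not spell out how decision-ambiguous samples are identified. One plausible criterion, sketched below purely as an illustration (the top-2 margin rule and the `margin_threshold` value are assumptions, not taken from the paper), is to flag samples whose two highest answer probabilities are nearly tied:

```python
import torch

def flag_decision_ambiguous(logits: torch.Tensor,
                            margin_threshold: float = 0.1) -> torch.Tensor:
    """Flag Decision-Ambiguous Samples (DAS): cases where the model's best
    answer and its strongest distractor receive nearly equal probability.

    logits: (batch, num_answers) scores over the CDVQA answer vocabulary.
    Returns a boolean mask, True where the sample is decision-ambiguous.
    NOTE: illustrative sketch only; the margin rule and threshold are
    assumptions, not the paper's actual DAS criterion.
    """
    probs = torch.softmax(logits, dim=-1)
    top2 = probs.topk(2, dim=-1).values   # (batch, 2): best and runner-up
    margin = top2[:, 0] - top2[:, 1]      # small margin => ambiguous decision
    return margin < margin_threshold
```

A reinforcement learning objective like DARFT's could then concentrate its reward signal on exactly these flagged samples, which would be consistent with the claim that it sharpens decision boundaries without additional supervision.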

Analysis

This paper addresses the limitations of current evaluation methods for LLM agents, focusing specifically on tool use via the Model Context Protocol (MCP). It introduces MCPAgentBench, a benchmark designed to overcome issues such as reliance on external services and the lack of difficulty-aware task design. The benchmark uses real-world MCP definitions, authentic tasks, and a dynamic sandbox environment with distractors to test tool selection and discrimination abilities. Its significance lies in providing a more realistic and challenging evaluation framework for LLM agents, which is crucial for advancing their ability to carry out complex, multi-step tool invocations.
Reference

The evaluation employs a dynamic sandbox environment that presents agents with candidate tool lists containing distractors, thereby testing their tool selection and discrimination abilities.
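
To make the distractor mechanism concrete, here is a minimal sketch of how such a candidate list might be assembled (the function name, tool pool, and sampling scheme are illustrative assumptions, not MCPAgentBench's actual implementation):

```python
import random

def build_candidate_tools(required_tools: list[str],
                          tool_pool: list[str],
                          num_distractors: int = 5,
                          seed: int = 0) -> list[str]:
    """Assemble the candidate tool list shown to the agent: the tools the
    task actually needs, mixed with distractors drawn from the wider pool.
    Illustrative sketch only; not the benchmark's real sampling code."""
    rng = random.Random(seed)
    distractor_pool = [t for t in tool_pool if t not in required_tools]
    distractors = rng.sample(distractor_pool, num_distractors)
    candidates = list(required_tools) + distractors
    rng.shuffle(candidates)  # hide which tools are actually relevant
    return candidates

# Example: the agent must pick out query_db and run_sql from a shuffled list
pool = ["search_web", "read_file", "send_email", "query_db",
        "resize_image", "translate_text", "get_weather", "run_sql"]
print(build_candidate_tools(["query_db", "run_sql"], pool, num_distractors=3))
```

An agent would then be scored on whether it invokes only the required tools, which operationalizes the tool selection and discrimination abilities the benchmark targets.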