Search: 基准测试中取得了最先进的性能。 - ai.jp.net

Paper #llm 🔬 ResearchAnalyzed: Jan 3, 2026 08:52

Youtu-Agent: Automated Agent Generation and Hybrid Policy Optimization

Published:Dec 31, 2025 04:17

•

1 min read

•

ArXiv

Analysis

This paper introduces Youtu-Agent, a modular framework designed to address the challenges of LLM agent configuration and adaptability. It tackles the high costs of manual tool integration and prompt engineering by automating agent generation. Furthermore, it improves agent adaptability through a hybrid policy optimization system, including in-context optimization and reinforcement learning. The results demonstrate state-of-the-art performance and significant improvements in tool synthesis, performance on specific benchmarks, and training speed.

Key Takeaways

•Youtu-Agent automates agent generation, reducing manual effort in tool integration and prompt engineering.
•The framework uses a hybrid policy optimization system, including in-context optimization and reinforcement learning, to improve agent adaptability.
•Experiments show state-of-the-art performance on WebWalkerQA and GAIA benchmarks.
•The automated generation pipeline achieves a high tool synthesis success rate.
•The Agent Practice module improves performance on AIME benchmarks.
•Agent RL training achieves significant speedup and performance improvements on coding/reasoning and searching tasks.

Reference

“Experiments demonstrate that Youtu-Agent achieves state-of-the-art performance on WebWalkerQA (71.47%) and GAIA (72.8%) using open-weight models.”

Permalink ArXiv

Research Paper #Vision-Language Models, Agentic Reasoning, Reinforcement Learning 🔬 ResearchAnalyzed: Jan 3, 2026 15:38

SenseNova-MARS: Agentic Reasoning with Tools via RL

Published:Dec 30, 2025 16:31

•

1 min read

•

ArXiv

Analysis

This paper introduces SenseNova-MARS, a novel framework that enhances Vision-Language Models (VLMs) with agentic reasoning and tool use capabilities, specifically focusing on integrating search and image manipulation tools. The use of reinforcement learning (RL) and the introduction of the HR-MMSearch benchmark are key contributions. The paper claims state-of-the-art performance, surpassing even proprietary models on certain benchmarks, which is significant. The release of code, models, and datasets further promotes reproducibility and research in this area.

Key Takeaways

•SenseNova-MARS is a novel framework for agentic VLMs.
•It uses RL to integrate visual reasoning and tool use (search, image crop).
•Introduces the HR-MMSearch benchmark.
•Achieves state-of-the-art performance, surpassing proprietary models.
•Code, models, and datasets will be released.

Reference

“SenseNova-MARS achieves state-of-the-art performance on open-source search and fine-grained image understanding benchmarks. Specifically, on search-oriented benchmarks, SenseNova-MARS-8B scores 67.84 on MMSearch and 41.64 on HR-MMSearch, surpassing proprietary models such as Gemini-3-Flash and GPT-5.”

Permalink ArXiv

Paper #Video Understanding, Vision-Language Models, Scene Segmentation 🔬 ResearchAnalyzed: Jan 4, 2026 00:06

Scene-VLM: Video Scene Segmentation with Vision-Language Models

Published:Dec 25, 2025 20:31

•

1 min read

•

ArXiv

Analysis

This paper introduces Scene-VLM, a novel approach to video scene segmentation using fine-tuned vision-language models. It addresses limitations of existing methods by incorporating multimodal cues (frames, transcriptions, metadata), enabling sequential reasoning, and providing explainability. The model's ability to generate natural-language rationales and achieve state-of-the-art performance on benchmarks highlights its significance.

Key Takeaways

•Scene-VLM is the first fine-tuned vision-language model for video scene segmentation.
•It leverages multimodal cues (frames, transcriptions, metadata) for improved scene understanding.
•The model enables sequential reasoning and provides explainability through natural language rationales.
•Scene-VLM achieves state-of-the-art performance on standard scene segmentation benchmarks.

Reference

“Scene-VLM yields significant improvements of +6 AP and +13.7 F1 over the previous leading method on MovieNet.”

Permalink ArXiv

Youtu-Agent: Automated Agent Generation and Hybrid Policy Optimization

Analysis

Key Takeaways

SenseNova-MARS: Agentic Reasoning with Tools via RL

Analysis

Key Takeaways

Scene-VLM: Video Scene Segmentation with Vision-Language Models

Analysis

Key Takeaways

📬 Get AI News Delivered

Browse by Category

Trending Topics

📬 Get AI News Delivered

Browse by Category

Trending Topics