Proactive Web Agents with Devi Parikh
Published: Nov 19, 2025 01:49 • 1 min read • Practical AI
Analysis
This article discusses the future of web interaction through proactive, autonomous agents, focusing on the work of Yutori. It covers the technical challenges of building reliable web agents and the advantages of visually-grounded models, which operate on screenshots, over approaches that rely on the browser's document object model (DOM). It also touches on Yutori's training methods, including rejection sampling and reinforcement learning, and on how their "Scouts" agents orchestrate multiple tools for complex tasks. Two further themes are the importance of background operation and the progression from simple monitoring to full automation.
Key Takeaways
- Visually-grounded models are more robust for web agent interaction than DOM-based models.
- Yutori uses rejection sampling and reinforcement learning in their training pipeline.
- "Scouts" agents orchestrate multiple tools and sub-agents for complex web tasks.
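The summary names rejection sampling but gives no detail on how Yutori applies it. As a rough illustration of the general technique only, the sketch below draws several candidate actions per task and keeps those that pass a check, producing filtered examples that could later be used for fine-tuning. Every name here (`rejection_sample`, `toy_propose`, `toy_accept`) is hypothetical, and the toy proposer and checker are stand-ins, not anything described in the source.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    tasks: List[str],
    propose: Callable[[str, random.Random], str],
    accept: Callable[[str, str], bool],
    samples_per_task: int = 4,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw several candidates per task and keep only the accepted ones."""
    rng = random.Random(seed)
    kept: List[Tuple[str, str]] = []
    for task in tasks:
        for _ in range(samples_per_task):
            candidate = propose(task, rng)
            # Candidates that fail the check are discarded (rejected).
            if accept(task, candidate):
                kept.append((task, candidate))
    return kept


# Hypothetical stand-in for a model that proposes an action for a task.
def toy_propose(task: str, rng: random.Random) -> str:
    return rng.choice(["click_submit", "click_cancel", "scroll_down"])


# Hypothetical stand-in for a verifier that decides whether an action is good.
def toy_accept(task: str, candidate: str) -> bool:
    return candidate == "click_submit"


dataset = rejection_sample(
    ["submit the form"], toy_propose, toy_accept, samples_per_task=8
)
```

In a real pipeline the proposer would be the agent itself and the check would be some measure of task success; the filtered pairs in `dataset` are what would be kept as training examples.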
Reference
“We explore the technical challenges of creating reliable web agents, the advantages of visually-grounded models that operate on screenshots rather than the browser’s more brittle document object model, or DOM, and why this counterintuitive choice has proven far more robust and generalizable for handling complex web interfaces.”