ShowUI-$π$: Flow-based Generative Model for GUI Dexterity
Analysis
This paper introduces ShowUI-$π$, a flow-based generative model for GUI agent control. Existing agents rely on discrete click predictions, which cannot express continuous, closed-loop trajectories such as dragging; ShowUI-$π$ is designed to generate them. The paper's contributions are the architecture itself, a new benchmark (ScreenDrag), and results it reports as outperforming existing proprietary agents, which points toward more human-like interaction in digital environments.
Key Takeaways
- Proposes ShowUI-$π$, a flow-based generative model for GUI control.
- Introduces a unified discrete-continuous action space for flexible interaction.
- Employs flow-based action generation for smooth drag trajectories.
- Creates ScreenDrag, a new benchmark for evaluating GUI agent drag capabilities.
- Demonstrates superior performance compared to existing proprietary agents.
“ShowUI-$π$ achieves 26.98 with only 450M parameters, underscoring both the difficulty of the task and the effectiveness of our approach.”
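To make the idea of flow-based action generation for drag trajectories more concrete, here is a minimal sketch of how sampling from a flow model generally works: start from Gaussian noise and integrate a velocity field over time with Euler steps until a trajectory of cursor waypoints emerges. This is not the paper's implementation. The function names (`toy_velocity`, `sample_trajectory`, `straight_drag`) are hypothetical, and the velocity field is a closed-form stand-in for a learned network: it assumes a straight-line (rectified-flow style) path toward a known target trajectory, whereas a real model would predict the velocity from the screenshot and instruction without access to the target.

```python
import random


def straight_drag(start, end, n_points=5):
    """Hypothetical helper: evenly spaced (x, y) waypoints from start to end."""
    return [
        (
            start[0] + (end[0] - start[0]) * i / (n_points - 1),
            start[1] + (end[1] - start[1]) * i / (n_points - 1),
        )
        for i in range(n_points)
    ]


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity network (assumption, not the paper's model).

    For a straight-line path from noise to `target`, the velocity at time t
    that carries the current point x to the target at t = 1 is
    (target - x) / (1 - t).
    """
    return [
        ((tx - px) / (1.0 - t), (ty - py) / (1.0 - t))
        for (px, py), (tx, ty) in zip(x, target)
    ]


def sample_trajectory(target, steps=10, seed=0):
    """Integrate the velocity field from t = 0 to t = 1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise, one (x, y) sample per waypoint.
    x = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in target]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        x = [(px + dt * vx, py + dt * vy) for (px, py), (vx, vy) in zip(x, v)]
    return x


# Usage: denoise random points into a horizontal drag across the screen.
target = straight_drag((100.0, 200.0), (300.0, 200.0), n_points=5)
trajectory = sample_trajectory(target, steps=10, seed=0)
for point in trajectory:
    print(point)
```

The point of the sketch is the sampling loop, not the velocity function: the output is a sequence of continuous coordinates rather than a single click location, which is the property that makes this family of models a natural fit for drags.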