iSHIFT: Lightweight GUI Agent with Adaptive Perception

Published:Dec 26, 2025 12:09
1 min read
ArXiv

Analysis

This paper introduces iSHIFT, a novel lightweight GUI agent designed for efficient and precise interaction with graphical user interfaces. The core contribution lies in its slow-fast hybrid inference approach, allowing the agent to switch between detailed visual grounding for accuracy and global cues for efficiency. The use of perception tokens to guide attention and the agent's ability to adapt reasoning depth are also significant. The paper's claim of achieving state-of-the-art performance with a compact 2.5B model is particularly noteworthy, suggesting potential for resource-efficient GUI agents.

Reference

iSHIFT matches state-of-the-art performance on multiple benchmark datasets.