Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs
Published: Dec 24, 2025 05:00 • 1 min read • ArXiv Vision
Analysis
This paper introduces Widget2Code, an approach to generating UI code from visual widgets using multimodal large language models (MLLMs). Widget-to-code conversion is underexplored, and it is harder than it looks: unlike web or mobile UIs, widgets are compact and context-free. The paper presents an image-only widget benchmark and evaluates generalized MLLMs on it, finding that they fail to produce reliable, visually consistent code. To address this, the authors propose a baseline that combines perceptual understanding with structured code generation, incorporating widget design principles and a framework-agnostic domain-specific language (WidgetDSL). They also introduce WidgetFactory, an end-to-end infrastructure intended to make the approach practical.
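To make the idea of a framework-agnostic DSL concrete, here is a minimal sketch in Python. It is not WidgetDSL: the summary above does not show the paper's actual syntax, so the `Node` type, the node kinds (`row`, `column`, `text`, `icon`), and the `Stack`, `Text`, and `Icon` component names are all hypothetical. It illustrates only the general pattern, in which a model emits one structured description and separate back ends render it to different targets.

```python
import json
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """One element of a hypothetical widget description (not the paper's WidgetDSL)."""
    kind: str                                  # "row", "column", "text" or "icon"
    value: str = ""                            # text content or icon name
    children: list["Node"] = field(default_factory=list)


def to_html(node: Node) -> str:
    """Render the tree as plain HTML with inline flexbox styles."""
    if node.kind == "text":
        return f"<span>{escape(node.value)}</span>"
    if node.kind == "icon":
        return f'<i class="icon-{escape(node.value)}"></i>'
    if node.kind in ("row", "column"):
        inner = "".join(to_html(child) for child in node.children)
        return f'<div style="display:flex;flex-direction:{node.kind}">{inner}</div>'
    raise ValueError(f"unknown node kind: {node.kind}")


def to_jsx(node: Node) -> str:
    """Render the same tree as JSX using assumed Stack/Text/Icon components."""
    if node.kind == "text":
        return f"<Text>{{{json.dumps(node.value)}}}</Text>"
    if node.kind == "icon":
        return f"<Icon name={json.dumps(node.value)} />"
    if node.kind in ("row", "column"):
        inner = "".join(to_jsx(child) for child in node.children)
        return f'<Stack direction="{node.kind}">{inner}</Stack>'
    raise ValueError(f"unknown node kind: {node.kind}")


# A tiny weather widget: an icon and a temperature on one row, a caption below.
widget = Node("column", children=[
    Node("row", children=[Node("icon", "sun"), Node("text", "72F")]),
    Node("text", "Sunny & clear"),
])

print(to_html(widget))
print(to_jsx(widget))
```

Because both renderers walk the same tree, supporting another front end means writing one more function rather than prompting the model for a new output format. That separation is the usual motivation for an intermediate representation of this kind.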
Key Takeaways
- Introduces Widget2Code for generating UI code from visual widgets.
- Highlights the challenges of widget-to-code conversion that stem from widgets being compact and context-free.
- Proposes a baseline combining perceptual understanding and structured code generation.
Reference
“widgets are compact, context-free micro-interfaces that summarize key information through dense layouts and iconography under strict spatial constraints.”