Research#llm📝 BlogAnalyzed: Dec 28, 2025 23:00

Semantic Image Disassembler (SID): A VLM-Based Tool for Image Manipulation

Published:Dec 28, 2025 22:20
1 min read
r/StableDiffusion

Analysis

The Semantic Image Disassembler (SID) is presented as a versatile tool leveraging Vision Language Models (VLMs) for image manipulation tasks. Its core functionality revolves around disassembling images into semantic components, separating content (wireframe/skeleton) from style (visual physics). This structured approach, using JSON for analysis, enables various processing modes without redundant re-interpretation. The tool supports both image and text inputs, offering functionalities like style DNA extraction, full prompt extraction, and de-summarization. Its model-agnostic design, tested with Qwen3-VL and Gemma 3, enhances its adaptability. The ability to extract reusable visual physics and reconstruct generation-ready prompts makes SID a potentially valuable asset for image editing and generation workflows, especially within the Stable Diffusion ecosystem.

Reference

SID analyzes inputs using a structured analysis stage that separates content (wireframe / skeleton) from style (visual physics) in JSON form.