InstructMoLE: Instruction-Guided Experts for Image Generation
Published: Dec 25, 2025 21:37 • 1 min read • ArXiv
Analysis
This paper addresses multi-conditional image generation with diffusion transformers, focusing on parameter-efficient fine-tuning. It identifies limitations in existing methods such as LoRA and token-level MoLE routing, which can produce artifacts. The core contribution is InstructMoLE, a framework that selects experts from the instruction as a whole rather than per token, so the same experts are applied to every token; this preserves global semantics and improves image quality. An output-space orthogonality loss additionally promotes diversity among the experts. The paper's significance lies in its potential to improve compositional control and fidelity in instruction-driven image generation.
Key Takeaways
- Proposes InstructMoLE, a novel framework for instruction-driven fine-tuning of diffusion transformers.
- Employs Instruction-Guided Routing (IGR) for global expert selection, improving semantic consistency.
- Introduces an output-space orthogonality loss to promote expert diversity.
- Outperforms existing methods on multi-conditional image generation benchmarks.
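The summary does not give the exact form of the output-space orthogonality loss, so the following is only a minimal sketch of one common formulation: penalize the squared cosine similarity between the outputs of every pair of experts. The function names and the choice of squared cosine similarity are assumptions for illustration, not the paper's definition.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def orthogonality_loss(expert_outputs):
    """Mean squared cosine similarity over all distinct expert pairs.

    expert_outputs: one output vector per expert (hypothetical layout).
    The value is 0 when all outputs are mutually orthogonal and grows
    as the experts produce more similar outputs.
    """
    n = len(expert_outputs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    total = sum(cosine(expert_outputs[i], expert_outputs[j]) ** 2 for i, j in pairs)
    return total / len(pairs)

# Orthogonal outputs incur no penalty; identical outputs incur the maximum.
diverse = orthogonality_loss([[1.0, 0.0], [0.0, 1.0]])
redundant = orthogonality_loss([[1.0, 2.0], [1.0, 2.0]])
```

Operating on expert outputs rather than expert weights is what makes this an output-space penalty: two experts with different parameters are still penalized if they produce near-parallel updates.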
Reference
“InstructMoLE utilizes a global routing signal, Instruction-Guided Routing (IGR), derived from the user's comprehensive instruction. This ensures that a single, coherently chosen expert council is applied uniformly across all input tokens, preserving the global semantics and structural integrity of the generation process.”
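The routing idea in this quotation can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the linear gate, the top-k selection, and the modelling of each expert as a LoRA-style low-rank pair `(down, up)` are all hypothetical choices, as are the names `route_globally` and `apply_experts`.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def route_globally(instruction_emb, gate_weights, top_k):
    """Choose top_k experts once, from the instruction embedding alone."""
    scores = softmax(matvec(gate_weights, instruction_emb))
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in chosen)
    # Renormalize so the selected experts' weights sum to 1.
    return {i: scores[i] / norm for i in chosen}

def apply_experts(tokens, experts, mixture):
    """Add the same weighted low-rank update to every token."""
    out = []
    for tok in tokens:
        delta = [0.0] * len(tok)
        for idx, weight in mixture.items():
            down, up = experts[idx]  # hypothetical LoRA-style pair
            update = matvec(up, matvec(down, tok))
            delta = [d + weight * u for d, u in zip(delta, update)]
        out.append([t + d for t, d in zip(tok, delta)])
    return out

# Toy setup: 3 rank-1 experts over 2-dimensional tokens.
instruction_emb = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
experts = [
    ([[1.0, 0.0]], [[0.5], [0.0]]),
    ([[1.0, 1.0]], [[0.5], [0.5]]),
    ([[0.0, 1.0]], [[0.0], [0.5]]),
]
tokens = [[1.0, 2.0], [3.0, 4.0]]

mixture = route_globally(instruction_emb, gate_weights, top_k=2)
routed = apply_experts(tokens, experts, mixture)
```

The contrast with token-level routing is that `route_globally` never sees an individual token: the mixture is computed once per instruction and reused for all tokens, whereas a token-level router would compute a separate mixture inside the loop over `tokens`.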