MoFu: Scale-Aware Video Generation
Published: Dec 26, 2025 09:29 • 1 min read • ArXiv
Analysis
This paper addresses two problems in multi-subject video generation: scale inconsistency and permutation sensitivity. The proposed MoFu framework combines Scale-Aware Modulation (SMO) with a Fourier Fusion strategy to improve subject fidelity and visual quality. The authors also introduce a dedicated benchmark for evaluating these properties.
Key Takeaways
- Addresses scale inconsistency and permutation sensitivity in multi-subject video generation.
- Proposes Scale-Aware Modulation (SMO) for scale consistency.
- Employs Fourier Fusion for permutation invariance.
- Introduces a Scale-Permutation Stability Loss.
- Establishes a dedicated benchmark for evaluation.
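To make "permutation sensitivity" concrete, the sketch below compares two toy ways of fusing per-subject feature vectors and measures how much the fused output changes when the subjects are reordered. It is not MoFu's Fourier Fusion or its stability loss, which this summary does not describe in detail. The function names (`concat_fusion`, `mean_fusion`, `permutation_gap`) and the feature values are hypothetical, chosen only to illustrate the property.

```python
from itertools import permutations


def concat_fusion(features):
    """Order-sensitive toy fusion: flatten subject features in the given order."""
    return [x for feat in features for x in feat]


def mean_fusion(features):
    """Order-invariant toy fusion: element-wise mean over subjects."""
    n = len(features)
    return [sum(vals) / n for vals in zip(*features)]


def permutation_gap(fuse, features):
    """Largest absolute element-wise difference between the fused output for
    the original subject order and for any reordering of the subjects."""
    base = fuse(features)
    gap = 0.0
    for perm in permutations(features):
        out = fuse(list(perm))
        gap = max(gap, max(abs(a - b) for a, b in zip(base, out)))
    return gap


# Three hypothetical subjects, each with a 2-dimensional feature vector.
subjects = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]

print("concat gap:", permutation_gap(concat_fusion, subjects))
print("mean gap:", permutation_gap(mean_fusion, subjects))
```

A fusion with a nonzero gap produces different conditioning for the same set of subjects depending on input order. A gap of zero is the invariance that the takeaways attribute to Fourier Fusion.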
Reference
“MoFu significantly outperforms existing methods in preserving natural scale, subject fidelity, and overall visual quality.”