High-Fidelity, Long-Duration Human Image Animation with Diffusion Transformer

Published: Dec 26, 2025 07:36
ArXiv

Analysis

This paper addresses two key limitations in human image animation: generating long-duration videos and preserving fine-grained details. It proposes a diffusion transformer (DiT)-based framework that combines hybrid guidance signals, a Position Shift Adaptive Module, and a data augmentation strategy to improve fidelity and temporal consistency. Its focus on facial and hand details, together with support for arbitrary video lengths, suggests a meaningful step forward for the field.

Reference

The paper's core contribution is a DiT-based framework incorporating hybrid guidance signals, a Position Shift Adaptive Module, and a novel data augmentation strategy to achieve superior performance in both high-fidelity and long-duration human image animation.