Pose-Guided Residual Refinement for Text-to-Motion Generation
Published: Dec 27, 2025 04:45
•1 min read
•ArXiv
Analysis
This paper addresses the limitations of existing text-to-motion generation methods, particularly those based on pose codes, by introducing a hybrid representation that combines interpretable pose codes with residual codes. This approach aims to improve both the fidelity and controllability of generated motions, making them easier to edit and refine from text descriptions. Residual vector quantization and residual dropout are the key innovations behind this design.
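To make the residual-code idea concrete, here is a minimal NumPy sketch of residual vector quantization: a first codebook coarsely quantizes a feature vector (playing the role of the interpretable pose codes), and a second codebook quantizes what the first stage missed (the residual codes). The codebook sizes, the random codebooks, and the all-zero "opt-out" entry are illustrative assumptions for this sketch, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8   # feature dimension (illustrative)
K = 16  # entries per codebook (illustrative)

# Two random codebooks; real RVQ codebooks are learned.
codebooks = [rng.normal(size=(K, D)) for _ in range(2)]
for cb in codebooks:
    cb[0] = 0.0  # zero entry lets a stage "opt out", so error never increases

def rvq_encode(x, codebooks):
    """Greedy residual VQ: each stage quantizes the residual
    left over by the previous stage."""
    residual = x
    indices = []
    quantized = np.zeros_like(x)
    for cb in codebooks:
        # pick the codebook entry nearest to the current residual
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        quantized = quantized + cb[idx]
        residual = residual - cb[idx]
    return indices, quantized

x = rng.normal(size=D)
indices, x_hat = rvq_encode(x, codebooks)

# Error after stage 1 only vs. after both stages:
err_stage1 = np.linalg.norm(x - codebooks[0][indices[0]])
err_both = np.linalg.norm(x - x_hat)
print(err_both <= err_stage1)  # adding the residual stage never hurts
```

Dropping the second stage at training time (keeping only the stage-1 reconstruction) is the rough intuition behind residual dropout: the model learns to produce plausible motion from the interpretable pose codes alone, with residual codes refining fidelity when present.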
Key Takeaways
- Proposes PGR$^2$M, a novel approach for text-to-motion generation and editing.
- Combines pose codes and residual codes for improved fidelity and controllability.
- Employs residual vector quantization and residual dropout.
- Demonstrates improved performance over existing methods on benchmark datasets.
- Enables intuitive, structure-preserving motion edits.
Reference
“PGR$^2$M improves Fréchet inception distance and reconstruction metrics for both generation and editing compared with CoMo and recent diffusion- and tokenization-based baselines, while user studies confirm that it enables intuitive, structure-preserving motion edits.”