EgoReAct: Generating 3D Human Reactions from Egocentric Video
Research Paper · Computer Vision, Human Pose Estimation, Reaction Generation
Published: Dec 28, 2025 · Analyzed: Jan 3, 2026 · arXiv Analysis
This paper addresses the challenge of generating realistic 3D human reactions from egocentric video, a problem with significant implications for VR/AR and human-computer interaction. A key contribution is a new, spatially aligned dataset (the Human Reaction Dataset, HRD), since existing datasets suffer from spatial misalignment between the egocentric observer and the reacting person. The proposed EgoReAct framework combines a Vector Quantized Variational Autoencoder (VQ-VAE), which discretizes motion into tokens, with a Generative Pre-trained Transformer (GPT) that predicts those tokens autoregressively. Incorporating 3D dynamic features such as metric depth and head dynamics is the central innovation for improving spatial grounding and realism. The reported gains in realism, spatial consistency, and generation efficiency, achieved while maintaining strict causality, suggest a meaningful advance in the field.
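To make the VQ-VAE tokenization step concrete, the sketch below shows the core nearest-codebook lookup that turns continuous per-frame pose features into discrete motion tokens, which a GPT-style model can then predict one frame at a time. This is a minimal illustration, not the paper's implementation: all names and shapes (`CODEBOOK_SIZE`, `FEAT_DIM`, `quantize`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
CODEBOOK_SIZE, FEAT_DIM = 512, 64  # hypothetical sizes, not from the paper
codebook = rng.normal(size=(CODEBOOK_SIZE, FEAT_DIM))

def quantize(features: np.ndarray) -> np.ndarray:
    """Map each of T feature rows to the index of its nearest codebook vector."""
    # Squared L2 distance from every frame feature to every code vector.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)  # shape (T,): one integer motion token per frame

def dequantize(tokens: np.ndarray) -> np.ndarray:
    """Look up token indices back into codebook vectors (decoder input)."""
    return codebook[tokens]

# A 10-frame pose-feature sequence becomes 10 discrete tokens. Because an
# autoregressive transformer only attends to past tokens, generation from
# this representation is causal by construction.
pose_features = rng.normal(size=(10, FEAT_DIM))
tokens = quantize(pose_features)
print(tokens.shape)  # (10,)
```

The discrete token sequence is what allows real-time, causal generation: each new reaction frame depends only on previously emitted tokens plus the egocentric conditioning features.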
Key Takeaways
- Addresses the challenge of generating 3D human reactions from egocentric video.
- Introduces the Human Reaction Dataset (HRD) to address data scarcity and misalignment.
- Proposes EgoReAct, an autoregressive framework for real-time 3D reaction generation.
- Incorporates 3D dynamic features (metric depth, head dynamics) for improved spatial grounding.
- Demonstrates improved realism, spatial consistency, and generation efficiency compared to prior methods.
Reference / Citation
"EgoReAct achieves remarkably higher realism, spatial consistency, and generation efficiency compared with prior methods, while maintaining strict causality during generation."