EgoReAct: Generating 3D Human Reactions from Egocentric Video

Published:Dec 28, 2025 06:44
1 min read
ArXiv

Analysis

This paper addresses the challenge of generating realistic 3D human reactions from egocentric video, a problem with significant implications for areas like VR/AR and human-computer interaction. The creation of a new, spatially aligned dataset (HRD) is a crucial contribution, as existing datasets suffer from misalignment. The proposed EgoReAct framework, leveraging a Vector Quantised-Variational AutoEncoder and a Generative Pre-trained Transformer, offers a novel approach to this problem. The incorporation of 3D dynamic features like metric depth and head dynamics is a key innovation for enhancing spatial grounding and realism. The claim of improved realism, spatial consistency, and generation efficiency, while maintaining causality, suggests a significant advancement in the field.

Reference

EgoReAct achieves remarkably higher realism, spatial consistency, and generation efficiency compared with prior methods, while maintaining strict causality during generation.