Multimodal Learning for Micro-Gesture and Emotion Recognition
Analysis
This paper addresses the challenging tasks of micro-gesture recognition and behavior-based emotion prediction using multimodal learning. It integrates RGB video with 3D skeletal pose for micro-gesture classification, and facial/contextual embeddings for emotion recognition. Evaluated on the iMiGUE dataset, the approach performed competitively in the MiGA 2025 Challenge, securing 2nd place in behavior-based emotion prediction. The paper highlights the effectiveness of cross-modal fusion for capturing nuanced human behaviors; a generic version of such a fusion head is sketched below.
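As an illustration only (the paper's actual fusion mechanism, encoder choices, and dimensions are not given in this summary), a common cross-modal fusion pattern concatenates per-modality embeddings and feeds them through a small MLP classifier. All names and sizes below are hypothetical:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Late fusion of per-modality embeddings via concatenation + MLP.

    A generic sketch of cross-modal fusion; not the paper's specific
    architecture. Dimensions and class count are placeholders.
    """

    def __init__(self, rgb_dim=512, pose_dim=256, hidden_dim=512, num_classes=32):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(rgb_dim + pose_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, rgb_emb, pose_emb):
        # rgb_emb:  (batch, rgb_dim)  clip-level RGB video features
        # pose_emb: (batch, pose_dim) clip-level 3D skeletal pose features
        return self.fuse(torch.cat([rgb_emb, pose_emb], dim=-1))

# Toy usage with random tensors standing in for real encoder outputs.
model = CrossModalFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 32])
```

Concatenation-based late fusion is the simplest baseline; attention-based cross-modal interaction is a common alternative when finer alignment between modalities matters.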
Key Takeaways
- Proposes multimodal frameworks for micro-gesture and emotion recognition.
- Utilizes video and skeletal pose data, integrating RGB and 3D pose information.
- Employs cross-modal fusion techniques for improved performance.
- Achieves strong results on the iMiGUE dataset, including 2nd place in emotion prediction.
“The approach secured 2nd place in the behavior-based emotion prediction task.”