Multimodal Learning for Micro-Gesture and Emotion Recognition

Published:Dec 29, 2025 08:22
1 min read
ArXiv

Analysis

This paper addresses the challenging tasks of micro-gesture recognition and behavior-based emotion prediction using multimodal learning. It leverages video and skeletal pose data, integrating RGB and 3D pose information for micro-gesture classification and facial/contextual embeddings for emotion recognition. The work's significance lies in its application to the iMiGUE dataset and its competitive performance in the MiGA 2025 Challenge, securing 2nd place in emotion prediction. The paper highlights the effectiveness of cross-modal fusion techniques for capturing nuanced human behaviors.

Reference

The approach secured 2nd place in the behavior-based emotion prediction task.