UniAct: Unified Control for Humanoid Robots
Analysis
This paper addresses a key challenge in humanoid robotics: bridging high-level multimodal instructions and whole-body execution. The proposed UniAct framework takes a two-stage approach, pairing a fine-tuned MLLM with a causal streaming pipeline to execute diverse instructions (language, music, trajectories) at low latency. A shared discrete codebook, built with finite scalar quantization (FSQ), aligns the modalities and keeps motions physically grounded; the paper reports a 19% improvement in the success rate of zero-shot tracking of imperfect reference motions. Validation on a new humanoid motion benchmark (UniMoCap) further strengthens the contribution, suggesting a step toward more responsive, general-purpose humanoid assistants.
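To make the codebook idea concrete, here is a minimal sketch of finite scalar quantization in the generic form it is usually described: each latent dimension is bounded and rounded to a small fixed set of scalar levels. The level counts and dimensionality below are illustrative assumptions, not the paper's actual configuration.

```python
import torch

def fsq_quantize(z: torch.Tensor, levels=(7, 7, 7, 5, 5)) -> torch.Tensor:
    """Illustrative FSQ: bound each latent dim with tanh, scale, and round
    to the nearest integer level. A straight-through estimator lets
    gradients pass through the non-differentiable round(). Odd level counts
    keep the grid symmetric around zero (even counts need an extra
    half-step offset, omitted here for brevity).
    """
    num_levels = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half = (num_levels - 1) / 2            # e.g. 7 levels -> integers in [-3, 3]
    bounded = torch.tanh(z) * half         # continuous, bounded per-dimension
    quantized = torch.round(bounded)       # snap to the discrete grid
    # Straight-through: forward pass uses the code, backward uses `bounded`.
    return bounded + (quantized - bounded).detach()
```

Because every latent lands on the same fixed grid regardless of input modality, tokens from language, music, or trajectory encoders can share one vocabulary, which is what enables the cross-modal alignment described above.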
Key Takeaways
- UniAct is a two-stage framework for humanoid robot control.
- It pairs a fine-tuned MLLM with a causal streaming pipeline (sketched after this list).
- It achieves low-latency execution of multimodal instructions.
- It uses a shared discrete codebook for cross-modal alignment.
- It shows improved performance in zero-shot tracking.
- It is validated on a new humanoid motion benchmark (UniMoCap).
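The sketch below shows one plausible shape of the two-stage streaming loop, under stated assumptions: all class and method names (`planner.stream`, `decoder.step`, `robot.apply`) are hypothetical stand-ins, not the paper's API. The point it illustrates is why a causal decoder yields low latency: actions can be emitted as soon as the first tokens arrive, without waiting for the full plan.

```python
class StreamingController:
    """Hypothetical two-stage control loop. `planner` stands in for the
    fine-tuned MLLM that maps an instruction (language, music, or a
    trajectory) to a stream of shared-codebook tokens; `decoder` stands in
    for a causal model turning each token into whole-body joint targets."""

    def __init__(self, planner, decoder, robot):
        self.planner = planner   # Stage 1: instruction -> discrete motion tokens
        self.decoder = decoder   # Stage 2: tokens -> physically grounded actions
        self.robot = robot

    def run(self, instruction) -> None:
        # The decoder only attends to tokens emitted so far, so execution
        # starts with the first token; this bounds per-step latency.
        for token in self.planner.stream(instruction):
            action = self.decoder.step(token)
            self.robot.apply(action)
```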
“UniAct achieves a 19% improvement in the success rate of zero-shot tracking of imperfect reference motions.”