OmniAgent: Audio-Guided Active Perception for Audio-Video Understanding
Analysis
Key Takeaways
- OmniAgent is an active perception agent for audio-video understanding.
- It uses dynamic planning and audio cues for fine-grained reasoning.
- The approach achieves state-of-the-art performance on benchmarks.
“OmniAgent achieves state-of-the-art performance, surpassing leading open-source and proprietary models by substantial margins of 10% - 20% accuracy.”