Search:
Match:
1 results

Process-Aware Evaluation for Video Reasoning

Published:Dec 31, 2025 16:31
1 min read
ArXiv

Analysis

This paper addresses a critical issue in evaluating video generation models: the tendency for models to achieve correct outcomes through incorrect reasoning processes (outcome-hacking). The introduction of VIPER, a new benchmark with a process-aware evaluation paradigm, and the Process-outcome Consistency (POC@r) metric, are significant contributions. The findings highlight the limitations of current models and the need for more robust reasoning capabilities.
Reference

State-of-the-art video models achieve only about 20% POC@1.0 and exhibit a significant outcome-hacking.