Search:
Match:
1 results

Analysis

This article summarizes a podcast episode discussing a research paper on Deep Reinforcement Learning (DRL). The paper, which won an award at NeurIPS, critiques the common practice of evaluating DRL algorithms using only point estimates on benchmarks with a limited number of runs. The researchers, including Rishabh Agarwal, found significant discrepancies between conclusions drawn from point estimates and those from statistical analysis, particularly when using benchmarks like Atari 100k. The podcast explores the paper's reception, surprising results, and the challenges of changing self-reporting practices in research.
Reference

The paper calls for a change in how deep RL performance is reported on benchmarks when using only a few runs.