Sparse Offline RL Robust to Data Corruption
Analysis
Key Takeaways
- Addresses robust offline RL in high-dimensional, sparse MDPs.
- Highlights limitations of least-squares value iteration (LSVI) when sparsity is incorporated.
- Proposes actor-critic methods built on sparse, corruption-robust estimators (see the sketch below).
- Provides the first non-vacuous guarantees under single-policy concentrability coverage and bounded data corruption.
- Demonstrates that learning a near-optimal policy remains possible even under data corruption.
“The paper provides the first non-vacuous guarantees in high-dimensional sparse MDPs with single-policy concentrability coverage and corruption, showing that learning a near-optimal policy remains possible in regimes where traditional robust offline RL techniques may fail.”
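To make the actor-critic takeaway concrete, below is a minimal sketch of what a sparse, corruption-robust critic fit could look like; it is an illustration, not the paper's algorithm. It assumes a linear critic, uses a Huber loss so that a small fraction of corrupted regression targets cannot dominate the gradient, adds an L1 penalty for sparsity in the high-dimensional regime, and solves the problem with proximal gradient descent (ISTA). The function `sparse_robust_critic` and its parameters (`lam`, `delta`) are hypothetical names introduced here.

```python
import numpy as np

def huber_grad(residual, delta):
    # Derivative of the Huber loss w.r.t. the residual: identity near zero,
    # clipped for large (possibly corrupted) residuals.
    return np.clip(residual, -delta, delta)

def sparse_robust_critic(Phi, y, lam=0.05, delta=1.0, lr=0.01, iters=3000):
    """Fit weights w so that Phi @ w approximates regression targets y.

    Phi   : (n, d) feature matrix; d may exceed n (high-dimensional regime)
    y     : (n,) targets, a small fraction of which may be corrupted
    lam   : L1 penalty weight (controls sparsity of the learned critic)
    delta : Huber threshold (controls robustness to outlier targets)
    """
    n, d = Phi.shape
    w = np.zeros(d)
    for _ in range(iters):
        # Robust gradient: corrupted samples contribute at most |delta| each.
        grad = Phi.T @ huber_grad(Phi @ w - y, delta) / n
        w = w - lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

# Toy check: sparse ground truth, a few adversarially corrupted targets.
rng = np.random.default_rng(0)
n, d, s = 200, 500, 5                     # d >> n: high-dimensional
w_true = np.zeros(d)
w_true[:s] = 1.0
Phi = rng.standard_normal((n, d))
y = Phi @ w_true + 0.1 * rng.standard_normal(n)
y[:10] += 50.0                            # corrupt 5% of the targets
w_hat = sparse_robust_critic(Phi, y)
print(np.linalg.norm(w_hat - w_true))     # error should stay small despite corruption
```

At a high level this mirrors the "sparse robust estimator" idea: the Huber clipping bounds the influence of any single corrupted sample, while the soft-thresholding step zeroes out irrelevant feature dimensions.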