FQE Improvement Without Bellman Completeness
Analysis
This paper addresses a key limitation of Fitted Q-Evaluation (FQE), a standard method for off-policy evaluation in reinforcement learning. FQE's guarantees typically rely on Bellman completeness, the requirement that the function class be closed under the Bellman operator, which is difficult to satisfy in practice. The authors trace the problem to a norm mismatch: FQE fits each regression step under the data distribution, while the Bellman operator contracts in a different norm. Their fix is to reweight each regression step using an estimate of the stationary density ratio. This yields strong evaluation guarantees without the Bellman completeness assumption, making FQE more robust and practical.
Key Takeaways
- Addresses the Bellman completeness requirement of FQE.
- Identifies a norm mismatch as the core issue.
- Proposes a reweighting strategy using the stationary density ratio.
- Enables strong evaluation guarantees without Bellman completeness.
- Improves the robustness and practicality of FQE.
“The authors propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts.”
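To make the reweighting idea concrete, here is a minimal sketch of FQE in which every regression step is a weighted least-squares fit. It is an illustration under stated assumptions, not the authors' implementation: it assumes a linear Q-function, assumes the density ratio estimates are already available as per-sample `weights`, and runs a fixed number of iterations without a convergence check. The function name `weighted_fqe` and all parameter names are hypothetical.

```python
import numpy as np


def weighted_fqe(phi, rewards, phi_next, weights, gamma=0.99, n_iters=200, ridge=1e-8):
    """Linear FQE where each regression step is a weighted least-squares fit.

    phi      : (n, d) features of the logged (s, a) pairs
    rewards  : (n,)   observed rewards
    phi_next : (n, d) features of (s', a'), with a' chosen by the target policy
    weights  : (n,)   estimated stationary density ratios (assumed given)
    """
    phi = np.asarray(phi, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    weights = np.asarray(weights, dtype=float)

    d = phi.shape[1]
    # Weighted Gram matrix; it stays the same across iterations.
    # The small ridge term is only there for numerical stability.
    gram = phi.T @ (weights[:, None] * phi) + ridge * np.eye(d)

    theta = np.zeros(d)
    for _ in range(n_iters):
        # Bellman targets built from the current Q estimate.
        targets = rewards + gamma * (phi_next @ theta)
        # Weighted regression of the targets onto the features.
        theta = np.linalg.solve(gram, phi.T @ (weights * targets))
    return theta


# Toy usage: two states, one action, one-hot features.
# State 0 moves to state 1 with reward 1; state 1 loops with reward 0.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
rewards = np.array([1.0, 0.0])
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])
theta = weighted_fqe(phi, rewards, phi_next, weights=np.array([2.0, 0.5]), gamma=0.5)
```

Passing `weights=np.ones(n)` recovers ordinary unweighted FQE, so the only change relative to the standard procedure is the per-sample weight in each regression. In this toy case the features are one-hot and the model is exact, so the weights do not change the answer; they matter when the function class is misspecified.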