FQE Improvement Without Bellman Completeness

Published:Dec 29, 2025 19:04
1 min read
ArXiv

Analysis

This paper addresses a key limitation of Fitted Q-Evaluation (FQE), a core technique in off-policy reinforcement learning. FQE typically requires Bellman completeness, a difficult condition to satisfy. The authors identify a norm mismatch as the root cause and propose a simple reweighting strategy using the stationary density ratio. This allows for strong evaluation guarantees without the restrictive Bellman completeness assumption, improving the robustness and practicality of FQE.

Reference

The authors propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts.