FQE Improvement Without Bellman Completeness

Research Paper · Reinforcement Learning, Off-Policy Evaluation, Fitted Q-Evaluation · Analyzed: Jan 3, 2026 16:59
Published: Dec 29, 2025 19:04
1 min read
ArXiv

Analysis

This paper addresses a key limitation of Fitted Q-Evaluation (FQE), a core technique for off-policy evaluation in reinforcement learning. FQE typically requires Bellman completeness, meaning the function class must be closed under the Bellman operator, which is a difficult condition to satisfy in practice. The authors identify a norm mismatch as the root cause: each FQE regression step projects in the norm induced by the data distribution, while the Bellman operator contracts in a different norm. Their fix is a simple reweighting strategy that multiplies each regression loss by an estimate of the stationary density ratio, aligning the two norms. This yields strong evaluation guarantees without the restrictive Bellman completeness assumption, improving the robustness and practicality of FQE.
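The reweighted regression step can be illustrated with a small sketch. This is not the authors' implementation: the toy tabular MDP, the target policy, and in particular the density-ratio vector `w` are all hypothetical stand-ins (in the paper, the stationary density ratio would come from a separate estimator). The point is only that each FQE iteration becomes a *weighted* least-squares fit, which for a tabular Q-function reduces to a weighted average per state-action pair:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (illustrative sizes, not from the paper).
n_states, n_actions, gamma = 5, 2, 0.9
n_samples = 2000

# Behavior data: (s, a, r, s') tuples from some logging policy.
s = rng.integers(0, n_states, n_samples)
a = rng.integers(0, n_actions, n_samples)
r = rng.standard_normal(n_samples) * 0.1 + s   # reward roughly equals the state index
s_next = rng.integers(0, n_states, n_samples)  # uniform transitions, for simplicity

# Target policy to evaluate: deterministic "always take action 0".
pi = np.zeros(n_states, dtype=int)

# Stand-in for the estimated stationary density ratio d^pi / d^mu.
# In practice this is produced by a separate estimator; here it is
# an arbitrary positive vector purely to show where the weights go.
w = 0.5 + rng.random(n_samples)

# Reweighted FQE: each regression step is a weighted least-squares
# fit, so the projection happens in the (reweighted) norm in which
# the Bellman operator for pi contracts, not the raw data norm.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    target = r + gamma * Q[s_next, pi[s_next]]
    Q_new = np.zeros_like(Q)
    for si in range(n_states):
        for ai in range(n_actions):
            mask = (s == si) & (a == ai)
            if mask.any():
                # Weighted average = closed-form weighted least
                # squares for a tabular Q-function.
                Q_new[si, ai] = np.average(target[mask], weights=w[mask])
    Q = Q_new

# Estimated value of pi under a uniform initial-state distribution.
v_pi = Q[np.arange(n_states), pi].mean()
print(round(float(v_pi), 3))
```

With a flat weight vector (`w = 1`) the loop is exactly ordinary FQE; the entire change the paper proposes amounts to the `weights=` argument in the regression step.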
Reference / Citation
View Original
"The authors propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts."
* Cited for critical analysis under Article 32.