FQE Improvement Without Bellman Completeness

Research Paper | Tags: Reinforcement Learning, Off-Policy Evaluation, Fitted Q-Evaluation | Analyzed: Jan 3, 2026 16:59

Published: Dec 29, 2025 19:04

Source: ArXiv

Analysis

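This paper addresses a key limitation of Fitted Q-Evaluation (FQE), a core technique for off-policy evaluation in reinforcement learning. FQE's guarantees typically rely on Bellman completeness, which requires the function class to be closed under the Bellman operator, a condition that is hard to satisfy in practice. The authors identify a norm mismatch as the root cause: each FQE regression step is fit in the norm induced by the data distribution, which is not the norm in which the Bellman operator contracts. Their fix is simple: reweight each regression step by an estimate of the stationary density ratio. This yields strong evaluation guarantees without the restrictive Bellman completeness assumption, making FQE more robust and practical.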
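To make the norm mismatch concrete, here is a minimal, hypothetical sketch (not the authors' code) of FQE on a classic two-state toy chain with a one-parameter linear function class that is not Bellman complete. Everything in it is an illustrative assumption: the features, the chain, the uniform data distribution `mu`, the use of state values instead of Q-values, and the use of the exact stationary distribution `d_pi` rather than an estimated density ratio. Unweighted regression multiplies the parameter by 1.08 per iteration and diverges; reweighting each regression by `d_pi / mu` multiplies it by gamma = 0.9 and converges to the true value of 0.

```python
# Hypothetical toy example: two states under the target policy's induced chain.
# Both states move deterministically to state 1 and all rewards are zero,
# so the true value function is identically 0.
PHI = [1.0, 2.0]      # one feature per state: V_theta(s) = theta * PHI[s]
NEXT = [1, 1]         # deterministic next state under the target policy
REWARD = [0.0, 0.0]


def fqe_step(theta, gamma, weights):
    """One FQE regression step: weighted least squares of the Bellman
    target onto the one-dimensional linear class {theta * PHI[s]}."""
    num = 0.0
    den = 0.0
    for s, w in enumerate(weights):
        target = REWARD[s] + gamma * theta * PHI[NEXT[s]]
        num += w * PHI[s] * target
        den += w * PHI[s] ** 2
    return num / den


def run_fqe(gamma, weights, theta0=1.0, iters=200):
    """Iterate FQE regression steps from theta0."""
    theta = theta0
    for _ in range(iters):
        theta = fqe_step(theta, gamma, weights)
    return theta


gamma = 0.9
mu = [0.5, 0.5]       # data (behavior) distribution over states: uniform
d_pi = [0.0, 1.0]     # stationary distribution of the target chain (assumed known)
ratio = [d / m for d, m in zip(d_pi, mu)]        # stationary density ratio d_pi / mu
reweighted = [m * r for m, r in zip(mu, ratio)]  # effective regression weights

plain = run_fqe(gamma, mu)
fixed = run_fqe(gamma, reweighted)
print(f"unweighted FQE theta: {plain:.3e}")
print(f"reweighted FQE theta: {fixed:.3e}")
```

The per-step factors can be checked by hand: one step maps theta to 2*gamma*theta*(w1 + 2*w2)/(w1 + 4*w2), where (w1, w2) are the regression weights. That is 1.2*gamma = 1.08 under uniform weights and exactly gamma under the reweighted ones.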

Key Takeaways

• Addresses the Bellman completeness requirement of FQE.
• Identifies a norm mismatch as the core issue.
• Proposes a reweighting strategy using the stationary density ratio.
• Enables strong evaluation guarantees without Bellman completeness.
• Improves the robustness and practicality of FQE.

Reference / Citation

"The authors propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts."

ArXiv, Dec 29, 2025 19:04

* Cited for critical analysis under Article 32.
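The quoted fix hinges on which norm the Bellman operator contracts in. Assuming, as is standard, that this is the L2 norm weighted by the stationary distribution d_pi of the target policy's state-action chain (the notation below is ours, not necessarily the paper's), the contraction follows from Jensen's inequality and stationarity:

```latex
% Assumed notation: z = (s, a); P_\pi(z' \mid z) is the transition kernel under the target policy \pi;
% d_\pi is stationary: \sum_z d_\pi(z) P_\pi(z' \mid z) = d_\pi(z');
% \mathcal{T}^\pi f = r + \gamma P_\pi f, and \|h\|_d^2 = \sum_z d(z)\, h(z)^2.
\begin{aligned}
\|\mathcal{T}^\pi f - \mathcal{T}^\pi g\|_{d_\pi}^2
  &= \gamma^2 \sum_z d_\pi(z) \Big( \sum_{z'} P_\pi(z' \mid z)\, (f-g)(z') \Big)^2 \\
  &\le \gamma^2 \sum_z d_\pi(z) \sum_{z'} P_\pi(z' \mid z)\, (f-g)(z')^2 && \text{(Jensen)} \\
  &= \gamma^2 \sum_{z'} d_\pi(z')\, (f-g)(z')^2 && \text{(stationarity)} \\
  &= \gamma^2 \|f-g\|_{d_\pi}^2 .
\end{aligned}
```

When the data come from a distribution mu different from d_pi, the stationarity step fails, so no such bound is guaranteed in the mu-weighted norm that plain FQE regresses in. Reweighting each squared-loss term by w(z) = d_pi(z) / mu(z), assuming mu covers the support of d_pi, turns the expected loss under mu into the expected loss under d_pi, so each regression step is carried out in the contracting norm.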