Bellman Calibration for Improved Offline RL

Published: Dec 29, 2025 18:52
1 min read
ArXiv

Analysis

This paper introduces Iterated Bellman Calibration, a post-hoc method for improving the accuracy of value predictions in offline reinforcement learning. The method is model-agnostic and does not rely on strong assumptions such as Bellman completeness or realizability, which makes it widely applicable. A key contribution is the use of doubly robust pseudo-outcomes to handle off-policy data, and the paper provides finite-sample guarantees, which matters for practical use.
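
To make the idea more concrete, here is a minimal sketch of what an iterated, post-hoc calibration loop of this kind could look like. This is an illustration under assumptions, not the paper's algorithm: the data format, the isotonic-regression calibration map, and the specific doubly robust pseudo-outcome (an importance-weighted one-step Bellman residual added to the current prediction) are all choices made for the example.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def iterated_bellman_calibration(q_hat, transitions, pi_prob, beh_prob,
                                 gamma=0.99, n_iters=5):
    """Illustrative post-hoc calibration loop (hypothetical interface).

    q_hat       -- callable (s, a) -> base value prediction to be calibrated
    transitions -- iterable of (s, a, r, s_next, a_next) from the offline dataset
    pi_prob     -- callable (a, s) -> target-policy action probability (assumed)
    beh_prob    -- callable (a, s) -> behaviour-policy action probability (assumed)
    """
    # Calibration maps learned so far; the calibrated value is their
    # composition applied to the base prediction.
    maps = []

    def calibrated(s, a):
        v = q_hat(s, a)
        for m in maps:
            v = m.predict(np.array([v]))[0]
        return v

    for _ in range(n_iters):
        preds, targets = [], []
        for (s, a, r, s_next, a_next) in transitions:
            pred = calibrated(s, a)
            # Importance weight corrects for the mismatch between the
            # behaviour policy that generated the data and the target policy.
            w = pi_prob(a, s) / max(beh_prob(a, s), 1e-8)
            next_v = calibrated(s_next, a_next)
            # Doubly-robust-style pseudo-outcome: current prediction plus a
            # weighted one-step Bellman residual (illustrative form only).
            pseudo = pred + w * (r + gamma * next_v - pred)
            preds.append(pred)
            targets.append(pseudo)
        # One-dimensional calibration map from predictions to pseudo-outcomes.
        m = IsotonicRegression(out_of_bounds="clip")
        m.fit(np.array(preds), np.array(targets))
        maps.append(m)

    return calibrated
```

Because the calibration map acts only on the scalar predictions, the underlying value model is never retrained, which is what makes such a procedure model-agnostic.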

Reference

Bellman calibration requires that states with similar predicted long-term returns exhibit one-step returns consistent with the Bellman equation under the target policy.
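
One way to read this condition: among transitions whose predicted values are similar, the average prediction should match the average one-step Bellman target. The following diagnostic is my own sketch of how that could be checked empirically; the quantile binning and the assumption that rewards have already been adjusted to the target policy (e.g. via the doubly robust pseudo-outcomes) are not from the paper.

```python
import numpy as np

def bellman_calibration_gap(preds, rewards, next_preds, gamma=0.99, n_bins=10):
    """Group transitions by predicted value and compare the mean prediction in
    each bin with the mean one-step Bellman target r + gamma * V(s').
    Small per-bin gaps indicate (approximate) Bellman calibration.
    """
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(rewards, dtype=float) + gamma * np.asarray(next_preds, dtype=float)
    # Quantile bins so that each bin holds roughly the same number of states.
    edges = np.quantile(preds, np.linspace(0.0, 1.0, n_bins + 1))
    gaps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (preds >= lo) & (preds <= hi)
        if mask.any():
            gaps.append(abs(preds[mask].mean() - targets[mask].mean()))
    return float(np.mean(gaps))
```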