Stationary Reweighting Improves Soft Fitted Q-Iteration Convergence

Published:Dec 30, 2025 00:58
1 min read
ArXiv

Analysis

This paper addresses the instability of soft Fitted Q-Iteration (FQI) in offline reinforcement learning, particularly when using function approximation and facing distribution shift. It identifies a geometric mismatch in the soft Bellman operator as a key issue. The core contribution is the introduction of stationary-reweighted soft FQI, which uses the stationary distribution of the current policy to reweight regression updates. This approach is shown to improve convergence properties, offering local linear convergence guarantees under function approximation and suggesting potential for global convergence through a temperature annealing strategy.

Reference

The paper introduces stationary-reweighted soft FQI, which reweights each regression update using the stationary distribution of the current policy. It proves local linear convergence under function approximation with geometrically damped weight-estimation errors.