Research Paper · Reinforcement Learning, Offline RL, Fitted Q-Iteration
Stationary Reweighting Improves Soft Fitted Q-Iteration Convergence
Published: Dec 30, 2025 00:58 • 1 min read • ArXiv
Analysis
This paper addresses the instability of soft Fitted Q-Iteration (FQI) in offline reinforcement learning, particularly under function approximation and distribution shift. It identifies a geometric mismatch in the soft Bellman operator as a key source of this instability. The core contribution is stationary-reweighted soft FQI, which reweights each regression update by the stationary distribution of the current policy. This reweighting is shown to improve convergence: the paper proves local linear convergence under function approximation and outlines a temperature-annealing strategy as a route toward global convergence.
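As a concrete reading of that description, the sketch below shows what one such reweighted regression update could look like with linear function approximation. It is illustrative only: the batch layout, feature shapes, and the `stationary_weights` input (a separately estimated ratio between the current policy's stationary distribution and the data distribution) are assumptions for this sketch, not the paper's exact algorithm.

```python
# Illustrative sketch of one stationary-reweighted soft FQI iteration with
# linear function approximation. The offline batch layout (phi, phi_next_all, r)
# and the stationary_weights input are assumed, not taken from the paper.
import numpy as np

def soft_value(q_next, tau):
    """Soft state value tau * logsumexp(Q(s', .) / tau), computed stably."""
    m = q_next.max(axis=1, keepdims=True)
    return tau * np.log(np.exp((q_next - m) / tau).sum(axis=1)) + m.squeeze(1)

def reweighted_soft_fqi_step(theta, phi, phi_next_all, r, stationary_weights,
                             gamma=0.99, tau=0.1, ridge=1e-6):
    """Fit Q_theta(s, a) = phi(s, a) @ theta to soft Bellman targets, with each
    sample weighted by an estimate of the current policy's stationary distribution."""
    # phi: (N, d) features of sampled (s, a); phi_next_all: (N, A, d) features of (s', a')
    q_next = phi_next_all @ theta                 # (N, A) Q-values at next states
    y = r + gamma * soft_value(q_next, tau)       # soft Bellman regression targets
    w = stationary_weights                        # (N,) per-sample reweighting of the loss
    G = phi.T @ (w[:, None] * phi) + ridge * np.eye(phi.shape[1])
    b = phi.T @ (w * y)
    return np.linalg.solve(G, b)                  # weighted least-squares solution
```

In this reading, the only change from plain soft FQI is the per-sample weight `w` in the regression; with `w = 1` everywhere the update reduces to the standard soft Bellman regression.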
Key Takeaways
- Addresses instability issues in soft Fitted Q-Iteration (FQI) for offline reinforcement learning.
- Identifies a geometric mismatch in the soft Bellman operator as a cause of instability.
- Introduces stationary-reweighted soft FQI to improve convergence.
- Proves local linear convergence under function approximation.
- Suggests a temperature annealing approach for potential global convergence (see the sketch after this list).
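The annealing takeaway could, under the same assumptions as the earlier sketch, be read as an outer loop that gradually lowers the soft Bellman temperature while repeating the reweighted update. The geometric schedule and the `weight_estimator` interface here are purely illustrative, not the paper's prescription.

```python
# Hypothetical outer loop: anneal the temperature tau across iterations while
# repeating reweighted_soft_fqi_step from the sketch above.
def annealed_soft_fqi(theta0, batch, weight_estimator, tau0=1.0, tau_min=0.01,
                      decay=0.9, n_iters=100, gamma=0.99):
    theta, tau = theta0, tau0
    for _ in range(n_iters):
        w = weight_estimator(theta, tau)          # stationary weights for the current policy
        theta = reweighted_soft_fqi_step(theta, batch["phi"], batch["phi_next_all"],
                                         batch["r"], w, gamma=gamma, tau=tau)
        tau = max(tau_min, decay * tau)           # gradually cool toward the hard maximum
    return theta
```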
Reference
“The paper introduces stationary-reweighted soft FQI, which reweights each regression update using the stationary distribution of the current policy. It proves local linear convergence under function approximation with geometrically damped weight-estimation errors.”