Stationary Reweighting Improves Soft Fitted Q-Iteration Convergence
Analysis
Key Takeaways
- Addresses instability issues in soft Fitted Q-Iteration (FQI) for offline reinforcement learning.
- Identifies a geometric mismatch in the soft Bellman operator as a cause of instability.
- Introduces stationary-reweighted soft FQI to improve convergence.
- Proves local linear convergence under function approximation.
- Suggests a temperature annealing approach for potential global convergence (see the sketch after this list).
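The paper itself contains no code; as a rough illustration of the objects named above, here is a minimal tabular sketch of soft FQI using the soft (log-sum-exp) Bellman backup together with a simple geometric temperature schedule. The function names, the `(s, a, r, s_next)` dataset layout, and the annealing constants are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def soft_bellman_target(Q, rewards, next_states, gamma, tau):
    """Soft Bellman backup: r + gamma * tau * logsumexp(Q(s', .) / tau)."""
    q_next = Q[next_states] / tau                      # (batch, num_actions)
    m = q_next.max(axis=1, keepdims=True)              # stabilise the log-sum-exp
    soft_value = tau * (m[:, 0] + np.log(np.exp(q_next - m).sum(axis=1)))
    return rewards + gamma * soft_value

def soft_fqi(dataset, num_states, num_actions, gamma=0.99,
             tau_init=1.0, tau_min=0.05, anneal=0.9, num_iters=100):
    """Tabular soft FQI with a geometric temperature-annealing schedule (illustrative)."""
    s, a, r, s_next = dataset                          # equal-length index arrays
    Q = np.zeros((num_states, num_actions))
    tau = tau_init
    for _ in range(num_iters):
        targets = soft_bellman_target(Q, r, s_next, gamma, tau)
        # "Regression" step: in the tabular case, fit by averaging targets per (s, a).
        Q_new = Q.copy()
        for si in range(num_states):
            for ai in range(num_actions):
                mask = (s == si) & (a == ai)
                if mask.any():
                    Q_new[si, ai] = targets[mask].mean()
        Q = Q_new
        tau = max(tau_min, anneal * tau)               # anneal toward the hard max
    return Q
```

As `tau` shrinks toward zero the soft backup approaches the ordinary hard-max Bellman backup, which is the intuition behind using annealing to connect the soft and standard settings.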
“The paper introduces stationary-reweighted soft FQI, which reweights each regression update using the stationary distribution of the current policy. It proves local linear convergence under function approximation with geometrically damped weight-estimation errors.”
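To make the reweighting idea concrete, the following sketch shows one regression step carried out under the stationary distribution of the current policy rather than the data distribution. The specific weight form `d_pi(s) * pi(a|s) / mu(s, a)`, the linear features `phi`, and access to a transition model `P` are assumptions introduced here for illustration; the quote above only states that each regression update is reweighted by the current policy's stationary distribution.

```python
import numpy as np

def softmax_policy(Q, tau):
    """Boltzmann policy induced by Q at temperature tau."""
    logits = Q / tau
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def stationary_distribution(P, pi, iters=5000, tol=1e-10):
    """State stationary distribution of the Markov chain induced by policy pi.

    P: (S, A, S) transition model; pi: (S, A) policy.
    """
    P_pi = np.einsum("sap,sa->sp", P, pi)               # state-to-state kernel
    d = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        d_next = d @ P_pi                                # one step of power iteration
        if np.abs(d_next - d).sum() < tol:
            break
        d = d_next
    return d

def reweighted_regression(phi, targets, states, actions, d_pi, pi, mu_sa, ridge=1e-6):
    """Weighted least-squares fit of a linear Q-function.

    Each sample (s, a) is weighted by d_pi(s) * pi(a|s) / mu(s, a), the ratio of the
    current policy's stationary distribution to the data distribution, so the
    regression is effectively performed under d_pi rather than mu.
    """
    w = d_pi[states] * pi[states, actions] / np.maximum(mu_sa[states, actions], 1e-12)
    A = phi.T @ (w[:, None] * phi) + ridge * np.eye(phi.shape[1])
    b = phi.T @ (w * targets)
    return np.linalg.solve(A, b)
```

In practice the stationary weights would themselves be estimated from data rather than computed from a known model; the local convergence result quoted above allows for such estimation error provided it is geometrically damped across iterations.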