FoldAct: Stable Context Folding for Long-Horizon RL

Published: Dec 28, 2025 00:24
1 min read
ArXiv

Analysis

This paper addresses the scalability challenges of long-horizon reinforcement learning (RL) for large language models, focusing on context folding methods. It identifies the problems that arise when summary actions are treated like standard actions: because the summaries that replace earlier context are generated by the evolving policy itself, the observation distribution the policy later conditions on becomes non-stationary, which destabilizes training. The proposed FoldAct framework introduces mechanisms to mitigate these problems, improving training efficiency and stability.
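To make the setting concrete, here is a minimal, hypothetical sketch of a context-folding agent loop. Every name in it (`fold_context`, `run_episode`, `MAX_TOKENS`, the `env`/`policy`/`summarize` interfaces) is an illustrative assumption, not the paper's API; the point it illustrates is that the folded summary is itself sampled from the policy, so later observations drift as training updates the policy.

```python
# Hypothetical sketch of context folding in a long-horizon agent loop.
# All names and interfaces here are illustrative assumptions.

MAX_TOKENS = 8192  # assumed context budget


def count_tokens(messages):
    """Crude token estimate; a real system would use the model's tokenizer."""
    return sum(len(m["content"].split()) for m in messages)


def fold_context(messages, summarize):
    """Replace the oldest turns with a model-generated summary action.

    Because the summary is produced by the policy itself, the observations
    the policy sees later depend on the current policy -- the source of the
    non-stationarity that FoldAct aims to stabilize.
    """
    old, recent = messages[:-4], messages[-4:]
    summary = summarize(old)  # a summary *action* sampled from the policy
    folded = {"role": "system", "content": f"[folded summary] {summary}"}
    return [folded] + recent


def run_episode(policy, env, summarize):
    """Roll out one episode, folding the context whenever it overflows."""
    messages = [{"role": "system", "content": env.instructions}]
    done = False
    while not done:
        if count_tokens(messages) > MAX_TOKENS:
            messages = fold_context(messages, summarize)
        action = policy(messages)
        observation, done = env.step(action)
        messages.append({"role": "assistant", "content": action})
        messages.append({"role": "user", "content": observation})
    return messages
```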

Reference

FoldAct explicitly addresses these challenges through three key innovations: separated loss computation, a full-context consistency loss, and selective segment training.
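Below is a hedged PyTorch sketch of how these three ideas could compose in a single training step. The function name, signature, loss weighting, and the KL form of the consistency term are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def foldact_style_losses(
    logits_folded,   # (T, V) logits conditioned on the folded context
    logits_full,     # (T, V) logits conditioned on the full, unfolded context
    target_ids,      # (T,)   sampled token ids
    advantages,      # (T,)   per-token advantage estimates
    action_mask,     # (T,)   1.0 for ordinary action tokens, else 0.0
    summary_mask,    # (T,)   1.0 for summary-action tokens, else 0.0
    train_mask,      # (T,)   1.0 for tokens in segments selected for training
):
    """Illustrative composition of the three FoldAct-style loss terms."""
    # Per-token log-probabilities of the sampled tokens.
    logp = -F.cross_entropy(logits_folded, target_ids, reduction="none")

    # REINFORCE-style per-token policy-gradient loss.
    pg = -(advantages * logp)

    def masked_mean(x, mask):
        return (x * mask).sum() / mask.sum().clamp(min=1.0)

    # 1) Separated loss computation: ordinary action tokens and summary
    #    tokens contribute to distinct terms rather than one shared objective.
    action_loss = masked_mean(pg, action_mask * train_mask)
    summary_loss = masked_mean(pg, summary_mask * train_mask)

    # 2) Full-context consistency loss: keep the folded-context policy close
    #    to the full-context policy (a per-token KL here; the paper's exact
    #    form may differ).
    kl = F.kl_div(
        F.log_softmax(logits_folded, dim=-1),
        F.log_softmax(logits_full, dim=-1),
        log_target=True,
        reduction="none",
    ).sum(-1)
    consistency_loss = masked_mean(kl, train_mask)

    # 3) Selective segment training enters through train_mask, which zeroes
    #    out gradients for segments excluded from this update.
    return action_loss + summary_loss + consistency_loss
```

In this sketch, the separation shows up as distinct masked averages over action and summary tokens, and selective segment training is just a gradient mask; how the terms are weighted and scheduled would be a design choice of the actual method.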