Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
Analysis
This arXiv NLP paper introduces Memory-T1, a reinforcement learning framework for improving temporal reasoning in conversational agents that operate across multiple sessions. The core problem it addresses is that current long-context models struggle to accurately identify temporally relevant information within long, noisy dialogue histories. Memory-T1 tackles this with a coarse-to-fine strategy: it first prunes the dialogue history using temporal and relevance filters, then applies an RL agent that selects precise evidence sessions. A key innovation is the multi-level reward function, which combines answer accuracy, evidence grounding, and temporal consistency. The reported state-of-the-art results on the Time-Dialog benchmark, surpassing a 14B baseline, indicate the approach is effective, and the ablation studies further confirm the contribution of the temporal consistency and evidence grounding rewards.
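As a rough illustration of how such a multi-level reward might be composed, here is a minimal Python sketch. The weights, the exact-match accuracy check, the session-level F1 for evidence grounding, and the event-ordering score for temporal consistency are all assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of a multi-level reward in the spirit of Memory-T1.
# Weights and scoring functions are illustrative assumptions, not the
# paper's exact design.
def multi_level_reward(pred_answer: str,
                       gold_answer: str,
                       selected_sessions: set[int],
                       gold_sessions: set[int],
                       pred_timeline: list[str],
                       gold_timeline: list[str],
                       w_acc: float = 1.0,
                       w_evid: float = 0.5,
                       w_temp: float = 0.5) -> float:
    # Answer accuracy: exact match on the normalized final answer.
    r_acc = 1.0 if pred_answer.strip().lower() == gold_answer.strip().lower() else 0.0

    # Evidence grounding: F1 overlap between selected and gold evidence sessions.
    if selected_sessions and gold_sessions:
        overlap = len(selected_sessions & gold_sessions)
        precision = overlap / len(selected_sessions)
        recall = overlap / len(gold_sessions)
        r_evid = 2 * precision * recall / (precision + recall) if overlap else 0.0
    else:
        r_evid = 0.0

    # Temporal consistency: fraction of events ordered as in the gold timeline.
    correct = sum(p == g for p, g in zip(pred_timeline, gold_timeline))
    r_temp = correct / len(gold_timeline) if gold_timeline else 0.0

    return w_acc * r_acc + w_evid * r_evid + w_temp * r_temp
```

Dense shaping terms like the grounding and consistency components give the policy a learning signal even when the final answer is wrong, which is the usual motivation for combining them with a sparse accuracy reward.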
Key Takeaways
- Memory-T1 uses reinforcement learning for temporal reasoning in multi-session dialogues.
- It employs a coarse-to-fine strategy with temporal and relevance filters (see the pruning sketch after this list).
- The system achieves state-of-the-art performance on the Time-Dialog benchmark.
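To make the coarse pruning stage concrete, the following is a hypothetical sketch of temporal and relevance filtering over stored sessions. The window size, the lexical-overlap relevance test, and the session schema are illustrative assumptions; the paper's actual filters may differ.

```python
from datetime import datetime

# Hypothetical coarse pruning stage: keep only sessions that pass a temporal
# window filter and a lexical relevance filter. Thresholds are assumptions.
def coarse_prune(sessions: list[dict],
                 query_terms: list[str],
                 query_time: datetime,
                 window_days: int = 90,
                 min_overlap: int = 1) -> list[dict]:
    """sessions: list of dicts with 'timestamp' (datetime) and 'text' (str)."""
    kept = []
    for s in sessions:
        # Temporal filter: session must fall within the query's time window.
        in_window = abs((query_time - s["timestamp"]).days) <= window_days
        # Relevance filter: require minimal term overlap with the query.
        overlap = sum(term in s["text"].lower() for term in query_terms)
        if in_window and overlap >= min_overlap:
            kept.append(s)
    # The finer-grained RL agent would then select evidence sessions from `kept`.
    return kept
```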
“Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents.”