
Papers · Thursday, June 4, 2026

InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain

Original: arxiv.org

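The paper proposes InfoMem, which trains chunk-wise long-context memory agents with answer-conditioned information gain. Its core reward measures how much the final memory improves the per-token log-likelihood of the ground-truth answer, rather than relying only on a sparse final-answer signal or on lexical overlap. Under the same GRPO framework and training budget, it outperforms comparable RL memory-agent baselines. The paper also finds that the reward should be applied only to successful trajectories, should be normalized first, and should be conditioned on the answer rather than the query. The result is an intermediate training signal for long-context memory that tracks final answerability more closely.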
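To make the reward idea concrete, here is a minimal Python sketch of how such a signal might be computed. It is not the paper's implementation. The names `Rollout`, `information_gain`, `shaped_rewards`, and `weight` are hypothetical, and several choices are assumptions made only for illustration: the comparison point is the answer scored without memory, normalization is a z-score within one GRPO group, and the bonus is added to a 0/1 outcome reward. The per-token log-probabilities are assumed to come from scoring the ground-truth answer with a language model, which is left out here.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class Rollout:
    # Log-probability of each ground-truth answer token, scored once with the
    # agent's final memory in context and once without it (assumed baseline).
    answer_logprobs_with_memory: Sequence[float]
    answer_logprobs_baseline: Sequence[float]
    correct: bool  # whether the trajectory's final answer was judged correct


def information_gain(r: Rollout) -> float:
    """Per-token log-likelihood improvement of the answer given the memory."""
    return mean(r.answer_logprobs_with_memory) - mean(r.answer_logprobs_baseline)


def shaped_rewards(group: List[Rollout], weight: float = 0.5) -> List[float]:
    """Combine a 0/1 outcome reward with a normalized information-gain bonus.

    Assumptions (not taken from the paper): z-score normalization over the
    group, and an additive bonus scaled by `weight`.
    """
    gains = [information_gain(r) for r in group]
    mu, sigma = mean(gains), pstdev(gains)
    # Normalize first; fall back to zero when every gain in the group is equal.
    normalized = [(g - mu) / sigma if sigma > 0 else 0.0 for g in gains]

    rewards = []
    for r, z in zip(group, normalized):
        outcome = 1.0 if r.correct else 0.0
        # The bonus is applied only to successful trajectories.
        bonus = weight * z if r.correct else 0.0
        rewards.append(outcome + bonus)
    return rewards
```

In this sketch a failed trajectory always receives zero, so a memory that raises the answer's likelihood without producing a correct answer is not rewarded. Among successful trajectories, the one whose memory makes the ground-truth answer more likely receives the larger reward.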