
Papers · Friday, May 29, 2026

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

Original paper: arxiv.org

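The paper argues that search benchmarks may reward "memory verification" rather than "evidence discovery": an agent can score well by confirming an answer it already holds in its weights instead of finding new evidence. LiveBrowseComp is built from recent, low-salience facts so that parametric memory cannot supply the answer, which makes it more diagnostic of a search agent's retrieval chain, query generation, and reliance on evidence.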
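One way to make the memory-versus-evidence distinction concrete is to compare each question's outcome with search disabled and with search enabled. The sketch below is illustrative only and is not taken from the paper: the `Result` record, the two function names, and the sample data are all hypothetical, and it assumes you already have per-question correctness labels for both runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    # Hypothetical per-question record; field names are assumptions.
    question_id: str
    closed_book_correct: bool   # correct with search disabled
    with_search_correct: bool   # correct with search enabled


def memory_share(results: List[Result]) -> float:
    """Fraction of search-enabled successes that were also solved closed-book.

    A high value suggests the benchmark is mostly measuring recall.
    """
    solved = [r for r in results if r.with_search_correct]
    if not solved:
        return 0.0
    return sum(r.closed_book_correct for r in solved) / len(solved)


def search_dependent_accuracy(results: List[Result]) -> float:
    """Fraction of all questions solved only when search was available."""
    if not results:
        return 0.0
    gained = sum(
        r.with_search_correct and not r.closed_book_correct for r in results
    )
    return gained / len(results)


# Made-up sample data, for illustration only.
results = [
    Result("q1", closed_book_correct=True, with_search_correct=True),
    Result("q2", closed_book_correct=False, with_search_correct=True),
    Result("q3", closed_book_correct=False, with_search_correct=False),
    Result("q4", closed_book_correct=True, with_search_correct=True),
]

print(memory_share(results))
print(search_dependent_accuracy(results))
```

On a benchmark built from recent, low-salience facts, one would expect the first number to be close to zero, so that the remaining accuracy reflects what the agent actually retrieved.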
