论文 · Papers2026-06-05 · Friday, June 5, 2026

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

这篇把持久记忆当成 agent 的信任边界来研究：一次恶意写入可能在原始攻击交互结束后继续影响行为。摘要强调 single adversarial memory write 的长期作用，提示防线应覆盖写入、检索和使用记忆的全过程，而不是只过滤当前输入。

–浏览

评论 · Comments