Daily Harness

2026-06-03 · Wednesday, June 3, 2026

Agent Engineering Accelerates

Today's Highlights

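[Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses](https://arxiv.org/abs/2606.02373)[^1] - Harness-1 moves a search agent's evidence, constraints, candidate answers, and check status out into the harness, rather than requiring the model to maintain all of that state itself in an ever-longer transcript.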

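[OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents](https://arxiv.org/abs/2606.02031)[^2] - OpenWebRL studies online multi-turn reinforcement learning for visual web agents, with a focus on letting the agent learn by trial and error in dynamic web environments rather than only imitating static supervised trajectories.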

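[Leyline: KV Cache Directives for Agentic Inference](https://arxiv.org/abs/2606.01065)[^3] - Leyline proposes KV cache directives for agentic inference, to handle non-linear conversation operations such as failed tool calls, output deletion, trajectory forks, rollbacks, and retries.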

[DepsGuard](https://github.com/arnica/depsguard)[^4] - DepsGuard is a supply-chain security CLI that writes more conservative package manager configuration for npm, pnpm, yarn, bun, and uv with a single command.

[dmtrKovalenko/fff](https://github.com/dmtrKovalenko/fff)[^5] - fff is a fast file search and content indexing toolkit written in Rust, aimed at long-running processes, editors, and agent use cases.

Papers

15 items · Papers

Featured · [Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses](https://arxiv.org/abs/2606.02373)[^1] · arxiv.org

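Harness-1 moves a search agent's evidence, constraints, candidate answers, and check status out into the harness, rather than requiring the model to maintain all of that state itself in an ever-longer transcript. Its contribution is to change the object of RL training from a pure dialogue policy to a model plus an external state machine, so that retrieval, citation, and verification steps can be explicitly recorded, inspected, and rewarded. Worth reading because the bottleneck for search agents often lies in managing evidence across many turns and in self-checking, and this paper turns state management into a trainable interface.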
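To make the idea of externalized state concrete, here is a minimal sketch of a harness that keeps evidence, constraints, candidates, and check results outside the transcript. It is not the paper's implementation: the class names, field names, and action vocabulary (`add_evidence`, `propose`, `verify`) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Hypothetical externalized state; field names are illustrative only."""
    evidence: list[dict] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    candidates: list[str] = field(default_factory=list)
    checks: dict[str, bool] = field(default_factory=dict)


class Harness:
    """Holds state outside the model transcript and applies structured actions."""

    def __init__(self) -> None:
        self.state = SearchState()
        # Every applied action is logged so it can be inspected or scored later.
        self.log: list[tuple[str, dict]] = []

    def apply(self, action: str, payload: dict) -> None:
        if action == "add_evidence":
            self.state.evidence.append(payload)
        elif action == "add_constraint":
            self.state.constraints.append(payload["text"])
        elif action == "propose":
            self.state.candidates.append(payload["answer"])
            self.state.checks[payload["answer"]] = False  # unverified by default
        elif action == "verify":
            self.state.checks[payload["answer"]] = payload["passed"]
        else:
            raise ValueError(f"unknown action: {action}")
        self.log.append((action, payload))

    def verified_answers(self) -> list[str]:
        """Candidates whose most recent check passed."""
        return [a for a in self.state.candidates if self.state.checks.get(a)]


harness = Harness()
harness.apply("add_evidence", {"source": "doc-1", "quote": "example quote"})
harness.apply("propose", {"answer": "candidate A"})
harness.apply("verify", {"answer": "candidate A", "passed": True})
print(harness.verified_answers())
```

Because each step is a logged, structured action rather than free text, a reward function could in principle score the log directly, which is the kind of interface the summary describes.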

Featured · [OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents](https://arxiv.org/abs/2606.02031)[^2] · arxiv.org

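OpenWebRL studies online multi-turn reinforcement learning for visual web agents, with a focus on letting the agent learn by trial and error in dynamic web environments rather than only imitating static supervised trajectories. The paper discusses system-level issues including the browser environment, visual observations, the action space, rewards, and long-horizon credit assignment. Worth reading because web agent training is shifting from "imitation from screenshots" to "continuously exploring web pages and correcting the policy".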

Featured · [Leyline: KV Cache Directives for Agentic Inference](https://arxiv.org/abs/2606.01065)[^3] · arxiv.org

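Leyline proposes KV cache directives for agentic inference, to handle non-linear conversation operations such as failed tool calls, output deletion, trajectory forks, rollbacks, and retries. A conventional KV cache assumes the context grows by appending to a prefix, but agent workflows often need to discard stale observations or start a new branch from an intermediate node. Worth reading because it extends inference-system optimization from the throughput of a single chat stream to agent state editing and branch exploration.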
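The difference between append-only context and the fork and rollback operations mentioned above can be illustrated with a toy model. This sketch tracks only token lists and named checkpoints; it does not manage real attention key/value tensors, and `BranchingContext` and its methods are hypothetical names, not Leyline's directive API.

```python
class BranchingContext:
    """Toy stand-in for a cached context with checkpoints (no real KV tensors)."""

    def __init__(self, tokens=None):
        self.tokens = list(tokens or [])
        self.checkpoints = {}  # checkpoint name -> prefix length

    def append(self, new_tokens):
        """The ordinary case: context grows at the end."""
        self.tokens.extend(new_tokens)

    def checkpoint(self, name):
        """Remember the current prefix length under a name."""
        self.checkpoints[name] = len(self.tokens)

    def rollback(self, name):
        """Discard everything after a checkpoint; return how many tokens were dropped."""
        keep = self.checkpoints[name]
        dropped = len(self.tokens) - keep
        del self.tokens[keep:]
        # Checkpoints that pointed past the new end are no longer valid.
        self.checkpoints = {k: v for k, v in self.checkpoints.items() if v <= keep}
        return dropped

    def fork(self, name):
        """Start a sibling branch that shares the prefix up to a checkpoint."""
        keep = self.checkpoints[name]
        child = BranchingContext(self.tokens[:keep])
        child.checkpoints = {k: v for k, v in self.checkpoints.items() if v <= keep}
        return child


ctx = BranchingContext(["system", "user"])
ctx.checkpoint("before_tool")
ctx.append(["tool_call", "tool_error"])   # a failed tool call pollutes the context
retry = ctx.fork("before_tool")           # explore a retry from the clean prefix
retry.append(["tool_call_v2", "tool_ok"])
ctx.rollback("before_tool")               # or drop the stale observation in place
```

In a real serving system the saving would come from reusing the cached prefix rather than recomputing it; this sketch only shows the bookkeeping, under the assumptions stated above.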

Open Source / Projects

15 items · Open Source / Projects

Featured · [DepsGuard](https://github.com/arnica/depsguard)[^4] · github.com

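DepsGuard is a supply-chain security CLI that writes more conservative package manager configuration for npm, pnpm, yarn, bun, and uv with a single command. The README emphasizes that it hardens options related to install scripts, registries, lockfiles, and version resolution, reducing the risk of script execution and package confusion during dependency installation. Worth trying because it turns security baselines scattered across different package managers into a repeatable project initialization step.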

[RePlaya](https://github.com/s2-streamstore/replaya)[^18] · github.com

A self-hosted browser session replay tool, built on rrweb, with support for live tailing.

[Piqc](https://github.com/paralleliq/piqc)[^30] · github.com

A GPU waste scanner for LLM inference clusters.

Industry News

11 items · Industry News

[MAI-Code-1-Flash](https://microsoft.ai/news/introducingmai-code-1-flash/)[^32] · microsoft.ai

Microsoft AI released MAI-Code-1-Flash, a model for coding tasks, along with its model card.

[MAI-Thinking-1](https://microsoft.ai/news/introducing-mai-thinking-1/)[^33] · microsoft.ai

Microsoft AI released MAI-Thinking-1 as part of its new batch of MAI models.

[GitHub Copilot App](https://github.com/features/preview/github-app)[^34] · github.com

GitHub published a preview page for the Copilot App, showing the form it takes as an application for GitHub workflows.

Blog Posts

10 items · Blog Posts

[How we index images for RAG](https://www.kapa.ai/blog/how-we-index-images-for-rag)[^43] · kapa.ai

Retrieval and Knowledge Grounding · Data and Analysis

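Kapa.ai describes its pipeline for indexing images in a RAG system: images need to be extracted, described, run through OCR, and bound to the surrounding text context, rather than stored only as a URL or alt text. The post points out that screenshots, charts, and UI states in documentation often carry the evidence needed to answer a question. Worth reading because the hard part of multimodal RAG is how chunking, citation, and ranking merge with textual evidence.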
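One way to picture "bound to the surrounding text context" is a record that carries the description, OCR output, and nearby text together, and flattens them into a single indexable string. This is a sketch under assumptions: the `ImageRecord` type and its fields are hypothetical and are not taken from Kapa.ai's post.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical index record for one image found in a document."""
    url: str               # kept for citation, not used for retrieval text
    description: str       # e.g. a model-written caption
    ocr_text: str          # text read from the pixels
    surrounding_text: str  # the paragraph or heading the image sits next to

    def to_index_text(self) -> str:
        """Join the non-empty textual signals into one retrievable chunk."""
        parts = [self.description, self.ocr_text, self.surrounding_text]
        return "\n".join(p.strip() for p in parts if p.strip())


record = ImageRecord(
    url="https://example.com/settings.png",
    description="Screenshot of a settings dialog",
    ocr_text="",
    surrounding_text="Open Settings to enable the feature.",
)
print(record.to_index_text())
```

Keeping the URL on the record but out of the indexed text reflects the point in the summary that a URL or alt text alone is not enough evidence to retrieve on.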

[Rethinking Search as Code Generation](https://research.perplexity.ai/articles/rethinking-search-as-code-generation)[^44] · research.perplexity.ai

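Perplexity Research reframes search as code generation: the model generates not just query terms but an executable retrieval program that composes search, filtering, parsing, and aggregation steps. This view breaks complex information needs into control flow, data flow, and verification logic, which suits multi-hop fact finding and structured answer generation. Worth reading because it makes a search agent's reasoning process easier to debug, reproduce, and audit.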
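As a rough illustration of what "an executable retrieval program" might look like, the sketch below composes search, filter, projection, verification, and aggregation steps over a tiny in-memory corpus. The corpus, the `search` helper, and `retrieval_program` are all hypothetical stand-ins; Perplexity's actual interface is not described in the summary.

```python
# Hypothetical in-memory corpus standing in for a real search backend.
CORPUS = [
    {"title": "Cache reuse in agent loops", "year": 2026,
     "text": "KV cache reuse for agent rollbacks"},
    {"title": "Early cache tricks", "year": 2023,
     "text": "KV cache paging for chat"},
    {"title": "Web agents with RL", "year": 2026,
     "text": "online RL for visual web agents"},
]


def search(query: str, docs: list[dict]) -> list[dict]:
    """Return docs whose text contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["text"].lower() for t in terms)]


def retrieval_program(docs: list[dict]) -> dict:
    """The kind of program a model might emit instead of a bare query string."""
    hits = search("kv cache", docs)                   # search step
    recent = [d for d in hits if d["year"] >= 2025]   # filter step
    titles = sorted(d["title"] for d in recent)       # projection step
    # Verification step: refuse to answer when nothing survives the filters.
    if not titles:
        raise LookupError("no supporting documents found")
    return {"answer_titles": titles, "n_considered": len(hits)}  # aggregation


result = retrieval_program(CORPUS)
print(result)
```

Because each step is an ordinary statement, an intermediate value such as `hits` can be logged or replayed, which is the debuggability argument the summary makes.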

[Holo3.1: Fast & Local Computer Use Agents](https://huggingface.co/blog/Hcompany/holo31)[^45] · huggingface.co

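The Hugging Face blog introduces Holo3.1, a family of locally run computer-use agents, emphasizing speed, local deployment, and the ability to operate desktops and browsers. It positions computer-use agents around low-latency and privacy needs rather than full reliance on remotely hosted models. Worth reading because local agents, if they stay usable, would change the deployment boundary for GUI automation.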

[Farewell Ai2](https://www.interconnects.ai/p/farewell-ai2)[^46] · interconnects.ai

Nathan Lambert looks back on his work on the Olmo models and open AI research before leaving Ai2.

[Pasted File Editor](https://simonwillison.net/2026/Jun/2/pasted-file-editor/#atom-everything)[^47] · simonwillison.net

Simon Willison documents a prototype tool that turns large pasted text into an editable file attachment.

References

61 entries · References
  1. Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses. arXiv:2606.02373. https://arxiv.org/abs/2606.02373
  2. OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents. arXiv:2606.02031. https://arxiv.org/abs/2606.02031
  3. Leyline: KV Cache Directives for Agentic Inference. arXiv:2606.01065. https://arxiv.org/abs/2606.01065
  4. DepsGuard. GitHub: arnica/depsguard. https://github.com/arnica/depsguard
  5. dmtrKovalenko/fff. GitHub: dmtrKovalenko/fff. https://github.com/dmtrKovalenko/fff
  6. DOT-MoE: Differentiable Optimal Transport for MoEfication. arXiv:2606.01666. https://arxiv.org/abs/2606.01666
  7. Policy and World Modeling Co-Training for Language Agents. arXiv:2606.02388. https://arxiv.org/abs/2606.02388
  8. MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation. arXiv:2606.02470. https://arxiv.org/abs/2606.02470
  9. Multi-Agent Computer Use. arXiv:2606.01533. https://arxiv.org/abs/2606.01533
  10. Agent Skills Should Go Beyond Text: The Case for Visual Skills. arXiv:2606.01414. https://arxiv.org/abs/2606.01414
  11. FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search. arXiv:2606.00660. https://arxiv.org/abs/2606.00660
  12. AMP: A Vendor-Neutral Wire Format for Agent Memory Operations. arXiv:2606.01138. https://arxiv.org/abs/2606.01138
  13. TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety. arXiv:2606.00611. https://arxiv.org/abs/2606.00611
  14. PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say. arXiv:2606.00152. https://arxiv.org/abs/2606.00152
  15. When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems. arXiv:2606.00448. https://arxiv.org/abs/2606.00448
  16. LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning. arXiv:2606.01336. https://arxiv.org/abs/2606.01336
  17. Speculative Pipeline Decoding: Higher-Accruacy and Zero-Bubble Speculation via Pipeline Parallelism. arXiv:2605.30852. https://arxiv.org/abs/2605.30852
  18. RePlaya. GitHub: s2-streamstore/replaya. https://github.com/s2-streamstore/replaya
  19. Clor. https://clor.com/
  20. Open-source general-purpose alternative to Exa Websets. GitHub: tinyfish-io/bigset. https://github.com/tinyfish-io/bigset
  21. Terse, TypeScript First Workflow Builder. GitHub: TerseAI/Terse. https://github.com/TerseAI/Terse
  22. Claude Code plugin for deep multi-agent code reviews. GitHub: Farfield-Dev/deep-review. https://github.com/Farfield-Dev/deep-review
  23. Jabsco. GitHub: jrecyclebin/jabsco. https://github.com/jrecyclebin/jabsco
  24. NUA an agent that tests for product correctness. https://trynua.dev/
  25. Parley. https://parley.cloudflavor.io
  26. ASys. GitHub: vincentping/asys. https://github.com/vincentping/asys
  27. AERF, signed receipts for AI agent actions. GitHub: aerf-spec/aerf. https://github.com/aerf-spec/aerf
  28. MetaBrain. https://metabrain.eu
  29. Krimto. GitHub: krimto-labs/krimto. https://github.com/krimto-labs/krimto
  30. Piqc. GitHub: paralleliq/piqc. https://github.com/paralleliq/piqc
  31. Live breath detection and biofeedback from a phone microphone. GitHub: shiihaa-app/shiihaa-breath-detection. https://github.com/shiihaa-app/shiihaa-breath-detection
  32. MAI-Code-1-Flash. https://microsoft.ai/news/introducingmai-code-1-flash/
  33. MAI-Thinking-1. https://microsoft.ai/news/introducing-mai-thinking-1/
  34. GitHub Copilot App. GitHub: features/preview. https://github.com/features/preview/github-app
  35. OpenAI frontier models and Codex are now available on AWS. https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/
  36. Codex for every role, tool, and workflow. https://openai.com/index/codex-for-every-role-tool-workflow
  37. Codex is becoming a productivity tool for everyone. https://openai.com/index/codex-for-knowledge-work
  38. Our views on AI policy and political advocacy. https://openai.com/index/our-views-on-ai-policy-and-political-advocacy
  39. Advancing youth safety and opportunity through global leadership. https://openai.com/index/advancing-youth-safety-and-opportunity-through-global-leadership
  40. Anthropic expands Project Glasswing. https://www.anthropic.com/news/expanding-project-glasswing
  41. Trump signs downsized AI order after weeks of reversals. https://www.politico.com/news/2026/06/02/trump-signs-downsized-ai-order-00946389
  42. Florida sues OpenAI and Sam Altman over AI risks. https://www.politico.com/news/2026/06/01/openai-hit-with-florida-lawsuit-00944215
  43. How we index images for RAG. https://www.kapa.ai/blog/how-we-index-images-for-rag
  44. Rethinking Search as Code Generation. https://research.perplexity.ai/articles/rethinking-search-as-code-generation
  45. Holo3.1: Fast & Local Computer Use Agents. https://huggingface.co/blog/Hcompany/holo31
  46. Farewell Ai2. https://www.interconnects.ai/p/farewell-ai2
  47. Pasted File Editor. https://simonwillison.net/2026/Jun/2/pasted-file-editor/#atom-everything
  48. DiffusionBlocks: Save 2-3x Training Memory!? https://mail.bycloud.ai/p/diffusionblocks-save-2-3x-training-memory
  49. not much happened today. https://news.smol.ai/issues/26-06-01-not-much/
  50. The Sequence Knowledge #870: Liquid Models and the Search for a Post-Transformer Architecture. https://thesequence.substack.com/p/the-sequence-knowledge-870-liquid
  51. The advertising cartel coming to your web browser. https://blog.zgp.org/the-advertising-cartel-coming-to-your-web-browser/
  52. Quality in the Age of Slop. https://sinclairtarget.com/blog/2026/06/01/quality-in-the-age-of-slop/
  53. pbakaus/impeccable. GitHub: pbakaus/impeccable. https://github.com/pbakaus/impeccable
  54. TauricResearch/TradingAgents. GitHub: TauricResearch/TradingAgents. https://github.com/TauricResearch/TradingAgents
  55. can1357/oh-my-pi. GitHub: can1357/oh-my-pi. https://github.com/can1357/oh-my-pi
  56. zeroclaw-labs/zeroclaw. GitHub: zeroclaw-labs/zeroclaw. https://github.com/zeroclaw-labs/zeroclaw
  57. ruvnet/ruflo. GitHub: ruvnet/ruflo. https://github.com/ruvnet/ruflo
  58. better-auth/better-auth. GitHub: better-auth/better-auth. https://github.com/better-auth/better-auth
  59. uutils/coreutils. GitHub: uutils/coreutils. https://github.com/uutils/coreutils
  60. lakehq/sail. GitHub: lakehq/sail. https://github.com/lakehq/sail
  61. AlexsJones/llmfit. GitHub: AlexsJones/llmfit. https://github.com/AlexsJones/llmfit