Daily Harness

2026-06-03 · Wednesday, June 3, 2026

Agent engineering accelerates

Today's Highlights

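[Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses](https://arxiv.org/abs/2606.02373)[^1] - Harness-1 externalizes a search agent's evidence, constraints, candidate answers, and check state into the harness, rather than requiring the model to maintain all of that state itself in an ever-growing transcript.

[OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents](https://arxiv.org/abs/2606.02031)[^2] - OpenWebRL studies online multi-turn reinforcement learning for visual Web agents, focusing on letting the agent learn by trial and error in dynamic web environments rather than only imitating static supervised trajectories.

[Leyline: KV Cache Directives for Agentic Inference](https://arxiv.org/abs/2606.01065)[^3] - Leyline proposes KV cache directives for agentic inference to handle non-linear conversation operations such as failed tool calls, output deletion, trajectory forks, rollbacks, and retries.

[DepsGuard](https://github.com/arnica/depsguard)[^4] - DepsGuard is a supply-chain security CLI that writes more conservative package manager configuration for npm, pnpm, yarn, bun, and uv with a single command.

[dmtrKovalenko/fff](https://github.com/dmtrKovalenko/fff)[^5] - fff is a high-speed file search and content indexing toolkit written in Rust, aimed at long-running processes, editors, and agent scenarios.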

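Papers

15 items

DOT-MoE: Differentiable Optimal Transport for MoEfication[^6] (arxiv.org)

The paper proposes using differentiable optimal transport to convert a pretrained dense LLM into a sparse MoE, reducing the instability and cost of training an MoE from scratch.

Policy and World Modeling Co-Training for Language Agents[^7] (arxiv.org)

Agent RL / Verifiable Rewards · Synthetic Data & Training Environments · Other Verticals

The paper jointly trains the agent policy and a text world model, so that RL rollouts learn both action selection and environment dynamics.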

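Featured · Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses[^1] (arxiv.org)

Agent RL / Verifiable Rewards · Context Engineering · Retrieval & Knowledge Grounding · Research & Science

Harness-1 externalizes a search agent's evidence, constraints, candidate answers, and check state into the harness, rather than requiring the model to maintain all of that state itself in an ever-growing transcript. Its contribution is to change the object of RL training from a pure dialogue policy to a model plus an external state machine, so that retrieval, citation, and verification steps can be explicitly recorded, checked, and rewarded. Worth reading because the bottleneck for search agents often lies in managing evidence across turns and in self-checking, and this paper turns state management into a trainable interface.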
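
To make the idea of harness-held state concrete, here is a minimal Python sketch. It is not taken from the Harness-1 paper: the `SearchState` class, its fields, and its methods are hypothetical names chosen for illustration. It only shows how evidence, constraints, candidates, and checks could live outside the transcript and be rendered back to the model as a short summary.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Hypothetical harness-side record of what the agent has established so far."""
    constraints: list[str] = field(default_factory=list)
    evidence: dict[str, str] = field(default_factory=dict)          # source id -> snippet
    candidates: dict[str, set[str]] = field(default_factory=dict)   # answer -> supporting source ids
    checks: dict[str, bool] = field(default_factory=dict)           # check name -> passed?

    def add_evidence(self, source_id: str, snippet: str) -> None:
        self.evidence[source_id] = snippet

    def propose(self, answer: str, source_ids: list[str]) -> None:
        # Keep only citations that point at evidence the harness has actually stored.
        known = {s for s in source_ids if s in self.evidence}
        self.candidates.setdefault(answer, set()).update(known)

    def record_check(self, name: str, passed: bool) -> None:
        self.checks[name] = passed

    def unsupported(self) -> list[str]:
        """Candidate answers with no stored evidence behind them."""
        return sorted(a for a, srcs in self.candidates.items() if not srcs)

    def render(self) -> str:
        """Compact summary shown to the model each turn in place of the full transcript."""
        lines = [f"constraints: {'; '.join(self.constraints) or 'none'}"]
        lines.append(f"evidence: {', '.join(sorted(self.evidence)) or 'none'}")
        for answer in sorted(self.candidates):
            cited = ", ".join(sorted(self.candidates[answer])) or "no support"
            lines.append(f"candidate: {answer} [{cited}]")
        failed = sorted(n for n, ok in self.checks.items() if not ok)
        lines.append(f"failed checks: {', '.join(failed) or 'none'}")
        return "\n".join(lines)
```

Because every citation and check passes through the harness, a reward function could inspect `unsupported()` or `checks` directly instead of parsing free-form transcript text. That is one plausible reading of "explicitly recorded, checked, and rewarded", not a description of the paper's actual reward design.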

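Featured · OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents[^2] (arxiv.org)

Agent RL / Verifiable Rewards · Synthetic Data & Training Environments · Computer & Web

OpenWebRL studies online multi-turn reinforcement learning for visual Web agents, focusing on letting the agent learn by trial and error in dynamic web environments rather than only imitating static supervised trajectories. The paper discusses system-level issues including the browser environment, visual observations, the action space, rewards, and long-horizon credit assignment. Worth reading because Web agent training is shifting from "imitating from screenshots" to "exploring continuously inside web pages and correcting the policy".

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation[^8] (arxiv.org)

Benchmarks · Protocols & Interoperability · Tool Use · Other Verticals

The paper builds an MCP agent benchmark that simulates personal application environments, for evaluating tool use in personal-data scenarios such as social, calendar, and email.

Multi-Agent Computer Use[^9] (arxiv.org)

Multi-Agent · Tool Use · Computer & Web

The paper proposes moving from a single serial computer-use agent to a multi-agent computer-use system, and discusses task decomposition, parallel execution, and replanning evaluation.

Agent Skills Should Go Beyond Text: The Case for Visual Skills[^10] (arxiv.org)

Skill Systems · Computer & Web

The paper argues that text skill files hit an expressiveness bottleneck on visual tasks, and proposes extending reusable agent skills to visual forms.

FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search[^11] (arxiv.org)

Test-Time Compute · Evaluation Methods · Research & Science

The paper breaks answer verification in agentic search into fine-grained claim checks, to make more robust use of test-time compute.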

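Featured · Leyline: KV Cache Directives for Agentic Inference[^3] (arxiv.org)

Context Engineering · Execution Environments & Sandboxes · Systems & Infrastructure

Leyline proposes KV cache directives for agentic inference to handle non-linear conversation operations such as failed tool calls, output deletion, trajectory forks, rollbacks, and retries. Conventional KV caches assume that context grows by appending to a prefix, but agent workflows often need to discard stale observations or open a new branch from an intermediate node. Worth reading because it extends inference-system optimization from the throughput of a single chat stream to agent state editing and branch exploration.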
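
The gap between append-only caching and agent workflows can be illustrated with a toy model. The Python sketch below is not Leyline's interface: `ToyPrefixCache` and its `rollback` and `fork` methods are hypothetical, and token lists stand in for real KV tensors. It only shows why rollback and fork operations preserve reusable prefix work that a purely append-only view would have to recompute.

```python
class ToyPrefixCache:
    """Toy stand-in for a KV cache: each branch is a list of token ids that count as cached."""

    def __init__(self) -> None:
        self.branches: dict[str, list[int]] = {"main": []}

    def append(self, branch: str, tokens: list[int]) -> None:
        self.branches[branch].extend(tokens)

    def rollback(self, branch: str, keep: int) -> None:
        """Drop everything after the first `keep` tokens, e.g. a failed tool call and its output."""
        del self.branches[branch][keep:]

    def fork(self, src: str, dst: str, at: int) -> None:
        """Start a new branch that reuses the first `at` cached tokens of `src`."""
        self.branches[dst] = self.branches[src][:at]

    def reusable_prefix(self, branch: str, prompt: list[int]) -> int:
        """Count leading prompt tokens that match the branch and so need no recomputation."""
        n = 0
        for cached, wanted in zip(self.branches[branch], prompt):
            if cached != wanted:
                break
            n += 1
        return n
```

In this toy model a retry that shares the first two tokens with `main` can reuse them after `fork("main", "retry", 2)`. A real serving system would share the underlying KV blocks rather than copy a list, and how Leyline expresses such directives is described in the paper itself.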

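AMP: A Vendor-Neutral Wire Format for Agent Memory Operations[^12] (arxiv.org)

Agent Memory · Protocols & Interoperability · Systems & Infrastructure

The paper proposes a vendor-neutral wire format for agent memory operations, covering memory-governance interfaces such as writes, migration, and human review.

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety[^13] (arxiv.org)

Evaluation Methods · Context Engineering · Other Verticals

The paper models long-horizon agent safety detection as a trajectory-level compression problem, in order to preserve sparse and delayed risk evidence.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say[^14] (arxiv.org)

Evaluation Methods · Security & Attack-Defense · Other Verticals

The paper evaluates which sensitive information an LLM agent acquires while completing a task, rather than only checking its outputs or outbound actions.

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems[^15] (arxiv.org)

Security & Attack-Defense · Skill Systems · Other Verticals

The paper studies whether multiple agent skills that are each safe on their own form an unsafe capability set when combined.

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning[^16] (arxiv.org)

Context Engineering · Systems & Infrastructure

The paper studies long-context compression across model families, to reduce the prefill cost of 100k+ token inputs.

Speculative Pipeline Decoding: Higher-Accruacy and Zero-Bubble Speculation via Pipeline Parallelism[^17] (arxiv.org)

The paper proposes reworking speculative decoding with pipeline parallelism to reduce serial drafting latency.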

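Open Source / Projects

15 items

Featured · DepsGuard[^4] (github.com)

DepsGuard is a supply-chain security CLI that writes more conservative package manager configuration for npm, pnpm, yarn, bun, and uv with a single command. The README emphasizes that it hardens options related to install scripts, registries, lockfiles, and version resolution, reducing the risk of script execution and package confusion at dependency install time. Worth trying because it turns security baselines scattered across different package managers into a repeatable project initialization step.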
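
As a rough illustration of what "writing a more conservative configuration" can mean for a single package manager, the Python sketch below merges two hardening settings into `.npmrc` text. It is not DepsGuard's code, and the set of options DepsGuard actually writes is not stated here. `ignore-scripts` and `save-exact` are standard npm config keys, while the helper `harden_npmrc` and the `HARDENED_NPMRC` table are hypothetical.

```python
# Assumed example baseline; the real tool's choices may differ.
HARDENED_NPMRC = {
    "ignore-scripts": "true",  # do not run package lifecycle scripts on install
    "save-exact": "true",      # record exact versions instead of ranges
}


def harden_npmrc(text: str, settings: dict[str, str] = HARDENED_NPMRC) -> str:
    """Return .npmrc text with `settings` applied, preserving unrelated lines and comments."""
    remaining = dict(settings)  # copy so the shared default is never mutated
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith(("#", ";"))
        if key in remaining and not is_comment:
            # Overwrite an existing value for a hardened key in place.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any hardened keys that were not already present.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"
```

The function is idempotent, so running it twice leaves the file unchanged. That property is what makes this kind of step safe to repeat during project initialization.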

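RePlaya[^18] (github.com)

A self-hosted browser session replay tool built on rrweb, with support for live tailing.

Clor[^19] (clor.com)

Execution Environments & Sandboxes · Frameworks & Scaffolding · Coding

A platform for running and hosting coding agents, built around the agent execution environment and security model.

Open-source general-purpose alternative to Exa Websets[^20] (github.com)

Retrieval & Knowledge Grounding · Workflows & Control Flow · Data & Analytics

An open-source tool that uses search engines to recursively build structured datasets.

Terse, TypeScript First Workflow Builder[^21] (github.com)

Workflows & Control Flow · Frameworks & Scaffolding · Coding

A TypeScript-first open-source workflow builder for IDE and Claude Code workflows.

Claude Code plugin for deep multi-agent code reviews[^22] (github.com)

Multi-Agent · Evaluation Methods · Coding

A Claude Code plugin that runs code reviews through a multi-agent process.

Jabsco[^23] (github.com)

Execution Environments & Sandboxes · Computer & Web

An agent harness for managing remote desktops and test VMs over RDP.

NUA an agent that tests for product correctness[^24] (trynua.dev)

Evaluation Methods · Reasoning & Planning · Coding

A testing agent focused on product correctness that uses context to generate tests checking user intent.

Parley[^25] (parley.cloudflavor.io)

Frameworks & Scaffolding · Evaluation Methods · Coding

A local TUI code review tool that works with agent harnesses such as Codex, Claude, and OpenCode.

ASys[^26] (github.com)

Protocols & Interoperability · Execution Environments & Sandboxes · Systems & Infrastructure

A typed binary protocol for AI agents operating servers, intended to replace SSH-style interaction.

AERF, signed receipts for AI agent actions[^27] (github.com)

Protocols & Interoperability · Observability & Debugging · Systems & Infrastructure

A specification project for generating signed receipts for agent actions.

MetaBrain[^28] (metabrain.eu)

Agent Memory · Retrieval & Knowledge Grounding · Coding

A local document memory system that lets AI agents retrieve project context.

Krimto[^29] (github.com)

Agent Memory · Other Verticals

Stores AI memory as Markdown files in the user's own git repository.

Piqc[^30] (github.com)

A GPU waste scanning tool for LLM inference clusters.

Live breath detection and biofeedback from a phone microphone[^31] (github.com)

An open-source project for real-time breath detection and biofeedback using a phone microphone.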

Industry News

11 items

MAI-Code-1-Flash[^32] (microsoft.ai)

Microsoft AI releases MAI-Code-1-Flash, a model for coding tasks, along with its model card.

MAI-Thinking-1[^33] (microsoft.ai)

Microsoft AI releases MAI-Thinking-1 as part of its new batch of MAI models.

GitHub Copilot App[^34] (github.com)

GitHub publishes a preview page for the Copilot App, showing its application form for GitHub workflows.

OpenAI frontier models and Codex are now available on AWS[^35] (openai.com)

OpenAI announces that its frontier models and Codex are available through AWS.

Codex for every role, tool, and workflow[^36] (openai.com)

OpenAI releases Codex plugins, sites, and annotations, extending how Codex is used across roles and tools.

Codex is becoming a productivity tool for everyone[^37] (openai.com)

OpenAI publishes a report on Codex being used for research, data analysis, automation, and content work.

Our views on AI policy and political advocacy[^38] (openai.com)

OpenAI explains its positions on AI policy, political advocacy, regulation, and relationships with outside organizations.

Advancing youth safety and opportunity through global leadership[^39] (openai.com)

OpenAI proposes a global governance initiative for youth AI safety and opportunity.

Anthropic expands Project Glasswing[^40] (anthropic.com)

Anthropic announces an expansion of Project Glasswing, centered on deploying Claude in education and public-sector settings.

Trump signs downsized AI order after weeks of reversals[^41] (politico.com)

Politico reports that the US president signed a downsized AI executive order.

Florida sues OpenAI and Sam Altman over AI risks[^42] (politico.com)

Politico reports that Florida is suing OpenAI and Sam Altman over AI risks.

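Blog Posts

10 items

How we index images for RAG[^43] (kapa.ai)

Retrieval & Knowledge Grounding · Data & Analytics

Kapa.ai describes its pipeline for indexing images in a RAG system: images need to be extracted, described, OCR'd, and bound to the surrounding text context, rather than stored only as a URL or alt text. The post notes that screenshots, charts, and UI states in documentation often carry the evidence needed to answer a question. Worth reading because the hard part of multimodal RAG lies in how chunking, citation, and ranking are merged with text evidence.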
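
The steps named above (describe, OCR, bind to surrounding text) suggest an index record along the following lines. This Python sketch is an assumption for illustration, not Kapa.ai's schema: `ImageRecord`, its fields, and `keyword_search` are hypothetical, and the substring match stands in for a real embedding or keyword retriever.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    url: str
    description: str  # model-written description of what the image shows
    ocr_text: str     # text read off the image itself
    context: str      # nearby document text, such as the enclosing paragraph

    def index_text(self) -> str:
        """Text that would be embedded or keyword-indexed for this image."""
        parts = (self.description, self.ocr_text, self.context)
        return "\n".join(p.strip() for p in parts if p.strip())


def keyword_search(records: list[ImageRecord], query: str) -> list[str]:
    """Naive case-insensitive substring match, standing in for a real retriever."""
    q = query.lower()
    return [r.url for r in records if q in r.index_text().lower()]
```

An image stored with only its URL contributes nothing to `index_text()`, so it can never be retrieved. Once the OCR text and surrounding paragraph are attached, a query that matches a button label in a screenshot can surface that image as evidence.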

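Rethinking Search as Code Generation[^44] (research.perplexity.ai)

Reasoning & Planning · Retrieval & Knowledge Grounding · Workflows & Control Flow · Research & Science

Perplexity Research reframes search as code generation: instead of generating only query terms, the model generates an executable retrieval program that composes search, filtering, parsing, and aggregation steps. This view breaks complex information needs into control flow, data flow, and verification logic, which suits multi-hop fact lookup and structured answer generation. Worth reading because it makes a search agent's reasoning easier to debug, reproduce, and audit.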
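
A small Python sketch can show what a "retrieval program" looks like next to a plain query string. Everything in it is hypothetical: the in-memory `CORPUS`, the `search` primitive, and `retrieval_program` are illustrative stand-ins, not Perplexity's API or the programs its models actually generate.

```python
# Hypothetical in-memory corpus standing in for a real search backend.
CORPUS = [
    {"title": "Cache rollback notes", "year": 2026, "text": "KV cache rollback for agents"},
    {"title": "Old cache survey", "year": 2021, "text": "A survey of KV cache eviction"},
    {"title": "Web agent RL", "year": 2026, "text": "Online RL for web agents"},
]


def search(query: str) -> list[dict]:
    """Search primitive that a generated program is allowed to call."""
    q = query.lower()
    return [doc for doc in CORPUS if q in doc["text"].lower()]


def retrieval_program() -> dict:
    """The kind of program a model might emit instead of a single query string."""
    hits = search("kv cache")                        # search step
    recent = [d for d in hits if d["year"] >= 2025]  # filter step
    titles = sorted(d["title"] for d in recent)      # parse step
    return {"count": len(titles), "titles": titles}  # aggregate step
```

Because each step is an ordinary statement, the intermediate values `hits` and `recent` can be logged and replayed, which is where the claimed gains in debugging and auditing would come from.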

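Holo3.1: Fast & Local Computer Use Agents[^45] (huggingface.co)

Execution Environments & Sandboxes · Tool Use · Computer & Web

The Hugging Face blog introduces Holo3.1, a family of locally run computer-use agents that emphasizes speed, local deployment, and desktop/browser operation. It positions computer-use agents around low-latency and privacy needs rather than full reliance on remotely hosted models. Worth reading because local agents, if they stay usable, will change the deployment boundary of GUI automation.

Farewell Ai2[^46] (interconnects.ai)

Nathan Lambert looks back on his work on Olmo models and open AI research before leaving Ai2.

Pasted File Editor[^47] (simonwillison.net)

Simon Willison documents a prototype tool that turns large pasted text into an editable file attachment.

DiffusionBlocks: Save 2-3x Training Memory!?[^48] (mail.bycloud.ai)

The AI Timeline rounds up a week of AI research and industry news, with a focus on explaining the training-memory optimization in DiffusionBlocks.

not much happened today[^49] (news.smol.ai)

smol.ai news rounds up AI news and community discussion from 2026-05-30 to 2026-06-01.

The Sequence Knowledge #870: Liquid Models and the Search for a Post-Transformer Architecture[^50] (thesequence.substack.com)

TheSequence introduces liquid models and their background as an architectural path beyond the transformer.

The advertising cartel coming to your web browser[^51] (blog.zgp.org)

A personal blog post discussing browser advertising standards and industry coordination.

Quality in the Age of Slop[^52] (sinclairtarget.com)

A personal blog post discussing software quality and review standards in an environment of generated content.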

References

61 items

[^1]: Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses. arXiv:2606.02373. https://arxiv.org/abs/2606.02373
[^2]: OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents. arXiv:2606.02031. https://arxiv.org/abs/2606.02031
[^3]: Leyline: KV Cache Directives for Agentic Inference. arXiv:2606.01065. https://arxiv.org/abs/2606.01065
[^4]: DepsGuard. GitHub: arnica/depsguard. https://github.com/arnica/depsguard
[^5]: dmtrKovalenko/fff. GitHub: dmtrKovalenko/fff. https://github.com/dmtrKovalenko/fff
[^6]: DOT-MoE: Differentiable Optimal Transport for MoEfication. arXiv:2606.01666. https://arxiv.org/abs/2606.01666
[^7]: Policy and World Modeling Co-Training for Language Agents. arXiv:2606.02388. https://arxiv.org/abs/2606.02388
[^8]: MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation. arXiv:2606.02470. https://arxiv.org/abs/2606.02470
[^9]: Multi-Agent Computer Use. arXiv:2606.01533. https://arxiv.org/abs/2606.01533
[^10]: Agent Skills Should Go Beyond Text: The Case for Visual Skills. arXiv:2606.01414. https://arxiv.org/abs/2606.01414
[^11]: FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search. arXiv:2606.00660. https://arxiv.org/abs/2606.00660
[^12]: AMP: A Vendor-Neutral Wire Format for Agent Memory Operations. arXiv:2606.01138. https://arxiv.org/abs/2606.01138
[^13]: TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety. arXiv:2606.00611. https://arxiv.org/abs/2606.00611
[^14]: PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say. arXiv:2606.00152. https://arxiv.org/abs/2606.00152
[^15]: When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems. arXiv:2606.00448. https://arxiv.org/abs/2606.00448
[^16]: LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning. arXiv:2606.01336. https://arxiv.org/abs/2606.01336
[^17]: Speculative Pipeline Decoding: Higher-Accruacy and Zero-Bubble Speculation via Pipeline Parallelism. arXiv:2605.30852. https://arxiv.org/abs/2605.30852
[^18]: RePlaya. GitHub: s2-streamstore/replaya. https://github.com/s2-streamstore/replaya
[^19]: Clor. https://clor.com/
[^20]: Open-source general-purpose alternative to Exa Websets. GitHub: tinyfish-io/bigset. https://github.com/tinyfish-io/bigset
[^21]: Terse, TypeScript First Workflow Builder. GitHub: TerseAI/Terse. https://github.com/TerseAI/Terse
[^22]: Claude Code plugin for deep multi-agent code reviews. GitHub: Farfield-Dev/deep-review. https://github.com/Farfield-Dev/deep-review
[^23]: Jabsco. GitHub: jrecyclebin/jabsco. https://github.com/jrecyclebin/jabsco
[^24]: NUA an agent that tests for product correctness. https://trynua.dev/
[^25]: Parley. https://parley.cloudflavor.io
[^26]: ASys. GitHub: vincentping/asys. https://github.com/vincentping/asys
[^27]: AERF, signed receipts for AI agent actions. GitHub: aerf-spec/aerf. https://github.com/aerf-spec/aerf
[^28]: MetaBrain. https://metabrain.eu
[^29]: Krimto. GitHub: krimto-labs/krimto. https://github.com/krimto-labs/krimto
[^30]: Piqc. GitHub: paralleliq/piqc. https://github.com/paralleliq/piqc
[^31]: Live breath detection and biofeedback from a phone microphone. GitHub: shiihaa-app/shiihaa-breath-detection. https://github.com/shiihaa-app/shiihaa-breath-detection
[^32]: MAI-Code-1-Flash. https://microsoft.ai/news/introducingmai-code-1-flash/
[^33]: MAI-Thinking-1. https://microsoft.ai/news/introducing-mai-thinking-1/
[^34]: GitHub Copilot App. GitHub: features/preview. https://github.com/features/preview/github-app
[^35]: OpenAI frontier models and Codex are now available on AWS. https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/
[^36]: Codex for every role, tool, and workflow. https://openai.com/index/codex-for-every-role-tool-workflow
[^37]: Codex is becoming a productivity tool for everyone. https://openai.com/index/codex-for-knowledge-work
[^38]: Our views on AI policy and political advocacy. https://openai.com/index/our-views-on-ai-policy-and-political-advocacy
[^39]: Advancing youth safety and opportunity through global leadership. https://openai.com/index/advancing-youth-safety-and-opportunity-through-global-leadership
[^40]: Anthropic expands Project Glasswing. https://www.anthropic.com/news/expanding-project-glasswing
[^41]: Trump signs downsized AI order after weeks of reversals. https://www.politico.com/news/2026/06/02/trump-signs-downsized-ai-order-00946389
[^42]: Florida sues OpenAI and Sam Altman over AI risks. https://www.politico.com/news/2026/06/01/openai-hit-with-florida-lawsuit-00944215
[^43]: How we index images for RAG. https://www.kapa.ai/blog/how-we-index-images-for-rag
[^44]: Rethinking Search as Code Generation. https://research.perplexity.ai/articles/rethinking-search-as-code-generation
[^45]: Holo3.1: Fast & Local Computer Use Agents. https://huggingface.co/blog/Hcompany/holo31
[^46]: Farewell Ai2. https://www.interconnects.ai/p/farewell-ai2
[^47]: Pasted File Editor. https://simonwillison.net/2026/Jun/2/pasted-file-editor/#atom-everything
[^48]: DiffusionBlocks: Save 2-3x Training Memory!? https://mail.bycloud.ai/p/diffusionblocks-save-2-3x-training-memory
[^49]: not much happened today. https://news.smol.ai/issues/26-06-01-not-much/
[^50]: The Sequence Knowledge #870: Liquid Models and the Search for a Post-Transformer Architecture. https://thesequence.substack.com/p/the-sequence-knowledge-870-liquid
[^51]: The advertising cartel coming to your web browser. https://blog.zgp.org/the-advertising-cartel-coming-to-your-web-browser/
[^52]: Quality in the Age of Slop. https://sinclairtarget.com/blog/2026/06/01/quality-in-the-age-of-slop/
[^53]: pbakaus/impeccable. GitHub: pbakaus/impeccable. https://github.com/pbakaus/impeccable
[^54]: TauricResearch/TradingAgents. GitHub: TauricResearch/TradingAgents. https://github.com/TauricResearch/TradingAgents
[^55]: can1357/oh-my-pi. GitHub: can1357/oh-my-pi. https://github.com/can1357/oh-my-pi
[^56]: zeroclaw-labs/zeroclaw. GitHub: zeroclaw-labs/zeroclaw. https://github.com/zeroclaw-labs/zeroclaw
[^57]: ruvnet/ruflo. GitHub: ruvnet/ruflo. https://github.com/ruvnet/ruflo
[^58]: better-auth/better-auth. GitHub: better-auth/better-auth. https://github.com/better-auth/better-auth
[^59]: uutils/coreutils. GitHub: uutils/coreutils. https://github.com/uutils/coreutils
[^60]: lakehq/sail. GitHub: lakehq/sail. https://github.com/lakehq/sail
[^61]: AlexsJones/llmfit. GitHub: AlexsJones/llmfit. https://github.com/AlexsJones/llmfit

GitHub Trending

pbakaus/impeccable[^53] (github.com)

TauricResearch/TradingAgents[^54] (github.com)

can1357/oh-my-pi[^55] (github.com)

Featured · dmtrKovalenko/fff[^5] (github.com)

zeroclaw-labs/zeroclaw[^56] (github.com)

ruvnet/ruflo[^57] (github.com)

better-auth/better-auth[^58] (github.com)

uutils/coreutils[^59] (github.com)

lakehq/sail[^60] (github.com)

AlexsJones/llmfit[^61] (github.com)
