Topics · Daily Harness (每日 Harness)
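Browse all content across issues by Agent / Harness topic: open any topic to see all of its entries and filter them by date, or return to the full list of issues to browse chronologically.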
7 topic groups · 21 subtopics · 207 items
1 Reasoning · 4 items
1.1 Reasoning & Planning · 2 items
- Rethinking Search as Code Generation · 2026-06-03 · research.perplexity.ai
- Biohub/esm · 2026-05-31 · github.com
1.2 Test-time Compute · 2 items
- FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search · 2026-06-03 · arxiv.org
- The Sequence Opinion #868: Recursion Is the New Scaling Law · 2026-05-29 · thesequence.substack.com
2 Learning & Self-Evolution · 17 items
2.1 Agent RL / Verifiable Rewards · 9 items
- InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain · 2026-06-04 · arxiv.org
- Policy and World Modeling Co-Training for Language Agents · 2026-06-03 · arxiv.org
- OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents · 2026-06-03 · arxiv.org
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses · 2026-06-03 · arxiv.org
- SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search · 2026-06-02 · arxiv.org
2.2 Distillation & Compression · 1 item
- What Makes Interaction Trajectories Effective for Training Terminal Agents? · 2026-06-04 · arxiv.org
2.3 Self-Evolution · 4 items
- Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories · 2026-06-04 · arxiv.org
- Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams · 2026-06-04 · arxiv.org
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents · 2026-06-02 · arxiv.org
- SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks · 2026-06-01 · arxiv.org
2.4 Synthetic Data & Environments · 3 items
- I built an Android OS in the browser · 2026-05-31 · mobilegym.dev
- PhoneWorld: Scaling Phone-Use Agent Environments · 2026-05-30 · arxiv.org
- SynthTools: A Framework for Scaling Synthetic Tools for Agent Development · 2026-05-29 · arxiv.org
3 Memory & Context · 35 items
3.1 Agent Memory · 10 items
- supermemoryai/supermemory · 2026-06-04 · github.com
- Mnemo · 2026-06-04 · github.com
- DELTAMEM: Incremental Experience Memory for LLM Agents via Residual Trees · 2026-06-04 · arxiv.org
- MetaBrain · 2026-06-03 · metabrain.eu
- Krimto · 2026-06-03 · github.com
3.2 Context Engineering · 12 items
- mksglu/context-mode · 2026-06-04 · github.com
- chopratejas/headroom · 2026-06-04 · github.com
- Moxie Docs · 2026-06-04 · moxiedocs.com
- KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks · 2026-06-04 · arxiv.org
- Carto · 2026-06-04 · github.com
3.3 Retrieval / RAG · 13 items
- HelixDB/helix-db · 2026-06-04 · github.com
- Extract · 2026-06-04 · extract.page
- CLI for crawling documentation sites into Markdown with defuddle · 2026-06-04 · github.com
- dmtrKovalenko/fff · 2026-06-03 · github.com
- Open-source general-purpose alternative to Exa Websets · 2026-06-03 · github.com
4 Tools & Skills · 31 items
4.1 Tool Use · 11 items
- modelscope/FunASR · 2026-06-04 · github.com
- ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents · 2026-06-04 · arxiv.org
- The Sequence AI of the Week #871: Inside the Loop with Claude Opus 4.8 · 2026-06-04 · thesequence.substack.com
- Introducing new capabilities to GPT-Rosalind · 2026-06-04 · openai.com
- D4Vinci/Scrapling · 2026-06-02 · github.com
4.2 Skills · 14 items
- googleworkspace/cli · 2026-06-04 · github.com
- SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale · 2026-06-04 · arxiv.org
- Inducing Reasoning Primitives from Agent Traces · 2026-06-04 · arxiv.org
- pbakaus/impeccable · 2026-06-03 · github.com
- Agent Skills Should Go Beyond Text: The Case for Visual Skills · 2026-06-03 · arxiv.org
4.3 Protocols & Interop · 6 items
- agentgateway/agentgateway · 2026-06-04 · github.com
- Now AI agents need what RSS does · 2026-06-04 · julienreszka.com
- ASys · 2026-06-03 · github.com
- AERF, signed receipts for AI agent actions · 2026-06-03 · github.com
- theta-spec · 2026-05-30 · github.com
5 Orchestration & Multi-Agent · 20 items
5.1 Multi-Agent · 8 items
- Division Swarm · 2026-06-04 · github.com
- ruvnet/ruflo · 2026-06-03 · github.com
- TauricResearch/TradingAgents · 2026-06-03 · github.com
- Multi-Agent Computer Use · 2026-06-03 · arxiv.org
- Claude Code plugin for deep multi-agent code reviews · 2026-06-03 · github.com
5.2 Workflows & Control · 12 items
- OpenSOP · 2026-06-04 · opensop.ai
- Chatcode · 2026-06-04 · chatcode.dev
- Terse, TypeScript First Workflow Builder · 2026-06-03 · github.com
- Cloud CI and agentic workflows for embedded hardware development · 2026-06-02 · github.com
- BloopAI/vibe-kanban · 2026-06-02 · github.com
6 Harness Runtime & Infra · 52 items
6.1 Frameworks & Scaffolds · 27 items
- Paseo · 2026-06-04 · github.com
- Microsoft Build: MAI-Thinking-1 and MAI Family models, Surface RTX Spark Dev Box, and OpenClaw in Windows · 2026-06-04 · news.smol.ai
- LiteHarness · 2026-06-04 · github.com
- Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents · 2026-06-04 · arxiv.org
- zeroclaw-labs/zeroclaw · 2026-06-03 · github.com
6.2 Execution & Sandboxing · 15 items
- nanocoai/nanoclaw · 2026-06-04 · github.com
- micropython-wasm 0.1a1 · 2026-06-04 · simonwillison.net
- datasette-agent-micropython 0.1a0 · 2026-06-04 · simonwillison.net
- Self-hosted dev sandboxes with preview URLs · 2026-06-04 · github.com
- NVIDIA/OpenShell · 2026-06-04 · github.com
6.3 Observability & Debugging · 10 items
- Viewport · 2026-06-04 · github.com
- TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories · 2026-06-02 · arxiv.org
- Tokentoll, a CI gate for LLM API cost regressions · 2026-05-31 · github.com
- ryoppippi/ccusage · 2026-05-30 · github.com
- git-ai-project/git-ai · 2026-05-30 · github.com
7 Evaluation & Safety · 48 items
7.1 Benchmarks · 9 items
- DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration · 2026-06-04 · arxiv.org
- MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation · 2026-06-03 · arxiv.org
- SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones? · 2026-06-02 · arxiv.org
- OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents · 2026-06-02 · arxiv.org
- LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis · 2026-06-02 · arxiv.org
7.2 Eval Methodology · 16 items
- What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents · 2026-06-04 · arxiv.org
- Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks · 2026-06-04 · arxiv.org
- TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety · 2026-06-03 · arxiv.org
- PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say · 2026-06-03 · arxiv.org
- NUA an agent that tests for product correctness · 2026-06-03 · trynua.dev
7.3 Security · 19 items
- Agent-browser-shield · 2026-06-04 · github.com
- AI Agents Enable Adaptive Computer Worms · 2026-06-04 · arxiv.org
- 1-Click GitHub Token Stealing via a VSCode Bug · 2026-06-04 · blog.ammaraskar.com
- When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems · 2026-06-03 · arxiv.org
- From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors · 2026-06-02 · arxiv.org
7.4 Alignment & Governance · 4 items
- Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI · 2026-06-04 · arxiv.org
- OpenAI: Strengthening societal resilience with Rosalind Biodefense · 2026-05-30 · openai.com
- Governing Technical Debt in Agentic AI Systems · 2026-05-30 · arxiv.org
- OpenAI’s Frontier Governance Framework · 2026-05-29 · openai.com