@field_validator("tool_calls") @classmethod defcap_tools(cls, v: list[ToolCall]) -> list[ToolCall]: iflen(v) > 5: raise ValueError("too many tool calls in one turn") return v
# 模拟 LLM 脏输出 raw = {"thought": "查天气", "tool_calls": [{"name": "search", "arguments": {"q": "北京"}}]} out = AgentOutput.model_validate(raw)
If you’ve built ReAct agents with LangChain but hit state loss, crash recovery gaps, missing human approval, and runaway loops in production, LangGraph was built for exactly these problems. This article covers architecture and runnable code to help you master LangGraph systematically.
Compared to LangChain’s create_agent, LangGraph is a low-level runtime, not a high-level wrapper — you get full control at the cost of more boilerplate.
2. 核心概念详解 | Core Concepts
2.1 State — 状态模式
中文: State 通常用 TypedDict 定义,每个节点读取并返回状态更新。LangGraph 自动合并(merge)各节点的返回值。
English: State is typically defined with TypedDict. Each node reads and returns state updates; LangGraph automatically merges node outputs.
1 2 3 4 5 6 7
from typing import Annotated, TypedDict from langgraph.graph.message import add_messages
from langgraph.graph import StateGraph, END from langgraph.prebuilt import ToolNode from langchain_openai import ChatOpenAI from langchain_core.tools import tool
English: LangSmith provides full-chain observability for LangGraph. In the console you can inspect per-node I/O, LLM prompts/responses, tool args/results, and end-to-end latency and token usage.
7. 生产部署清单 | Production Checklist
中文:
检查项
说明
✅ 终止条件
step_count 上限、超时、Token 预算
✅ Checkpoint
生产环境用 PostgreSQL 而非 SQLite
✅ HITL
资金/删除/外发操作必经审批
✅ 工具权限
最小权限原则,避免 Agent 越权
✅ 结构化输出
关键节点强制 JSON Schema
✅ 错误处理
节点内 try/catch + 图级 fallback 边
✅ 可观测性
LangSmith Trace + 告警
✅ 评估集
Golden Dataset 回归测试
English:
Check
Description
✅ Termination
step_count cap, timeout, token budget
✅ Checkpoint
PostgreSQL in production, not SQLite
✅ HITL
Human approval for financial/delete/outbound ops
✅ Tool permissions
Least privilege; prevent escalation
✅ Structured output
Enforce JSON Schema at critical nodes
✅ Error handling
try/catch in nodes + graph-level fallback edges
✅ Observability
LangSmith traces + alerts
✅ Evaluation
Golden Dataset regression tests
8. 常见陷阱 | Common Pitfalls
中文:
状态膨胀 — messages 列表无限增长,需定期摘要压缩
循环死锁 — 忘记设置 step_count 上限,Agent 反复调用同一工具
Checkpoint 膨胀 — 高频写入导致存储暴涨,需设置保留策略
过度设计 — 简单顺序流程不需要 LangGraph,用 LangChain Chain 即可
忽略评估 — 没有 Golden Dataset,Prompt 微调后无法验证回归
English:
State bloat — Unbounded messages; summarize periodically
Loop deadlock — Missing step_count cap; agent retries the same tool forever
Checkpoint bloat — High-frequency writes; set retention policies
Over-engineering — Simple sequential flows don’t need LangGraph
Skipping evaluation — No Golden Dataset means no regression verification
English: LangGraph’s core value is turning agent workflows from opaque loops into auditable, recoverable, interruptible state machines. The learning curve is steep, but for production reliability it remains the most mature open-source choice. Recommended path:
Validate business value with LangChain create_agent
Migrate to LangGraph when you need state, loops, or approvals
Large Language Models are evolving from conversational Q&A to autonomous action through the Agent paradigm: instead of merely generating text, the model operates in a loop of perceive → reason → invoke tools → observe results → reason again until the task is complete. LangChain is one of the most representative open-source frameworks, with an ecosystem spanning LangGraph, LangSmith, and more.
Unlike simple prompt chains, an Agent features an Agentic Loop and environmental feedback, enabling dynamic strategy adjustment in uncertain environments.
English:ReAct (Reasoning + Acting) is the classic single-agent architecture: the model alternates between “Thought” and “Action,” continuing to reason based on tool-returned Observations.
中文: LangGraph 将工作流建模为 有向图:节点是处理步骤,边定义流转逻辑,共享 State 贯穿全流程。支持 Checkpointing、Human-in-the-Loop、循环与分支。
English: LangGraph models workflows as a Directed Graph: nodes are processing steps, edges define transitions, and a shared State flows through the pipeline. It supports checkpointing, human-in-the-loop, loops, and branches.
English: LangGraph is the low-level runtime from the LangChain team and the de facto standard for production agents in 2026. It offers deterministic execution, LangSmith tracing, sub-graph nesting, and model-agnostic design.
English: Companion observability and evaluation platform: logs LLM calls, tool execution, latency, and tokens; supports regression testing and production monitoring.
English: From Microsoft Research; multi-agent conversation at its core. Suited for research-style open tasks, but high token overhead; strict termination caps required.
PydanticAI
中文: 强调类型安全与 Python 原生体验,适合高并发 API 与强合规场景。常与 LangGraph 组合使用。
English: Emphasizes type safety and native Python DX; suited for high-throughput APIs and compliance scenarios. Often combined with LangGraph.
LlamaIndex
中文: 专注数据连接与 RAG,擅长知识库问答、文档分析 Agent。
English: Focused on data connectivity and RAG; strong at knowledge-base Q&A and document analysis agents.
English: LLM Agents are systems engineering combining model capability + tool ecosystem + orchestration runtime + observability. The pragmatic path: start with Chain/ReAct to validate value, upgrade to LangGraph as needed, and build evaluation and observability from day one.
The agent can execute shell, read/write files, access network, send messages. Messengers can prompt-inject and social-engineer. Response: control who can talk, where the agent acts, assume the model is manipulable, limit blast radius.
3.2 Trust Boundary Matrix
gateway.auth authenticates API callers; sessionKey routes sessions (not auth); exec approvals are operator guardrails, not multi-tenant isolation.
3.3 DM Access Model
dmPolicy: pairing (default), allowlist, open (high risk), disabled. Group mention gates prevent accidental triggers.
3.4 Context Visibility
Separates trigger authorization from supplemental context injection (all, allowlist, allowlist_quote).
3.5 Tool Blast Radius
Hardened baseline denies automation/runtime/fs tools, enables workspace-only filesystem, denies exec by default. Deny gateway, cron, sessions_spawn for untrusted surfaces.
3.6 Exec Approvals
security + ask + host configuration. Default full/off is intentional for trusted personal assistants.
Docker: cap-drop ALL, no-new-privileges, pids-limit, size-limited tmpfs. Configurable CPU/memory/disk.
4.6 Terminal Backend Security
local/ssh: approval checks on. docker/singularity/modal/daytona: container is boundary, checks skipped.
4.7 Credential Filtering
Strip sensitive env vars by default. Skill-declared passthrough only when skill is loaded. MCP gets filtered env + explicit config only. Error message redaction.
hermes doctor flags known compromised package versions.
五、安全模型对比矩阵 | Security Model Comparison Matrix
中文
安全能力
OpenClaw
Hermes
身份优先
✅ dmPolicy + allowlist
✅ 多层 allowlist + pairing
命令审批
Exec approvals (allowlist + ask)
Pattern matching + Tirith + smart mode
不可覆盖黑名单
无明确硬线层
✅ UNRECOVERABLE_BLOCKLIST
容器沙箱
Docker sandbox(可选)
6 后端,容器即边界
文件安全
@openclaw/fs-safe 根边界
工作目录 allowlist + 上下文扫描
SSRF 防护
浏览器 SSRF 策略可配
内置多类地址阻断
Prompt 注入防护
contextVisibility 过滤
上下文文件 + 记忆写入扫描
MCP 凭证隔离
配置级 env
严格白名单 + 脱敏
安全审计 CLI
openclaw security audit
hermes doctor
供应链锁定
npm shrinkwrap
tirith + lazy_deps + Skills Guard
默认安全姿态
信任操作者(full exec)
人工审批(manual)
硬化基线
audit –fix 一键加固
生产清单(Docker + allowlist)
English
Security capability
OpenClaw
Hermes
Identity-first
✅ dmPolicy + allowlist
✅ layered allowlist + pairing
Command approval
Exec approvals
Pattern + Tirith + smart mode
Non-overridable blocklist
No explicit hardline layer
✅ UNRECOVERABLE_BLOCKLIST
Container sandbox
Optional Docker
6 backends; container as boundary
File safety
@openclaw/fs-safe root bounds
cwd allowlist + context scanning
SSRF protection
Configurable browser SSRF policy
Built-in multi-class address blocking
Prompt injection
contextVisibility filtering
Context file + memory write scanning
MCP credential isolation
Config-level env
Strict whitelist + redaction
Security audit CLI
openclaw security audit
hermes doctor
Supply chain
npm shrinkwrap
tirith + lazy_deps + Skills Guard
Default posture
Trust operator (full exec)
Manual approval
Hardening baseline
audit –fix
Production checklist (Docker + allowlist)
六、共享收件箱场景 | Shared Inbox Scenarios
中文
若多人可 DM 你的 Bot,核心风险是 委派工具权限:
任一允许发送者可诱导 exec、浏览器、网络/文件工具
一个发送者的 Prompt 注入可影响共享状态/设备/输出
若 Agent 持有敏感凭证,任何允许发送者都可能驱动外泄
OpenClaw 建议:
session.dmScope: "per-channel-peer"
dmPolicy: "pairing" 或严格 allowlist
不要对共享 DM 开放广泛工具访问
团队工作流用独立 Agent/Gateway,最小工具集
Hermes 建议:
配置平台 allowlist,禁用 GATEWAY_ALLOW_ALL_USERS
生产环境 terminal.backend: docker
Cron 任务设 cron_mode: deny(遇危险命令拒绝而非自动批准)
English
If multiple people can DM your bot, the core risk is delegated tool authority. Any allowed sender can induce exec/browser/network tools; prompt injection from one sender affects shared state.
OpenClaw: per-channel-peer DM scope, pairing/allowlist, no broad tools on shared DMs, separate agents for team workflows.
Hermes: platform allowlists, Docker backend in production, cron_mode: deny for headless jobs.
OpenClaw’s security model is identity first, scope second, model last — Gateway auth, DM policies, tool profiles, and exec approvals control blast radius, defaulting to trusted single-operator UX, hardened progressively via security audit. Hermes’s model is seven-layer defense-in-depth with cautious defaults — manual approval, hardline blocklist, Tirith scanning, container isolation, and credential filtering for higher-assurance long-running deployments. Neither is a hostile multi-tenant sandbox; for that, the only reliable approach is splitting trust boundaries, not piling more approval rules onto one Gateway.
Single TypeScript process on port 18789: WebSocket server handles all channels, session routing, tool execution, and memory I/O. No separate orchestration microservice — the Gateway is the product.
GatewayRunner is the messaging frontend for the shared AIAgent engine. 20+ platform adapters normalize inbound events; the runner handles auth, slash commands, agent creation, and delivery.
OpenClaw Gateway is a multi-channel operating system — one WebSocket control plane for 50+ channels, ideal for maximum connectivity and a browser dashboard. Hermes Gateway is the messaging frontend for the agent engine — 20 adapters into one AIAgent core, sharing session storage and slash commands with CLI, ACP, and Cron. Both let you “message from Telegram while the agent works in the cloud”; choose based on whether you need connectivity breadth or engine depth.
OpenClaw externalizes agent identity and knowledge as Markdown in the workspace. Default path: ~/.openclaw/workspace/.
2.2 System Prompt Injection Order
At session start, files assemble in fixed order: SOUL.md → IDENTITY.md → USER.md → AGENTS.md → MEMORY.md. Persona instructions precede volatile memory content.
2.3 Memory Type Taxonomy
Type
File
Maintainer
Typical content
Persona memory
SOUL.md
User (Git-managed)
Tone, boundaries, values
User profile
USER.md
User + agent
Name, timezone, preferences
Long-term facts
MEMORY.md
Agent
Project paths, conventions, decisions
Diary memory
memory/*.md
Agent
Daily conversation summaries
Procedural knowledge
skills/*/SKILL.md
User/community
Workflows, procedures
2.4 Update Mechanism
Updates follow rules in AGENTS.md. No built-in auto-skill generation, FTS5 history search, or capacity management. Transparent and auditable, but prompt token cost grows linearly with memory size.
2.5 Session Persistence
Transcripts live in ~/.openclaw/agents/<agent>/sessions/*.jsonl for continuity and optional indexing — not a structured memory retrieval layer. Filesystem access is the trust boundary.
sequenceDiagram
participant U as 用户
participant A as AIAgent
participant M as Memory Manager
participant S as Skill Store
participant DB as SQLite FTS5
U->>A: 复杂任务请求
A->>A: 执行工具(5+ 次)
A->>M: 策划记忆(add/replace)
A->>S: skill_manage(create)
A->>DB: 持久化会话消息
Note over A,S: 下次同类任务
A->>S: skill_view(按需加载)
A->>DB: session_search(历史召回)
A->>U: 更高效响应
English
3.1 Four-Layer Memory Model
Working memory (current context) → Session memory (SQLite + FTS5) → Persistent memory (MEMORY.md / USER.md with char limits) → Skill memory (progressive disclosure in ~/.hermes/skills/). Optional external providers (Honcho, Mem0, etc.) run in parallel.
3.2 Layer 1: Working Memory
Current session messages and tool results. ContextCompressor summarizes when context exceeds thresholds, triggering session splits via parent_session_id chains.
3.3 Layer 2: Session Memory
Stored in ~/.hermes/state.db (WAL mode). FTS5 + trigram indexes for full-text and CJK substring search. session_search tool for on-demand recall. Write contention handled with short timeouts and jittered retries.
3.4 Layer 3: Persistent Memory
MEMORY.md (2200 chars) and USER.md (1375 chars) in ~/.hermes/memories/. Frozen snapshot at session start for prefix-cache stability. memory tool for add/replace/remove with security scanning.
3.5 Layer 4: Skill Memory
Procedural memory in ~/.hermes/skills/ with progressive disclosure (index → full content on demand). Auto-created after complex tasks; self-improved via skill_manage patch.
OpenClaw’s memory is a transparent filing cabinet — simple, readable, fully human-controlled, ideal for users who value auditability and manual curation. Hermes’s memory is a layered library with a self-learning archivist — SQLite search, char budgets, progressive skill disclosure, and closed-loop learning deliver “knows you better over time” with controlled token cost. The choice is fundamentally a tradeoff between transparent control and automatic evolutionary efficiency.
Agent Hermes & OpenClaw (Lobster): Architecture, Applications, and Comparison
最后更新 | Last updated: 2026-06-05
一、背景与定位 | Background & Positioning
中文
2026 年初,个人 AI Agent 领域出现两个现象级开源项目:
OpenClaw(龙虾):吉祥物是太空龙虾 Molty 🦞,GitHub Star 超 35 万,社区称其为「养虾」——在本地或服务器上长期运行一只个人助理。
Hermes Agent(爱马仕):由 Nous Research 发布,定位是 “The agent that grows with you”(与你共同成长的智能体),核心差异是内置 闭环学习系统(Learning Loop)。
二者均为 MIT 协议、可自托管、支持多渠道消息,但设计哲学不同:
维度
OpenClaw(龙虾)
Hermes Agent
核心问题
连接 — 让 AI 接入工具与聊天渠道
进化 — 让 AI 从经验中学习并自我改进
架构重心
Gateway 控制平面
Agent Engine + 学习闭环
技能来源
用户/社区手动编写 SKILL.md
任务完成后 自动生成 + 自我迭代
记忆模式
Markdown 工作区文件
分层记忆 + SQLite FTS5 检索
English
In early 2026, two standout open-source personal AI agent projects emerged:
OpenClaw (Lobster): Mascot Molty the space lobster 🦞, 350K+ GitHub stars; Chinese communities call it “raising a lobster” — running a persistent personal assistant locally or on a server.
Hermes Agent: Built by Nous Research, positioned as “The agent that grows with you”, with a built-in closed learning loop.
Both are MIT-licensed, self-hostable, and multi-channel — but their design philosophies diverge:
OpenClaw’s core idea: the Gateway is the product. A single TypeScript process on port 18789 manages all channel connections, session routing, and tool execution.
2.1 Workspace Bootstrap Files
Agent identity and knowledge live as plain Markdown under ~/.openclaw/workspace/:
Hermes centers on the AIAgent synchronous orchestration engine (run_agent.py) — not microservices. CLI, Gateway, ACP (IDE), Cron, and Batch Runner share one agent core.
3.1 Agent Loop
1 2 3 4
User message → Build prompt (system + memory + context) → Call LLM (3 API modes: chat_completions / codex_responses / anthropic_messages) → Parse tool calls → Execute → Inject results → Loop until no more tool calls → Final response → Persist session
OpenClaw asks: “How do I connect AI to my world?” — Gateway + files-as-config.
Hermes asks: “How does AI get smarter through use?” — Engine + learning loop.
They’re not simple replacements. Hermes ships hermes claw migrate to onboard OpenClaw users with self-evolution on top. Community projects like HermesClaw run both on the same WeChat account.
Selection guide:
Your need
Recommendation
Multi-channel chat, fast start, large community
OpenClaw
Long-running use, auto skill accumulation, research trajectories
Hermes Agent
Existing OpenClaw setup, want self-learning
hermes claw migrate
Coding-first, deep IDE integration
Both work; OpenClaw + Cursor/Claude Code ecosystem is more mature
Agent Hermes 与 OpenClaw(龙虾)代表了个人 AI Agent 的两条演进路线:连接广度 与 学习深度。龙虾用 Gateway 把 AI 带进你的消息应用和工作区;Hermes 用闭环学习让 AI 从每一次任务中沉淀经验。在实际部署中,二者可以共存、迁移、甚至互补——关键取决于你是更需要「随时随地的多渠道助理」,还是「越用越懂你的自进化伙伴」。
English
Agent Hermes and OpenClaw (Lobster) represent two evolution paths for personal AI agents: connectivity breadth vs. learning depth. OpenClaw’s Gateway brings AI into your messaging apps and workspace; Hermes’s closed loop turns every task into lasting experience. In practice they can coexist, migrate, and complement each other — the choice depends on whether you need an always-available multi-channel assistant or a self-evolving partner that knows you better over time.
English: As microservices, cloud-native architectures, and continuous delivery become mainstream, environment management has become a major bottleneck in engineering productivity. A single business call chain may involve dozens or even hundreds of services. If every development, integration, and testing session requires spinning up a full copy of the entire chain, costs soar, maintenance becomes painful, and resource utilization stays very low. The industry has responded with environment layering: a stable, complete Baseline Environment hosts all services, while lightweight, isolated Minimum-Set Environments deploy only changed or essential services. Together, they balance stability, isolation, and cost.
二、核心概念 | Core Concepts
2.1 基准环境(Baseline Environment)
定义:包含全量稳定服务的参考环境,通常部署生产主干或最新稳定版本代码
定位:环境中的”锚点”与”公共底座”,为其他环境提供未变更服务的依赖支撑
特征:服务齐全、版本统一、变更受控、长期可用
类比:像城市的主干道与公共设施——稳定运行,供各分支道路接入
Definition: A reference environment containing the full set of stable services, typically running production mainline or the latest stable release. Role: The “anchor” and “shared foundation” that supplies unchanged services to other environments. Characteristics: Complete service coverage, unified versions, controlled changes, long-term availability. Analogy: Like a city’s main roads and public infrastructure—stable and shared by all branches.
2.2 最小集环境(Minimum-Set Environment)
定义:仅包含当前任务所需的最少服务子集的隔离环境
定位:面向特定需求、特性分支或缺陷修复的轻量工作空间
特征:服务精简、按需创建、可快速销毁、与基准环境协同
别名:子环境、特性环境、项目环境、沙箱环境
类比:像施工围挡内的局部改造区——只动必要部分,其余沿用主干设施
Definition: An isolated environment containing only the minimum subset of services required for the current task. Role: A lightweight workspace for a specific feature, branch, or bug fix. Also known as: Sub-environment, feature environment, project environment, or sandbox.
2.3 二者关系 | Relationship
中文要点:
最小集环境中的服务,是基准环境服务集合的子集
请求通过流量路由(如灰度标、Header、Service Mesh)在两类环境间调度
变更服务在最小集环境中验证,未变更服务由基准环境承接
English: Services in the minimum-set environment are a subset of those in the baseline. Requests are routed between the two via traffic routing (e.g., canary tags, headers, Service Mesh). Changed services are validated in the minimum-set environment; unchanged services are served by the baseline.
English: The core idea is “share what is stable, isolate what changes.” The baseline carries the shared stable portion; the minimum-set environment holds only the delta.
3.2 以稳定为基线,以最小为切口 | Stability as Baseline, Minimality as Entry Point
English: The baseline emphasizes predictability and consistency. The minimum-set environment emphasizes agility and focus—reducing cognitive load and blast radius.
English: The baseline prohibits direct deployment of in-development features. The minimum-set environment permits free experimentation and destructive testing without impacting others.
English: The Baseline Environment is a stable, complete, reusable foundation—it answers where full stable services come from and how they stay reliable. The Minimum-Set Environment is a lightweight, isolated, on-demand slice—it answers how to validate changes cheaply and quickly. Together they embody “shared stable foundation + isolated change validation” in modern software delivery.
本文系统介绍 LLM Wiki 的核心思想、意义、应用场景与优缺点,采用中英文对照形式,便于理解与分享。
This article introduces the core philosophy, significance, application scenarios, and pros & cons of LLM Wiki in a bilingual Chinese–English format.
一、什么是 LLM Wiki? | What Is LLM Wiki?
中文:LLM Wiki 并非某一款固定产品,而是由 AI 研究者 Andrej Karpathy(OpenAI 联合创始人、前特斯拉 AI 总监)提出的一种个人知识库构建范式。其核心思想是:不要每次提问都让 LLM 重新阅读原始文档,而是让 LLM 一次性将资料「编译」成结构化的 Wiki,并持续维护更新。与传统 RAG 不同,LLM Wiki 把知识当作可累积、可演化的持久化产物。
English: LLM Wiki is not a single fixed product. It is a personal knowledge-base pattern proposed by Andrej Karpathy. Instead of having the LLM re-read raw documents every time you ask a question, compile them once into a structured wiki and keep it updated forever. Knowledge is a persistent, compounding artifact—not a one-off answer assembled at query time.
二、核心思想 | Core Philosophy
2.1 从「检索」到「编译」 | From Retrieval to Compilation
中文:传统 RAG 每次问答都从零开始检索片段;LLM Wiki 采用 Compile(编译) 思路——将原始资料放入 raw/,由 LLM 生成摘要页、概念页、实体页与交叉链接;新资料加入时增量更新;查询时在 Wiki 中检索、交叉验证、综合作答。
English: Traditional RAG retrieves chunks on every query. LLM Wiki follows a Compile approach: drop materials into raw/, let the LLM generate summaries, concept pages, entity pages, and cross-links; update incrementally when new material arrives; at query time, search, cross-validate, and synthesize across the wiki.
2.2 三层架构 | Three-Layer Architecture
层级 / Layer
名称 / Name
职责 / Responsibility
Layer 1
Raw(原料库)
不可篡改的原始文档,地面真相 / Immutable originals—the ground truth
中文:LLM Wiki 是一种用 LLM 编译并维护个人知识库的方法论。它适合深度研究、长期学习、项目沉淀等需要综合与连接的场景,是对传统 RAG「每次从头检索」思路的有益补充。成功关键在于清晰的 Schema、高质量的原料,以及定期的 Lint 与人工校验。
English: LLM Wiki is a methodology for compiling and maintaining a personal knowledge base with LLMs. It fits deep research, long-term learning, and project documentation where synthesis and connection matter. Success depends on a clear schema, high-quality sources, and regular Lint plus human verification.