AI 技术编年史 2025：多智能体协同 MAM — Multi-Agent 从 Demo 到生产

Published 2025-03-20 · Category: machine

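By March 2025, multi-agent collaboration (MAM) had become a standard part of enterprise AI deployments: role specialization, a message bus, and conflict resolution. Bilingual (Chinese and English).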

多智能体协同 MAM：从 Demo 到生产 | Multi-Agent Collaboration (MAM)

English Title: AI Technology Timeline 2025 — Multi-Agent Machine (MAM) Collaboration

一、背景 | Background

English

By March 2025, Multi-Agent Collaboration—often abbreviated MAM in enterprise architecture docs—had evolved from 2023–2024 AutoGen/CrewAI demos into production orchestration patterns. A single monolithic agent with dozens of tools proved brittle: context overflow, tool-selection errors, and unbounded loops. MAM decomposes work into specialized agents (researcher, coder, reviewer, domain expert) coordinated by a supervisor, blackboard, or event bus.

MAM here means Multi-Agent Machine coordination layers, not to be confused with other uses of the acronym such as Mobile Application Management. Core idea: divide labor, share state, enforce policies.

Keywords:

Term	Meaning
Agent	LLM + system prompt + tools + memory scoped to a role
Orchestrator	Routes tasks, merges outputs, enforces termination
Shared memory / blackboard	Structured store for facts, plans, and artifacts
Handoff	Transfer of control with summarized context
Human-in-the-loop (HITL)	Approval gates for high-risk actions
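The HITL row above can be illustrated with a small approval gate. This is a minimal sketch, not a reference implementation: the action names in `HIGH_RISK_ACTIONS`, the `ProposedAction` type, and the `approve` callback are all hypothetical stand-ins for whatever policy store and review UI a real deployment uses.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical risk list; a real system would load this from policy config.
HIGH_RISK_ACTIONS = {"delete_records", "send_payment"}


@dataclass
class ProposedAction:
    agent: str   # role that proposed the action
    name: str    # tool or action name
    args: dict   # arguments the agent wants to pass


def execute_with_gate(
    action: ProposedAction,
    run: Callable[[ProposedAction], str],
    approve: Callable[[ProposedAction], bool],
) -> str:
    """Run low-risk actions directly; ask a human before high-risk ones."""
    if action.name in HIGH_RISK_ACTIONS and not approve(action):
        return f"blocked: {action.name} rejected by reviewer"
    return run(action)
```

The agent only proposes; the gate decides whether the action runs, so the approval rule lives in code rather than in a prompt.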

中文

至 2025 年 3 月，多智能体协同（企业架构文档中常缩写为 MAM）已从 2023–2024 年的 AutoGen/CrewAI 演示演进为生产编排模式。单一 Agent 挂载数十个工具被证明很脆弱：上下文溢出、选错工具、无限循环。MAM 将工作分解给专职 Agent（研究员、编码员、审查员、领域专家），由主管、黑板或事件总线协调。

MAM 此处指 Multi-Agent Machine 协同层。核心思想：分工、共享状态、策略约束。

关键词：

术语	含义
Agent	LLM + 系统提示 + 工具 + 角色范围记忆
编排器	路由任务、合并输出、强制终止
共享记忆 / 黑板	事实、计划、产物的结构化存储
Handoff	带摘要上下文的控制权转移
人在回路 HITL	高风险动作审批门

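From chat to workflow: Most 2024 demos were multi-agent "group chats". Production systems in 2025 are bound to ticketing systems, Git PRs, and SQL transactions, so agent output must be schema-conforming JSON rather than markdown prose. Durable-execution engines such as LangGraph and Temporal became the foundation for MAM.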
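As an illustration of schema-bound agent output, the sketch below validates a reply with the standard library only. The `TICKET_UPDATE_SCHEMA` fields and the allowed status values are assumptions made up for this example; a production system would more likely use a schema library and its own message definitions.

```python
import json

# Hypothetical schema for a ticket-update message: field name -> required type.
TICKET_UPDATE_SCHEMA = {"ticket_id": str, "status": str, "confidence": float}
ALLOWED_STATUS = {"open", "resolved", "escalated"}


def parse_agent_output(raw: str) -> dict:
    """Parse and validate an agent's reply; raise ValueError on any deviation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    if set(data) != set(TICKET_UPDATE_SCHEMA):
        raise ValueError(f"unexpected or missing keys: {sorted(data)}")
    for key, expected in TICKET_UPDATE_SCHEMA.items():
        # Strict type check: an integer confidence such as 1 is rejected.
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {data['status']}")
    return data
```

Rejecting prose replies at this boundary means the downstream ticket or SQL write never sees free-form text.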

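Token economics: In 2025, enterprise CFOs began to evaluate agent projects by $/resolved ticket. Without cached shared context and small models for sub-agents, a MAM deployment often costs more than human labor. Best practice: run the supervisor on a frontier model and the executors on a local 7B or distilled model.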
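The $/resolved ticket arithmetic can be sketched as follows. Every number used with this function (token counts, prices per million tokens, resolve rate) is a hypothetical input chosen for illustration, and treating cached tokens as free is a simplifying assumption.

```python
def cost_per_resolved_ticket(
    tickets: int,
    resolve_rate: float,
    sup_tokens: int,
    sup_price_per_mtok: float,
    exec_tokens: int,
    exec_price_per_mtok: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Return dollars per resolved ticket.

    Token counts are per ticket; prices are dollars per million tokens.
    cache_hit_rate is the fraction of supervisor tokens served from a
    shared-context cache, treated here as free for simplicity.
    """
    sup_cost = sup_tokens * (1 - cache_hit_rate) * sup_price_per_mtok / 1e6
    exec_cost = exec_tokens * exec_price_per_mtok / 1e6
    total = tickets * (sup_cost + exec_cost)
    return total / (tickets * resolve_rate)
```

With illustrative inputs of 20,000 supervisor tokens and 80,000 executor tokens per ticket, an 80% resolve rate, and $10 per million tokens for both roles, the result is $1.25 per resolved ticket. Dropping the executor price to $0.50 per million tokens brings it to $0.30.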

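Failure modes from 2025 case libraries: (1) two agents "confirm" each other in a loop and never terminate; (2) conflicting blackboard writes overwrite each other; (3) tool API rate limits cause cascading failures. The orchestration layer needs circuit breakers and a hard max-turns limit.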
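A minimal sketch of the two safeguards named above, a hard turn cap and a circuit breaker, might look like this. The class and function names are hypothetical, agents are plain callables, and the breaker omits the timed half-open recovery that a production breaker would normally have.

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; reject calls while open."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn, *args):
        if self.open:
            raise RuntimeError("circuit open: tool temporarily disabled")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1   # count the failure, then let it propagate
            raise
        self.failures = 0        # any success resets the counter
        return result


def run_dialogue(agent_a, agent_b, message: str, max_turns: int = 6):
    """Alternate two agents until one returns 'DONE' or the turn cap is hit."""
    agents = [agent_a, agent_b]
    for turn in range(max_turns):
        message = agents[turn % 2](message)
        if message == "DONE":
            return "finished", turn + 1
    return "aborted: max_turns reached", max_turns
```

The turn cap addresses failure mode (1); wrapping each tool call in `CircuitBreaker.call` keeps a rate-limited API from being hammered by every agent in turn, which is the cascade in (3).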

二、架构 | Architecture

English

          User / API
              ↓
    Supervisor Agent (policy, routing)
         /    |    \
Research   Code   Review
Agent      Agent  Agent
    \       |      /
     Shared State Store
    (Redis / Postgres / vector)
              ↓
    Tool Gateway (MCP, REST, SQL)
              ↓
    Observability (traces, cost, audit)
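The top half of the diagram (supervisor, three role agents, shared state) can be sketched in a few lines. This is an assumption-laden toy: the agents are plain functions instead of LLM calls, `plan` uses keyword rules as a stand-in for the supervisor's routing decision, and the shared state is an in-memory dict rather than Redis or Postgres.

```python
def research_agent(task: str, state: dict) -> str:
    state["notes"] = f"findings for: {task}"
    return "research"


def code_agent(task: str, state: dict) -> str:
    # Reads what the research agent left in shared state.
    state["patch"] = f"patch using {state.get('notes', 'no notes')}"
    return "code"


def review_agent(task: str, state: dict) -> str:
    state["approved"] = "patch" in state
    return "review"


AGENTS = {"research": research_agent, "code": code_agent, "review": review_agent}


def plan(task: str) -> list:
    """Stand-in for the supervisor's LLM routing decision (keyword rules)."""
    steps = ["research"]
    if "fix" in task or "implement" in task:
        steps += ["code", "review"]
    return steps


def supervise(task: str) -> dict:
    state = {"task": task, "trace": []}   # shared state store (in-memory here)
    for role in plan(task):
        state["trace"].append(AGENTS[role](task, state))
    return state
```

Agents never call each other directly; they communicate only through `state`, which is what makes the trace auditable per role.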

Pattern catalog (2025):

Hierarchical: Supervisor delegates subtasks; sub-agents report upward. LangGraph Send API popularized this.
Peer debate: Two agents argue; third synthesizes—used in legal and financial analysis with guardrails.
Pipeline: Fixed DAG (extract → transform → validate); agents are stages, not free-form chat.
Market-based: Agents bid on subtasks via scoring function—experimental in cloud cost optimization.
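Of the four patterns, the pipeline is the easiest to show in code. The sketch below assumes each stage is a deterministic function (in practice a stage might wrap an LLM call) and uses a made-up "key: value" document format with an `amount` field.

```python
def extract(doc: str) -> dict:
    # Stage 1: pull "key: value" lines out of raw text.
    pairs = (line.split(":", 1) for line in doc.splitlines() if ":" in line)
    return {k.strip(): v.strip() for k, v in pairs}


def transform(fields: dict) -> dict:
    # Stage 2: normalize keys and convert the amount to a number.
    out = {k.lower(): v for k, v in fields.items()}
    out["amount"] = float(out["amount"])
    return out


def validate(record: dict) -> dict:
    # Stage 3: reject records that violate simple invariants.
    if record["amount"] < 0:
        raise ValueError("amount must be non-negative")
    return record


PIPELINE = [extract, transform, validate]   # fixed order, no free-form routing


def run_pipeline(doc: str) -> dict:
    value = doc
    for stage in PIPELINE:
        value = stage(value)
    return value
```

Because the stage order is fixed in `PIPELINE`, there is no routing decision for a model to get wrong, which is the main appeal of this pattern over free-form chat.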

Cross-cutting concerns: authentication per tool, PII redaction between agents, token budgets per role, checkpoint/resume for long jobs.
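Per-role token budgets, one of the concerns listed above, can be enforced with a small counter. The role names and limits are hypothetical, and token counts are assumed to come from the model provider's usage metadata.

```python
class TokenBudget:
    """Track token usage per role and refuse calls that would exceed the cap."""

    def __init__(self, limits: dict):
        self.limits = dict(limits)
        self.used = {role: 0 for role in limits}

    def charge(self, role: str, tokens: int) -> None:
        """Record usage; raise before the role exceeds its cap."""
        if self.used[role] + tokens > self.limits[role]:
            raise RuntimeError(f"{role} budget exceeded")
        self.used[role] += tokens

    def remaining(self, role: str) -> int:
        return self.limits[role] - self.used[role]
```

An orchestrator would call `charge` before dispatching each request, so an over-budget role fails fast instead of silently multiplying cost.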

中文

          用户 / API
              ↓
    主管 Agent（策略、路由）
         /    |    \
研究     编码    审查
Agent    Agent   Agent
    \       |      /
     共享状态存储
    （Redis / Postgres / 向量库）
              ↓
    工具网关（MCP、REST、SQL）
              ↓
    可观测性（追踪、成本、审计）

模式目录（2025）：

层次式： 主管委派子任务；子 Agent 向上汇报。LangGraph Send API 普及。
同级辩论： 两 Agent 辩论，第三个 Agent 合成，用于法务、金融分析（带护栏）。
流水线： 固定 DAG（抽取 → 转换 → 校验）；Agent 是阶段而非自由聊天。
市场式： Agent 通过评分函数竞标子任务——云成本优化中实验性使用。

横切关注点： 工具级认证、Agent 间 PII 脱敏、角色 Token 预算、长任务 检查点/恢复。

三、趋势 | Trends

English

Trend	Description
MCP as universal tool port	Model Context Protocol standardizes how agents attach to Slack, GitHub, DB
Agent SLOs	Latency p95, success rate, $/task—MAM ops teams mirror microservice SRE
Deterministic shells	LLM decides; code executes; reduces nondeterminism in financial workflows
Vertical MAM packs	Pre-built agent teams for HR, SOC, supply chain—not generic chat
Conflict resolution policies	When agents disagree, escalate to human or tie-breaker model
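The conflict-resolution row can be made concrete with a small policy function. This is one possible policy under stated assumptions: agent verdicts are comparable strings, a simple vote margin decides, and `tie_breaker` is a hypothetical callable standing in for a tie-breaker model.

```python
from collections import Counter


def resolve(verdicts: dict, tie_breaker=None, min_margin: int = 1):
    """verdicts maps agent name -> answer. Returns (answer, how_decided)."""
    counts = Counter(verdicts.values()).most_common()
    if len(counts) == 1:
        return counts[0][0], "unanimous"
    (top, n1), (_, n2) = counts[0], counts[1]
    if n1 - n2 >= min_margin:
        return top, "majority"
    if tie_breaker is not None:
        return tie_breaker(verdicts), "tie_breaker"
    # No clear winner and no tie-breaker configured: hand off to a person.
    return None, "escalate_to_human"
```

Returning the decision path alongside the answer lets the audit log record whether a result was unanimous, a majority call, or escalated.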

中文

趋势	说明
MCP 作通用工具口	Model Context Protocol 标准化连接 Slack、GitHub、数据库
Agent SLO	p95 延迟、成功率、$/任务——MAM 运维对标微服务 SRE
确定性外壳	LLM 决策、代码执行——降低金融工作流随机性
垂直 MAM 包	HR、SOC、供应链预置 Agent 团队
冲突消解策略	Agent 分歧时升级人工或 tie-breaker 模型

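In Q1 2025 enterprise procurement lists, "multi-agent orchestration platform" often appeared alongside RAG and vector databases as the third component of the AI middle platform.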

四、优缺点 | Pros/Cons

English

Pros

Specialization improves tool accuracy and prompt clarity
Parallel sub-agents reduce wall-clock time for research + coding tasks
Fault isolation: one agent failure need not crash entire session
Audit trail per role simplifies compliance

Cons

Coordination overhead: extra LLM calls for routing and summarization
State consistency bugs when blackboard schemas drift
Debugging multi-agent traces is harder than single-chain logs
Cost multiplication without strict budgets
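The state-consistency risk above is commonly reduced with versioned writes. The sketch below is an in-memory stand-in for a real store: `SCHEMA_VERSION` and the compare-and-set `write` signature are assumptions for illustration, not any particular product's API.

```python
class Blackboard:
    """Shared store with optimistic concurrency and a schema-version check."""

    SCHEMA_VERSION = 2  # hypothetical current schema version

    def __init__(self):
        self._data = {}   # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key is unset."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version: int, schema_version: int) -> bool:
        if schema_version != self.SCHEMA_VERSION:
            raise ValueError("writer uses a stale schema version")
        current, _ = self.read(key)
        if current != expected_version:
            return False   # someone else wrote first; caller must re-read
        self._data[key] = (current + 1, value)
        return True
```

A rejected write forces the losing agent to re-read and merge rather than overwrite, and the schema check surfaces drift at write time instead of at read time.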

中文

优点

专精提升工具准确率与提示清晰度
并行子 Agent 缩短研究 + 编码墙钟时间
故障隔离：单 Agent 失败不必拖垮会话
分角色审计轨迹便于合规

缺点

协调开销：路由与摘要额外 LLM 调用
黑板 schema 漂移导致状态不一致
多 Agent 追踪比单链日志难调试
无严格预算则成本倍增

五、应用场景 | Use Cases

English

Industry	MAM setup
Software eng	PM agent → coder → test agent → security reviewer
Customer support	Triage → knowledge agent → escalation human
Due diligence	Document extractor + financial analyst + red-flag checker
DevOps	Incident commander + log agent + runbook executor
Scientific lab	Hypothesis agent + literature agent + experiment planner
Marketing	Brand guard agent + copywriter + localization agent

中文

行业	MAM 配置
软件工程	PM Agent → 编码 → 测试 → 安全审查
客服	分流 → 知识 Agent → 人工升级
尽职调查	文档抽取 + 财务分析 + 红旗检测
DevOps	事件指挥 + 日志 Agent + Runbook 执行
科研实验室	假设 + 文献 + 实验规划 Agent
营销	品牌护栏 + 文案 + 本地化 Agent

六、GitHub 开源生态 | GitHub

English

Repository	Notes
microsoft/autogen, ag2ai/ag2	Multi-agent conversation frameworks (AG2 is a community fork of AutoGen)
langchain-ai/langgraph	Graph-based orchestration with checkpointing
modelcontextprotocol/servers	Standard tool connectors for MAM tool gateway
genesis-embodied-ai/Genesis	Multi-robot sim scenarios for embodied multi-agent RL

中文

仓库	说明
microsoft/autogen、ag2ai/ag2	多 Agent 对话框架（AG2 为 AutoGen 的社区分支）
langchain-ai/langgraph	带检查点的图编排
modelcontextprotocol/servers	MAM 工具网关标准连接器
genesis-embodied-ai/Genesis	具身多 Agent 强化学习仿真

七、参考资料 | References

Wu et al. — AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Anthropic — Building effective agents (2024–2025 guidance)
LangGraph documentation — Supervisor and Send patterns
Gartner — Multi-agent AI platforms market guide (2025)
OpenAI — Agents SDK and handoff primitives

八、MAM 成熟度模型 | Maturity Model

English

Level	Characteristics
L1 Experimental	Single-process multi-agent chat, no audit
L2 Workflow	DAG + shared DB, manual deploy
L3 Production	SLOs, MCP gateway, HITL gates, cost caps
L4 Optimized	Auto-routing by task type, A/B agent policies, federated memory

Most enterprises were in the L2 to L3 transition during the first half of 2025.

中文

级别	特征
L1 实验	单进程多 Agent 聊天，无审计
L2 工作流	DAG + 共享 DB，人工发布
L3 生产	SLO、MCP 网关、HITL、成本上限
L4 优化	按任务类型自动路由、Agent 策略 A/B、联邦记忆

多数企业在 2025 上半年处于 L2→L3 过渡。

九、产业观察与深度解读 | Industry Observations and Deep Dive

English

Supply chain and talent: By the second half of 2025, enterprises stopped treating multi-agent orchestration as a pilot KPI and moved it into annual operating plans. Procurement asked for three-year TCO, not demo accuracy. System integrators packaged reference architectures with SLA-backed support, mirroring how cloud migrations matured a decade earlier.

Interoperability: Open APIs (MCP, ONNX, MLIR dialects where relevant) reduced lock-in, but data gravity still tied customers to platforms with the best vertical corpus or compiler backend. Winners combined open runtimes with proprietary gold datasets or silicon-tuned kernels.

Risk register (2025 common items): (1) Evaluation gap—public benchmarks no longer predict production; (2) Security—prompt injection and tool abuse in agentic stacks; (3) Regulatory—algorithm filing, EU AI Act high-risk categories; (4) Talent—shortage of engineers who understand both ML and domain workflows.

Research frontiers carrying into 2026: Tighter world-model / spatial / sim integration; self-evolving alignment with human audit; cross-chip compilers (see 2026 timeline). Teams that invested in measurement—latency, cost per task, failure replay—outperformed teams chasing parameter counts.
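The measurement point above (latency, cost per task) can be sketched as a small summary over run records. The record fields `latency_s`, `cost_usd`, and `ok` are hypothetical names, and nearest-rank is only one of several percentile definitions in use.

```python
def summarize(runs: list) -> dict:
    """runs: list of dicts with latency_s, cost_usd, ok. Returns SLO metrics."""
    latencies = sorted(r["latency_s"] for r in runs)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or
    # below it. Integer arithmetic avoids float rounding in the ceiling.
    rank = (95 * len(latencies) + 99) // 100
    return {
        "p95_latency_s": latencies[rank - 1],
        "success_rate": sum(r["ok"] for r in runs) / len(runs),
        "cost_per_task_usd": sum(r["cost_usd"] for r in runs) / len(runs),
    }
```

Keeping the raw run records, not just the aggregates, is what makes failure replay possible later.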

中文

供应链与人才： 2025 年下半年，企业不再将多智能体编排仅作试点 KPI，而是写入年度经营计划。采购要求三年 TCO，而非 demo 准确率。系统集成商打包带 SLA 的参考架构，类似十年前的云迁移成熟路径。

互操作： 开放 API（MCP、ONNX、相关 MLIR dialect）降低锁定，但 数据重力 仍把客户绑在拥有最佳垂直语料或编译后端的平台上。胜者 = 开放运行时 + 专有 gold 数据 或 硅片级调优内核。

风险登记（2025 共性）： (1) 评估鸿沟——公开 benchmark 不再预测生产；(2) 安全——Agent 栈提示注入与工具滥用；(3) 监管——算法备案、EU AI Act 高风险类；(4) 人才——既懂 ML 又懂领域 workflow 的工程师短缺。

延续至 2026 的研究前沿： 世界模型 / 空间 / 仿真 更紧耦合；带人工 audit 的 自演化对齐；跨芯片编译器（见 2026 时间线）。投资度量——延迟、单任务成本、失败回放——的团队胜过追逐参数量。

Glossary | 术语表

EN	中文	One-line
Foundation model	基础模型	Large pretrained model that can be adapted (for example, fine-tuned) to many downstream tasks
Finetune	微调	Update weights on domain data
RAG	检索增强生成	Retrieve docs then generate grounded answers
Sim2real	仿真到真实	Transfer policies from simulator to physical world
TCO	总拥有成本	Full cost of ownership over deployment lifetime

总结 | Summary

中文： 2025 年 3 月，MAM 标志着 Agent 从「炫技对话」进入 可运维的多角色系统。成功部署依赖编排图、共享状态 schema、成本与 SLO——而非更多 Agent 数量。

English: In March 2025, MAM marked the shift of agents from showy chat demos to operable multi-role systems. Successful deployment depends on orchestration graphs, shared-state schemas, and cost and SLO discipline, not on adding more agents.