AI 技术编年史 2025:多智能体协同 MAM — Multi-Agent 从 Demo 到生产

多智能体协同 MAM:从 Demo 到生产 | Multi-Agent Collaboration (MAM)

English Title: AI Technology Timeline 2025 — Multi-Agent Machine (MAM) Collaboration


一、背景 | Background

English

By March 2025, Multi-Agent Collaboration—often abbreviated MAM in enterprise architecture docs—had evolved from 2023–2024 AutoGen/CrewAI demos into production orchestration patterns. A single monolithic agent with dozens of tools proved brittle: context overflow, tool-selection errors, and unbounded loops. MAM decomposes work into specialized agents (researcher, coder, reviewer, domain expert) coordinated by a supervisor, blackboard, or event bus.

MAM here means Multi-Agent Machine coordination layers. It is unrelated to the telecom "Mobile Application Part", whose acronym is MAP rather than MAM, although the two are sometimes confused in internal roadmaps. Core idea: divide labor, share state, enforce policies.

Keywords:

| Term | Meaning |
| --- | --- |
| Agent | LLM + system prompt + tools + memory scoped to a role |
| Orchestrator | Routes tasks, merges outputs, enforces termination |
| Shared memory / blackboard | Structured store for facts, plans, and artifacts |
| Handoff | Transfer of control with summarized context |
| Human-in-the-loop (HITL) | Approval gates for high-risk actions |
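
To make the Agent and Handoff terms above concrete, here is a minimal sketch. The names `Agent`, `Handoff`, `summarize`, and `hand_off` are hypothetical and do not come from any framework, and plain truncation stands in for what would normally be an LLM-written summary.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent record: role-scoped prompt, tools, and memory."""
    role: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)
    memory: list[str] = field(default_factory=list)


@dataclass
class Handoff:
    """Transfer of control that carries a summary, not the full transcript."""
    source: str
    target: str
    summary: str


def summarize(messages: list[str], max_chars: int = 200) -> str:
    # Placeholder: a real system would call an LLM; here we only truncate.
    text = " | ".join(messages)
    return text if len(text) <= max_chars else text[: max_chars - 3] + "..."


def hand_off(src: Agent, dst: Agent, max_chars: int = 200) -> Handoff:
    """Pass control from src to dst with a bounded summary of src's memory."""
    handoff = Handoff(src.role, dst.role, summarize(src.memory, max_chars))
    dst.memory.append(f"[handoff from {src.role}] {handoff.summary}")
    return handoff
```

The receiving agent sees only the bounded summary, which is one way to keep context growth in check when control moves between roles.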

中文

至 2025 年 3 月,多智能体协同(企业架构文档中常缩写为 MAM)已从 2023–2024 年的 AutoGen/CrewAI 演示演进为生产编排模式。事实证明,单一 Agent 挂载数十个工具十分脆弱:上下文溢出、选错工具、无限循环。MAM 将工作分解为专职 Agent(研究员、编码员、审查员、领域专家),由主管、黑板或事件总线协调。

MAM 此处指 Multi-Agent Machine 协同层。核心思想:分工、共享状态、策略约束。

关键词:

| 术语 | 含义 |
| --- | --- |
| Agent | LLM + 系统提示 + 工具 + 角色范围记忆 |
| 编排器 | 路由任务、合并输出、强制终止 |
| 共享记忆 / 黑板 | 事实、计划、产物的结构化存储 |
| Handoff | 带摘要上下文的控制权转移 |
| 人在回路 HITL | 高风险动作审批门 |

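From chat to workflow: In 2024, demos were mostly multi-agent "group chats". Production systems in 2025 are bound to ticketing systems, Git PRs, and SQL transactions, so agent output must be schema-conformant JSON rather than markdown prose. Durable execution engines such as LangGraph and Temporal became the foundation of MAM.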
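
A rough sketch of what "schema-conformant JSON, not prose" can mean at the boundary between an agent and a ticketing system. The schema, the field names, and `parse_agent_output` are all assumptions made for illustration; a production system would more likely use a schema library.

```python
import json

# Hypothetical schema for a ticket-update action: field name -> required type.
TICKET_UPDATE_SCHEMA = {"ticket_id": str, "status": str, "comment": str}
ALLOWED_STATUS = {"open", "resolved", "escalated"}


def parse_agent_output(raw: str) -> dict:
    """Parse an agent reply and reject anything that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    missing = TICKET_UPDATE_SCHEMA.keys() - data.keys()
    extra = data.keys() - TICKET_UPDATE_SCHEMA.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected in TICKET_UPDATE_SCHEMA.items():
        if not isinstance(data[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status {data['status']!r}")
    return data
```

Rejecting a reply outright, instead of trying to repair it, keeps the downstream transaction logic simple; the orchestrator can then retry the agent or escalate.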

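Token economics: In 2025, enterprise CFOs began accounting for agent projects in $/resolved ticket. Unless shared context is cached and sub-agents run on small models, MAM often costs more than human labor. Best practice: use a frontier model for the supervisor and a local 7B or distilled model for execution.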
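
The $/resolved ticket arithmetic can be sketched as follows. Every number here (prices per million tokens, token counts, cache hit ratio) is invented for illustration, and cached tokens are treated as free, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCall:
    """One model invocation per ticket; all numbers are illustrative."""
    usd_per_mtok: float      # assumed price per million tokens
    tokens: int              # tokens consumed per ticket
    cache_hit: float = 0.0   # fraction of tokens served from a shared cache


def cost_per_resolved_ticket(calls, resolve_rate):
    """Spend per ticket divided by the fraction of tickets actually resolved."""
    if not 0 < resolve_rate <= 1:
        raise ValueError("resolve_rate must be in (0, 1]")
    per_ticket = sum(
        c.usd_per_mtok * c.tokens * (1.0 - c.cache_hit) / 1_000_000 for c in calls
    )
    return per_ticket / resolve_rate


FRONTIER, SMALL = 10.0, 0.5  # hypothetical USD per million tokens

# Supervisor plus three workers, everything on the frontier model, no caching.
all_frontier = [ModelCall(FRONTIER, 20_000)] + [ModelCall(FRONTIER, 50_000)] * 3
# Frontier supervisor with half its context cached; workers on a small model.
mixed = [ModelCall(FRONTIER, 20_000, cache_hit=0.5)] + [ModelCall(SMALL, 50_000)] * 3
```

With these made-up inputs and an 80% resolve rate, the all-frontier team comes out roughly ten times more expensive per resolved ticket than the mixed team. The ratio, not the absolute figures, is the point of the sketch.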

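Failure modes (2025 case library): (1) two agents keep "confirming" each other in a loop and never terminate; (2) conflicting blackboard writes overwrite each other; (3) tool API rate limits trigger cascading failures. The orchestration layer needs a circuit breaker and a hard max-turns limit.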
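
A minimal sketch of the two safeguards named above, assuming agents are plain callables and that `"DONE"` is the termination signal (both are assumptions for the example). The breaker is deliberately simple: it counts consecutive failures and has no timed half-open state.

```python
class CircuitOpen(Exception):
    """Raised when a tool is skipped because its breaker is open."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; stays open until reset()."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn, *args):
        if self.failures >= self.threshold:
            raise CircuitOpen("tool disabled after repeated failures")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success closes the breaker again
        return result

    def reset(self):
        self.failures = 0


def run_dialogue(agent_a, agent_b, message, max_turns=6):
    """Alternate two agents until one returns "DONE" or the turn cap is hit."""
    speakers = [agent_a, agent_b]
    for turn in range(max_turns):
        message = speakers[turn % 2](message)
        if message == "DONE":
            return "finished", turn + 1
    return "max_turns_exceeded", max_turns
```

Two agents that only ever reply "confirm" would run forever without the cap; with it, the orchestrator gets a distinct status it can route to a human or a fallback.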


二、架构 | Architecture

English

              User / API
                  |
   Supervisor Agent (policy, routing)
        /         |         \
   Research      Code      Review
    Agent        Agent      Agent
        \         |         /
          Shared State Store
     (Redis / Postgres / vector)
                  |
    Tool Gateway (MCP, REST, SQL)
                  |
 Observability (traces, cost, audit)

Pattern catalog (2025):

  1. Hierarchical: Supervisor delegates subtasks; sub-agents report upward. LangGraph Send API popularized this.
  2. Peer debate: Two agents argue; third synthesizes—used in legal and financial analysis with guardrails.
  3. Pipeline: Fixed DAG (extract → transform → validate); agents are stages, not free-form chat.
  4. Market-based: Agents bid on subtasks via scoring function—experimental in cloud cost optimization.
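
Two of the patterns above are easy to sketch without any framework: the fixed pipeline and market-based routing. The stage functions below are stand-ins for agents, and `run_pipeline` and `award` are hypothetical helper names, not library APIs.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def extract(state: dict) -> dict:
    # Stand-in for an extraction agent: split raw text into fields.
    name, amount = state["raw"].split(",")
    return {**state, "name": name.strip(), "amount": amount.strip()}


def transform(state: dict) -> dict:
    # Stand-in for a transformation agent: normalize types.
    return {**state, "amount": float(state["amount"])}


def validate(state: dict) -> dict:
    # Deterministic check at the end of the pipeline.
    if state["amount"] < 0:
        raise ValueError("amount must be non-negative")
    return {**state, "valid": True}


def run_pipeline(stages: list[Stage], state: dict) -> dict:
    """Run stages in a fixed order; each stage sees only the shared state dict."""
    for stage in stages:
        state = stage(state)
    return state


def award(task: str, bids: dict[str, Callable[[str], float]]) -> str:
    """Market-based routing: each agent scores the task; the highest score wins."""
    return max(bids, key=lambda agent: bids[agent](task))
```

In the pipeline the order of stages is fixed in code, so an agent cannot decide to skip validation. In the market variant the scoring functions carry the routing logic instead of a supervisor prompt.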

Cross-cutting concerns: authentication per tool, PII redaction between agents, token budgets per role, checkpoint/resume for long jobs.
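
Two of these concerns, PII redaction between agents and token budgets per role, can be sketched briefly. The email pattern is intentionally narrow, and the class and function names are hypothetical; real redaction needs far broader coverage than one regular expression.

```python
import re

# Narrow illustrative pattern: catches simple email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Mask email addresses before a message crosses an agent boundary."""
    return EMAIL.sub("[EMAIL]", text)


class BudgetExceeded(Exception):
    """Raised when a role would spend more tokens than its cap allows."""


class TokenBudget:
    """Per-role token ledger; caps are supplied by the caller."""

    def __init__(self, caps: dict[str, int]):
        self.caps = dict(caps)
        self.used = {role: 0 for role in caps}

    def charge(self, role: str, tokens: int) -> int:
        """Record usage and return the remaining budget for the role."""
        if self.used[role] + tokens > self.caps[role]:
            raise BudgetExceeded(f"{role} would exceed {self.caps[role]} tokens")
        self.used[role] += tokens
        return self.caps[role] - self.used[role]
```

Checking the budget before the call, not after, means a runaway role is stopped at the cap instead of one call past it.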

中文

              用户 / API
                  |
       主管 Agent(策略、路由)
        /         |         \
      研究       编码       审查
     Agent      Agent      Agent
        \         |         /
            共享状态存储
    (Redis / Postgres / 向量库)
                  |
     工具网关(MCP、REST、SQL)
                  |
     可观测性(追踪、成本、审计)

模式目录(2025):

  1. 层次式: 主管委派子任务;子 Agent 向上汇报。LangGraph 的 Send API 使该模式得以普及。
  2. Peer 辩论: 两 Agent 辩论,第三个 Agent 合成结论;用于法务、金融分析(带护栏)。
  3. 流水线: 固定 DAG(抽取 → 转换 → 校验);Agent 是阶段而非自由聊天。
  4. 市场式: Agent 通过评分函数竞标子任务——云成本优化中实验性使用。

横切关注点: 工具级认证、Agent 间 PII 脱敏、角色 Token 预算、长任务检查点/恢复。


三、趋势 | Trends

English

| Trend | Description |
| --- | --- |
| MCP as universal tool port | Model Context Protocol standardizes how agents attach to Slack, GitHub, DB |
| Agent SLOs | Latency p95, success rate, $/task; MAM ops teams mirror microservice SRE |
| Deterministic shells | LLM decides; code executes; reduces nondeterminism in financial workflows |
| Vertical MAM packs | Pre-built agent teams for HR, SOC, supply chain, not generic chat |
| Conflict resolution policies | When agents disagree, escalate to human or tie-breaker model |
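
The "deterministic shell" row can be illustrated with a small sketch: the model's only job is to emit a decision as JSON, and ordinary code validates and executes it. The action names, the refund limit, and `execute_decision` are assumptions made for this example.

```python
import json
from decimal import Decimal


def issue_refund(order_id: str, amount: Decimal) -> str:
    return f"refund {amount:.2f} for {order_id}"


def escalate(order_id: str, amount: Decimal) -> str:
    return f"escalated {order_id} to a human"


# Allowlist of actions the shell will execute; anything else is rejected.
ACTIONS = {"refund": issue_refund, "escalate": escalate}
REFUND_LIMIT = Decimal("100.00")  # illustrative policy threshold


def execute_decision(raw_decision: str) -> str:
    """The model only picks an action; plain code validates and runs it."""
    decision = json.loads(raw_decision)
    action = decision["action"]
    if action not in ACTIONS:
        raise ValueError(f"action {action!r} is not allowlisted")
    amount = Decimal(str(decision["amount"]))
    # Policy lives in code, so the same decision always has the same effect.
    if action == "refund" and amount > REFUND_LIMIT:
        action = "escalate"
    return ACTIONS[action](decision["order_id"], amount)
```

The override from refund to escalation also doubles as a simple human-in-the-loop gate: the model can propose a large refund, but it cannot execute one.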

中文

| 趋势 | 说明 |
| --- | --- |
| MCP 作通用工具口 | Model Context Protocol 标准化连接 Slack、GitHub、数据库 |
| Agent SLO | p95 延迟、成功率、$/任务;MAM 运维对标微服务 SRE |
| 确定性外壳 | LLM 决策、代码执行;降低金融工作流随机性 |
| 垂直 MAM 包 | HR、SOC、供应链预置 Agent 团队 |
| 冲突消解策略 | Agent 分歧时升级人工或 tie-breaker 模型 |

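In Q1 2025 enterprise procurement lists, the "multi-agent orchestration platform" often appeared alongside RAG and vector databases, becoming the third component of the AI middle platform.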


四、优缺点 | Pros/Cons

English

Pros

  • Specialization improves tool accuracy and prompt clarity
  • Parallel sub-agents reduce wall-clock time for research + coding tasks
  • Fault isolation: one agent failure need not crash entire session
  • Audit trail per role simplifies compliance

Cons

  • Coordination overhead: extra LLM calls for routing and summarization
  • State consistency bugs when blackboard schemas drift
  • Debugging multi-agent traces is harder than single-chain logs
  • Cost multiplication without strict budgets
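
The state-consistency item above can be mitigated with versioned writes. The sketch below is an in-memory stand-in for a Redis or Postgres backed blackboard; the class, exception names, and schema version number are all hypothetical.

```python
class StaleWrite(Exception):
    """The key changed since the writer last read it."""


class SchemaMismatch(Exception):
    """The writer was built against a different blackboard schema."""


class Blackboard:
    SCHEMA_VERSION = 2  # hypothetical current schema version

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key is unset."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version, schema_version):
        """Optimistic write: succeeds only if nobody wrote since our read."""
        if schema_version != self.SCHEMA_VERSION:
            raise SchemaMismatch(
                f"writer uses schema {schema_version}, board uses {self.SCHEMA_VERSION}"
            )
        current, _ = self.read(key)
        if current != expected_version:
            raise StaleWrite(f"{key}: expected v{expected_version}, found v{current}")
        self._data[key] = (current + 1, value)
        return current + 1
```

An agent that loses the race gets `StaleWrite` and has to re-read and merge instead of silently overwriting a peer's result.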

中文

优点

  • 专精提升工具准确率与提示清晰度
  • 并行子 Agent 缩短研究 + 编码墙钟时间
  • 故障隔离:单 Agent 失败不必拖垮会话
  • 分角色审计轨迹便于合规

缺点

  • 协调开销:路由与摘要额外 LLM 调用
  • 黑板 schema 漂移导致状态不一致
  • 多 Agent 追踪比单链日志难调试
  • 无严格预算则成本倍增

五、应用场景 | Use Cases

English

| Industry | MAM setup |
| --- | --- |
| Software eng | PM agent → coder → test agent → security reviewer |
| Customer support | Triage → knowledge agent → escalation human |
| Due diligence | Document extractor + financial analyst + red-flag checker |
| DevOps | Incident commander + log agent + runbook executor |
| Scientific lab | Hypothesis agent + literature agent + experiment planner |
| Marketing | Brand guard agent + copywriter + localization agent |

中文

| 行业 | MAM 配置 |
| --- | --- |
| 软件工程 | PM Agent → 编码 → 测试 → 安全审查 |
| 客服 | 分流 → 知识 Agent → 人工升级 |
| 尽职调查 | 文档抽取 + 财务分析 + 红旗检测 |
| DevOps | 事件指挥 + 日志 Agent + Runbook 执行 |
| 科研实验室 | 假设 + 文献 + 实验规划 Agent |
| 营销 | 品牌护栏 + 文案 + 本地化 Agent |

六、GitHub 开源生态 | GitHub

English

| Repository | Notes |
| --- | --- |
| Microsoft AutoGen / AG2 | Multi-agent conversation frameworks |
| langchain-ai/langgraph | Graph-based orchestration with checkpointing |
| modelcontextprotocol servers | Standard tool connectors for MAM tool gateway |
| genesis-embodied-ai/Genesis | Multi-robot sim scenarios for embodied multi-agent RL |

中文

| 仓库 | 说明 |
| --- | --- |
| Microsoft AutoGen / AG2 | 多 Agent 对话框架 |
| langchain-ai/langgraph | 带检查点的图编排 |
| MCP servers | MAM 工具网关标准连接器 |
| genesis-embodied-ai/Genesis | 具身多 Agent 强化学习仿真 |

七、参考资料 | References

  1. Wu et al. — AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
  2. Anthropic — Building effective agents (2024–2025 guidance)
  3. LangGraph documentation — Supervisor and Send patterns
  4. Gartner — Multi-agent AI platforms market guide (2025)
  5. OpenAI — Agents SDK and handoff primitives

八、MAM 成熟度模型 | Maturity Model

English

| Level | Characteristics |
| --- | --- |
| L1 Experimental | Single-process multi-agent chat, no audit |
| L2 Workflow | DAG + shared DB, manual deploy |
| L3 Production | SLOs, MCP gateway, HITL gates, cost caps |
| L4 Optimized | Auto-routing by task type, A/B agent policies, federated memory |

Most enterprises were in the L2→L3 transition during 2025 H1.

中文

| 级别 | 特征 |
| --- | --- |
| L1 实验 | 单进程多 Agent 聊天,无审计 |
| L2 工作流 | DAG + 共享 DB,人工发布 |
| L3 生产 | SLO、MCP 网关、HITL、成本上限 |
| L4 优化 | 按任务类型自动路由、Agent 策略 A/B、联邦记忆 |

多数企业在 2025 上半年处于 L2→L3 过渡。


九、产业观察与深度解读 | Industry Observations and Deep Dive

English

Supply chain and talent: By the second half of 2025, enterprises stopped treating MAM as a pilot KPI and moved it into annual operating plans. Procurement asked for three-year TCO, not demo accuracy. System integrators packaged reference architectures with SLA-backed support, mirroring how cloud migrations matured a decade earlier.

Interoperability: Open APIs (MCP, ONNX, MLIR dialects where relevant) reduced lock-in, but data gravity still tied customers to platforms with the best vertical corpus or compiler backend. Winners combined open runtimes with proprietary gold datasets or silicon-tuned kernels.

Risk register (2025 common items): (1) Evaluation gap—public benchmarks no longer predict production; (2) Security—prompt injection and tool abuse in agentic stacks; (3) Regulatory—algorithm filing, EU AI Act high-risk categories; (4) Talent—shortage of engineers who understand both ML and domain workflows.

Research frontiers carrying into 2026: Tighter world-model / spatial / sim integration; self-evolving alignment with human audit; cross-chip compilers (see 2026 timeline). Teams that invested in measurement—latency, cost per task, failure replay—outperformed teams chasing parameter counts.
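
The measurements named above (latency, cost per task, success rate) can be computed from run records with a few lines. The `TaskRun` record and `slo_report` function are hypothetical; the percentile uses the nearest-rank method, one of several common definitions.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One completed agent task; fields are illustrative."""
    latency_s: float
    cost_usd: float
    succeeded: bool


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def slo_report(runs):
    """Summarize a non-empty batch of runs into three SLO-style metrics."""
    successes = sum(r.succeeded for r in runs)
    return {
        "latency_p95_s": p95([r.latency_s for r in runs]),
        "success_rate": successes / len(runs),
        "usd_per_task": sum(r.cost_usd for r in runs) / len(runs),
    }
```

Tracking these three numbers per agent role, not only per session, makes it easier to see which role is responsible when an SLO is missed.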

中文

供应链与人才: 2025 年下半年,企业不再将 MAM 仅作试点 KPI,而是写入年度经营计划。采购要求三年 TCO,而非 demo 准确率。系统集成商打包带 SLA 的参考架构,类似十年前的云迁移成熟路径。

互操作: 开放 API(MCP、ONNX、相关 MLIR dialect)降低锁定,但数据重力仍把客户绑在拥有最佳垂直语料或编译后端的平台上。胜者 = 开放运行时 + 专有 gold 数据或硅片级调优内核。

风险登记(2025 共性): (1) 评估鸿沟——公开 benchmark 不再预测生产;(2) 安全——Agent 栈提示注入与工具滥用;(3) 监管——算法备案、EU AI Act 高风险类;(4) 人才——既懂 ML 又懂领域 workflow 的工程师短缺。

延续至 2026 的研究前沿: 世界模型 / 空间 / 仿真更紧耦合;带人工 audit 的自演化对齐;跨芯片编译器(见 2026 时间线)。投资于度量(延迟、单任务成本、失败回放)的团队胜过追逐参数量的团队。

Glossary reinforcement | 术语强化

| EN | 中文 | One-line definition |
| --- | --- | --- |
| Foundation model | 基础模型 | Large pretrained model that can be adapted to downstream tasks |
| Finetune | 微调 | Update weights on domain data |
| RAG | 检索增强生成 | Retrieve docs, then generate grounded answers |
| Sim2real | 仿真到真实 | Transfer policies from simulator to physical world |
| TCO | 总拥有成本 | Full cost of ownership over deployment lifetime |

总结 | Summary

中文: 2025 年 3 月,MAM 标志着 Agent 从「炫技对话」进入可运维的多角色系统。成功部署依赖编排图、共享状态 schema、成本与 SLO,而非更多 Agent 数量。

English: By March 2025, MAM marked the shift of agents from show-off conversations to operable multi-role systems. Success depends on orchestration graphs, shared state schemas, cost control, and SLOs, not on agent count alone.