OpenAI Agents SDK #20: Don't Let the LLM Make Decisions It Shouldn't

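I've seen too many people hand every bit of control to the LLM.

Three agents need to run in sequence, so they have an orchestrator pass control along through handoffs, one at a time. Five steps have a clear order, and they let the LLM plan that order. A task needs a quality check before deciding whether to go on, so they write a prompt asking the LLM to output "continue / stop".

Then they discover it's slow, expensive, and unstable. In some edge cases, the LLM even takes the wrong path.

The problem isn't the LLM. It's that these decisions were never the LLM's to make.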

1. What Are Deterministic Flows?

In its agent_patterns examples, OpenAI groups common patterns into six categories [1]:

Deterministic flows
Handoffs and routing
Agents as tools
LLM-as-a-judge
Parallelization
Guardrails

The first category is defined as follows [1]:

"A common tactic is to break down a task into a series of smaller steps. Each task can be performed by an agent, and the output of one agent is used as input to the next."

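Break a large task into several steps, each handled by one agent, with the output of one step passed directly as the input to the next. The order of the flow is decided by Python code; the LLM plays no part in planning it.

In other words, agents are used as composable async functions.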
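To make "agents as composable async functions" concrete, here is a minimal, SDK-free sketch. Each "agent" is a hypothetical stub coroutine (`outline_step` and `story_step` are illustrative names, not SDK APIs); in real code each would presumably wrap something like `(await Runner.run(agent, text)).final_output`. `run_pipeline` simply feeds each step's output into the next one, in an order fixed by code.

```python
import asyncio
from typing import Awaitable, Callable

# A "step" is any async function from text to text.
# With the real SDK it would wrap Runner.run(); here the steps are stubs.
Step = Callable[[str], Awaitable[str]]


async def outline_step(prompt: str) -> str:
    # Hypothetical stub standing in for an outline-writing agent
    return f"outline({prompt})"


async def story_step(outline: str) -> str:
    # Hypothetical stub standing in for a story-writing agent
    return f"story({outline})"


async def run_pipeline(steps: list[Step], first_input: str) -> str:
    """Run steps in a fixed, code-defined order; each output feeds the next step."""
    value = first_input
    for step in steps:
        value = await step(value)
    return value


print(asyncio.run(run_pipeline([outline_step, story_step], "a robot learns to paint")))
# story(outline(a robot learns to paint))
```

Because the order lives in a plain list, reordering, adding, or removing a step is a code change you can review and test, not a prompt tweak.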

The official docs summarize this style of orchestration as [2]:

"Orchestrating via code makes tasks more deterministic and predictable, in terms of speed, cost and performance."

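Deterministic, predictable, with speed, cost, and performance under control: that is the core value proposition of code-driven orchestration.

2. Code-Driven vs. LLM-Driven: Where Does Control Live?

The SDK has two orchestration philosophies. Both are officially supported, but they suit very different scenarios [2].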

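Dimension	Code-driven (Orchestrating via code)	LLM-driven (Orchestrating via LLM)
Who controls the flow	Python code: `if/else`, `while`, `asyncio`	The LLM decides on its own, via tools / handoffs
Agent-to-agent communication	`result.final_output` passed directly	tool call → tool result → LLM synthesizes (agents as tools); the conversation is handed to the next agent (handoffs)
Best suited for	Known steps, fixed order, determinism required	Open-ended tasks, multiple intents, reasoning required
Typical examples	`deterministic.py`, `parallelization.py`	`agents_as_tools.py`, `handoffs.py`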

A typical LLM-driven setup looks like this [3]:

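from agents import Agent

# The sub-agents are wrapped as tools via as_tool(); frontline_agent's LLM decides which to call
# "The frontline agent receives a user message and then picks which agents to call, as tools."
# spanish_agent, french_agent, italian_agent are defined elsewhere; as_tool(...) arguments omitted
frontline_agent = Agent(
    name="frontline_agent",
    tools=[
        spanish_agent.as_tool(...),
        french_agent.as_tool(...),
        italian_agent.as_tool(...),
    ],
)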

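The LLM sees three tools and decides which one to call. That's fine: when the task is "translate into whatever language the user asked for", letting the LLM judge makes more sense than writing a pile of if "Spanish" in prompt checks.

But suppose your flow looks like this:

Generate an outline → check the outline's quality → stop if it fails → write the story if it passes

Here the order of the three steps is fixed, the "quality check" criterion is explicit (a bool), and the "stop / continue" logic is deterministic. Asking the LLM to plan this flow means paying to solve a problem you don't have.

3. A Close Reading of deterministic.py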

This is the complete official example at examples/agent_patterns/deterministic.py [4]: 84 lines (65 LOC). The comment at the top of the source says it outright:

"This example demonstrates a deterministic flow, where each step is performed by an agent."

3.1 The Three Agent Definitions

from pydantic import BaseModel

from agents import Agent

story_outline_agent = Agent(
    name="story_outline_agent",
    instructions="Generate a very short story outline based on the user's input.",
)

class OutlineCheckerOutput(BaseModel):
    good_quality: bool
    is_scifi: bool

outline_checker_agent = Agent(
    name="outline_checker_agent",
    instructions="Read the given story outline, and judge the quality. Also, determine if it is a scifi story.",
    output_type=OutlineCheckerOutput,
)

story_agent = Agent(
    name="story_agent",
    instructions="Write a short story based on the given outline.",
    output_type=str,
)

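Three agents, each with a single responsibility:

story_outline_agent: generates a story outline (free-text output)
outline_checker_agent: checks the outline. Note the output_type=OutlineCheckerOutput, which forces the LLM to return a structured Pydantic object with two bool fields
story_agent: writes the full story from the outline

The output_type on outline_checker_agent is the key design decision. Instead of natural language such as "this outline is pretty good, and it's sci-fi", the LLM returns {"good_quality": true, "is_scifi": true}, which is what lets the if checks that follow be precise and stable.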
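Why the structured verdict matters can be shown without the SDK. The sketch below is a stdlib stand-in for what `output_type=OutlineCheckerOutput` buys you; the SDK itself uses Pydantic, and `OutlineVerdict` and `parse_verdict` are illustrative names, not SDK APIs. The raw model text must parse as a JSON object with two booleans, or it is rejected before any `if` runs.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OutlineVerdict:
    # Mirrors the two fields of OutlineCheckerOutput in the article
    good_quality: bool
    is_scifi: bool


def parse_verdict(raw: str) -> OutlineVerdict:
    """Parse raw model output into a typed verdict, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name in ("good_quality", "is_scifi"):
        # Strict check in this sketch: the string "true" or the number 1 is not a boolean
        if not isinstance(data.get(name), bool):
            raise ValueError(f"{name} must be a boolean")
    return OutlineVerdict(good_quality=data["good_quality"], is_scifi=data["is_scifi"])


verdict = parse_verdict('{"good_quality": true, "is_scifi": false}')
print(verdict)  # OutlineVerdict(good_quality=True, is_scifi=False)

try:
    parse_verdict("The outline looks good and it is sci-fi.")
except ValueError as err:
    print("rejected:", type(err).__name__)  # rejected: JSONDecodeError
```

Either the flow gets two real booleans to branch on, or it gets an exception; there is no third case where prose is misread as a decision.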

3.2 The Main Pipeline Logic

import asyncio

from agents import Runner, trace

# input_with_fallback is a helper from the official examples (its import is not shown here);
# judging by its name, it reads user input and uses the second argument as a default.


async def main():
    input_prompt = input_with_fallback(
        "What kind of story do you want? ",
        "Write a short sci-fi story.",
    )

    # Group the whole flow under a single trace
    with trace("Deterministic story flow"):
        # Step 1: generate the outline
        outline_result = await Runner.run(
            story_outline_agent,
            input_prompt,
        )

        # Step 2: check the outline
        outline_checker_result = await Runner.run(
            outline_checker_agent,
            outline_result.final_output,  # the previous step's output is passed straight in
        )

        # Step 3: the gate. Python code controls the flow here, not the LLM
        assert isinstance(outline_checker_result.final_output, OutlineCheckerOutput)

        if not outline_checker_result.final_output.good_quality:
            print("Outline is not good quality, so we stop here.")
            exit(0)

        if not outline_checker_result.final_output.is_scifi:
            print("Outline is not a scifi story, so we stop here.")
            exit(0)

        # Step 4: write the story
        story_result = await Runner.run(
            story_agent,
            outline_result.final_output,
        )
        print(f"Story: {story_result.final_output}")


if __name__ == "__main__":
    asyncio.run(main())
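The same flow can be exercised offline. The sketch below swaps the three agents for hypothetical stub coroutines (`outline_agent`, `checker_agent`, `story_writer` are illustrative, not SDK objects, and their "quality" rules are made up). As a design variation on the original, the gate combines both checks and returns `None` instead of calling `exit(0)`, which would end the whole interpreter; returning lets a larger program or a test call the pipeline safely.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckerVerdict:
    # Stand-in for OutlineCheckerOutput
    good_quality: bool
    is_scifi: bool


async def outline_agent(prompt: str) -> str:
    # Hypothetical stub for story_outline_agent
    return f"Outline for: {prompt}"


async def checker_agent(outline: str) -> CheckerVerdict:
    # Hypothetical stub for outline_checker_agent:
    # "good" means at least 5 words, "sci-fi" means the text mentions sci-fi
    return CheckerVerdict(
        good_quality=len(outline.split()) >= 5,
        is_scifi="sci-fi" in outline.lower(),
    )


async def story_writer(outline: str) -> str:
    # Hypothetical stub for story_agent
    return f"Story based on [{outline}]"


async def story_flow(prompt: str) -> Optional[str]:
    """Run the three steps in code-defined order; return None if the gate stops the flow."""
    outline = await outline_agent(prompt)
    verdict = await checker_agent(outline)  # step 1's output is step 2's input
    if not verdict.good_quality or not verdict.is_scifi:
        return None  # gate: stop without killing the process
    return await story_writer(outline)


print(asyncio.run(story_flow("Write a short sci-fi story.")))
# Story based on [Outline for: Write a short sci-fi story.]
print(asyncio.run(story_flow("Write a short romance.")))  # None
```

The original's two separate `if` blocks have one advantage worth keeping in real code: each prints its own reason for stopping.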

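Let's unpack a few details.

① Passing outline_result.final_output

Runner.run() returns a RunResult object, and .final_output is the agent's final output (validated against the agent's output_type when one is set, as it is for the checker). In the code-driven style, agent-to-agent communication is just passing that value as the input of the next Runner.run(). No handoff, no tool result: plain Python variable passing.

② The gate mechanism

The source comment reads [4]: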

"Add a gate to stop if the outline is not good quality or not a scifi story"

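if not ...: exit(0) is ordinary Python control flow: check whether a condition holds and stop if it doesn't, with no LLM involved in the decision. The gate's logic can be any business rule, however complex: regex validation, database queries, external API calls all work.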
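As an illustration of "any business rule", here is a hypothetical gate that combines the checker's structured verdict with a regex rule and an allow-list (`BANNED_PATTERN`, `ALLOWED_GENRES`, and `gate` are made-up names for this sketch). It returns a reason string instead of a bare bool, so the caller can log why the flow stopped.

```python
import re
from typing import Optional

# Hypothetical business rules for this sketch
BANNED_PATTERN = re.compile(r"\b(gore|violence)\b", re.IGNORECASE)
ALLOWED_GENRES = {"scifi"}


def gate(outline: str, good_quality: bool, is_scifi: bool) -> Optional[str]:
    """Return None to continue, or a human-readable reason to stop."""
    if not good_quality:
        return "outline is not good quality"
    genre = "scifi" if is_scifi else "other"
    if genre not in ALLOWED_GENRES:
        return f"genre {genre!r} is not allowed"
    if BANNED_PATTERN.search(outline):
        return "outline contains banned content"
    return None


print(gate("A probe finds life on Europa.", good_quality=True, is_scifi=True))  # None
print(gate("Lots of gore on Mars.", good_quality=True, is_scifi=True))  # outline contains banned content
```

None of these checks needs a model call, so they cost nothing and behave the same on every run.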

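③ The trace("Deterministic story flow") context manager

Wrapping the whole three-step pipeline in one trace() makes all three runs show up as a single trace in OpenAI's tracing dashboard, instead of three separate requests with no visible connection. That is very valuable when debugging in production [4].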

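4. How Runner Relates to the Agent Loop

You can't really understand code-driven orchestration without understanding how Runner works [5].

Runner offers three entry points: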

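result = await Runner.run(agent, input)           # async, recommended
result = Runner.run_sync(agent, input)            # sync wrapper around run(), for non-async code
result = Runner.run_streamed(agent, input)        # streaming: not awaited, returns immediately
async for event in result.stream_events():        # consume events (incl. token deltas) as they arrive
    ...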

What the agent loop does internally [5]:

"When you use the run method in Runner, you pass in a starting agent and input. The runner then runs a loop."

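On each iteration:

Call the LLM
If the LLM returns a final_output → exit the loop and return the result
If the LLM returns a handoff → switch to the new agent and go around the loop again
If the LLM returns tool_calls → run the tools, append their results, and loop again

max_turns defaults to 10; since v0.16.0, max_turns=None disables the limit [5]: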

"If we exceed the max_turns passed, we raise a MaxTurnsExceeded exception. Pass max_turns=None to disable this turn limit."

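In the code-driven setting, each Runner.run() call is its own agent loop. A three-step pipeline = three Runner.run() calls = three independent agent loops run one after another. Flow control lives in your async def main(), not inside any single agent loop.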
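The loop described above fits in a few lines as a toy. This is not the SDK's `Runner`: the "LLM" is a hypothetical scripted function returning one of three outcomes, the tool registry is a plain dict, and `ToyMaxTurnsExceeded` only mimics the SDK's `MaxTurnsExceeded`. It shows the three branches and how a turn limit (or `None` for no limit) bounds the loop.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class FinalOutput:
    text: str


@dataclass
class Handoff:
    to_agent: str


@dataclass
class ToolCalls:
    calls: list  # (tool_name, argument) pairs


Response = Union[FinalOutput, Handoff, ToolCalls]


class ToyMaxTurnsExceeded(Exception):
    """Mimics the SDK's MaxTurnsExceeded in this toy."""


def run_loop(
    agent: str,
    user_input: str,
    llm: Callable[[str, list], Response],
    tools: dict,
    max_turns: Optional[int] = 10,
) -> str:
    history = [user_input]
    turns = 0
    while True:
        if max_turns is not None and turns >= max_turns:
            raise ToyMaxTurnsExceeded(f"exceeded max_turns={max_turns}")
        turns += 1
        response = llm(agent, history)  # call the "LLM" for the current agent
        if isinstance(response, FinalOutput):
            return response.text  # final output: leave the loop
        if isinstance(response, Handoff):
            agent = response.to_agent  # handoff: switch agent, loop again
            continue
        for name, arg in response.calls:  # tool calls: run them, append results, loop again
            history.append(tools[name](arg))


def scripted_llm(agent: str, history: list) -> Response:
    # Hypothetical scripted "LLM": triage hands off; math_agent calls a tool once, then answers
    if agent == "triage":
        return Handoff("math_agent")
    if len(history) == 1:
        return ToolCalls([("add", (2, 3))])
    return FinalOutput(f"{agent} says the answer is {history[-1]}")


TOOLS = {"add": lambda pair: pair[0] + pair[1]}
print(run_loop("triage", "what is 2 + 3?", scripted_llm, TOOLS))  # math_agent says the answer is 5
```

This run takes three turns (handoff, tool call, final answer), so `max_turns=2` would raise. In a deterministic flow, your own code simply calls such a loop once per step.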

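5. Other Official Code-Driven Variants

deterministic.py is just one form of code-driven orchestration. The official examples/agent_patterns folder and the orchestration docs offer a few more variants [1][2]: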

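File	Pattern	Core mechanism
`parallelization.py`	Parallel execution	`asyncio.gather()` runs several agent calls concurrently
`llm_as_a_judge.py`	Evaluation loop	`while` loop + an evaluator agent that grades the output until it passes
Classify-then-route (described in the docs [2])	Conditional routing	Structured output + `if/match` picks the next agent

These variants share the same underlying logic: Python controls the order in which agents run, the LLM generates content, and the two responsibilities stay separate. (The repo's `routing.py`, by contrast, routes through handoffs from a triage agent, so it belongs to the LLM-driven family.)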
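The `while` + judge shape from the table can also be sketched offline. Both the generator and the judge below are hypothetical stubs; the real `llm_as_a_judge.py` pairs a generator agent with an evaluator agent, which the next issue covers. The point is that a plain loop decides when to stop, and the round cap is a safety net added in this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Evaluation:
    score: str  # "pass" or "needs_improvement" in this sketch
    feedback: str


def generate(draft: str, feedback: Optional[str]) -> str:
    # Hypothetical stub generator: folds the judge's feedback into the previous draft
    return draft if feedback is None else f"{draft} [{feedback}]"


def judge(draft: str) -> Evaluation:
    # Hypothetical stub judge: passes once the draft has absorbed two rounds of feedback
    if draft.count("[") >= 2:
        return Evaluation("pass", "")
    return Evaluation("needs_improvement", "add more detail")


def refine(task: str, max_rounds: int = 5) -> Tuple[str, int]:
    """Generate, judge, and retry in a while loop until the judge says "pass"."""
    draft, feedback, rounds = task, None, 0
    while rounds < max_rounds:  # the cap is a safety net added in this sketch
        rounds += 1
        draft = generate(draft, feedback)
        evaluation = judge(draft)
        if evaluation.score == "pass":
            return draft, rounds
        feedback = evaluation.feedback
    raise RuntimeError(f"no passing draft after {max_rounds} rounds")


print(refine("A story outline"))
# ('A story outline [add more detail] [add more detail]', 3)
```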

The asyncio.gather variant deserves a special mention:

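import asyncio

from agents import Runner

# Inside an async function; agent_a/b/c and input_a/b/c are placeholders.
# Total time is roughly the slowest call rather than the sum of all three.
results = await asyncio.gather(
    Runner.run(agent_a, input_a),
    Runner.run(agent_b, input_b),
    Runner.run(agent_c, input_c),
)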

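When the three subtasks don't depend on one another, running them serially just wastes time: total latency becomes the sum of all three calls instead of roughly the slowest one. No framework feature is needed; this is Python's native concurrency primitive.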
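A quick way to convince yourself: the sketch below replaces `Runner.run` with a hypothetical `fake_run` that just sleeps, then times three independent calls run one after another versus with `asyncio.gather`. The sleeps stand in for model latency; real numbers will differ.

```python
import asyncio
import time


async def fake_run(name: str, latency: float) -> str:
    # Hypothetical stand-in for Runner.run: asyncio.sleep plays the role of model latency
    await asyncio.sleep(latency)
    return f"{name} done"


async def run_serial(latency: float) -> list:
    results = []
    for name in ("a", "b", "c"):
        results.append(await fake_run(name, latency))  # each call waits for the previous one
    return results


async def run_parallel(latency: float) -> list:
    # gather starts all three at once; results come back in argument order
    return list(await asyncio.gather(*(fake_run(n, latency) for n in ("a", "b", "c"))))


def timed(fn, latency: float):
    start = time.perf_counter()
    result = asyncio.run(fn(latency))
    return result, time.perf_counter() - start


serial_result, serial_s = timed(run_serial, 0.2)
parallel_result, parallel_s = timed(run_parallel, 0.2)
print(serial_result == parallel_result)  # True: same outputs, same order
print(f"serial ~{serial_s:.2f}s, parallel ~{parallel_s:.2f}s")
```

Serial time comes out near the sum of the latencies and parallel time near the largest one, which is the whole argument for `gather` when steps share no data.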

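6. Practical Advice

① Gate on structured output, not on an LLM's "yes/no"

Having the LLM output {"pass": true} is more reliable than having it say "okay to continue". String parsing breaks as soon as the wording shifts; with output_type, the SDK validates the response against your Pydantic model, so a malformed answer raises an error instead of silently taking the wrong branch. output_type=YourModel is a prerequisite for control flow, not just a formatting preference.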
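A small illustration of why string matching is fragile (the replies and helper names are made up for this sketch): a substring check on free text misreads a negative answer, while a gate on a parsed boolean field has nothing to misread. The field is called `passed` because `pass` is a Python keyword and cannot be declared directly as a model field.

```python
import json


def naive_gate(reply: str) -> bool:
    # Fragile: looks for a keyword anywhere in free-form text
    return "continue" in reply.lower()


def structured_gate(reply_json: str) -> bool:
    # Robust: reads one typed field; anything that is not a JSON bool is rejected
    value = json.loads(reply_json)["passed"]
    if not isinstance(value, bool):
        raise ValueError("'passed' must be a boolean")
    return value


print(naive_gate("Yes, continue."))               # True
print(naive_gate("No, we should not continue."))  # True  <- wrong branch
print(structured_gate('{"passed": false}'))       # False
```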

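② Start serial, then consider parallel

Always write the first version as sequential await Runner.run() calls. Once it runs and produces correct results, identify the steps with no data dependencies between them, and only then switch those to asyncio.gather. Parallelizing too early makes debugging much harder, because output and failures from concurrent calls interleave.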

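③ Wrap the whole pipeline in one trace()

with trace("your pipeline name"): is a single line of code, but it turns your observability data from "three unrelated requests" into "one pipeline record with full context". In production, you'll feel the difference the first time you have to track down a problem.

Current version: this article is based on OpenAI Agents SDK v0.17.0 (released 2026-05-08). The default model changed to gpt-5.4-mini in v0.16.0 [6]; to keep the old behavior, set model="gpt-4.1" explicitly.

Next issue, #21: the next pattern in agent_patterns, LLM-as-a-Judge, where one agent evaluates another agent's output and a while loop iterates until the output passes.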

Cover image generated by AI.