fix(model_runner): cap oversized tool results to avoid 413 on next request #252

Draft
CorrectRoadH wants to merge 1 commit into bubbuild:main from CorrectRoadH:fix/cap-oversized-tool-output

Conversation

CorrectRoadH commented Jun 30, 2026

Contributor

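WIP: I had an AI run this in the background and have not reviewed the code yet. It will probably take several rounds; I have not vetted the code quality, so this can only be a draft. It also still needs more testing. I opened the PR to make review easier.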

Closes #249

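Problem

When a single tool output is too large, it is fed back verbatim into the next request body, which triggers a 413 Request Entity Too Large from the provider or reverse proxy, and the whole agent turn fails outright.

Reproduction: the agent runs `grep ... | head -50` inside node_modules/next. `head -50` is effectively useless here, because compiled bundles and source maps are extremely long single lines, so even 50 lines can amount to several MB. Data flow:

  1. Foreground bash directly returns `shell.output.strip()` (tools.py:190), with no upper bound
  2. The full result is written to the tape, and on the next turn `build_messages` restores it as `{"role":"tool","content": <several MB>}` (context.py:99)
  3. It goes into the request body → openresty returns 413 → the turn fails

To make matters worse: the 413 response body is HTML, which the existing `is_context_length_error` does not match, so even the `auto_handoff` self-rescue is never triggered.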
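
To illustrate why the HTML body slips through, here is a minimal sketch of a substring-based matcher. The base pattern list and the function signature are assumptions made for the example (the real `CONTEXT_LENGTH_PATTERNS` and `is_context_length_error` are not shown in this PR description); only the phrase `request entity too large` comes from the PR itself.

```python
# Hypothetical stand-in for the pre-existing patterns; the real list may differ.
BASE_PATTERNS = ["context length", "maximum context"]
# The phrase this PR adds to the pattern list.
EXTRA_PATTERNS = ["request entity too large"]


def is_context_length_error(message: str, patterns: list[str]) -> bool:
    """Case-insensitive substring match against a list of known phrases."""
    lowered = message.lower()
    return any(p in lowered for p in patterns)


# Typical shape of a reverse-proxy 413 page, shortened for the example.
HTML_413 = (
    "<html><head><title>413 Request Entity Too Large</title></head>"
    "<body><center><h1>413 Request Entity Too Large</h1></center></body></html>"
)

print(is_context_length_error(HTML_413, BASE_PATTERNS))                   # False
print(is_context_length_error(HTML_413, BASE_PATTERNS + EXTRA_PATTERNS))  # True
```

With only provider-style phrases in the list, the proxy's HTML page is treated as an unrelated failure, so no recovery path runs.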

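Solution

At the single choke point in `ModelRunner.run` (after tool execution, before `record_chat`), every result is truncated at the byte level. This point was chosen because it protects the tape, the trace, the stream events, and the next request all at once, and it runs only once per execution (so the spill to disk is not repeated).

This takes the spill-to-file approach recommended in the issue rather than pure hard truncation, to preserve debuggability: an over-limit string result is truncated to the byte budget, the full output is written to `<bub.home>/tool-output/<run_id>-call-<n>.txt`, and the inline content is replaced with:

<first N bytes of output>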

[output truncated: original 4.0 MB exceeded 128.0 KB limit]
[full output saved to: /…/tool-output/run-…-call-0.txt]
[hint: inspect the end with `tail -c 4096 …` or search it with `rg <pattern> …`]

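Key points:

  • Sizes are measured in UTF-8 bytes (413 is about body bytes, not tokens); the footer counts toward the budget, so the returned string is guaranteed to be ≤ limit
  • `errors="ignore"` avoids leaving a broken multi-byte character at the truncation boundary
  • Non-string results, results under the limit, and limit ≤ 0 all pass through unchanged
  • The spill file is referenced by absolute path, so the hint works from any cwd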
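
The points above can be sketched roughly as follows. This is not the PR's actual `cap_tool_result`: the signature, the `_human` helper, and the fallback when the footer alone exceeds the limit are all assumptions made for illustration. Only the footer wording and the overall behavior follow the description.

```python
from pathlib import Path
from typing import Any


def _human(n: int) -> str:
    """Hypothetical helper: format a byte count such as 128.0 KB or 4.0 MB."""
    if n >= 1024 * 1024:
        return f"{n / (1024 * 1024):.1f} MB"
    if n >= 1024:
        return f"{n / 1024:.1f} KB"
    return f"{n} B"


def cap_tool_result(result: Any, limit: int, spill_path: Path) -> Any:
    """Truncate an oversized string result to `limit` UTF-8 bytes, spilling the rest."""
    # Non-strings and a disabled limit pass through unchanged.
    if not isinstance(result, str) or limit <= 0:
        return result
    raw = result.encode("utf-8")
    if len(raw) <= limit:
        return result

    # Save the full output under an absolute path so the hint works from any cwd.
    spill_path = spill_path.resolve()
    spill_path.parent.mkdir(parents=True, exist_ok=True)
    spill_path.write_bytes(raw)

    footer = (
        f"\n\n[output truncated: original {_human(len(raw))} "
        f"exceeded {_human(limit)} limit]\n"
        f"[full output saved to: {spill_path}]\n"
        f"[hint: inspect the end with `tail -c 4096 {spill_path}` "
        f"or search it with `rg <pattern> {spill_path}`]"
    )
    footer_bytes = footer.encode("utf-8")
    if len(footer_bytes) > limit:
        # Assumption: with a tiny limit, fall back to a plain hard cut.
        return raw[:limit].decode("utf-8", errors="ignore")

    # The footer counts toward the budget; errors="ignore" drops a partial
    # multi-byte character left at the cut point.
    head = raw[: limit - len(footer_bytes)].decode("utf-8", errors="ignore")
    return head + footer
```

Because the head is cut in bytes and then decoded, the result can come out a few bytes under the limit when the cut lands inside a multi-byte character, but never over it.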

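Changes

  • New `bub.builtin.tool_output.cap_tool_result`: isolated, a pure function, unit-testable
  • Wiring (model_runner.py): `_cap_tool_results` replaces results at the choke point; the record and both StreamEvents consistently use the truncated version
  • Config: `BUB_MAX_TOOL_RESULT_BYTES` (default 128 KB, set to 0 to disable)
  • Defense: `request entity too large` is added to `CONTEXT_LENGTH_PATTERNS`, so `auto_handoff` can act as a best-effort backstop
  • Docs: env.example and both the Chinese and English settings docs are synced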
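
As a rough illustration of the config semantics (default 128 KB, 0 disables), the sketch below reads the variable directly from the environment. How bub actually loads its settings is not shown in this PR description, so the function name and the fallback on invalid or negative values are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_TOOL_RESULT_BYTES = 128 * 1024  # 128 KB, per the PR description


def max_tool_result_bytes(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical loader: return the byte limit, where 0 means the cap is off."""
    env = os.environ if env is None else env
    raw = env.get("BUB_MAX_TOOL_RESULT_BYTES", "").strip()
    if not raw:
        return DEFAULT_MAX_TOOL_RESULT_BYTES
    try:
        value = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_MAX_TOOL_RESULT_BYTES
    # Assumption: negative values are treated the same as 0 (disabled).
    return max(value, 0)
```

For example, `BUB_MAX_TOOL_RESULT_BYTES=0` would turn the cap off, while leaving the variable unset keeps the 128 KB default.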

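Why bash.output does not solve this

`bash.output` only serves background shells, and it is merely a convenience tool that lets the model voluntarily choose a window, not an enforced guardrail; without a `limit` argument it still returns everything. 413 is a safety issue, so the cap must be enforced at the result boundary.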

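Tests

tests/test_tool_output_cap.py: small and non-string results pass through; truncation plus spill of the full content; multi-byte characters are not split; end-to-end (a 4 MB single line goes through model_runner → assert that the next request's tool message is ≤ limit and contains the spill path, and that the full content can be read back from the file); 413 is recognized.

Verification: pytest all green (216 passed) · ruff passes · mypy clean on the changed files.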

🤖 Generated with Claude Code

https://claude.ai/code/session_01WAgZfqMU5X8qbF4LX1ag4U

…quest

A single oversized tool result (e.g. `grep` over a minified bundle or
source map under node_modules, where individual lines are megabytes) was
returned inline in full. `head -50` is no protection when lines are that
long. The full output flowed into the tape and back into the next
request body, which the provider/reverse-proxy rejected with
`413 Request Entity Too Large`, failing the whole turn.

Cap each tool result at the single choke point in `ModelRunner.run`
(after execution, before record_chat), so the bound protects the tape,
the trace, the streamed event, and the next request alike, and runs once
per execution. Oversized string results are truncated to a byte budget
and the full output is spilled to `<bub.home>/tool-output/<run>-call-N.txt`,
with a footer pointing the agent at the file and `tail`/`rg` hints so
debugging info is preserved.

- new `bub.builtin.tool_output.cap_tool_result` (byte-budgeted, UTF-8 safe)
- `BUB_MAX_TOOL_RESULT_BYTES` setting (default 128 KB, 0 disables)
- treat `request entity too large` as a context-overflow so auto_handoff
  can recover as a best-effort backstop
- regression test: 4 MB single-line result -> next request stays bounded
  and the agent can read the full output from the spill file
- docs + env.example synced

Closes bubbuild#249

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WAgZfqMU5X8qbF4LX1ag4U

Development

Successfully merging this pull request may close these issues.

Tool output should be truncated when it is too large, to avoid triggering 413 on the next request
