Diagnose

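A discipline for hard bugs. Skip a phase only when you have an explicit reason.

When exploring the codebase, use the project's domain glossary to build a clear mental model of the relevant modules, and check the ADRs for the area you are touching.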

Phase 1 — Build a feedback loop

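This is the core of the skill. Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal, you will find the cause: bisection, hypothesis testing, and instrumentation all just consume that signal. If you do not have one, no amount of staring at code will save you.

Invest disproportionately here. Be aggressive. Be creative. Refuse to give up.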

Ways to build one, to be tried in roughly this order

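1. Failing test at whatever seam reaches the bug: unit, integration, or e2e.
2. curl / HTTP script against a running dev server.
3. CLI invocation with a fixture input, diffing stdout against a known-good snapshot.
4. Headless browser script (Playwright / Puppeteer) that drives the UI and asserts on DOM/console/network.
5. Replay a captured trace. Save a real network request / payload / event log to disk; replay it through the code path in isolation.
6. Throwaway harness. Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
7. Property / fuzz loop. If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
8. Bisection harness. If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can git bisect run it.
9. Differential loop. Run the same input through old-version vs new-version (or two configs) and diff the outputs.
10. HITL bash script. Last resort. If a human must click, drive them with scripts/hitl-loop.template.sh so the loop is still structured. Captured output feeds back to you.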
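As an illustration of the CLI snapshot option in the list above, here is a minimal sketch of a pass/fail loop. The function name run_loop, the 30-second timeout, and the idea of feeding the fixture on stdin are assumptions made for the example, not part of the skill.

```python
import difflib
import subprocess
import sys
from pathlib import Path


def run_loop(cmd: list[str], fixture: Path, snapshot: Path) -> bool:
    """Return True (pass) when stdout for the fixture matches the snapshot."""
    # Assumption: the program under test reads the fixture from stdin.
    with fixture.open("rb") as stdin:
        result = subprocess.run(cmd, stdin=stdin, capture_output=True, timeout=30)
    actual = result.stdout.decode("utf-8", errors="replace")
    expected = snapshot.read_text(encoding="utf-8")
    if actual == expected:
        return True
    # Print a unified diff so a failure shows the exact symptom, not just "fail".
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=str(snapshot),
        tofile="actual stdout",
    )
    sys.stdout.writelines(diff)
    return False
```

A wrapper script could call run_loop once and exit non-zero on False, which gives the agent a single command to rerun after every change.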

Build the right feedback loop, and the bug is 90% fixed.
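For the bisection harness mentioned in the list, a loop can be handed to git bisect run once its result is mapped onto the exit codes that command understands: 0 marks the commit good, 125 asks bisect to skip it, and other codes from 1 to 127 mark it bad. The sketch below assumes a separate build command and check command; both are hypothetical placeholders for whatever the project uses.

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_exit_code(build_cmd: list[str], check_cmd: list[str]) -> int:
    """Map one build + check cycle onto a `git bisect run` exit code."""
    build = subprocess.run(build_cmd, capture_output=True)
    if build.returncode != 0:
        # This commit cannot be tested at all, so ask bisect to skip it.
        return SKIP
    check = subprocess.run(check_cmd, capture_output=True)
    return GOOD if check.returncode == 0 else BAD
```

A small script that ends with sys.exit(bisect_exit_code(...)) can then be passed to git bisect run after marking one good and one bad commit.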

Iterate on the loop itself

Treat the loop as a product. Once you have a loop, ask:

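Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
Can I make it more deterministic? (Pin time, seed the RNG, isolate the filesystem, freeze the network.)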

A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
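To make "pin time, seed the RNG" concrete, here is a minimal sketch in which the code under test receives its randomness and its clock as arguments, so the loop can fix both. The function make_token, the seed 1234, and the pinned date are all hypothetical and chosen only for the example.

```python
import random
from datetime import datetime, timezone
from typing import Callable

FIXED_NOW = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary pinned instant


def make_token(rng: random.Random, now: Callable[[], datetime]) -> str:
    """Hypothetical code under test: depends on randomness and the clock."""
    return f"{now():%Y%m%d}-{rng.randrange(10**6):06d}"


def run_once(seed: int = 1234) -> str:
    """One loop iteration with the RNG seeded and the clock pinned."""
    rng = random.Random(seed)  # private seeded RNG, not the shared global one
    return make_token(rng, lambda: FIXED_NOW)
```

With both inputs pinned, two runs of run_once return the same string, so any difference between runs points at the code rather than at the environment.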

Non-deterministic bugs

The goal is not a clean repro but a higher reproduction rate. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not. Keep raising the rate until it is debuggable.
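One way to track whether the reproduction rate is actually rising is to measure it. The sketch below runs a trigger many times in parallel and reports the fraction of runs that reproduced the bug; the name failure_rate, the thread pool, and the default counts are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def failure_rate(trigger: Callable[[], bool], runs: int = 100, workers: int = 8) -> float:
    """Run `trigger` `runs` times in parallel; return the fraction that failed.

    `trigger` must return True when the bug reproduced on that run.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: trigger(), range(runs)))
    return sum(results) / runs
```

Rerunning this after each change to the loop (more stress, a narrower timing window, an injected sleep) shows whether the change moved the rate in the right direction.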

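When you genuinely cannot build a loop

Stop and say so explicitly. List what you tried. Ask the user for: (a) access to an environment that reproduces the bug, (b) a captured artifact (HAR, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do not proceed to hypothesise without a loop.

Do not proceed to Phase 2 until you have a loop you trust.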

Phase 2 — Reproduce

Run the loop. Watch the bug appear.

Confirm:

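[ ] The loop produces the failure mode the user described, not a different failure that happens to be nearby. Wrong bug = wrong fix.
[ ] The failure reproduces across multiple runs (or, for a non-deterministic bug, at a rate high enough to debug).
[ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify that the fix actually addresses it.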

Do not proceed until you have reproduced the bug.

Phase 3 — Hypothesise

Generate 3–5 ranked hypotheses before testing any of them. Single-hypothesis generation anchors on the first plausible idea.

Each hypothesis must be falsifiable: state the prediction it makes.

Format: "If X is the cause, then changing Y will make the bug disappear / changing Z will make it worse."

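If you cannot state a prediction, the hypothesis is a vibe: discard it or sharpen it.

Show the ranked list to the user before testing. They often have domain knowledge that re-ranks it instantly ("we just deployed a change to #3"), or know of hypotheses that are already ruled out. It is a cheap checkpoint and a big time saver. Do not block on it: if the user is AFK, proceed with your own ranking.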

Phase 4 — Instrument

Each probe must map to a specific prediction from Phase 3. Change one variable at a time.

Tool preference:

1. Debugger / REPL inspection, if the environment supports it. One breakpoint beats ten logs.
2. Targeted logs at the boundaries that distinguish hypotheses.
3. Never "log everything and grep".

Tag every debug log with a unique prefix, e.g. [DEBUG-a4f2]. Cleanup at the end is then a single grep. Untagged logs survive; tagged logs die.
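A minimal sketch of the tagging convention, together with a scan that plays the role of the cleanup grep. The helper names dbg and find_leftover_probes are hypothetical, and the regular expression assumes the tag suffix is lowercase hex as in the example tag above.

```python
import re
from pathlib import Path

DEBUG_TAG = "[DEBUG-a4f2]"  # one unique tag per debugging session


def dbg(message: str) -> None:
    """Emit a temporary debug line carrying the session tag."""
    print(f"{DEBUG_TAG} {message}", flush=True)


def find_leftover_probes(
    root: Path, pattern: str = r"\[DEBUG-[0-9a-f]+\]"
) -> list[tuple[Path, int]]:
    """Return (file, line number) for every tagged probe left in *.py files."""
    tag = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if tag.search(line):
                hits.append((path, lineno))
    return hits
```

An empty result from find_leftover_probes is the signal that the instrumentation is gone. Note that the file defining DEBUG_TAG matches too, which is a useful reminder to delete it.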

Perf branch. For performance regressions, logs are usually the wrong tool. Instead, establish a baseline measurement (timing harness, performance.now(), profiler, query plan), then bisect. Measure first, fix second.
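A timing harness for the baseline can be very small. The sketch below uses Python's time.perf_counter, which plays the same role as performance.now() in JavaScript; the function name measure, the warm-up count, and the repeat count are assumptions for illustration.

```python
import statistics
import time
from typing import Callable


def measure(fn: Callable[[], object], repeats: int = 20, warmup: int = 3) -> dict[str, float]:
    """Time `fn`; return min and median wall-clock seconds over `repeats` runs."""
    for _ in range(warmup):
        fn()  # warm caches so a cold first run does not skew the baseline
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"min": min(samples), "median": statistics.median(samples)}
```

Recording these numbers on a known-good state and again on the regressed state gives a pass/fail threshold that the bisection can consume.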

Phase 5 — Fix + regression test

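Write the regression test before the fix, but only if there is a correct seam.

A correct seam is one where the test exercises the real bug pattern as it occurs at the call site. If the only available seam is too shallow (a single-caller test when the bug needs multiple callers, or a unit test that cannot replicate the chain that triggers the bug), a regression test there gives false confidence.

If there is no correct seam, that is itself a finding. Record it. The codebase architecture is preventing the bug from being locked down. Flag it for the next phase.

If there is a correct seam: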

1. Turn the minimised repro into a failing test at the seam.
2. Watch it fail.
3. Apply the fix.
4. Watch it pass.
5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.

Phase 6 — Cleanup + post-mortem

Required before declaring done:

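[ ] The original repro no longer reproduces (re-run the Phase 1 loop)
[ ] The regression test passes (or the missing seam is documented)
[ ] All [DEBUG-...] instrumentation is removed (grep for the prefix)
[ ] Throwaway prototypes are deleted (or moved to a clearly marked debug location)
[ ] The hypothesis that turned out to be correct is stated in the commit / PR message, so the next debugger learns from it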

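Then ask: what would have prevented this bug? If the answer involves architectural change (no good test seam, tangled callers, hidden coupling), hand off to the /improve-codebase-architecture skill with specifics. Make the recommendation after the fix lands, not before: you have more information now than when you started.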

Auto-generated from skills/engineering/diagnose/SKILL.zh-CN.md. The English original is in the mattpocock-skills repository.