refactor(runtime): superviseSurface, one capability for supervise-over-a-graded-surface #417
…se-over-a-graded-surface
Folds the example worker-seam + 'self-improving supervisor' wrapper pair into ONE exported
core capability. superviseSurface(profile, { surface, task, worker, budget, router }) drives
a team of agents to solve an AgenticSurface task: workers runAgentic the surface (refine by
default, strategy-pluggable), settle on the surface's OWN check (settled ⟺ resolved), and the
driver self-improves from the failing tests by default (failuresAnalyst; swap via analysts).
It lives at the composition layer (above supervise + runAgentic) because runAgentic depends on
the supervise core — a surface worker can't be a supervise built-in without an import cycle.
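A minimal sketch of the settle-and-self-improve loop described above, assuming heavily simplified shapes. `Surface`, `WorkerFn`, `Analyst`, `failuresAnalystSketch`, and `superviseSurfaceSketch` are hypothetical stand-ins, not the real exports of `src/runtime/supervise-surface.ts`. It illustrates the two properties the commit message names: a run counts as settled only when the surface's own check resolves, and by default each still-failing test set becomes a lesson for the next worker spawn (passing `null` turns that off). The real capability also handles budget, routing, and live-worker limits; none of that is modeled here.

```typescript
// Hypothetical shapes for illustration only; not the real module's types.
interface CheckResult {
  resolved: boolean;
  failingTests: string[];
}

interface Surface {
  // The surface's OWN grader: "settled" means check(...).resolved is true.
  check(artifact: string): CheckResult;
}

// A worker attempts the task, seeing lessons from earlier failed attempts.
type WorkerFn = (task: string, lessons: string[]) => string;

// An analyst turns still-failing tests into a lesson for the next spawn.
type Analyst = (failing: string[]) => string;

const failuresAnalystSketch: Analyst = (failing) =>
  `previous attempt still failed: ${failing.join(", ")}`;

interface Outcome {
  settled: boolean;
  artifact: string | null;
  spawns: number;
}

function superviseSurfaceSketch(
  task: string,
  surface: Surface,
  worker: WorkerFn,
  maxSpawns: number,
  analyst: Analyst | null = failuresAnalystSketch, // null disables self-improvement
): Outcome {
  const lessons: string[] = [];
  let last: string | null = null;
  for (let spawns = 1; spawns <= maxSpawns; spawns++) {
    last = worker(task, lessons);
    const result = surface.check(last);
    if (result.resolved) return { settled: true, artifact: last, spawns };
    if (analyst !== null) lessons.push(analyst(result.failingTests));
  }
  return { settled: false, artifact: last, spawns: maxSpawns };
}

// Toy surface: resolves only when the artifact contains "fixed".
const toySurface: Surface = {
  check: (artifact) =>
    artifact.includes("fixed")
      ? { resolved: true, failingTests: [] }
      : { resolved: false, failingTests: ["test_parse"] },
};

// Toy worker: succeeds only once it has been given at least one lesson.
const toyWorker: WorkerFn = (_task, lessons) =>
  lessons.length > 0 ? "patch: fixed" : "patch: first try";

const withAnalyst = superviseSurfaceSketch("fix parser", toySurface, toyWorker, 3);
const withoutAnalyst = superviseSurfaceSketch("fix parser", toySurface, toyWorker, 3, null);
console.log(withAnalyst.settled, withAnalyst.spawns); // true 2
console.log(withoutAnalyst.settled, withoutAnalyst.spawns); // false 3
```

The toy worker only succeeds after feedback, so the analyst-on run settles on its second spawn while the analyst-off run exhausts its spawns.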
- new src/runtime/supervise-surface.ts + barrel export.
- deleted examples/ablation-suite/{surface-worker,self-improving-supervisor}.ts — the ablation
+ gepa arms now call superviseSurface directly (no wrappers, no duplication).
- canonical-api decision row + regenerated docs/api.
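The layering argument in the commit message can be checked mechanically. This sketch runs a depth-first cycle check over two hypothetical module graphs: the layering the PR describes (`strategy` imports `supervise`, and `supervise-surface` imports both) and the rejected alternative, in which `supervise` would have to import the surface worker's module. Module names are simplified labels rather than real paths, and `hasCycle` is an illustrative helper, not part of the codebase.

```typescript
// Module graph as adjacency lists: an edge a -> b means "a imports b".
type Graph = Record<string, string[]>;

// True if following import edges from some module can lead back to it.
function hasCycle(graph: Graph): boolean {
  const state = new Map<string, "visiting" | "done">();
  const visit = (node: string): boolean => {
    const s = state.get(node);
    if (s === "visiting") return true; // back edge: cycle found
    if (s === "done") return false; // already fully explored
    state.set(node, "visiting");
    for (const next of graph[node] ?? []) {
      if (visit(next)) return true;
    }
    state.set(node, "done");
    return false;
  };
  return Object.keys(graph).some((node) => visit(node));
}

// Layering described in the PR: runAgentic's module (strategy) imports
// supervise, and superviseSurface sits above both.
const layered: Graph = {
  "supervise-surface": ["supervise", "strategy"],
  strategy: ["supervise"],
  supervise: [],
};

// Rejected alternative: a surface worker as a supervise() built-in backend
// would make supervise import runAgentic's module.
const builtIn: Graph = {
  strategy: ["supervise"],
  supervise: ["strategy"],
};

console.log(hasCycle(layered)); // false
console.log(hasCycle(builtIn)); // true
```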
tangletools
left a comment
✅ Auto-approved drewstone PR — 7ce90bc6
This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: drewstone_author · 2026-06-30T01:43:10Z
…sk, opts), default router + budget
- task is positional (one mental model with supervise(profile, task, opts)).
- router defaults to the worker's substrate (driver + workers share one router unless separated).
- budget defaults to a handful of worker spawns sized off the worker bounds.
Minimal call is now superviseSurface(profile, task, { surface, worker }).
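A sketch of how the defaults listed above might resolve, under stated assumptions. Every type and function here (`Router`, `WorkerSpec`, `SurfaceOpts`, `resolveDefaults`, `DEFAULT_SPAWNS`) is hypothetical, and "a handful" of spawns is modeled as 4 because the PR gives no number. The `innerTurns + 2` per-worker reservation, the `maxLiveWorkers: 1` default, and the `analysts: null` off switch come from the usefulness audit on this PR; the router fallback comes from the bullet above.

```typescript
// Hypothetical shapes; the real option names and types may differ.
interface Router {
  name: string;
}

interface WorkerSpec {
  router: Router; // the worker's substrate
  innerTurns: number; // per-worker turn bound
}

interface SurfaceOpts {
  surface: unknown;
  worker: WorkerSpec;
  router?: Router;
  budget?: number; // total turns available to the worker pool
  maxLiveWorkers?: number;
  analysts?: string[] | null; // null turns self-improvement off
}

interface Resolved {
  router: Router;
  budget: number;
  maxLiveWorkers: number;
  analysts: string[];
  maxSpawns: number;
}

// ASSUMPTION: "a handful" of worker spawns is modeled as 4.
const DEFAULT_SPAWNS = 4;

function resolveDefaults(opts: SurfaceOpts): Resolved {
  // Each worker reserves innerTurns + 2 turns from the pool.
  const perWorker = opts.worker.innerTurns + 2;
  const budget = opts.budget ?? DEFAULT_SPAWNS * perWorker;
  return {
    router: opts.router ?? opts.worker.router, // share the worker's router unless separated
    budget,
    maxLiveWorkers: opts.maxLiveWorkers ?? 1, // serialize to avoid shared-artifact races
    analysts: opts.analysts === null ? [] : opts.analysts ?? ["failuresAnalyst"],
    maxSpawns: Math.floor(budget / perWorker), // how many reservations fit the pool
  };
}

const sub: Router = { name: "substrate-A" };
const r = resolveDefaults({ surface: {}, worker: { router: sub, innerTurns: 6 } });
console.log(r.router === sub, r.budget, r.maxLiveWorkers, r.maxSpawns); // true 32 1 4

const off = resolveDefaults({ surface: {}, worker: { router: sub, innerTurns: 6 }, analysts: null, budget: 20 });
console.log(off.analysts.length, off.maxSpawns); // 0 2
```

Sizing the default budget as a multiple of the per-worker reservation guarantees that the default pool always admits a whole number of workers.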
tangletools
left a comment
✅ Auto-approved drewstone PR — b260c415
This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: drewstone_author · 2026-06-30T01:50:25Z
tangletools
left a comment
🟢 Value Audit — sound
| Verdict | sound |
|---|---|
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 125.8s (2 bridge agents) |
| Total | 125.8s |
💰 Value — error
The value agent produced no parseable value-audit JSON.
- Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 3
- Bridge error: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content; opencode/zai-coding-plan/glm-5.2: bridge stream ended without value-audit content; opencode/deepseek/deepseek-v4-pro: bridge stream ended without value-audit content
🎯 Usefulness — sound
Promotes a surface-solve-over-supervise pattern from duplicated example code into a cleanly-architected core export at the correct composability layer above both supervise() and runAgentic(), with two immediate callers converted.
- Integration: Reachable: barrel-exported from `src/runtime/index.ts:496-503` and immediately called by both ablation arms (`ablation.ts:213` and `gepa-driver-prompt.ts:121`), which previously used the deleted `selfImprovingSupervisor`/`surfaceWorkerSeam` pair. The deleted files are fully replaced; no dangling references remain (grep'd for `selfImprovingSupervisor`/`surfaceWorker` finds no callers).
- Fit with existing patterns: Correct layering: `strategy.ts` imports from `supervise/` (lines 31-44), so `superviseSurface` lives at `src/runtime/`, importing from both and layering above the cycle. Matches the established `supervise(profile, task, opts)` signature shape (same argument order). Does not compete with `delegate()` (`delegate.ts:1-10`), which takes an open-ended INTENT and has the supervisor author worker profiles…
- Real-world viability: Sensible defaults enable the minimal call `superviseSurface(profile, task, { surface, worker })`: the default router reuses the worker's, the default budget scales off `innerTurns`, the default `maxLiveWorkers: 1` serializes workers to avoid shared-artifact races, and `analysts: null` cleanly disables self-improvement. The per-worker reservation is `innerTurns + 2` so that multiple workers fit the pool (a debugged lesson from the delete…
- Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 3
- Bridge warning: opencode/zai-coding-plan/glm-5.2: bridge stream ended without value-audit content; opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
…ise-and-polish
# Conflicts:
# docs/api/primitive-catalog.md
What
Folds the example worker-seam + "self-improving supervisor" wrapper pair into ONE exported core capability:
Drives a team of agents to solve an `AgenticSurface` task: workers `runAgentic` the surface (`refine` by default, strategy-pluggable), settle on the surface's OWN check (settled ⟺ resolved), and the driver self-improves from the still-failing tests by default (`failuresAnalyst`; swap via `analysts`, or pass `null` to turn it off). Returns the deployable outcome + the full conserved spend.

Why this design (the key call)
`runAgentic` depends on the supervise core (`strategy.ts` → `supervise/`), so a surface-solving worker cannot be a `supervise()` built-in backend without an import cycle. `superviseSurface` therefore lives at the composition layer above `supervise` + `runAgentic`, the correct home for "supervise over a graded surface". The within-run self-improvement is the authored analyst; the across-run kind wraps this in `improve()`/`selfImprove`.

Impact
- Deleted `examples/ablation-suite/{surface-worker,self-improving-supervisor}.ts`; the ablation + GEPA arms call `superviseSurface` directly, with no wrappers and no duplication.
- New `src/runtime/supervise-surface.ts` + barrel export + a canonical-api decision row (anti-reinvention).

Verification
- `tsc` clean on src and examples (0/0); docs-freshness gate green (regenerated `docs/api`).
- … `superviseSurface`): resolves 100%, full metrics captured, zero errors.
- `supervise()` core untouched.