refactor(runtime): superviseSurface — one capability for supervise-over-a-graded-surface by drewstone · Pull Request #417 · tangle-network/agent-runtime

drewstone · 2026-06-30T01:43:04Z

What

Folds the example worker-seam + "self-improving supervisor" wrapper pair into ONE exported core capability:

superviseSurface(profile, { surface, task, worker, budget, router })

Drives a team of agents to solve an AgenticSurface task. Workers run the surface through runAgentic (refine by default, strategy-pluggable) and settle on the surface's OWN check (settled ⟺ resolved); by default the driver also self-improves from the still-failing tests (failuresAnalyst; swap it via analysts, or pass null to turn it off). Returns the deployable outcome plus the full conserved spend.
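The settle-and-improve loop described above can be sketched as follows. This is a minimal, synchronous illustration, not the package's implementation: the Surface, Worker and Analyst shapes, the lesson strings, and the names superviseSurfaceSketch and the stand-in failuresAnalyst are hypothetical, and the spawn count stands in for the conserved spend. It shows the two rules the description states: the surface's own check is the only stopping condition, and analysts (on by default, off when passed null) turn still-failing tests into lessons for the next worker.

```typescript
// Hypothetical shapes: the PR does not show the real AgenticSurface or worker types.
interface CheckResult {
  failing: string[]; // tests the artifact still fails
}

interface Surface {
  check(artifact: string): CheckResult;
}

type Worker = (task: string, lessons: string[]) => string;
type Analyst = (failing: string[]) => string | null;

// Stand-in for the default analyst: turns still-failing tests into one lesson.
const failuresAnalyst: Analyst = (failing) =>
  failing.length === 0 ? null : `still failing: ${failing.join(", ")}`;

interface Outcome {
  resolved: boolean;
  artifact: string | null;
  spawns: number; // stand-in for the spend the real function reports
}

function superviseSurfaceSketch(
  task: string,
  surface: Surface,
  worker: Worker,
  maxSpawns: number,
  // undefined -> default analyst; null -> self-improvement off
  analysts: Analyst[] | null = [failuresAnalyst],
): Outcome {
  const lessons: string[] = [];
  let artifact: string | null = null;
  for (let spawn = 1; spawn <= maxSpawns; spawn++) {
    artifact = worker(task, lessons);
    const { failing } = surface.check(artifact);
    // Settled iff resolved: the surface's own check is the only stopping rule.
    if (failing.length === 0) return { resolved: true, artifact, spawns: spawn };
    // Within-run self-improvement: analysts read the failures and add lessons.
    for (const analyst of analysts ?? []) {
      const lesson = analyst(failing);
      if (lesson !== null) lessons.push(lesson);
    }
  }
  return { resolved: false, artifact, spawns: maxSpawns };
}

// Usage: a toy surface whose "tests" are substrings the artifact must contain.
const toySurface: Surface = {
  check: (artifact) => ({
    failing: ["a", "b"].filter((t) => artifact.indexOf(t) === -1),
  }),
};
const toyWorker: Worker = (_task, lessons) => (lessons.length === 0 ? "a" : "ab");

console.log(superviseSurfaceSketch("toy", toySurface, toyWorker, 3));
console.log(superviseSurfaceSketch("toy", toySurface, toyWorker, 3, null));
```

With the default analyst the toy run resolves on the second spawn; with analysts set to null the worker never learns about the missing "b" and the run uses all three spawns without resolving.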

Why this design (the key call)

runAgentic depends on the supervise core (strategy.ts → supervise/), so a surface-solving worker cannot be a supervise() built-in backend without an import cycle. superviseSurface therefore lives at the composition layer above supervise + runAgentic, which is the correct home for "supervise over a graded surface". Within-run self-improvement comes from the authored analyst; across-run self-improvement wraps this call in improve()/selfImprove.
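The layering argument reduces to a cycle check on the import graph. A minimal sketch, assuming a toy graph containing only the edge the PR names (strategy.ts → supervise/) plus each candidate new edge; hasCycle is a hypothetical helper that runs a depth-first search for a back edge, and the module keys are labels rather than real resolution paths:

```typescript
// Toy module graph: an edge A -> B means "A imports B".
type Graph = Record<string, string[]>;

// Depth-first search; reaching a node that is still "visiting" is a back edge, i.e. a cycle.
function hasCycle(graph: Graph): boolean {
  const state: Record<string, "visiting" | "done"> = {};
  const visit = (node: string): boolean => {
    if (state[node] === "visiting") return true;
    if (state[node] === "done") return false;
    state[node] = "visiting";
    for (const next of graph[node] ?? []) {
      if (visit(next)) return true;
    }
    state[node] = "done";
    return false;
  };
  return Object.keys(graph).some((node) => visit(node));
}

// Option A: a surface worker as a supervise() built-in makes supervise/ import strategy.ts.
const builtIn: Graph = {
  "strategy.ts": ["supervise/"],
  "supervise/": ["strategy.ts"],
};

// Option B: a new module above both imports each of them; nothing imports it back.
const composed: Graph = {
  "strategy.ts": ["supervise/"],
  "supervise/": [],
  "supervise-surface.ts": ["supervise/", "strategy.ts"],
};

console.log(hasCycle(builtIn), hasCycle(composed));
```

Option A closes a loop between the two modules, while option B keeps the graph acyclic because the new module only adds outgoing edges.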

Impact

Simpler: deletes examples/ablation-suite/{surface-worker,self-improving-supervisor}.ts; the ablation + GEPA arms call superviseSurface directly — no wrappers, no duplication.
More capable: every package user gains "supervise agents over a graded task" as one call.
New src/runtime/supervise-surface.ts + barrel export + a canonical-api decision row (anti-reinvention).

Verification

tsc clean on src and examples (0/0); docs-freshness gate green (regenerated docs/api).
Live smoke (the supervisor arm through superviseSurface): resolves 100%, full metrics captured, zero errors.
Purely additive to the published surface — supervise() core untouched.

…se-over-a-graded-surface

Folds the example worker-seam + 'self-improving supervisor' wrapper pair into ONE exported core capability. superviseSurface(profile, { surface, task, worker, budget, router }) drives a team of agents to solve an AgenticSurface task: workers runAgentic the surface (refine by default, strategy-pluggable), settle on the surface's OWN check (settled ⟺ resolved), and the driver self-improves from the failing tests by default (failuresAnalyst; swap via analysts).

It lives at the composition layer (above supervise + runAgentic) because runAgentic depends on the supervise core; a surface worker can't be a supervise built-in without an import cycle.

- new src/runtime/supervise-surface.ts + barrel export.
- deleted examples/ablation-suite/{surface-worker,self-improving-supervisor}.ts; the ablation + gepa arms now call superviseSurface directly (no wrappers, no duplication).
- canonical-api decision row + regenerated docs/api.

tangletools

✅ Auto-approved drewstone PR — `7ce90bc6`

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: drewstone_author · 2026-06-30T01:43:10Z}

…sk, opts), default router + budget

- task is positional (one mental model with supervise(profile, task, opts)).
- router defaults to the worker's substrate (driver + workers share one router unless separated).
- budget defaults to a handful of worker spawns sized off the worker bounds.

Minimal call is now superviseSurface(profile, task, { surface, worker }).
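The defaults in this commit can be sketched as a small option-resolution step. This assumes hypothetical SurfaceOpts, WorkerSpec and Router shapes (the real option types are not shown in the PR) and uses DEFAULT_SPAWNS = 4 as an arbitrary stand-in for "a handful"; the innerTurns + 2 per-worker reservation, maxLiveWorkers: 1, and analysts: null disabling self-improvement come from the usefulness audit on this PR.

```typescript
// Hypothetical option shapes; field names mirror the PR text, not a published type.
interface Router {
  name: string;
}
interface WorkerSpec {
  router: Router;
  innerTurns: number;
}
type Analyst = (failing: string[]) => string | null;

interface SurfaceOpts {
  surface: { name: string }; // placeholder for the real surface object
  worker: WorkerSpec;
  router?: Router;
  budget?: number;
  maxLiveWorkers?: number;
  analysts?: Analyst[] | null;
}

const DEFAULT_SPAWNS = 4; // hypothetical "handful" of worker spawns
const failuresAnalyst: Analyst = (failing) => (failing.length ? failing.join(",") : null);

function resolveOpts(opts: SurfaceOpts) {
  return {
    ...opts,
    // Driver and workers share the worker's router unless one is passed explicitly.
    router: opts.router ?? opts.worker.router,
    // Budget sized off the worker bounds: a few spawns, each reserving innerTurns + 2.
    budget: opts.budget ?? DEFAULT_SPAWNS * (opts.worker.innerTurns + 2),
    // One live worker at a time avoids races on a shared artifact.
    maxLiveWorkers: opts.maxLiveWorkers ?? 1,
    // undefined -> default analyst; explicit null -> self-improvement off.
    analysts: opts.analysts === undefined ? [failuresAnalyst] : opts.analysts,
  };
}

// Usage: the minimal call only supplies the surface and the worker.
const exampleWorker: WorkerSpec = { router: { name: "worker-router" }, innerTurns: 6 };
const resolved = resolveOpts({ surface: { name: "repo-fix" }, worker: exampleWorker });
console.log(resolved.router.name, resolved.budget, resolved.maxLiveWorkers);
```

Distinguishing `undefined` from `null` for analysts is what lets "omit it" mean the default analyst while "pass null" means no self-improvement at all.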

tangletools

✅ Auto-approved drewstone PR — `b260c415`

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: drewstone_author · 2026-06-30T01:50:25Z}

tangletools

🟢 Value Audit — sound


Verdict	sound
Concerns	0 (none)
Heuristic	0.0s
Duplication	0.0s
Interrogation	125.8s (2 bridge agents)
Total	125.8s

💰 Value — error

value agent produced no parseable value-audit JSON.

Model: opencode/deepseek/deepseek-v4-pro
Bridge attempts: 3
Bridge error: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content; opencode/zai-coding-plan/glm-5.2: bridge stream ended without value-audit content; opencode/deepseek/deepseek-v4-pro: bridge stream ended without value-audit content

🎯 Usefulness — sound

Promotes a surface-solve-over-supervise pattern from duplicated example code into a cleanly-architected core export at the correct composability layer above both supervise() and runAgentic(), with two immediate callers converted.

Integration: Reachable: barrel-exported from src/runtime/index.ts:496-503, immediately called by both ablation arms (ablation.ts:213 and gepa-driver-prompt.ts:121) which previously used the deleted selfImprovingSupervisor/surfaceWorkerSeam pair. The deleted files are fully replaced — no dangling references remain (grep'd for selfImprovingSupervisor/surfaceWorker finds no callers).
Fit with existing patterns: Correct layering: strategy.ts imports from supervise/ (line 31-44), so superviseSurface lives at src/runtime/ importing from both — layering above the cycle. Matches the established supervise(profile, task, opts) signature shape (same argument order). Does not compete with delegate() (delegate.ts:1-10), which takes an open-ended INTENT and has the supervisor author worker profiles —
Real-world viability: Sensible defaults enable the minimal call superviseSurface(profile, task, { surface, worker }): the default router reuses the worker's, the default budget scales off innerTurns, the default maxLiveWorkers:1 serializes workers to avoid shared-artifact races, and analysts:null cleanly disables self-improvement. Per-worker reservation is innerTurns+2 so multiple workers fit the pool (a debugged lesson from the delete
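The reservation rule can be illustrated with a toy turn pool. This is a sketch under stated assumptions: TurnPool, its hold/charge/release accounting, and the 20-turn total are hypothetical; only the innerTurns + 2 hold per worker comes from the audit. Because each worker holds a bounded slice rather than the whole budget, a pool sized for a few spawns can admit more than one worker.

```typescript
// Hypothetical turn pool. The innerTurns + 2 hold comes from the audit note;
// the hold/charge/release accounting is an assumption made for illustration.
class TurnPool {
  private spent = 0;
  private held = 0;

  constructor(private readonly total: number) {}

  available(): number {
    return this.total - this.spent - this.held;
  }

  totalSpent(): number {
    return this.spent;
  }

  // Hold innerTurns + 2 turns for one worker; null when the pool cannot cover it.
  reserve(innerTurns: number): number | null {
    const hold = innerTurns + 2;
    if (hold > this.available()) return null;
    this.held += hold;
    return hold;
  }

  // Settle a finished worker: charge the turns it used (capped at its hold)
  // and return the rest of the hold to the pool.
  settle(hold: number, used: number): void {
    this.held -= hold;
    this.spent += Math.min(used, hold);
  }
}

// Usage: a 20-turn pool with innerTurns = 6 holds 8 turns per worker.
const pool = new TurnPool(20);
const first = pool.reserve(6);
const second = pool.reserve(6);
const third = pool.reserve(6); // only 4 turns left, so this is refused
console.log(first, second, third);
if (first !== null) pool.settle(first, 5);
console.log(pool.totalSpent(), pool.available());
```

Charging at most the hold means a worker can never spend turns the pool did not set aside for it.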
Model: opencode/deepseek/deepseek-v4-pro
Bridge attempts: 3
Bridge warning: opencode/zai-coding-plan/glm-5.2: bridge stream ended without value-audit content; opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content

No concerns — sound change, no better or existing approach found. ✅

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass	What it asks
Heuristic	Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication	Do added function/class names already exist elsewhere in the repo?
Value Audit	What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit	Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260630T015708Z}


tangletools previously approved these changes Jun 30, 2026


drewstone dismissed tangletools’s stale review via b260c41 June 30, 2026 01:50

tangletools previously approved these changes Jun 30, 2026


tangletools reviewed Jun 30, 2026


Merge remote-tracking branch 'origin/main' into chore/collapse-superv…

ce64bf7

…ise-and-polish

# Conflicts:
#   docs/api/primitive-catalog.md

drewstone dismissed tangletools’s stale review via ce64bf7 June 30, 2026 02:13

drewstone merged commit 8089ea1 into main Jun 30, 2026
1 check passed

drewstone merged 3 commits into main from chore/collapse-supervise-and-polish


2 participants
