refactor(runtime): superviseSurface — one capability for supervise-over-a-graded-surface#417

Merged
drewstone merged 3 commits into main from chore/collapse-supervise-and-polish
Jun 30, 2026

@drewstone

What

Folds the example worker-seam + "self-improving supervisor" wrapper pair into ONE exported core capability:

superviseSurface(profile, { surface, task, worker, budget, router })

Drives a team of agents to solve an AgenticSurface task — workers runAgentic the surface (refine by default, strategy-pluggable), settle on the surface's OWN check (settled ⟺ resolved), and the driver self-improves from the still-failing tests by default (failuresAnalyst; swap via analysts, or null to turn off). Returns the deployable outcome + the full conserved spend.
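
As a rough illustration of the loop described above: a worker attempts the surface, the run settles only when the surface's own check reports no failing tests, and the still-failing tests are fed back as hints for the next attempt. Every name here (`Surface`, `Worker`, `Analyst`, `superviseSketch`, `maxSpawns`) is hypothetical and heavily simplified; this is a sketch under those assumptions, not the actual `superviseSurface` implementation or signature.

```typescript
// Hypothetical, simplified types; not the real AgenticSurface or worker API.
interface CheckResult {
  failing: string[]; // names of tests that still fail
}

interface Surface {
  check(artifact: string): CheckResult; // the surface's OWN check
}

interface Attempt {
  artifact: string;
  spend: number; // cost units consumed by this attempt
}

type Worker = (task: string, hints: string[]) => Attempt;

// An analyst turns the still-failing tests into hints for the next worker.
type Analyst = (failing: string[]) => string[];

interface Outcome {
  settled: boolean; // settled <=> resolved: true only when no test fails
  artifact: string | null;
  totalSpend: number; // every attempt's spend is summed, failures included
  spawns: number;
}

function superviseSketch(
  task: string,
  surface: Surface,
  worker: Worker,
  maxSpawns: number,
  analyst: Analyst | null = (failing) => failing.map((t) => `fix ${t}`),
): Outcome {
  let hints: string[] = [];
  let totalSpend = 0;
  let artifact: string | null = null;

  for (let spawns = 1; spawns <= maxSpawns; spawns++) {
    const attempt = worker(task, hints);
    totalSpend += attempt.spend;
    artifact = attempt.artifact;

    const { failing } = surface.check(attempt.artifact);
    if (failing.length === 0) {
      return { settled: true, artifact, totalSpend, spawns };
    }
    // analyst === null turns self-improvement off: hints stay empty.
    hints = analyst ? analyst(failing) : [];
  }
  return { settled: false, artifact, totalSpend, spawns: maxSpawns };
}

// Toy surface: the artifact must mention both "a" and "b".
const toySurface: Surface = {
  check: (artifact) => ({
    failing: ["a", "b"].filter((t) => !artifact.includes(t)),
  }),
};

// Toy worker: starts with "a" and appends whatever the hints ask for.
const toyWorker: Worker = (_task, hints) => ({
  artifact: "a" + hints.map((h) => h.replace("fix ", "")).join(""),
  spend: 3,
});

const withAnalyst = superviseSketch("demo", toySurface, toyWorker, 4);
const withoutAnalyst = superviseSketch("demo", toySurface, toyWorker, 4, null);
console.log(withAnalyst, withoutAnalyst);
```

In this toy setup the run with the default analyst settles on the second spawn, while the run with `null` repeats the same failing attempt until the spawn limit is hit; in both cases the spend of failed attempts stays in the total.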

Why this design (the key call)

runAgentic depends on the supervise core (strategy.ts imports from supervise/), so a surface-solving worker cannot be a supervise() built-in backend without an import cycle. superviseSurface therefore lives at the composition layer above supervise + runAgentic, the correct home for "supervise over a graded surface". The within-run self-improvement is the authored analyst; the across-run kind wraps this in improve()/selfImprove.
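
The layering argument can be checked mechanically. The sketch below models the two candidate layouts as import graphs and runs a depth-first cycle search over each. The module names are simplified stand-ins for the files named above, and `findCycle` is an illustrative helper, not something in the repo.

```typescript
// Module names are simplified stand-ins taken from the description above.
type Graph = Record<string, string[]>;

// Depth-first search that returns one import cycle as a path, or null if none.
function findCycle(graph: Graph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (node: string): string[] | null => {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // Back edge: the cycle is the stack from the first visit of `node`.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

// Layout chosen in this PR: the composition layer imports both modules.
const composed: Graph = {
  "supervise-surface": ["supervise", "strategy"],
  strategy: ["supervise"], // runAgentic depends on the supervise core
  supervise: [],
};

// Rejected layout: supervise ships a built-in surface worker that needs runAgentic.
const builtIn: Graph = {
  strategy: ["supervise"],
  supervise: ["strategy"],
};

const composedCycle = findCycle(composed);
const builtInCycle = findCycle(builtIn);
console.log(composedCycle, builtInCycle);
```

The composed layout has no cycle, while the built-in layout loops from strategy to supervise and back, which is the cycle the description is avoiding.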

Impact

  • Simpler: deletes examples/ablation-suite/{surface-worker,self-improving-supervisor}.ts; the ablation + GEPA arms call superviseSurface directly — no wrappers, no duplication.
  • More capable: every package user gains "supervise agents over a graded task" as one call.
  • New src/runtime/supervise-surface.ts + barrel export + a canonical-api decision row (anti-reinvention).

Verification

  • tsc clean on src and examples (0/0); docs-freshness gate green (regenerated docs/api).
  • Live smoke (the supervisor arm through superviseSurface): resolves 100%, full metrics captured, zero errors.
  • Purely additive to the published surface — supervise() core untouched.

…se-over-a-graded-surface

Folds the example worker-seam + 'self-improving supervisor' wrapper pair into ONE exported
core capability. superviseSurface(profile, { surface, task, worker, budget, router }) drives
a team of agents to solve an AgenticSurface task: workers runAgentic the surface (refine by
default, strategy-pluggable), settle on the surface's OWN check (settled ⟺ resolved), and the
driver self-improves from the failing tests by default (failuresAnalyst; swap via analysts).

It lives at the composition layer (above supervise + runAgentic) because runAgentic depends on
the supervise core — a surface worker can't be a supervise built-in without an import cycle.

- new src/runtime/supervise-surface.ts + barrel export.
- deleted examples/ablation-suite/{surface-worker,self-improving-supervisor}.ts — the ablation
  + gepa arms now call superviseSurface directly (no wrappers, no duplication).
- canonical-api decision row + regenerated docs/api.
tangletools previously approved these changes Jun 30, 2026

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — 7ce90bc6

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-06-30T01:43:10Z

…sk, opts), default router + budget

- task is positional (one mental model with supervise(profile, task, opts)).
- router defaults to the worker's substrate (driver + workers share one router unless separated).
- budget defaults to a handful of worker spawns sized off the worker bounds.
Minimal call is now superviseSurface(profile, task, { surface, worker }).
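
A minimal sketch of how defaults like these could be resolved, assuming hypothetical option shapes. `resolveOptions`, `WorkerSpec`, and `DEFAULT_SPAWNS` are illustrative names, the spawn count of 4 is an assumption (the commit only says "a handful"), and the `innerTurns + 2` per-worker reservation is borrowed from the reviewer audit on this PR rather than from the code itself.

```typescript
// Hypothetical option shapes; not the real superviseSurface types.
interface Router {
  name: string;
}

interface WorkerSpec {
  router: Router;
  innerTurns: number; // assumed bound on turns inside one worker run
}

interface SurfaceOptions {
  surface: string;
  worker: WorkerSpec;
  router?: Router;
  budget?: number;
}

interface ResolvedOptions {
  surface: string;
  worker: WorkerSpec;
  router: Router;
  budget: number;
}

// Assumed value for "a handful of worker spawns"; the PR does not state it.
const DEFAULT_SPAWNS = 4;

function resolveOptions(opts: SurfaceOptions): ResolvedOptions {
  // Per-worker reservation of innerTurns + 2, as mentioned in the audit note.
  const perWorker = opts.worker.innerTurns + 2;
  return {
    surface: opts.surface,
    worker: opts.worker,
    // Driver and workers share one router unless the caller separates them.
    router: opts.router ?? opts.worker.router,
    // Default budget is sized off the worker's own bounds.
    budget: opts.budget ?? DEFAULT_SPAWNS * perWorker,
  };
}

const worker: WorkerSpec = { router: { name: "worker-router" }, innerTurns: 6 };

// Minimal call: only surface and worker are supplied.
const minimal = resolveOptions({ surface: "demo-surface", worker });

// Explicit overrides win over both defaults.
const explicit = resolveOptions({
  surface: "demo-surface",
  worker,
  router: { name: "driver-router" },
  budget: 10,
});

console.log(minimal, explicit);
```

With these assumed numbers the minimal call reuses the worker's router and gets a budget of 32, while explicit `router` and `budget` values pass through unchanged.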
tangletools previously approved these changes Jun 30, 2026

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — b260c415

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-06-30T01:50:25Z

@tangletools tangletools left a comment

🟢 Value Audit — sound

Verdict sound
Concerns 0 (none)
Heuristic 0.0s
Duplication 0.0s
Interrogation 125.8s (2 bridge agents)
Total 125.8s

💰 Value — error

value agent produced no parseable value-audit JSON.

  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 3
  • Bridge error: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content; opencode/zai-coding-plan/glm-5.2: bridge stream ended without value-audit content; opencode/deepseek/deepseek-v4-pro: bridge stream ended without value-audit content

🎯 Usefulness — sound

Promotes a surface-solve-over-supervise pattern from duplicated example code into a cleanly-architected core export at the correct composability layer above both supervise() and runAgentic(), with two immediate callers converted.

  • Integration: Reachable: barrel-exported from src/runtime/index.ts:496-503, immediately called by both ablation arms (ablation.ts:213 and gepa-driver-prompt.ts:121) which previously used the deleted selfImprovingSupervisor/surfaceWorkerSeam pair. The deleted files are fully replaced — no dangling references remain (grep'd for selfImprovingSupervisor/surfaceWorker finds no callers).
  • Fit with existing patterns: Correct layering: strategy.ts imports from supervise/ (line 31-44), so superviseSurface lives at src/runtime/ importing from both — layering above the cycle. Matches the established supervise(profile, task, opts) signature shape (same argument order). Does not compete with delegate() (delegate.ts:1-10), which takes an open-ended INTENT and has the supervisor author worker profiles —
  • Real-world viability: Sensible defaults enable the minimal call superviseSurface(profile, task, { surface, worker }) — default router reuses the worker's, default budget scales off innerTurns, default maxLiveWorkers:1 serializes to avoid shared-artifact races, analysts:null clean-disables self-improvement. Per-worker reservation is innerTurns+2 so multiple workers fit the pool (a debugged lesson from the delete
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 3
  • Bridge warning: opencode/zai-coding-plan/glm-5.2: bridge stream ended without value-audit content; opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content

No concerns — sound change, no better or existing approach found. ✅


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260630T015708Z

…ise-and-polish

# Conflicts:
#	docs/api/primitive-catalog.md
@drewstone drewstone merged commit 8089ea1 into main Jun 30, 2026
1 check passed