docs(examples): intelligence-coding-bench — full Intelligence SDK over the webcode benchmark (#418)

Merged: drewstone merged 1 commit into main from feat/intelligence-coding-bench on Jun 30, 2026

@drewstone (Contributor)

What

A new example that takes the webcode-matrix harness×model coding benchmark and wraps every cell in the full Tangle Intelligence SDK. It imports the exact grid and task set from the neighboring webcode-matrix example (no fork) and adds only the instrumentation.

The three intelligence layers (all on one cell)

| Layer | Primitive | Gives you |
|---|---|---|
| Boundary | withTangleIntelligence(cell, { project, effort }) | The bill + the control. effort ∈ off · eco · standard · thorough · max; 'off' = provable passthrough floor (intelligence spend clamped to 0, cell still runs) |
| Waterfall | createWaterfallCollector() | The cost truth: the sum of its spans IS the billed run cost, per tool/phase |
| OTLP | createOtelExporter() + loopEventToOtelSpan | Streams every span to your OTLP/HTTP collector (no-op until OTEL_EXPORTER_OTLP_ENDPOINT is set) |
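
To make the 'off' floor concrete, here is a minimal sketch of an effort-tiered boundary over an arbitrary async cell. It is not the SDK's withTangleIntelligence: wrapWithBoundary, BoundaryResult, the requestedSpendUsd option, and the per-tier dollar ceilings are all hypothetical, chosen only to show the two properties claimed for 'off': the cell still runs and its output passes through unchanged, while intelligence spend is clamped to 0.

```typescript
// Hypothetical sketch: NOT the Tangle SDK's withTangleIntelligence.
type Effort = "off" | "eco" | "standard" | "thorough" | "max";

// Hypothetical per-tier ceilings on intelligence spend, in USD. 'off' is the floor: 0.
const INTELLIGENCE_CEILING_USD: Record<Effort, number> = {
  off: 0,
  eco: 0.05,
  standard: 0.25,
  thorough: 1,
  max: 5,
};

interface BoundaryResult<T> {
  output: T; // the cell's own return value, passed through untouched
  project: string; // attribution key for the bill (hypothetical)
  effort: Effort;
  intelligenceSpendUsd: number; // clamped to the tier's ceiling
}

// Wraps any async fn. requestedSpendUsd stands in for whatever the intelligence
// layer would have spent on this call; the boundary clamps it to the tier.
function wrapWithBoundary<A extends unknown[], T>(
  cell: (...args: A) => Promise<T>,
  opts: { project: string; effort: Effort; requestedSpendUsd?: number },
): (...args: A) => Promise<BoundaryResult<T>> {
  const ceiling = INTELLIGENCE_CEILING_USD[opts.effort];
  return async (...args: A) => {
    const output = await cell(...args); // the cell always runs, even at 'off'
    const requested = Math.max(opts.requestedSpendUsd ?? 0, 0);
    return {
      output,
      project: opts.project,
      effort: opts.effort,
      intelligenceSpendUsd: Math.min(requested, ceiling),
    };
  };
}

// Usage: at 'off', the output is unchanged and spend is clamped to 0.
const cell = async (task: string): Promise<string> => `solved:${task}`;
const wrapped = wrapWithBoundary(cell, { project: "webcode", effort: "off", requestedSpendUsd: 0.4 });
wrapped("todo-app").then((r) => console.log(r.output, r.intelligenceSpendUsd)); // solved:todo-app 0
```

Clamping at the boundary, rather than inside the cell, is what lets such a floor hold over any async function without touching the function itself.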

Two seams: the boundary wraps the whole cell, so it works over any async fn; the internal trace rides openSandboxRun's hooks, because openSandboxRun is the one run-verb that emits per-tool spans.
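
The second seam can be sketched under the assumption of a hook interface that fires once at the start and once at the end of every tool call. ToolHooks, makeWaterfall, and the cost fields are hypothetical stand-ins, not the signatures of openSandboxRun or createWaterfallCollector. The sketch shows why a waterfall can serve as the cost truth: if every billed unit is attached to exactly one span, the run total is simply the sum of the spans, and the same spans regroup per tool or per phase.

```typescript
// Hypothetical sketch: NOT the openSandboxRun hooks or createWaterfallCollector API.
interface Span {
  callId: string;
  tool: string;
  phase: string;
  startMs: number;
  endMs: number;
  costUsd: number; // each billed unit is attributed to exactly one span
}

interface ToolHooks {
  onToolStart(callId: string, tool: string, phase: string, atMs: number): void;
  onToolEnd(callId: string, costUsd: number, atMs: number): void;
}

function makeWaterfall() {
  const spans: Span[] = [];
  // Keyed by call id so concurrent calls to the same tool do not collide.
  const open = new Map<string, { tool: string; phase: string; startMs: number }>();

  const hooks: ToolHooks = {
    onToolStart(callId, tool, phase, atMs) {
      open.set(callId, { tool, phase, startMs: atMs });
    },
    onToolEnd(callId, costUsd, atMs) {
      const started = open.get(callId);
      if (!started) return; // unmatched end: drop it rather than invent a span
      open.delete(callId);
      spans.push({ callId, ...started, endMs: atMs, costUsd });
    },
  };

  // The run total is the sum of span costs, so it cannot disagree with the breakdown.
  const totalUsd = (): number => spans.reduce((sum, s) => sum + s.costUsd, 0);

  // Group the same spans by tool or by phase.
  const breakdown = (key: "tool" | "phase"): Record<string, number> => {
    const out: Record<string, number> = {};
    for (const s of spans) out[s[key]] = (out[s[key]] ?? 0) + s.costUsd;
    return out;
  };

  return { hooks, spans, totalUsd, breakdown };
}

// Usage with made-up costs (binary-exact values, so the sums print cleanly).
const wf = makeWaterfall();
wf.hooks.onToolStart("c1", "bash", "execute", 0);
wf.hooks.onToolEnd("c1", 0.25, 40);
wf.hooks.onToolStart("c2", "edit", "execute", 40);
wf.hooks.onToolEnd("c2", 0.5, 90);
wf.hooks.onToolStart("c3", "bash", "verify", 90);
wf.hooks.onToolEnd("c3", 0.125, 120);
console.log(wf.totalUsd()); // 0.875
console.log(wf.breakdown("tool")); // { bash: 0.375, edit: 0.5 }
console.log(wf.breakdown("phase")); // { execute: 0.75, verify: 0.125 }
```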

Linked, both ways

  • webcode-matrix exports its grid, tasks, and WebCodeTask; this example imports them, so it runs the same benchmark with an observability view on top.
  • README links: a row in the main showcase plus an "instrument it" back-link in webcode-matrix/README.md.
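
The reuse pattern can be sketched as follows, assuming only that the benchmark module exposes its grid and task list. The harness, model, and task values and the instrumentAll helper are hypothetical; in the real example the grid and tasks are imported from webcode-matrix rather than redefined. Because instrumentation wraps the runner instead of copying the benchmark, the instrumented run covers exactly the same harness × model cells.

```typescript
// Hypothetical sketch: these shapes stand in for what webcode-matrix exports
// (its grid, tasks, and WebCodeTask); the real definitions may differ.
interface TaskLike {
  id: string;
  prompt: string;
}
interface Cell {
  harness: string;
  model: string;
}

// Illustrative values only. In the example these are imported, never redefined,
// so the instrumented run cannot drift from the benchmark.
const harnesses = ["harness-a", "harness-b"];
const models = ["model-x", "model-y"];
const tasks: readonly TaskLike[] = [{ id: "t1", prompt: "build a todo app" }];

// One cell per harness × model pair.
const grid: Cell[] = harnesses.flatMap((harness) => models.map((model) => ({ harness, model })));

type RunCell = (cell: Cell, task: TaskLike) => Promise<string>;

// Instrumentation is layered on the runner; the grid and task list are reused as-is.
function instrumentAll(run: RunCell, wrap: (r: RunCell) => RunCell): Array<() => Promise<string>> {
  const wrapped = wrap(run);
  const jobs: Array<() => Promise<string>> = [];
  for (const cell of grid) {
    for (const task of tasks) jobs.push(() => wrapped(cell, task));
  }
  return jobs;
}

// Usage: a call-counting wrapper stands in for the real instrumentation.
let calls = 0;
const countCalls = (r: RunCell): RunCell => async (cell, task) => {
  calls += 1;
  return r(cell, task);
};
const run: RunCell = async (cell, task) => `${cell.harness}/${cell.model}:${task.id}`;
const jobs = instrumentAll(run, countCalls);
Promise.all(jobs.map((job) => job())).then((results) => console.log(results.length, results[0]));
// 4 harness-a/model-x:t1
```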

Verification

  • tsc clean on examples (0 errors); biome clean.
  • A $0 in-process smoke test of the new wiring: the withTangleIntelligence passthrough returns its input unchanged; the otel-hook adapter produces a valid span (normalized traceId + spanId + name); createOtelExporter() returns undefined without an endpoint.
  • The sandbox cell itself reuses the proven openSandboxRun pattern from webcode-matrix. The full 12-cell live run needs SANDBOX_API_KEY and EXA_API_KEY and, like webcode-matrix, is not run in CI because of its cost.
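
The two OTLP-related smoke checks (no exporter without an endpoint, and a valid span with normalized ids and a name) can be mirrored in a small self-contained sketch. exporterTargetFromEnv, toHexId, eventToSpan, and the event shape are hypothetical, and the id normalization scheme is an assumption. The fixed facts it relies on come from OpenTelemetry: OTLP trace ids are 16 bytes, span ids are 8 bytes, all-zero ids are invalid, and for OTLP/HTTP the traces path /v1/traces is appended to OTEL_EXPORTER_OTLP_ENDPOINT.

```typescript
// Hypothetical sketch: NOT createOtelExporter / loopEventToOtelSpan.
interface OtlpTarget {
  tracesUrl: string;
}

// No endpoint configured => no exporter, so instrumentation stays a no-op.
function exporterTargetFromEnv(env: Record<string, string | undefined>): OtlpTarget | undefined {
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT?.trim();
  if (!base) return undefined;
  // OTLP/HTTP: the traces signal path /v1/traces is appended to the base endpoint.
  return { tracesUrl: `${base.replace(/\/+$/, "")}/v1/traces` };
}

// OTLP trace ids are 16 bytes (32 hex chars) and span ids are 8 bytes (16 hex chars).
// Assumed normalization, intended for ids that are already hex or UUIDs: lowercase,
// drop non-hex characters (e.g. UUID dashes), keep the rightmost hexLen characters
// or left-pad with zeros.
function toHexId(raw: string, hexLen: 32 | 16): string {
  const hex = raw.toLowerCase().replace(/[^0-9a-f]/g, "");
  const fitted = hex.length >= hexLen ? hex.slice(-hexLen) : hex.padStart(hexLen, "0");
  if (/^0+$/.test(fitted)) throw new Error("all-zero ids are invalid in OTLP");
  return fitted;
}

// Hypothetical event shape; the SDK's loop events may carry different fields.
interface ToolEventLike {
  runId: string;
  stepId: string;
  tool: string;
}

interface SpanLike {
  traceId: string;
  spanId: string;
  name: string;
}

function eventToSpan(e: ToolEventLike): SpanLike {
  return { traceId: toHexId(e.runId, 32), spanId: toHexId(e.stepId, 16), name: `tool.${e.tool}` };
}

console.log(exporterTargetFromEnv({})); // undefined
console.log(exporterTargetFromEnv({ OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318/" })?.tracesUrl);
// http://localhost:4318/v1/traces
const span = eventToSpan({ runId: "3F2504E0-4F89-11D3-9A0C-0305E82C3301", stepId: "7", tool: "bash" });
console.log(span.traceId, span.spanId, span.name);
// 3f2504e04f8911d39a0c0305e82c3301 0000000000000007 tool.bash
```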

…nchmark with the full Intelligence SDK

Imports the EXACT webcode-matrix grid + tasks and wraps every harness×model cell in all three
Tangle Intelligence layers:
  1. withTangleIntelligence — the billing boundary + effort tiers ('off' = provable passthrough floor)
  2. createWaterfallCollector — the per-tool cost waterfall (sum of spans IS the billed cost)
  3. createOtelExporter + loopEventToOtelSpan — stream spans to an OTLP/HTTP collector

- new examples/intelligence-coding-bench/{intelligence-coding-bench.ts,README.md}
- webcode-matrix exports its grid + tasks + WebCodeTask so the example reuses the same benchmark
- bidirectional README links (main showcase row + webcode-matrix back-link)
drewstone merged commit c00383e into main on Jun 30, 2026
1 check passed