docs(examples): intelligence-coding-bench — full Intelligence SDK over the webcode benchmark by drewstone · Pull Request #418 · tangle-network/agent-runtime

drewstone · 2026-06-30T02:25:06Z

What

A new example that takes the webcode-matrix harness×model coding benchmark and wraps every cell in the full Tangle Intelligence SDK. It imports the exact grid + task set next door (no fork) and adds only the instrumentation.

The three intelligence layers (all on one cell)

| Layer | Primitive | Gives you |
| --- | --- | --- |
| Boundary | `withTangleIntelligence(cell, { project, effort })` | the bill + the control. `effort ∈ off·eco·standard·thorough·max`; `'off'` = provable passthrough floor (intelligence spend clamped to 0, cell still runs) |
| Waterfall | `createWaterfallCollector()` | the cost truth — sum of its spans IS the billed run cost, per tool/phase |
| OTLP | `createOtelExporter()` + `loopEventToOtelSpan` | stream every span to your OTLP/HTTP collector (no-op until `OTEL_EXPORTER_OTLP_ENDPOINT` set) |
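
To make the waterfall invariant concrete (the billed run cost is the plain sum of the span costs, and can be broken down per tool/phase), here is a minimal self-contained sketch. The `CostSpan` shape, the `summarize` helper, and the sample values are hypothetical and only illustrate the idea; they are not the types or output of `createWaterfallCollector()`.

```typescript
// Hypothetical span shape for illustration; not the SDK's actual type.
interface CostSpan {
  label: string;   // e.g. a tool call or model step
  phase: string;   // grouping key for the per-phase breakdown
  costUsd: number; // cost attributed to this span
}

// Sum span costs into a run total and a per-phase breakdown.
// The invariant: the per-phase subtotals add up to the run total.
function summarize(spans: CostSpan[]): { total: number; byPhase: Record<string, number> } {
  const byPhase: Record<string, number> = {};
  let total = 0;
  for (const span of spans) {
    total += span.costUsd;
    byPhase[span.phase] = (byPhase[span.phase] ?? 0) + span.costUsd;
  }
  return { total, byPhase };
}

// Made-up sample values, chosen only to show the arithmetic.
const demoSpans: CostSpan[] = [
  { label: "plan", phase: "model", costUsd: 0.5 },
  { label: "web_search", phase: "tool", costUsd: 0.25 },
  { label: "edit_file", phase: "tool", costUsd: 0.25 },
];
const report = summarize(demoSpans);
console.log(report.total, report.byPhase);
```
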

Two seams: the boundary wraps the whole cell (works over any async fn); the internal trace rides openSandboxRun's hooks (the one run-verb that emits per-tool spans).
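
The boundary seam can be pictured as a higher-order function over any async function. The sketch below is a hypothetical stand-in, not the signature of `withTangleIntelligence`: `withBoundary`, `BoundaryResult`, and the `instrument` callback are names invented for illustration. It shows only the `'off'` contract described above: the cell still runs, the intelligence work is skipped, and the spend is 0.

```typescript
// Hypothetical stand-in for the boundary seam; names and shapes are assumptions.
type Effort = "off" | "eco" | "standard" | "thorough" | "max";

interface BoundaryOptions {
  project: string;
  effort: Effort;
}

interface BoundaryResult<T> {
  output: T;
  intelligenceSpendUsd: number;
}

// Wraps any async function. `instrument` represents the intelligence work and
// resolves to what it spent; with effort 'off' it is never invoked.
function withBoundary<A, T>(
  cell: (input: A) => Promise<T>,
  opts: BoundaryOptions,
  instrument: (effort: Effort) => Promise<number>,
): (input: A) => Promise<BoundaryResult<T>> {
  return async (input: A) => {
    const output = await cell(input); // the cell always runs
    const intelligenceSpendUsd = opts.effort === "off" ? 0 : await instrument(opts.effort);
    return { output, intelligenceSpendUsd };
  };
}

// Usage: an echo cell wrapped at the passthrough floor.
const echoCell = async (task: string): Promise<string> => task;
const wrappedCell = withBoundary(echoCell, { project: "demo", effort: "off" }, async () => 0.01);
wrappedCell("fix the bug").then((r) => console.log(r.output, r.intelligenceSpendUsd));
```

Because the wrapper only needs an async function, the same pattern applies to any cell regardless of which harness or model it runs.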

Linked, both ways

webcode-matrix exports its grid + tasks + WebCodeTask; this example imports them — same benchmark, observability view.
README links: main showcase row + an "instrument it" back-link in webcode-matrix/README.md.

Verification

tsc clean on examples (0 errors); biome clean.
$0 in-process smoke of the new wiring: withTangleIntelligence passthrough returns input unchanged; the otel-hook adapter produces a valid span (normalized traceId + spanId + name); createOtelExporter() is undefined without an endpoint.
The sandbox cell itself reuses the proven openSandboxRun pattern from webcode-matrix. The full 12-cell live run needs SANDBOX_API_KEY + EXA_API_KEY and is not CI-run (cost) — same as webcode-matrix.
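
The smoke checks listed above can be approximated without the SDK. The sketch below uses hypothetical stand-ins (`makeExporter`, `normalizeHexId`, `isValidSpan`), so it mirrors the described behavior rather than the real `createOtelExporter()` or `loopEventToOtelSpan`. The ID lengths follow the OpenTelemetry convention: trace IDs are 32 lowercase hex characters, span IDs are 16, and all-zero IDs are invalid.

```typescript
// Hypothetical stand-ins for the smoke checks; not the SDK's functions.
interface SpanIds {
  traceId: string;
  spanId: string;
  name: string;
}

// No endpoint configured means no exporter, matching the described no-op behavior.
// In a real run the argument would come from process.env.OTEL_EXPORTER_OTLP_ENDPOINT.
function makeExporter(endpoint: string | undefined): { endpoint: string } | undefined {
  if (!endpoint) return undefined;
  return { endpoint };
}

// Lowercase, drop non-hex characters, then left-pad or truncate to the target length.
function normalizeHexId(raw: string, hexLength: number): string {
  let hex = raw.toLowerCase().replace(/[^0-9a-f]/g, "");
  while (hex.length < hexLength) hex = "0" + hex;
  return hex.slice(hex.length - hexLength);
}

// A span passes if both IDs have the right length, are not all zeros, and it has a name.
function isValidSpan(span: SpanIds): boolean {
  const ok = (id: string, n: number): boolean =>
    new RegExp(`^[0-9a-f]{${n}}$`).test(id) && !/^0+$/.test(id);
  return ok(span.traceId, 32) && ok(span.spanId, 16) && span.name.length > 0;
}

const smokeSpan: SpanIds = {
  traceId: normalizeHexId("4BF92F35-77B3-4DA6-A3CE-929D0E0E4736", 32),
  spanId: normalizeHexId("00F067AA0BA902B7", 16),
  name: "tool.call",
};
console.log(makeExporter(undefined), isValidSpan(smokeSpan));
```
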

…nchmark with the full Intelligence SDK

Imports the EXACT webcode-matrix grid + tasks and wraps every harness×model cell in all three Tangle Intelligence layers:

1. withTangleIntelligence — the billing boundary + effort tiers ('off' = provable passthrough floor)
2. createWaterfallCollector — the per-tool cost waterfall (sum of spans IS the billed cost)
3. createOtelExporter + loopEventToOtelSpan — stream spans to an OTLP/HTTP collector

- new examples/intelligence-coding-bench/{intelligence-coding-bench.ts,README.md}
- webcode-matrix exports its grid + tasks + WebCodeTask so the example reuses the same benchmark
- bidirectional README links (main showcase row + webcode-matrix back-link)

drewstone merged commit c00383e into main Jun 30, 2026
1 check passed

drewstone merged 1 commit into main from feat/intelligence-coding-bench
