Heron

The Wireshark for AI Agents

Passive agent observability — reconstructed from the traffic itself, off the wire or at the host's TLS boundary, never in the request path.

Zero SDK. Zero Proxy. Zero Intrusion. Replay a .pcap and instantly see every agent turn, tool call, and LLM interaction. No code changes. No cooperation from the workloads being observed.

One command. Live TTFT, latency, throughput, error rate, and per-agent mix — reconstructed straight off the wire. Try it now ↓

30-Second Quick Start

No live capture needed. No privileges. Just a .pcap with LLM traffic.

# Install (Linux/macOS, user-local, no sudo)
curl -fsSL https://raw.githubusercontent.com/Netis/heron/main/install.sh \
  | INSTALL_DIR="$HOME/.local" sh

# Replay a pcap — no privileges needed
heron --pcap-file capture.pcap --no-retention

Open http://localhost:3000 to see your agent turns, timelines, and metrics. After a pcap finishes replaying, the process keeps the API/console available so you can browse — press Ctrl+C to exit, or pass --exit-after-drain for batch/CI use that exits as soon as the pipeline drains.

No pcap handy? The repo ships fixtures in testdata/pcaps/; replay any of them.

Have a live interface? Run heron -i eth0 (needs CAP_NET_RAW on Linux; see the install docs). Grant the capability once with sudo setcap cap_net_raw,cap_net_admin=eip ~/.local/bin/heron and no sudo is needed at runtime.

On Linux? An experimental on-host eBPF source reads TLS-encrypted traffic as plaintext at the in-process SSL_read/SSL_write boundary, attributed per process. It is opt-in, built behind the ebpf cargo feature, and needs CAP_BPF.

Packet capture (pcap replay, live interface, cloud-probe) sees plaintext HTTP only. For those sources, install Heron where the traffic is already decrypted: on the inference host, behind the TLS terminator, or fed from a trusted packet source.

What Makes Heron Different

Agent Turn Reconstruction

Stitches multi-call agent interactions (planner → tool → result → next tool) into single addressable turns — see the full agent narrative, not raw HTTP calls.

Named profiles for Claude Code and OpenAI Codex CLI, plus a generic profile for everything else.

Agent turn timeline — 247 calls in one turn
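The stitching rules Heron's profiles use aren't spelled out here. As a purely hypothetical illustration of the idea, the sketch below assumes one simple rule: a call whose last request message is a tool result continues the open turn, and any other call starts a new one. The `Call` and `LastMessage` types and `stitch_turns` are invented for this example; it is not Heron's algorithm.

```rust
// Hypothetical sketch, not Heron's algorithm: stitch captured LLM calls into turns
// using one assumed rule. A call whose last request message is a tool result
// continues the open turn; any other call starts a new turn.

#[derive(Debug, PartialEq)]
enum LastMessage {
    UserText,   // a fresh prompt typed by the user
    ToolResult, // output of a tool the model asked to run
}

#[derive(Debug)]
struct Call {
    id: u32,
    last_message: LastMessage,
}

/// Groups call ids into turns, preserving capture order.
fn stitch_turns(calls: &[Call]) -> Vec<Vec<u32>> {
    let mut turns: Vec<Vec<u32>> = Vec::new();
    for call in calls {
        let continues_turn = call.last_message == LastMessage::ToolResult && !turns.is_empty();
        if continues_turn {
            // `turns` is non-empty here, so last_mut() always returns Some.
            turns.last_mut().expect("open turn").push(call.id);
        } else {
            turns.push(vec![call.id]);
        }
    }
    turns
}

fn main() {
    // Planner call plus two tool round-trips, then a new prompt with one round-trip.
    let calls = vec![
        Call { id: 1, last_message: LastMessage::UserText },
        Call { id: 2, last_message: LastMessage::ToolResult },
        Call { id: 3, last_message: LastMessage::ToolResult },
        Call { id: 4, last_message: LastMessage::UserText },
        Call { id: 5, last_message: LastMessage::ToolResult },
    ];
    println!("{:?}", stitch_turns(&calls)); // [[1, 2, 3], [4, 5]]
}
```

A real profile would need more signals than the last message (session identity, concurrency, interleaved agents), which this sketch deliberately ignores.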

Service Topology

See your inference fleet as a directed graph: clients → litellm proxies → vLLM / SGLang backends, edge thickness scaled by turn count.

Heron classifies what each endpoint serves — vLLM, SGLang, Ollama, llama.cpp, LiteLLM — from the bytes on the wire, not from configuration.
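How the topology graph derives its edge weights isn't documented here. A minimal sketch, assuming each reconstructed turn carries the ordered list of hops it traversed (the names `client-a`, `litellm-1`, `vllm-0` are made up) and that stroke width scales linearly against the busiest edge; both are assumptions, and `edge_weights` / `stroke_width` are hypothetical helpers.

```rust
use std::collections::HashMap;

/// Counts traversals of each directed (from, to) edge across all turns.
/// A turn that crosses the same edge twice counts twice.
fn edge_weights(turn_paths: &[Vec<&str>]) -> HashMap<(String, String), u32> {
    let mut weights = HashMap::new();
    for path in turn_paths {
        // windows(2) yields every consecutive pair of hops in the path.
        for hop in path.windows(2) {
            *weights.entry((hop[0].to_string(), hop[1].to_string())).or_insert(0) += 1;
        }
    }
    weights
}

/// Maps a turn count to a stroke width, scaled linearly against the busiest edge.
fn stroke_width(count: u32, max_count: u32, min_px: f64, max_px: f64) -> f64 {
    if max_count == 0 {
        return min_px;
    }
    min_px + (max_px - min_px) * (count as f64 / max_count as f64)
}

fn main() {
    let paths = vec![
        vec!["client-a", "litellm-1", "vllm-0"],
        vec!["client-a", "litellm-1", "sglang-0"],
        vec!["client-b", "litellm-1", "vllm-0"],
    ];
    let weights = edge_weights(&paths);
    let max = weights.values().copied().max().unwrap_or(0);
    // Sort for stable output; HashMap iteration order is unspecified.
    let mut edges: Vec<_> = weights.iter().collect();
    edges.sort();
    for ((from, to), count) in edges {
        println!("{from} -> {to}: {count} turns, {:.1}px", stroke_width(*count, max, 1.0, 6.0));
    }
}
```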

SFT Trajectory Export

Turn real agent traffic into fine-tuning data. Export any turn or session as OpenAI-style messages JSONL, with tool calls, tool results, and reasoning preserved, and tool-call arguments rehydrated from JSON strings into objects.

One click from a turn's detail view, or batch-export from the Agent Turns list with the current filter (time · agent kind · model · wire API). Anthropic and OpenAI-chat wire formats today; unsupported formats are reported and skipped, not failed.

Why Not an SDK / Proxy / OpenTelemetry?

Approach	In request path	Needs client changes	Sees full bodies	Reconstructs turns	Exports SFT data
SDK instrumentation	yes	every client	yes	manual	manual
Reverse proxy (LiteLLM …)	yes	re-point clients	yes	per-call only	no
OpenTelemetry from server	yes	server must emit	partial	if tagged	no
Heron	no	none	yes¹	automatic	one click

¹ Packet capture sees plaintext HTTP only. Install Heron where the traffic is already decrypted (inference host, behind the TLS terminator, or fed by cloud-probe from a SPAN/TAP point), or use the experimental Linux eBPF source for on-host capture of TLS-encrypted traffic with per-process attribution.

The trade-off is honest: you give up cross-cluster client tracing. You get a single passive evidence chain that can't break the call when the observer fails, requires zero cooperation from the workloads being observed, and assembles the agent narrative for you instead of leaving you to join calls into turns in your data warehouse.

Architecture

NIC / .pcap file / cloud-probe (ZMQ) / eBPF SSL uprobes
        │
        ▼
   capture → flow dispatcher (hash by 5-tuple)
        │
        ▼
   N parallel workers: HTTP/SSE parse → wire-API detection → semantic extraction
        │
        ▼
   turn tracker  +  metrics aggregator  +  storage sink
        │
        ▼
       DuckDB ─── REST API ─── React console (localhost:3000)

Same connection's packets always land on the same worker, so parsing state is local and lock-free. Multiple independent pipelines can run side-by-side — e.g. low-latency local capture isolated from bursty cloud-probe ingress. The pipeline never sits in the request path, so the observer can fail without breaking the calls being observed.
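The description only says "hash by 5-tuple". For both directions of a connection to reach the same worker, the hash has to ignore direction; one common way is to put the two endpoints in a canonical order before hashing. The sketch below assumes that approach (the `FiveTuple` struct and `worker_for` are hypothetical) and uses Rust's std `DefaultHasher` in place of whatever hash Heron actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::{IpAddr, Ipv4Addr};

/// Connection key as seen on one packet (direction matters here).
#[derive(Debug, Clone, Copy)]
struct FiveTuple {
    src_ip: IpAddr,
    src_port: u16,
    dst_ip: IpAddr,
    dst_port: u16,
    protocol: u8, // IP protocol number, 6 = TCP
}

/// Picks a worker index in 0..n_workers. The two endpoints are put in a
/// canonical order before hashing, so request and response packets of the
/// same connection hash identically and reach the same worker.
fn worker_for(t: &FiveTuple, n_workers: usize) -> usize {
    assert!(n_workers > 0, "need at least one worker");
    let a = (t.src_ip, t.src_port);
    let b = (t.dst_ip, t.dst_port);
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    let mut hasher = DefaultHasher::new();
    (lo, hi, t.protocol).hash(&mut hasher);
    (hasher.finish() % n_workers as u64) as usize
}

fn main() {
    // Documentation-range addresses (RFC 5737), not real hosts.
    let client = IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10));
    let server = IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20));
    let request = FiveTuple { src_ip: client, src_port: 51514, dst_ip: server, dst_port: 8000, protocol: 6 };
    let response = FiveTuple { src_ip: server, src_port: 8000, dst_ip: client, dst_port: 51514, protocol: 6 };
    let n_workers = 8;
    let w = worker_for(&request, n_workers);
    assert_eq!(w, worker_for(&response, n_workers));
    println!("both directions go to worker {w}");
}
```

Because every packet of a flow maps to one worker, that worker can keep the flow's TCP reassembly and HTTP parser state in plain local data structures with no locks.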

On-host eBPF capture (Linux, experimental). Packet capture only sees plaintext. Heron's eBPF SSL-uprobe source lifts that limitation on Linux: it hooks SSL_read / SSL_write in-process and reads TLS-encrypted LLM calls as plaintext on the host that makes them (no proxy, no TLS terminator, no SDK, never in the request path), and it stamps every call with its owning process (pid · command · executable). It covers dynamically-linked OpenSSL/BoringSSL (Python openai/anthropic SDKs, curl, Node, most CLIs) and statically-linked, symbol-stripped BoringSSL single-executable runtimes (Claude Code's and opencode's Bun binaries, located by byte-signature offset). It follows the agent CLIs' frequent npm self-updates (re-attaching across the rotated binary inode) and reaches already-running sessions (via /proc/<pid>/exe) without a restart. Opt-in: built behind the ebpf cargo feature; needs CAP_BPF + kernel BTF. See eBPF capture.

What's in the Box

Ingress sources

libpcap on a live interface
Replay from .pcap files (any speed)
ZMQ from cloud-probe for hosts you can't install on directly
eBPF SSL uprobes (Linux · opt-in) — hook SSL_read / SSL_write in-process to read TLS-encrypted traffic as plaintext on the host that makes the calls, with no proxy or TLS terminator, and stamp every call with its owning process. Covers dynamically-linked OpenSSL/BoringSSL and statically-linked, symbol-stripped BoringSSL runtimes (Claude Code's / opencode's Bun). Built behind the ebpf cargo feature, needs CAP_BPF. See eBPF capture.

Wire-API decoders

OpenAI Chat Completions (/v1/chat/completions)
OpenAI Responses (/v1/responses)
Anthropic Messages (/v1/messages)
Gemini AI Studio (generativelanguage.googleapis.com)

Covers OpenAI direct, Azure OpenAI, Anthropic, AWS Bedrock / GCP Vertex (Anthropic wire), Google Gemini, and any OpenAI-compatible server (vLLM, SGLang, Ollama, llama.cpp, LM Studio, etc.). Every LLM call is also stored with its structured request/response and raw body, so stalled tool calls, malformed prompts, and unexpected token counts show up as evidence in the console instead of requiring a re-run.
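Heron's actual wire-API detection logic isn't described here. A minimal sketch, assuming detection by Host header and path suffix only, using the paths and host listed above; the `WireApi` enum and `detect_wire_api` are hypothetical. Providers that use their own URL schemes (e.g. Bedrock, Vertex) would need body-shape checks as well, which this sketch omits.

```rust
/// Wire APIs named in the decoder list.
#[derive(Debug, PartialEq)]
enum WireApi {
    OpenAiChat,
    OpenAiResponses,
    AnthropicMessages,
    Gemini,
    Unknown,
}

/// Hypothetical sketch: guess the wire API from the Host header and request path.
fn detect_wire_api(host: &str, path: &str) -> WireApi {
    // Ignore any query string, e.g. Azure OpenAI's "?api-version=...".
    let path = path.split('?').next().unwrap_or(path);
    if host.eq_ignore_ascii_case("generativelanguage.googleapis.com") {
        WireApi::Gemini
    } else if path.ends_with("/chat/completions") {
        WireApi::OpenAiChat
    } else if path.ends_with("/responses") {
        WireApi::OpenAiResponses
    } else if path.ends_with("/messages") {
        WireApi::AnthropicMessages
    } else {
        WireApi::Unknown
    }
}

fn main() {
    // Illustrative requests; hosts and deployment names are placeholders.
    let samples = [
        ("api.openai.com", "/v1/chat/completions"),
        ("example.openai.azure.com", "/openai/deployments/gpt/chat/completions?api-version=..."),
        ("localhost:8000", "/v1/responses"),
        ("api.anthropic.com", "/v1/messages"),
        ("generativelanguage.googleapis.com", "/v1beta/models/gemini-pro:generateContent"),
        ("localhost:8000", "/health"),
    ];
    for (host, path) in samples {
        println!("{host}{path} -> {:?}", detect_wire_api(host, path));
    }
}
```

Matching on path suffixes rather than exact paths is what lets the same rule cover Azure's deployment-scoped URLs and OpenAI-compatible servers mounted under a prefix.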

Console pages

http://localhost:3000 — Overview · Performance · Usage · Errors · Services (table / path / model) · Agent Turns · Agent Sessions · LLM Calls (with full request/response drill-down) · Raw HTTP · Pipeline Health.

Three themes, one toggle: Kami (紙神 — warm washi-paper, the default and what the screenshots show), Dark (slate), Light (high-contrast). The choice persists per browser; charts, the topology graph, and the timeline gantt all re-theme with it.

Metrics

Agent layer: turn count and duration distribution per agent kind, call count per turn, tool-call success rate.
Call layer: TTFT · E2E latency · TPOT · token throughput · call rate · active calls · error rate · prompt-cache hit ratio.

See the glossary for what each means and why.
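The glossary is the authoritative source for these definitions. As a sketch of the common ones, assuming a streamed call: TTFT runs from the request being sent to the first content chunk, E2E from the request to the last chunk, TPOT is the time between first and last chunk divided by the remaining output tokens, and prompt-cache hit ratio is cached prompt tokens over total prompt tokens. The `CallTiming` struct and its field names are hypothetical.

```rust
use std::time::Duration;

/// Timing and token facts for one streamed LLM call (hypothetical struct).
/// Times are offsets from the start of the capture.
struct CallTiming {
    request_sent: Duration, // last request byte seen on the wire
    first_token: Duration,  // first SSE chunk carrying content
    last_token: Duration,   // final SSE chunk
    output_tokens: u32,
    prompt_tokens: u32,
    cached_prompt_tokens: u32,
}

/// Time to first token.
fn ttft(c: &CallTiming) -> Duration {
    c.first_token - c.request_sent
}

/// End-to-end latency.
fn e2e(c: &CallTiming) -> Duration {
    c.last_token - c.request_sent
}

/// Time per output token after the first; None with fewer than two tokens.
fn tpot(c: &CallTiming) -> Option<Duration> {
    if c.output_tokens < 2 {
        return None;
    }
    Some((c.last_token - c.first_token) / (c.output_tokens - 1))
}

/// Fraction of prompt tokens served from the provider's prompt cache.
fn cache_hit_ratio(c: &CallTiming) -> Option<f64> {
    if c.prompt_tokens == 0 {
        return None;
    }
    Some(c.cached_prompt_tokens as f64 / c.prompt_tokens as f64)
}

fn main() {
    let c = CallTiming {
        request_sent: Duration::from_millis(1_000),
        first_token: Duration::from_millis(1_450),
        last_token: Duration::from_millis(3_450),
        output_tokens: 101,
        prompt_tokens: 8_000,
        cached_prompt_tokens: 6_000,
    };
    println!("TTFT {:?}", ttft(&c)); // 450ms
    println!("E2E  {:?}", e2e(&c)); // 2.45s
    println!("TPOT {:?}", tpot(&c)); // Some(20ms)
    println!("cache hit {:?}", cache_hit_ratio(&c)); // Some(0.75)
}
```

Splitting TTFT from TPOT separates queueing and prefill cost (dominated by prompt size and cache hits) from decode speed.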

Storage & distribution

Storage: DuckDB (default, embedded, single-file) with per-table retention out of the box, or ClickHouse for high-volume columnar analytics (storage.backend = "clickhouse"). Pluggable backend trait; a PostgreSQL backend is designed but not yet wired in.
Distribution: prebuilt static binaries for Linux musl (x86_64 + aarch64) and macOS (Intel + Apple Silicon). The web console is embedded in the binary — single artifact, no separate frontend deploy.

Who It's For

Role	Use case
Agent developers	Debug stalled tool calls, detect plan-loop / "no submit" failures, see exactly which model+endpoint each turn hit — without modifying the agent or its SDK
AI platform / inference ops	See the real service-to-service topology (clients → litellm → vLLM / SGLang), measure each hop, catch silent model substitutions
FinOps & eng managers	Attribute spend across teams/repos/projects from real turns, not periodic SDK exports that drift
Compliance & security	Capture-once evidence chain of what each agent sent and received, scoped per agent kind and session
Model trainers / fine-tuners	Turn real captured agent runs into SFT datasets — per turn or whole session — without hand-labeling or re-running the agent

Install & Verify with an AI Agent

Running an AI coding agent (Claude Code, Codex, etc.)? Hand it the prompt below and let it do the install + smoke test. It needs only shell access to the target machine.

Install and smoke-test Heron (https://github.com/Netis/heron) on this machine:

1. Read the README and docs/install.md to pick the right install path.
   Use the one-line installer; user-local (no sudo) is fine.
2. Verify the binary: `heron --version` and `heron --help` both work.
3. Smoke-test WITHOUT live capture (no privileges needed): find or fetch a
   small .pcap with LLM traffic (the repo's testdata/pcaps/ has fixtures),
   then run `heron --pcap-file <file> --no-retention`.
4. Confirm the API is up: `curl -s http://localhost:3000/api/health` returns
   healthy, and `curl -s 'http://localhost:3000/api/traces?limit=5'` returns
   reconstructed traces.
5. (Optional, needs CAP_NET_RAW) for a live test: setcap the binary and run
   `heron -i <iface>`, generate some LLM traffic through the host, then
   re-check the console at http://localhost:3000.

Report the console URL and the trace count you saw. Don't hard-code or commit
any host/credential — this repo rejects infra leakage in CI.

The last line matters: a check-leakage.sh CI gate fails any PR that commits a private IP, plaintext credential, or key — keep your own infra out of anything you push back.

Documentation

Doc	Description
Install	One-line installer, systemd, capabilities, uninstall
Configure	Pipelines, sources, storage, retention
eBPF capture	On-host TLS capture + process attribution (Linux, opt-in)
Architecture	Pipeline design and trade-offs
Glossary	What every metric means
Filing issues	How issues are triaged + how to file one an agent can pick up
Mission	Long-arc vision
Changelog	Release history

Roadmap

The current surface is the foundation layer (Ops use cases). On the way:

Storage — PostgreSQL backend (ClickHouse shipped in v0.5.0; PG schema already designed)
Wire APIs — more provider-specific extensions (Bedrock variants, Vertex non-Anthropic, etc.)

See docs/mission.md for the full ladder.

Contributing

Bug reports and PRs welcome. Before opening a PR, run:

just build all       # single binary with embedded console
just quality all     # rust fmt + clippy + ts lint + tsc
just test all        # cargo test (all crates)

Run just help for the full menu. Design docs under docs/design/ describe the per-module contract — read the relevant one before changing anything load-bearing.

Build via just build all, not a bare cargo build. The web console is embedded behind the non-default console cargo feature; a raw cargo build --release yields a working API with a blank console. If you invoke cargo directly, run bun run build in console/ first and pass --features console — see docs/install.md → Building from source.

License

Apache 2.0.
