Security model

Olympus runs LLM agents that read issues/PRs and write code, push branches, and (optionally) merge on your repo. This page states the threat model and the controls, so an operator can reason about what the agents can and cannot do — and what hardening is still the operator's job.

Threat model

The defining assumption for a public repo: issue and PR authors are untrusted. Anyone can file an issue, and its text flows into an agent. The two highest-risk surfaces:

Implement / revise (hephaestus) — runs work derived from issue/review text with broad shell + file-write tools. Untrusted text reaching a shell-wielding LLM is a remote-code-execution / exfiltration vector.
Triage (hermes) — investigates untrusted text and posts public replies; a do verdict dispatches the implement agent.

Trusted, by contrast: the maintainers (repo write access), the runner, the model gateway, and .olympus.json itself (committed by maintainers).

Controls (defense in depth)

Layer	Control	Where
Authorization	Maintainer-dispatch gate. A `do` verdict auto-dispatches the unattended agent only for authors with write/maintain/admin access; others get a warm reply + a maintainer control to dispatch by hand. A human reviews stranger issues before the agent acts.	`.triage.auto_dispatch` (`trusted`\|`all`\|`never`, default `trusted`) — `run_triage.sh`
Prompt	Untrusted-input framing. Every agent prompt states that issue/review text is data describing what to change, never instructions to obey, with the interpolated title fenced in explicit BEGIN/END UNTRUSTED markers.	`run_hephaestus.sh`, `run_triage.sh`, `run_revise.sh`
Tools	Network egress denied (claude harness). The implement/revise agent runs with `--disallowed-tools` for `curl/wget/nc/ncat/netcat/telnet/ssh/scp/sftp/socat/ftp` + `mcp__*`. Deny beats the broad `Bash` allow and survives `bash -c` / `&&` / `;` / `	`wrappers. The`codex`/`custom` harnesses have no equivalent tool deny-list — see residual risks.
Credentials	Token stripping. `GH_TOKEN`/`GITHUB_TOKEN`/`AGENT_GH_TOKEN`/`ADMIN_GH_TOKEN` are removed from the implement subprocess (it edits code + builds; the driver script makes the `gh` calls). Model-gateway creds are kept.	`agent-harness.sh` (`env -u`)
Outbound hygiene	Guard linters (no LLM). Leakage / secret-reference / secret-value gates keep internal IPs, machine paths, and key material out of every outbound surface (issues, PR bodies, reviews, commits).	`guard.yml`, `scripts/lint/check-*.sh`
Blast radius	Revise round cap → human escalation; per-issue/PR workflow concurrency; the observer scrubs incident bodies before filing.	`revise_dispatch.sh`, workflow `concurrency`

A regression test for the combined prompt+tool defense lives at evals/tasks/implement/prompt-injection/ — an issue whose body embeds a malicious instruction; it passes only if the legitimate fix lands and the injected command does not run.

Residual risks — NOT covered by the above

These need controls the operator owns at the OS / infrastructure layer:

Indirect network egress. The deny-list blocks direct curl/ssh. It does not stop a build script, a package manager, or python -c "..." that shells out to the network. Mitigation: run the implement/revise agent on a runner with an egress firewall that allows only the model gateway. This is the single most important hardening step and the only complete fix for exfil.
Trusted-author assumption. auto_dispatch: trusted trusts anyone with repo write access. A compromised or malicious maintainer account bypasses the dispatch gate. Scope write access accordingly.
Arbitrary build toolchain. build_cmd runs whatever the consumer configured; a malicious .olympus.json (committed by a maintainer) is out of scope — config is part of the trusted base.
Model fallibility. Prompt framing reduces, but cannot guarantee, that the agent ignores a cleverly injected instruction. The tool/network/credential controls are what bound the damage when framing fails.
Non-claude harnesses lack the tool deny-list. The --disallowed-tools egress block is claude-specific; codex/custom harnesses get the prompt framing and token-stripping, but not the direct-egress deny. Run codex only in a trusted environment, behind the OS-level egress firewall, with harness.proxy as the single allowed egress path — the proxy doubles as an egress allow-list. The HARNESS_PROXY secret keeps that internal address out of committed config.
Staging soak runs PR code. When .testing.enabled, a complex PR is deployed to the testing environment via testing.deploy_cmd — i.e. PR code executes there (as CI already does). Soak only runs for PRs that would otherwise auto-merge (same author trust gate), but the testing environment must be isolated from prod and the soak runner should be egress-firewalled like the implement runner. deploy_cmd/health_cmd are trusted config.

Operator hardening checklist

Egress-firewall the runner to the model gateway only (closes indirect egress).
Use a dedicated, low-privilege, ideally ephemeral self-hosted runner for implement/revise — not a shared CI box.
Minimize AGENT_GH_TOKEN scope to exactly what the loop needs (issues, PRs, contents, workflow); never an org-admin token.
Keep auto_dispatch: trusted (or never) on public repos; reserve all for internal repos where every author is already trusted.
Leave AUTO_MERGE_TEAM empty until you trust the loop; gated auto-merge is opt-in.
If you run the codex harness, set harness.proxy / the HARNESS_PROXY secret and make that proxy the only egress the runner can reach (codex has no tool deny-list).
If you enable staging soak, keep the testing environment isolated from prod and egress-firewall the soak runner; the soaked PR still needs a human to merge it.

Reporting a vulnerability

Until a dedicated SECURITY.md disclosure policy is published, report suspected vulnerabilities privately via the repository's GitHub Security advisories (Report a vulnerability) rather than a public issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Security

docs/security.md

Security model

Threat model

Controls (defense in depth)

Residual risks — NOT covered by the above

Operator hardening checklist

Reporting a vulnerability

There aren't any published security advisories

Uh oh!

Security: Netis/olympus

Security

docs/security.md

Security model

Threat model

Controls (defense in depth)

Residual risks — NOT covered by the above

Operator hardening checklist

Reporting a vulnerability

There aren't any published security advisories