Olympus runs LLM agents that read issues/PRs and write code, push branches, and (optionally) merge on your repo. This page states the threat model and the controls, so an operator can reason about what the agents can and cannot do — and what hardening is still the operator's job.
The defining assumption for a public repo: issue and PR authors are untrusted. Anyone can file an issue, and its text flows into an agent. The two highest-risk surfaces:
- Implement / revise (
hephaestus) — runs work derived from issue/review text with broad shell + file-write tools. Untrusted text reaching a shell-wielding LLM is a remote-code-execution / exfiltration vector. - Triage (
hermes) — investigates untrusted text and posts public replies; adoverdict dispatches the implement agent.
Trusted, by contrast: the maintainers (repo write access), the runner, the model
gateway, and .olympus.json itself (committed by maintainers).
| Layer | Control | Where |
|---|---|---|
| Authorization | Maintainer-dispatch gate. A do verdict auto-dispatches the unattended agent only for authors with write/maintain/admin access; others get a warm reply + a maintainer control to dispatch by hand. A human reviews stranger issues before the agent acts. |
.triage.auto_dispatch (trusted|all|never, default trusted) — run_triage.sh |
| Prompt | Untrusted-input framing. Every agent prompt states that issue/review text is data describing what to change, never instructions to obey, with the interpolated title fenced in explicit BEGIN/END UNTRUSTED markers. | run_hephaestus.sh, run_triage.sh, run_revise.sh |
| Tools | Network egress denied (claude harness). The implement/revise agent runs with --disallowed-tools for curl/wget/nc/ncat/netcat/telnet/ssh/scp/sftp/socat/ftp + mcp__*. Deny beats the broad Bash allow and survives bash -c / && / ; / ` |
wrappers. **Thecodex/custom` harnesses have no equivalent tool deny-list** — see residual risks. |
| Credentials | Token stripping. GH_TOKEN/GITHUB_TOKEN/AGENT_GH_TOKEN/ADMIN_GH_TOKEN are removed from the implement subprocess (it edits code + builds; the driver script makes the gh calls). Model-gateway creds are kept. |
agent-harness.sh (env -u) |
| Outbound hygiene | Guard linters (no LLM). Leakage / secret-reference / secret-value gates keep internal IPs, machine paths, and key material out of every outbound surface (issues, PR bodies, reviews, commits). | guard.yml, scripts/lint/check-*.sh |
| Blast radius | Revise round cap → human escalation; per-issue/PR workflow concurrency; the observer scrubs incident bodies before filing. | revise_dispatch.sh, workflow concurrency |
A regression test for the combined prompt+tool defense lives at
evals/tasks/implement/prompt-injection/ — an issue whose body embeds a
malicious instruction; it passes only if the legitimate fix lands and the
injected command does not run.
These need controls the operator owns at the OS / infrastructure layer:
- Indirect network egress. The deny-list blocks direct
curl/ssh. It does not stop a build script, a package manager, orpython -c "..."that shells out to the network. Mitigation: run the implement/revise agent on a runner with an egress firewall that allows only the model gateway. This is the single most important hardening step and the only complete fix for exfil. - Trusted-author assumption.
auto_dispatch: trustedtrusts anyone with repo write access. A compromised or malicious maintainer account bypasses the dispatch gate. Scope write access accordingly. - Arbitrary build toolchain.
build_cmdruns whatever the consumer configured; a malicious.olympus.json(committed by a maintainer) is out of scope — config is part of the trusted base. - Model fallibility. Prompt framing reduces, but cannot guarantee, that the agent ignores a cleverly injected instruction. The tool/network/credential controls are what bound the damage when framing fails.
- Non-claude harnesses lack the tool deny-list. The
--disallowed-toolsegress block is claude-specific;codex/customharnesses get the prompt framing and token-stripping, but not the direct-egress deny. Run codex only in a trusted environment, behind the OS-level egress firewall, withharness.proxyas the single allowed egress path — the proxy doubles as an egress allow-list. TheHARNESS_PROXYsecret keeps that internal address out of committed config. - Staging soak runs PR code. When
.testing.enabled, a complex PR is deployed to the testing environment viatesting.deploy_cmd— i.e. PR code executes there (as CI already does). Soak only runs for PRs that would otherwise auto-merge (same author trust gate), but the testing environment must be isolated from prod and the soak runner should be egress-firewalled like the implement runner.deploy_cmd/health_cmdare trusted config.
- Egress-firewall the runner to the model gateway only (closes indirect egress).
- Use a dedicated, low-privilege, ideally ephemeral self-hosted runner for implement/revise — not a shared CI box.
- Minimize
AGENT_GH_TOKENscope to exactly what the loop needs (issues, PRs, contents, workflow); never an org-admin token. - Keep
auto_dispatch: trusted(ornever) on public repos; reserveallfor internal repos where every author is already trusted. - Leave
AUTO_MERGE_TEAMempty until you trust the loop; gated auto-merge is opt-in. - If you run the codex harness, set
harness.proxy/ theHARNESS_PROXYsecret and make that proxy the only egress the runner can reach (codex has no tool deny-list). - If you enable staging soak, keep the testing environment isolated from prod and egress-firewall the soak runner; the soaked PR still needs a human to merge it.
Until a dedicated SECURITY.md disclosure policy is published, report suspected
vulnerabilities privately via the repository's GitHub Security advisories
(Report a vulnerability) rather than a public issue.