
fix: independent temporal-leakage verifier for search_web #161

Merged

ethancjackson merged 3 commits into main from search-leakage-verifier on Jul 1, 2026

@ethancjackson (Collaborator)

Summary

search_web (the shared news/context-retrieval tool in agent_factory.py, used by all 5 forecasting domains) was leaking post-cutoff information into historical backtests despite an explicit natural-language cutoff instruction. This was confirmed directly in a Langfuse trace (a5d33fccd8c99929c3784f774dda34e6, cutoff 2026-06-01), where the result cited WTI/Brent prices "by mid-June 2026". The existing ContextRetrievalConfig docstring already described the cutoff as "soft (LLM-judgment-based)... not a hard guarantee." A prior attempt at a hard date-filtered search API (Tavily) also leaked, which is consistent with published research: date-restricted search is insufficient because page content and metadata are often updated after the original publish date.
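To make the publish-date failure mode concrete, here is a toy sketch (not code from this PR). The Page type, date_filter, and mentions_after are hypothetical, the page is synthetic, and the "Month YYYY" regex is a deliberately naive stand-in for the LLM verifier. It shows a page whose publish timestamp predates the cutoff passing a date filter even though its current content describes a later month.

```python
import re
from dataclasses import dataclass
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_INDEX = {name: i for i, name in enumerate(MONTHS, start=1)}
# Toy pattern: matches explicit "Month YYYY" mentions only.
MONTH_YEAR = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b")


@dataclass
class Page:
    url: str
    published: date  # publish date as reported by the search API
    content: str     # page body as fetched now, possibly edited since publication


def date_filter(pages: list[Page], cutoff: date) -> list[Page]:
    """Mimic a date-restricted search: keep pages published strictly before the cutoff."""
    return [p for p in pages if p.published < cutoff]


def mentions_after(text: str, cutoff: date) -> list[str]:
    """Flag "Month YYYY" mentions in or after the cutoff month (conservative)."""
    flagged = []
    for month, year in MONTH_YEAR.findall(text):
        if (int(year), MONTH_INDEX[month]) >= (cutoff.year, cutoff.month):
            flagged.append(f"{month} {year}")
    return flagged


# Synthetic page: published before the cutoff, later edited with newer facts.
cutoff = date(2026, 6, 1)
pages = [Page("https://example.com/oil-outlook", date(2026, 5, 20),
              "Updated: by mid-June 2026 WTI and Brent had moved higher.")]
kept = date_filter(pages, cutoff)
print(len(kept), mentions_after(kept[0].content, cutoff))
# 1 ['June 2026']
```

A pattern like this misses any post-cutoff claim that carries no explicit month and year, which is one reason to judge claims with a separate model rather than with string matching.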

ClickUp Ticket(s): N/A

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

  • Added an independent temporal-leakage verifier to _build_search_tool in aieng-forecasting/aieng/forecasting/methods/agentic/agent_factory.py. After every grounded search_web call that has a cutoff date, a separate LLM call extracts discrete factual claims, judges each against the cutoff, strips violations, and reports a confidence score. It uses verifier_model, which defaults to ADVANCED_MODEL rather than search_model, so the verifier does not share the search model's knowledge-attribution blind spot.
  • Retry loop: on a non-clean or low-confidence verdict, search_web retries the same query, injecting the previously flagged claims as explicit negative feedback, up to verifier_max_attempts times (default 3). If no attempt passes, it returns an explicit [SEARCH_VERIFICATION_FAILED] ... sentinel; it never silently returns unverified content.
  • New ContextRetrievalConfig fields: verifier_model, verifier_max_attempts (default 3), and verifier_confidence_threshold (default 8/10). The threshold is tunable rather than hardcoded because LLM self-reported confidence isn't well calibrated.
  • implementations/energy_oil_forecasting/analyst_agent/agent.py: threaded the three verifier knobs through all four news-enabled builders. As cheap defense-in-depth, hardened the search sub-agent's own instruction (reason step-by-step about claim recency from the content, not from source metadata; never supplement from background knowledge). The root analyst instruction now explicitly handles the [SEARCH_VERIFICATION_FAILED] sentinel by proceeding on price history alone instead of guessing.
  • 8 new unit tests in test_agent_factory.py covering immediate-accept, retry-then-accept (with feedback-injection verified), exhaustion→sentinel, both skip conditions (no cutoff / enforce_cutoff=False), default-model selection, configurable threshold, and verifier parse-failure handling.
  • Verifier calls get their own Langfuse span automatically, because they go through the same litellm.acompletion callback path as the existing search call, so no extra instrumentation is needed.
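A minimal sketch of the verify-and-retry loop described in these changes, assuming the verifier replies with JSON containing clean, confidence, filtered_text, and violations fields. Every name here (VerifierSettings, Verdict, parse_verdict, verified_search, and the injected search_fn/verify_fn callables standing in for the two LLM calls) is hypothetical and is not the actual agent_factory.py API. Treating a malformed verifier reply as a failed verdict is also an assumption about how parse failures are handled.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional

SENTINEL = "[SEARCH_VERIFICATION_FAILED]"


@dataclass
class VerifierSettings:
    # Hypothetical mirror of the new ContextRetrievalConfig knobs.
    verifier_max_attempts: int = 3
    verifier_confidence_threshold: int = 8  # on a 0-10 scale


@dataclass
class Verdict:
    clean: bool          # verifier says the filtered text has no post-cutoff claims
    confidence: int      # self-reported, 0-10
    filtered_text: str   # search result with violating claims stripped
    violations: list[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse the verifier reply; any malformed reply counts as a failed verdict."""
    try:
        data = json.loads(raw)
        return Verdict(
            clean=data["clean"] is True,  # strict: only a JSON true counts
            confidence=int(data["confidence"]),
            filtered_text=str(data["filtered_text"]),
            violations=[str(v) for v in data.get("violations", [])],
        )
    except (ValueError, KeyError, TypeError):
        return Verdict(clean=False, confidence=0, filtered_text="")


def verified_search(
    query: str,
    cutoff: str,
    search_fn: Callable[[str, list[str]], str],  # (query, claims_to_avoid) -> raw result
    verify_fn: Callable[[str, str], str],        # (raw result, cutoff) -> JSON verdict
    settings: Optional[VerifierSettings] = None,
) -> str:
    settings = settings or VerifierSettings()
    flagged: list[str] = []  # claims rejected on earlier attempts
    for _ in range(settings.verifier_max_attempts):
        # Same query each attempt; earlier violations go in as negative feedback.
        raw = search_fn(query, list(flagged))
        verdict = parse_verdict(verify_fn(raw, cutoff))
        if verdict.clean and verdict.confidence >= settings.verifier_confidence_threshold:
            return verdict.filtered_text
        for claim in verdict.violations:
            if claim not in flagged:
                flagged.append(claim)
    # Never fall back to unverified content.
    return f"{SENTINEL} no result for {query!r} passed verification against cutoff {cutoff}"


# Usage with stubs: the first verdict fails, the second passes.
seen_feedback: list[list[str]] = []


def fake_search(query: str, avoid: list[str]) -> str:
    seen_feedback.append(avoid)
    return "raw search text"


replies = iter([
    json.dumps({"clean": False, "confidence": 4, "filtered_text": "",
                "violations": ["Brent near $70 in September 2025"]}),
    json.dumps({"clean": True, "confidence": 9, "filtered_text": "pre-cutoff summary",
                "violations": []}),
])
print(verified_search("WTI outlook", "2025-09-01", fake_search, lambda raw, c: next(replies)))
# pre-cutoff summary
```

Injecting search_fn and verify_fn as plain callables keeps the loop testable without any LLM, which is also a convenient way to exercise the immediate-accept, retry-then-accept, and exhaustion paths with stubs.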

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check src_dir/)
  • Manual testing performed (describe below)

Manual testing details:

  • uv run pytest aieng-forecasting/tests/aieng/forecasting/methods/agentic -q — 89 passed (including the 8 new leakage-verifier tests), no regressions.
  • uv run ruff check is clean on all three changed source files.
  • Verified against live production traces post-fix. The verifier now runs as its own span on every search_web call for both the gemini-3.1-flash-lite-preview and gemini-3.5-flash news-agent variants (previously 1 child LLM call per search_web, now 2: search + verify). Directly observed it flagging and stripping borderline post-cutoff claims (e.g. "Brent trading in a range of approximately $65–$70 per barrel in September 2025" against a 2025-09-01 cutoff) and then accepting the filtered result with confidence: 9. In a sample of 200 search_web calls from the last 3 days across both models, the [SEARCH_VERIFICATION_FAILED] sentinel appeared 0 times and latencies showed no sign of retries, so the retry-exhaustion path is implemented and unit-tested but not yet observed in production.
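As a rough guide to what "0 occurrences in 200 calls" supports, here is a small sketch: count outputs that contain the sentinel, and compute the exact one-sided 95% upper bound on the underlying rate when none are seen. sentinel_count and zero_event_upper_bound are hypothetical helpers, and the sample below is placeholder data, not the actual exported traces.

```python
SENTINEL = "[SEARCH_VERIFICATION_FAILED]"


def sentinel_count(outputs: list[str]) -> int:
    """Number of search_web results that carry the verification-failure sentinel."""
    return sum(SENTINEL in text for text in outputs)


def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on a rate after 0 events in n trials.

    Solves (1 - p) ** n = alpha for p; for alpha = 0.05 this is about 3 / n
    (the "rule of three").
    """
    return 1 - alpha ** (1 / n)


# Placeholder sample; real outputs would come from exported trace data.
sample = ["filtered search summary"] * 200
hits = sentinel_count(sample)
print(hits, round(zero_event_upper_bound(len(sample)), 4))
# 0 0.0149
```

For n = 200 the bound is about 1.5%, so the sample shows retry exhaustion is rare in practice but cannot rule it out, which matches the caveat that the exhaustion path has not yet been observed in production.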

Related Issues

N/A. This is a follow-up to #160, which merged before this leakage issue was caught.

Deployment Notes

NB04 (04_systematic_backtest_eval.ipynb) is committed here mid-flight: it is actively re-running with the guard in place after clearing only the two news-agent predictors' cached results (all other predictors, e.g. LightGBM, stayed cached and untouched). Only the completed gemini-3.1-flash-lite-preview 2025-backtest prediction file is included in this commit. The gemini-3.5-flash backtest and both models' 2026-eval prediction files will follow in a follow-up commit once that run finishes, which is why this was opened as a draft PR.

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

ethancjackson and others added 3 commits July 1, 2026 06:55
Backtest traces showed the WTI news agent's search_web tool leaking
post-cutoff information despite the existing natural-language cutoff
instruction — the same-model self-restraint it relied on isn't a hard
guarantee, and a real trace confirmed it (Langfuse trace
a5d33fccd8c99929c3784f774dda34e6: cutoff 2026-06-01, result stated WTI/Brent
prices "by mid-June 2026").

search_web now runs an independent verifier call (different model by
default) after every grounded search when a cutoff applies: it extracts
discrete claims, judges each on content rather than source timestamps,
strips violations, and retries (same query + flagged claims as negative
feedback) up to 3 times before returning an explicit
[SEARCH_VERIFICATION_FAILED] sentinel rather than ever silently returning
unverified content. This lives in the shared agent_factory.py used by all
5 forecasting domains, so the fix applies everywhere search_web is used.

Verified against live traces post-fix: the verifier is correctly wired as
its own span, and is actively catching and stripping borderline
post-cutoff claims (confirmed on both gemini-3.1-flash-lite-preview and
gemini-3.5-flash news-agent variants).

NB04 and its committed prediction-cache artifacts are re-running with the
guard in place; the 2025 backtest result for the lite-preview news agent
is included here, with the remaining variants to follow in a follow-up
commit once that run completes.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
The independent verifier itself (agent_factory.py) already protects every
domain unconditionally via shared defaults, but two complementary pieces
were energy_oil-analyst-agent-only:

- The recency-reasoning / no-background-knowledge prompt hardening on each
  domain's context-retrieval sub-agent instruction.
- Root-agent handling of the [SEARCH_VERIFICATION_FAILED] sentinel (for
  starter-agent templates, this lives in the research-playbook skill doc
  that's loaded before search_web is called).

Propagated the same short additions to boc_rate_decisions (analyst_agent +
starter_agent), sp500_forecasting/starter_agent, food_price_forecasting/
starter_agent, and energy_oil_forecasting's own starter_agent/adaptive_agent
(which each carry their own duplicate copy of the instruction text).
getting_started/concierge_agent has context retrieval disabled, so it's
unaffected.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Local ad-hoc ruff-format checks (repo's newer uv-managed ruff 0.15.19)
missed a one-line collapse that the CI-pinned pre-commit hook version
catches. Verified by running the actual pre-commit hook suite locally
before pushing.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@ethancjackson ethancjackson marked this pull request as ready for review July 1, 2026 11:28
@ethancjackson ethancjackson merged commit e516a8d into main Jul 1, 2026
2 checks passed
@ethancjackson ethancjackson deleted the search-leakage-verifier branch July 1, 2026 11:28
