Skip to content

perf(staggered): O(n_units) CS aggregation IF assembly + R did 2.5.1 yardstick#617

Merged
igerber merged 1 commit into
mainfrom
perf/cs-aggregation-if-assembly
Jul 5, 2026
Merged

perf(staggered): O(n_units) CS aggregation IF assembly + R did 2.5.1 yardstick#617
igerber merged 1 commit into
mainfrom
perf/cs-aggregation-if-assembly

Conversation

@igerber

@igerber igerber commented Jul 4, 2026

Copy link
Copy Markdown
Owner

Summary

  • Rewrite CallawaySantAnna's combined influence-function assembly (staggered_aggregation._compute_combined_influence_function, also inherited by StaggeredTripleDifference) from per-target full-DataFrame scans + per-unit Python loops + dense (n_units × n_gt) WIF matrices to per-fit cohort tables (identity-validated cache) + closed-form WIF + fancy-index scatter; the pre-rewrite general path is preserved verbatim as the fallback for direct callers
  • Point estimates bit-identical; aggregated SEs ≤5e-16 relative (frozen-copy drift-bound tests at rtol=0, atol=1e-9 across panel / survey / RCS / unbalanced / triple-diff / inf-coded fixtures)
  • Measured (medians of 3): analytical fits 2.3-6.3x faster at 2-5M rows; bootstrap fits 1.4-1.6x; repeated cross-sections up to 2.8x with peak memory -55 to -68% (WIF matrices were observation-scale in RCS mode); full tables in docs/performance-plan.md
  • R did 2.5.1 yardstick: 3-11x faster at equal work (att_gt + 3 aggte vs fit(aggregate="all"), identical biters/cband, single- AND multi-core R; R's pl/cores is BLAS-bound on this path). benchmarks/R/benchmark_did.R extended with --xformla/--bstrap/--biters/--cband/--pl/--cores/--faster-mode (backward-compatible defaults; errors on unknown flags; config recorded in JSON metadata)
  • Golden aggregated-SE assertions enabled for dr scenarios (match R fixtures at 1e-12..7e-6); pre-existing reg-method aggregated-SE gap vs R fixtures (3-20%, ATTs exact at 1e-11, byte-identical deltas on the pre-rewrite tree) documented in TODO.md for follow-up
  • New TODO rows: reg-path aggregated-SE investigation (Medium), cluster-path np.add.at sibling, lazy per-cell label arrays, dead _compute_aggregated_se removal

Methodology references (required if estimator / math changes)

  • Method name(s): Callaway & Sant'Anna (2021) aggregated ATT standard errors (weight influence function per R did::wif() / compute.aggte.R)
  • Paper / source link(s): Callaway & Sant'Anna (2021), Journal of Econometrics 225(2); R did package (bcallaway11/did)
  • Any intentional deviations from the source (and why): None new. The closed-form WIF is algebraically identical to the dense form (REGISTRY.md CallawaySantAnna "Standard errors" Note documents the implementation change and the ≤5e-16 accumulation-order drift, mirroring the shared-demeaning-engine numerics convention)

Validation

  • Tests added/updated: tests/test_staggered_aggregation.py (new: frozen byte-copy parity, WIF zero-contract, cache-invalidation, fast-path-taken dispatch tests), tests/test_csdid_ported.py (golden aggregated-SE assertions for dr scenarios)
  • Backtest / simulation / notebook evidence (if applicable): 473 tests × both backends green; live-R MPDTA suite 5/5 (baselined on the pristine tree after the did 2.5.1 upgrade to isolate upgrade effects from this change); before/after benchmark protocol per docs/performance-plan.md

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

🤖 Generated with Claude Code

https://claude.ai/code/session_01X4AzrFUMqJxcUumSH31mSr

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown

Overall Assessment
✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

  • Affected methods: CallawaySantAnna aggregated SEs and inherited StaggeredTripleDifference aggregation.
  • The closed-form WIF fast path matches the documented registry contract and the R did aggregation structure.
  • Empty/NaN inference handling remains guarded through _compute_aggregated_se_with_wif() and safe_inference_batch().
  • Deferred reg-method aggregated-SE mismatch is tracked in TODO.md, so it is not a blocker.
  • I did not run the test suite in this sandbox; the local Python environment is missing numpy.

Methodology

  • Severity: P3-informational. Impact: The PR changes WIF assembly, but the deviation is documented as an implementation/accumulation-order change in docs/methodology/REGISTRY.md:L499-L520. The R source computes simple/dynamic aggregation with wif(...) and combines it through get_agg_inf_func(..., wif=...); the PR’s cohort-table expression is the algebraic contraction of that dense WIF. Group-specific R aggregation uses wif = NULL, and the new formula reduces to zero for single-cohort group aggregation. Concrete fix: None required. See diff_diff/staggered_aggregation.py:L215-L385. (raw.githubusercontent.com)

Code Quality

  • Severity: P3. Impact: In benchmarks/R/benchmark_did.R, --faster-mode is ignored on older did versions when unsupported, but metadata still records the requested value rather than the effective value. This can mislabel benchmark artifacts. Concrete fix: track effective_faster_mode after the support check and write that to metadata. See benchmarks/R/benchmark_did.R:L132-L139 and benchmarks/R/benchmark_did.R:L201-L214.

Performance

  • Severity: None. Impact: The fast path is appropriately scoped behind identity/shape guards and falls back for unsupported direct callers or non-numeric cohorts. Concrete fix: None.

Maintainability

  • Severity: None. Impact: Cache invalidation by cohort/survey-weight array identity handles the StaggeredTripleDifference shallow-copy pattern. Concrete fix: None. See diff_diff/staggered_aggregation.py:L215-L264 and diff_diff/staggered_triple_diff.py:L590-L600.

Tech Debt

  • Severity: P3-informational. Impact: New deferred items are tracked in TODO.md, including the pre-existing reg-method aggregated-SE gap and follow-up performance cleanup. Under the review rules, these are not blockers. Concrete fix: None for this PR; follow the TODO entries. See TODO.md:L35 and TODO.md:L57-L59.

Security

  • Severity: None. Impact: No secrets, unsafe IO, or new security-sensitive paths found in the diff. Concrete fix: None.

Documentation/Tests

  • Severity: P3. Impact: Value parity coverage is strong, including frozen-copy parity and R golden DR aggregation SE assertions. The fast-path-dispatch assertions cover plain panel and RCS, but not survey, bootstrap-prep, or triple-diff dispatch directly; those cases could fall back and still pass parity tests. Concrete fix: Add spy assertions mirroring test_fast_path_taken_on_plain_panel_fit for survey pweights, bootstrap prep, and StaggeredTripleDifference. See tests/test_staggered_aggregation.py:L331-L405 and tests/test_staggered_aggregation.py:L496-L571.

…yardstick

Rewrite CallawaySantAnna's combined influence-function assembly (also
inherited by StaggeredTripleDifference aggregation) - previously 56-85% of
analytical fit time at scale:

- Per-fit cohort tables (np.unique + np.bincount) cached on the precomputed
  structures with array-identity validation (safe against triple-diff's
  shallow-copy + zeroed-cohort aggregation pattern)
- Closed-form WIF (wif_i = w_i * (E(c_i)/S - K(c_i)*d/S^2)) replacing dense
  (n_units x n_gt) indicator/outer-product matrices - algebraically identical,
  documented in REGISTRY
- Fancy-index scatter replacing np.add.at (per-cell index arrays are
  duplicate-free by construction; invariant documented at all IF producers)
- Pre-rewrite general path preserved verbatim as fallback for direct callers

Point estimates bit-identical; aggregated SEs <=5e-16 relative (drift-bound
frozen-copy tests at rtol=0 atol=1e-9 across panel/survey/RCS/unbalanced/
triple-diff/inf-coded fixtures). Analytical fits 2.3-6.3x faster at 2-5M rows,
bootstrap fits 1.4-1.6x, RCS peak memory -55 to -68%. 3-11x faster than R did
2.5.1 at equal work (single- and multi-core R; benchmarks/R/benchmark_did.R
extended with bootstrap/covariate/parallelism flags for the yardstick).

Golden aggregated-SE assertions enabled for dr scenarios (match R fixtures at
1e-12..7e-6). Pre-existing reg-method aggregated-SE gap vs R fixtures (3-20%,
ATTs exact) documented in TODO.md for follow-up investigation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01X4AzrFUMqJxcUumSH31mSr
@igerber igerber force-pushed the perf/cs-aggregation-if-assembly branch from 0f4fce8 to d0a7a98 Compare July 4, 2026 23:42
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: d0a7a98bca213b02fc047ad8727f67a63f77f138


Overall Assessment
✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

  • Affected methods: CallawaySantAnna aggregated ATT SE assembly and inherited StaggeredTripleDifference aggregation.
  • The closed-form WIF fast path matches the documented registry contract and algebraically contracts the preserved dense WIF path.
  • Prior P3 benchmark metadata issue is fixed via effective_faster_mode.
  • Prior dispatch-test gap is addressed for survey, bootstrap-prep, RCS, and triple-diff paths.
  • I could not run tests: pytest is not installed in this environment.

Methodology

  • Severity: P3-informational. Impact: The PR changes combined IF/WIF assembly for aggregation SEs, but the implementation change is explicitly documented in docs/methodology/REGISTRY.md:L499-L523. The fast path in diff_diff/staggered_aggregation.py:L266-L385 is the cohort-level contraction of the preserved dense WIF formula in diff_diff/staggered_aggregation.py:L520-L636; non-finite WIF still returns NaN and empty IFs still fail soft. Concrete fix: None required.
  • Severity: P3-informational. Impact: The known reg-method aggregated-SE mismatch against R fixtures is tracked in TODO.md:L35, and the new golden assertions are enabled only for DR scenarios in tests/test_csdid_ported.py:L1258-L1335. Concrete fix: None for this PR.

Code Quality

  • Severity: None. Impact: Cache identity validation is appropriate for the triple-diff shallow-copy pattern; see diff_diff/staggered_aggregation.py:L215-L264 and diff_diff/staggered_triple_diff.py:L582-L600. The previous benchmark metadata concern is fixed at benchmarks/R/benchmark_did.R:L132-L145 and benchmarks/R/benchmark_did.R:L204-L219. Concrete fix: None.

Performance

  • Severity: None. Impact: The fast path is guarded by precomputed/index identity and RCS size checks, with fallback for unsupported direct callers or non-numeric cohorts; see diff_diff/staggered_aggregation.py:L423-L452. Concrete fix: None.

Maintainability

  • Severity: P3-informational. Impact: Follow-up cleanup items are tracked in TODO.md:L57-L59, including the cluster scatter sibling, lazy label arrays, and dead _compute_aggregated_se. These are deferrable and tracked. Concrete fix: None required for approval.

Tech Debt

  • Severity: P3-informational. Impact: New deferred work is recorded in TODO.md, including the pre-existing reg-path aggregation-SE investigation and performance cleanups. Under the review rules, these are not blockers. Concrete fix: None for this PR.

Security

  • Severity: None. Impact: No secrets, unsafe IO, or new security-sensitive behavior found in the changed files. Concrete fix: None.

Documentation/Tests

  • Severity: None. Impact: Registry and performance docs cover the implementation change, and new tests cover frozen-copy parity, cache invalidation, WIF zero contract, and fast-path dispatch for panel/RCS/survey/bootstrap/triple-diff paths; see tests/test_staggered_aggregation.py:L331-L630. Concrete fix: None. Verification note: attempted python -m pytest -q tests/test_staggered_aggregation.py -p no:cacheprovider, but the environment lacks pytest.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jul 4, 2026
@igerber igerber merged commit b56931a into main Jul 5, 2026
35 of 36 checks passed
@igerber igerber deleted the perf/cs-aggregation-if-assembly branch July 5, 2026 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant