perf: factorize-once + bincount MAP demeaning, max_iter 10k parity, FE-spanned regressor guard by igerber · Pull Request #601 · igerber/diff-diff

igerber · 2026-07-02T23:41:36Z

Summary

Rewrites the FE-absorption engine (demean_by_groups N>=2 branch): each absorbed dimension is factorized ONCE and every MAP sweep is two O(n) np.bincount passes, replacing pandas rebuilding the group hash table per (iteration x variable x dimension). Convergence semantics unchanged (per-variable sweep order, max|x - x_old| < tol, zero-total-weight inert guard, non-convergence warning). NaN group keys now raise ValueError at every arity (pandas groupby silently dropped them; pd.factorize would code them -1 and mis-index).
Raises the MAP iteration cap 100 -> 10,000 in both demean_by_groups and within_transform, matching R fixest (fixef.iter) and pyfixest (fixef_maxiter). Correlated FE incidence (contiguous unit lifetimes) genuinely needs ~270 iterations; the old cap warned and returned slightly-off residuals. Worst-case trade-off documented in REGISTRY.
Closes an FE-spanned-regressor inference hazard (new utils.snap_absorbed_regressors, wired at DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge, and the DiD/MPD/TWFE replicate-refit closures): regressors spanned by the absorbed FEs previously demeaned to junk columns that survived rank detection via column equilibration and silently perturbed identified estimates - up to ~3e-3 on ATT with ~1e14-scale garbage coefficients in the joint-span (a_unit + b_time) case. Detection is two-stage because the MAP stopping rule bounds the last step, not the distance to the limit: fast relative-norm path at 1e-10, then an exact sparse-LSMR span-membership confirmation for truncation-masked candidates; sqrt(w)-weighted norms so zero-weight domain rows cannot mask spanning. Spanned regressors drop deterministically (coefficient NaN) with a cause-specific UserWarning honoring rank_deficient_action.
Measured (committed fe_absorption suite baselines, generated on frozen code, CVs 0.1-4.4%): county policy event study 3.87s -> 2.39s (1.6x), firm-churn 2.4M-row SunAbraham 93.0s -> 49.5s (1.9x), scanner TWFE 1.55s -> 0.98s (1.6x), 5M-row geo experiment 2.63s -> 0.97s (2.7x), survey BRR x80 7.39s -> 4.08s (1.8x), small-panel guard unregressed. The correlated-FE stress case now CONVERGES (0.8x wall-clock as the price of correctness: 2.7x the iterations to full convergence plus the exact span confirmation; previously it hit the cap and kept a masked junk column).
Identity evidence: --check-estimates gate at 1e-14-1e-16 on all scenarios except the two documented one-time corrections (survey_absorb ATT 9.8e-6, geo_experiment SE 1e-7 - the removed junk influence), passed via the auditable per-run --allow-shift flag. Estimates are now invariant (~1e-14) to the demeaning tolerance where they previously swung at ~1e-5.
Dead _within_transform methods removed (twfe.py, sun_abraham.py). docs/doc-deps.yaml gains the utils.py -> "Absorbed Fixed Effects" REGISTRY mapping.

Methodology references (required if estimator / math changes)

Method name(s): Method of alternating projections (MAP) for N-way fixed-effects absorption (Frisch-Waugh-Lovell residualization); exact span-membership confirmation via LSMR on the sparse FE incidence
Paper / source link(s): fixest/reghdfe/lfe demeaning convention (docs/methodology/REGISTRY.md "Absorbed Fixed Effects with Survey Weights"); Fong & Saunders (2011) LSMR via scipy.sparse.linalg.lsmr
Any intentional deviations from the source (and why): Numerical contract change documented in REGISTRY - bincount accumulation is not Kahan-compensated the way pandas' grouped mean was, so demeaned values agree with the prior implementation to ~1e-10 order rather than bit-for-bit (estimates validated unchanged at the benchmark identity gate). FE-spanned regressors now drop to NaN with a cause warning instead of receiving junk coefficients; designs carrying them see identified estimates shift once by up to ~1e-5 (REGISTRY edge cases + CHANGELOG).

Validation

Tests added/updated: tests/test_utils.py (bincount-engine parity vs full-dummy ground truth, frozen-loop drift guards, NaN-key guards at every arity, weight-aware snap under zero-weight domains, joint-span LSMR confirmation + identified-low-variation counter-test, max_iter defaults lock, >100-iteration convergence fixture), tests/test_estimators.py (DiD/MPD/TWFE snap behavior, replicate-local spanning fail-closed, joint-span covariate dropped with ATT stable, degenerate unbalanced-TWFE spec now asserts the identification error), tests/test_sun_abraham.py, tests/test_wooldridge.py (hc1 lock recaptured, ~1e-15 shift), tests/test_bacon.py, tests/test_within_transform.py. Targeted sweep: 875 passed.
Backtest / simulation / notebook evidence (if applicable): committed before/after baselines in benchmarks/speed_review/baselines/fe_absorption_{before,after}.json (subprocess-isolated multi-run protocol, CVs recorded); findings table auto-regenerated in docs/performance-plan.md; pyfixest yardstick coefficient parity 6e-13 to 2.7e-9 on exact-estimand scenarios.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…rity, FE-spanned regressor guard Rewrites the fixed-effects absorption engine and closes a silent-inference hazard the rewrite's identity gate surfaced. Three coupled changes: 1. demean_by_groups N>=2 branch: each absorbed dimension factorized ONCE (pd.factorize), MAP sweeps via np.bincount group-sum + gather instead of pandas rebuilding the group hash table per (iteration x variable x dimension). Same convergence semantics (per-variable sweep order, max|x - x_old| < tol, zero-total-weight inert guard, warn_if_not_converged). NaN group keys raise ValueError at every arity (one-way included; pandas groupby silently dropped NaN keys, factorize would code them -1). Numerical contract: bincount accumulation is not Kahan-compensated, so outputs agree with the prior pandas loop to ~1e-10 order, not bit-for-bit. 2. max_iter raised 100 -> 10,000 in BOTH demean_by_groups and within_transform (fixest fixef.iter / pyfixest fixef_maxiter parity). Correlated FE incidence (contiguous unit lifetimes) genuinely needs ~270 iterations; the old cap warned and returned slightly-off residuals. 3. FE-spanned regressor guard (new utils.snap_absorbed_regressors + pre_demean_norms + _fe_span_residual_norm), wired at DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge AND the DiD/MPD/TWFE replicate-refit closures: regressors spanned by the absorbed FEs previously demeaned to junk columns that survived rank detection via column equilibration and perturbed identified estimates - up to ~3e-3 on ATT with ~1e14-scale garbage coefficients in the joint-span (a_unit + b_time) case. Detection is TWO-STAGE because the MAP stopping rule bounds the last step, not the distance to the limit: fast relative-norm path at 1e-10, then an exact sparse-LSMR span-membership confirmation (on the already-demeaned column, so it converges in a few iterations) for truncation-masked candidates in (1e-10, 1e-3]. Norms are sqrt(w)-weighted so zero-weight domain rows cannot mask spanning on the positive-weight sample. Spanned regressors drop deterministically (coefficient NaN) with a cause-specific UserWarning under rank_deficient_action="warn"; identified estimates become invariant (~1e-14) to the demeaning tolerance where they previously swung at ~1e-5. Measured (fe_absorption suite, AFTER baselines committed, generated on frozen code): 1.6-2.7x end-to-end on the headline scenarios; the correlated-FE stress case now converges (0.8x wall-clock as the price of correctness: 2.7x the iterations to full convergence + the exact span confirmation). Identity gate: 1e-14-1e-16 everywhere except the documented one-time corrections (survey_absorb ATT 9.8e-6, geo_experiment SE 1e-7 - junk columns removed). 875 targeted tests pass; byte-identity locks converted to allclose + full-dummy ground-truth guards; Wooldridge 1e-14 baseline recaptured (~1e-15 shift); a pre-existing degenerate test spec that asserted isfinite on garbage output now asserts the identification error. Dead _within_transform methods removed (twfe, sun_abraham). REGISTRY "Absorbed Fixed Effects" + CHANGELOG + doc-deps updated. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-02T23:44:46Z

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings found.

Executive Summary

Affected methods: absorbed fixed-effect MAP/FWL residualization, weighted within transforms, and FE-spanned regressor handling across DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge, and replicate-refit closures.
The methodology changes are documented in docs/methodology/REGISTRY.md with explicit Note/Edge case entries, so the numerical drift, max_iter=10_000, NaN group-key guard, and FE-spanned snap behavior are P3 informational, not defects.
I did not find new inline inference anti-patterns or partial NaN guards in the modified paths; replicate SE paths still route through safe_inference().
Edge-case coverage is strong for NaN group keys, zero-weight domains, joint FE-span confirmation, and replicate-local spanning.
I could not execute runtime checks because the review environment is missing numpy.

Methodology

Finding: Documented absorbed-FE numerical contract change

Severity: P3 informational, no action required
Location: diff_diff/utils.py:L2644-L2848, docs/methodology/REGISTRY.md:L4317-L4344
Impact: The factorize-once + np.bincount MAP rewrite changes floating-point accumulation relative to pandas groupby and raises default max_iter from 100 to 10,000. This is a methodology-relevant default/numerical contract change, but it is explicitly documented in the registry.
Concrete fix: None required.

Finding: Documented FE-spanned regressor snap behavior

Severity: P3 informational, no action required
Location: diff_diff/utils.py:L2851-L3033, docs/methodology/REGISTRY.md:L4348-L4378
Impact: FE-spanned regressors are now snapped to zero before solve so rank handling drops them deterministically. This can shift estimates that previously included numerical junk, but the correction and assumptions are documented.
Concrete fix: None required.

Code Quality

No blocking findings. The snap helper is centralized and call sites consistently pass pre-demean norms, absorbed group variables, suffixes, rank-deficiency behavior, and weights.

Performance

No blocking findings. The worst-case max_iter=10_000 latency and LSMR confirmation cost are explicitly accepted and documented in REGISTRY.md and docs/performance-plan.md.

Maintainability

No blocking findings. The repeated estimator wiring is somewhat verbose but follows existing local patterns and keeps estimator-specific rank/warning behavior clear.

Tech Debt

No untracked correctness debt found. Existing deferred-work handling in TODO.md does not appear to be used to mask any P0/P1 issue introduced here.

Security

No findings. The changed code does not introduce secret handling, external I/O, shell execution, or unsafe deserialization paths.

Documentation/Tests

Finding: `doc-deps.yaml` note omits the new snap helper

Severity: P3 informational
Location: docs/doc-deps.yaml:L904-L907
Impact: The utils-to-registry dependency note mentions demeaning/max_iter/NaN-key behavior but not snap_absorbed_regressors, even though that new helper is methodology-relevant and lives in utils.py.
Concrete fix: Extend the note to include “FE-spanned regressor snap / exact span confirmation”.

Finding: Local tests not executed in this review environment

Severity: P3 informational
Location: Review environment
Impact: A lightweight Python sanity check failed at import time with ModuleNotFoundError: No module named 'numpy', so I could not independently run the new tests here.
Concrete fix: Run the targeted test set in CI or a dependency-complete local env, especially tests/test_utils.py, tests/test_estimators.py, tests/test_sun_abraham.py, tests/test_wooldridge.py, and tests/test_bacon.py.

…view P3) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-02T23:49:18Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a03319a048e69d4da0739044e72fe4c3fbdffa58

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings found.

Executive Summary

Affected methods: MAP/FWL absorbed fixed-effect demeaning, weighted within transforms, and FE-spanned regressor handling across DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge, and replicate-refit closures.
The methodology-relevant changes are documented in REGISTRY.md with Note / Edge case labels, so the max-iteration change, bincount numerics, NaN group-key guard, and FE-spanned snap behavior are P3 informational, not defects.
Previous P3 doc-dependency issue is resolved: docs/doc-deps.yaml:L904-L907 now names snap_absorbed_regressors and exact LSMR span confirmation.
No new inline inference anti-patterns or partial NaN guards were found in modified estimator paths; replicate paths continue to route through safe_inference().
I could not execute tests in this environment: numpy is missing and pytest is not installed.

Methodology

Finding: Documented absorbed-FE MAP numerical/default contract change

Severity: P3 informational, no action required
Location: diff_diff/utils.py:L2646-L2833, docs/methodology/REGISTRY.md:L4317-L4347
Impact: demean_by_groups / within_transform now use factorized np.bincount MAP sweeps and raise max_iter to 10_000; NaN absorbed group keys now raise. These are methodology/default behavior changes, but the registry explicitly documents the rationale, numerical drift contract, and edge-case behavior.
Concrete fix: None required.

Finding: Documented FE-spanned regressor snap behavior

Severity: P3 informational, no action required
Location: diff_diff/utils.py:L2851-L3033, docs/methodology/REGISTRY.md:L4348-L4378
Impact: FE-spanned regressors are now snapped to exact zero before solve, including weighted norms and LSMR confirmation for truncation-masked joint-span cases. This can change estimates that previously depended on numerical junk columns, but the correction is documented and wired across the affected estimators.
Concrete fix: None required.

Code Quality

No blocking findings. The helper is centralized and call sites consistently pass pre-demean norms, absorbed group variables, suffixes, rank-deficiency behavior, display names where useful, and weights: diff_diff/estimators.py:L452-L483, diff_diff/twfe.py:L382-L409, diff_diff/sun_abraham.py:L1491-L1513, diff_diff/wooldridge.py:L1198-L1223, diff_diff/bacon.py:L794-L820.

Performance

No blocking findings. The 10,000-iteration worst case and LSMR confirmation cost are documented as accepted trade-offs in docs/methodology/REGISTRY.md:L4317-L4334 and summarized in docs/performance-plan.md.

Maintainability

No blocking findings. The repeated estimator wiring is verbose but follows existing estimator-specific rank/warning patterns and keeps the snap behavior explicit at each demeaning boundary.

Tech Debt

No untracked correctness debt found. I did not see TODO tracking used to mask a P0/P1 issue introduced by this PR.

Security

No findings. The changes do not add secret handling, external I/O, unsafe deserialization, or shell execution paths.

Documentation/Tests

Finding: Local tests not executed in this review environment

Severity: P3 informational
Location: Review environment
Impact: python import failed with ModuleNotFoundError: No module named 'numpy', and pytest is not installed, so I could not independently run the added tests here.
Concrete fix: Run the targeted CI suite covering tests/test_utils.py, tests/test_estimators.py, tests/test_sun_abraham.py, tests/test_wooldridge.py, and tests/test_bacon.py.

docs: mention snap_absorbed_regressors in the utils doc-deps note (re…

a03319a

…view P3) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Jul 2, 2026

igerber merged commit 3bb0cad into main Jul 3, 2026
33 of 34 checks passed

igerber deleted the perf/demean-bincount branch July 3, 2026 01:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: factorize-once + bincount MAP demeaning, max_iter 10k parity, FE-spanned regressor guard#601

perf: factorize-once + bincount MAP demeaning, max_iter 10k parity, FE-spanned regressor guard#601
igerber merged 2 commits into
mainfrom
perf/demean-bincount

igerber commented Jul 2, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jul 2, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jul 2, 2026

Overall Assessment

Executive Summary

Methodology

Finding: Documented absorbed-FE numerical contract change

Finding: Documented FE-spanned regressor snap behavior

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Finding: doc-deps.yaml note omits the new snap helper

Finding: Local tests not executed in this review environment

Uh oh!

github-actions Bot commented Jul 2, 2026

Overall Assessment

Executive Summary

Methodology

Finding: Documented absorbed-FE MAP numerical/default contract change

Finding: Documented FE-spanned regressor snap behavior

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Finding: Local tests not executed in this review environment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Finding: `doc-deps.yaml` note omits the new snap helper