perf: factorize-once + bincount MAP demeaning, max_iter 10k parity, FE-spanned regressor guard#601
Conversation
…rity, FE-spanned regressor guard Rewrites the fixed-effects absorption engine and closes a silent-inference hazard the rewrite's identity gate surfaced. Three coupled changes: 1. demean_by_groups N>=2 branch: each absorbed dimension factorized ONCE (pd.factorize), MAP sweeps via np.bincount group-sum + gather instead of pandas rebuilding the group hash table per (iteration x variable x dimension). Same convergence semantics (per-variable sweep order, max|x - x_old| < tol, zero-total-weight inert guard, warn_if_not_converged). NaN group keys raise ValueError at every arity (one-way included; pandas groupby silently dropped NaN keys, factorize would code them -1). Numerical contract: bincount accumulation is not Kahan-compensated, so outputs agree with the prior pandas loop to ~1e-10 order, not bit-for-bit. 2. max_iter raised 100 -> 10,000 in BOTH demean_by_groups and within_transform (fixest fixef.iter / pyfixest fixef_maxiter parity). Correlated FE incidence (contiguous unit lifetimes) genuinely needs ~270 iterations; the old cap warned and returned slightly-off residuals. 3. FE-spanned regressor guard (new utils.snap_absorbed_regressors + pre_demean_norms + _fe_span_residual_norm), wired at DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge AND the DiD/MPD/TWFE replicate-refit closures: regressors spanned by the absorbed FEs previously demeaned to junk columns that survived rank detection via column equilibration and perturbed identified estimates - up to ~3e-3 on ATT with ~1e14-scale garbage coefficients in the joint-span (a_unit + b_time) case. Detection is TWO-STAGE because the MAP stopping rule bounds the last step, not the distance to the limit: fast relative-norm path at 1e-10, then an exact sparse-LSMR span-membership confirmation (on the already-demeaned column, so it converges in a few iterations) for truncation-masked candidates in (1e-10, 1e-3]. Norms are sqrt(w)-weighted so zero-weight domain rows cannot mask spanning on the positive-weight sample. Spanned regressors drop deterministically (coefficient NaN) with a cause-specific UserWarning under rank_deficient_action="warn"; identified estimates become invariant (~1e-14) to the demeaning tolerance where they previously swung at ~1e-5. Measured (fe_absorption suite, AFTER baselines committed, generated on frozen code): 1.6-2.7x end-to-end on the headline scenarios; the correlated-FE stress case now converges (0.8x wall-clock as the price of correctness: 2.7x the iterations to full convergence + the exact span confirmation). Identity gate: 1e-14-1e-16 everywhere except the documented one-time corrections (survey_absorb ATT 9.8e-6, geo_experiment SE 1e-7 - junk columns removed). 875 targeted tests pass; byte-identity locks converted to allclose + full-dummy ground-truth guards; Wooldridge 1e-14 baseline recaptured (~1e-15 shift); a pre-existing degenerate test spec that asserted isfinite on garbage output now asserts the identification error. Dead _within_transform methods removed (twfe, sun_abraham). REGISTRY "Absorbed Fixed Effects" + CHANGELOG + doc-deps updated. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Overall Assessment✅ Looks good — no unmitigated P0/P1 findings found. Executive Summary
MethodologyFinding: Documented absorbed-FE numerical contract change
Finding: Documented FE-spanned regressor snap behavior
Code QualityNo blocking findings. The snap helper is centralized and call sites consistently pass pre-demean norms, absorbed group variables, suffixes, rank-deficiency behavior, and weights. PerformanceNo blocking findings. The worst-case MaintainabilityNo blocking findings. The repeated estimator wiring is somewhat verbose but follows existing local patterns and keeps estimator-specific rank/warning behavior clear. Tech DebtNo untracked correctness debt found. Existing deferred-work handling in SecurityNo findings. The changed code does not introduce secret handling, external I/O, shell execution, or unsafe deserialization paths. Documentation/TestsFinding:
|
…view P3) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment✅ Looks good — no unmitigated P0/P1 findings found. Executive Summary
MethodologyFinding: Documented absorbed-FE MAP numerical/default contract change
Finding: Documented FE-spanned regressor snap behavior
Code QualityNo blocking findings. The helper is centralized and call sites consistently pass pre-demean norms, absorbed group variables, suffixes, rank-deficiency behavior, display names where useful, and weights: PerformanceNo blocking findings. The 10,000-iteration worst case and LSMR confirmation cost are documented as accepted trade-offs in MaintainabilityNo blocking findings. The repeated estimator wiring is verbose but follows existing estimator-specific rank/warning patterns and keeps the snap behavior explicit at each demeaning boundary. Tech DebtNo untracked correctness debt found. I did not see TODO tracking used to mask a P0/P1 issue introduced by this PR. SecurityNo findings. The changes do not add secret handling, external I/O, unsafe deserialization, or shell execution paths. Documentation/TestsFinding: Local tests not executed in this review environment
|
Summary
demean_by_groupsN>=2 branch): each absorbed dimension is factorized ONCE and every MAP sweep is two O(n)np.bincountpasses, replacing pandas rebuilding the group hash table per (iteration x variable x dimension). Convergence semantics unchanged (per-variable sweep order,max|x - x_old| < tol, zero-total-weight inert guard, non-convergence warning). NaN group keys now raiseValueErrorat every arity (pandas groupby silently dropped them;pd.factorizewould code them -1 and mis-index).demean_by_groupsandwithin_transform, matching Rfixest(fixef.iter) andpyfixest(fixef_maxiter). Correlated FE incidence (contiguous unit lifetimes) genuinely needs ~270 iterations; the old cap warned and returned slightly-off residuals. Worst-case trade-off documented in REGISTRY.utils.snap_absorbed_regressors, wired at DiD/MPDabsorb=, TWFE, SunAbraham, Bacon, Wooldridge, and the DiD/MPD/TWFE replicate-refit closures): regressors spanned by the absorbed FEs previously demeaned to junk columns that survived rank detection via column equilibration and silently perturbed identified estimates - up to ~3e-3 on ATT with ~1e14-scale garbage coefficients in the joint-span (a_unit + b_time) case. Detection is two-stage because the MAP stopping rule bounds the last step, not the distance to the limit: fast relative-norm path at 1e-10, then an exact sparse-LSMR span-membership confirmation for truncation-masked candidates;sqrt(w)-weighted norms so zero-weight domain rows cannot mask spanning. Spanned regressors drop deterministically (coefficient NaN) with a cause-specificUserWarninghonoringrank_deficient_action.fe_absorptionsuite baselines, generated on frozen code, CVs 0.1-4.4%): county policy event study 3.87s -> 2.39s (1.6x), firm-churn 2.4M-row SunAbraham 93.0s -> 49.5s (1.9x), scanner TWFE 1.55s -> 0.98s (1.6x), 5M-row geo experiment 2.63s -> 0.97s (2.7x), survey BRR x80 7.39s -> 4.08s (1.8x), small-panel guard unregressed. The correlated-FE stress case now CONVERGES (0.8x wall-clock as the price of correctness: 2.7x the iterations to full convergence plus the exact span confirmation; previously it hit the cap and kept a masked junk column).--check-estimatesgate at 1e-14-1e-16 on all scenarios except the two documented one-time corrections (survey_absorb ATT 9.8e-6, geo_experiment SE 1e-7 - the removed junk influence), passed via the auditable per-run--allow-shiftflag. Estimates are now invariant (~1e-14) to the demeaning tolerance where they previously swung at ~1e-5._within_transformmethods removed (twfe.py, sun_abraham.py).docs/doc-deps.yamlgains the utils.py -> "Absorbed Fixed Effects" REGISTRY mapping.Methodology references (required if estimator / math changes)
scipy.sparse.linalg.lsmrValidation
benchmarks/speed_review/baselines/fe_absorption_{before,after}.json(subprocess-isolated multi-run protocol, CVs recorded); findings table auto-regenerated in docs/performance-plan.md; pyfixest yardstick coefficient parity 6e-13 to 2.7e-9 on exact-estimand scenarios.Security / privacy
Generated with Claude Code