perf(imputation,two_stage): shared MAP demeaning engine + zero-weight replicate fix#615
Conversation
… + fix zero-weight replicate bug Delete the private pandas _iterative_demean twins; covariate/lead paths now call demean_by_groups (factorize-once + bincount + optional Rust kernel, group order [time, unit] preserving the historical sweep). Replace both _iterative_fe bodies with a shared bincount Gauss-Seidel helper (utils._iterative_fe_solve, modeled on spillover._iterative_fe_subset); zero-total-weight groups get NaN FE (keys retained) instead of 0/0 NaN poisoning. Fixes covariate + zero-weight replicate designs (JK1/BRR): previously ALL replicate refits failed -> NaN SE + non-convergence warning storm; main fits with zero-weight rows raised opaque ValueError. TwoStageDiD stage-2 nan-ytilde warning suppressed inside replicate closures only (warn_nan=False); main-fit warning unchanged. max_iter modernized 100 -> 10_000. REGISTRY notes rewritten; TODO row resolved with two follow-up rows (spillover migration, lead-path snap guard). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01X4AzrFUMqJxcUumSH31mSr
Overall Assessment✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
MethodologyP3 — Documented Implementation Deviation / No Action RequiredImpact: ImputationDiD and TwoStageDiD now use shared Locations: Concrete fix: None required. Code QualityNo findings. PerformanceNo findings. The refactor removes duplicated pandas groupby loops and uses the shared factorize-once MAP path. MaintainabilityNo findings. Tech DebtP3 — Remaining Solver Consolidation / Tracked in TODO.mdImpact: Locations: Concrete fix: No action required for this PR; address the tracked TODO items in follow-up work. SecurityNo findings. Documentation/TestsNo findings. The PR updates methodology notes and adds focused tests for shared FE drift, zero-weight NaN FE, replicate finite SEs, and TwoStageDiD warning suppression. Verification not run: |
Summary
_iterative_demeantwins in ImputationDiD/TwoStageDiD; the covariate and pre-trend-lead within-transformations now route through the shared MAP enginedemean_by_groups(factorize-once +np.bincount, optional Rust kernel, one dispatch for all columns; group order[time, unit]preserves the historical time-then-unit sweep)._iterative_febodies with a shared bincount Gauss-Seidel helper (diff_diff.utils._iterative_fe_solve, modeled onspillover._iterative_fe_subset); estimator methods stay as thin wrappers with unchanged signatures.max_itermodernized 100 → 10,000 (the shared-engine / Rfixestconvention).keep_maskonly drops always-treated) previously divided 0/0 in the pandas loops → NaN-poisoned demeaned design → every replicate refit failed insolve_ols(check_finite=True)→ NaN SEs plus a non-convergence warning storm (measured: 198 warnings, SE=NaN). Now: finite SE, zero warnings. Zero-total-weight groups surface as NaN FE with keys retained (SpilloverDiD REGISTRY convention — never a silent finite 0.0). Main fits with zero-weight rows + covariates no longer raise an opaque ValueError.warn_nan=Falsethreaded through the three stage-2 helpers); the main-fit warning is unchanged.Methodology references (required if estimator / math changes)
docs/methodology/REGISTRY.mdsections, Notes updated in this PR).**Note:**labels in both REGISTRY sections, cross-linked to the "Absorbed Fixed Effects with Survey Weights" engine notes.Validation
tests/test_utils.py(newTestIterativeFeSolve: frozen-pandas-loop drift guard at atol=1e-9, zero-weight NaN-FE contract, warn label);tests/test_imputation.py+tests/test_two_stage.py(newTestZeroWeightGroupsestimator-level locks: finite SE on zeroed-PSU replicate designs, no warning storm, main-fit nan-ytilde warning unchanged, NaN cohort cell viaassert_nan_inference; covariate-delta-vs-full-dummy-OLS lock;_iterative_fezero-weight NaN tests). Direct tests of the deleted_iterative_demeanretired with an intentional-coverage-narrowing note (engine-level non-convergence coverage lives intest_utils.py::TestDemeanByGroups).test_methodology_imputation.pyvsdidimputation,test_methodology_two_stage.pyvsdid2s) pass unchanged at ATT abs=1e-6 / SE abs=1e-7 under bothDIFF_DIFF_BACKEND=pythonand=rust; full targeted sweeps (439 tests) green under both backends; 5-scenario before/after benchmark grid on 2.35M-row panels.Security / privacy
Generated with Claude Code