From 988be3511f7c72573311f32adad4ee273a36b16e Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 2 Jul 2026 19:40:47 -0400 Subject: [PATCH 1/2] perf(utils): factorize-once + bincount MAP demeaning, max_iter 10k parity, FE-spanned regressor guard Rewrites the fixed-effects absorption engine and closes a silent-inference hazard the rewrite's identity gate surfaced. Three coupled changes: 1. demean_by_groups N>=2 branch: each absorbed dimension factorized ONCE (pd.factorize), MAP sweeps via np.bincount group-sum + gather instead of pandas rebuilding the group hash table per (iteration x variable x dimension). Same convergence semantics (per-variable sweep order, max|x - x_old| < tol, zero-total-weight inert guard, warn_if_not_converged). NaN group keys raise ValueError at every arity (one-way included; pandas groupby silently dropped NaN keys, factorize would code them -1). Numerical contract: bincount accumulation is not Kahan-compensated, so outputs agree with the prior pandas loop to ~1e-10 order, not bit-for-bit. 2. max_iter raised 100 -> 10,000 in BOTH demean_by_groups and within_transform (fixest fixef.iter / pyfixest fixef_maxiter parity). Correlated FE incidence (contiguous unit lifetimes) genuinely needs ~270 iterations; the old cap warned and returned slightly-off residuals. 3. FE-spanned regressor guard (new utils.snap_absorbed_regressors + pre_demean_norms + _fe_span_residual_norm), wired at DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge AND the DiD/MPD/TWFE replicate-refit closures: regressors spanned by the absorbed FEs previously demeaned to junk columns that survived rank detection via column equilibration and perturbed identified estimates - up to ~3e-3 on ATT with ~1e14-scale garbage coefficients in the joint-span (a_unit + b_time) case. Detection is TWO-STAGE because the MAP stopping rule bounds the last step, not the distance to the limit: fast relative-norm path at 1e-10, then an exact sparse-LSMR span-membership confirmation (on the already-demeaned column, so it converges in a few iterations) for truncation-masked candidates in (1e-10, 1e-3]. Norms are sqrt(w)-weighted so zero-weight domain rows cannot mask spanning on the positive-weight sample. Spanned regressors drop deterministically (coefficient NaN) with a cause-specific UserWarning under rank_deficient_action="warn"; identified estimates become invariant (~1e-14) to the demeaning tolerance where they previously swung at ~1e-5. Measured (fe_absorption suite, AFTER baselines committed, generated on frozen code): 1.6-2.7x end-to-end on the headline scenarios; the correlated-FE stress case now converges (0.8x wall-clock as the price of correctness: 2.7x the iterations to full convergence + the exact span confirmation). Identity gate: 1e-14-1e-16 everywhere except the documented one-time corrections (survey_absorb ATT 9.8e-6, geo_experiment SE 1e-7 - junk columns removed). 875 targeted tests pass; byte-identity locks converted to allclose + full-dummy ground-truth guards; Wooldridge 1e-14 baseline recaptured (~1e-15 shift); a pre-existing degenerate test spec that asserted isfinite on garbage output now asserts the identification error. Dead _within_transform methods removed (twfe, sun_abraham). REGISTRY "Absorbed Fixed Effects" + CHANGELOG + doc-deps updated. Co-Authored-By: Claude Fable 5 --- CHANGELOG.md | 41 ++ .../baselines/fe_absorption_after.json | 151 ++++ .../speed_review/bench_fe_absorption.py | 27 +- diff_diff/bacon.py | 15 + diff_diff/estimators.py | 62 ++ diff_diff/sun_abraham.py | 30 +- diff_diff/twfe.py | 64 +- diff_diff/utils.py | 308 +++++++- diff_diff/wooldridge.py | 22 +- docs/doc-deps.yaml | 4 + docs/methodology/REGISTRY.md | 59 +- docs/performance-plan.md | 44 +- tests/test_bacon.py | 299 +++----- tests/test_estimators.py | 345 ++++++++- tests/test_sun_abraham.py | 44 ++ tests/test_utils.py | 672 +++++++++++++----- tests/test_wooldridge.py | 428 +++++++---- 17 files changed, 1986 insertions(+), 629 deletions(-) create mode 100644 benchmarks/speed_review/baselines/fe_absorption_after.json diff --git a/CHANGELOG.md b/CHANGELOG.md index dcb4684ac..9c3e05b0e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,47 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Changed +- **FE-absorption demeaning rewritten: factorize-once + `np.bincount` method of alternating + projections** (`demean_by_groups` / `within_transform`). Each absorbed dimension is factorized + once and group means are formed via `np.bincount` instead of re-hashing the group keys with + `pandas.groupby().transform("mean")` on every iteration x variable x dimension. Affects every + fit of `TwoWayFixedEffects`, `SunAbraham`, `BaconDecomposition`, `WooldridgeDiD`, + `DifferenceInDifferences`/`MultiPeriodDiD` with `absorb=`, and the survey replicate-refit path. + Convergence semantics (per-variable sweep order, `max|x - x_old| < tol` criterion, zero-total- + weight group guard, non-convergence warning) are unchanged. **Numerical contract**: bincount + accumulation is not Kahan-compensated the way pandas' grouped mean is, so demeaned values agree + with the prior implementation to ~1e-10 order rather than bit-for-bit; estimator estimates are + validated unchanged at the benchmark identity gate + (`bench_fe_absorption.py --check-estimates`, coefficients atol=1e-9). +- **MAP iteration cap raised from 100 to 10,000** (`max_iter` in both `demean_by_groups` and + `within_transform`), matching the R `fixest` (`fixef.iter`) and `pyfixest` (`fixef_maxiter`) + defaults. Correlated FE incidence (e.g. stores active in contiguous week windows) genuinely + needs hundreds of iterations; the old cap made such fits warn and return slightly-off residuals. + Worst-case trade-off accepted deliberately: a truly non-convergent input now iterates 10,000x + before warning. Documented in `docs/methodology/REGISTRY.md` "Absorbed Fixed Effects". +- **NaN in an absorbed group column now raises `ValueError`** naming the column (previously the + unweighted path silently NaN-poisoned the affected rows and the weighted path silently passed + them through un-demeaned). +- **FE-spanned regressors are now snapped to exact zero before the solver** (new + `diff_diff.utils.snap_absorbed_regressors`, wired at every demeaning consumer: DiD/MPD + `absorb=`, TWFE, SunAbraham, BaconDecomposition, WooldridgeDiD, and the DiD/MPD/TWFE + replicate-refit closures). A regressor spanned by the absorbed FEs (treated-group indicator with + the unit dimension absorbed, a period dummy with time absorbed, a unit-constant covariate, or an + additive `a_unit + b_time` combination) demeans to numerical junk that previously survived the + rank check via column equilibration and perturbed the identified coefficients - up to ~3e-3 on + ATT with a ~1e14-scale garbage coefficient reported for the spanned column in the joint-span + case. Detection is two-stage (fast relative-norm path at 1e-10, then an exact sparse-LSMR + span-membership confirmation for candidates masked by MAP truncation on unbalanced/correlated + panels); norms are `sqrt(w)`-weighted under WLS so zero-weight domain rows cannot mask spanning. + Spanned regressors are dropped deterministically (coefficient NaN) with a cause-specific + `UserWarning` naming them under `rank_deficient_action="warn"`. + **Point-estimate note**: designs carrying such spanned regressors (e.g. 2x2 DiD with + `absorb=[unit_dim, time_dim]`) see identified estimates shift by up to ~1e-5 relative to + v3.6.1 - the removal of the junk direction, documented in REGISTRY "Absorbed Fixed Effects"; + estimates are now invariant (~1e-14) to the demeaning tolerance where they previously swung + at ~1e-5. + ### Added - **FE-absorption benchmark suite** (`benchmarks/speed_review/bench_fe_absorption.py`, scenarios 7-13 in `docs/performance-scenarios.md`): seven realistic workloads (county policy event study, diff --git a/benchmarks/speed_review/baselines/fe_absorption_after.json b/benchmarks/speed_review/baselines/fe_absorption_after.json new file mode 100644 index 000000000..30e2ec590 --- /dev/null +++ b/benchmarks/speed_review/baselines/fe_absorption_after.json @@ -0,0 +1,151 @@ +{ + "suite": "fe_absorption", + "platform": "macOS-26.5.1-arm64-arm-64bit-Mach-O", + "python": "3.14.4", + "versions": { + "diff_diff": "3.6.1", + "numpy": "2.5.0", + "pandas": "3.0.3" + }, + "backend_requested": "python", + "repeats": 3, + "quick": false, + "tolerances": { + "att_atol": 1e-09, + "se_rtol": 1e-07, + "se_rtol_survey": 1e-06, + "gate_exempt": [ + "tail_stress" + ] + }, + "results": [ + { + "scenario": "county_policy", + "n_obs": 177289, + "checksum": 6315.071127371084, + "att": 0.2803785413990301, + "se": 0.031244864610215972, + "fit_median_s": 2.3882, + "fit_min_s": 2.3505, + "fit_max_s": 2.4948, + "fit_cv": 0.0257, + "noisy": false, + "n_timed_fits": 9, + "datagen_s": 0.004, + "peak_rss_mb": 3177.5, + "backend_resolved": "python", + "warnings": [] + }, + { + "scenario": "firm_churn", + "n_obs": 2400000, + "checksum": 240997.06223418942, + "att": 0.3144327184497798, + "se": 0.005837281701253927, + "fit_median_s": 49.502, + "fit_min_s": 48.4532, + "fit_max_s": 50.3118, + "fit_cv": 0.0189, + "noisy": false, + "n_timed_fits": 3, + "datagen_s": 0.07, + "peak_rss_mb": 15900.8, + "backend_resolved": "python", + "warnings": [] + }, + { + "scenario": "scanner_twfe", + "n_obs": 3255000, + "checksum": 76832.22571739106, + "att": 0.251164131495566, + "se": 0.002414938742160613, + "fit_median_s": 0.9834, + "fit_min_s": 0.9825, + "fit_max_s": 0.9845, + "fit_cv": 0.001, + "noisy": false, + "n_timed_fits": 3, + "datagen_s": 0.077, + "peak_rss_mb": 976.9, + "backend_resolved": "python", + "warnings": [] + }, + { + "scenario": "geo_experiment", + "n_obs": 5000000, + "checksum": 443696.3834000407, + "att": 0.25194372599916653, + "se": 0.0017886314723251327, + "fit_median_s": 0.973, + "fit_min_s": 0.968, + "fit_max_s": 0.9759, + "fit_cv": 0.0041, + "noisy": false, + "n_timed_fits": 3, + "datagen_s": 0.062, + "peak_rss_mb": 1664.8, + "backend_resolved": "python", + "warnings": [ + "Rank-deficient design matrix: dropping 2 of 4 columns (column 1, column 2). Coefficients for these columns are", + "Regressor(s) ['treated', 'post'] are collinear with the absorbed fixed effects (absorb=['store', 'week']): the" + ] + }, + { + "scenario": "survey_absorb", + "n_obs": 500000, + "checksum": -176482.6014001448, + "att": 0.24554568144130481, + "se": 0.007662698712656388, + "fit_median_s": 4.0767, + "fit_min_s": 4.0736, + "fit_max_s": 4.0778, + "fit_cv": 0.0005, + "noisy": false, + "n_timed_fits": 3, + "datagen_s": 0.107, + "peak_rss_mb": 3589.2, + "backend_resolved": "python", + "warnings": [ + "Rank-deficient design matrix: dropping 2 of 4 columns (column 1, column 2). Coefficients for these columns are", + "Regressor(s) ['treated', 'post'] are collinear with the absorbed fixed effects (absorb=['state', 'month']): th" + ] + }, + { + "scenario": "tail_stress", + "n_obs": 5000000, + "checksum": 376703.7248141762, + "att": 0.2502082560083505, + "se": 0.004392830594147583, + "fit_median_s": 31.7648, + "fit_min_s": 31.4957, + "fit_max_s": 31.8372, + "fit_cv": 0.0057, + "noisy": false, + "n_timed_fits": 3, + "datagen_s": 0.067, + "peak_rss_mb": 1840.7, + "backend_resolved": "python", + "warnings": [ + "Rank-deficient design matrix: dropping 2 of 4 columns (column 1, column 2). Coefficients for these columns are", + "Regressor(s) ['treated', 'post'] are collinear with the absorbed fixed effects (absorb=['store', 'week']): the" + ] + }, + { + "scenario": "guard_small", + "n_obs": 20000, + "checksum": -571.6190664591859, + "att": 0.2515087011451766, + "se": 0.028726073333820948, + "fit_median_s": 0.0035, + "fit_min_s": 0.0034, + "fit_max_s": 0.0039, + "fit_cv": 0.0437, + "noisy": false, + "n_timed_fits": 21, + "datagen_s": 0.001, + "peak_rss_mb": 145.1, + "backend_resolved": "python", + "warnings": [] + } + ] +} \ No newline at end of file diff --git a/benchmarks/speed_review/bench_fe_absorption.py b/benchmarks/speed_review/bench_fe_absorption.py index 316801d73..f6867628e 100644 --- a/benchmarks/speed_review/bench_fe_absorption.py +++ b/benchmarks/speed_review/bench_fe_absorption.py @@ -189,9 +189,15 @@ def _measure(scenario, repeats, backend, quick): # --------------------------------------------------------------- identity gate -def check_estimates(results, before_path): +def check_estimates(results, before_path, allow_shift=()): """Compare ATT/SE per scenario against a prior baseline JSON. Returns the number of failures (0 = pass). tail_stress reports its delta, ungated. + + allow_shift: scenario ids whose estimates are EXPECTED to move this run + because of a documented estimate-affecting fix (the reason must be in the + PR/CHANGELOG). Their deltas are printed loudly but do not fail the gate - + this is a per-run flag, never a permanent exemption, so future runs + re-gate them against the refreshed baseline. """ before = {r["scenario"]: r for r in json.loads(Path(before_path).read_text())["results"]} failures = 0 @@ -210,6 +216,12 @@ def check_estimates(results, before_path): f"|d att|={d_att:.3e} rel d se={rel_se:.3e}" ) continue + if scen in allow_shift: + print( + f" identity {scen:16s} ALLOWED SHIFT (documented fix): " + f"|d att|={d_att:.3e} rel d se={rel_se:.3e}" + ) + continue ok = d_att <= ATT_ATOL and rel_se <= se_rtol status = "ok" if ok else "FAIL" print( @@ -239,6 +251,14 @@ def main(): metavar="BEFORE_JSON", help="compare ATT/SE against a prior baseline at the unified tolerances", ) + ap.add_argument( + "--allow-shift", + metavar="SCEN[,SCEN...]", + default="", + help="comma-separated scenarios whose estimates are expected to move " + "this run due to a documented estimate-affecting fix (printed loudly, " + "not failed; per-run only - never a permanent exemption)", + ) args = ap.parse_args() if args.worker: @@ -301,7 +321,10 @@ def main(): exit_code = 0 if args.check_estimates: print("\nidentity gate vs", args.check_estimates) - failures = check_estimates([r for r in results if "error" not in r], args.check_estimates) + allow = tuple(s for s in args.allow_shift.split(",") if s) + failures = check_estimates( + [r for r in results if "error" not in r], args.check_estimates, allow + ) if failures: print(f"IDENTITY GATE FAILED: {failures} scenario(s) moved") exit_code = 1 diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py index 4276cfab3..28b2b5c98 100644 --- a/diff_diff/bacon.py +++ b/diff_diff/bacon.py @@ -18,6 +18,7 @@ import pandas as pd from diff_diff.results import _format_survey_block +from diff_diff.utils import pre_demean_norms, snap_absorbed_regressors from diff_diff.utils import within_transform as _within_transform_util @@ -795,6 +796,7 @@ def _compute_twfe( ) -> float: """Compute TWFE estimate using within-transformation.""" # Apply two-way within transformation (weighted if survey weights provided) + _pre_norms = pre_demean_norms(df, [treat_col], weights=weights) df_dm = _within_transform_util( df, [outcome, treat_col], @@ -803,6 +805,19 @@ def _compute_twfe( suffix="_within", weights=weights, ) + # Snap an FE-spanned treatment to exact zero: the d_var == 0 guard + # below then returns its deterministic 0.0 (with the cause warning) + # instead of an arbitrary junk/junk division. + snap_absorbed_regressors( + df_dm, + [treat_col], + _pre_norms, + absorbed_desc=f"unit '{unit}' and time '{time}' fixed effects", + group_vars=[unit, time], + suffix="_within", + display_names={treat_col: "treatment"}, + weights=weights, + ) # Extract within-transformed values y_within = df_dm[f"{outcome}_within"].values diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py index b48ddf643..2c996bf36 100644 --- a/diff_diff/estimators.py +++ b/diff_diff/estimators.py @@ -30,7 +30,9 @@ WildBootstrapResults, demean_by_groups, fe_dummy_names, + pre_demean_norms, safe_inference, + snap_absorbed_regressors, validate_binary, validate_covariate_names, validate_design_term_names, @@ -454,6 +456,8 @@ def fit( float ) * working_data[time].values.astype(float) vars_to_demean = [outcome, treatment, time, "_treat_time"] + (covariates or []) + _absorb_regressors = vars_to_demean[1:] # everything except outcome + _pre_norms = pre_demean_norms(working_data, _absorb_regressors, weights=survey_weights) # Method of alternating projections: for N > 1 absorbed dimensions a # single sequential sweep is only exact on balanced (orthogonal-FE) # panels; demean_by_groups iterates to the exact (W)LS-FWL residual. @@ -464,6 +468,19 @@ def fit( inplace=True, weights=survey_weights, ) + # FE-spanned regressors demean to numerical junk, not exact zero; + # snap them so rank handling drops them deterministically instead + # of the junk direction perturbing the identified coefficients. + snap_absorbed_regressors( + working_data, + _absorb_regressors, + _pre_norms, + absorbed_desc=f"absorb={list(absorb)}", + group_vars=list(absorb), + rank_deficient_action=self.rank_deficient_action, + display_names={"_treat_time": f"{treatment}:{time}"}, + weights=survey_weights, + ) n_absorbed_effects += n_fe absorbed_vars = list(absorb) @@ -638,7 +655,21 @@ def _refit_did_absorb(w_r): float ) vars_dm = [outcome, treatment, time, "_treat_time"] + (covariates or []) + _rep_norms = pre_demean_norms(wd, vars_dm[1:], weights=w_nz) wd, _ = demean_by_groups(wd, vars_dm, _absorb_list, inplace=True, weights=w_nz) + # A regressor can become FE-spanned WITHIN a replicate's + # effective sample (half-sample zeroing): snap silently so the + # replicate solve drops it (NaN replicate -> invalid) instead + # of consuming a junk direction. + snap_absorbed_regressors( + wd, + vars_dm[1:], + _rep_norms, + absorbed_desc=f"absorb={_absorb_list}", + group_vars=_absorb_list, + rank_deficient_action="silent", + weights=w_nz, + ) y_r = wd[outcome].values.astype(float) d_r = wd[treatment].values.astype(float) t_r = wd[time].values.astype(float) @@ -1607,6 +1638,8 @@ def fit( # type: ignore[override] + [f"_did_interact_{p}" for p in non_ref_periods] + (covariates or []) ) + _absorb_regressors = vars_to_demean[1:] # everything except outcome + _pre_norms = pre_demean_norms(working_data, _absorb_regressors, weights=survey_weights) # Method of alternating projections (exact for unbalanced panels; a # single sequential sweep is exact only on balanced orthogonal-FE panels). working_data, n_fe = demean_by_groups( @@ -1616,6 +1649,24 @@ def fit( # type: ignore[override] inplace=True, weights=survey_weights, ) + # Snap FE-spanned regressors (e.g. period dummies when a time + # dimension is absorbed) to exact zero: rank handling then drops + # them deterministically instead of their junk directions + # perturbing the identified interaction coefficients. + snap_absorbed_regressors( + working_data, + _absorb_regressors, + _pre_norms, + absorbed_desc=f"absorb={list(absorb)}", + group_vars=list(absorb), + rank_deficient_action=self.rank_deficient_action, + display_names={ + "_did_treatment": treatment, + **{f"_did_period_{p}": f"{time}=={p}" for p in non_ref_periods}, + **{f"_did_interact_{p}": f"{treatment}:{time}=={p}" for p in non_ref_periods}, + }, + weights=survey_weights, + ) n_absorbed_effects += n_fe # Extract outcome and treatment (may be demeaned if absorb was used) @@ -1840,7 +1891,18 @@ def _refit_mp_absorb(w_r): + [f"_did_interact_{p}" for p in non_ref_periods] + (covariates or []) ) + _rep_norms_mp = pre_demean_norms(wd, vars_dm_[1:], weights=w_nz) wd, _ = demean_by_groups(wd, vars_dm_, _absorb_list_mp, inplace=True, weights=w_nz) + # Replicate-local FE spanning: snap silently (see DiD closure). + snap_absorbed_regressors( + wd, + vars_dm_[1:], + _rep_norms_mp, + absorbed_desc=f"absorb={_absorb_list_mp}", + group_vars=_absorb_list_mp, + rank_deficient_action="silent", + weights=w_nz, + ) y_r = wd[outcome].values.astype(float) d_r = wd["_did_treatment"].values.astype(float) X_r = np.column_stack([np.ones(len(y_r)), d_r]) diff --git a/diff_diff/sun_abraham.py b/diff_diff/sun_abraham.py index 931831f8c..c14286739 100644 --- a/diff_diff/sun_abraham.py +++ b/diff_diff/sun_abraham.py @@ -20,7 +20,9 @@ from diff_diff.linalg import LinearRegression from diff_diff.results import _format_survey_block, _get_significance_stars from diff_diff.utils import ( + pre_demean_norms, safe_inference, + snap_absorbed_regressors, ) from diff_diff.utils import ( within_transform as _within_transform_util, @@ -1491,10 +1493,24 @@ def _fit_saturated_regression( variables_to_demean = [outcome] + interaction_cols if covariates: variables_to_demean.extend(covariates) + _sa_regressors = variables_to_demean[1:] # everything except outcome + _pre_norms = pre_demean_norms(df, _sa_regressors, weights=survey_weights) df_demeaned = _within_transform_util( df, variables_to_demean, unit, time, suffix="_dm", weights=survey_weights ) + # Snap FE-spanned regressors (e.g. a unit-constant covariate) to + # exact zero so rank handling drops them deterministically. + snap_absorbed_regressors( + df_demeaned, + _sa_regressors, + _pre_norms, + absorbed_desc=f"unit '{unit}' and time '{time}' fixed effects", + group_vars=[unit, time], + rank_deficient_action=self.rank_deficient_action, + suffix="_dm", + weights=survey_weights, + ) X_cols = [f"{col}_dm" for col in interaction_cols] if covariates: @@ -1585,20 +1601,6 @@ def _fit_saturated_regression( return cohort_effects, cohort_ses, vcov_cohort, coef_index_map, bm_artifacts - def _within_transform( - self, - df: pd.DataFrame, - variables: List[str], - unit: str, - time: str, - ) -> pd.DataFrame: - """ - Apply two-way within transformation to remove unit and time fixed effects. - - y_it - y_i. - y_.t + y_.. - """ - return _within_transform_util(df, variables, unit, time, suffix="_dm") - def _compute_iw_effects( self, df: pd.DataFrame, diff --git a/diff_diff/twfe.py b/diff_diff/twfe.py index 750a12040..70b7ccd62 100644 --- a/diff_diff/twfe.py +++ b/diff_diff/twfe.py @@ -16,6 +16,8 @@ from diff_diff.results import DiDResults from diff_diff.utils import ( fe_dummy_names, + pre_demean_norms, + snap_absorbed_regressors, validate_covariate_names, validate_design_term_names, ) @@ -382,6 +384,8 @@ def fit( # type: ignore[override] # demean outcome, covariates, AND interaction in a single pass # so the regression uses demeaned regressors (FWL theorem). all_vars = [outcome] + (covariates or []) + ["_treatment_post"] + _twfe_regressors = all_vars[1:] # everything except outcome + _pre_norms = pre_demean_norms(data, _twfe_regressors, weights=survey_weights) data_demeaned = _within_transform_util( data, all_vars, @@ -390,6 +394,19 @@ def fit( # type: ignore[override] suffix="_demeaned", weights=survey_weights, ) + # Snap FE-spanned regressors (e.g. a unit-constant covariate) to + # exact zero so rank handling drops them deterministically. + snap_absorbed_regressors( + data_demeaned, + _twfe_regressors, + _pre_norms, + absorbed_desc=f"unit '{unit}' and time '{time}' fixed effects", + group_vars=[unit, time], + rank_deficient_action=self.rank_deficient_action, + suffix="_demeaned", + display_names={"_treatment_post": f"{treatment}:{time}"}, + weights=survey_weights, + ) y = data_demeaned[f"{outcome}_demeaned"].values X_list = [data_demeaned["_treatment_post_demeaned"].values] if covariates: @@ -595,6 +612,7 @@ def _refit_twfe(w_r): nz = w_r > 0 data_nz = data[nz].copy() w_nz = w_r[nz] + _rep_norms_twfe = pre_demean_norms(data_nz, _all_vars_twfe[1:], weights=w_nz) data_dem_r = _within_transform_util( data_nz, _all_vars_twfe, @@ -603,6 +621,17 @@ def _refit_twfe(w_r): suffix="_demeaned", weights=w_nz, ) + # Replicate-local FE spanning: snap silently (see DiD closure). + snap_absorbed_regressors( + data_dem_r, + _all_vars_twfe[1:], + _rep_norms_twfe, + absorbed_desc=f"unit '{unit}' and time '{time}' fixed effects", + group_vars=[unit, time], + rank_deficient_action="silent", + suffix="_demeaned", + weights=w_nz, + ) y_r = data_dem_r[f"{outcome}_demeaned"].values X_list_r = [data_dem_r["_treatment_post_demeaned"].values] for cov_ in _covariates_twfe: @@ -732,41 +761,6 @@ def _refit_twfe(w_r): self.is_fitted_ = True return self.results_ - def _within_transform( - self, - data: pd.DataFrame, - outcome: str, - unit: str, - time: str, - covariates: Optional[List[str]] = None, - ) -> pd.DataFrame: - """ - Apply within transformation to remove unit and time fixed effects. - - This implements the standard two-way within transformation: - y_it - y_i. - y_.t + y_.. - - Parameters - ---------- - data : pd.DataFrame - Panel data. - outcome : str - Outcome variable name. - unit : str - Unit identifier column. - time : str - Time period column. - covariates : list, optional - Covariate column names. - - Returns - ------- - pd.DataFrame - Data with demeaned variables. - """ - variables = [outcome] + (covariates or []) - return _within_transform_util(data, variables, unit, time, suffix="_demeaned") - def _check_staggered_treatment( self, data: pd.DataFrame, diff --git a/diff_diff/utils.py b/diff_diff/utils.py index 0afddb711..e4a44c201 100644 --- a/diff_diff/utils.py +++ b/diff_diff/utils.py @@ -2648,7 +2648,7 @@ def demean_by_groups( inplace: bool = False, suffix: str = "", weights: Optional[np.ndarray] = None, - max_iter: int = 100, + max_iter: int = 10_000, tol: float = 1e-10, ) -> Tuple[pd.DataFrame, int]: """N-way within transformation via the method of alternating projections (MAP). @@ -2685,9 +2685,12 @@ def demean_by_groups( weights : np.ndarray, optional Observation weights. When provided, uses weighted group means ``sum(w*x)/sum(w)`` per group and converges to the WLS-FWL residual. - max_iter : int, default 100 - Maximum number of alternating-projection iterations per variable. Emits a - single ``UserWarning`` per call listing any variable that fails to converge. + max_iter : int, default 10_000 + Maximum number of alternating-projection iterations per variable + (matching the R ``fixest`` ``fixef.iter`` / ``pyfixest`` + ``fixef_maxiter`` defaults; correlated FE incidence can genuinely + require hundreds of iterations). Emits a single ``UserWarning`` per + call listing any variable that fails to converge. tol : float, default 1e-10 Convergence tolerance on the max absolute change across the iterate. @@ -2699,6 +2702,25 @@ def demean_by_groups( Number of absorbed fixed effects, ``sum_d (nunique_d - 1)`` over ``group_vars`` (the standard DOF-accounting convention). + Raises + ------ + ValueError + If ``group_vars`` is empty, or if any absorbed group column contains + NaN (checked for every ``len(group_vars)``, one-way included) — + pandas groupby silently drops NaN keys and ``pd.factorize`` codes + them ``-1``, which would silently mis-index the group means, so + missing group keys must be dropped or imputed by the caller. + + Notes + ----- + For ``len(group_vars) >= 2`` each dimension is factorized once and the MAP + sweeps compute group means via ``np.bincount`` (two O(n) passes per + dimension). Plain bincount summation is not compensated the way pandas' + grouped mean is, so results agree with a pandas ``groupby`` implementation + to ~1e-10 order (drift compounds across iterations), not bit-for-bit. See + ``docs/methodology/REGISTRY.md`` "Absorbed Fixed Effects with Survey + Weights". + Examples -------- >>> df, n_fe = demean_by_groups(df, ['y', 'x'], ['unit', 'period']) @@ -2706,8 +2728,22 @@ def demean_by_groups( if not group_vars: raise ValueError("demean_by_groups requires at least one grouping variable.") + # NaN group keys are rejected for EVERY N (one-way included): pandas + # groupby silently drops NaN keys (NaN-poisoning the affected rows + # unweighted, passing them through un-demeaned weighted) and + # pd.factorize codes them -1, which would mis-index the last group's + # mean in the bincount sweeps below. + for g in group_vars: + if pd.isna(data[g].values).any(): + raise ValueError( + f"demean_by_groups: absorbed group column '{g}' contains NaN. " + "Drop or impute missing group keys before absorbing this " + "dimension." + ) + # One dimension: the within projection is exact in a single pass. Delegate so - # the one-way callers stay byte-identical to demean_by_group. + # the one-way callers stay byte-identical to demean_by_group (for valid, + # NaN-free group keys). if len(group_vars) == 1: return demean_by_group( data, @@ -2719,38 +2755,49 @@ def demean_by_groups( ) # N >= 2: method of alternating projections. Per-variable independent - # convergence (outer loop = variable, inner = iterations) — this mirrors - # within_transform's weighted loop exactly so that the two-way weighted case - # (group_vars == [unit, time]) is bit-identical. - n_effects = sum(int(data[g].nunique()) - 1 for g in group_vars) + # convergence (outer loop = variable, inner = iterations), same sweep order + # over group_vars as within_transform's historical unit-then-time loop. + # + # Each absorbed dimension is factorized ONCE; every MAP sweep is then two + # O(n) passes (np.bincount group-sum + fancy-index gather) with no group + # re-hashing — pandas groupby.transform rebuilt its hash table on every + # (iteration x variable x dimension) call. Same precompute pattern as + # survey._PsuScaffolding. Bincount accumulation is not compensated the way + # pandas' grouped mean is: results agree with the prior implementation to + # ~1e-10 order, not bit-for-bit (see REGISTRY "Absorbed Fixed Effects"). target_cols = [var if not suffix else f"{var}{suffix}" for var in variables] - group_arrays = [data[g].values for g in group_vars] + codes_list: List[np.ndarray] = [] + n_groups_list: List[int] = [] + for g in group_vars: + # NaN keys already rejected above, so codes are all >= 0 here. + codes, uniques = pd.factorize(data[g].values, sort=False) + codes_list.append(codes.astype(np.intp, copy=False)) + n_groups_list.append(len(uniques)) + n_effects = sum(n_g - 1 for n_g in n_groups_list) demeaned_values: List[np.ndarray] = [] non_converged_vars: List[str] = [] if weights is not None: w = np.asarray(weights, dtype=np.float64) # Cache per-group weight sums once (invariant across variables/iterations). - w_sums = [pd.Series(w).groupby(g).transform("sum").values for g in group_arrays] - - def _weighted_group_demean(x, groups, w, w_sum): - wx_sum = pd.Series(w * x).groupby(groups).transform("sum").values - # Guard zero-total-weight groups (survey subpopulation / zero-weight - # domain padding): leave such rows unchanged (mean 0) so they remain - # inert in the downstream WLS instead of poisoning the design with - # NaN/Inf. For positive-weight groups this is bit-identical to wx/w_sum. - means = np.divide( - wx_sum, w_sum, out=np.zeros_like(wx_sum, dtype=np.float64), where=w_sum > 0 - ) - return x - means + w_sums = [ + np.bincount(codes, weights=w, minlength=n_g) + for codes, n_g in zip(codes_list, n_groups_list) + ] for var in variables: x = data[var].values.astype(np.float64) converged = False for _iter in range(max_iter): x_old = x.copy() - for groups, w_sum in zip(group_arrays, w_sums): - x = _weighted_group_demean(x, groups, w, w_sum) + for codes, n_g, w_sum in zip(codes_list, n_groups_list, w_sums): + wx_sum = np.bincount(codes, weights=w * x, minlength=n_g) + # Guard zero-total-weight groups (survey subpopulation / + # zero-weight domain padding): leave such rows unchanged + # (mean 0) so they remain inert in the downstream WLS + # instead of poisoning the design with NaN/Inf. + means = np.divide(wx_sum, w_sum, out=np.zeros_like(wx_sum), where=w_sum > 0) + x = x - means[codes] if np.max(np.abs(x - x_old)) < tol: converged = True break @@ -2758,13 +2805,18 @@ def _weighted_group_demean(x, groups, w, w_sum): non_converged_vars.append(var) demeaned_values.append(x) else: + counts = [ + np.bincount(codes, minlength=n_g).astype(np.float64) + for codes, n_g in zip(codes_list, n_groups_list) + ] for var in variables: x = data[var].values.astype(np.float64) converged = False for _iter in range(max_iter): x_old = x.copy() - for groups in group_arrays: - x = x - pd.Series(x).groupby(groups).transform("mean").values + for codes, n_g, cnt in zip(codes_list, n_groups_list, counts): + means = np.bincount(codes, weights=x, minlength=n_g) / cnt + x = x - means[codes] if np.max(np.abs(x - x_old)) < tol: converged = True break @@ -2796,6 +2848,191 @@ def _weighted_group_demean(x, groups, w, w_sum): return pd.concat([data, new_block], axis=1), n_effects +def pre_demean_norms( + data: pd.DataFrame, + regressors: List[str], + weights: Optional[np.ndarray] = None, +) -> Dict[str, float]: + """L2 norms of regressor columns BEFORE demeaning, for + :func:`snap_absorbed_regressors` (capture before an in-place demean + overwrites the source values). + + When ``weights`` is given, norms are ``||sqrt(w) * x||`` — the same + effective-sample scaling ``solve_ols`` applies in WLS, so the snap + decision is made on the sample the solver actually sees (zero-weight + domain rows, which the weighted demean leaves inert by design, must not + mask a regressor that is FE-spanned on the positive-weight sample). + """ + sw = None if weights is None else np.sqrt(np.asarray(weights, dtype=np.float64)) + out: Dict[str, float] = {} + for v in regressors: + x = data[v].to_numpy(dtype=np.float64) + out[v] = float(np.linalg.norm(x if sw is None else sw * x)) + return out + + +def _fe_span_residual_norm( + x_eff: np.ndarray, + group_codes: List[np.ndarray], + n_groups: List[int], + sqrt_w: Optional[np.ndarray], +) -> float: + """Exact residual norm of an (effective-sample) column projected onto the + absorbed-FE span, via sparse LSMR. + + The MAP demean's per-iteration stopping rule bounds the last STEP, not the + distance to the limit, so a truncated iterate of an exactly-spanned column + can carry a structured residual orders of magnitude above the convergence + tolerance (slow-convergence regimes: unbalanced, correlated FE incidence). + A Krylov solve on the sparse FE incidence decides span membership exactly + (up to fp), and because ``x_eff`` is the ALREADY-DEMEANED column (nearly + orthogonal to the span), LSMR converges in a handful of iterations. + + Returns the achieved residual norm ``||x_eff - A @ sol||`` — an upper + bound on the true projection residual, so an unconverged LSMR errs toward + KEEPING the column (the pre-guard status quo), never toward over-snapping. + """ + from scipy.sparse import csr_matrix, hstack + from scipy.sparse.linalg import lsmr + + n = x_eff.shape[0] + rows = np.arange(n) + ones = np.ones(n) if sqrt_w is None else sqrt_w + blocks = [ + csr_matrix((ones, (rows, codes)), shape=(n, n_g)) + for codes, n_g in zip(group_codes, n_groups) + ] + a_mat = hstack(blocks, format="csr") + sol = lsmr(a_mat, x_eff, atol=1e-13, btol=1e-13)[0] + return float(np.linalg.norm(x_eff - a_mat @ sol)) + + +def snap_absorbed_regressors( + demeaned: pd.DataFrame, + regressors: List[str], + pre_norms: Dict[str, float], + absorbed_desc: str, + group_vars: List[str], + rank_deficient_action: str = "warn", + suffix: str = "", + display_names: Optional[Dict[str, str]] = None, + rel_tol: float = 1e-10, + weights: Optional[np.ndarray] = None, + screen_tol: float = 1e-3, +) -> List[str]: + """Zero out regressors that were absorbed (spanned) by the fixed effects. + + A regressor lying exactly in the span of the absorbed FE dummies demeans + to numerical junk rather than exact zero. Such a column must not reach the + solver: column equilibration re-inflates it to unit norm, it passes the + rank check as linearly independent, and its arbitrary direction perturbs + the OTHER coefficients (measured up to ~3e-3 on ATT with a garbage + ~1e14-scale coefficient reported for the spanned column itself). Snapping + it to exact zero makes the downstream rank-deficiency handling drop it + deterministically (coefficient NaN) — the documented contract for + FE-collinear regressors. See ``docs/methodology/REGISTRY.md`` "Absorbed + Fixed Effects". + + Detection is TWO-STAGE, because MAP truncation bounds the last iteration + step, not the distance to the limit — a spanned column in a + slow-convergence regime (unbalanced, correlated FE incidence) can stop + with a structured residual far above ``rel_tol``: + + 1. fast path: relative demeaned norm ``<= rel_tol`` snaps immediately + (covers exactly-converged spanned columns); + 2. candidates with relative norm in ``(rel_tol, screen_tol]`` get an + exact span-membership confirmation via sparse LSMR on the FE incidence + (:func:`_fe_span_residual_norm`) and snap iff the true projection + residual is ``<= rel_tol`` — genuinely identified low-within-variation + regressors keep their (real) residual and are left untouched. + + Parameters + ---------- + demeaned : pd.DataFrame + Frame holding the demeaned columns AND the ``group_vars`` columns + (modified in place). + regressors : list of str + SOURCE names of the regressors (never the outcome — an outcome spanned + by the FEs is a zero-residual-variance situation, not a collinearity). + pre_norms : dict + Pre-demean L2 norm per source name (:func:`pre_demean_norms`). + absorbed_desc : str + Human description of the absorbed dimensions, for the warning. + group_vars : list of str + The absorbed FE columns (present in ``demeaned``), needed to build + the sparse incidence for the stage-2 confirmation. + rank_deficient_action : str, default "warn" + The owning estimator's contract: ``"warn"`` emits the cause-specific + ``UserWarning`` here; ``"silent"`` and ``"error"`` defer entirely to + the downstream rank machinery (silent drop / raise respectively). + suffix : str, default "" + Demeaned column naming (``f"{var}{suffix}"``). + display_names : dict, optional + Source name -> user-facing name for the warning message. + rel_tol : float, default 1e-10 + Snap threshold on the (confirmed) relative projection residual. + weights : np.ndarray, optional + WLS observation weights. When given, BOTH norms must be + ``sqrt(w)``-weighted (pass the same weights to + :func:`pre_demean_norms`) so the snap decision matches the effective + sample ``solve_ols`` sees — zero-weight rows, which the weighted + demean leaves inert by design, must not mask an FE-spanned regressor + on the positive-weight sample. + screen_tol : float, default 1e-3 + Stage-2 screening bound. Spanned columns whose MAP truncation + residual exceeds this are outside the guard — but at that point the + demeaning of every other column is equally unconverged and the + non-convergence warning contract applies. + + Returns + ------- + list of str + Source names of the snapped regressors (empty if none). + """ + sw = None if weights is None else np.sqrt(np.asarray(weights, dtype=np.float64)) + snapped: List[str] = [] + span_cache: Optional[Tuple[List[np.ndarray], List[int]]] = None + for var in regressors: + pre = pre_norms.get(var, 0.0) + if pre <= 0.0: + continue # all-zero input column: rank handling covers it as-is + col = f"{var}{suffix}" if suffix else var + vals = demeaned[col].to_numpy(dtype=np.float64) + eff = vals if sw is None else sw * vals + eff_norm = float(np.linalg.norm(eff)) + if eff_norm <= rel_tol * pre: + demeaned[col] = np.zeros(len(vals), dtype=np.float64) + snapped.append(var) + continue + if eff_norm <= screen_tol * pre: + # Stage 2: MAP truncation may mask an exactly-spanned column; + # confirm with an exact projection on the FE incidence. + if span_cache is None: + codes_l: List[np.ndarray] = [] + sizes_l: List[int] = [] + for g in group_vars: + codes_g, uniques_g = pd.factorize(demeaned[g].values, sort=False) + codes_l.append(codes_g.astype(np.intp, copy=False)) + sizes_l.append(len(uniques_g)) + span_cache = (codes_l, sizes_l) + resid = _fe_span_residual_norm(eff, span_cache[0], span_cache[1], sw) + if resid <= rel_tol * pre: + demeaned[col] = np.zeros(len(vals), dtype=np.float64) + snapped.append(var) + if snapped and rank_deficient_action == "warn": + shown = [display_names.get(v, v) if display_names else v for v in snapped] + warnings.warn( + f"Regressor(s) {shown} are collinear with the absorbed fixed " + f"effects ({absorbed_desc}): their within-transformed values are " + f"numerically zero (relative projection residual <= {rel_tol:g}), " + "so their coefficients are not identified and will be reported " + "as NaN.", + UserWarning, + stacklevel=3, + ) + return snapped + + def within_transform( data: pd.DataFrame, variables: List[str], @@ -2804,7 +3041,7 @@ def within_transform( inplace: bool = False, suffix: str = "_demeaned", weights: Optional[np.ndarray] = None, - max_iter: int = 100, + max_iter: int = 10_000, tol: float = 1e-8, ) -> pd.DataFrame: """ @@ -2846,12 +3083,14 @@ def within_transform( column name overwrites it rather than appending a duplicate label. weights : np.ndarray, optional Observation weights for weighted group means. - max_iter : int, default 100 + max_iter : int, default 10_000 Maximum number of alternating-projection iterations. Applies to BOTH the weighted and unweighted paths (both now iterate via the method of alternating projections, exact for unbalanced panels). Emits a ``UserWarning`` per call when any variable fails to converge within this - budget. Balanced panels converge in ~2 iterations. + budget. Balanced panels converge in ~2 iterations; correlated FE + incidence (contiguous unit lifetimes) can require hundreds. The default + matches ``fixest``'s ``fixef.iter`` / ``pyfixest``'s ``fixef_maxiter``. tol : float, default 1e-8 Convergence tolerance on the max absolute change across the iterate. @@ -2874,12 +3113,11 @@ def within_transform( """ # Two-way (unit + time) within transformation is the N-way method of # alternating projections specialized to two FE dimensions. Delegate to the - # shared engine so there is one MAP implementation. The weighted path is - # bit-identical to the previous inline loop (same per-variable convergence, - # unit-then-time order, and forwarded tol=1e-8); the unweighted path now also - # iterates (MAP) instead of the closed-form additive demean, which was exact - # only for balanced panels. ``tol`` is forwarded explicitly (within_transform's - # default is 1e-8, not demean_by_groups' 1e-10) to preserve weighted goldens. + # shared engine so there is one MAP implementation (same per-variable + # convergence, unit-then-time sweep order; agrees with the historical + # pandas-groupby loop to ~1e-10 order — see the engine's Notes). ``tol`` is + # forwarded explicitly (within_transform's default is 1e-8, not + # demean_by_groups' 1e-10) to preserve the historical convergence budget. return demean_by_groups( data, variables, diff --git a/diff_diff/wooldridge.py b/diff_diff/wooldridge.py index d6597f5d3..1869e80ee 100644 --- a/diff_diff/wooldridge.py +++ b/diff_diff/wooldridge.py @@ -20,7 +20,12 @@ import pandas as pd from diff_diff.linalg import compute_robust_vcov, solve_logit, solve_ols, solve_poisson -from diff_diff.utils import safe_inference, within_transform +from diff_diff.utils import ( + pre_demean_norms, + safe_inference, + snap_absorbed_regressors, + within_transform, +) from diff_diff.wooldridge_results import WooldridgeDiDResults _VALID_METHODS = ("ols", "logit", "poisson") @@ -1195,6 +1200,8 @@ def _fit_ols( f"{grp_label}s or use non-zero weights." ) + _w_regressors = [f"_x{i}" for i in range(X_design.shape[1])] + _pre_norms = pre_demean_norms(tmp, _w_regressors, weights=wt_weights) transformed = within_transform( tmp, all_vars, @@ -1203,6 +1210,19 @@ def _fit_ols( suffix="_demeaned", weights=wt_weights, ) + # Snap FE-spanned design columns (e.g. a unit-constant covariate) + # to exact zero so rank handling drops them deterministically. + snap_absorbed_regressors( + transformed, + _w_regressors, + _pre_norms, + absorbed_desc=f"unit '{unit}' and time '{time}' fixed effects", + group_vars=[unit, time], + rank_deficient_action=self.rank_deficient_action, + suffix="_demeaned", + display_names=dict(zip(_w_regressors, col_names)), + weights=wt_weights, + ) y = transformed[f"{outcome}_demeaned"].values X_cols = [f"_x{i}_demeaned" for i in range(X_design.shape[1])] diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index 7753311f6..e90ea6b70 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -901,6 +901,10 @@ sources: section: "Wild cluster bootstrap (WCR)" type: methodology note: "wild_bootstrap_se: WCR null-imposition, enumeration, test-inversion CI, fwildclusterboot parity" + - path: docs/methodology/REGISTRY.md + section: "Absorbed Fixed Effects with Survey Weights" + type: methodology + note: "demean_by_groups / within_transform: MAP convergence contract (max_iter/tol), bincount accumulation numerics, NaN-group-key guard" - path: docs/practitioner_getting_started.rst section: "Step 4: Check Whether the Result Is Trustworthy" type: user_guide diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index a0cbd70fc..38e0c3558 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -4319,11 +4319,64 @@ unequal selection probabilities). when any transformed variable exits the alternating-projection loop without reaching `tol` within `max_iter`. This now covers the **unweighted** path as well (previously the unweighted two-way transform used a closed-form additive demean - and could not warn). Defaults: `max_iter=100`; `tol=1e-8` via `within_transform` + and could not warn). Defaults: `max_iter=10_000` in both `demean_by_groups` and + `within_transform` (raised from 100 in v3.6.x to match the compiled-library + convention - R `fixest` `fixef.iter` and `pyfixest` `fixef_maxiter` both default + to 10,000; correlated FE incidence such as contiguous unit lifetimes in + order-level data genuinely requires hundreds of iterations, measured ~250-280 on + the `tail_stress` benchmark scenario where the old cap of 100 warned and + returned slightly-off residuals). `tol=1e-8` via `within_transform` (TwoWayFixedEffects, SunAbraham, BaconDecomposition, WooldridgeDiD) and `tol=1e-10` for the DiD/MultiPeriodDiD `absorb=` path. Balanced panels converge in ~2 - iterations. Silent return of the current iterate was classified as a silent - failure under the Phase 2 audit and replaced with this explicit signal. + iterations. Worst-case trade-off, accepted deliberately: an input that cannot + converge at all now burns the full 10,000 iterations before warning + (correctness-over-latency). Silent return of the current iterate was classified + as a silent failure under the Phase 2 audit and replaced with this explicit signal. +- **Note:** The MAP inner loop factorizes each absorbed dimension once + (`pd.factorize`) and forms group means via `np.bincount` accumulation. Plain + summation is not Kahan-compensated the way pandas `groupby().mean()` is, so + demeaned values agree with the pre-v3.6.x pandas implementation to ~1e-10 order + (drift compounds across MAP iterations), not bit-for-bit; estimator estimates + are validated unchanged at the FE-absorption benchmark identity gate + (`benchmarks/speed_review/bench_fe_absorption.py --check-estimates`). +- **Edge case:** NaN in an absorbed group column raises a `ValueError` naming the + column. `pd.factorize` codes NaN keys as -1, which would otherwise silently index + the last group's mean; the prior pandas behavior was itself silently bad + (unweighted: NaN-poisoned the affected rows; weighted: passed those rows through + un-demeaned into the regression), so the explicit error replaces two distinct + silent failure modes. +- **Edge case (FE-spanned regressors, v3.6.x):** a regressor lying exactly in the + span of the absorbed/within-transformed FE dummies (e.g. the treated-group + indicator after absorbing the unit dimension, a period dummy after absorbing + time, or a unit-constant covariate) demeans to numerical junk (relative norm + ~1e-13), NOT exact zero. Such a column previously reached the solver, where + column equilibration re-inflated it to unit norm, it passed the rank check as + linearly independent, and its arbitrary direction perturbed the identified + coefficients at the ~1e-5 level (tolerance- and implementation-dependent). + `diff_diff.utils.snap_absorbed_regressors` now zeroes spanned regressors at + every demeaning consumer (DiD/MultiPeriodDiD `absorb=`, TwoWayFixedEffects, + SunAbraham, BaconDecomposition, WooldridgeDiD) — including the replicate-refit + closures (silently, matching their `rank_deficient_action="silent"` solves) — + so the rank-deficiency machinery drops them deterministically (coefficient + NaN). Detection is **two-stage**, because the MAP stopping rule bounds the + last iteration step, not the distance to the limit, and a spanned column in a + slow-convergence regime (unbalanced, correlated FE incidence — e.g. + `x = a_unit + b_time` on a contiguous-lifetimes panel) can stop with a + structured truncation residual far above any fixed norm threshold (measured + 1.9e-10 at tol=1e-10 and 2e-8 at tol=1e-8; left unsnapped it shifted ATT by + ~3e-3 and reported a ~1e14-scale garbage coefficient): (1) relative demeaned + norm <= 1e-10 snaps immediately; (2) candidates in (1e-10, 1e-3] get an exact + span-membership confirmation via sparse LSMR on the (weighted) FE incidence + and snap iff the true projection residual is <= 1e-10 relative — genuinely + identified low-within-variation regressors are left untouched (their LSMR + residual equals their real within-variation). Norms are `sqrt(w)`-weighted + under WLS so zero-weight domain rows (left inert by the weighted demean) + cannot mask spanning on the positive-weight sample. A cause-specific + `UserWarning` naming the spanned regressors is emitted under + `rank_deficient_action="warn"`; `"silent"` and `"error"` defer to the rank + machinery per that parameter's existing contract. Identified coefficients are + consequently stable in the demeaning tolerance (verified to ~1e-14 across + tol=1e-8..1e-12, previously ~1e-5 swings). ### Survey Degrees of Freedom diff --git a/docs/performance-plan.md b/docs/performance-plan.md index e3aaf3c14..864c30ddc 100644 --- a/docs/performance-plan.md +++ b/docs/performance-plan.md @@ -41,25 +41,43 @@ the gap is machinery, not architecture: The measurement surface is the seven-scenario FE-absorption suite (`benchmarks/speed_review/bench_fe_absorption.py`; shapes documented in -`docs/performance-scenarios.md` scenarios 7-13; committed baseline -`baselines/fe_absorption_before.json`, with `_after.json` to be added by the -optimization PR). The optional pyfixest yardstick -(`bench_fe_absorption_pyfixest.py`, guarded on import) asserts coefficient -parity < 1e-6 on the four exact-estimand scenarios so the comparison stays -honest over time. +`docs/performance-scenarios.md` scenarios 7-13; committed baselines +`baselines/fe_absorption_{before,after}.json`). The optional pyfixest +yardstick (`bench_fe_absorption_pyfixest.py`, guarded on import) asserts +coefficient parity < 1e-6 on the four exact-estimand scenarios so the +comparison stays honest over time. + +**Landed (v3.6.x): factorize-once + bincount MAP rewrite.** End-to-end fit +speedups of 1.6-2.7x across the headline scenarios (table below). The +correlated-FE stress case now CONVERGES (~270 iterations under the raised +`max_iter=10_000` cap; ~2.7x cheaper per iteration) where the old loop hit +the cap of 100 and returned slightly-off residuals - its 0.8x wall-clock +ratio is the price of correctness: it runs 2.7x the iterations to full +convergence AND pays the exact LSMR span confirmation that now catches its +truncation-masked FE-spanned `post` column (previously kept as a junk +regressor in the design). The identity gate held at 1e-14-1e-16 on all +scenarios except the two carrying FE-spanned regressors through the DiD +`absorb=` path, whose estimates moved once (survey_absorb ATT by 9.8e-6; +geo_experiment SE by 1e-7) when the FE-spanned junk columns were snapped to +zero - a deliberate correction documented in REGISTRY "Absorbed Fixed +Effects" and the CHANGELOG. Remaining gap vs the pyfixest yardstick (2-26x +depending on scenario) is the compiled+parallel margin; a Rust demean kernel +in the existing `rust/` backend is the candidate next step, to be scoped +only if the remaining gap matters in practice (pyfixest itself validates +that architecture). ### FE-absorption suite results | Scenario | n rows | Before (s) | After (s) | Speedup | pyfixest (s) | |---|---:|---:|---:|---:|---:| -| 7. County policy event study (SunAbraham) | 177,289 | 3.865 (cv 2.5%) | - | - | 0.190 (cv 4.0%, proxy) | -| 8. Firm panel with churn (SunAbraham) | 2,400,000 | 92.957 (cv 0.9%) | - | - | 1.912 (cv 20.3%, noisy, proxy) | -| 9. Scanner store-week (TWFE) | 3,255,000 | 1.549 (cv 0.3%) | - | - | 0.644 (cv 12.3%, noisy) | -| 10. Geo experiment 5M orders (DiD absorb) | 5,000,000 | 2.630 (cv 0.6%) | - | - | 0.623 (cv 7.8%) | -| 11. Survey BRR replicates (DiD absorb) | 500,000 | 7.385 (cv 8.6%) | - | - | - | -| 12. Correlated-FE stress (DiD absorb) | 5,000,000 | 26.271 (cv 0.4%) | - | - | 3.075 (cv 1.5%) | -| 13. Small-panel guard (TWFE) | 20,000 | 0.005 (cv 4.1%) | - | - | 0.007 (cv 2.2%) | +| 7. County policy event study (SunAbraham) | 177,289 | 3.865 (cv 2.5%) | 2.388 (cv 2.6%) | 1.6x | 0.190 (cv 4.0%, proxy) | +| 8. Firm panel with churn (SunAbraham) | 2,400,000 | 92.957 (cv 0.9%) | 49.502 (cv 1.9%) | 1.9x | 1.912 (cv 20.3%, noisy, proxy) | +| 9. Scanner store-week (TWFE) | 3,255,000 | 1.549 (cv 0.3%) | 0.983 (cv 0.1%) | 1.6x | 0.644 (cv 12.3%, noisy) | +| 10. Geo experiment 5M orders (DiD absorb) | 5,000,000 | 2.630 (cv 0.6%) | 0.973 (cv 0.4%) | 2.7x | 0.623 (cv 7.8%) | +| 11. Survey BRR replicates (DiD absorb) | 500,000 | 7.385 (cv 8.6%) | 4.077 (cv 0.1%) | 1.8x | - | +| 12. Correlated-FE stress (DiD absorb) | 5,000,000 | 26.271 (cv 0.4%) | 31.765 (cv 0.6%) | 0.8x | 3.075 (cv 1.5%) | +| 13. Small-panel guard (TWFE) | 20,000 | 0.005 (cv 4.1%) | 0.004 (cv 4.4%) | 1.3x | 0.007 (cv 2.2%) | *noisy* = CV above the 10% unusable threshold from the noise protocol. *proxy* = timing-only pyfixest stand-in: the Sun-Abraham scenarios run a saturated `i(rel_time)` event study there (pyfixest 0.60 has no `sunab()`) - comparable demeaning load, different estimand, so those cells are not exact-estimand comparisons (see `bench_fe_absorption_pyfixest.py`). diff --git a/tests/test_bacon.py b/tests/test_bacon.py index cce72d3da..7761a3450 100644 --- a/tests/test_bacon.py +++ b/tests/test_bacon.py @@ -65,19 +65,18 @@ def generate_staggered_data( effect = np.full(len(units), treatment_effect) outcomes = ( - unit_fe_expanded + - time_fe_expanded + - effect * post + - np.random.randn(len(units)) * 0.5 + unit_fe_expanded + time_fe_expanded + effect * post + np.random.randn(len(units)) * 0.5 ) - df = pd.DataFrame({ - 'unit': units, - 'time': times, - 'outcome': outcomes, - 'first_treat': first_treat_expanded.astype(int), - 'treated': post.astype(int), - }) + df = pd.DataFrame( + { + "unit": units, + "time": times, + "outcome": outcomes, + "first_treat": first_treat_expanded.astype(int), + "treated": post.astype(int), + } + ) return df @@ -91,11 +90,7 @@ def test_basic_fit(self): decomp = BaconDecomposition() results = decomp.fit( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) assert decomp.is_fitted_ @@ -108,11 +103,7 @@ def test_weights_sum_to_one(self): data = generate_staggered_data(seed=123) results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) total_weight = sum(c.weight for c in results.comparisons) @@ -128,11 +119,11 @@ def test_weighted_sum_equals_twfe(self): results = bacon_decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights='exact', + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="exact", ) weighted_sum = sum(c.weight * c.estimate for c in results.comparisons) @@ -147,11 +138,7 @@ def test_comparison_types(self): data = generate_staggered_data(n_cohorts=3, never_treated_frac=0.3) results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) comp_types = set(c.comparison_type for c in results.comparisons) @@ -166,11 +153,7 @@ def test_no_never_treated(self): data = generate_staggered_data(never_treated_frac=0.0) results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Should still work @@ -182,11 +165,7 @@ def test_single_cohort(self): data = generate_staggered_data(n_cohorts=1, never_treated_frac=0.3) results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # With single cohort, should only have treated vs never @@ -199,11 +178,7 @@ def test_weight_by_type(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) weights = results.weight_by_type() @@ -218,11 +193,7 @@ def test_effect_by_type(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) effects = results.effect_by_type() @@ -236,11 +207,7 @@ def test_to_dataframe(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) df = results.to_dataframe() @@ -258,11 +225,7 @@ def test_summary(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) summary = results.summary() @@ -277,11 +240,7 @@ def test_missing_column_error(self): with pytest.raises(ValueError, match="Missing columns"): bacon_decompose( - data, - outcome='nonexistent', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="nonexistent", unit="unit", time="time", first_treat="first_treat" ) @@ -315,11 +274,7 @@ def test_twfe_decompose_method(self): twfe = TwoWayFixedEffects() decomp = twfe.decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) assert isinstance(decomp, BaconDecompositionResults) @@ -333,19 +288,10 @@ def test_twfe_staggered_warning(self): with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always") - twfe.fit( - data, - outcome='outcome', - treatment='treated', - time='time', - unit='unit' - ) + twfe.fit(data, outcome="outcome", treatment="treated", time="time", unit="unit") # Should have emitted a warning about staggered treatment - staggered_warnings = [ - x for x in w - if "staggered" in str(x.message).lower() - ] + staggered_warnings = [x for x in w if "staggered" in str(x.message).lower()] assert len(staggered_warnings) > 0 @@ -357,11 +303,7 @@ def test_convenience_function(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) assert isinstance(results, BaconDecompositionResults) @@ -377,15 +319,11 @@ def test_plot_bacon_scatter(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Should not raise - ax = plot_bacon(results, plot_type='scatter', show=False) + ax = plot_bacon(results, plot_type="scatter", show=False) assert ax is not None def test_plot_bacon_bar(self): @@ -395,15 +333,11 @@ def test_plot_bacon_bar(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Should not raise - ax = plot_bacon(results, plot_type='bar', show=False) + ax = plot_bacon(results, plot_type="bar", show=False) assert ax is not None def test_plot_bacon_invalid_type(self): @@ -413,15 +347,11 @@ def test_plot_bacon_invalid_type(self): data = generate_staggered_data() results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) with pytest.raises(ValueError, match="Unknown plot_type"): - plot_bacon(results, plot_type='invalid', show=False) + plot_bacon(results, plot_type="invalid", show=False) class TestWeightsParameter: @@ -441,11 +371,7 @@ def test_exact_weights_default(self): assert decomp.weights == "exact" results = decomp.fit( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Weights should sum to 1 @@ -467,11 +393,7 @@ def test_approximate_weights_opt_in(self): assert decomp.weights == "approximate" results = decomp.fit( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Sum-to-1 contract preserved via normalization, but tolerance is @@ -487,11 +409,7 @@ def test_exact_weights(self): assert decomp.weights == "exact" results = decomp.fit( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Weights should still sum to 1 @@ -504,20 +422,20 @@ def test_exact_vs_approximate_different(self): results_approx = bacon_decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights="approximate" + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="approximate", ) results_exact = bacon_decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights="exact" + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="exact", ) # TWFE estimates should be the same @@ -532,20 +450,20 @@ def test_exact_weights_lower_decomposition_error(self): results_approx = bacon_decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights="approximate" + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="approximate", ) results_exact = bacon_decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights="exact" + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="exact", ) # Exact weights should have equal or lower decomposition error @@ -563,11 +481,11 @@ def test_convenience_function_weights_param(self): results = bacon_decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights="exact" + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="exact", ) assert isinstance(results, BaconDecompositionResults) @@ -581,22 +499,22 @@ def test_twfe_decompose_weights_param(self): # Test with approximate decomp_approx = twfe.decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights="approximate" + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="approximate", ) assert isinstance(decomp_approx, BaconDecompositionResults) # Test with exact decomp_exact = twfe.decompose( data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat', - weights="exact" + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + weights="exact", ) assert isinstance(decomp_exact, BaconDecompositionResults) @@ -622,24 +540,17 @@ def test_unbalanced_panel_warning(self): # Remove some observations from specific units to make it unbalanced # This ensures different units have different numbers of periods - mask = ~((data['unit'] == 0) & (data['time'] == 0)) # Remove one period from unit 0 + mask = ~((data["unit"] == 0) & (data["time"] == 0)) # Remove one period from unit 0 data = data[mask].reset_index(drop=True) with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always") bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Should have warning about unbalanced panel - unbalanced_warnings = [ - x for x in w - if "unbalanced" in str(x.message).lower() - ] + unbalanced_warnings = [x for x in w if "unbalanced" in str(x.message).lower()] assert len(unbalanced_warnings) > 0 def test_balanced_panel_no_warning(self): @@ -649,18 +560,11 @@ def test_balanced_panel_no_warning(self): with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always") bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Should NOT have warning about unbalanced panel - unbalanced_warnings = [ - x for x in w - if "unbalanced" in str(x.message).lower() - ] + unbalanced_warnings = [x for x in w if "unbalanced" in str(x.message).lower()] assert len(unbalanced_warnings) == 0 @@ -672,11 +576,7 @@ def test_small_sample(self): data = generate_staggered_data(n_units=20, n_periods=5, n_cohorts=2) results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) assert len(results.comparisons) > 0 @@ -688,11 +588,7 @@ def test_many_cohorts(self): ) results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) # Should have comparisons from all cohort pairs @@ -703,15 +599,38 @@ def test_inf_for_never_treated(self): data = generate_staggered_data() # Replace 0 with inf for never-treated - data['first_treat'] = data['first_treat'].replace(0, np.inf) + data["first_treat"] = data["first_treat"].replace(0, np.inf) results = bacon_decompose( - data, - outcome='outcome', - unit='unit', - time='time', - first_treat='first_treat' + data, outcome="outcome", unit="unit", time="time", first_treat="first_treat" ) assert results.n_never_treated > 0 assert len(results.comparisons) > 0 + + +class TestAbsorbedTreatmentSnap: + """A treatment indicator spanned by the FEs in an internal 2x2 TWFE cell + must hit the deterministic zero-variance guard with a cause warning, + not an arbitrary junk/junk division (REGISTRY 'Absorbed FE'). The 0.0 + return for the degenerate cell is _compute_twfe's pre-existing contract. + """ + + def test_fe_spanned_treatment_hits_zero_variance_guard_with_warning(self): + rng = np.random.default_rng(0) + # treat == post for every unit -> treatment is a function of time, + # spanned by the time FE + unit = np.repeat(np.arange(20), 4) + time = np.tile(np.arange(4), 20) + df = pd.DataFrame( + { + "unit": unit, + "time": time, + "y": rng.normal(size=unit.size), + "__bacon_treated_internal__": (time >= 2).astype(float), + } + ) + decomp = BaconDecomposition() + with pytest.warns(UserWarning, match="collinear with the absorbed"): + beta = decomp._compute_twfe(df, outcome="y", unit="unit", time="time") + assert beta == 0.0 # deterministic d_var == 0 guard, not junk/junk diff --git a/tests/test_estimators.py b/tests/test_estimators.py index 5448a347f..2c61b8d0a 100644 --- a/tests/test_estimators.py +++ b/tests/test_estimators.py @@ -1,5 +1,7 @@ """Tests for difference-in-differences estimators.""" +import warnings + import numpy as np import pandas as pd import pytest @@ -12,6 +14,7 @@ PeriodEffect, SyntheticDiD, SyntheticDiDResults, + TwoWayFixedEffects, ) @@ -3264,11 +3267,25 @@ def test_twfe_with_unbalanced_panel(self): df = pd.DataFrame(data) twfe = TwoWayFixedEffects() - results = twfe.fit(df, outcome="outcome", treatment="post", unit="unit", time="period") + # NOTE: this test previously passed treatment="post" with time="period", + # making the treatment interaction a pure function of the period FE - + # an unidentifiable specification that "worked" only because the + # FE-spanned junk column survived rank detection and produced a finite + # garbage ATT. The v3.6.x span guard now routes that spec into TWFE's + # collinearity error, so the test uses the well-specified form. + results = twfe.fit(df, outcome="outcome", treatment="treated", unit="unit", time="post") # Should produce valid results assert np.isfinite(results.att) assert results.se > 0 + # the degenerate spec now raises the pre-existing identification error + # (routed there deterministically by the FE-span snap): + with pytest.raises(ValueError, match="cannot be identified"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + TwoWayFixedEffects().fit( + df, outcome="outcome", treatment="post", unit="unit", time="period" + ) def test_multiperiod_with_sparse_data(self): """Test MultiPeriodDiD with sparse data across periods.""" @@ -3549,8 +3566,12 @@ def test_twfe_with_absorbed_covariate(self): # absorbed by unit FE (becomes zero after within-transformation) twfe = TwoWayFixedEffects(rank_deficient_action="silent") results = twfe.fit( - df, outcome="outcome", treatment="treated", unit="unit", - time="post", covariates=["unit_covariate"], + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="post", + covariates=["unit_covariate"], ) assert np.isfinite(results.att) @@ -3586,13 +3607,13 @@ def _mpd_data(): ueff = rng.normal() for period in range(6): y = ( - 10.0 + ueff + 0.5 * period + 10.0 + + ueff + + 0.5 * period + (3.0 if (treated and period >= 3) else 0.0) + rng.normal(0, 0.5) ) - rows.append( - {"unit": unit, "treated": treated, "time": period, "outcome": y} - ) + rows.append({"unit": unit, "treated": treated, "time": period, "outcome": y}) df = pd.DataFrame(rows) df["x1"] = np.random.default_rng(11).normal(size=len(df)) return df @@ -3602,9 +3623,7 @@ def test_did_collision_for_each_structural_name(self): # the test cannot drift from the real var_names (const/treated/post/ # treated:post), plus the internal _treat_time working column. data = self._did_data() - clean = DifferenceInDifferences().fit( - data, "outcome", "treated", "post", covariates=["x1"] - ) + clean = DifferenceInDifferences().fit(data, "outcome", "treated", "post", covariates=["x1"]) reserved = [k for k in clean.coefficients if k != "x1"] + ["_treat_time"] assert "const" in reserved and "treated:post" in reserved # sanity for name in reserved: @@ -3612,9 +3631,7 @@ def test_did_collision_for_each_structural_name(self): if name not in d2.columns: d2[name] = np.random.default_rng(5).normal(size=len(d2)) with pytest.raises(ValueError, match="collide"): - DifferenceInDifferences().fit( - d2, "outcome", "treated", "post", covariates=[name] - ) + DifferenceInDifferences().fit(d2, "outcome", "treated", "post", covariates=[name]) def test_did_fixed_effects_dummy_collision(self): # get_dummies(region, prefix="region", drop_first=True) drops "region_A" @@ -3624,15 +3641,17 @@ def test_did_fixed_effects_dummy_collision(self): data["region_B"] = np.random.default_rng(2).normal(size=len(data)) with pytest.raises(ValueError, match="collide"): DifferenceInDifferences().fit( - data, "outcome", "treated", "post", - covariates=["region_B"], fixed_effects=["region"], + data, + "outcome", + "treated", + "post", + covariates=["region_B"], + fixed_effects=["region"], ) def test_did_noncolliding_preserves_structural_coefs(self): data = self._did_data() - r = DifferenceInDifferences().fit( - data, "outcome", "treated", "post", covariates=["x1"] - ) + r = DifferenceInDifferences().fit(data, "outcome", "treated", "post", covariates=["x1"]) ck = r.coefficients assert np.isfinite(ck["const"]) and np.isfinite(ck["treated:post"]) assert "x1" in ck and ck["x1"] != ck["treated:post"] @@ -3659,9 +3678,7 @@ def test_did_duplicate_covariates_raise(self): def test_mpd_collision_for_each_structural_name(self): data = self._mpd_data() - clean = MultiPeriodDiD().fit( - data, "outcome", "treated", "time", covariates=["x1"] - ) + clean = MultiPeriodDiD().fit(data, "outcome", "treated", "time", covariates=["x1"]) # const/treated/period_*/treated:period_* (actual keys) + internal column. reserved = [k for k in clean.coefficients if k != "x1"] + ["_did_treatment"] assert any(k.startswith("period_") for k in reserved) # sanity @@ -3670,15 +3687,11 @@ def test_mpd_collision_for_each_structural_name(self): if name not in d2.columns: d2[name] = np.random.default_rng(5).normal(size=len(d2)) with pytest.raises(ValueError, match="collide"): - MultiPeriodDiD().fit( - d2, "outcome", "treated", "time", covariates=[name] - ) + MultiPeriodDiD().fit(d2, "outcome", "treated", "time", covariates=[name]) def test_mpd_noncolliding_preserves_structural_coefs(self): data = self._mpd_data() - r = MultiPeriodDiD().fit( - data, "outcome", "treated", "time", covariates=["x1"] - ) + r = MultiPeriodDiD().fit(data, "outcome", "treated", "time", covariates=["x1"]) ck = r.coefficients assert "x1" in ck and np.isfinite(ck["const"]) assert any(k.startswith("period_") for k in ck) @@ -3693,8 +3706,12 @@ def test_mpd_fixed_effect_dummy_collides_with_period_keys(self): data["period"] = data["time"] # FE column 'period' -> 'period_1'... dummies with pytest.raises(ValueError, match="collide"): MultiPeriodDiD().fit( - data, "outcome", "treated", "time", - covariates=["x1"], fixed_effects=["period"], + data, + "outcome", + "treated", + "time", + covariates=["x1"], + fixed_effects=["period"], ) def test_synthetic_did_unaffected_by_guard(self): @@ -3707,17 +3724,275 @@ def test_synthetic_did_unaffected_by_guard(self): ueff = rng.normal(0, 3) for period in range(8): y = ( - 10.0 + ueff + 0.5 * period + 10.0 + + ueff + + 0.5 * period + (5.0 if (treated and period >= 4) else 0.0) + rng.normal(0, 0.5) ) - rows.append( - {"unit": unit, "period": period, "treated": treated, "outcome": y} - ) + rows.append({"unit": unit, "period": period, "treated": treated, "outcome": y}) d = pd.DataFrame(rows) d["x1"] = rng.normal(size=len(d)) r = SyntheticDiD().fit( - d, "outcome", "treated", unit="unit", time="period", - post_periods=[4, 5, 6, 7], covariates=["x1"], + d, + "outcome", + "treated", + unit="unit", + time="period", + post_periods=[4, 5, 6, 7], + covariates=["x1"], ) assert np.isfinite(r.att) + + +class TestAbsorbedRegressorSnap: + """FE-spanned regressors snap to zero -> deterministic NaN + warning + (REGISTRY 'Absorbed Fixed Effects'; the pre-snap junk columns perturbed + identified coefficients at ~1e-5, tolerance- and implementation-dependent). + """ + + @staticmethod + def _geo_frame(seed=0, n_states=20, n_months=12, n_rows=4_000): + rng = np.random.default_rng(seed) + state = rng.integers(0, n_states, n_rows) + month = rng.integers(0, n_months, n_rows) + treated = (state < n_states // 2).astype(int) + post = (month >= n_months // 2).astype(int) + y = ( + rng.normal(0, 1, n_states)[state] + + rng.normal(0, 0.5, n_months)[month] + + 0.3 * treated * post + + rng.normal(size=n_rows) + ) + return pd.DataFrame( + {"y": y, "treated": treated, "post": post, "state": state, "month": month} + ) + + def test_did_absorb_spanned_main_effects_nan_with_cause_warning(self): + df = self._geo_frame() + est = DifferenceInDifferences() + with pytest.warns(UserWarning, match="collinear with the absorbed"): + res = est.fit( + df, + outcome="y", + treatment="treated", + time="post", + absorb=["state", "month"], + ) + assert np.isnan(res.coefficients["treated"]) + assert np.isnan(res.coefficients["post"]) + assert np.isfinite(res.att) and np.isfinite(res.se) + + def test_did_absorb_att_matches_clean_fwl_ground_truth(self): + """With the junk columns snapped, ATT must equal the regression on the + demeaned interaction ALONE (the identified FWL estimand) at 1e-10.""" + from diff_diff.utils import demean_by_groups + + df = self._geo_frame(seed=1) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = DifferenceInDifferences().fit( + df, + outcome="y", + treatment="treated", + time="post", + absorb=["state", "month"], + ) + d = df.copy() + d["_tt"] = d["treated"].values.astype(float) * d["post"].values.astype(float) + out, _ = demean_by_groups(d, ["y", "_tt"], ["state", "month"], suffix="_dm", tol=1e-12) + tt = out["_tt_dm"].values + gt = float(np.dot(tt, out["y_dm"].values) / np.dot(tt, tt)) + assert res.att == pytest.approx(gt, abs=1e-10) + + def test_multiperiod_absorb_spanned_period_dummies_snap(self): + """MPD with an absorbed time dimension: the period dummies are spanned + (NaN), the interaction effects stay identified.""" + df = self._geo_frame(seed=2) + est = MultiPeriodDiD() + with pytest.warns(UserWarning, match="collinear with the absorbed"): + res = est.fit( + df, + outcome="y", + treatment="treated", + time="month", + absorb=["state", "month"], + ) + assert any(np.isfinite(e.effect) for e in res.period_effects.values()) + + def test_twfe_unit_constant_covariate_nan_att_unaffected(self): + rng = np.random.default_rng(3) + n_units, n_periods = 80, 6 + unit = np.repeat(np.arange(n_units), n_periods) + time = np.tile(np.arange(n_periods), n_units) + treated = (unit < n_units // 2).astype(int) + post = (time >= n_periods // 2).astype(int) + y = 0.4 * treated * post + rng.normal(size=unit.size) + df = pd.DataFrame({"y": y, "treated": treated, "post": post, "unit": unit, "time": time}) + df["xc"] = np.repeat(rng.normal(size=n_units), n_periods) # unit-constant + + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + base = TwoWayFixedEffects().fit( + df, outcome="y", treatment="treated", time="post", unit="unit" + ) + with pytest.warns(UserWarning, match="collinear with the absorbed"): + res = TwoWayFixedEffects().fit( + df, + outcome="y", + treatment="treated", + time="post", + unit="unit", + covariates=["xc"], + ) + # TWFE results expose only the ATT coefficient; the snapped covariate + # column is dropped by rank handling, so the fit must reduce EXACTLY + # to the no-covariate design. + assert res.att == pytest.approx(base.att, abs=1e-10) + assert res.se == pytest.approx(base.se, rel=1e-8) + + +class TestReplicateRefitSnap: + """Replicate refits must fail closed when a regressor becomes FE-spanned + within a replicate's effective half-sample (local-review follow-up).""" + + def test_replicate_local_spanning_yields_finite_inference(self): + from diff_diff.survey import SurveyDesign + + rng = np.random.default_rng(30) + n_states, n_months, n_rows = 8, 10, 4_000 + state = rng.integers(0, n_states, n_rows) + month = rng.integers(0, n_months, n_rows) + treated = (state < n_states // 2).astype(int) + post = (month >= n_months // 2).astype(int) + # covariate varies ONLY within state 0's rows; on any replicate that + # zeroes state 0 it becomes state-spanned in the effective sample + xc = np.where(state == 0, rng.normal(size=n_rows), 1.0) + y = ( + rng.normal(0, 1, n_states)[state] + + rng.normal(0, 0.3, n_months)[month] + + 0.3 * treated * post + + 0.1 * xc + + rng.normal(size=n_rows) + ) + df = pd.DataFrame( + { + "y": y, + "treated": treated, + "post": post, + "xc": xc, + "state": state, + "month": month, + "w": np.ones(n_rows), + } + ) + # BRR-style replicates; replicate 1 zeroes states {0, 1} so xc is + # constant (spanned by state FE) on its effective sample + halves = [ + np.array([1, 0, 1, 0, 1, 0, 1, 0]), + np.array([0, 1, 0, 1, 1, 0, 0, 1]), + np.array([1, 1, 0, 0, 0, 1, 1, 0]), + np.array([0, 0, 1, 1, 0, 1, 0, 1]), + ] + for i, h in enumerate(halves, 1): + df[f"rw{i}"] = 2.0 * h[state] + design = SurveyDesign( + weights="w", + replicate_weights=[f"rw{i}" for i in range(1, 5)], + replicate_method="BRR", + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = DifferenceInDifferences().fit( + df, + outcome="y", + treatment="treated", + time="post", + absorb=["state", "month"], + covariates=["xc"], + survey_design=design, + ) + # fail-closed contract: inference is finite (invalid replicates are + # dropped by compute_replicate_refit_variance), never junk-inflated + assert np.isfinite(res.att) + assert np.isfinite(res.se) and res.se > 0 + + +class TestJointSpanCovariateSnap: + """Local-review P0 end-to-end: a covariate exactly in the joint span of + the absorbed dimensions must be dropped (NaN + warning) and leave the + identified ATT untouched, even when MAP truncation masks it from the + fast-path norm test (unbalanced, correlated FE incidence).""" + + @staticmethod + def _frame(seed=7): + rng = np.random.default_rng(seed) + n_units, n_periods, span = 150, 30, 9 + unit = np.repeat(np.arange(n_units), n_periods) + time = np.tile(np.arange(n_periods), n_units) + entry = rng.integers(0, n_periods - span, n_units) + keep = (time >= entry[unit]) & (time < entry[unit] + span) + unit, time = unit[keep], time[keep] + treated = (unit < n_units // 2).astype(int) + post = (time >= n_periods // 2).astype(int) + y = 0.4 * treated * post + rng.normal(size=unit.size) + a = rng.normal(0, 1, n_units) + b = rng.normal(0, 1, n_periods) + return pd.DataFrame( + { + "y": y, + "treated": treated, + "post": post, + "unit": unit, + "time": time, + "xspan": a[unit] + b[time], + } + ) + + def test_did_absorb_joint_span_covariate_dropped_att_stable(self): + df = self._frame() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + base = DifferenceInDifferences().fit( + df, + outcome="y", + treatment="treated", + time="post", + absorb=["unit", "time"], + ) + with pytest.warns(UserWarning, match=r"xspan.*collinear with the absorbed"): + res = DifferenceInDifferences().fit( + df, + outcome="y", + treatment="treated", + time="post", + absorb=["unit", "time"], + covariates=["xspan"], + ) + assert np.isnan(res.coefficients["xspan"]) + assert res.att == pytest.approx(base.att, abs=1e-10) + + def test_sun_abraham_joint_span_covariate_dropped_att_stable(self): + from diff_diff import SunAbraham + + df = self._frame(seed=8) + rng = np.random.default_rng(9) + first = np.where(np.arange(df["unit"].nunique()) % 3 == 0, 0, 15)[ + df["unit"].values % df["unit"].nunique() + ] + df = df.assign(first_treat=first) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + base = SunAbraham().fit( + df, outcome="y", unit="unit", time="time", first_treat="first_treat" + ) + with pytest.warns(UserWarning, match="collinear with the absorbed"): + res = SunAbraham().fit( + df, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + covariates=["xspan"], + ) + assert res.att == pytest.approx(base.att, abs=1e-8) diff --git a/tests/test_sun_abraham.py b/tests/test_sun_abraham.py index 21c21d573..28fc55a16 100644 --- a/tests/test_sun_abraham.py +++ b/tests/test_sun_abraham.py @@ -1968,3 +1968,47 @@ def test_bootstrap_finite_and_point_estimates_invariant(self, ci_params): f"{vt}={results[vt].overall_att}. Bootstrap refits in the new " "full-dummy branch are not FWL-invariant." ) + + +class TestAbsorbedCovariateSnap: + """A unit-constant covariate is spanned by the unit FE: it must snap to + zero (deterministic drop + cause warning) instead of leaving a junk + column that perturbs the IW estimates (REGISTRY 'Absorbed FE').""" + + @staticmethod + def _panel(seed=0, with_cov=False): + rng = np.random.default_rng(seed) + n_units, n_periods = 90, 8 + unit = np.repeat(np.arange(n_units), n_periods) + time = np.tile(np.arange(n_periods), n_units) + first = np.repeat(rng.choice([0, 4, 5], n_units), n_periods) + d = (first > 0) & (time >= first) + y = 0.3 * d + rng.normal(size=unit.size) + df = pd.DataFrame({"y": y, "unit": unit, "time": time, "first_treat": first}) + if with_cov: + df["xc"] = np.repeat(rng.normal(size=n_units), n_periods) + return df + + def test_unit_constant_covariate_snaps_att_unaffected(self): + base_df = self._panel(seed=42) + cov_df = self._panel(seed=42, with_cov=True) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + base = SunAbraham().fit( + base_df, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + ) + with pytest.warns(UserWarning, match="collinear with the absorbed"): + res = SunAbraham().fit( + cov_df, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + covariates=["xc"], + ) + # snapped covariate -> dropped -> identical to the no-covariate design + assert res.att == pytest.approx(base.att, abs=1e-10) diff --git a/tests/test_utils.py b/tests/test_utils.py index 3edfa69df..ae2144566 100644 --- a/tests/test_utils.py +++ b/tests/test_utils.py @@ -11,6 +11,8 @@ - Placebo effects for Synthetic DiD """ +import warnings + import numpy as np import pandas as pd import pytest @@ -47,11 +49,13 @@ def simple_regression_data(): """Create simple regression data for testing robust SE.""" np.random.seed(42) n = 100 - X = np.column_stack([ - np.ones(n), - np.random.randn(n), - np.random.randn(n), - ]) + X = np.column_stack( + [ + np.ones(n), + np.random.randn(n), + np.random.randn(n), + ] + ) beta_true = np.array([1.0, 2.0, -1.0]) # Heteroskedastic errors errors = np.random.randn(n) * (1 + np.abs(X[:, 1])) @@ -70,10 +74,12 @@ def clustered_regression_data(): cluster_ids = np.repeat(np.arange(n_clusters), obs_per_cluster) cluster_effects = np.random.randn(n_clusters) - X = np.column_stack([ - np.ones(n), - np.random.randn(n), - ]) + X = np.column_stack( + [ + np.ones(n), + np.random.randn(n), + ] + ) beta_true = np.array([5.0, 2.0]) # Cluster-correlated errors @@ -106,12 +112,14 @@ def parallel_trends_data(): y += np.random.normal(0, 0.5) - data.append({ - "unit": unit, - "period": period, - "treated": int(is_treated), - "outcome": y, - }) + data.append( + { + "unit": unit, + "period": period, + "treated": int(is_treated), + "outcome": y, + } + ) return pd.DataFrame(data) @@ -143,12 +151,14 @@ def non_parallel_trends_data(): y += np.random.normal(0, 0.5) - data.append({ - "unit": unit, - "period": period, - "treated": int(is_treated), - "outcome": y, - }) + data.append( + { + "unit": unit, + "period": period, + "treated": int(is_treated), + "outcome": y, + } + ) return pd.DataFrame(data) @@ -169,12 +179,14 @@ def sdid_panel_data(): unit_effect = np.random.normal(0, 2) for period in range(n_pre + n_post): y = 10.0 + unit_effect + period * 0.5 + np.random.normal(0, 0.3) - data.append({ - "unit": unit, - "period": period, - "treated": 0, - "outcome": y, - }) + data.append( + { + "unit": unit, + "period": period, + "treated": 0, + "outcome": y, + } + ) # Treated units for unit in range(n_control, n_control + n_treated): @@ -184,12 +196,14 @@ def sdid_panel_data(): if period >= n_pre: y += 3.0 # Treatment effect y += np.random.normal(0, 0.3) - data.append({ - "unit": unit, - "period": period, - "treated": 1, - "outcome": y, - }) + data.append( + { + "unit": unit, + "period": period, + "treated": 1, + "outcome": y, + } + ) return pd.DataFrame(data) @@ -640,7 +654,7 @@ def test_returns_expected_keys(self, parallel_trends_data): outcome="outcome", time="period", treatment_group="treated", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) expected_keys = [ @@ -665,7 +679,7 @@ def test_parallel_trends_detected(self, parallel_trends_data): outcome="outcome", time="period", treatment_group="treated", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) # Should not reject parallel trends @@ -679,7 +693,7 @@ def test_non_parallel_trends_detected(self, non_parallel_trends_data): outcome="outcome", time="period", treatment_group="treated", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) # Should reject parallel trends (different slopes) @@ -693,7 +707,7 @@ def test_trend_difference_sign(self, non_parallel_trends_data): outcome="outcome", time="period", treatment_group="treated", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) # Treated has steeper trend (3.0 vs 1.0), so difference should be positive @@ -705,7 +719,7 @@ def test_auto_infer_pre_periods(self, parallel_trends_data): parallel_trends_data, outcome="outcome", time="period", - treatment_group="treated" + treatment_group="treated", # pre_periods not specified ) @@ -715,18 +729,16 @@ def test_auto_infer_pre_periods(self, parallel_trends_data): def test_single_period_returns_nan(self): """Test that single pre-period returns NaN for trends.""" - data = pd.DataFrame({ - "outcome": [10, 11, 12, 13], - "period": [0, 0, 0, 0], # All same period - "treated": [1, 1, 0, 0], - }) + data = pd.DataFrame( + { + "outcome": [10, 11, 12, 13], + "period": [0, 0, 0, 0], # All same period + "treated": [1, 1, 0, 0], + } + ) results = check_parallel_trends( - data, - outcome="outcome", - time="period", - treatment_group="treated", - pre_periods=[0] + data, outcome="outcome", time="period", treatment_group="treated", pre_periods=[0] ) # Cannot compute trend with single period @@ -740,7 +752,7 @@ def test_standard_errors_positive(self, parallel_trends_data): outcome="outcome", time="period", treatment_group="treated", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) assert results["treated_trend_se"] > 0 @@ -761,11 +773,7 @@ def test_with_unit_specified(self, parallel_trends_data): pre_data = parallel_trends_data[parallel_trends_data["period"] < 3] treated_changes, control_changes = _compute_outcome_changes( - pre_data, - outcome="outcome", - time="period", - treatment_group="treated", - unit="unit" + pre_data, outcome="outcome", time="period", treatment_group="treated", unit="unit" ) # Should have changes for each unit-period transition @@ -777,11 +785,7 @@ def test_without_unit_specified(self, parallel_trends_data): pre_data = parallel_trends_data[parallel_trends_data["period"] < 3] treated_changes, control_changes = _compute_outcome_changes( - pre_data, - outcome="outcome", - time="period", - treatment_group="treated", - unit=None + pre_data, outcome="outcome", time="period", treatment_group="treated", unit=None ) # Should have aggregate changes (fewer than with unit) @@ -793,11 +797,7 @@ def test_returns_float_arrays(self, parallel_trends_data): pre_data = parallel_trends_data[parallel_trends_data["period"] < 3] treated_changes, control_changes = _compute_outcome_changes( - pre_data, - outcome="outcome", - time="period", - treatment_group="treated", - unit="unit" + pre_data, outcome="outcome", time="period", treatment_group="treated", unit="unit" ) assert treated_changes.dtype == np.float64 @@ -811,21 +811,19 @@ def test_changes_reflect_trend(self): is_treated = unit < 5 for period in range(3): y = 10.0 + period * 2.0 # Trend of 2.0 per period - data.append({ - "unit": unit, - "period": period, - "treated": int(is_treated), - "outcome": y, - }) + data.append( + { + "unit": unit, + "period": period, + "treated": int(is_treated), + "outcome": y, + } + ) df = pd.DataFrame(data) treated_changes, control_changes = _compute_outcome_changes( - df, - outcome="outcome", - time="period", - treatment_group="treated", - unit="unit" + df, outcome="outcome", time="period", treatment_group="treated", unit="unit" ) # All changes should be approximately 2.0 @@ -841,17 +839,24 @@ def test_silent_on_balanced_panel(self): for unit in range(10): treated = int(unit >= 5) for t in range(1, 5): - rows.append({ - "unit": unit, "period": t, - "treated": treated, "outcome": rng.normal(), - }) + rows.append( + { + "unit": unit, + "period": t, + "treated": treated, + "outcome": rng.normal(), + } + ) df = pd.DataFrame(rows) with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always") _compute_outcome_changes( - df, outcome="outcome", time="period", - treatment_group="treated", unit="unit", + df, + outcome="outcome", + time="period", + treatment_group="treated", + unit="unit", ) # Generic filter on "dropped" catches both the old and new label so a @@ -867,10 +872,14 @@ def test_warns_on_nan_outcomes_with_excess_drop_count(self): for unit in range(10): treated = int(unit >= 5) for t in range(1, 5): - rows.append({ - "unit": unit, "period": t, - "treated": treated, "outcome": rng.normal(), - }) + rows.append( + { + "unit": unit, + "period": t, + "treated": treated, + "outcome": rng.normal(), + } + ) df = pd.DataFrame(rows) df.loc[[5, 12, 22], "outcome"] = np.nan @@ -879,8 +888,11 @@ def test_warns_on_nan_outcomes_with_excess_drop_count(self): match=r"parallel-trend diagnostic: dropped \d+ row\(s\).*additional NaN first-differences", ): _compute_outcome_changes( - df, outcome="outcome", time="period", - treatment_group="treated", unit="unit", + df, + outcome="outcome", + time="period", + treatment_group="treated", + unit="unit", ) def test_warning_label_reflects_public_caller(self): @@ -892,24 +904,35 @@ def test_warning_label_reflects_public_caller(self): for unit in range(10): treated = int(unit >= 5) for t in range(1, 5): - rows.append({ - "unit": unit, "period": t, - "treated": treated, "outcome": rng.normal(), - }) + rows.append( + { + "unit": unit, + "period": t, + "treated": treated, + "outcome": rng.normal(), + } + ) df = pd.DataFrame(rows) df.loc[[5, 12, 22], "outcome"] = np.nan with pytest.warns(UserWarning, match="check_parallel_trends_robust:"): check_parallel_trends_robust( - df, outcome="outcome", time="period", - treatment_group="treated", unit="unit", - n_permutations=100, seed=0, + df, + outcome="outcome", + time="period", + treatment_group="treated", + unit="unit", + n_permutations=100, + seed=0, ) with pytest.warns(UserWarning, match="equivalence_test_trends:"): equivalence_test_trends( - df, outcome="outcome", time="period", - treatment_group="treated", unit="unit", + df, + outcome="outcome", + time="period", + treatment_group="treated", + unit="unit", ) @@ -930,7 +953,7 @@ def test_reproducibility_with_seed(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - seed=42 + seed=42, ) results2 = check_parallel_trends_robust( @@ -940,7 +963,7 @@ def test_reproducibility_with_seed(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - seed=42 + seed=42, ) assert results1["wasserstein_p_value"] == results2["wasserstein_p_value"] @@ -954,7 +977,7 @@ def test_different_seeds_different_results(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - seed=42 + seed=42, ) results2 = check_parallel_trends_robust( @@ -964,7 +987,7 @@ def test_different_seeds_different_results(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - seed=123 + seed=123, ) # May be equal by chance but typically different @@ -982,7 +1005,7 @@ def test_n_permutations_affects_precision(self, parallel_trends_data): unit="unit", pre_periods=[0, 1, 2], n_permutations=100, - seed=42 + seed=42, ) results_many = check_parallel_trends_robust( @@ -993,7 +1016,7 @@ def test_n_permutations_affects_precision(self, parallel_trends_data): unit="unit", pre_periods=[0, 1, 2], n_permutations=1000, - seed=42 + seed=42, ) # Both should be valid @@ -1009,7 +1032,7 @@ def test_wasserstein_normalized_returned(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - seed=42 + seed=42, ) assert "wasserstein_normalized" in results @@ -1024,7 +1047,7 @@ def test_sample_sizes_returned(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - seed=42 + seed=42, ) assert "n_treated" in results @@ -1035,12 +1058,14 @@ def test_sample_sizes_returned(self, parallel_trends_data): def test_insufficient_data_returns_nan(self): """Test that insufficient data returns NaN values.""" # Only one observation per group - data = pd.DataFrame({ - "unit": [0, 1], - "period": [0, 0], - "treated": [1, 0], - "outcome": [10.0, 12.0], - }) + data = pd.DataFrame( + { + "unit": [0, 1], + "period": [0, 0], + "treated": [1, 0], + "outcome": [10.0, 12.0], + } + ) results = check_parallel_trends_robust( data, @@ -1049,7 +1074,7 @@ def test_insufficient_data_returns_nan(self): treatment_group="treated", unit="unit", pre_periods=[0], - seed=42 + seed=42, ) assert np.isnan(results["wasserstein_distance"]) @@ -1072,7 +1097,7 @@ def test_tost_p_value_in_range(self, parallel_trends_data): time="period", treatment_group="treated", unit="unit", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) assert 0 <= results["tost_p_value"] <= 1 @@ -1085,7 +1110,7 @@ def test_equivalence_margin_auto_set(self, parallel_trends_data): time="period", treatment_group="treated", unit="unit", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) assert results["equivalence_margin"] > 0 @@ -1098,7 +1123,7 @@ def test_degrees_of_freedom_returned(self, parallel_trends_data): time="period", treatment_group="treated", unit="unit", - pre_periods=[0, 1, 2] + pre_periods=[0, 1, 2], ) assert "degrees_of_freedom" in results @@ -1113,7 +1138,7 @@ def test_tighter_margin_harder_to_pass(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - equivalence_margin=10.0 # Very wide margin + equivalence_margin=10.0, # Very wide margin ) results_tight = equivalence_test_trends( @@ -1123,7 +1148,7 @@ def test_tighter_margin_harder_to_pass(self, parallel_trends_data): treatment_group="treated", unit="unit", pre_periods=[0, 1, 2], - equivalence_margin=0.001 # Very tight margin + equivalence_margin=0.001, # Very tight margin ) # Wide margin should have smaller TOST p-value (easier to show equivalence) @@ -1258,9 +1283,7 @@ def test_uniform_weights_equals_did(self): time_weights = np.array([1.0]) tau = compute_sdid_estimator( - Y_pre_control, Y_post_control, - Y_pre_treated, Y_post_treated, - unit_weights, time_weights + Y_pre_control, Y_post_control, Y_pre_treated, Y_post_treated, unit_weights, time_weights ) # Standard DiD: (16-10) - (12-10) = 6 - 2 = 4 @@ -1278,9 +1301,7 @@ def test_concentrated_unit_weights(self): time_weights = np.array([1.0]) tau = compute_sdid_estimator( - Y_pre_control, Y_post_control, - Y_pre_treated, Y_post_treated, - unit_weights, time_weights + Y_pre_control, Y_post_control, Y_pre_treated, Y_post_treated, unit_weights, time_weights ) # DiD using only first control: (20-15) - (12-10) = 5 - 2 = 3 @@ -1297,9 +1318,7 @@ def test_multiple_post_periods(self): time_weights = np.array([1.0]) tau = compute_sdid_estimator( - Y_pre_control, Y_post_control, - Y_pre_treated, Y_post_treated, - unit_weights, time_weights + Y_pre_control, Y_post_control, Y_pre_treated, Y_post_treated, unit_weights, time_weights ) # Treated post mean: (17+19+21)/3 = 19 @@ -1399,12 +1418,7 @@ def test_matches_get_dummies(self, col): def _unbalanced_2way_panel(seed=0, drop=0.30): """Unbalanced (non-orthogonal) 2-way panel: some unit-period cells dropped.""" rng = np.random.default_rng(seed) - rows = [ - (u, t) - for u in range(8) - for t in range(6) - if rng.random() >= drop - ] + rows = [(u, t) for u in range(8) for t in range(6) if rng.random() >= drop] df = pd.DataFrame(rows, columns=["unit", "period"]) n = len(df) df["x1"] = rng.normal(size=n) @@ -1445,10 +1459,14 @@ def _fwl_slopes(demeaned, xcols, weights=None): def _frozen_old_within_transform_weighted( df, variables, unit, time, weights, max_iter=100, tol=1e-8 ): - """Byte-for-byte copy of the PRE-REFACTOR within_transform weighted loop. - - Used as the byte-identity guard: demean_by_groups([unit, time], weighted) and - the refactored within_transform must reproduce this exactly. + """Byte-for-byte copy of the PRE-v3.6.x pandas-groupby weighted MAP loop. + + Historical reference implementation. The v3.6.x factorize-once + bincount + engine is NOT bit-identical to this loop (bincount accumulation is not + Kahan-compensated the way pandas' grouped mean is; drift compounds across + MAP iterations to ~1e-10 order — see REGISTRY "Absorbed Fixed Effects"). + Kept as a drift-bound guard: the new engine must agree with this loop at + atol=1e-9, and with full-dummy WLS ground truth at atol=1e-9. """ w = np.asarray(weights, dtype=np.float64) unit_groups = df[unit].values @@ -1479,19 +1497,11 @@ def test_len1_byte_identical_to_demean_by_group(self, weighted): """One grouping var must delegate to demean_by_group (byte-identical).""" df = _unbalanced_2way_panel(seed=1) w = df["w"].values if weighted else None - out_groups, n_g = demean_by_groups( - df, ["y", "x1"], ["unit"], suffix="_dm", weights=w - ) - out_single, n_s = demean_by_group( - df, ["y", "x1"], "unit", suffix="_dm", weights=w - ) + out_groups, n_g = demean_by_groups(df, ["y", "x1"], ["unit"], suffix="_dm", weights=w) + out_single, n_s = demean_by_group(df, ["y", "x1"], "unit", suffix="_dm", weights=w) assert n_g == n_s - np.testing.assert_array_equal( - out_groups["y_dm"].values, out_single["y_dm"].values - ) - np.testing.assert_array_equal( - out_groups["x1_dm"].values, out_single["x1_dm"].values - ) + np.testing.assert_array_equal(out_groups["y_dm"].values, out_single["y_dm"].values) + np.testing.assert_array_equal(out_groups["x1_dm"].values, out_single["x1_dm"].values) def test_n_effects_is_sum_nunique_minus_one(self): df = _unbalanced_2way_panel(seed=2) @@ -1515,19 +1525,12 @@ def test_unbalanced_2way_matches_full_dummy_ols(self, weighted): def test_n3_absorb_matches_full_dummy_ols(self): """Generalizes to 3 absorbed dimensions.""" rng = np.random.default_rng(4) - rows = [ - (u, t, (u + t) % 4) - for u in range(10) - for t in range(6) - if rng.random() >= 0.4 - ] + rows = [(u, t, (u + t) % 4) for u in range(10) for t in range(6) if rng.random() >= 0.4] df = pd.DataFrame(rows, columns=["unit", "period", "firm"]) n = len(df) df["x1"] = rng.normal(size=n) df["y"] = 2.0 * df["x1"] + rng.normal(size=n) - out, _ = demean_by_groups( - df, ["y", "x1"], ["unit", "period", "firm"], suffix="_dm" - ) + out, _ = demean_by_groups(df, ["y", "x1"], ["unit", "period", "firm"], suffix="_dm") demeaned = {c: out[f"{c}_dm"].values for c in ("y", "x1")} map_slope = _fwl_slopes(demeaned, ["x1"])[0] gt = _full_dummy_slopes(df, ["unit", "period", "firm"], ["x1"])[0] @@ -1538,9 +1541,7 @@ def test_result_orthogonal_to_fe_spans(self, weighted): """Demeaned variables must have ~0 (weighted) group means in every FE dim.""" df = _unbalanced_2way_panel(seed=5) w = df["w"].values if weighted else None - out, _ = demean_by_groups( - df, ["y"], ["unit", "period"], suffix="_dm", weights=w, tol=1e-12 - ) + out, _ = demean_by_groups(df, ["y"], ["unit", "period"], suffix="_dm", weights=w, tol=1e-12) ydm = out["y_dm"].values for g in ("unit", "period"): if weighted: @@ -1551,29 +1552,38 @@ def test_result_orthogonal_to_fe_spans(self, weighted): means = pd.Series(ydm).groupby(df[g].values).transform("mean").values assert np.max(np.abs(means)) < 1e-9 - def test_weighted_byte_identity_vs_frozen_within_transform(self): - """demean_by_groups([unit, time], weighted) reproduces the old loop exactly.""" + def test_weighted_close_to_frozen_pandas_loop(self): + """demean_by_groups([unit, time], weighted) agrees with the legacy + pandas-groupby loop at the documented ~1e-9 drift bound (bincount + accumulation vs Kahan-compensated pandas mean; not bit-identical).""" df = _unbalanced_2way_panel(seed=6) w = df["w"].values - frozen = _frozen_old_within_transform_weighted( - df, ["y", "x1", "x2"], "unit", "period", w - ) + frozen = _frozen_old_within_transform_weighted(df, ["y", "x1", "x2"], "unit", "period", w) out, _ = demean_by_groups( df, ["y", "x1", "x2"], ["unit", "period"], suffix="_dm", weights=w, tol=1e-8 ) for var in ("y", "x1", "x2"): - np.testing.assert_array_equal(out[f"{var}_dm"].values, frozen[var]) + np.testing.assert_allclose(out[f"{var}_dm"].values, frozen[var], atol=1e-9, rtol=0) - def test_within_transform_weighted_unchanged(self): - """The refactored within_transform weighted path is byte-identical to before.""" + def test_within_transform_weighted_close_to_frozen_loop(self): + """within_transform weighted path agrees with the legacy loop at 1e-9.""" df = _unbalanced_2way_panel(seed=7) w = df["w"].values - frozen = _frozen_old_within_transform_weighted( - df, ["y", "x1"], "unit", "period", w - ) + frozen = _frozen_old_within_transform_weighted(df, ["y", "x1"], "unit", "period", w) out = within_transform(df, ["y", "x1"], "unit", "period", weights=w) - np.testing.assert_array_equal(out["y_demeaned"].values, frozen["y"]) - np.testing.assert_array_equal(out["x1_demeaned"].values, frozen["x1"]) + np.testing.assert_allclose(out["y_demeaned"].values, frozen["y"], atol=1e-9, rtol=0) + np.testing.assert_allclose(out["x1_demeaned"].values, frozen["x1"], atol=1e-9, rtol=0) + + def test_within_transform_weighted_matches_full_dummy_wls(self): + """Ground truth (not implementation identity): weighted within_transform + residualization must reproduce full-dummy WLS slopes.""" + df = _unbalanced_2way_panel(seed=7) + w = df["w"].values + out = within_transform(df, ["y", "x1", "x2"], "unit", "period", weights=w) + demeaned = {c: out[f"{c}_demeaned"].values for c in ("y", "x1", "x2")} + map_slopes = _fwl_slopes(demeaned, ["x1", "x2"], weights=w) + gt = _full_dummy_slopes(df, ["unit", "period"], ["x1", "x2"], weights=w) + np.testing.assert_allclose(map_slopes, gt, atol=1e-9, rtol=0) def test_within_transform_unweighted_now_matches_full_dummy(self): """Unweighted within_transform now uses MAP -> exact on unbalanced panels.""" @@ -1588,11 +1598,323 @@ def test_nonconvergence_emits_warning(self): """A starved iteration budget on an unbalanced panel warns (not silent).""" df = _unbalanced_2way_panel(seed=9) with pytest.warns(UserWarning, match="did not converge"): - demean_by_groups( - df, ["y"], ["unit", "period"], suffix="_dm", max_iter=1, tol=1e-15 - ) + demean_by_groups(df, ["y"], ["unit", "period"], suffix="_dm", max_iter=1, tol=1e-15) def test_empty_group_vars_raises(self): df = _unbalanced_2way_panel(seed=10) with pytest.raises(ValueError, match="at least one grouping variable"): demean_by_groups(df, ["y"], []) + + @pytest.mark.parametrize("weighted", [False, True]) + def test_nan_group_key_raises(self, weighted): + """NaN in an absorbed group column must raise, naming the column. + + pd.factorize codes NaN as -1, which would silently index the LAST + group's mean; the pre-v3.6.x behavior was also silently wrong + (unweighted NaN-poisoned rows; weighted passed them through + un-demeaned). REGISTRY "Absorbed Fixed Effects" edge case. + """ + df = _unbalanced_2way_panel(seed=11) + df.loc[df.index[5], "period"] = np.nan + w = df["w"].values if weighted else None + with pytest.raises(ValueError, match="'period' contains NaN"): + demean_by_groups(df, ["y"], ["unit", "period"], suffix="_dm", weights=w) + + def test_zero_total_weight_group_rows_inert_and_finite(self): + """Zero-total-weight groups stay inert (no NaN/Inf poisoning) and + positive-weight groups remain weighted-orthogonal.""" + df = _unbalanced_2way_panel(seed=12) + w = df["w"].values.copy() + zero_units = df["unit"].values == df["unit"].values[0] + w[zero_units] = 0.0 + out, _ = demean_by_groups(df, ["y"], ["unit", "period"], suffix="_dm", weights=w, tol=1e-12) + ydm = out["y_dm"].values + assert np.isfinite(ydm).all() + pos = ~zero_units + num = pd.Series(w * ydm).groupby(df["unit"].values).transform("sum").values + den = pd.Series(w).groupby(df["unit"].values).transform("sum").values + means = np.divide(num, den, out=np.zeros_like(num), where=den > 0) + assert np.max(np.abs(means[pos])) < 1e-9 + + def test_defaults_max_iter_10000_both_entry_points(self): + """Both re-declared max_iter defaults must stay in sync at 10,000 + (fixest fixef.iter / pyfixest fixef_maxiter parity); raising only one + would leave the within_transform family capped differently.""" + import inspect + + assert inspect.signature(demean_by_groups).parameters["max_iter"].default == 10_000 + assert inspect.signature(within_transform).parameters["max_iter"].default == 10_000 + + def test_balanced_panel_converges_within_two_iterations(self): + """Balanced fully-crossed panels have orthogonal FE subspaces: MAP must + converge in ~2 sweeps (iteration-count regression guard).""" + rng = np.random.default_rng(13) + df = pd.DataFrame( + { + "unit": np.repeat(np.arange(30), 8), + "period": np.tile(np.arange(8), 30), + } + ) + df["y"] = rng.normal(size=len(df)) + with warnings.catch_warnings(): + warnings.simplefilter("error") # a convergence warning fails the test + demean_by_groups(df, ["y"], ["unit", "period"], suffix="_dm", max_iter=2, tol=1e-10) + + def test_correlated_fe_needs_over_100_iterations_and_converges(self): + """The headline max_iter change: banded (contiguous-lifetime) two-way + incidence genuinely needs >100 MAP iterations. Under the old cap of 100 + this warned and returned slightly-off residuals; under the 10,000 + default it must converge silently AND match full-dummy OLS. + + Iteration count depends on the angle between the FE subspaces, not on + data size, so this fixture stays small and fast. + """ + rng = np.random.default_rng(14) + n_units, n_periods = 300, 60 + span = max(2, int(n_periods * 0.1)) # 10% contiguous lifetimes + entry = rng.integers(0, n_periods - span + 1, n_units) + rows = [(u, t) for u in range(n_units) for t in range(entry[u], entry[u] + span)] + df = pd.DataFrame(rows, columns=["unit", "period"]) + df["x1"] = rng.normal(size=len(df)) + df["y"] = 1.5 * df["x1"] + rng.normal(size=len(df)) + + # the fixture is honest: the old cap genuinely fails on it + with pytest.warns(UserWarning, match="did not converge"): + demean_by_groups(df.copy(), ["y", "x1"], ["unit", "period"], suffix="_dm", max_iter=100) + + # the new default converges silently and matches ground truth + with warnings.catch_warnings(): + warnings.simplefilter("error") + out, _ = demean_by_groups(df, ["y", "x1"], ["unit", "period"], suffix="_dm") + demeaned = {c: out[f"{c}_dm"].values for c in ("y", "x1")} + map_slope = _fwl_slopes(demeaned, ["x1"])[0] + gt = _full_dummy_slopes(df, ["unit", "period"], ["x1"])[0] + np.testing.assert_allclose(map_slope, gt, atol=1e-8, rtol=0) + + +class TestSnapAbsorbedRegressors: + """Unit tests for the FE-spanned-regressor snap (REGISTRY 'Absorbed FE').""" + + def _frame(self, seed=0): + rng = np.random.default_rng(seed) + n = 40 + return pd.DataFrame( + { + "junk_dm": rng.normal(0.0, 1e-14, n), # spanned: numerical junk + "real_dm": rng.normal(0.0, 1.0, n), # genuinely identified + "g": np.arange(n) % 4, # FE column for group_vars + } + ) + + def test_snaps_junk_keeps_real(self): + from diff_diff.utils import snap_absorbed_regressors + + df = self._frame() + real_before = df["real_dm"].values.copy() + with pytest.warns(UserWarning, match="collinear with the absorbed"): + snapped = snap_absorbed_regressors( + df, + ["junk", "real"], + {"junk": 1.0, "real": 1.0}, + absorbed_desc="test FEs", + group_vars=["g"], + suffix="_dm", + ) + assert snapped == ["junk"] + assert (df["junk_dm"].values == 0.0).all() + np.testing.assert_array_equal(df["real_dm"].values, real_before) + + @pytest.mark.parametrize("action", ["silent", "error"]) + def test_non_warn_actions_snap_without_warning(self, action): + from diff_diff.utils import snap_absorbed_regressors + + df = self._frame() + with warnings.catch_warnings(): + warnings.simplefilter("error") # any warning fails the test + snapped = snap_absorbed_regressors( + df, + ["junk"], + {"junk": 1.0}, + absorbed_desc="test FEs", + group_vars=["g"], + rank_deficient_action=action, + suffix="_dm", + ) + assert snapped == ["junk"] + assert (df["junk_dm"].values == 0.0).all() + + def test_zero_pre_norm_skipped(self): + from diff_diff.utils import snap_absorbed_regressors + + df = self._frame() + with warnings.catch_warnings(): + warnings.simplefilter("error") + snapped = snap_absorbed_regressors( + df, + ["junk"], + {"junk": 0.0}, # all-zero input column: rank handling owns it + absorbed_desc="test FEs", + group_vars=["g"], + suffix="_dm", + ) + assert snapped == [] + + def test_display_names_in_warning(self): + from diff_diff.utils import snap_absorbed_regressors + + df = self._frame() + with pytest.warns(UserWarning, match=r"treated:post"): + snap_absorbed_regressors( + df, + ["junk"], + {"junk": 1.0}, + absorbed_desc="test FEs", + group_vars=["g"], + suffix="_dm", + display_names={"junk": "treated:post"}, + ) + + def test_pre_demean_norms_captures_l2(self): + from diff_diff.utils import pre_demean_norms + + df = pd.DataFrame({"a": [3.0, 4.0], "b": [0, 0]}) + norms = pre_demean_norms(df, ["a", "b"]) + assert norms["a"] == pytest.approx(5.0) + assert norms["b"] == 0.0 + + +class TestReviewFollowupsAbsorbedFE: + """Local-review follow-ups: one-way NaN guard + weight-aware snap.""" + + @pytest.mark.parametrize("weighted", [False, True]) + def test_one_way_nan_group_key_raises(self, weighted): + """The NaN-group guard must cover len(group_vars) == 1 too (the + delegation to demean_by_group previously bypassed it).""" + df = _unbalanced_2way_panel(seed=20) + df["unit"] = df["unit"].astype(float) + df.loc[df.index[3], "unit"] = np.nan + w = df["w"].values if weighted else None + with pytest.raises(ValueError, match="'unit' contains NaN"): + demean_by_groups(df, ["y"], ["unit"], suffix="_dm", weights=w) + + def test_estimator_one_way_absorb_nan_raises(self): + from diff_diff import DifferenceInDifferences + + df = _unbalanced_2way_panel(seed=21) + df["treated"] = (df["unit"] < df["unit"].median()).astype(int) + df["post"] = (df["period"] >= df["period"].median()).astype(int) + df["fe"] = df["unit"].astype(float) + df.loc[df.index[5], "fe"] = np.nan + with pytest.raises(ValueError, match="'fe' contains NaN"): + DifferenceInDifferences().fit( + df, outcome="y", treatment="treated", time="post", absorb=["fe"] + ) + + def test_weight_aware_snap_zero_weight_domain(self): + """A regressor FE-spanned on the POSITIVE-weight sample must snap even + though zero-weight domain rows (left inert by the weighted demean) + carry non-zero demeaned values that would mask it unweighted.""" + from diff_diff.utils import pre_demean_norms, snap_absorbed_regressors + + rng = np.random.default_rng(22) + n_units, n_periods = 30, 6 + unit = np.repeat(np.arange(n_units), n_periods) + period = np.tile(np.arange(n_periods), n_units) + df = pd.DataFrame({"unit": unit, "period": period}) + # xc is unit-constant on positive-weight units (spanned by unit FE + # there) but VARIES on the zero-weight unit's rows + xc = np.repeat(rng.normal(size=n_units), n_periods).astype(float) + zero_rows = unit == 0 + xc[zero_rows] = rng.normal(size=zero_rows.sum()) # varying junk + df["xc"] = xc + df["y"] = rng.normal(size=len(df)) + w = np.ones(len(df)) + w[zero_rows] = 0.0 # domain-excluded unit + + pre = pre_demean_norms(df, ["xc"], weights=w) + out, _ = demean_by_groups( + df, ["y", "xc"], ["unit", "period"], suffix="_dm", weights=w, tol=1e-12 + ) + # unweighted norm is large (inert zero-weight rows keep raw values): + assert float(np.linalg.norm(out["xc_dm"].values)) > 1e-3 + # ...but the weight-aware snap sees the effective sample and fires + with pytest.warns(UserWarning, match="collinear with the absorbed"): + snapped = snap_absorbed_regressors( + out, + ["xc"], + pre, + absorbed_desc="unit and period FEs", + group_vars=["unit", "period"], + suffix="_dm", + weights=w, + ) + assert snapped == ["xc"] + assert (out["xc_dm"].values == 0.0).all() + + +class TestJointSpanSnapConfirmation: + """Local-review P0: a column in the JOINT span of two FE dimensions + (x = a[unit] + b[time]) converges to zero slowly under MAP on unbalanced + panels, so its truncation residual can sit far above the fast-path snap + threshold. Stage 2 (exact LSMR span confirmation) must catch it; a + genuinely identified low-within-variation covariate must NOT be snapped. + """ + + @staticmethod + def _unbalanced_panel(seed=7, n_units=120, n_periods=30, span=9): + rng = np.random.default_rng(seed) + unit = np.repeat(np.arange(n_units), n_periods) + period = np.tile(np.arange(n_periods), n_units) + entry = rng.integers(0, n_periods - span, n_units) + keep = (period >= entry[unit]) & (period < entry[unit] + span) + unit, period = unit[keep], period[keep] + df = pd.DataFrame({"unit": unit, "period": period}) + df["y"] = rng.normal(size=len(df)) + a = rng.normal(0, 1, n_units) + b = rng.normal(0, 1, n_periods) + df["xspan"] = a[unit] + b[period] # exactly in span(unit + period) + return df, rng + + def test_joint_span_column_snapped_via_confirmation(self): + from diff_diff.utils import pre_demean_norms, snap_absorbed_regressors + + df, _ = self._unbalanced_panel() + pre = pre_demean_norms(df, ["xspan"]) + out, _ = demean_by_groups(df, ["y", "xspan"], ["unit", "period"], suffix="_dm", tol=1e-8) + rel = np.linalg.norm(out["xspan_dm"].values) / pre["xspan"] + assert rel > 1e-10 # the fixture is honest: fast path alone misses it + with pytest.warns(UserWarning, match="collinear with the absorbed"): + snapped = snap_absorbed_regressors( + out, + ["xspan"], + pre, + absorbed_desc="unit and period FEs", + group_vars=["unit", "period"], + suffix="_dm", + ) + assert snapped == ["xspan"] + assert (out["xspan_dm"].values == 0.0).all() + + def test_identified_low_variation_covariate_not_snapped(self): + """Counter-test: near-spanned but genuinely identified (tiny real + within-variation) must survive stage 2 untouched.""" + from diff_diff.utils import pre_demean_norms, snap_absorbed_regressors + + df, rng = self._unbalanced_panel(seed=8) + z = rng.normal(size=len(df)) + df["xnear"] = df["xspan"] + 1e-5 * z # real within-FE variation ~1e-5 + pre = pre_demean_norms(df, ["xnear"]) + out, _ = demean_by_groups(df, ["xnear"], ["unit", "period"], suffix="_dm", tol=1e-10) + before = out["xnear_dm"].values.copy() + with warnings.catch_warnings(): + warnings.simplefilter("error") # any snap warning fails the test + snapped = snap_absorbed_regressors( + out, + ["xnear"], + pre, + absorbed_desc="unit and period FEs", + group_vars=["unit", "period"], + suffix="_dm", + ) + assert snapped == [] + np.testing.assert_array_equal(out["xnear_dm"].values, before) diff --git a/tests/test_wooldridge.py b/tests/test_wooldridge.py index e5264a45a..ef9f0ad05 100644 --- a/tests/test_wooldridge.py +++ b/tests/test_wooldridge.py @@ -1093,8 +1093,12 @@ def test_parity_with_dummy_ols_not_yet_treated(self, unbalanced_data): # Build explicit dummy regression on same sample sample = _filter_sample(df, "unit", "time", "cohort", "not_yet_treated", 0) X_int, _, gt_keys = _build_interaction_matrix( - sample, cohort="cohort", time="time", anticipation=0, - control_group="not_yet_treated", method="ols", + sample, + cohort="cohort", + time="time", + anticipation=0, + control_group="not_yet_treated", + method="ols", ) unit_dummies = pd.get_dummies(sample["unit"], drop_first=True).values.astype(float) time_dummies = pd.get_dummies(sample["time"], drop_first=True).values.astype(float) @@ -1169,7 +1173,7 @@ def test_logit_never_treated_post_treatment_only(self, binary_data): binary_data, outcome="y", unit="unit", time="time", cohort="cohort" ) # All cells should be post-treatment - for (g, t) in r.group_time_effects: + for g, t in r.group_time_effects: assert t >= g, f"Pre-treatment cell ({g},{t}) in nonlinear never_treated" # All expected post-treatment cells present expected = {(g, t) for g in [3, 4] for t in range(1, 6) if t >= g} @@ -1180,7 +1184,7 @@ def test_poisson_never_treated_post_treatment_only(self, count_data): r = WooldridgeDiD(method="poisson", control_group="never_treated").fit( count_data, outcome="y", unit="unit", time="time", cohort="cohort" ) - for (g, t) in r.group_time_effects: + for g, t in r.group_time_effects: assert t >= g, f"Pre-treatment cell ({g},{t}) in nonlinear never_treated" expected = {(g, t) for g in [3, 4] for t in range(1, 6) if t >= g} assert set(r.group_time_effects.keys()) == expected @@ -1197,15 +1201,9 @@ def test_interaction_matrix_fewer_cols_for_nonlinear(self): rows.append({"unit": u, "time": t, "cohort": g, "y": 0.0}) df = pd.DataFrame(rows) - X_ols, _, _ = _build_interaction_matrix( - df, "cohort", "time", 0, "never_treated", "ols" - ) - X_logit, _, _ = _build_interaction_matrix( - df, "cohort", "time", 0, "never_treated", "logit" - ) - X_nyt, _, _ = _build_interaction_matrix( - df, "cohort", "time", 0, "not_yet_treated", "ols" - ) + X_ols, _, _ = _build_interaction_matrix(df, "cohort", "time", 0, "never_treated", "ols") + X_logit, _, _ = _build_interaction_matrix(df, "cohort", "time", 0, "never_treated", "logit") + X_nyt, _, _ = _build_interaction_matrix(df, "cohort", "time", 0, "not_yet_treated", "ols") # OLS never_treated > nonlinear never_treated == not_yet_treated assert X_ols.shape[1] > X_logit.shape[1] assert X_logit.shape[1] == X_nyt.shape[1] @@ -1267,24 +1265,32 @@ def test_ols_covariate_parity_with_full_basis_dummy_ols(self, cov_data): cell_cov = np.column_stack([X_int[:, i] * x1_demeaned for i in range(n_int)]) # D_g × X (cohort × covariate) - cohort_cov = np.column_stack([ - (sample["cohort"].values == g).astype(float) * x1_raw for g in groups - ]) + cohort_cov = np.column_stack( + [(sample["cohort"].values == g).astype(float) * x1_raw for g in groups] + ) # f_t × X (time × covariate, drop first) times = sorted(sample["time"].unique()) - time_cov = np.column_stack([ - (sample["time"].values == t).astype(float) * x1_raw for t in times[1:] - ]) + time_cov = np.column_stack( + [(sample["time"].values == t).astype(float) * x1_raw for t in times[1:]] + ) # Full design: intercept + cells + cell×cov + D_g×X + f_t×X + raw_X + unit + time dummies unit_dummies = pd.get_dummies(sample["unit"], drop_first=True).values.astype(float) time_dummies = pd.get_dummies(sample["time"], drop_first=True).values.astype(float) intercept = np.ones((len(sample), 1)) - X_full = np.hstack([ - intercept, X_int, cell_cov, cohort_cov, time_cov, - x1_raw.reshape(-1, 1), unit_dummies, time_dummies, - ]) + X_full = np.hstack( + [ + intercept, + X_int, + cell_cov, + cohort_cov, + time_cov, + x1_raw.reshape(-1, 1), + unit_dummies, + time_dummies, + ] + ) y = sample["y"].values coefs_dummy, _, _ = solve_ols(X_full, y, rank_deficient_action="silent") @@ -1304,12 +1310,10 @@ def test_covariates_affect_ols_att(self, cov_data): r_cov = WooldridgeDiD().fit( df, outcome="y", unit="unit", time="time", cohort="cohort", exovar=["x1"] ) - r_nocov = WooldridgeDiD().fit( - df, outcome="y", unit="unit", time="time", cohort="cohort" - ) - assert r_cov.overall_att != r_nocov.overall_att, ( - "Covariate-adjusted ATT should differ from unadjusted" - ) + r_nocov = WooldridgeDiD().fit(df, outcome="y", unit="unit", time="time", cohort="cohort") + assert ( + r_cov.overall_att != r_nocov.overall_att + ), "Covariate-adjusted ATT should differ from unadjusted" class TestWooldridgeSurvey: @@ -1333,20 +1337,33 @@ def survey_panel(self): y_bin = int(rng.random() < 1 / (1 + np.exp(-eta))) mu = np.exp(0.5 + 0.5 * treated + 0.1 * rng.standard_normal()) y_count = float(rng.poisson(mu)) - rows.append({ - "unit": u, "time": t, "cohort": cohort, - "y": y_cont, "y_bin": y_bin, "y_count": y_count, - "stratum": stratum, "psu": psu, "weight": weight, - }) + rows.append( + { + "unit": u, + "time": t, + "cohort": cohort, + "y": y_cont, + "y_bin": y_bin, + "y_count": y_count, + "stratum": stratum, + "psu": psu, + "weight": weight, + } + ) return pd.DataFrame(rows) def test_ols_survey_runs(self, survey_panel): """OLS with full survey design completes.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") r = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert np.isfinite(r.overall_att) assert np.isfinite(r.overall_se) @@ -1355,13 +1372,21 @@ def test_ols_survey_runs(self, survey_panel): def test_ols_survey_se_differs_from_naive(self, survey_panel): """Survey SE should differ from naive (unweighted) SE.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") r_survey = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) r_naive = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", + survey_panel, + outcome="y", + unit="unit", + time="time", cohort="cohort", ) assert r_survey.overall_se != r_naive.overall_se @@ -1369,10 +1394,15 @@ def test_ols_survey_se_differs_from_naive(self, survey_panel): def test_logit_survey_runs(self, survey_panel): """Logit with survey design completes.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") r = WooldridgeDiD(method="logit").fit( - survey_panel, outcome="y_bin", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y_bin", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert np.isfinite(r.overall_att) assert np.isfinite(r.overall_se) @@ -1381,10 +1411,15 @@ def test_logit_survey_runs(self, survey_panel): def test_poisson_survey_runs(self, survey_panel): """Poisson with survey design completes.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") r = WooldridgeDiD(method="poisson").fit( - survey_panel, outcome="y_count", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y_count", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert np.isfinite(r.overall_att) assert np.isfinite(r.overall_se) @@ -1393,20 +1428,32 @@ def test_poisson_survey_runs(self, survey_panel): def test_bootstrap_survey_rejected(self, survey_panel): """n_bootstrap > 0 with survey_design raises ValueError.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight") - with pytest.raises(ValueError, match="Bootstrap inference is not supported with survey_design"): + with pytest.raises( + ValueError, match="Bootstrap inference is not supported with survey_design" + ): WooldridgeDiD(n_bootstrap=100).fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) def test_weights_only_survey(self, survey_panel): """Weights-only survey (no strata/PSU) works.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight") r = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert np.isfinite(r.overall_att) assert np.isfinite(r.overall_se) @@ -1415,10 +1462,15 @@ def test_weights_only_survey(self, survey_panel): def test_survey_metadata_present(self, survey_panel): """survey_metadata is populated with correct fields.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") r = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) sm = r.survey_metadata assert sm is not None @@ -1432,6 +1484,7 @@ def test_survey_metadata_present(self, survey_panel): def test_replicate_weights_rejected(self, survey_panel): """Replicate-weight designs raise NotImplementedError.""" from diff_diff.survey import SurveyDesign + # Add replicate weight columns survey_panel["rep_w1"] = 1.0 survey_panel["rep_w2"] = 1.0 @@ -1442,17 +1495,26 @@ def test_replicate_weights_rejected(self, survey_panel): ) with pytest.raises(NotImplementedError, match="replicate-weight variance"): WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) def test_weights_only_plus_cluster(self, survey_panel): """Weights-only survey + cluster= injects cluster as PSU.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight") r = WooldridgeDiD(cluster="stratum").fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) # Cluster should have been injected as PSU n_strata = survey_panel["stratum"].nunique() @@ -1461,38 +1523,55 @@ def test_weights_only_plus_cluster(self, survey_panel): # SE should differ from same run without cluster r_no_cluster = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert r.overall_se != r_no_cluster.overall_se def test_survey_gt_weights_are_counts(self, survey_panel): """Survey aggregation uses cell counts, not survey-weight sums.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight", strata="stratum", psu="unit") r = WooldridgeDiD(method="logit").fit( - survey_panel, outcome="y_bin", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y_bin", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) for k, w in r._gt_weights.items(): assert isinstance(w, int), ( - f"gt_weights[{k}] = {w} (type {type(w).__name__}); " - f"expected int (cell count)" + f"gt_weights[{k}] = {w} (type {type(w).__name__}); " f"expected int (cell count)" ) def test_weights_only_no_cluster_implicit_psu(self, survey_panel): """Weights-only survey without cluster= keeps implicit per-obs PSUs.""" from diff_diff.survey import SurveyDesign from diff_diff.wooldridge import _filter_sample + sd = SurveyDesign(weights="weight") r = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) # n_psu should equal n_obs in the filtered sample (not n_units) sample = _filter_sample( survey_panel.copy().assign(cohort=lambda d: d["cohort"].fillna(0)), - "unit", "time", "cohort", "not_yet_treated", 0, + "unit", + "time", + "cohort", + "not_yet_treated", + 0, ) assert r.survey_metadata is not None assert r.survey_metadata.n_psu == len(sample) @@ -1500,6 +1579,7 @@ def test_weights_only_no_cluster_implicit_psu(self, survey_panel): def test_fweight_rejected(self, survey_panel): """fweight raises ValueError (pweight only).""" from diff_diff.survey import SurveyDesign + # Use integer weights so fweight validation passes in resolve(), # and the pweight guard in _resolve_survey_for_wooldridge fires. df = survey_panel.copy() @@ -1507,20 +1587,29 @@ def test_fweight_rejected(self, survey_panel): sd = SurveyDesign(weights="int_weight", weight_type="fweight") with pytest.raises(ValueError, match="weight_type='pweight'"): WooldridgeDiD().fit( - df, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) def test_poisson_zero_weight_cell(self, survey_panel): """Poisson survey fit handles zero-weight treated cells cleanly.""" from diff_diff.survey import SurveyDesign + df = survey_panel.copy() # Zero out weights for one treated cohort so some cells have zero weight df.loc[df["cohort"] == 3, "weight"] = 0.0 sd = SurveyDesign(weights="weight", strata="stratum", psu="unit") r = WooldridgeDiD(method="poisson").fit( - df, outcome="y_count", unit="unit", time="time", - cohort="cohort", survey_design=sd, + df, + outcome="y_count", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert np.isfinite(r.overall_att) assert np.isfinite(r.overall_se) @@ -1528,12 +1617,17 @@ def test_poisson_zero_weight_cell(self, survey_panel): def test_ols_survey_rank_deficient(self, survey_panel): """Survey OLS handles rank-deficient all-eventually-treated designs.""" from diff_diff.survey import SurveyDesign + # Remove never-treated (cohort=0) to create rank-deficient design df = survey_panel[survey_panel["cohort"] > 0].copy() sd = SurveyDesign(weights="weight", strata="stratum", psu="unit") r = WooldridgeDiD(control_group="not_yet_treated").fit( - df, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert np.isfinite(r.overall_att) assert np.isfinite(r.overall_se) @@ -1541,25 +1635,35 @@ def test_ols_survey_rank_deficient(self, survey_panel): def test_ols_survey_zero_weight_unit_rejected(self, survey_panel): """Zero-weight unit raises ValueError before within_transform.""" from diff_diff.survey import SurveyDesign + df = survey_panel.copy() # Zero out all weights for unit 0 df.loc[df["unit"] == 0, "weight"] = 0.0 sd = SurveyDesign(weights="weight", strata="stratum", psu="unit") with pytest.raises(ValueError, match="Survey weights sum to zero for unit"): WooldridgeDiD().fit( - df, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) def test_logit_survey_zero_weight_cell(self, survey_panel): """Logit survey fit skips zero-weight treated cells cleanly.""" from diff_diff.survey import SurveyDesign + df = survey_panel.copy() df.loc[df["cohort"] == 3, "weight"] = 0.0 sd = SurveyDesign(weights="weight", strata="stratum", psu="unit") r = WooldridgeDiD(method="logit").fit( - df, outcome="y_bin", unit="unit", time="time", - cohort="cohort", survey_design=sd, + df, + outcome="y_bin", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) assert np.isfinite(r.overall_att) assert np.isfinite(r.overall_se) @@ -1567,23 +1671,33 @@ def test_logit_survey_zero_weight_cell(self, survey_panel): def test_ols_survey_non_range_index(self, survey_panel): """OLS survey zero-weight guard works with non-RangeIndex DataFrames.""" from diff_diff.survey import SurveyDesign + df = survey_panel.copy() df.index = df.index + 1000 # shift to non-zero-based index df.loc[df["unit"] == 0, "weight"] = 0.0 sd = SurveyDesign(weights="weight", strata="stratum", psu="unit") with pytest.raises(ValueError, match="Survey weights sum to zero for unit"): WooldridgeDiD().fit( - df, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) def test_survey_aggregate_and_summary(self, survey_panel): """Survey aggregate() uses df_survey and summary() shows survey block.""" from diff_diff.survey import SurveyDesign + sd = SurveyDesign(weights="weight", strata="stratum", psu="unit") r = WooldridgeDiD().fit( - survey_panel, outcome="y", unit="unit", time="time", - cohort="cohort", survey_design=sd, + survey_panel, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + survey_design=sd, ) # aggregate() should use t-distribution with survey df r.aggregate("group") @@ -1607,10 +1721,14 @@ def _make_panel_with_nan_cohort(): for unit in range(10): cohort_val = np.nan if unit < 2 else 0.0 for t in range(1, 5): - rows.append({ - "unit": unit, "time": t, "cohort": cohort_val, - "y": unit + t + np.random.default_rng(unit).normal(0, 0.1), - }) + rows.append( + { + "unit": unit, + "time": t, + "cohort": cohort_val, + "y": unit + t + np.random.default_rng(unit).normal(0, 0.1), + } + ) return pd.DataFrame(rows) def test_fit_warns_on_nan_cohort_with_count(self): @@ -1624,6 +1742,7 @@ def test_fit_warns_on_nan_cohort_with_count(self): def test_fit_silent_on_clean_cohort(self): import warnings + df = self._make_panel_with_nan_cohort() df["cohort"] = df["cohort"].fillna(0) est = WooldridgeDiD(method="ols") @@ -1640,8 +1759,12 @@ def test_select_sample_helper_warns(self): df = self._make_panel_with_nan_cohort() with pytest.warns(UserWarning, match=r"8 row\(s\) have NaN cohort values"): _filter_sample( - df, unit="unit", time="time", cohort="cohort", - control_group="never_treated", anticipation=0, + df, + unit="unit", + time="time", + cohort="cohort", + control_group="never_treated", + anticipation=0, ) @@ -1669,9 +1792,7 @@ def _make_vcov_panel(n_units=40, n_periods=6, seed=202605211230): 0.0, ) y = 0.7 + 0.1 * periods + 0.05 * units + tau + 0.15 * rng.normal(size=len(units)) - return pd.DataFrame( - {"unit": units, "time": periods, "cohort": cohort_per_obs, "y": y} - ) + return pd.DataFrame({"unit": units, "time": periods, "cohort": cohort_per_obs, "y": y}) class TestWooldridgeVcovType: @@ -1683,19 +1804,23 @@ def test_default_vcov_type_is_hc1(self): assert est._vcov_type_explicit is False def test_hc1_se_bit_equal_to_pre_pr_baseline(self): - """HC1 within-transform path must match pre-PR baseline at atol=1e-14. - - Baseline captured on the Phase 1b PR 3/8 branch with - ``_make_vcov_panel(seed=202605211230)``. FWL preserves the CR1 - cluster-robust score, so the new ``vcov_type`` branching keeps HC1 - bit-equal to the prior hard-coded HC1 behavior. + """HC1 within-transform path must match the frozen baseline at atol=1e-14. + + Baseline originally captured on the Phase 1b PR 3/8 branch with + ``_make_vcov_panel(seed=202605211230)`` (FWL preserves the CR1 + cluster-robust score, so the ``vcov_type`` branching kept HC1 bit-equal + to the prior hard-coded HC1 behavior). Recaptured under the v3.6.x + factorize-once + bincount demeaner, which moved the values by ~1e-15 + (bincount accumulation vs pandas' compensated grouped mean - see + REGISTRY "Absorbed Fixed Effects"); the 1e-14 lock semantics are + unchanged. """ df = _make_vcov_panel() res = WooldridgeDiD(method="ols", vcov_type="hc1").fit( df, outcome="y", unit="unit", time="time", cohort="cohort" ) - assert res.overall_att == pytest.approx(0.9178849934516247, abs=1e-14) - assert res.overall_se == pytest.approx(0.03149488781317814, abs=1e-14) + assert res.overall_att == pytest.approx(0.9178849934516233, abs=1e-14) + assert res.overall_se == pytest.approx(0.03149488781317813, abs=1e-14) def test_hc2_bm_finite_and_inflates_over_hc1(self): df = _make_vcov_panel() @@ -1790,7 +1915,11 @@ def test_survey_design_plus_hc2_bm_rejected(self): est = WooldridgeDiD(method="ols", vcov_type="hc2_bm") with pytest.raises(NotImplementedError, match=r"survey_design"): est.fit( - df, outcome="y", unit="unit", time="time", cohort="cohort", + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", survey_design=design, ) @@ -1803,7 +1932,11 @@ def test_survey_design_plus_classical_rejected(self): est = WooldridgeDiD(method="ols", vcov_type="classical") with pytest.raises(NotImplementedError, match=r"survey_design"): est.fit( - df, outcome="y", unit="unit", time="time", cohort="cohort", + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", survey_design=design, ) @@ -1841,13 +1974,11 @@ def test_hc2_bm_plus_bootstrap_finite_inference(self): df, outcome="y", unit="unit", time="time", cohort="cohort" ) # Bootstrap fit on the same data + seed. - res_boot = WooldridgeDiD( - method="ols", vcov_type="hc2_bm", n_bootstrap=50, seed=0 - ).fit(df, outcome="y", unit="unit", time="time", cohort="cohort") - # ATT is unchanged by the bootstrap (only SE is overridden) - assert res_boot.overall_att == pytest.approx( - res_analytical.overall_att, abs=1e-10 + res_boot = WooldridgeDiD(method="ols", vcov_type="hc2_bm", n_bootstrap=50, seed=0).fit( + df, outcome="y", unit="unit", time="time", cohort="cohort" ) + # ATT is unchanged by the bootstrap (only SE is overridden) + assert res_boot.overall_att == pytest.approx(res_analytical.overall_att, abs=1e-10) # SE finite + sensible (positive, smaller than the panel SD of y) assert np.isfinite(res_boot.overall_se) assert res_boot.overall_se > 0 @@ -1884,18 +2015,14 @@ def test_hc2_bm_plus_bootstrap_rank_deficient(self): periods = np.tile(np.arange(1, n_periods + 1), n_units) cohorts = rng.choice([3, 5, 7], size=n_units) cohort_per_obs = cohorts[units] - tau = np.where( - periods >= cohort_per_obs, 0.5 + 0.2 * (periods - cohort_per_obs), 0.0 - ) + tau = np.where(periods >= cohort_per_obs, 0.5 + 0.2 * (periods - cohort_per_obs), 0.0) y = 1.0 + 0.1 * periods + tau + 0.1 * rng.normal(size=len(units)) - df = pd.DataFrame( - {"unit": units, "time": periods, "cohort": cohort_per_obs, "y": y} - ) + df = pd.DataFrame({"unit": units, "time": periods, "cohort": cohort_per_obs, "y": y}) with warnings.catch_warnings(): warnings.simplefilter("ignore", UserWarning) - res = WooldridgeDiD( - method="ols", vcov_type="hc2_bm", n_bootstrap=50, seed=0 - ).fit(df, outcome="y", unit="unit", time="time", cohort="cohort") + res = WooldridgeDiD(method="ols", vcov_type="hc2_bm", n_bootstrap=50, seed=0).fit( + df, outcome="y", unit="unit", time="time", cohort="cohort" + ) assert np.isfinite(res.overall_att) assert np.isfinite(res.overall_se) assert res.overall_se > 0 @@ -1963,7 +2090,11 @@ def test_survey_design_clears_cluster_metadata(self): # have surfaced cluster_name='unit', n_clusters=N — but survey TSL # replaces that vcov, so the dataclass must report None. res = WooldridgeDiD(method="ols", vcov_type="hc1").fit( - df, outcome="y", unit="unit", time="time", cohort="cohort", + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", survey_design=design, ) assert res.survey_metadata is not None @@ -2077,7 +2208,9 @@ def test_aggregate_event_under_hc2_bm_uses_bm_contrast_dof(self): for k, eff in res.event_study_effects.items(): assert np.isfinite(eff["att"]) assert np.isfinite(eff["se"]) - assert np.isfinite(eff["t_stat"]), f"event k={k} t_stat NaN — BM DOF threading regressed" + assert np.isfinite( + eff["t_stat"] + ), f"event k={k} t_stat NaN — BM DOF threading regressed" assert np.isfinite(eff["p_value"]) assert np.isfinite(eff["conf_int"][0]) assert np.isfinite(eff["conf_int"][1]) @@ -2094,7 +2227,9 @@ def test_aggregate_calendar_under_hc2_bm_uses_bm_contrast_dof(self): for t, eff in res.calendar_effects.items(): assert np.isfinite(eff["att"]) assert np.isfinite(eff["se"]) - assert np.isfinite(eff["t_stat"]), f"calendar t={t} t_stat NaN — BM DOF threading regressed" + assert np.isfinite( + eff["t_stat"] + ), f"calendar t={t} t_stat NaN — BM DOF threading regressed" assert np.isfinite(eff["p_value"]) assert np.isfinite(eff["conf_int"][0]) assert np.isfinite(eff["conf_int"][1]) @@ -2113,13 +2248,9 @@ def test_hc2_bm_handles_rank_deficient_all_eventually_treated(self): periods = np.tile(np.arange(1, n_periods + 1), n_units) cohorts = rng.choice([3, 5, 7], size=n_units) cohort_per_obs = cohorts[units] - tau = np.where( - periods >= cohort_per_obs, 0.5 + 0.2 * (periods - cohort_per_obs), 0.0 - ) + tau = np.where(periods >= cohort_per_obs, 0.5 + 0.2 * (periods - cohort_per_obs), 0.0) y = 1.0 + 0.1 * periods + tau + 0.1 * rng.normal(size=len(units)) - df = pd.DataFrame( - {"unit": units, "time": periods, "cohort": cohort_per_obs, "y": y} - ) + df = pd.DataFrame({"unit": units, "time": periods, "cohort": cohort_per_obs, "y": y}) # Expect a rank-deficient warning from solve_ols (late-cohort drop). with warnings.catch_warnings(): warnings.simplefilter("ignore", UserWarning) @@ -2133,7 +2264,9 @@ def test_hc2_bm_handles_rank_deficient_all_eventually_treated(self): for k, eff in res.group_time_effects.items(): assert np.isfinite(eff["att"]), f"({k}) att NaN" assert np.isfinite(eff["se"]), f"({k}) se NaN" - assert np.isfinite(eff["t_stat"]), f"({k}) t_stat NaN — BM DOF not threaded on reduced design" + assert np.isfinite( + eff["t_stat"] + ), f"({k}) t_stat NaN — BM DOF not threaded on reduced design" assert np.isfinite(eff["p_value"]), f"({k}) p_value NaN" assert np.isfinite(eff["conf_int"][0]) assert np.isfinite(eff["conf_int"][1]) @@ -2146,18 +2279,24 @@ def test_hc2_bm_handles_rank_deficient_all_eventually_treated(self): res.aggregate(agg_type) assert res.event_study_effects is not None for k, eff in res.event_study_effects.items(): - assert np.isfinite(eff["t_stat"]), f"event k={k} t_stat NaN — aggregate BM DOF on reduced design regressed" + assert np.isfinite( + eff["t_stat"] + ), f"event k={k} t_stat NaN — aggregate BM DOF on reduced design regressed" assert np.isfinite(eff["p_value"]) assert res.group_effects is not None for g, eff in res.group_effects.items(): - assert np.isfinite(eff["t_stat"]), f"group g={g} t_stat NaN — aggregate BM DOF on reduced design regressed" + assert np.isfinite( + eff["t_stat"] + ), f"group g={g} t_stat NaN — aggregate BM DOF on reduced design regressed" assert np.isfinite(eff["p_value"]) assert res.calendar_effects is not None # Calendar entries with at least one identified treated cell should # have finite BM inference; entirely-pre-treatment calendar periods # are absent from calendar_effects (their cells aren't post-treatment). for t, eff in res.calendar_effects.items(): - assert np.isfinite(eff["t_stat"]), f"calendar t={t} t_stat NaN — aggregate BM DOF on reduced design regressed" + assert np.isfinite( + eff["t_stat"] + ), f"calendar t={t} t_stat NaN — aggregate BM DOF on reduced design regressed" assert np.isfinite(eff["p_value"]) def test_hc2_bm_handles_rank_deficient_with_unit_invariant_exovar(self): @@ -2182,7 +2321,9 @@ def test_hc2_bm_handles_rank_deficient_with_unit_invariant_exovar(self): for k, eff in res.group_time_effects.items(): assert np.isfinite(eff["att"]), f"({k}) att NaN" assert np.isfinite(eff["se"]), f"({k}) se NaN" - assert np.isfinite(eff["t_stat"]), f"({k}) t_stat NaN under rank-deficient exovar — BM DOF not threaded" + assert np.isfinite( + eff["t_stat"] + ), f"({k}) t_stat NaN under rank-deficient exovar — BM DOF not threaded" assert np.isfinite(eff["p_value"]) assert np.isfinite(res.overall_t_stat) assert np.isfinite(res.overall_p_value) @@ -2193,7 +2334,9 @@ def test_hc2_bm_handles_rank_deficient_with_unit_invariant_exovar(self): for g, eff in (res.group_effects or {}).items(): assert np.isfinite(eff["t_stat"]), f"group g={g} t_stat NaN under rank-deficient exovar" for t, eff in (res.calendar_effects or {}).items(): - assert np.isfinite(eff["t_stat"]), f"calendar t={t} t_stat NaN under rank-deficient exovar" + assert np.isfinite( + eff["t_stat"] + ), f"calendar t={t} t_stat NaN under rank-deficient exovar" for k, eff in (res.event_study_effects or {}).items(): assert np.isfinite(eff["t_stat"]), f"event k={k} t_stat NaN under rank-deficient exovar" @@ -2351,3 +2494,36 @@ def test_filter_sample_preserves_outcome_support(self): for cg in ("not_yet_treated", "never_treated"): sample = _filter_sample(df, "unit", "time", "cohort", cg, 0) assert sorted(sample["y"].unique()) == sorted(df["y"].unique()) + + +class TestAbsorbedCovariateSnap: + """Unit-constant exovar columns (and their cohort interactions, which are + also unit-constant) are spanned by the unit FE on the within-transform + path: they must snap to deterministic NaN with a cause warning + (REGISTRY 'Absorbed FE').""" + + def test_unit_constant_exovar_snaps_with_cause_warning(self): + rng = np.random.default_rng(5) + n_units, n_periods = 60, 6 + units = np.repeat(np.arange(n_units), n_periods) + t = np.tile(np.arange(n_periods), n_units) + cohort = np.repeat( + np.where(np.arange(n_units) % 3 == 0, 0, np.where(np.arange(n_units) % 3 == 1, 3, 4)), + n_periods, + ) + d = (cohort > 0) & (t >= cohort) + y = 0.5 * d + rng.normal(size=len(units)) + df = pd.DataFrame({"unit": units, "time": t, "cohort": cohort, "y": y}) + df["xc"] = np.repeat(rng.normal(size=n_units), n_periods) + + with pytest.warns(UserWarning, match="collinear with the absorbed"): + res = WooldridgeDiD(method="ols", vcov_type="hc1").fit( + df, + outcome="y", + unit="unit", + time="time", + cohort="cohort", + exovar=["xc"], + ) + assert np.isfinite(res.overall_att) + assert np.isfinite(res.overall_se) From a03319a048e69d4da0739044e72fe4c3fbdffa58 Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 2 Jul 2026 19:46:33 -0400 Subject: [PATCH 2/2] docs: mention snap_absorbed_regressors in the utils doc-deps note (review P3) Co-Authored-By: Claude Fable 5 --- docs/doc-deps.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index e90ea6b70..a773db435 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -904,7 +904,7 @@ sources: - path: docs/methodology/REGISTRY.md section: "Absorbed Fixed Effects with Survey Weights" type: methodology - note: "demean_by_groups / within_transform: MAP convergence contract (max_iter/tol), bincount accumulation numerics, NaN-group-key guard" + note: "demean_by_groups / within_transform: MAP convergence contract (max_iter/tol), bincount accumulation numerics, NaN-group-key guard, snap_absorbed_regressors FE-spanned snap + exact LSMR span confirmation" - path: docs/practitioner_getting_started.rst section: "Step 4: Check Whether the Result Is Trustworthy" type: user_guide