Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,47 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Changed
- **FE-absorption demeaning rewritten: factorize-once + `np.bincount` method of alternating
projections** (`demean_by_groups` / `within_transform`). Each absorbed dimension is factorized
once and group means are formed via `np.bincount` instead of re-hashing the group keys with
`pandas.groupby().transform("mean")` on every iteration x variable x dimension. Affects every
fit of `TwoWayFixedEffects`, `SunAbraham`, `BaconDecomposition`, `WooldridgeDiD`,
`DifferenceInDifferences`/`MultiPeriodDiD` with `absorb=`, and the survey replicate-refit path.
Convergence semantics (per-variable sweep order, `max|x - x_old| < tol` criterion, zero-total-
weight group guard, non-convergence warning) are unchanged. **Numerical contract**: bincount
accumulation is not Kahan-compensated the way pandas' grouped mean is, so demeaned values agree
with the prior implementation to ~1e-10 order rather than bit-for-bit; estimator estimates are
validated unchanged at the benchmark identity gate
(`bench_fe_absorption.py --check-estimates`, coefficients atol=1e-9).
- **MAP iteration cap raised from 100 to 10,000** (`max_iter` in both `demean_by_groups` and
`within_transform`), matching the R `fixest` (`fixef.iter`) and `pyfixest` (`fixef_maxiter`)
defaults. Correlated FE incidence (e.g. stores active in contiguous week windows) genuinely
needs hundreds of iterations; the old cap made such fits warn and return slightly-off residuals.
Worst-case trade-off accepted deliberately: a truly non-convergent input now iterates 10,000x
before warning. Documented in `docs/methodology/REGISTRY.md` "Absorbed Fixed Effects".
- **NaN in an absorbed group column now raises `ValueError`** naming the column (previously the
unweighted path silently NaN-poisoned the affected rows and the weighted path silently passed
them through un-demeaned).
- **FE-spanned regressors are now snapped to exact zero before the solver** (new
`diff_diff.utils.snap_absorbed_regressors`, wired at every demeaning consumer: DiD/MPD
`absorb=`, TWFE, SunAbraham, BaconDecomposition, WooldridgeDiD, and the DiD/MPD/TWFE
replicate-refit closures). A regressor spanned by the absorbed FEs (treated-group indicator with
the unit dimension absorbed, a period dummy with time absorbed, a unit-constant covariate, or an
additive `a_unit + b_time` combination) demeans to numerical junk that previously survived the
rank check via column equilibration and perturbed the identified coefficients - up to ~3e-3 on
ATT with a ~1e14-scale garbage coefficient reported for the spanned column in the joint-span
case. Detection is two-stage (fast relative-norm path at 1e-10, then an exact sparse-LSMR
span-membership confirmation for candidates masked by MAP truncation on unbalanced/correlated
panels); norms are `sqrt(w)`-weighted under WLS so zero-weight domain rows cannot mask spanning.
Spanned regressors are dropped deterministically (coefficient NaN) with a cause-specific
`UserWarning` naming them under `rank_deficient_action="warn"`.
**Point-estimate note**: designs carrying such spanned regressors (e.g. 2x2 DiD with
`absorb=[unit_dim, time_dim]`) see identified estimates shift by up to ~1e-5 relative to
v3.6.1 - the removal of the junk direction, documented in REGISTRY "Absorbed Fixed Effects";
estimates are now invariant (~1e-14) to the demeaning tolerance where they previously swung
at ~1e-5.

### Added
- **FE-absorption benchmark suite** (`benchmarks/speed_review/bench_fe_absorption.py`, scenarios
7-13 in `docs/performance-scenarios.md`): seven realistic workloads (county policy event study,
Expand Down
151 changes: 151 additions & 0 deletions benchmarks/speed_review/baselines/fe_absorption_after.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
{
"suite": "fe_absorption",
"platform": "macOS-26.5.1-arm64-arm-64bit-Mach-O",
"python": "3.14.4",
"versions": {
"diff_diff": "3.6.1",
"numpy": "2.5.0",
"pandas": "3.0.3"
},
"backend_requested": "python",
"repeats": 3,
"quick": false,
"tolerances": {
"att_atol": 1e-09,
"se_rtol": 1e-07,
"se_rtol_survey": 1e-06,
"gate_exempt": [
"tail_stress"
]
},
"results": [
{
"scenario": "county_policy",
"n_obs": 177289,
"checksum": 6315.071127371084,
"att": 0.2803785413990301,
"se": 0.031244864610215972,
"fit_median_s": 2.3882,
"fit_min_s": 2.3505,
"fit_max_s": 2.4948,
"fit_cv": 0.0257,
"noisy": false,
"n_timed_fits": 9,
"datagen_s": 0.004,
"peak_rss_mb": 3177.5,
"backend_resolved": "python",
"warnings": []
},
{
"scenario": "firm_churn",
"n_obs": 2400000,
"checksum": 240997.06223418942,
"att": 0.3144327184497798,
"se": 0.005837281701253927,
"fit_median_s": 49.502,
"fit_min_s": 48.4532,
"fit_max_s": 50.3118,
"fit_cv": 0.0189,
"noisy": false,
"n_timed_fits": 3,
"datagen_s": 0.07,
"peak_rss_mb": 15900.8,
"backend_resolved": "python",
"warnings": []
},
{
"scenario": "scanner_twfe",
"n_obs": 3255000,
"checksum": 76832.22571739106,
"att": 0.251164131495566,
"se": 0.002414938742160613,
"fit_median_s": 0.9834,
"fit_min_s": 0.9825,
"fit_max_s": 0.9845,
"fit_cv": 0.001,
"noisy": false,
"n_timed_fits": 3,
"datagen_s": 0.077,
"peak_rss_mb": 976.9,
"backend_resolved": "python",
"warnings": []
},
{
"scenario": "geo_experiment",
"n_obs": 5000000,
"checksum": 443696.3834000407,
"att": 0.25194372599916653,
"se": 0.0017886314723251327,
"fit_median_s": 0.973,
"fit_min_s": 0.968,
"fit_max_s": 0.9759,
"fit_cv": 0.0041,
"noisy": false,
"n_timed_fits": 3,
"datagen_s": 0.062,
"peak_rss_mb": 1664.8,
"backend_resolved": "python",
"warnings": [
"Rank-deficient design matrix: dropping 2 of 4 columns (column 1, column 2). Coefficients for these columns are",
"Regressor(s) ['treated', 'post'] are collinear with the absorbed fixed effects (absorb=['store', 'week']): the"
]
},
{
"scenario": "survey_absorb",
"n_obs": 500000,
"checksum": -176482.6014001448,
"att": 0.24554568144130481,
"se": 0.007662698712656388,
"fit_median_s": 4.0767,
"fit_min_s": 4.0736,
"fit_max_s": 4.0778,
"fit_cv": 0.0005,
"noisy": false,
"n_timed_fits": 3,
"datagen_s": 0.107,
"peak_rss_mb": 3589.2,
"backend_resolved": "python",
"warnings": [
"Rank-deficient design matrix: dropping 2 of 4 columns (column 1, column 2). Coefficients for these columns are",
"Regressor(s) ['treated', 'post'] are collinear with the absorbed fixed effects (absorb=['state', 'month']): th"
]
},
{
"scenario": "tail_stress",
"n_obs": 5000000,
"checksum": 376703.7248141762,
"att": 0.2502082560083505,
"se": 0.004392830594147583,
"fit_median_s": 31.7648,
"fit_min_s": 31.4957,
"fit_max_s": 31.8372,
"fit_cv": 0.0057,
"noisy": false,
"n_timed_fits": 3,
"datagen_s": 0.067,
"peak_rss_mb": 1840.7,
"backend_resolved": "python",
"warnings": [
"Rank-deficient design matrix: dropping 2 of 4 columns (column 1, column 2). Coefficients for these columns are",
"Regressor(s) ['treated', 'post'] are collinear with the absorbed fixed effects (absorb=['store', 'week']): the"
]
},
{
"scenario": "guard_small",
"n_obs": 20000,
"checksum": -571.6190664591859,
"att": 0.2515087011451766,
"se": 0.028726073333820948,
"fit_median_s": 0.0035,
"fit_min_s": 0.0034,
"fit_max_s": 0.0039,
"fit_cv": 0.0437,
"noisy": false,
"n_timed_fits": 21,
"datagen_s": 0.001,
"peak_rss_mb": 145.1,
"backend_resolved": "python",
"warnings": []
}
]
}
27 changes: 25 additions & 2 deletions benchmarks/speed_review/bench_fe_absorption.py
Original file line number Diff line number Diff line change
Expand Up @@ -189,9 +189,15 @@ def _measure(scenario, repeats, backend, quick):


# --------------------------------------------------------------- identity gate
def check_estimates(results, before_path):
def check_estimates(results, before_path, allow_shift=()):
"""Compare ATT/SE per scenario against a prior baseline JSON. Returns the
number of failures (0 = pass). tail_stress reports its delta, ungated.

allow_shift: scenario ids whose estimates are EXPECTED to move this run
because of a documented estimate-affecting fix (the reason must be in the
PR/CHANGELOG). Their deltas are printed loudly but do not fail the gate -
this is a per-run flag, never a permanent exemption, so future runs
re-gate them against the refreshed baseline.
"""
before = {r["scenario"]: r for r in json.loads(Path(before_path).read_text())["results"]}
failures = 0
Expand All @@ -210,6 +216,12 @@ def check_estimates(results, before_path):
f"|d att|={d_att:.3e} rel d se={rel_se:.3e}"
)
continue
if scen in allow_shift:
print(
f" identity {scen:16s} ALLOWED SHIFT (documented fix): "
f"|d att|={d_att:.3e} rel d se={rel_se:.3e}"
)
continue
ok = d_att <= ATT_ATOL and rel_se <= se_rtol
status = "ok" if ok else "FAIL"
print(
Expand Down Expand Up @@ -239,6 +251,14 @@ def main():
metavar="BEFORE_JSON",
help="compare ATT/SE against a prior baseline at the unified tolerances",
)
ap.add_argument(
"--allow-shift",
metavar="SCEN[,SCEN...]",
default="",
help="comma-separated scenarios whose estimates are expected to move "
"this run due to a documented estimate-affecting fix (printed loudly, "
"not failed; per-run only - never a permanent exemption)",
)
args = ap.parse_args()

if args.worker:
Expand Down Expand Up @@ -301,7 +321,10 @@ def main():
exit_code = 0
if args.check_estimates:
print("\nidentity gate vs", args.check_estimates)
failures = check_estimates([r for r in results if "error" not in r], args.check_estimates)
allow = tuple(s for s in args.allow_shift.split(",") if s)
failures = check_estimates(
[r for r in results if "error" not in r], args.check_estimates, allow
)
if failures:
print(f"IDENTITY GATE FAILED: {failures} scenario(s) moved")
exit_code = 1
Expand Down
15 changes: 15 additions & 0 deletions diff_diff/bacon.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import pandas as pd

from diff_diff.results import _format_survey_block
from diff_diff.utils import pre_demean_norms, snap_absorbed_regressors
from diff_diff.utils import within_transform as _within_transform_util


Expand Down Expand Up @@ -795,6 +796,7 @@ def _compute_twfe(
) -> float:
"""Compute TWFE estimate using within-transformation."""
# Apply two-way within transformation (weighted if survey weights provided)
_pre_norms = pre_demean_norms(df, [treat_col], weights=weights)
df_dm = _within_transform_util(
df,
[outcome, treat_col],
Expand All @@ -803,6 +805,19 @@ def _compute_twfe(
suffix="_within",
weights=weights,
)
# Snap an FE-spanned treatment to exact zero: the d_var == 0 guard
# below then returns its deterministic 0.0 (with the cause warning)
# instead of an arbitrary junk/junk division.
snap_absorbed_regressors(
df_dm,
[treat_col],
_pre_norms,
absorbed_desc=f"unit '{unit}' and time '{time}' fixed effects",
group_vars=[unit, time],
suffix="_within",
display_names={treat_col: "treatment"},
weights=weights,
)

# Extract within-transformed values
y_within = df_dm[f"{outcome}_within"].values
Expand Down
Loading
Loading