From f155783dfc4b57efc41d310bdfce23aea4ec939e Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 19 Jun 2026 21:51:40 -0700
Subject: [PATCH 01/14] docs: plan of record for #305 facet-count coherence

Joint Claude+Codex plan for computing multi-filter facet counts from the
#299 bitmask index (avoiding the 39M-row membership scan). Covers the
masks histogram approach, native DuckDB benchmarks, source handling, the
#306 missing-pid prerequisite, the honesty rule (never baseline under
active filters), a 4-phase rollout, and risks.

Refs #305, #304, #306, #276.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 PLAN_305_facet_counts.md | 128 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 PLAN_305_facet_counts.md
diff --git a/PLAN_305_facet_counts.md b/PLAN_305_facet_counts.md
new file mode 100644
index 0000000..22f4c7a
--- /dev/null
+++ b/PLAN_305_facet_counts.md
@@ -0,0 +1,128 @@
+# Plan — #305 Facet-count correctness & coherence (counts beyond the single-filter cube)
+
+**Status:** plan of record · **Authors:** Claude + Codex (joint, 2026-06-19) · **Tracks:** #305 (meta), #304 (live bug), #306 (data bug), #276 (semantics), #230, #286
+
+## Problem
+
+At the global/zoomed-out view, facet **counts** only cross-filter for a **single** active
+filter (served by the precomputed `facet_tree_cross_filter` cube via `effectiveSingleFilter()`,
+`explorer.qmd:3374`). With **2+ active selections** the cube is skipped and tree-dim counts
+revert to the **unfiltered baseline** (`explorer.qmd:3557-3562`) — deliberately, to avoid a
+~39M-row `sample_facet_membership` GROUP BY that stalls DuckDB-WASM. So the **map is correctly
+filtered** (#299 bitmask) but the **count numbers are wrong** for multi-filter at world view.
+Zoomed-in is already correct (live membership path, `explorer.qmd:3564-3598`). Reported by
+Eric in #304 (e.g. *Material: Anthropogenic* + *Specimen Type: Artifact*).
+
+## Core idea — count per node FROM the bitmasks (no membership scan)
+
+For a count dimension D over a qualifying pid set Q:
+
+```sql
+SELECT nb.concept_uri AS value, COUNT(*) AS count
+FROM <complete_facet_index> s
+JOIN read_parquet('${node_bits_url}') nb ON nb.facet_type = '<D>'
+  AND (s.<D>_mask & (1::BIGINT << nb.bit_index)) <> 0
+WHERE <AND of the OTHER active dims' mask predicates> <source> <viewport> <search>
+GROUP BY nb.concept_uri
+```
+
+- Masks are **one row per pid** → `COUNT(*)` == `COUNT(DISTINCT pid)` (do **not** add `COUNT(DISTINCT)` — wasted work).
+- Membership encodes **ancestor closure** → child bit ⇒ parent bit, so **parent ≥ child** holds for free.
+- The bitwise join is against `facet_node_bits` (≤63 rows/dim), not the 39M-row membership table.
+
+**Cross-filter semantics (confirmed correct):** OR among selected nodes *within* a dimension;
+AND *across* dimensions; **omit the target dimension's own predicate** (exclude-self), matching
+the UI contract (`explorer.qmd:3305`) and the existing cube (`build_frontend_derived.py:460`).
+Special case: **zero selected sources** ⇒ impossible filter for tree targets, but the source
+dimension's own histogram must still exclude source and stay constrained by the three trees.
+
+## Feasibility (Codex native DuckDB benchmarks, real 6M-pid data)
+
+| Query | Time |
+|---|---|
+| One unfiltered 22-node histogram | ~0.17s |
+| All three tree dims via one 56-node join | ~0.69s |
+| Eric's Material+Specimen-Type → context histogram | ~0.02s |
+| Embedded source predicate/group | ~0.014s |
+| `facets_v3` semi-join (rejected approach) | ~0.145s |
+
+Strongly supports feasibility. **Worst case ≈ 6M × 56 = ~336M cheap bit-tests**, not merely a
+6M-row scan → **WASM cold+warm benchmarking remains a hard release gate** (the app has documented
+DuckDB-WASM concurrency/deadlock sensitivity).
+
+## Source handling
+
+Add `source` as a plain **VARCHAR** column to a **new complete per-pid index** (source is
+exclusive, not multi-valued → a mask is wrong; a `facets_v3` semi-join adds a second 6M-row
+read + hash join). New schema:
+
+```
+pid · source · material_mask · context_mask · object_type_mask · build_id · schema_version
+```
+
+## Data-correctness prerequisite (#306)
+
+The current `sample_facet_masks` is built from `membership`, so it **omits ~29,917 located
+samples** with coordinates but no tree membership (`facets_v3` = 6,026,242 vs masks = 5,996,325).
+The new index must start from **`samp_geo`**, LEFT JOIN masks, and emit **zero masks** for
+no-membership pids. Tracked as **#306** (must land in Phase 1).
+
+## Honesty rule (non-negotiable)
+
+**Never fall back to the unfiltered baseline when filters are active** — that is exactly the
+#304 bug. `applyFacetCounts(..., null)` means baseline (`explorer.qmd:1381`). On any
+miss/error under active filters, show **"—" / "unavailable" / stale styling / hidden counts** —
+never authoritative-looking baseline numbers. Apply all dimension maps **atomically after one
+stale-guard check** (`facetCountsReqId`); never partially repaint dimensions.
+
+## Phased plan
+
+### Phase 1 — Artifact + semantic foundation
+- Build the complete per-pid index above. Start from `samp_geo`; include zero-mask pids (#306).
+- Add `source` VARCHAR; extend the build **fingerprint** to cover the pid/source universe
+  (today it fingerprints membership only → would go stale silently once source/coverage change).
+- AI-free validator: pid completeness vs `facets_v3`; source equality; node-bit coverage;
+  mask ≡ membership for pids that have membership; build_id consistency.
+- **Publish under a new immutable filename/tag — do NOT overwrite the cached `202608` artifact.**
+- Ratify the count contract (#276) as **membership / "anywhere in tree"** — already the shipped
+  semantics, so this is confirmation, not a redesign.
+
+### Phase 2 — Fix #304 at global view
+- Generalized **direct-control filter snapshot** (today's `describeCrossFilters()` zeroes tree
+  selections at global view — the root cause). Generalize `effectiveSingleFilter()` to all selections.
+- Route **multi-filter / full-tree-mode / global / no-search** counts through one mask-index
+  histogram query (the `tree_counts ∪ source_counts` shape).
+- Keep the single-filter cube as a **pure optimization**, not a required semantic path.
+- Apply dimension maps atomically after the stale check. On failure → unavailable, **not** baseline.
+- **Benchmark cold + warm DuckDB-WASM** across empty / narrow / broad / all-pid qualifying sets;
+  set a concrete interaction budget before release.
+
+### Phase 3 — Extend (not premature unification)
+- Add **viewport** support (identical padded bbox, `VIEWPORT_PAD_FACTOR = 0.3`).
+- Add **search** via `search_pids`.
+- Characterization-test the new path against the existing membership path; **then** retire the
+  membership count implementation. (Immediate unification would balloon the first fix into
+  viewport + search + flat-mode + stale-update refactoring — defer.)
+
+### Phase 4 — Verification + cleanup
+- Unit-test predicate generation (exclude-self, zero-source states).
+- Data validator gates from Phase 1, run in CI.
+- Playwright: Eric's exact URL, multi-value-in-one-dim, source+tree, impossible combos,
+  global↔viewport transitions, superseded requests.
+- Remove the misleading active-filter baseline fallback everywhere.
+- Codex adversarial review; staging verify on real R2 data; deploy smoke gate.
+
+## Constraints / landmines
+- **63-bit ceiling:** current dims are 19 / 22 / 15 nodes — ample headroom. Keep the build
+  hard-fail; design a future multiword schema but **do not implement now**.
+- **Flat / mixed render mode:** keep the mask path gated OFF (subtree-membership semantics are
+  wrong for a flat dim) — defer to existing paths, as the cube already does.
+- **Immutable data:** new tag/filename; never overwrite a cached artifact.
+
+## Top 3 risks
+1. **WASM latency / connection starvation** — credible SQL, unmeasured in-browser; ~336M bit-tests worst case.
+2. **Stale / incomplete index generations** — #306 missing pids + membership-only build_id; mixed cached generations → convincing-but-false counts.
+3. **Honesty & state-transition failures** — baseline-as-truth fallback + async camera transitions unless results applied atomically.
+
+---
+*Joint plan: Claude (code trace, drafting) + Codex (native benchmarks, source design, #306 discovery, honesty rule, phasing). 2026-06-19.*

From 473ec29eebce5e62e375ee6b99ddb2291908a5b7 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 19 Jun 2026 22:08:09 -0700
Subject: [PATCH 02/14] #306/#305: complete per-pid facet index
 (sample_facet_index)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Build sample_facet_index: one row per LOCATED sample (samp_geo), not only
those with tree membership. sample_facet_masks is built FROM membership and
silently omits ~29,917 located samples with no tree concept (#306); the
multi-filter global-view count path (#304/#305) must count over the whole
located universe, so it scans this complete index instead.

- start from samp_geo, LEFT JOIN membership-derived masks, zero-mask the
  no-membership pids (correct: in no subtree, but still located + still
  counts toward source/total)
- add `source` VARCHAR (exclusive, not multi-valued -> not a mask)
- build_id = "<membership_id>:<coverage_id>": membership half matches
  facet_node_bits (mask-bit interpretation gate); coverage half fingerprints
  the samp_geo (pid, source) universe so #306-class drift can't go stale
  silently (today's membership-only id is blind to it)
- schema_version column for forward-compat

Validator: --index gate asserts schema, one-row-per-pid, pid==facets_v2
(located universe completeness, #306), source==facets_v2, structured single
build_id matching node_bits, masks bit-identical to sample_facet_masks for
shared pids, and zero-mask for every no-membership pid.

Tests: a located no-membership sample (root-only material) — present + zero
masked in the index, absent from masks; coverage-id changes on source flip;
all corruption gates bite. 29/29 pass.

Docs: SERIALIZATIONS.md §4.12 ratifies the #276 count contract (membership /
"anywhere in tree") served by this index.

Refs #305 #306 #304 #276
---
 SERIALIZATIONS.md                    |  37 ++++++
 scripts/build_frontend_derived.py    |  92 +++++++++++++-
 scripts/validate_frontend_derived.py |  86 +++++++++++++
 tests/test_frontend_derived.py       | 174 +++++++++++++++++++++++++++
 4 files changed, 386 insertions(+), 3 deletions(-)

diff --git a/SERIALIZATIONS.md b/SERIALIZATIONS.md
index b5a9be5..bc9e0d1 100644
--- a/SERIALIZATIONS.md
+++ b/SERIALIZATIONS.md
@@ -122,6 +122,7 @@ builder — a fresh build is NOT bit-for-bit identical to them (see
 | `isamples_202601_sample_facets_v2.parquet` | `(pid, source, material, context, object_type, label, description, place_name)` — all VARCHAR scalars; each facet column is a single URI per sample (not an array) | 63 MB | 6.0 M | wide | Search Explorer multi-dim facet filtering | QUERY_SPEC §3.3, §5.1 |
 | `isamples_202601_facet_summaries.parquet` | Baseline `(facet_type, facet_value, scheme, count)` | 2 KB | 56 | wide | Every tutorial (instant initial facet counts) | QUERY_SPEC §3.3 tier 1 |
 | `isamples_202601_facet_cross_filter.parquet` | Pre-computed counts for single-filter cross-facet queries | 6 KB | 526 | wide | Search Explorer cross-filter UI | QUERY_SPEC §3.3 tier 2a |
+| `<tag>_sample_facet_index.parquet` | Complete per-pid facet index `(pid, source, material_mask, context_mask, object_type_mask, build_id, schema_version)` — **one row per located sample**, including samples with no tree membership (zero-masked, #306). Scanned by the multi-filter global-view count path (#304/#305). | ~60 MB | 6.0 M | wide (membership + samp_geo) | Interactive Explorer multi-filter facet counts | §4.12 below |
 
 ### Tier: vocabulary labels
 
@@ -296,6 +297,42 @@ for the alias when you want "latest."
 - **Consumer**: `scripts/enrich_wide_with_oc_thumbnails.py` uses OC narrow to fill thumbnails into 202604 unified wide. Also used directly in PQG benchmark work.
 - **Future**: these become the prototype upstream for per-source sidecars (see §3, bottom row).
 
+### 4.12 `<tag>_sample_facet_index.parquet` (complete per-pid facet index, #305/#306)
+
+- **Role**: The single artifact the **multi-filter** global-view count path scans
+  (#304/#305). Where `sample_facet_masks` is built FROM `membership` and therefore
+  silently omits located samples that carry no tree concept (~29,917 in the 202608
+  generation — **#306**), this index starts from `samp_geo` (the authoritative
+  located set) and is **complete**: one row per located `pid`. Samples with no
+  membership are present and **zero-masked**.
+- **Headline schema** (7 cols): `pid (VARCHAR), source (VARCHAR), material_mask
+  (BIGINT), context_mask (BIGINT), object_type_mask (BIGINT), build_id (VARCHAR),
+  schema_version (INTEGER)`. `source` is a plain VARCHAR (source is exclusive, not
+  multi-valued, so a mask would be wrong). Each `*_mask` bit `(1 << bit_index)` is
+  set iff the pid is a member of that node, under the bit assignment in
+  `facet_node_bits`; membership encodes ancestor closure, so a parent bit is set for
+  the whole subtree.
+- **`build_id` is structured** as `"<membership_id>:<coverage_id>"`:
+  - the **membership half** equals `facet_node_bits.build_id` — the explorer must
+    only interpret the mask bits when the two agree (else the bits are read under a
+    foreign assignment);
+  - the **coverage half** is a fingerprint of the `(pid, source)` universe over
+    `samp_geo`, so a changed source value or located-pid set (the #306 drift class)
+    changes `build_id` instead of going stale silently.
+- **Count contract (ratifies #276)**: a facet count is **membership / "anywhere in
+  the tree"** — a sample counts toward a node if that node is anywhere in the
+  sample's asserted-concept subtree closure (OR within a dimension, AND across
+  dimensions, **excluding the target dimension's own predicate**). This is the
+  already-shipped cube/UI semantics; this index does not change it, it serves the
+  same contract to the *multi-filter* case the single-filter cube cannot reach.
+- **Validation**: `validate_frontend_derived.py --index <file>` asserts: schema,
+  one row per pid, `pid` set == `sample_facets_v2.pid` (the located universe),
+  `source` == facets_v2 source, single structured `build_id` whose membership half
+  matches `node_bits`, masks bit-identical to `sample_facet_masks` for shared pids,
+  and that every no-membership pid is zero-masked.
+- **Immutability**: published under a **new** versioned filename — it is a new
+  artifact name and never overwrites a cached `sample_facet_masks` or any prior tag.
+
 ## 5. URL convention
 
 All substrate files live under `https://data.isamples.org/` — a
diff --git a/scripts/build_frontend_derived.py b/scripts/build_frontend_derived.py
index d92cff0..661a78e 100755
--- a/scripts/build_frontend_derived.py
+++ b/scripts/build_frontend_derived.py
@@ -19,6 +19,7 @@
   - {tag}_facet_summaries.parquet    facet_type, facet_value, scheme, count
   - {tag}_facet_cross_filter.parquet filter_source/material/context/object_type, facet_type, facet_value, count
   - {tag}_wide_h3.parquet            wide + h3_res4/6/8  (large; built only on --only wide_h3)
+  - {tag}_sample_facet_index.parquet pid, source, material_mask, context_mask, object_type_mask(BIGINT), build_id, schema_version(INT) — COMPLETE per-pid index over ALL located samples (incl. #306 no-membership pids, zero-masked); the multi-filter global-view count path (#304/#305) scans this
   - {tag}_manifest.json              provenance + per-output rowcount/schema/sha256
 
 MATERIAL SELECTION (issue #265/#271): the broad SKOS root
@@ -64,11 +65,16 @@
 ARTIFACTS = ["sample_facets_v2", "samples_map_lite", "h3_summaries",
              "facet_summaries", "facet_cross_filter", "wide_h3",
              "sample_facet_membership", "facet_tree_summaries",
-             "facet_tree_cross_filter", "facet_node_bits", "sample_facet_masks"]
+             "facet_tree_cross_filter", "facet_node_bits", "sample_facet_masks",
+             "sample_facet_index"]
 # #293: max tree nodes per dim that fit in a signed BIGINT mask (bits 0..62).
 # Live max is 22 (context); guard so a future vocab explosion fails loudly
 # instead of silently overflowing a mask bit.
 MASK_MAX_BITS = 63
+# #305/#306: schema version for the complete per-pid facet index. Bump when the
+# column set / semantics of sample_facet_index change so the explorer can refuse
+# an index it doesn't understand instead of mis-reading it.
+INDEX_SCHEMA_VERSION = 1
 
 # Shared SQL expression for sample_facets_v2.description (#277 part 2).
 # Appends space-joined concept labels (IC labels across all 4 concept dims)
@@ -508,6 +514,35 @@ def membership_build_id(con):
         FROM membership""").fetchone()[0]
 
 
+def coverage_build_id(con):
+    # #305/#306: fingerprint of the COMPLETE per-pid universe the index covers —
+    # the (pid, source) pairs over the LOCATED set (samp_geo), independent of tree
+    # membership. membership_build_id() alone is blind to this: an index built from
+    # the SAME membership but a changed source value or a changed located-pid set
+    # (exactly the #306 class of drift — located pids with no membership) would
+    # carry an identical membership id and go stale SILENTLY. Order-independent XOR
+    # of per-row hashes; samp_geo.pid is unique (hard-checked in build_base_tables),
+    # so no XOR cancellation. COALESCE(source,'') so a NULL→'' source flip is caught.
+    return con.sql("""
+        SELECT CAST(COALESCE(bit_xor(hash(pid || chr(31) || COALESCE(source, ''))), 0)
+                    AS VARCHAR)
+        FROM samp_geo""").fetchone()[0]
+
+
+def index_build_id(con):
+    # #305/#306: the index's generation identity = "<membership_id>:<coverage_id>".
+    # The two halves are deliberately separable so the explorer can BOTH:
+    #   1. gate the mask-bit interpretation by matching the membership half against
+    #      facet_node_bits.build_id (the bit assignment is a pure function of
+    #      membership — the index masks are only meaningful under the SAME node_bits
+    #      generation), and
+    #   2. detect coverage/source drift via the coverage half (would otherwise be
+    #      invisible to a membership-only id).
+    # ':' never appears in either half (both are decimal hash strings), so a single
+    # split is unambiguous. node_bits/masks keep their plain membership id.
+    return f"{membership_build_id(con)}:{coverage_build_id(con)}"
+
+
 def build_facet_node_bits(con, out, build_id):
     # #293: authoritative concept_uri -> bit_index assignment per tree dim. The
     # explorer loads this to turn a node selection into a bitmask and filter
@@ -559,6 +594,51 @@ def build_sample_facet_masks(con, out, build_id):
     ) TO '{out}' (FORMAT PARQUET, COMPRESSION ZSTD)""")
 
 
+def build_sample_facet_index(con, out, build_id):
+    # #305/#306: the COMPLETE per-pid facet index — one row for EVERY located
+    # sample (samp_geo), not only those with tree membership. This is the artifact
+    # the multi-filter global-view count path (#304) scans: it must be able to count
+    # over the whole located universe, including samples that carry NO tree concept.
+    #
+    # WHY this exists separately from sample_facet_masks: masks is built FROM
+    # `membership`, so it silently OMITS located samples with no membership row
+    # (~29,917 in the 202608 generation — #306). Counting/ filtering off masks alone
+    # undercounts the located universe. Here we start from samp_geo (the authoritative
+    # located set), LEFT JOIN the same membership-derived masks, and emit a ZERO mask
+    # for no-membership pids (a zero mask matches no node bit → contributes 0 to every
+    # tree facet, which is correct: the sample is in no subtree — but it IS still a
+    # located sample and still counts toward `source` and the located total).
+    #
+    # `source` is a plain VARCHAR (source is exclusive, not multi-valued — a mask
+    # would be wrong, a facets_v3 semi-join would add a second 6M-row read). The
+    # mask columns are bit-identical to sample_facet_masks for pids that have
+    # membership (validated). bit assignment == build_facet_node_bits / masks.
+    con.execute(f"""COPY (
+        WITH nb AS (
+          SELECT facet_type, concept_uri,
+                 (1::BIGINT << (ROW_NUMBER() OVER (PARTITION BY facet_type ORDER BY concept_uri) - 1)) AS bitval
+          FROM (SELECT DISTINCT facet_type, concept_uri FROM membership)
+        ),
+        masks AS (
+          SELECT m.pid,
+            COALESCE(bit_or(CASE WHEN m.facet_type='material'    THEN nb.bitval END), 0)::BIGINT AS material_mask,
+            COALESCE(bit_or(CASE WHEN m.facet_type='context'     THEN nb.bitval END), 0)::BIGINT AS context_mask,
+            COALESCE(bit_or(CASE WHEN m.facet_type='object_type' THEN nb.bitval END), 0)::BIGINT AS object_type_mask
+          FROM membership m JOIN nb ON nb.facet_type=m.facet_type AND nb.concept_uri=m.concept_uri
+          GROUP BY m.pid
+        )
+        SELECT sg.pid,
+               sg.source::VARCHAR                       AS source,
+               COALESCE(mk.material_mask, 0)::BIGINT     AS material_mask,
+               COALESCE(mk.context_mask, 0)::BIGINT      AS context_mask,
+               COALESCE(mk.object_type_mask, 0)::BIGINT  AS object_type_mask,
+               '{build_id}' AS build_id,
+               {INDEX_SCHEMA_VERSION}::INTEGER AS schema_version
+        FROM samp_geo sg LEFT JOIN masks mk ON mk.pid = sg.pid
+        ORDER BY sg.pid
+    ) TO '{out}' (FORMAT PARQUET, COMPRESSION ZSTD)""")
+
+
 def file_meta(con, path):
     n = con.sql(f"SELECT COUNT(*) FROM read_parquet('{path}')").fetchone()[0]
     schema = [(r[0], r[1]) for r in con.sql(f"DESCRIBE SELECT * FROM read_parquet('{path}')").fetchall()]
@@ -623,7 +703,8 @@ def emit(name, fn):
 
     # Hierarchy artifacts (#281/#282) — need vocab_labels for the SKOS tree.
     HIER_ARTIFACTS = {"sample_facet_membership", "facet_tree_summaries",
-                      "facet_tree_cross_filter", "facet_node_bits", "sample_facet_masks"}
+                      "facet_tree_cross_filter", "facet_node_bits", "sample_facet_masks",
+                      "sample_facet_index"}
     if any(want(a) for a in HIER_ARTIFACTS):
         if not args.vocab_labels:
             # Fail loud if the user EXPLICITLY asked for a hierarchy artifact
@@ -641,10 +722,15 @@ def emit(name, fn):
             # #293 bitmask filter artifacts — needs membership (above). node_bits
             # and masks share a node-set build_id so the explorer only uses the mask
             # path when the two are from the same generation (Codex P1).
-            if want("facet_node_bits") or want("sample_facet_masks"):
+            if want("facet_node_bits") or want("sample_facet_masks") or want("sample_facet_index"):
                 _bid = membership_build_id(con)
                 emit("facet_node_bits", lambda o: build_facet_node_bits(con, o, _bid))
                 emit("sample_facet_masks", lambda o: build_sample_facet_masks(con, o, _bid))
+                # #305/#306: complete per-pid index (every located pid + source,
+                # zero-mask for no-membership pids). Its build_id embeds the SAME
+                # membership id as node_bits (mask-bit interpretation gate) PLUS a
+                # coverage id over samp_geo's (pid, source) universe (staleness gate).
+                emit("sample_facet_index", lambda o: build_sample_facet_index(con, o, index_build_id(con)))
 
     if not args.no_manifest:
         log("hashing inputs/outputs for manifest…", t0)
diff --git a/scripts/validate_frontend_derived.py b/scripts/validate_frontend_derived.py
index debd8e4..17d3904 100755
--- a/scripts/validate_frontend_derived.py
+++ b/scripts/validate_frontend_derived.py
@@ -13,6 +13,8 @@
   python scripts/validate_frontend_derived.py \
       --facets URL --map-lite URL --summaries URL --cross-filter URL \
       --h3 URL4 URL6 URL8
+  # hierarchy + #305/#306 complete index (auto-discovered with --dir/--tag, or):
+  python scripts/validate_frontend_derived.py --dir DIR --tag TAG --index INDEX.parquet
 """
 import argparse, hashlib, json, os, sys
 import duckdb
@@ -55,6 +57,7 @@ def main():
     ap.add_argument("--tree-cross-filter", help="facet_tree_cross_filter parquet (#290/#293); optional")
     ap.add_argument("--node-bits", help="facet_node_bits parquet (#293); optional")
     ap.add_argument("--masks", help="sample_facet_masks parquet (#293); optional")
+    ap.add_argument("--index", help="sample_facet_index parquet (#305/#306); optional")
     ap.add_argument("--wide", help="source wide parquet — enables the SEMANTIC gate "
                     "(re-derive and diff the written files against a fresh build)")
     ap.add_argument("--min-rows", type=int, default=1_000_000,
@@ -300,6 +303,7 @@ def _opt(name, attr):
     treexf = _opt("facet_tree_cross_filter", "tree_cross_filter")
     nodebits = _opt("facet_node_bits", "node_bits")
     masks = _opt("sample_facet_masks", "masks")
+    index = _opt("sample_facet_index", "index")
     if tree:
         T = f"read_parquet('{tree}')"
         # parent ≥ child for every edge, every dim (distinct-pid UNION semantics —
@@ -477,6 +481,88 @@ def _opt(name, attr):
                          + (SELECT COUNT(*) FROM (SELECT * FROM b EXCEPT SELECT * FROM a))""")
                 check(f"masks filter == membership ({dim})", d == 0, f"{d} pids differ for a real {dim} node")
 
+    # --- #305/#306 complete per-pid facet index (sample_facet_index) ---
+    # The multi-filter global-view count path scans THIS file. It must cover the
+    # ENTIRE located universe (facets_v2 = samp_geo), carry the correct source per
+    # pid, zero-mask the no-membership pids (#306), and be bit-identical to
+    # sample_facet_masks for pids that DO have membership — otherwise counts drift.
+    if index:
+        IX = f"read_parquet('{index}')"
+        # schema/types contract (the explorer reads these columns positionally-ish)
+        ix_sch = [(r[0], r[1]) for r in con.sql(f"DESCRIBE SELECT * FROM {IX}").fetchall()]
+        EXP_IX = [("pid", "VARCHAR"), ("source", "VARCHAR"),
+                  ("material_mask", "BIGINT"), ("context_mask", "BIGINT"),
+                  ("object_type_mask", "BIGINT"), ("build_id", "VARCHAR"),
+                  ("schema_version", "INTEGER")]
+        check("index schema matches contract", ix_sch == EXP_IX, f"got {ix_sch}")
+        # one row per pid
+        ixdup = scalar(f"SELECT COUNT(*) FROM (SELECT pid FROM {IX} GROUP BY pid HAVING COUNT(*)>1)")
+        check("index: one row per pid", ixdup == 0, f"{ixdup} duplicate pids in index")
+        # COMPLETENESS (#306): index pid SET == facets_v2 pid SET (the located universe).
+        # This is the whole point — masks omits no-membership located pids; the index
+        # must not. A symmetric set-diff catches both missing and extra pids.
+        ix_diff = scalar(f"""SELECT
+            (SELECT COUNT(*) FROM (SELECT pid FROM {F} EXCEPT SELECT pid FROM {IX})) +
+            (SELECT COUNT(*) FROM (SELECT pid FROM {IX} EXCEPT SELECT pid FROM {F}))""")
+        check("index.pid == facets_v2.pid (complete located universe, #306)", ix_diff == 0,
+              f"{ix_diff} pids differ between index and facets_v2")
+        # SOURCE equality: index.source must equal facets_v2.source for every pid
+        # (IS NOT DISTINCT FROM so NULL==NULL). A source-value drift is exactly what
+        # the coverage half of build_id is meant to catch — assert it directly too.
+        src_bad = scalar(f"""SELECT COUNT(*) FROM {IX} i JOIN {F} f ON i.pid=f.pid
+            WHERE i.source IS DISTINCT FROM f.source""")
+        check("index.source == facets_v2.source", src_bad == 0, f"{src_bad} pids with mismatched source")
+        # single build_id, structured "<membership_id>:<coverage_id>"
+        ix_bids = scalar(f"SELECT COUNT(DISTINCT build_id) FROM {IX}")
+        check("index: single build_id", ix_bids == 1, f"{ix_bids} distinct build_ids (want 1)")
+        ix_bid_fmt = scalar(f"SELECT COUNT(*) FROM {IX} WHERE build_id NOT LIKE '%:%'")
+        check("index: build_id is '<membership>:<coverage>'", ix_bid_fmt == 0,
+              "build_id(s) missing the ':' coverage separator")
+        # single schema_version (sanity — the explorer keys behavior off it)
+        ix_sv = scalar(f"SELECT COUNT(DISTINCT schema_version) FROM {IX}")
+        check("index: single schema_version", ix_sv == 1, f"{ix_sv} distinct schema_versions")
+        if nodebits and ix_bids == 1:
+            # the membership half of the index build_id MUST equal node_bits.build_id
+            # so the mask bits are interpreted under the SAME assignment (else counts
+            # are computed against a stale/foreign bit layout — silently wrong).
+            nb_bid = scalar(f"SELECT COUNT(DISTINCT build_id) FROM read_parquet('{nodebits}')")
+            if nb_bid == 1:
+                membership_half_matches = scalar(
+                    f"SELECT (SELECT split_part(MIN(build_id), ':', 1) FROM {IX}) "
+                    f"= (SELECT MIN(build_id) FROM read_parquet('{nodebits}'))")
+                check("index build_id membership-half == node_bits build_id",
+                      bool(membership_half_matches),
+                      "index masks would be read under a different bit assignment than node_bits")
+        if masks:
+            MK = f"read_parquet('{masks}')"
+            # For pids that ALSO appear in masks (i.e. have membership), the three
+            # mask columns must be byte-identical — the index is a superset, not a
+            # re-computation that could drift.
+            mk_diff = scalar(f"""SELECT COUNT(*) FROM {IX} i JOIN {MK} m ON i.pid=m.pid
+                WHERE i.material_mask    IS DISTINCT FROM m.material_mask
+                   OR i.context_mask     IS DISTINCT FROM m.context_mask
+                   OR i.object_type_mask IS DISTINCT FROM m.object_type_mask""")
+            check("index masks == sample_facet_masks (shared pids)", mk_diff == 0,
+                  f"{mk_diff} shared pids with differing masks")
+            # The ONLY pids allowed to differ between the two files are the
+            # no-membership located pids (#306): present in index, absent from masks,
+            # and they MUST be zero-masked. Verify (a) the extra pids are exactly the
+            # zero-mask ones and (b) none are in masks.
+            extra_nonzero = scalar(f"""SELECT COUNT(*) FROM {IX} i
+                WHERE NOT EXISTS (SELECT 1 FROM {MK} m WHERE m.pid=i.pid)
+                  AND (i.material_mask <> 0 OR i.context_mask <> 0 OR i.object_type_mask <> 0)""")
+            check("index: no-membership extra pids are zero-masked (#306)", extra_nonzero == 0,
+                  f"{extra_nonzero} index-only pids carry a non-zero mask (should be 0)")
+        elif mem:
+            # No masks file to diff against, but we can still assert the #306 invariant
+            # directly off membership: every index pid NOT in membership is zero-masked.
+            M = f"read_parquet('{mem}')"
+            extra_nonzero = scalar(f"""SELECT COUNT(*) FROM {IX} i
+                WHERE NOT EXISTS (SELECT 1 FROM {M} m WHERE m.pid=i.pid)
+                  AND (i.material_mask <> 0 OR i.context_mask <> 0 OR i.object_type_mask <> 0)""")
+            check("index: no-membership pids zero-masked (#306, vs membership)", extra_nonzero == 0,
+                  f"{extra_nonzero} no-membership pids carry a non-zero mask")
+
     print(f"\n{'CHECK':<44} {'RESULT':<6} DETAIL\n" + "-" * 90)
     ok = True
     for name, passed, detail in R:
diff --git a/tests/test_frontend_derived.py b/tests/test_frontend_derived.py
index 94e05bc..73b5701 100644
--- a/tests/test_frontend_derived.py
+++ b/tests/test_frontend_derived.py
@@ -519,6 +519,180 @@ def test_facet_masks_only_builds(tmp_path):
     assert (tmp_path / "t_sample_facet_masks.parquet").exists()
 
 
+# ---------------------------------------------------------------------------
+# sample_facet_index — COMPLETE per-pid index (#305/#306)
+# Reuses the tree vocab but adds a LOCATED sample with NO tree membership
+# (root-only material, null context/object) — the exact #306 class that
+# sample_facet_masks silently omits. The index must include it, zero-masked.
+# ---------------------------------------------------------------------------
+# (pid, source, material_rids, context_rids, object_rids)
+INDEX_SAMPLES = [
+    ("s1", "A", [2], [11], [21]),    # mineral, earthinterior, othersolidobject (full membership)
+    ("s2", "A", [3], [11], [21]),    # rock,    earthinterior, othersolidobject (full membership)
+    ("s3", "B", [2], [11], [21]),    # mineral, earthinterior, othersolidobject (full membership)
+    ("s-nomem", "C", [1], None, None),  # material ROOT-only -> NULL material -> NO membership; located
+]
+
+
+def build_index_fixture(wide, vocab):
+    """Tree fixture + a located no-membership sample (the #306 case)."""
+    con = duckdb.connect(); con.execute("INSTALL spatial; LOAD spatial;")
+    ic_rows = " UNION ALL ".join(
+        f"SELECT 'IdentifiedConcept' AS otype, '{uri}' AS pid, {rid}::BIGINT AS row_id, NULL::VARCHAR AS n, "
+        f"'lbl' AS label, NULL::VARCHAR AS description, NULL::VARCHAR[] AS place_name, NULL::TIMESTAMP AS result_time, "
+        f"NULL AS geometry, NULL::BIGINT[] AS p__has_material_category, NULL::BIGINT[] AS p__has_context_category, "
+        f"NULL::BIGINT[] AS p__has_sample_object_type, NULL::BIGINT[] AS p__keywords"
+        for rid, uri in TREE_CONCEPTS)
+    msr = []
+    for i, (pid, src, m, c, o) in enumerate(INDEX_SAMPLES):
+        msr.append(
+            f"SELECT 'MaterialSampleRecord' AS otype, '{pid}' AS pid, NULL::BIGINT AS row_id, '{src}' AS n, "
+            f"'label {pid}' AS label, 'desc {pid}' AS description, ['plc-{pid}']::VARCHAR[] AS place_name, "
+            f"NULL::TIMESTAMP AS result_time, ST_AsWKB(ST_Point({10.0+i},{40.0+i})) AS geometry, "
+            f"{_arr(m)} AS p__has_material_category, {_arr(c)} AS p__has_context_category, "
+            f"{_arr(o)} AS p__has_sample_object_type, NULL::BIGINT[] AS p__keywords")
+    con.execute(f"COPY ({ic_rows} UNION ALL {' UNION ALL '.join(msr)}) TO '{wide}' (FORMAT PARQUET)")
+    edges = " UNION ALL ".join(
+        f"SELECT '{u}' uri, {('NULL' if p is None else repr(p))}::VARCHAR broader, 'data_v1' uri_form"
+        for u, p in TREE_EDGES)
+    con.execute(f"COPY ({edges}) TO '{vocab}' (FORMAT PARQUET)")
+    con.close()
+
+
+def _build_index(tmp_path, wide, vocab, tag="t"):
+    cmd = [sys.executable, BUILD, "--wide", wide, "--outdir", str(tmp_path), "--tag", tag,
+           "--skip", "wide_h3", "--no-manifest", "--vocab-labels", vocab]
+    return subprocess.run(cmd, capture_output=True, text=True)
+
+
+def test_sample_facet_index_complete_and_zero_masked(tmp_path):
+    """#305/#306: the index covers EVERY located pid (incl. the no-membership one,
+    which masks omits), carries source, and zero-masks no-membership pids while
+    staying bit-identical to masks for membership pids."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    r = _build_index(tmp_path, wide, vocab)
+    assert r.returncode == 0, f"index build failed:\n{r.stdout}\n{r.stderr}"
+    con = duckdb.connect()
+    IX = f"read_parquet('{tmp_path / 't_sample_facet_index.parquet'}')"
+    MK = f"read_parquet('{tmp_path / 't_sample_facet_masks.parquet'}')"
+    F = f"read_parquet('{tmp_path / 't_sample_facets_v2.parquet'}')"
+
+    # COMPLETENESS: index has all 4 located pids; masks has only the 3 with membership
+    assert con.sql(f"SELECT COUNT(*) FROM {IX}").fetchone()[0] == 4
+    assert con.sql(f"SELECT COUNT(*) FROM {MK}").fetchone()[0] == 3
+    # index pid set == facets_v2 pid set (the located universe)
+    diff = con.sql(f"""SELECT (SELECT COUNT(*) FROM (SELECT pid FROM {F} EXCEPT SELECT pid FROM {IX}))
+        + (SELECT COUNT(*) FROM (SELECT pid FROM {IX} EXCEPT SELECT pid FROM {F}))""").fetchone()[0]
+    assert diff == 0, "index pid set != facets_v2 located universe"
+    # the no-membership located sample IS present and FULLY zero-masked (#306)
+    row = con.sql(f"SELECT material_mask, context_mask, object_type_mask, source FROM {IX} WHERE pid='s-nomem'").fetchone()
+    assert row is not None, "s-nomem (located, no membership) missing from index — the #306 bug"
+    assert row[:3] == (0, 0, 0), f"s-nomem should be zero-masked, got {row[:3]}"
+    assert row[3] == "C", f"s-nomem source should be 'C', got {row[3]!r}"
+    # masks bit-identical for shared pids
+    mk_diff = con.sql(f"""SELECT COUNT(*) FROM {IX} i JOIN {MK} m ON i.pid=m.pid
+        WHERE i.material_mask IS DISTINCT FROM m.material_mask
+           OR i.context_mask IS DISTINCT FROM m.context_mask
+           OR i.object_type_mask IS DISTINCT FROM m.object_type_mask""").fetchone()[0]
+    assert mk_diff == 0, "index masks drift from sample_facet_masks for shared pids"
+    # build_id is structured "<membership>:<coverage>" and its membership half == node_bits
+    bid = con.sql(f"SELECT DISTINCT build_id FROM {IX}").fetchall()
+    assert len(bid) == 1 and ":" in bid[0][0], f"index build_id not single/structured: {bid}"
+    nb_bid = con.sql(f"SELECT DISTINCT build_id FROM read_parquet('{tmp_path / 't_facet_node_bits.parquet'}')").fetchone()[0]
+    assert bid[0][0].split(":", 1)[0] == nb_bid, "index build_id membership-half != node_bits build_id"
+
+
+def test_sample_facet_index_coverage_id_changes_with_source(tmp_path):
+    """The coverage half of build_id must change when source/coverage changes even
+    though membership is identical (the silent-staleness hole the plan calls out)."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab, tag="a").returncode == 0
+    con = duckdb.connect()
+    bid_a = con.sql(f"SELECT DISTINCT build_id FROM read_parquet('{tmp_path / 'a_sample_facet_index.parquet'}')").fetchone()[0]
+
+    # rebuild with a flipped source on s-nomem (same membership, different coverage)
+    wide2 = str(tmp_path / "wide2.parquet")
+    con.execute(f"""COPY (
+        SELECT * REPLACE (CASE WHEN pid='s-nomem' THEN 'Z' ELSE n END AS n) FROM read_parquet('{wide}')
+    ) TO '{wide2}' (FORMAT PARQUET)""")
+    assert _build_index(tmp_path, wide2, vocab, tag="b").returncode == 0
+    bid_b = con.sql(f"SELECT DISTINCT build_id FROM read_parquet('{tmp_path / 'b_sample_facet_index.parquet'}')").fetchone()[0]
+
+    mem_a, cov_a = bid_a.split(":", 1)
+    mem_b, cov_b = bid_b.split(":", 1)
+    assert mem_a == mem_b, "membership half should be unchanged (membership identical)"
+    assert cov_a != cov_b, "coverage half should change when a source value changes"
+
+
+def test_sample_facet_index_validator_passes_and_gates_bite(tmp_path):
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    # clean: validator (incl. the new index gates) must PASS
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t",
+                        "--min-rows", "1", "--wide", wide], capture_output=True, text=True)
+    assert v.returncode == 0, f"validator failed on clean index fixture:\n{v.stdout}\n{v.stderr}"
+    assert "index.pid == facets_v2.pid" in v.stdout
+
+    # corrupt 1: drop the no-membership pid (re-introduce the #306 bug) -> completeness gate fires
+    ix = str(tmp_path / "t_sample_facet_index.parquet")
+    con = duckdb.connect(); tmp_i = ix + ".tmp"
+    con.execute(f"""COPY (SELECT * FROM read_parquet('{ix}') WHERE pid <> 's-nomem')
+                   TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
+    v2 = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                        capture_output=True, text=True)
+    assert v2.returncode != 0 and "index.pid == facets_v2.pid" in v2.stdout, \
+        f"completeness gate failed to catch a dropped located pid:\n{v2.stdout}"
+
+
+def test_sample_facet_index_nonzero_mask_on_nomem_caught(tmp_path):
+    """A no-membership pid carrying a non-zero mask is a corruption the gate must catch."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    ix = str(tmp_path / "t_sample_facet_index.parquet")
+    con = duckdb.connect(); tmp_i = ix + ".tmp"
+    con.execute(f"""COPY (SELECT pid, source,
+                   CASE WHEN pid='s-nomem' THEN 1::BIGINT ELSE material_mask END AS material_mask,
+                   context_mask, object_type_mask, build_id, schema_version
+                   FROM read_parquet('{ix}')) TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "zero-masked" in v.stdout, \
+        f"zero-mask gate failed to catch a non-zero no-membership pid:\n{v.stdout}"
+
+
+def test_sample_facet_index_source_drift_caught(tmp_path):
+    """index.source must equal facets_v2.source; a drifted source must fail the gate."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    ix = str(tmp_path / "t_sample_facet_index.parquet")
+    con = duckdb.connect(); tmp_i = ix + ".tmp"
+    con.execute(f"""COPY (SELECT pid,
+                   CASE WHEN pid='s1' THEN 'WRONG' ELSE source END AS source,
+                   material_mask, context_mask, object_type_mask, build_id, schema_version
+                   FROM read_parquet('{ix}')) TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "index.source == facets_v2.source" in v.stdout, \
+        f"source-equality gate failed to catch drift:\n{v.stdout}"
+
+
+def test_sample_facet_index_only_builds(tmp_path):
+    """--only sample_facet_index must produce the file (and its prereqs build)."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    r = subprocess.run([sys.executable, BUILD, "--wide", wide, "--outdir", str(tmp_path), "--tag", "t",
+                        "--no-manifest", "--vocab-labels", vocab,
+                        "--only", "sample_facet_index"], capture_output=True, text=True)
+    assert r.returncode == 0, f"{r.stdout}\n{r.stderr}"
+    assert (tmp_path / "t_sample_facet_index.parquet").exists(), \
+        f"--only sample_facet_index produced no file:\n{r.stdout}"
+
+
 def test_scheme_corruption_caught(tmp_path):
     wide = str(tmp_path / "wide.parquet"); build_fixture_wide(wide, "blob")
     assert _build(tmp_path, wide).returncode == 0

From bf3da2a0293ece399874c6819ad89eae9654f904 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 19 Jun 2026 22:19:10 -0700
Subject: [PATCH 03/14] #306/#305: harden index validator + fingerprint (Codex
 round 1)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Address Codex adversarial review of the sample_facet_index Phase-1 diff:

- #1 staleness gate: validator now INDEPENDENTLY recomputes both build_id
  halves from the written siblings (coverage from sample_facets_v2,
  membership from sample_facet_membership) and asserts equality — a stale or
  hand-edited coverage id now FAILS at build time even though node_bits is
  unchanged. Runtime explorer handshake documented as a Phase 2 item.
- #2 validator metadata/NULL gates: schema_version must equal the exact
  contract version (999 was accepted); build_id must match the exact shape
  <xor>_<sum>_<cnt>:<xor>_<sum>_<cnt> ("a colon somewhere" accepted
  <m>:bogus:extra); NOT-NULL contract on pid/masks/build_id/version; zero-mask
  checks use IS DISTINCT FROM 0 (a NULL mask escaped the old <> 0).
- #3 independent mask check: re-derive masks from membership + node_bits for
  EVERY index pid (incl. zero rows) and symmetric-diff — no longer trusts the
  optional sample_facet_masks file.
- #4 orphan artifact: facet_node_bits is force-emitted whenever masks/index is
  built, so --only sample_facet_index can't ship an uninterpretable mask file.
- #5 fingerprint robustness: shared MEMBERSHIP/COVERAGE token exprs; XOR+SUM+
  COUNT trio (resists XOR cancellation / 2-row swaps); NULL source encoded with
  a sentinel byte so NULL≠''; honest non-cryptographic caveat.
- SQL gap: reject NULL pids in build_base_tables (a single NULL slipped the
  dup check, broke pid joins, and was absent from the coverage hash).

Tests: +5 adversarial (null mask, schema_version=999, malformed build_id,
stale coverage id, null pid) — all gates bite. 34/34 pass.

Refs #305 #306
---
 SERIALIZATIONS.md                    |  21 +++--
 scripts/build_frontend_derived.py    |  81 ++++++++++++++-----
 scripts/validate_frontend_derived.py | 111 ++++++++++++++++++++-------
 tests/test_frontend_derived.py       | 105 +++++++++++++++++++++++--
 4 files changed, 257 insertions(+), 61 deletions(-)

diff --git a/SERIALIZATIONS.md b/SERIALIZATIONS.md
index bc9e0d1..2488f09 100644
--- a/SERIALIZATIONS.md
+++ b/SERIALIZATIONS.md
@@ -325,11 +325,22 @@ for the alias when you want "latest."
   dimensions, **excluding the target dimension's own predicate**). This is the
   already-shipped cube/UI semantics; this index does not change it, it serves the
   same contract to the *multi-filter* case the single-filter cube cannot reach.
-- **Validation**: `validate_frontend_derived.py --index <file>` asserts: schema,
-  one row per pid, `pid` set == `sample_facets_v2.pid` (the located universe),
-  `source` == facets_v2 source, single structured `build_id` whose membership half
-  matches `node_bits`, masks bit-identical to `sample_facet_masks` for shared pids,
-  and that every no-membership pid is zero-masked.
+- **Validation**: `validate_frontend_derived.py --index <file>` is an INDEPENDENT
+  oracle — it asserts schema + NOT-NULL columns, one row per pid, `pid` set ==
+  `sample_facets_v2.pid` (the located universe), `source` == facets_v2 source,
+  `schema_version` == the exact contract version, a well-formed single `build_id`,
+  and then **recomputes** both `build_id` halves from the written sibling files (the
+  coverage half from `sample_facets_v2`, the membership half from
+  `sample_facet_membership`) and asserts equality — so a stale/edited coverage id is
+  caught at build time even though `node_bits` is unchanged. Masks are re-derived
+  directly from `membership` + `node_bits` for **every** index pid (zero rows
+  included) and symmetric-diffed against the file.
+- **Runtime staleness handshake (Phase 2)**: the *explorer* loads the index by URL
+  and so cannot, on its own, know the expected coverage half. The load-time refusal
+  of a stale-coverage index requires an external trusted pointer (e.g. the expected
+  composite id co-published in the manifest / a `current` alias the explorer reads);
+  that handshake lands with the count-query consumer in Phase 2. Build-time
+  validation (above) is the gate for now.
 - **Immutability**: published under a **new** versioned filename — it is a new
   artifact name and never overwrites a cached `sample_facet_masks` or any prior tag.
 
diff --git a/scripts/build_frontend_derived.py b/scripts/build_frontend_derived.py
index 661a78e..13a1d5f 100755
--- a/scripts/build_frontend_derived.py
+++ b/scripts/build_frontend_derived.py
@@ -208,12 +208,19 @@ def build_base_tables(con, wide, t0):
     n_geo = con.sql("SELECT COUNT(*) FROM samp_geo").fetchone()[0]
     n_dup = con.sql("SELECT COUNT(*) FROM (SELECT pid FROM samp_geo GROUP BY pid HAVING COUNT(*)>1)").fetchone()[0]
     n_icdup = con.sql("SELECT COUNT(*) FROM (SELECT row_id FROM ic GROUP BY row_id HAVING COUNT(*)>1)").fetchone()[0]
-    log(f"samp={n_samp:,}  samp_geo={n_geo:,}  duplicate_pids={n_dup:,}  duplicate_concept_row_ids={n_icdup:,}", t0)
-    if n_dup or n_icdup:
-        # HARD fail (Codex): non-unique keys make the output grain wrong (inflated
+    # #305/#306 (Codex): the GROUP BY...HAVING COUNT>1 dup check above collapses all
+    # NULL pids into ONE group, so a SINGLE NULL pid slips through — yet a NULL pid
+    # breaks every pid join (joins drop it) and is silently absent from the coverage
+    # fingerprint. Reject it explicitly.
+    n_null = con.sql("SELECT COUNT(*) FROM samp_geo WHERE pid IS NULL").fetchone()[0]
+    log(f"samp={n_samp:,}  samp_geo={n_geo:,}  duplicate_pids={n_dup:,}  "
+        f"null_pids={n_null:,}  duplicate_concept_row_ids={n_icdup:,}", t0)
+    if n_dup or n_icdup or n_null:
+        # HARD fail (Codex): non-unique/NULL keys make the output grain wrong (inflated
         # facet counts, ambiguous joins, non-total ORDER BY pid). Abort before writing.
         raise SystemExit(
-            f"FATAL: non-unique keys — duplicate_pids={n_dup}, duplicate_concept_row_ids={n_icdup}. "
+            f"FATAL: non-unique/NULL keys — duplicate_pids={n_dup}, null_pids={n_null}, "
+            f"duplicate_concept_row_ids={n_icdup}. "
             f"Output grain/joins would be wrong; refusing to write.")
 
 
@@ -499,6 +506,32 @@ def build_facet_tree_cross_filter(con, out):
     ) TO '{out}' (FORMAT PARQUET, COMPRESSION ZSTD)""")
 
 
+# --- generation fingerprints (#293, #305/#306) -----------------------------
+# The per-row token expressions are module constants so the VALIDATOR can
+# recompute the exact same fingerprint independently from the written sibling
+# files (membership, sample_facets_v2) and assert the index's build_id — i.e.
+# the builder is not the sole authority on the index's generation identity.
+# chr(31) (US) separates fields. A NULL source is encoded with a distinct
+# sentinel byte so NULL and '' can never collide (Codex #5).
+MEMBERSHIP_TOKEN_EXPR = "pid || chr(31) || facet_type || chr(31) || concept_uri"
+COVERAGE_TOKEN_EXPR = ("pid || chr(31) || CASE WHEN source IS NULL THEN chr(0) ELSE chr(1) END "
+                       "|| chr(31) || COALESCE(source, '')")
+
+
+def _fingerprint(con, relation, token_expr):
+    # Order-independent generation fingerprint over `relation`. Combines THREE
+    # order-independent accumulators — XOR, SUM (HUGEINT, no overflow), and COUNT —
+    # of per-row hashes. XOR alone cancels identical rows and is vulnerable to a
+    # 2-row swap; SUM+COUNT defeat that. NOT a cryptographic digest: this defends
+    # against accidental drift / stale generations, not an adversary who engineers a
+    # multi-row hash collision. Grain is unique (validated), so the trio is stable.
+    x, s, n = con.sql(
+        f"SELECT COALESCE(bit_xor(hash({token_expr})), 0), "
+        f"       COALESCE(SUM(hash({token_expr})::HUGEINT), 0), COUNT(*) "
+        f"FROM {relation}").fetchone()
+    return f"{x}_{s}_{n}"
+
+
 def membership_build_id(con):
     # #293 (Codex P1, r2): a fingerprint of the FULL membership generation — not
     # just the node set. Both node_bits (positional bit assignment) and masks are
@@ -506,12 +539,8 @@ def membership_build_id(con):
     # change that would alter either artifact (new/dropped pids, re-mapped concepts,
     # AND node-set changes). Embedding this id in both lets the explorer refuse the
     # mask path unless the two are from the SAME generation (guards a stale-cached
-    # masks file). Order-independent XOR of per-row hashes (membership grain is
-    # unique per (pid,facet_type,concept_uri) — validated — so no XOR cancellation).
-    return con.sql("""
-        SELECT CAST(COALESCE(bit_xor(hash(pid || chr(31) || facet_type || chr(31) || concept_uri)), 0)
-                    AS VARCHAR)
-        FROM membership""").fetchone()[0]
+    # masks file). membership grain is unique per (pid,facet_type,concept_uri).
+    return _fingerprint(con, "membership", MEMBERSHIP_TOKEN_EXPR)
 
 
 def coverage_build_id(con):
@@ -519,14 +548,10 @@ def coverage_build_id(con):
     # the (pid, source) pairs over the LOCATED set (samp_geo), independent of tree
     # membership. membership_build_id() alone is blind to this: an index built from
     # the SAME membership but a changed source value or a changed located-pid set
-    # (exactly the #306 class of drift — located pids with no membership) would
-    # carry an identical membership id and go stale SILENTLY. Order-independent XOR
-    # of per-row hashes; samp_geo.pid is unique (hard-checked in build_base_tables),
-    # so no XOR cancellation. COALESCE(source,'') so a NULL→'' source flip is caught.
-    return con.sql("""
-        SELECT CAST(COALESCE(bit_xor(hash(pid || chr(31) || COALESCE(source, ''))), 0)
-                    AS VARCHAR)
-        FROM samp_geo""").fetchone()[0]
+    # (exactly the #306 class of drift — located pids with no membership) would carry
+    # an identical membership id and go stale SILENTLY. NULL vs '' source are encoded
+    # distinctly (sentinel byte) so a NULL↔'' flip changes the fingerprint.
+    return _fingerprint(con, "samp_geo", COVERAGE_TOKEN_EXPR)
 
 
 def index_build_id(con):
@@ -538,8 +563,9 @@ def index_build_id(con):
     #      generation), and
     #   2. detect coverage/source drift via the coverage half (would otherwise be
     #      invisible to a membership-only id).
-    # ':' never appears in either half (both are decimal hash strings), so a single
-    # split is unambiguous. node_bits/masks keep their plain membership id.
+    # Each half is "<xor>_<sum>_<count>" (decimal + underscore only); ':' never
+    # appears in either half, so split(':', 1) is unambiguous. node_bits/masks keep
+    # their plain membership id (== the membership half here, for the match gate).
     return f"{membership_build_id(con)}:{coverage_build_id(con)}"
 
 
@@ -724,7 +750,20 @@ def emit(name, fn):
             # path when the two are from the same generation (Codex P1).
             if want("facet_node_bits") or want("sample_facet_masks") or want("sample_facet_index"):
                 _bid = membership_build_id(con)
-                emit("facet_node_bits", lambda o: build_facet_node_bits(con, o, _bid))
+                # facet_node_bits is the bit-INTERPRETATION file: sample_facet_masks
+                # AND sample_facet_index store raw mask bits that are meaningless
+                # without it. So node_bits is FORCE-emitted whenever either consumer
+                # is built — even under `--only sample_facet_masks/_index` — so a
+                # build never ships an orphan mask file (Codex #4). The shared _bid
+                # lets the explorer gate the two on the same generation (Codex #293).
+                force_node_bits = want("sample_facet_masks") or want("sample_facet_index")
+                if want("facet_node_bits") or force_node_bits:
+                    build_node = lambda o: build_facet_node_bits(con, o, _bid)
+                    if want("facet_node_bits"):
+                        emit("facet_node_bits", build_node)
+                    else:
+                        build_node(p("facet_node_bits")); produced.append(p("facet_node_bits"))
+                        log("facet_node_bits ✓ (auto-paired with masks/index)", t0)
                 emit("sample_facet_masks", lambda o: build_sample_facet_masks(con, o, _bid))
                 # #305/#306: complete per-pid index (every located pid + source,
                 # zero-mask for no-membership pids). Its build_id embeds the SAME
diff --git a/scripts/validate_frontend_derived.py b/scripts/validate_frontend_derived.py
index 17d3904..84a4294 100755
--- a/scripts/validate_frontend_derived.py
+++ b/scripts/validate_frontend_derived.py
@@ -487,14 +487,35 @@ def _opt(name, attr):
     # pid, zero-mask the no-membership pids (#306), and be bit-identical to
     # sample_facet_masks for pids that DO have membership — otherwise counts drift.
     if index:
+        # Import the builder's canonical schema version + fingerprint token exprs so
+        # the validator is an INDEPENDENT oracle (Codex #1/#5): it recomputes the
+        # expected build_id from the written sibling files and asserts equality,
+        # rather than trusting whatever the builder stamped.
+        sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+        from build_frontend_derived import (INDEX_SCHEMA_VERSION, MEMBERSHIP_TOKEN_EXPR,
+                                             COVERAGE_TOKEN_EXPR)
+
+        def _fp(relation, token_expr):  # mirror build_frontend_derived._fingerprint
+            x, s, n = con.sql(
+                f"SELECT COALESCE(bit_xor(hash({token_expr})), 0), "
+                f"COALESCE(SUM(hash({token_expr})::HUGEINT), 0), COUNT(*) FROM {relation}").fetchone()
+            return f"{x}_{s}_{n}"
+
         IX = f"read_parquet('{index}')"
-        # schema/types contract (the explorer reads these columns positionally-ish)
+        # schema/types contract (the explorer reads these columns by name)
         ix_sch = [(r[0], r[1]) for r in con.sql(f"DESCRIBE SELECT * FROM {IX}").fetchall()]
         EXP_IX = [("pid", "VARCHAR"), ("source", "VARCHAR"),
                   ("material_mask", "BIGINT"), ("context_mask", "BIGINT"),
                   ("object_type_mask", "BIGINT"), ("build_id", "VARCHAR"),
                   ("schema_version", "INTEGER")]
         check("index schema matches contract", ix_sch == EXP_IX, f"got {ix_sch}")
+        # NOT-NULL contract (source MAY be NULL — it mirrors facets_v2; everything
+        # else must be present). A NULL mask is a real corruption (Codex #2): the
+        # zero-mask check below uses IS DISTINCT FROM so it also rejects NULL.
+        nn_bad = scalar(f"""SELECT COUNT(*) FROM {IX} WHERE pid IS NULL
+            OR material_mask IS NULL OR context_mask IS NULL OR object_type_mask IS NULL
+            OR build_id IS NULL OR schema_version IS NULL""")
+        check("index: required columns non-NULL", nn_bad == 0, f"{nn_bad} rows with NULL pid/mask/build_id/version")
         # one row per pid
         ixdup = scalar(f"SELECT COUNT(*) FROM (SELECT pid FROM {IX} GROUP BY pid HAVING COUNT(*)>1)")
         check("index: one row per pid", ixdup == 0, f"{ixdup} duplicate pids in index")
@@ -507,20 +528,42 @@ def _opt(name, attr):
         check("index.pid == facets_v2.pid (complete located universe, #306)", ix_diff == 0,
               f"{ix_diff} pids differ between index and facets_v2")
         # SOURCE equality: index.source must equal facets_v2.source for every pid
-        # (IS NOT DISTINCT FROM so NULL==NULL). A source-value drift is exactly what
-        # the coverage half of build_id is meant to catch — assert it directly too.
+        # (IS DISTINCT FROM so NULL==NULL). A source-value drift is exactly what the
+        # coverage half of build_id is meant to catch — assert it directly too.
         src_bad = scalar(f"""SELECT COUNT(*) FROM {IX} i JOIN {F} f ON i.pid=f.pid
             WHERE i.source IS DISTINCT FROM f.source""")
         check("index.source == facets_v2.source", src_bad == 0, f"{src_bad} pids with mismatched source")
-        # single build_id, structured "<membership_id>:<coverage_id>"
+        # schema_version must be EXACTLY the contract version (Codex #2: "one distinct
+        # value" let 999 through). The explorer refuses an index it can't interpret.
+        ix_sv_bad = scalar(f"SELECT COUNT(*) FROM {IX} WHERE schema_version IS DISTINCT FROM {INDEX_SCHEMA_VERSION}")
+        check(f"index: schema_version == {INDEX_SCHEMA_VERSION}", ix_sv_bad == 0,
+              f"{ix_sv_bad} rows with schema_version != {INDEX_SCHEMA_VERSION}")
+        # build_id must be a SINGLE value of the EXACT shape "<m_xor>_<m_sum>_<m_cnt>:<c_xor>_<c_sum>_<c_cnt>"
+        # — exactly two ':'-separated halves, each exactly three '_'-separated decimals.
+        # (Codex #2: "a colon somewhere" accepted "<m>:bogus:extra".)
         ix_bids = scalar(f"SELECT COUNT(DISTINCT build_id) FROM {IX}")
         check("index: single build_id", ix_bids == 1, f"{ix_bids} distinct build_ids (want 1)")
-        ix_bid_fmt = scalar(f"SELECT COUNT(*) FROM {IX} WHERE build_id NOT LIKE '%:%'")
-        check("index: build_id is '<membership>:<coverage>'", ix_bid_fmt == 0,
-              "build_id(s) missing the ':' coverage separator")
-        # single schema_version (sanity — the explorer keys behavior off it)
-        ix_sv = scalar(f"SELECT COUNT(DISTINCT schema_version) FROM {IX}")
-        check("index: single schema_version", ix_sv == 1, f"{ix_sv} distinct schema_versions")
+        bid_re = r'^[0-9]+_[0-9]+_[0-9]+:[0-9]+_[0-9]+_[0-9]+$'
+        ix_bid_fmt = scalar(f"SELECT COUNT(*) FROM {IX} WHERE regexp_matches(build_id, '{bid_re}') = FALSE")
+        check("index: build_id is well-formed '<membership>:<coverage>'", ix_bid_fmt == 0,
+              "build_id(s) not exactly <xor>_<sum>_<cnt>:<xor>_<sum>_<cnt>")
+        # INDEPENDENT build_id recomputation (Codex #1/#5): the coverage half is
+        # recomputed from facets_v2 (the located (pid, source) universe — equivalent
+        # to samp_geo) and, when membership is present, the membership half from it.
+        # This is the gate that actually catches coverage/source staleness at
+        # validation time — a membership-only id (or a hand-edited coverage half)
+        # FAILS here even though node_bits is unchanged.
+        if ix_bids == 1:
+            ix_bid = scalar(f"SELECT MIN(build_id) FROM {IX}")
+            exp_cov = _fp(F, COVERAGE_TOKEN_EXPR)
+            got_cov = ix_bid.split(":", 1)[1] if ":" in ix_bid else ""
+            check("index build_id coverage-half == recomputed from facets_v2", got_cov == exp_cov,
+                  f"coverage half {got_cov!r} != expected {exp_cov!r} (stale/edited coverage)")
+            if mem:
+                exp_mem = _fp(f"read_parquet('{mem}')", MEMBERSHIP_TOKEN_EXPR)
+                got_mem = ix_bid.split(":", 1)[0]
+                check("index build_id membership-half == recomputed from membership", got_mem == exp_mem,
+                      f"membership half {got_mem!r} != expected {exp_mem!r} (stale/foreign generation)")
         if nodebits and ix_bids == 1:
             # the membership half of the index build_id MUST equal node_bits.build_id
             # so the mask bits are interpreted under the SAME assignment (else counts
@@ -533,35 +576,45 @@ def _opt(name, attr):
                 check("index build_id membership-half == node_bits build_id",
                       bool(membership_half_matches),
                       "index masks would be read under a different bit assignment than node_bits")
-        if masks:
+        # INDEPENDENT mask re-derivation (Codex #3): re-derive the expected masks for
+        # EVERY index pid directly from membership + node_bits (zero for pids with no
+        # membership) and symmetric-diff against the index. This validates the masks
+        # without trusting the optional sample_facet_masks file, AND covers the
+        # zero-mask rows that a masks-only diff never sees.
+        if mem and nodebits:
+            M = f"read_parquet('{mem}')"
+            NB = f"read_parquet('{nodebits}')"
+            ref = (f"WITH nb AS (SELECT facet_type, concept_uri, (1::BIGINT << bit_index) AS bitval FROM {NB}), "
+                   f"mm AS (SELECT m.pid, "
+                   f"COALESCE(bit_or(CASE WHEN m.facet_type='material' THEN nb.bitval END),0)::BIGINT material_mask, "
+                   f"COALESCE(bit_or(CASE WHEN m.facet_type='context' THEN nb.bitval END),0)::BIGINT context_mask, "
+                   f"COALESCE(bit_or(CASE WHEN m.facet_type='object_type' THEN nb.bitval END),0)::BIGINT object_type_mask "
+                   f"FROM {M} m JOIN nb ON nb.facet_type=m.facet_type AND nb.concept_uri=m.concept_uri GROUP BY m.pid) "
+                   f"SELECT i.pid, COALESCE(mm.material_mask,0)::BIGINT material_mask, "
+                   f"COALESCE(mm.context_mask,0)::BIGINT context_mask, "
+                   f"COALESCE(mm.object_type_mask,0)::BIGINT object_type_mask "
+                   f"FROM (SELECT pid FROM {IX}) i LEFT JOIN mm ON mm.pid=i.pid")
+            fil = f"SELECT pid, material_mask, context_mask, object_type_mask FROM {IX}"
+            mm_diff = scalar(f"SELECT (SELECT COUNT(*) FROM (({ref}) EXCEPT ({fil}))) "
+                             f"+ (SELECT COUNT(*) FROM (({fil}) EXCEPT ({ref})))")
+            check("index masks == re-derived from membership+node_bits (all pids)", mm_diff == 0,
+                  f"{mm_diff} index rows disagree with an independent re-derivation")
+        elif masks:
+            # Fallback when membership/node_bits aren't both available: diff shared
+            # pids against the masks file and assert no-membership extras are zero.
             MK = f"read_parquet('{masks}')"
-            # For pids that ALSO appear in masks (i.e. have membership), the three
-            # mask columns must be byte-identical — the index is a superset, not a
-            # re-computation that could drift.
             mk_diff = scalar(f"""SELECT COUNT(*) FROM {IX} i JOIN {MK} m ON i.pid=m.pid
                 WHERE i.material_mask    IS DISTINCT FROM m.material_mask
                    OR i.context_mask     IS DISTINCT FROM m.context_mask
                    OR i.object_type_mask IS DISTINCT FROM m.object_type_mask""")
             check("index masks == sample_facet_masks (shared pids)", mk_diff == 0,
                   f"{mk_diff} shared pids with differing masks")
-            # The ONLY pids allowed to differ between the two files are the
-            # no-membership located pids (#306): present in index, absent from masks,
-            # and they MUST be zero-masked. Verify (a) the extra pids are exactly the
-            # zero-mask ones and (b) none are in masks.
             extra_nonzero = scalar(f"""SELECT COUNT(*) FROM {IX} i
                 WHERE NOT EXISTS (SELECT 1 FROM {MK} m WHERE m.pid=i.pid)
-                  AND (i.material_mask <> 0 OR i.context_mask <> 0 OR i.object_type_mask <> 0)""")
+                  AND (i.material_mask IS DISTINCT FROM 0 OR i.context_mask IS DISTINCT FROM 0
+                       OR i.object_type_mask IS DISTINCT FROM 0)""")
             check("index: no-membership extra pids are zero-masked (#306)", extra_nonzero == 0,
-                  f"{extra_nonzero} index-only pids carry a non-zero mask (should be 0)")
-        elif mem:
-            # No masks file to diff against, but we can still assert the #306 invariant
-            # directly off membership: every index pid NOT in membership is zero-masked.
-            M = f"read_parquet('{mem}')"
-            extra_nonzero = scalar(f"""SELECT COUNT(*) FROM {IX} i
-                WHERE NOT EXISTS (SELECT 1 FROM {M} m WHERE m.pid=i.pid)
-                  AND (i.material_mask <> 0 OR i.context_mask <> 0 OR i.object_type_mask <> 0)""")
-            check("index: no-membership pids zero-masked (#306, vs membership)", extra_nonzero == 0,
-                  f"{extra_nonzero} no-membership pids carry a non-zero mask")
+                  f"{extra_nonzero} index-only pids carry a non-zero/NULL mask (should be 0)")
 
     print(f"\n{'CHECK':<44} {'RESULT':<6} DETAIL\n" + "-" * 90)
     ok = True
diff --git a/tests/test_frontend_derived.py b/tests/test_frontend_derived.py
index 73b5701..a6140aa 100644
--- a/tests/test_frontend_derived.py
+++ b/tests/test_frontend_derived.py
@@ -211,6 +211,24 @@ def test_duplicate_pid_hard_fails(tmp_path):
         f"builder should hard-fail on duplicate pids:\n{r.stdout}\n{r.stderr}"
 
 
+def test_null_pid_hard_fails(tmp_path):
+    """Codex (#305 SQL gap): a SINGLE NULL pid slips the GROUP BY...HAVING COUNT>1
+    dup check but breaks every pid join + the coverage fingerprint. Reject it."""
+    wide = str(tmp_path / "wide.parquet"); build_fixture_wide(wide, "blob")
+    con = duckdb.connect(); con.execute("INSTALL spatial; LOAD spatial;")
+    nullp = str(tmp_path / "wide_nullpid.parquet")
+    # append one located row with a NULL pid
+    con.execute(f"""COPY (
+        SELECT * FROM read_parquet('{wide}')
+        UNION ALL
+        SELECT * REPLACE (NULL::VARCHAR AS pid) FROM read_parquet('{wide}') WHERE pid='m-real-first'
+    ) TO '{nullp}' (FORMAT PARQUET)""")
+    con.close()
+    r = _build(tmp_path, nullp)
+    assert r.returncode != 0 and "null_pids=1" in (r.stdout + r.stderr).lower(), \
+        f"builder should hard-fail on a NULL pid:\n{r.stdout}\n{r.stderr}"
+
+
 def test_manifest_emitted(tmp_path):
     import json
     wide = str(tmp_path / "wide.parquet"); build_fixture_wide(wide, "blob")
@@ -648,7 +666,8 @@ def test_sample_facet_index_validator_passes_and_gates_bite(tmp_path):
 
 
 def test_sample_facet_index_nonzero_mask_on_nomem_caught(tmp_path):
-    """A no-membership pid carrying a non-zero mask is a corruption the gate must catch."""
+    """A no-membership pid carrying a non-zero mask is a corruption the gate must
+    catch. With membership+node_bits present the independent re-derivation fires."""
     wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
     build_index_fixture(wide, vocab)
     assert _build_index(tmp_path, wide, vocab).returncode == 0
@@ -660,8 +679,78 @@ def test_sample_facet_index_nonzero_mask_on_nomem_caught(tmp_path):
                    FROM read_parquet('{ix}')) TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
     v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
                        capture_output=True, text=True)
-    assert v.returncode != 0 and "zero-masked" in v.stdout, \
-        f"zero-mask gate failed to catch a non-zero no-membership pid:\n{v.stdout}"
+    assert v.returncode != 0 and "re-derived from membership" in v.stdout, \
+        f"mask gate failed to catch a non-zero no-membership pid:\n{v.stdout}"
+
+
+def test_sample_facet_index_null_mask_caught(tmp_path):
+    """Codex #2: a NULL mask escaped the old `<> 0` predicate (NULL=unknown). The
+    NOT-NULL gate must catch it."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    ix = str(tmp_path / "t_sample_facet_index.parquet")
+    con = duckdb.connect(); tmp_i = ix + ".tmp"
+    con.execute(f"""COPY (SELECT pid, source,
+                   CASE WHEN pid='s-nomem' THEN NULL::BIGINT ELSE material_mask END AS material_mask,
+                   context_mask, object_type_mask, build_id, schema_version
+                   FROM read_parquet('{ix}')) TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "non-NULL" in v.stdout, \
+        f"NOT-NULL gate failed to catch a NULL mask:\n{v.stdout}"
+
+
+def test_sample_facet_index_bad_schema_version_caught(tmp_path):
+    """Codex #2: schema_version=999 passed the old 'one distinct value' check."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    ix = str(tmp_path / "t_sample_facet_index.parquet")
+    con = duckdb.connect(); tmp_i = ix + ".tmp"
+    con.execute(f"""COPY (SELECT pid, source, material_mask, context_mask, object_type_mask,
+                   build_id, 999::INTEGER AS schema_version FROM read_parquet('{ix}'))
+                   TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "schema_version ==" in v.stdout, \
+        f"schema_version gate failed to catch 999:\n{v.stdout}"
+
+
+def test_sample_facet_index_malformed_build_id_caught(tmp_path):
+    """Codex #2: '<membership>:bogus:extra' passed the old 'a colon somewhere' check."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    ix = str(tmp_path / "t_sample_facet_index.parquet")
+    con = duckdb.connect(); tmp_i = ix + ".tmp"
+    # keep a correct membership half, append junk extra ':' segments
+    con.execute(f"""COPY (SELECT pid, source, material_mask, context_mask, object_type_mask,
+                   split_part(build_id, ':', 1) || ':bogus:extra' AS build_id, schema_version
+                   FROM read_parquet('{ix}')) TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "well-formed" in v.stdout, \
+        f"build_id format gate failed to catch a malformed id:\n{v.stdout}"
+
+
+def test_sample_facet_index_stale_coverage_id_caught(tmp_path):
+    """Codex #1: a build_id whose coverage half doesn't match the actual (pid,source)
+    universe (e.g. membership-only id, or hand-edited coverage) must fail — even
+    though node_bits is unchanged."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    ix = str(tmp_path / "t_sample_facet_index.parquet")
+    con = duckdb.connect(); tmp_i = ix + ".tmp"
+    # replace the coverage half with a structurally-valid but WRONG fingerprint
+    con.execute(f"""COPY (SELECT pid, source, material_mask, context_mask, object_type_mask,
+                   split_part(build_id, ':', 1) || ':1_2_3' AS build_id, schema_version
+                   FROM read_parquet('{ix}')) TO '{tmp_i}' (FORMAT PARQUET)"""); os.replace(tmp_i, ix)
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "coverage-half" in v.stdout, \
+        f"coverage staleness gate failed to catch a wrong coverage id:\n{v.stdout}"
 
 
 def test_sample_facet_index_source_drift_caught(tmp_path):
@@ -681,8 +770,10 @@ def test_sample_facet_index_source_drift_caught(tmp_path):
         f"source-equality gate failed to catch drift:\n{v.stdout}"
 
 
-def test_sample_facet_index_only_builds(tmp_path):
-    """--only sample_facet_index must produce the file (and its prereqs build)."""
+def test_sample_facet_index_only_auto_pairs_node_bits(tmp_path):
+    """Codex #4: --only sample_facet_index must NOT ship an orphan — facet_node_bits
+    (the bit-interpretation file) is force-emitted alongside it. The resulting pair
+    must pass the validator (which needs node_bits to re-derive the masks)."""
     wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
     build_index_fixture(wide, vocab)
     r = subprocess.run([sys.executable, BUILD, "--wide", wide, "--outdir", str(tmp_path), "--tag", "t",
@@ -690,7 +781,9 @@ def test_sample_facet_index_only_builds(tmp_path):
                         "--only", "sample_facet_index"], capture_output=True, text=True)
     assert r.returncode == 0, f"{r.stdout}\n{r.stderr}"
     assert (tmp_path / "t_sample_facet_index.parquet").exists(), \
-        f"--only sample_facet_index produced no file:\n{r.stdout}"
+        f"--only sample_facet_index produced no index file:\n{r.stdout}"
+    assert (tmp_path / "t_facet_node_bits.parquet").exists(), \
+        f"--only sample_facet_index did not auto-pair facet_node_bits (orphan):\n{r.stdout}"
 
 
 def test_scheme_corruption_caught(tmp_path):

From a308fe2288d678f50aac2012227e3c49a651e26f Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 19 Jun 2026 22:24:32 -0700
Subject: [PATCH 04/14] #306/#305: close index validator fail-open gaps (Codex
 round 2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- r2 #1: index validation now FAILS CLOSED when sample_facet_membership or
  facet_node_bits is absent — those siblings are what the build_id recompute
  and mask re-derivation depend on, so a missing sibling must gate, not run a
  silent partial pass (Codex reproduced a corrupt index + restamped node_bits
  passing once the siblings were removed). They're always co-produced, so
  --dir/--tag satisfies it.
- r2 #2: assert node_bits has EXACTLY ONE non-NULL build_id before the
  membership-half match — 0 or >1 distinct ids previously skipped the match
  entirely (fail-open).

Codex r2 confirmed: builder _fingerprint() == validator _fp() exactly;
coverage-from-facets_v2 == samp_geo; the three round-1 bypasses stay rejected;
node_bits force-emit has no double-build and masks/index can't ship without
node_bits (even with --skip).

Tests: +2 (missing-siblings fail-closed, multi-id node_bits). 36/36 pass.

Refs #305 #306
---
 scripts/validate_frontend_derived.py | 19 +++++++++++++--
 tests/test_frontend_derived.py       | 36 ++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/scripts/validate_frontend_derived.py b/scripts/validate_frontend_derived.py
index 84a4294..d4aac72 100755
--- a/scripts/validate_frontend_derived.py
+++ b/scripts/validate_frontend_derived.py
@@ -502,6 +502,16 @@ def _fp(relation, token_expr):  # mirror build_frontend_derived._fingerprint
             return f"{x}_{s}_{n}"
 
         IX = f"read_parquet('{index}')"
+        # GATING PRECONDITION (Codex r2 #1): the index cannot be fully validated
+        # without BOTH sample_facet_membership and facet_node_bits — they are what
+        # the independent build_id recompute and mask re-derivation depend on. If
+        # either is absent we must FAIL rather than silently run a partial (fail-open)
+        # validation that would pass a corrupt index/node_bits pair. They are always
+        # co-produced by a build, so --dir/--tag (or passing both) satisfies this.
+        check("index validation has membership + node_bits (required to gate)",
+              bool(mem and nodebits),
+              "pass --membership and --node-bits (or --dir/--tag with both present); "
+              "the index can't be validated in isolation")
         # schema/types contract (the explorer reads these columns by name)
         ix_sch = [(r[0], r[1]) for r in con.sql(f"DESCRIBE SELECT * FROM {IX}").fetchall()]
         EXP_IX = [("pid", "VARCHAR"), ("source", "VARCHAR"),
@@ -568,8 +578,13 @@ def _fp(relation, token_expr):  # mirror build_frontend_derived._fingerprint
             # the membership half of the index build_id MUST equal node_bits.build_id
             # so the mask bits are interpreted under the SAME assignment (else counts
             # are computed against a stale/foreign bit layout — silently wrong).
-            nb_bid = scalar(f"SELECT COUNT(DISTINCT build_id) FROM read_parquet('{nodebits}')")
-            if nb_bid == 1:
+            # (Codex r2 #2): assert node_bits has EXACTLY ONE non-NULL build_id FIRST —
+            # 0 or >1 distinct ids previously skipped the match entirely (fail-open).
+            nb_bid = scalar(f"SELECT COUNT(DISTINCT build_id) FROM read_parquet('{nodebits}') WHERE build_id IS NOT NULL")
+            nb_null = scalar(f"SELECT COUNT(*) FROM read_parquet('{nodebits}') WHERE build_id IS NULL")
+            check("node_bits: exactly one non-NULL build_id", nb_bid == 1 and nb_null == 0,
+                  f"{nb_bid} distinct non-NULL build_ids, {nb_null} NULLs (want 1 / 0)")
+            if nb_bid == 1 and nb_null == 0:
                 membership_half_matches = scalar(
                     f"SELECT (SELECT split_part(MIN(build_id), ':', 1) FROM {IX}) "
                     f"= (SELECT MIN(build_id) FROM read_parquet('{nodebits}'))")
diff --git a/tests/test_frontend_derived.py b/tests/test_frontend_derived.py
index a6140aa..bcf0152 100644
--- a/tests/test_frontend_derived.py
+++ b/tests/test_frontend_derived.py
@@ -770,6 +770,42 @@ def test_sample_facet_index_source_drift_caught(tmp_path):
         f"source-equality gate failed to catch drift:\n{v.stdout}"
 
 
+def test_sample_facet_index_validation_requires_siblings(tmp_path):
+    """Codex r2 #1: validating the index WITHOUT membership+node_bits must FAIL
+    (fail-closed), not silently run a partial pass. Reproduces the bypass where a
+    corrupt index + restamped node_bits passed once the siblings were removed."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    # delete the siblings the index validation depends on
+    os.remove(tmp_path / "t_sample_facet_membership.parquet")
+    os.remove(tmp_path / "t_facet_node_bits.parquet")
+    os.remove(tmp_path / "t_sample_facet_masks.parquet")
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "required to gate" in v.stdout, \
+        f"index validation should fail-closed without membership+node_bits:\n{v.stdout}"
+
+
+def test_sample_facet_index_bad_node_bits_build_id_caught(tmp_path):
+    """Codex r2 #2: a node_bits file with !=1 non-NULL build_id previously skipped
+    the membership-half match entirely. It must now FAIL."""
+    wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
+    build_index_fixture(wide, vocab)
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    nb = str(tmp_path / "t_facet_node_bits.parquet")
+    con = duckdb.connect(); tmp_nb = nb + ".tmp"
+    # give node_bits TWO distinct build_ids (one row restamped)
+    con.execute(f"""COPY (SELECT facet_type, concept_uri, bit_index,
+                   CASE WHEN bit_index = (SELECT MIN(bit_index) FROM read_parquet('{nb}'))
+                        THEN 'OTHER_GEN' ELSE build_id END AS build_id
+                   FROM read_parquet('{nb}')) TO '{tmp_nb}' (FORMAT PARQUET)"""); os.replace(tmp_nb, nb)
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode != 0 and "exactly one non-NULL build_id" in v.stdout, \
+        f"node_bits build_id gate failed to catch a multi-id file:\n{v.stdout}"
+
+
 def test_sample_facet_index_only_auto_pairs_node_bits(tmp_path):
     """Codex #4: --only sample_facet_index must NOT ship an orphan — facet_node_bits
     (the bit-interpretation file) is force-emitted alongside it. The resulting pair

From 23cefacb44916470204b258e07e974e19f2bbbb2 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 19 Jun 2026 22:29:47 -0700
Subject: [PATCH 05/14] #306/#305: force-emit full fast-path bundle with index
 (Codex round 3)

Codex r3 caught a regression my own fixes introduced: --only sample_facet_index
emitted index + node_bits but NOT sample_facet_membership, so the new validator
gate (which requires membership as its independent oracle) would reject that
output. Index validation legitimately also needs sample_facets_v2 (the #306
completeness oracle), so --only sample_facet_index is a PARTIAL rebuild against
an existing full build, by design.

- force_dep(): whenever masks or index is built, force-emit BOTH
  sample_facet_membership and facet_node_bits (the bundle the index validator
  depends on), exactly once, recorded for the manifest.
- test now proves the real workflow: full build -> --only sample_facet_index
  rebuild into the same dir -> re-validate PASSES (not just file-existence).

Codex r3 confirmed both round-2 fail-open gaps stay closed (rc=1 on every
bypass) with no new skip path. 36/36 pass.

Refs #305 #306
---
 scripts/build_frontend_derived.py | 47 ++++++++++++++++++-------------
 tests/test_frontend_derived.py    | 26 +++++++++++------
 2 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/scripts/build_frontend_derived.py b/scripts/build_frontend_derived.py
index 13a1d5f..bb1ac3c 100755
--- a/scripts/build_frontend_derived.py
+++ b/scripts/build_frontend_derived.py
@@ -741,29 +741,38 @@ def emit(name, fn):
             log("SKIP hierarchy artifacts: pass --vocab-labels <vocab_labels.parquet>", t0)
         else:
             build_concept_membership(con, args.wide, args.vocab_labels, t0)
-            emit("sample_facet_membership", lambda o: build_sample_facet_membership(con, o))
+            # The mask fast-path bundle = {membership, node_bits, masks?, index}.
+            # sample_facet_masks AND sample_facet_index store raw mask bits that are
+            # uninterpretable without facet_node_bits, and that the validator can only
+            # gate by re-deriving from sample_facet_membership. So whenever masks or
+            # index is requested, FORCE-emit membership + node_bits too (even under
+            # `--only sample_facet_index`) — otherwise the build ships an artifact its
+            # own validator must reject (Codex #4 / r3). force_dep() builds a not-wanted
+            # artifact exactly once and records it for the manifest.
+            need_fastpath = want("facet_node_bits") or want("sample_facet_masks") or want("sample_facet_index")
+            force_deps = want("sample_facet_masks") or want("sample_facet_index")
+            def force_dep(name, fn):
+                if want(name):
+                    emit(name, fn)
+                elif p(name) not in produced:
+                    fn(p(name)); produced.append(p(name)); log(f"{name} ✓ (auto-paired with masks/index)", t0)
+
+            if force_deps:
+                force_dep("sample_facet_membership", lambda o: build_sample_facet_membership(con, o))
+            else:
+                emit("sample_facet_membership", lambda o: build_sample_facet_membership(con, o))
             emit("facet_tree_summaries", lambda o: build_facet_tree_summaries(con, o))
             # #290/#293 cross-filter cube — needs membership (above) + samp_geo (source).
             emit("facet_tree_cross_filter", lambda o: build_facet_tree_cross_filter(con, o))
-            # #293 bitmask filter artifacts — needs membership (above). node_bits
-            # and masks share a node-set build_id so the explorer only uses the mask
-            # path when the two are from the same generation (Codex P1).
-            if want("facet_node_bits") or want("sample_facet_masks") or want("sample_facet_index"):
+            # #293 bitmask filter artifacts — needs membership (above). node_bits and
+            # masks share a node-set build_id so the explorer only uses the mask path
+            # when the two are from the same generation (Codex P1).
+            if need_fastpath:
                 _bid = membership_build_id(con)
-                # facet_node_bits is the bit-INTERPRETATION file: sample_facet_masks
-                # AND sample_facet_index store raw mask bits that are meaningless
-                # without it. So node_bits is FORCE-emitted whenever either consumer
-                # is built — even under `--only sample_facet_masks/_index` — so a
-                # build never ships an orphan mask file (Codex #4). The shared _bid
-                # lets the explorer gate the two on the same generation (Codex #293).
-                force_node_bits = want("sample_facet_masks") or want("sample_facet_index")
-                if want("facet_node_bits") or force_node_bits:
-                    build_node = lambda o: build_facet_node_bits(con, o, _bid)
-                    if want("facet_node_bits"):
-                        emit("facet_node_bits", build_node)
-                    else:
-                        build_node(p("facet_node_bits")); produced.append(p("facet_node_bits"))
-                        log("facet_node_bits ✓ (auto-paired with masks/index)", t0)
+                if force_deps:
+                    force_dep("facet_node_bits", lambda o: build_facet_node_bits(con, o, _bid))
+                else:
+                    emit("facet_node_bits", lambda o: build_facet_node_bits(con, o, _bid))
                 emit("sample_facet_masks", lambda o: build_sample_facet_masks(con, o, _bid))
                 # #305/#306: complete per-pid index (every located pid + source,
                 # zero-mask for no-membership pids). Its build_id embeds the SAME
diff --git a/tests/test_frontend_derived.py b/tests/test_frontend_derived.py
index bcf0152..42ac846 100644
--- a/tests/test_frontend_derived.py
+++ b/tests/test_frontend_derived.py
@@ -806,20 +806,30 @@ def test_sample_facet_index_bad_node_bits_build_id_caught(tmp_path):
         f"node_bits build_id gate failed to catch a multi-id file:\n{v.stdout}"
 
 
-def test_sample_facet_index_only_auto_pairs_node_bits(tmp_path):
-    """Codex #4: --only sample_facet_index must NOT ship an orphan — facet_node_bits
-    (the bit-interpretation file) is force-emitted alongside it. The resulting pair
-    must pass the validator (which needs node_bits to re-derive the masks)."""
+def test_sample_facet_index_only_auto_pairs_bundle(tmp_path):
+    """Codex #4/r3: --only sample_facet_index must NOT ship an orphan. It
+    force-emits its whole fast-path bundle — facet_node_bits (bit interpretation)
+    AND sample_facet_membership (the validator's independent oracle). A partial
+    rebuild into a directory holding a prior FULL build must then re-validate
+    cleanly (the real iterate-on-one-artifact workflow)."""
     wide = str(tmp_path / "wide.parquet"); vocab = str(tmp_path / "vocab.parquet")
     build_index_fixture(wide, vocab)
+    # 1) full build so the validator's non-index siblings (facets_v2, h3, …) exist
+    assert _build_index(tmp_path, wide, vocab).returncode == 0
+    # 2) rebuild ONLY the index into the same dir
     r = subprocess.run([sys.executable, BUILD, "--wide", wide, "--outdir", str(tmp_path), "--tag", "t",
                         "--no-manifest", "--vocab-labels", vocab,
                         "--only", "sample_facet_index"], capture_output=True, text=True)
     assert r.returncode == 0, f"{r.stdout}\n{r.stderr}"
-    assert (tmp_path / "t_sample_facet_index.parquet").exists(), \
-        f"--only sample_facet_index produced no index file:\n{r.stdout}"
-    assert (tmp_path / "t_facet_node_bits.parquet").exists(), \
-        f"--only sample_facet_index did not auto-pair facet_node_bits (orphan):\n{r.stdout}"
+    # the bundle the index needs to be validatable was (re)emitted, not orphaned
+    for sib in ("t_sample_facet_index.parquet", "t_facet_node_bits.parquet",
+                "t_sample_facet_membership.parquet"):
+        assert (tmp_path / sib).exists(), f"--only sample_facet_index did not emit {sib}:\n{r.stdout}"
+    # 3) the rebuilt artifacts validate against the existing full-build siblings
+    v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
+                       capture_output=True, text=True)
+    assert v.returncode == 0, f"--only sample_facet_index rebuild failed validation:\n{v.stdout}\n{v.stderr}"
+    assert "index.pid == facets_v2.pid" in v.stdout
 
 
 def test_scheme_corruption_caught(tmp_path):

From a20a94f0f691e221811ee141179b081745635951 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 19 Jun 2026 22:32:57 -0700
Subject: [PATCH 06/14] #306/#305: prove index-only rebuild RE-EMITS its bundle
 (Codex round 4)

Delete index+node_bits+membership after the full build and before the
--only sample_facet_index rebuild, so a stale full-build file can't satisfy
the existence checks. Retains the re-emission + validator-pass assertions.

Codex r4 confirmed the implementation correct: index-only and masks-only emit
membership+node_bits exactly once, no duplicate emissions in a full build,
partial-rebuild validation passes, manifest records each forced dep once, and
no invocation produces masks/index without node_bits+membership.

36/36 pass.

Refs #305 #306
---
 tests/test_frontend_derived.py | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tests/test_frontend_derived.py b/tests/test_frontend_derived.py
index 42ac846..8ef109c 100644
--- a/tests/test_frontend_derived.py
+++ b/tests/test_frontend_derived.py
@@ -816,15 +816,21 @@ def test_sample_facet_index_only_auto_pairs_bundle(tmp_path):
     build_index_fixture(wide, vocab)
     # 1) full build so the validator's non-index siblings (facets_v2, h3, …) exist
     assert _build_index(tmp_path, wide, vocab).returncode == 0
+    # delete the bundle the index-only rebuild must RE-EMIT, so a stale full-build
+    # file can't satisfy the existence checks (Codex r4: prove re-emission, not
+    # leftover presence).
+    bundle = ("t_sample_facet_index.parquet", "t_facet_node_bits.parquet",
+              "t_sample_facet_membership.parquet")
+    for sib in bundle:
+        os.remove(tmp_path / sib)
     # 2) rebuild ONLY the index into the same dir
     r = subprocess.run([sys.executable, BUILD, "--wide", wide, "--outdir", str(tmp_path), "--tag", "t",
                         "--no-manifest", "--vocab-labels", vocab,
                         "--only", "sample_facet_index"], capture_output=True, text=True)
     assert r.returncode == 0, f"{r.stdout}\n{r.stderr}"
     # the bundle the index needs to be validatable was (re)emitted, not orphaned
-    for sib in ("t_sample_facet_index.parquet", "t_facet_node_bits.parquet",
-                "t_sample_facet_membership.parquet"):
-        assert (tmp_path / sib).exists(), f"--only sample_facet_index did not emit {sib}:\n{r.stdout}"
+    for sib in bundle:
+        assert (tmp_path / sib).exists(), f"--only sample_facet_index did not re-emit {sib}:\n{r.stdout}"
     # 3) the rebuilt artifacts validate against the existing full-build siblings
     v = subprocess.run([sys.executable, VALIDATE, "--dir", str(tmp_path), "--tag", "t", "--min-rows", "1"],
                        capture_output=True, text=True)

From 39dc101f20c8312d80d17cae656ba9cf0e0634d0 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 06:10:17 -0700
Subject: [PATCH 07/14] #305/#306: keep membership_build_id format-compatible
 with deployed node_bits
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Codex-r1 hardening switched membership_build_id to the XOR+SUM+COUNT trio —
but that id is a DEPLOYED CONTRACT: the live 202608 facet_node_bits/sample_facet_masks
carry it as a bare bit_xor decimal (e.g. 11317573279759780618), and the explorer's
facetIndexReady preflight matches the index's membership-half against the deployed
node_bits.build_id. The trio format would never match, leaving multi-filter counts
permanently "unavailable" against already-published artifacts.

Revert membership_build_id to the bare order-independent XOR (grain is unique →
no cancellation, as the original argued). Keep the richer trio ONLY for
coverage_build_id (a NEW id, no compat constraint). Index build_id is therefore
"<m_xor>:<c_xor>_<c_sum>_<c_cnt>". Validator recomputes the membership half with
bare XOR (_xor_fp) and the well-formedness regex matches the asymmetric shape.

Result: a freshly-built index's membership-half == the DEPLOYED node_bits.build_id,
so the index can be published standalone (new filename, non-destructive) without
republishing node_bits/masks/membership. 36/36 pass.

Refs #305 #306
---
 scripts/build_frontend_derived.py    | 22 +++++++++++++++++-----
 scripts/validate_frontend_derived.py | 19 ++++++++++++-------
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/scripts/build_frontend_derived.py b/scripts/build_frontend_derived.py
index bb1ac3c..dd96baf 100755
--- a/scripts/build_frontend_derived.py
+++ b/scripts/build_frontend_derived.py
@@ -539,8 +539,19 @@ def membership_build_id(con):
     # change that would alter either artifact (new/dropped pids, re-mapped concepts,
     # AND node-set changes). Embedding this id in both lets the explorer refuse the
     # mask path unless the two are from the SAME generation (guards a stale-cached
-    # masks file). membership grain is unique per (pid,facet_type,concept_uri).
-    return _fingerprint(con, "membership", MEMBERSHIP_TOKEN_EXPR)
+    # masks file).
+    #
+    # FORMAT IS A DEPLOYED CONTRACT (#305): the live facet_node_bits / sample_facet_masks
+    # (202608) carry this id as a BARE bit_xor decimal, and the explorer's facetIndexReady
+    # preflight matches the index's membership-half against the DEPLOYED node_bits.build_id.
+    # So this MUST stay the plain order-independent XOR — do NOT switch it to the
+    # _fingerprint trio (that would change the string and break generation-matching
+    # against every already-published artifact). membership grain is unique per
+    # (pid,facet_type,concept_uri) — validated — so no XOR cancellation. The richer
+    # trio is reserved for coverage_build_id, which is a NEW id with no compat constraint.
+    return con.sql(
+        f"SELECT CAST(COALESCE(bit_xor(hash({MEMBERSHIP_TOKEN_EXPR})), 0) AS VARCHAR) FROM membership"
+    ).fetchone()[0]
 
 
 def coverage_build_id(con):
@@ -563,9 +574,10 @@ def index_build_id(con):
     #      generation), and
     #   2. detect coverage/source drift via the coverage half (would otherwise be
     #      invisible to a membership-only id).
-    # Each half is "<xor>_<sum>_<count>" (decimal + underscore only); ':' never
-    # appears in either half, so split(':', 1) is unambiguous. node_bits/masks keep
-    # their plain membership id (== the membership half here, for the match gate).
+    # FORMATS DIFFER BY DESIGN: membership half is the BARE bit_xor decimal (the
+    # deployed node_bits/masks contract — see membership_build_id), coverage half is
+    # the richer "<xor>_<sum>_<count>" trio (a new id). ':' appears in neither half
+    # (decimals + underscores only), so split(':', 1) is unambiguous.
     return f"{membership_build_id(con)}:{coverage_build_id(con)}"
 
 
diff --git a/scripts/validate_frontend_derived.py b/scripts/validate_frontend_derived.py
index d4aac72..754c9d0 100755
--- a/scripts/validate_frontend_derived.py
+++ b/scripts/validate_frontend_derived.py
@@ -495,12 +495,16 @@ def _opt(name, attr):
         from build_frontend_derived import (INDEX_SCHEMA_VERSION, MEMBERSHIP_TOKEN_EXPR,
                                              COVERAGE_TOKEN_EXPR)
 
-        def _fp(relation, token_expr):  # mirror build_frontend_derived._fingerprint
+        def _fp(relation, token_expr):  # mirror build_frontend_derived._fingerprint (coverage trio)
             x, s, n = con.sql(
                 f"SELECT COALESCE(bit_xor(hash({token_expr})), 0), "
                 f"COALESCE(SUM(hash({token_expr})::HUGEINT), 0), COUNT(*) FROM {relation}").fetchone()
             return f"{x}_{s}_{n}"
 
+        def _xor_fp(relation, token_expr):  # mirror membership_build_id (bare XOR — deployed contract)
+            return str(con.sql(
+                f"SELECT CAST(COALESCE(bit_xor(hash({token_expr})), 0) AS VARCHAR) FROM {relation}").fetchone()[0])
+
         IX = f"read_parquet('{index}')"
         # GATING PRECONDITION (Codex r2 #1): the index cannot be fully validated
         # without BOTH sample_facet_membership and facet_node_bits — they are what
@@ -548,15 +552,16 @@ def _fp(relation, token_expr):  # mirror build_frontend_derived._fingerprint
         ix_sv_bad = scalar(f"SELECT COUNT(*) FROM {IX} WHERE schema_version IS DISTINCT FROM {INDEX_SCHEMA_VERSION}")
         check(f"index: schema_version == {INDEX_SCHEMA_VERSION}", ix_sv_bad == 0,
               f"{ix_sv_bad} rows with schema_version != {INDEX_SCHEMA_VERSION}")
-        # build_id must be a SINGLE value of the EXACT shape "<m_xor>_<m_sum>_<m_cnt>:<c_xor>_<c_sum>_<c_cnt>"
-        # — exactly two ':'-separated halves, each exactly three '_'-separated decimals.
-        # (Codex #2: "a colon somewhere" accepted "<m>:bogus:extra".)
+        # build_id must be a SINGLE value of the EXACT shape "<m_xor>:<c_xor>_<c_sum>_<c_cnt>"
+        # — the membership half is the BARE bit_xor decimal (deployed node_bits contract),
+        # the coverage half is the three '_'-separated decimal trio. (Codex #2: "a colon
+        # somewhere" accepted "<m>:bogus:extra".)
         ix_bids = scalar(f"SELECT COUNT(DISTINCT build_id) FROM {IX}")
         check("index: single build_id", ix_bids == 1, f"{ix_bids} distinct build_ids (want 1)")
-        bid_re = r'^[0-9]+_[0-9]+_[0-9]+:[0-9]+_[0-9]+_[0-9]+$'
+        bid_re = r'^[0-9]+:[0-9]+_[0-9]+_[0-9]+$'
         ix_bid_fmt = scalar(f"SELECT COUNT(*) FROM {IX} WHERE regexp_matches(build_id, '{bid_re}') = FALSE")
         check("index: build_id is well-formed '<membership>:<coverage>'", ix_bid_fmt == 0,
-              "build_id(s) not exactly <xor>_<sum>_<cnt>:<xor>_<sum>_<cnt>")
+              "build_id(s) not exactly <m_xor>:<c_xor>_<c_sum>_<c_cnt>")
         # INDEPENDENT build_id recomputation (Codex #1/#5): the coverage half is
         # recomputed from facets_v2 (the located (pid, source) universe — equivalent
         # to samp_geo) and, when membership is present, the membership half from it.
@@ -570,7 +575,7 @@ def _fp(relation, token_expr):  # mirror build_frontend_derived._fingerprint
             check("index build_id coverage-half == recomputed from facets_v2", got_cov == exp_cov,
                   f"coverage half {got_cov!r} != expected {exp_cov!r} (stale/edited coverage)")
             if mem:
-                exp_mem = _fp(f"read_parquet('{mem}')", MEMBERSHIP_TOKEN_EXPR)
+                exp_mem = _xor_fp(f"read_parquet('{mem}')", MEMBERSHIP_TOKEN_EXPR)  # bare XOR (deployed contract)
                 got_mem = ix_bid.split(":", 1)[0]
                 check("index build_id membership-half == recomputed from membership", got_mem == exp_mem,
                       f"membership half {got_mem!r} != expected {exp_mem!r} (stale/foreign generation)")

From 7b032f738f1667465f1b4c5bf0a3b3625ac1ab95 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 05:23:39 -0700
Subject: [PATCH 08/14] #304/#305 Phase 2: multi-filter facet counts at global
 view via mask index
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The single-filter cube (#298) serves correct cross-filtered counts for exactly
ONE active filter at global view. With 2+ filters, effectiveSingleFilter()
returns null, the cube is skipped, and updateCrossFilteredCounts falls to the
legacy early-return that reverts every tree dim to the UNFILTERED baseline
(explorer.qmd) — the #304 bug (e.g. Material=anthropogenic + Object=artifact
showed 261,086 artifacts instead of the cross-filtered 145,770).

Fix: route the global / full-tree-mode / no-search / multi-filter case through
the complete per-pid index (sample_facet_index, #305/#306) with the node_bits
bitmask predicate:
- directFilterSnapshot(): reads ALL selections from the controls WITHOUT zeroing
  tree dims at global view (that zeroing in describeCrossFilters is the root cause).
- maskIndexWhere(): per-dim cross-filter predicate (OR within a dim via the bit
  mask, AND across dims, exclude-self); source-impossible zeroes tree targets but
  not source's own histogram (the plan's special case).
- applyMaskIndexCounts(): one columnar bitmask query per dim, applied ATOMICALLY
  after a single facetCountsReqId stale check; COUNT(*) over the one-row-per-pid
  index == distinct pids.
- HONESTY RULE: on query failure → 'unavailable' (a "(—)" dash, .count-unavailable),
  NEVER the baseline. If the index/bitmap isn't usable yet (unpublished) →
  'fallthrough' to the legacy paths unchanged, so this is safe to ship first.
- facetIndexReady preflight: index present + schema_version==1 + build_id
  membership-half == node_bits generation; else stays unavailable, not wrong.

Single-filter cube kept as a pure optimization. Viewport + search still take the
live membership path (Phase 3).

Verified on real 202608 data: index covers 6,026,242 located pids vs masks'
5,996,325 — exactly the 29,917 #306 samples recovered; full validator passes;
Eric's case returns 145,770 (cross-filtered) not 261,086 (baseline).

Refs #305 #304 #306
---
 explorer.qmd | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 205 insertions(+)

diff --git a/explorer.qmd b/explorer.qmd
index d52666d..122b13d 100644
--- a/explorer.qmd
+++ b/explorer.qmd
@@ -578,6 +578,9 @@ format:
     .facet-row.zero { opacity: 0.4; }
     .facet-row.zero:hover { opacity: 0.65; }
     .facet-count.recomputing { opacity: 0.55; font-style: italic; }
+    /* #304/#305: counts can't be computed for this filter combo (index
+       unpublished/inconsistent or query failed) — honest dash, NOT a baseline. */
+    .facet-count.count-unavailable { opacity: 0.5; font-style: italic; cursor: help; }
     /* #281/#282 Material tree (preview) — compact rows */
     .filter-body .facet-treeroot { font-weight: 600; font-size: 12px; line-height: 1.5; padding: 0 0 2px; color: #444; }
     .filter-body .facet-treenode { font-size: 12px; line-height: 1.5; }
@@ -816,6 +819,15 @@ tree_cross_filter_url = `${R2_BASE}/isamples_202608_facet_tree_cross_filter.parq
 // facetFilterSQL falls back to the membership scan (no regression).
 masks_url = `${R2_BASE}/isamples_202608_sample_facet_masks.parquet`
 node_bits_url = `${R2_BASE}/isamples_202608_facet_node_bits.parquet`
+// #304/#305 complete per-pid facet index: one row per LOCATED sample (incl. the
+// ~29,917 with no tree membership that sample_facet_masks omits, #306) + source.
+// The multi-filter global-view count path scans this with the node_bits bitmask
+// predicate so 2+ active filters get correct counts instead of reverting to the
+// unfiltered baseline (#304). Best-effort: if it isn't published/consistent,
+// facetIndexReady stays false and the multi-filter path shows "unavailable"
+// rather than a misleading baseline (the honesty rule — never baseline under
+// active filters).
+index_url = `${R2_BASE}/isamples_202608_sample_facet_index.parquet`
 
 // Canonical palette — see issue #113. Path-relative so this works under
 // both isamples.org (custom domain at root) and project-pages fork
@@ -1390,6 +1402,8 @@ function applyFacetCounts(facetKey, countsMap) {
         }
         el.textContent = `(${Number(count).toLocaleString()})`;
         el.classList.remove('recomputing');
+        el.classList.remove('count-unavailable');   // #304/#305: real count clears the dash
+        el.removeAttribute('title');
 
         const row = document.querySelector(`.facet-row[data-facet="${facetKey}"][data-value="${CSS.escape(value)}"]`);
         if (row) row.classList.toggle('zero', count === 0);
@@ -1400,6 +1414,23 @@ function markFacetCountsRecomputing() {
     document.querySelectorAll('.facet-count').forEach(el => el.classList.add('recomputing'));
 }
 
+// #304/#305: the honesty rule — when active filters are in play but we cannot
+// compute correct cross-filtered counts (the complete index is unpublished /
+// inconsistent, or a count query failed), we must NOT fall back to the unfiltered
+// baseline (that misleading baseline IS the #304 bug). Show an explicit
+// "unavailable" dash instead. Scoped to the given dim keys (default: all).
+function markFacetCountsUnavailable(facetKeys = null) {
+    const keys = facetKeys || ['source', 'material', 'context', 'object_type'];
+    for (const key of keys) {
+        document.querySelectorAll(`.facet-count[data-facet="${key}"]`).forEach(el => {
+            el.textContent = '(—)';
+            el.classList.remove('recomputing');
+            el.classList.add('count-unavailable');
+            el.title = 'Count unavailable for this filter combination';
+        });
+    }
+}
+
 // === URL State: encode/decode globe state in hash fragment ===
 // (decode side — parseNum + readHash — extracted to assets/js/explorer-utils.js, PR3)
 function buildHash(v) {
@@ -1804,10 +1835,55 @@ nodeBitsReady = {
             return false;
         }
         window.__nodeBits = map;
+        // #304/#305: stash the node_bits generation id so the facet-index preflight
+        // can require the index's mask bits be interpreted under THIS assignment.
+        window.__nodeBitsBuild = nbBuild;
         return true;
     } catch (err) {
         console.warn('node_bits/masks preflight failed; facetFilterSQL will use the membership fallback:', err);
         window.__nodeBits = null;
+        window.__nodeBitsBuild = null;
+        return false;
+    }
+}
+```
+
+```{ojs}
+//| echo: false
+//| output: false
+// #304/#305: preflight the complete per-pid facet index (sample_facet_index). The
+// multi-filter global-view count path scans it; it must be (a) present/readable,
+// (b) schema_version we understand, and (c) from the SAME node_bits generation as
+// the bit map (its build_id's membership half == window.__nodeBitsBuild) so the
+// mask bits mean what we think. Only then advertise readiness; otherwise the
+// multi-filter path shows "unavailable" instead of a baseline (honesty rule).
+// Depends on nodeBitsReady so window.__nodeBitsBuild is populated first.
+facetIndexReady = {
+    const _ = nodeBitsReady;          // sequence after the node_bits preflight
+    window.__facetIndexReady = false;
+    const INDEX_SCHEMA_VERSION = 1;   // mirrors build_frontend_derived.INDEX_SCHEMA_VERSION
+    try {
+        const nbBuild = window.__nodeBitsBuild;
+        if (!nbBuild) return false;   // no usable bit map → index path can't run
+        const rows = await db.query(
+            `SELECT DISTINCT build_id, schema_version FROM read_parquet('${index_url}')`);
+        if (!rows || rows.length !== 1) return false;           // missing / mixed generations
+        const sv = Number(rows[0].schema_version);
+        if (sv !== INDEX_SCHEMA_VERSION) {
+            console.warn('sample_facet_index schema_version unsupported; multi-filter counts unavailable', sv);
+            return false;
+        }
+        const membershipHalf = String(rows[0].build_id).split(':', 1)[0];
+        if (membershipHalf !== nbBuild) {
+            console.warn('sample_facet_index/node_bits generation mismatch; multi-filter counts unavailable',
+                         { indexMembershipHalf: membershipHalf, nbBuild });
+            return false;
+        }
+        window.__facetIndexReady = true;
+        return true;
+    } catch (err) {
+        console.warn('sample_facet_index preflight failed; multi-filter global counts will show unavailable:', err);
+        window.__facetIndexReady = false;
         return false;
     }
 }
@@ -3422,6 +3498,116 @@ zoomWatcher = {
         }
     }
 
+    // #304/#305: a DIRECT snapshot of every active selection read from the controls —
+    // unlike describeCrossFilters(), it does NOT zero tree selections at global view
+    // (that zeroing is the #304 root cause). Returns per-dim selected values (source
+    // strings; tree concept_uris), an active-dim count, and the source-impossible
+    // flag. Tree dims contribute only in full tree mode (flat dims keep facets_v3
+    // paths). source: [] means "no constraint" (all or none selected — disambiguated
+    // by sourceImpossible).
+    function directFilterSnapshot() {
+        const sourceTotal = document.querySelectorAll('#sourceFilter input[type="checkbox"]').length;
+        const sources = getActiveSources();
+        const sourceImpossible = sourceTotal > 0 && sources.length === 0;
+        // a partial source selection is a real constraint; all/none → no positive list
+        const sourceVals = (sources.length > 0 && sources.length < sourceTotal) ? sources : [];
+        const snap = { source: sourceVals, sourceImpossible, material: [], context: [], object_type: [] };
+        for (const key of TREE_DIM_KEYS) {
+            if (treeActive(key)) snap[key] = treeSelection(key) || [];
+        }
+        // dims that impose a constraint (drives "is there anything to cross-filter?")
+        let active = sourceImpossible || sourceVals.length > 0 ? 1 : 0;
+        for (const key of TREE_DIM_KEYS) if (snap[key].length > 0) active++;
+        snap.activeCount = active;
+        return snap;
+    }
+
+    // #304/#305: build the cross-filter WHERE conditions over sample_facet_index
+    // (alias-free; columns resolve to the index) for the OTHER active dims, EXCLUDING
+    // `excludeKey` (exclude-self). Tree dims → `(<dim>_mask & <orMask>) <> 0` from the
+    // node_bits bit map; source → `source IN (...)`. Returns null if a tree selection
+    // can't be turned into a mask (bit map absent / unknown node) so the caller can
+    // bail honestly instead of under-constraining. A source-impossible state zeroes
+    // every NON-source target (the source dim's own histogram still computes).
+    function maskIndexWhere(snap, excludeKey) {
+        const bits = (typeof window !== 'undefined') ? window.__nodeBits : null;
+        const conds = [];
+        for (const key of TREE_DIM_KEYS) {
+            if (key === excludeKey) continue;
+            const sel = snap[key];
+            if (!sel || sel.length === 0) continue;
+            if (!bits || !bits[key]) return null;
+            let mask = 0n;
+            for (const uri of sel) {
+                const idx = bits[key].get(uri);
+                if (idx === undefined) return null;           // unknown node → bail
+                mask |= (1n << BigInt(idx));
+            }
+            if (mask === 0n) return null;
+            conds.push(`(${key}_mask & ${mask}) <> 0`);        // BigInt → decimal BIGINT literal
+        }
+        if (excludeKey !== 'source') {
+            if (snap.sourceImpossible) return ['1=0'];          // no source selectable → 0 for tree targets
+            if (snap.source.length > 0) {
+                conds.push(`source IN (${snap.source.map(v => `'${escSql(v)}'`).join(',')})`);
+            }
+        }
+        return conds;
+    }
+
+    // #304/#305: compute multi-filter cross-filtered counts for ALL dims off the
+    // complete per-pid index and apply them ATOMICALLY after a single stale check.
+    // Returns: 'ok' (applied), 'superseded' (a newer request won), 'unavailable'
+    // (index/bitmap usable but a query failed → honesty: caller shows the dash), or
+    // 'fallthrough' (index/bitmap not usable → caller uses legacy paths).
+    async function applyMaskIndexCounts(snap, dims, myReq) {
+        if (!window.__facetIndexReady || !window.__nodeBits) return 'fallthrough';
+        try {
+            const results = await Promise.all(dims.map(async (d) => {
+                const conds = maskIndexWhere(snap, d.key);
+                if (conds === null) throw { __fallthrough: true };  // can't build masks → legacy
+                const whereSQL = conds.length ? `WHERE ${conds.join(' AND ')}` : '';
+                let rows;
+                if (d.key === 'source') {
+                    // source target: COUNT(*) per source over the index (one row/pid),
+                    // constrained by the tree dims only (exclude-self).
+                    const w = conds.length ? `${whereSQL} AND source IS NOT NULL` : 'WHERE source IS NOT NULL';
+                    rows = await db.query(`
+                        SELECT source AS value, COUNT(*) AS count
+                        FROM read_parquet('${index_url}') s
+                        ${w}
+                        GROUP BY source
+                    `);
+                } else {
+                    // tree target: JOIN node_bits; each (pid,node) the pid is a member
+                    // of yields one row, so COUNT(*) per concept_uri == distinct pids
+                    // with that bit. Membership encodes ancestor closure → parent ≥ child.
+                    rows = await db.query(`
+                        SELECT nb.concept_uri AS value, COUNT(*) AS count
+                        FROM read_parquet('${index_url}') s
+                        JOIN read_parquet('${node_bits_url}') nb
+                          ON nb.facet_type = '${d.key}'
+                         AND (s.${d.key}_mask & (1::BIGINT << nb.bit_index)) <> 0
+                        ${whereSQL}
+                        GROUP BY nb.concept_uri
+                    `);
+                }
+                const map = new Map();
+                for (const r of rows) map.set(r.value, Number(r.count));
+                return { key: d.key, map };
+            }));
+            if (myReq !== facetCountsReqId) return 'superseded';
+            // atomic repaint: all dims at once, only after the single stale check
+            for (const { key, map } of results) applyFacetCounts(key, map);
+            return 'ok';
+        } catch (err) {
+            if (myReq !== facetCountsReqId) return 'superseded';
+            if (err && err.__fallthrough) return 'fallthrough';
+            console.warn('mask-index cross-filter counts failed; counts unavailable (not baseline):', err);
+            return 'unavailable';
+        }
+    }
+
     async function updateCrossFilteredCounts(myReq) {
         if (myReq !== facetCountsReqId) return;
         const { dims, activeDims, totalActiveValues, sourceImpossible } = describeCrossFilters();
@@ -3490,6 +3676,25 @@ zoomWatcher = {
             if (myReq !== facetCountsReqId) return;
         }
 
+        // #304/#305: MULTI-filter (or multi-value single-dim, or source-impossible,
+        // or a single-filter cube MISS) at global view + full tree mode + no search.
+        // The single-filter cube above can't serve these, and the legacy slow path
+        // below reverts every tree dim to the unfiltered BASELINE at global view
+        // (the #304 bug). Route through the complete per-pid index instead — correct
+        // multi-filter counts in one columnar bitmask scan, applied atomically.
+        // HONESTY RULE: on any failure show "unavailable", NEVER the baseline. A
+        // 'fallthrough' (index/bitmap not usable, e.g. not yet published) defers to
+        // the legacy paths unchanged so this is safe to ship before the index lands.
+        if (bboxSQL === null && !searchIsActive() && TREE_DIM_KEYS.every(treeActive)) {
+            const snap = directFilterSnapshot();
+            if (snap.activeCount >= 1) {
+                const res = await applyMaskIndexCounts(snap, dims, myReq);
+                if (res === 'ok' || res === 'superseded') return;
+                if (res === 'unavailable') { markFacetCountsUnavailable(); return; }
+                // res === 'fallthrough' → continue to the legacy paths below
+            }
+        }
+
         // Baseline early-return only applies when there is no filter AND no
         // spatial constraint. In a non-global view with no facet filter, B1
         // still wants per-value counts scoped to what's visible — fall

From 6401ea0046d6669f7cfb1f1330bf62d87c3fd1a0 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 05:32:50 -0700
Subject: [PATCH 09/14] #304/#305 Phase 2: address Codex review (honesty, boot
 race, coverage, decoupling)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- P1 fallthrough reintroduced #304: at global + full tree + active filters there
  is NO correct legacy path (it reverts to baseline), so 'unavailable' AND
  'fallthrough' now both show the dash, never legacy baseline. + defensive stale
  check before markFacetCountsUnavailable.
- P1 boot race: facetIndexReady now calls window.__onFacetIndexReady() when it
  settles true; the main script registers it to refreshFacetCounts() if filters
  are active, so a URL-restored multi-filter that computed before readiness
  reconciles to correct counts instead of staying on the dash.
- P1/P2 runtime coverage handshake: facetIndexReady now also verifies the index
  row count == facets (located universe) — catches stale/partial/mixed-cache
  indexes (the #306 drift class a membership-only id is blind to). Full
  fingerprint stays a build-time gate (SERIALIZATIONS §4.12).
- P2 decouple node_bits from masks: nodeBitsReady publishes __nodeBitsMap +
  __nodeBitsBuild as soon as node_bits is valid (count path needs only these);
  __nodeBits (masks-gated) stays for facetFilterSQL. A missing masks file no
  longer disables a valid count bundle. Count path now reads __nodeBitsMap.
- P2 WASM concurrency: the 4 per-dim queries are now ONE UNION ALL statement
  (one connection slot) instead of 4 concurrent full-index scans.
- P2 SQL injection: R2_BASE (user-controllable via ?data_base=/localStorage) is
  rejected if it contains a quote/backslash/control char or isn't http(s)/root-
  relative, so it can't break out of read_parquet('...') in any query.

Re-verified on real 202608 data: single UNION ALL query returns Eric's case as
145,770 cross-filtered across all dims (vs 261,086 baseline).

Codex confirmed sound: COUNT(*) per concept == distinct pids; exclude-self /
OR-within / AND-across / source-impossible / multi-value-single-dim semantics;
single post-Promise stale check. WASM cold/warm browser benchmark remains the
Phase 4 release gate.

Refs #305 #304 #306
---
 explorer.qmd | 164 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 110 insertions(+), 54 deletions(-)

diff --git a/explorer.qmd b/explorer.qmd
index 122b13d..0d2f8b3 100644
--- a/explorer.qmd
+++ b/explorer.qmd
@@ -752,9 +752,20 @@ Cesium.Ion.defaultAccessToken = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJqdGkiOi
 //   localStorage ISAMPLES_DATA_BASE  (sticky per-browser override)
 // Defaults to the production R2 origin, so shipped builds are unchanged.
 R2_BASE = (() => {
+    const DEFAULT = "https://data.isamples.org";
     const raw = new URLSearchParams(location.search).get('data_base')
         || (typeof localStorage !== 'undefined' && localStorage.getItem('ISAMPLES_DATA_BASE'))
-        || "https://data.isamples.org";
+        || DEFAULT;
+    // The base is interpolated into `read_parquet('${url}/...')` in many queries, so
+    // a value containing a single quote (or backslash) could break out of the SQL
+    // string literal (Codex P2 — data_base is user-controllable via ?data_base= /
+    // localStorage). Reject any such value and fall back to the default. Also require
+    // an http(s) or root-relative form (httpfs only range-fetches absolute http(s);
+    // a relative is resolved against origin below).
+    if (/['"\\\x00-\x1f]/.test(raw) || !(/^https?:\/\//.test(raw) || raw.startsWith('/'))) {
+        if (raw !== DEFAULT) console.warn('Ignoring unsafe/invalid data_base override; using default origin:', raw);
+        return DEFAULT;
+    }
     // DuckDB-WASM's httpfs only range-fetches ABSOLUTE http(s) URLs. A root-
     // relative override (?data_base=/data) is read as a virtual-filesystem
     // glob and fails with "No files found that match the pattern" — silently,
@@ -1806,7 +1817,17 @@ db = {
 // null and facetFilterSQL falls back to the membership scan — so this is safe to
 // ship before sample_facet_masks / facet_node_bits are published.
 nodeBitsReady = {
+    // Two SEPARATE readiness signals (Codex P2 #4 — they were wrongly coupled):
+    //   __nodeBitsMap + __nodeBitsBuild → the concept_uri→bit map + its generation.
+    //       Valid as soon as node_bits itself is. The #304/#305 COUNT path
+    //       (sample_facet_index) needs ONLY this, not masks.
+    //   __nodeBits → the SAME map, but advertised only after the masks file
+    //       preflights & generation-matches. facetFilterSQL's mask FILTER path reads
+    //       masks_url, so it must not run until masks is proven present/consistent.
+    // A missing/mismatched masks file must NOT disable the (valid) count bundle.
     window.__nodeBits = null;
+    window.__nodeBitsMap = null;
+    window.__nodeBitsBuild = null;
     try {
         const rows = await db.query(
             `SELECT facet_type, concept_uri, bit_index, build_id FROM read_parquet('${node_bits_url}')`);
@@ -1821,27 +1842,30 @@ nodeBitsReady = {
         // let a last-row-wins value coincidentally match masks — Codex r2).
         if (!haveBits || nbBuilds.size !== 1) return false;
         const nbBuild = [...nbBuilds][0];
+        // node_bits is valid on its own → publish the bit map + generation NOW so the
+        // count path (which depends only on node_bits + the index) can run even if
+        // masks is absent. facetIndexReady consumes window.__nodeBitsBuild.
+        window.__nodeBitsMap = map;
+        window.__nodeBitsBuild = nbBuild;
         // Codex P1.1 + P1.2: PREFLIGHT the masks file AND require its build_id to
         // match node_bits. This proves masks is present/readable (else the mask
-        // query path would fail with no retry) and that both artifacts are from the
-        // SAME generation (else a stale-cached masks file would map bits wrong).
-        // Only then advertise readiness; otherwise facetFilterSQL uses membership.
+        // FILTER path would fail with no retry) and that both artifacts are from the
+        // SAME generation. Only then advertise __nodeBits for facetFilterSQL;
+        // otherwise facetFilterSQL uses the membership fallback (count path unaffected).
         const mrows = await db.query(
             `SELECT DISTINCT build_id FROM read_parquet('${masks_url}')`);
         const maskBuilds = mrows.map(r => r.build_id);
         if (maskBuilds.length !== 1 || maskBuilds[0] !== nbBuild) {
             console.warn('masks/node_bits build_id mismatch or missing; using membership fallback',
                          { nbBuild, maskBuilds });
-            return false;
+            return false;   // count path stays enabled via __nodeBitsMap
         }
         window.__nodeBits = map;
-        // #304/#305: stash the node_bits generation id so the facet-index preflight
-        // can require the index's mask bits be interpreted under THIS assignment.
-        window.__nodeBitsBuild = nbBuild;
         return true;
     } catch (err) {
-        console.warn('node_bits/masks preflight failed; facetFilterSQL will use the membership fallback:', err);
+        console.warn('node_bits preflight failed; facetFilterSQL + count path use fallbacks:', err);
         window.__nodeBits = null;
+        window.__nodeBitsMap = null;
         window.__nodeBitsBuild = null;
         return false;
     }
@@ -1853,11 +1877,15 @@ nodeBitsReady = {
 //| output: false
 // #304/#305: preflight the complete per-pid facet index (sample_facet_index). The
 // multi-filter global-view count path scans it; it must be (a) present/readable,
-// (b) schema_version we understand, and (c) from the SAME node_bits generation as
-// the bit map (its build_id's membership half == window.__nodeBitsBuild) so the
-// mask bits mean what we think. Only then advertise readiness; otherwise the
-// multi-filter path shows "unavailable" instead of a baseline (honesty rule).
-// Depends on nodeBitsReady so window.__nodeBitsBuild is populated first.
+// (b) schema_version we understand, (c) from the SAME node_bits generation as the
+// bit map (build_id membership half == window.__nodeBitsBuild) so the mask bits
+// mean what we think, AND (d) COVERAGE-COMPLETE — its pid count equals the located
+// universe (facets) so a stale/incomplete index (the #306 drift class, which a
+// membership-only id is blind to) can't silently undercount. The full coverage
+// fingerprint is enforced at BUILD time (validate_frontend_derived); this is the
+// cheap runtime handshake the plan calls for (SERIALIZATIONS §4.12). Only then
+// advertise readiness; otherwise the multi-filter path shows "unavailable" instead
+// of a baseline (honesty rule). Depends on nodeBitsReady for __nodeBitsBuild.
 facetIndexReady = {
     const _ = nodeBitsReady;          // sequence after the node_bits preflight
     window.__facetIndexReady = false;
@@ -1879,7 +1907,23 @@ facetIndexReady = {
                          { indexMembershipHalf: membershipHalf, nbBuild });
             return false;
         }
+        // (d) runtime coverage handshake: the index must cover the SAME located
+        // universe the counts are about. Compare its row count to facets (the
+        // located set). A stale/partial index (fewer pids) or a mixed-cache
+        // deployment fails here rather than returning convincing-but-low counts.
+        const cov = await db.query(
+            `SELECT (SELECT COUNT(*) FROM read_parquet('${index_url}')) AS idx_n,
+                    (SELECT COUNT(*) FROM read_parquet('${facets_url}')) AS facets_n`);
+        const idxN = Number(cov[0].idx_n), facetsN = Number(cov[0].facets_n);
+        if (idxN !== facetsN) {
+            console.warn('sample_facet_index coverage != located universe; multi-filter counts unavailable',
+                         { idxN, facetsN });
+            return false;
+        }
         window.__facetIndexReady = true;
+        // readiness can flip true AFTER boot already computed counts once (lost the
+        // async race) — reconcile any active multi-filter now (Codex P1 boot race).
+        if (typeof window.__onFacetIndexReady === 'function') window.__onFacetIndexReady();
         return true;
     } catch (err) {
         console.warn('sample_facet_index preflight failed; multi-filter global counts will show unavailable:', err);
@@ -3530,7 +3574,9 @@ zoomWatcher = {
     // bail honestly instead of under-constraining. A source-impossible state zeroes
     // every NON-source target (the source dim's own histogram still computes).
     function maskIndexWhere(snap, excludeKey) {
-        const bits = (typeof window !== 'undefined') ? window.__nodeBits : null;
+        // #304/#305: the count path uses __nodeBitsMap (valid as soon as node_bits is)
+        // — NOT __nodeBits, which is gated on the masks FILTER artifact (Codex #4).
+        const bits = (typeof window !== 'undefined') ? window.__nodeBitsMap : null;
         const conds = [];
         for (const key of TREE_DIM_KEYS) {
             if (key === excludeKey) continue;
@@ -3561,48 +3607,45 @@ zoomWatcher = {
     // (index/bitmap usable but a query failed → honesty: caller shows the dash), or
     // 'fallthrough' (index/bitmap not usable → caller uses legacy paths).
     async function applyMaskIndexCounts(snap, dims, myReq) {
-        if (!window.__facetIndexReady || !window.__nodeBits) return 'fallthrough';
+        if (!window.__facetIndexReady || !window.__nodeBitsMap) return 'fallthrough';
+        // Build ONE query (UNION ALL of per-dim arms) rather than 4 concurrent
+        // full-index scans — the documented DuckDB-WASM single-connection
+        // starvation risk (Codex P2). One statement = one connection slot. Each arm
+        // carries its OWN exclude-self cross-filter WHERE. Common output shape
+        // (dim, value, count) so one pass groups them.
+        const arms = [];
+        for (const d of dims) {
+            const conds = maskIndexWhere(snap, d.key);
+            if (conds === null) return 'unavailable';   // selected node missing a bit → can't compute honestly
+            const whereSQL = conds.length ? `WHERE ${conds.join(' AND ')}` : '';
+            if (d.key === 'source') {
+                // source target: COUNT(*) per source over the index (one row/pid),
+                // constrained by the tree dims only (exclude-self).
+                const w = conds.length ? `${whereSQL} AND source IS NOT NULL` : 'WHERE source IS NOT NULL';
+                arms.push(`SELECT 'source' AS dim, source AS value, COUNT(*) AS count
+                           FROM read_parquet('${index_url}') s ${w} GROUP BY source`);
+            } else {
+                // tree target: JOIN node_bits; each (pid,node) the pid is a member of
+                // yields one row, so COUNT(*) per concept_uri == distinct pids with
+                // that bit. Membership encodes ancestor closure → parent ≥ child.
+                arms.push(`SELECT '${d.key}' AS dim, nb.concept_uri AS value, COUNT(*) AS count
+                           FROM read_parquet('${index_url}') s
+                           JOIN read_parquet('${node_bits_url}') nb
+                             ON nb.facet_type = '${d.key}'
+                            AND (s.${d.key}_mask & (1::BIGINT << nb.bit_index)) <> 0
+                           ${whereSQL} GROUP BY nb.concept_uri`);
+            }
+        }
         try {
-            const results = await Promise.all(dims.map(async (d) => {
-                const conds = maskIndexWhere(snap, d.key);
-                if (conds === null) throw { __fallthrough: true };  // can't build masks → legacy
-                const whereSQL = conds.length ? `WHERE ${conds.join(' AND ')}` : '';
-                let rows;
-                if (d.key === 'source') {
-                    // source target: COUNT(*) per source over the index (one row/pid),
-                    // constrained by the tree dims only (exclude-self).
-                    const w = conds.length ? `${whereSQL} AND source IS NOT NULL` : 'WHERE source IS NOT NULL';
-                    rows = await db.query(`
-                        SELECT source AS value, COUNT(*) AS count
-                        FROM read_parquet('${index_url}') s
-                        ${w}
-                        GROUP BY source
-                    `);
-                } else {
-                    // tree target: JOIN node_bits; each (pid,node) the pid is a member
-                    // of yields one row, so COUNT(*) per concept_uri == distinct pids
-                    // with that bit. Membership encodes ancestor closure → parent ≥ child.
-                    rows = await db.query(`
-                        SELECT nb.concept_uri AS value, COUNT(*) AS count
-                        FROM read_parquet('${index_url}') s
-                        JOIN read_parquet('${node_bits_url}') nb
-                          ON nb.facet_type = '${d.key}'
-                         AND (s.${d.key}_mask & (1::BIGINT << nb.bit_index)) <> 0
-                        ${whereSQL}
-                        GROUP BY nb.concept_uri
-                    `);
-                }
-                const map = new Map();
-                for (const r of rows) map.set(r.value, Number(r.count));
-                return { key: d.key, map };
-            }));
+            const rows = await db.query(arms.join('\nUNION ALL\n'));
             if (myReq !== facetCountsReqId) return 'superseded';
+            const grouped = { source: new Map(), material: new Map(), context: new Map(), object_type: new Map() };
+            for (const r of rows) if (grouped[r.dim]) grouped[r.dim].set(r.value, Number(r.count));
             // atomic repaint: all dims at once, only after the single stale check
-            for (const { key, map } of results) applyFacetCounts(key, map);
+            for (const d of dims) applyFacetCounts(d.key, grouped[d.key]);
             return 'ok';
         } catch (err) {
             if (myReq !== facetCountsReqId) return 'superseded';
-            if (err && err.__fallthrough) return 'fallthrough';
             console.warn('mask-index cross-filter counts failed; counts unavailable (not baseline):', err);
             return 'unavailable';
         }
@@ -3689,9 +3732,16 @@ zoomWatcher = {
             const snap = directFilterSnapshot();
             if (snap.activeCount >= 1) {
                 const res = await applyMaskIndexCounts(snap, dims, myReq);
-                if (res === 'ok' || res === 'superseded') return;
-                if (res === 'unavailable') { markFacetCountsUnavailable(); return; }
-                // res === 'fallthrough' → continue to the legacy paths below
+                if (res === 'superseded' || res === 'ok') return;
+                // res is 'unavailable' OR 'fallthrough' (index not ready / a node has
+                // no bit). At global + full tree + active filters there is NO correct
+                // legacy path — the legacy slow path reverts tree dims to the
+                // unfiltered BASELINE, which IS the #304 bug. Honesty rule: show the
+                // dash, never baseline. (When readiness flips true later,
+                // __onFacetIndexReady re-runs this and the dash becomes real counts.)
+                if (myReq !== facetCountsReqId) return;   // defensive stale check (Codex)
+                markFacetCountsUnavailable();
+                return;
             }
         }
 
@@ -4547,6 +4597,12 @@ zoomWatcher = {
     window.__onFilteredClustersReady = () => {
         if (hasFacetFilters()) reconcileGlobeForFilters();
     };
+    // #304/#305: when the facet-index preflight settles true AFTER boot already
+    // computed counts (async race), recompute so a multi-filter that fell back to
+    // the "unavailable" dash becomes correct counts without a user action.
+    window.__onFacetIndexReady = () => {
+        if (hasFacetFilters()) refreshFacetCounts();
+    };
     setTimeout(() => {
         if (window.__filteredClustersReady === true && hasFacetFilters()) {
             const h = viewer.camera.positionCartographic.height;

From e8e18bcbde5143dbd7d2217a97ba9851071d26c0 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 05:38:27 -0700
Subject: [PATCH 10/14] #304/#305 Phase 2: fix masks-catch recoupling +
 source-only refresh (Codex r2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- P1: nodeBitsReady split into TWO try blocks — a masks-query failure no longer
  throws into the shared catch that nulled __nodeBitsMap/__nodeBitsBuild. node_bits
  publishes the count-path values in step 1; masks preflight is step 2 and only
  withholds __nodeBits (the FILTER signal) on failure. Count path now truly
  survives a missing/mismatched masks file.
- P1: __onFacetIndexReady refreshes UNCONDITIONALLY — hasFacetFilters() excludes
  source, so a source-only filter that painted dashes before readiness now also
  reconciles. Debounce + reqId guard make a redundant no-filter refresh harmless.
- P2 staleness: coverage handshake upgraded from bare row-count to per-SOURCE
  histogram equality (index vs facets_v3) — catches equal-cardinality PID
  replacement + source drift, not just a smaller index. Verified mismatch=0 on real
  202608 data (GEOME/OPENCONTEXT/SESAR/SMITHSONIAN all match). Still a proxy, not
  the full fingerprint (build-time gate); generation-id match guards mask corruption.

Codex r2 confirmed otherwise sound: no global active-filter route reaches baseline;
UNION ALL exclude-self + dim regrouping correct; unqualified cols bind to index `s`;
all R2_BASE forms pass; 36 py + 13 node tests pass.

Refs #305 #304 #306
---
 explorer.qmd | 76 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 46 insertions(+), 30 deletions(-)

diff --git a/explorer.qmd b/explorer.qmd
index 0d2f8b3..307963c 100644
--- a/explorer.qmd
+++ b/explorer.qmd
@@ -1828,10 +1828,15 @@ nodeBitsReady = {
     window.__nodeBits = null;
     window.__nodeBitsMap = null;
     window.__nodeBitsBuild = null;
+    let map, nbBuild;
+    // STEP 1: node_bits — its OWN try, so the count-path values it publishes are
+    // never cleared by a downstream masks failure (Codex r2 P1: the masks query
+    // previously threw into a shared catch that nulled __nodeBitsMap/__nodeBitsBuild,
+    // re-coupling the two and disabling valid counts).
     try {
         const rows = await db.query(
             `SELECT facet_type, concept_uri, bit_index, build_id FROM read_parquet('${node_bits_url}')`);
-        const map = { material: new Map(), context: new Map(), object_type: new Map() };
+        map = { material: new Map(), context: new Map(), object_type: new Map() };
         const nbBuilds = new Set();
         for (const r of rows) {
             if (map[r.facet_type]) map[r.facet_type].set(r.concept_uri, Number(r.bit_index));
@@ -1841,32 +1846,37 @@ nodeBitsReady = {
         // node_bits must carry exactly ONE build_id (a mixed file is corrupt; don't
         // let a last-row-wins value coincidentally match masks — Codex r2).
         if (!haveBits || nbBuilds.size !== 1) return false;
-        const nbBuild = [...nbBuilds][0];
+        nbBuild = [...nbBuilds][0];
         // node_bits is valid on its own → publish the bit map + generation NOW so the
-        // count path (which depends only on node_bits + the index) can run even if
-        // masks is absent. facetIndexReady consumes window.__nodeBitsBuild.
+        // count path (which depends only on node_bits + the index) runs even if masks
+        // is absent. facetIndexReady consumes window.__nodeBitsBuild.
         window.__nodeBitsMap = map;
         window.__nodeBitsBuild = nbBuild;
-        // Codex P1.1 + P1.2: PREFLIGHT the masks file AND require its build_id to
-        // match node_bits. This proves masks is present/readable (else the mask
-        // FILTER path would fail with no retry) and that both artifacts are from the
-        // SAME generation. Only then advertise __nodeBits for facetFilterSQL;
-        // otherwise facetFilterSQL uses the membership fallback (count path unaffected).
+    } catch (err) {
+        console.warn('node_bits preflight failed; facetFilterSQL + count path use fallbacks:', err);
+        window.__nodeBits = null;
+        window.__nodeBitsMap = null;
+        window.__nodeBitsBuild = null;
+        return false;
+    }
+    // STEP 2: masks — a SEPARATE try. A masks failure here must leave __nodeBitsMap /
+    // __nodeBitsBuild intact (count path stays enabled) and only withhold __nodeBits
+    // (the masks-gated FILTER signal for facetFilterSQL). Codex P1.1 + P1.2: masks
+    // must be present/readable AND the SAME generation as node_bits, else
+    // facetFilterSQL uses the membership fallback.
+    try {
         const mrows = await db.query(
             `SELECT DISTINCT build_id FROM read_parquet('${masks_url}')`);
         const maskBuilds = mrows.map(r => r.build_id);
         if (maskBuilds.length !== 1 || maskBuilds[0] !== nbBuild) {
-            console.warn('masks/node_bits build_id mismatch or missing; using membership fallback',
+            console.warn('masks/node_bits build_id mismatch; using membership fallback (count path unaffected)',
                          { nbBuild, maskBuilds });
-            return false;   // count path stays enabled via __nodeBitsMap
+            return false;
         }
         window.__nodeBits = map;
         return true;
     } catch (err) {
-        console.warn('node_bits preflight failed; facetFilterSQL + count path use fallbacks:', err);
-        window.__nodeBits = null;
-        window.__nodeBitsMap = null;
-        window.__nodeBitsBuild = null;
+        console.warn('masks preflight failed; facetFilterSQL uses membership fallback (count path unaffected):', err);
         return false;
     }
 }
@@ -1908,16 +1918,21 @@ facetIndexReady = {
             return false;
         }
         // (d) runtime coverage handshake: the index must cover the SAME located
-        // universe the counts are about. Compare its row count to facets (the
-        // located set). A stale/partial index (fewer pids) or a mixed-cache
-        // deployment fails here rather than returning convincing-but-low counts.
-        const cov = await db.query(
-            `SELECT (SELECT COUNT(*) FROM read_parquet('${index_url}')) AS idx_n,
-                    (SELECT COUNT(*) FROM read_parquet('${facets_url}')) AS facets_n`);
-        const idxN = Number(cov[0].idx_n), facetsN = Number(cov[0].facets_n);
-        if (idxN !== facetsN) {
-            console.warn('sample_facet_index coverage != located universe; multi-filter counts unavailable',
-                         { idxN, facetsN });
+        // universe the counts are about. Compare the per-SOURCE histogram of the
+        // index to facets (the located set) — a symmetric diff that catches a
+        // stale/partial index (fewer pids), an equal-cardinality PID replacement,
+        // AND source drift, all of which a bare row-count check would miss (Codex r2).
+        // It is still a cardinality/source proxy, NOT the full coverage fingerprint
+        // (that's the build-time gate, SERIALIZATIONS §4.12) — a same-source-histogram
+        // mask corruption could pass, which the generation-id match above guards.
+        const cov = await db.query(`
+            WITH i AS (SELECT source, COUNT(*) c FROM read_parquet('${index_url}') GROUP BY source),
+                 f AS (SELECT source, COUNT(*) c FROM read_parquet('${facets_url}') GROUP BY source)
+            SELECT (SELECT COUNT(*) FROM (SELECT * FROM i EXCEPT SELECT * FROM f))
+                 + (SELECT COUNT(*) FROM (SELECT * FROM f EXCEPT SELECT * FROM i)) AS mismatch`);
+        if (Number(cov[0].mismatch) !== 0) {
+            console.warn('sample_facet_index per-source coverage != located universe; multi-filter counts unavailable',
+                         { mismatch: Number(cov[0].mismatch) });
             return false;
         }
         window.__facetIndexReady = true;
@@ -4598,11 +4613,12 @@ zoomWatcher = {
         if (hasFacetFilters()) reconcileGlobeForFilters();
     };
     // #304/#305: when the facet-index preflight settles true AFTER boot already
-    // computed counts (async race), recompute so a multi-filter that fell back to
-    // the "unavailable" dash becomes correct counts without a user action.
-    window.__onFacetIndexReady = () => {
-        if (hasFacetFilters()) refreshFacetCounts();
-    };
+    // computed counts (async race), recompute so any filter that fell back to the
+    // "unavailable" dash becomes correct counts without a user action. Refresh
+    // UNCONDITIONALLY — hasFacetFilters() excludes SOURCE selections (Codex r2 P1),
+    // and refreshFacetCounts is debounced + facetCountsReqId-guarded, so a redundant
+    // no-filter refresh just recomputes the baseline harmlessly.
+    window.__onFacetIndexReady = () => { refreshFacetCounts(); };
     setTimeout(() => {
         if (window.__filteredClustersReady === true && hasFacetFilters()) {
             const h = viewer.camera.positionCartographic.height;

From 0832091239e6c8b15624aec73b193289d14b50ec Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 05:41:25 -0700
Subject: [PATCH 11/14] #304/#305 Phase 2: correct coverage-handshake claim
 (Codex r3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The per-source histogram handshake does NOT catch a same-source, same-cardinality
PID swap (mismatch stays 0) — correct the overclaim. That residual is closed by
(i) the generation-id match (the membership half hashes pid, so any swap of a pid
WITH membership changes it — leaving only #306 no-membership pids) and (ii) the
authoritative BUILD-TIME coverage fingerprint + pid-set equality gate. Runtime
stays a cheap source/cardinality proxy by design (boot-query budget); a future
co-published expected-id pointer would let it assert the full fingerprint.

Codex r3 confirmed all P1s closed: masks failure never clears the count-path
values; map/nbBuild correctly scoped; unconditional readiness refresh can't loop;
coverage SQL valid + NULL-safe + cheap.

Refs #305 #304 #306
---
 explorer.qmd | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/explorer.qmd b/explorer.qmd
index 307963c..f6dc97f 100644
--- a/explorer.qmd
+++ b/explorer.qmd
@@ -1920,11 +1920,18 @@ facetIndexReady = {
         // (d) runtime coverage handshake: the index must cover the SAME located
         // universe the counts are about. Compare the per-SOURCE histogram of the
         // index to facets (the located set) — a symmetric diff that catches a
-        // stale/partial index (fewer pids), an equal-cardinality PID replacement,
-        // AND source drift, all of which a bare row-count check would miss (Codex r2).
-        // It is still a cardinality/source proxy, NOT the full coverage fingerprint
-        // (that's the build-time gate, SERIALIZATIONS §4.12) — a same-source-histogram
-        // mask corruption could pass, which the generation-id match above guards.
+        // stale/partial index (a per-source count drift) and SOURCE drift, which a
+        // bare total-row-count check would miss (Codex r2). This is a deliberately
+        // CHEAP proxy (a one-time projected source-column scan), NOT a complete
+        // staleness check: it does NOT detect a same-source, same-cardinality PID
+        // swap (mismatch stays 0). That residual is closed by (i) the generation-id
+        // match above — the membership half is a hash over membership *including pid*,
+        // so any swap of a pid that HAS membership changes it — leaving only swaps of
+        // the #306 no-membership pids, and (ii) the BUILD-TIME full coverage
+        // fingerprint + pid-set equality gate (validate_frontend_derived,
+        // SERIALIZATIONS §4.12), which is the authoritative coverage check. A future
+        // co-published expected-composite-id pointer would let the runtime assert the
+        // full fingerprint directly (deferred).
         const cov = await db.query(`
             WITH i AS (SELECT source, COUNT(*) c FROM read_parquet('${index_url}') GROUP BY source),
                  f AS (SELECT source, COUNT(*) c FROM read_parquet('${facets_url}') GROUP BY source)

From fad4201450f9a98041c8851797fb173f2c184ff8 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 06:49:05 -0700
Subject: [PATCH 12/14] #304/#305 Phase 2: make the facetIndexReady coverage
 check cheap (boot-perf)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In-browser DuckDB-WASM verification against the live index showed the per-source
coverage handshake cost ~8.6s at boot — it scanned the 60MB facets_v3. Compare
the index's per-source histogram to facet_summaries' source rows instead: the
builder computes those as GROUP BY source over the SAME located set (samp_geo),
the file is ~2KB and already loaded at boot, and it's verified equivalent
(mismatch=0 on real 202608 data). This removes a multi-second connection hold at
boot — exactly the single-connection starvation the app guards against — with no
loss of coverage signal. Index source filter mirrors the builder's
NULLIF(TRIM(source),'') so empty-string sources can't cause a false mismatch.

Verified end-to-end in WASM: preflight PASS, Eric's case = 145,770, cold 1121ms /
warm 174ms.

Refs #305 #304
---
 explorer.qmd | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/explorer.qmd b/explorer.qmd
index f6dc97f..2ea754c 100644
--- a/explorer.qmd
+++ b/explorer.qmd
@@ -1919,22 +1919,26 @@ facetIndexReady = {
         }
         // (d) runtime coverage handshake: the index must cover the SAME located
         // universe the counts are about. Compare the per-SOURCE histogram of the
-        // index to facets (the located set) — a symmetric diff that catches a
-        // stale/partial index (a per-source count drift) and SOURCE drift, which a
-        // bare total-row-count check would miss (Codex r2). This is a deliberately
-        // CHEAP proxy (a one-time projected source-column scan), NOT a complete
-        // staleness check: it does NOT detect a same-source, same-cardinality PID
-        // swap (mismatch stays 0). That residual is closed by (i) the generation-id
-        // match above — the membership half is a hash over membership *including pid*,
-        // so any swap of a pid that HAS membership changes it — leaving only swaps of
-        // the #306 no-membership pids, and (ii) the BUILD-TIME full coverage
-        // fingerprint + pid-set equality gate (validate_frontend_derived,
-        // SERIALIZATIONS §4.12), which is the authoritative coverage check. A future
-        // co-published expected-composite-id pointer would let the runtime assert the
-        // full fingerprint directly (deferred).
+        // index to facet_summaries' source rows (which the builder computes as
+        // GROUP BY source over the SAME located set, samp_geo) — a symmetric diff
+        // that catches a stale/partial index (per-source count drift) and SOURCE
+        // drift, which a bare total-row-count check would miss (Codex r2).
+        // facet_summaries is ~2 KB and already loaded at boot, so this is a near-free
+        // check; an earlier draft scanned the 60 MB facets_v3 and cost ~8.6 s of
+        // DuckDB-WASM connection time at boot (measured in-browser) — exactly the
+        // single-connection starvation this app guards against. It is still a CHEAP
+        // proxy, NOT a complete staleness check: it does NOT detect a same-source,
+        // same-cardinality PID swap (mismatch stays 0). That residual is closed by
+        // (i) the generation-id match above — the membership half is a hash over
+        // membership *including pid*, so any swap of a pid that HAS membership changes
+        // it, leaving only swaps of the #306 no-membership pids — and (ii) the
+        // BUILD-TIME full coverage fingerprint + pid-set equality gate
+        // (validate_frontend_derived, SERIALIZATIONS §4.12), the authoritative check.
         const cov = await db.query(`
-            WITH i AS (SELECT source, COUNT(*) c FROM read_parquet('${index_url}') GROUP BY source),
-                 f AS (SELECT source, COUNT(*) c FROM read_parquet('${facets_url}') GROUP BY source)
+            WITH i AS (SELECT source AS v, COUNT(*) c FROM read_parquet('${index_url}')
+                       WHERE NULLIF(TRIM(source), '') IS NOT NULL GROUP BY source),
+                 f AS (SELECT facet_value AS v, count AS c
+                       FROM read_parquet('${facet_summaries_url}') WHERE facet_type = 'source')
             SELECT (SELECT COUNT(*) FROM (SELECT * FROM i EXCEPT SELECT * FROM f))
                  + (SELECT COUNT(*) FROM (SELECT * FROM f EXCEPT SELECT * FROM i)) AS mismatch`);
         if (Number(cov[0].mismatch) !== 0) {

From c2b3553124ff36ee0c4f29ae0f31d25bec5daa5a Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 06:19:35 -0700
Subject: [PATCH 13/14] #304/#305 Phase 3: extend mask-index counts to viewport
 + search
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 2 routed global, no-search multi-filter counts through sample_facet_index.
Phase 3 extends the SAME path to the two cases it gated OUT:
- VIEWPORT: the index has no coordinates, so a non-global view JOINs
  samples_map_lite (l.latitude/l.longitude) on pid (one row/pid → no grain
  change) and appends the same VIEWPORT_PAD_FACTOR=0.3 padded bbox the table,
  point loader and heatmap use.
- SEARCH: appends the search_pids semi-join (s.pid IN (SELECT pid FROM
  search_pids)).
Both fold into the single UNION ALL statement via a bare-predicate WHERE builder.
Index columns are now s.-qualified because lite ALSO has a `source` column.

Gate generalized: the index path now runs whenever there's ANY constraint
(cross-filter, viewport, or search) in full tree mode. Fallback differs by view
(honesty rule): at GLOBAL+no-search an index miss shows the dash (legacy =
baseline = #304); at VIEWPORT/SEARCH an index miss falls through to the legacy
membership path, which is CORRECT there. Both paths coexist — the membership
count path is RETAINED this phase; retirement is deferred to a browser-verified
follow-up (plan: retire only after parity is confirmed in-app).

Characterization (native DuckDB parity, real 202608 data): object_type counts for
material=anthropogenic scoped to a Mediterranean bbox are IDENTICAL between the
index path and the membership path (COUNT(*) over the mask join == COUNT(DISTINCT
pid) over membership). The search semi-join is structurally identical to the
membership path's, so search parity follows from the proven mask≡membership set
identity.

Refs #305 #304
---
 explorer.qmd | 85 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 56 insertions(+), 29 deletions(-)

diff --git a/explorer.qmd b/explorer.qmd
index 2ea754c..881cf9e 100644
--- a/explorer.qmd
+++ b/explorer.qmd
@@ -3616,12 +3616,15 @@ zoomWatcher = {
                 mask |= (1n << BigInt(idx));
             }
             if (mask === 0n) return null;
-            conds.push(`(${key}_mask & ${mask}) <> 0`);        // BigInt → decimal BIGINT literal
+            // s.-qualified: Phase 3 JOINs samples_map_lite (which ALSO has a `source`
+            // column) for viewport scoping, so unqualified `source` would be ambiguous.
+            // Mask columns exist only on the index, but qualify everything for clarity.
+            conds.push(`(s.${key}_mask & ${mask}) <> 0`);      // BigInt → decimal BIGINT literal
         }
         if (excludeKey !== 'source') {
             if (snap.sourceImpossible) return ['1=0'];          // no source selectable → 0 for tree targets
             if (snap.source.length > 0) {
-                conds.push(`source IN (${snap.source.map(v => `'${escSql(v)}'`).join(',')})`);
+                conds.push(`s.source IN (${snap.source.map(v => `'${escSql(v)}'`).join(',')})`);
             }
         }
         return conds;
@@ -3632,24 +3635,40 @@ zoomWatcher = {
     // Returns: 'ok' (applied), 'superseded' (a newer request won), 'unavailable'
     // (index/bitmap usable but a query failed → honesty: caller shows the dash), or
     // 'fallthrough' (index/bitmap not usable → caller uses legacy paths).
-    async function applyMaskIndexCounts(snap, dims, myReq) {
+    async function applyMaskIndexCounts(snap, dims, myReq, bboxSQL = null) {
         if (!window.__facetIndexReady || !window.__nodeBitsMap) return 'fallthrough';
         // Build ONE query (UNION ALL of per-dim arms) rather than 4 concurrent
         // full-index scans — the documented DuckDB-WASM single-connection
         // starvation risk (Codex P2). One statement = one connection slot. Each arm
         // carries its OWN exclude-self cross-filter WHERE. Common output shape
         // (dim, value, count) so one pass groups them.
+        //
+        // #305 Phase 3 — viewport + search: the index carries no coordinates, so a
+        // viewport-scoped count JOINs samples_map_lite (l.latitude/l.longitude) on
+        // pid (one row/pid → no grain change) and appends the SAME padded bbox the
+        // table/heatmap use; an active search appends the search_pids semi-join.
+        // bboxSQL (when set) is already built against l.latitude/l.longitude by the
+        // caller. These are bare predicates (no leading AND) so the WHERE assembles
+        // cleanly even when there is no cross-filter constraint (viewport-only).
+        const liteJoin = bboxSQL ? `JOIN read_parquet('${lite_url}') l ON l.pid = s.pid` : '';
+        const bboxBare = bboxSQL ? bboxSQL.replace(/^\s*AND\s+/i, '') : null;
+        const searchBare = searchIsActive() ? 's.pid IN (SELECT pid FROM search_pids)' : null;
+        const buildWhere = (conds, extra = []) => {
+            const preds = [...conds, ...extra];
+            if (bboxBare) preds.push(bboxBare);
+            if (searchBare) preds.push(searchBare);
+            return preds.length ? `WHERE ${preds.join(' AND ')}` : '';
+        };
         const arms = [];
         for (const d of dims) {
             const conds = maskIndexWhere(snap, d.key);
             if (conds === null) return 'unavailable';   // selected node missing a bit → can't compute honestly
-            const whereSQL = conds.length ? `WHERE ${conds.join(' AND ')}` : '';
             if (d.key === 'source') {
                 // source target: COUNT(*) per source over the index (one row/pid),
-                // constrained by the tree dims only (exclude-self).
-                const w = conds.length ? `${whereSQL} AND source IS NOT NULL` : 'WHERE source IS NOT NULL';
-                arms.push(`SELECT 'source' AS dim, source AS value, COUNT(*) AS count
-                           FROM read_parquet('${index_url}') s ${w} GROUP BY source`);
+                // constrained by the tree dims only (exclude-self) + viewport/search.
+                arms.push(`SELECT 'source' AS dim, s.source AS value, COUNT(*) AS count
+                           FROM read_parquet('${index_url}') s ${liteJoin}
+                           ${buildWhere(conds, ['s.source IS NOT NULL'])} GROUP BY s.source`);
             } else {
                 // tree target: JOIN node_bits; each (pid,node) the pid is a member of
                 // yields one row, so COUNT(*) per concept_uri == distinct pids with
@@ -3659,7 +3678,8 @@ zoomWatcher = {
                            JOIN read_parquet('${node_bits_url}') nb
                              ON nb.facet_type = '${d.key}'
                             AND (s.${d.key}_mask & (1::BIGINT << nb.bit_index)) <> 0
-                           ${whereSQL} GROUP BY nb.concept_uri`);
+                           ${liteJoin}
+                           ${buildWhere(conds)} GROUP BY nb.concept_uri`);
             }
         }
         try {
@@ -3745,29 +3765,36 @@ zoomWatcher = {
             if (myReq !== facetCountsReqId) return;
         }
 
-        // #304/#305: MULTI-filter (or multi-value single-dim, or source-impossible,
-        // or a single-filter cube MISS) at global view + full tree mode + no search.
-        // The single-filter cube above can't serve these, and the legacy slow path
-        // below reverts every tree dim to the unfiltered BASELINE at global view
-        // (the #304 bug). Route through the complete per-pid index instead — correct
-        // multi-filter counts in one columnar bitmask scan, applied atomically.
-        // HONESTY RULE: on any failure show "unavailable", NEVER the baseline. A
-        // 'fallthrough' (index/bitmap not usable, e.g. not yet published) defers to
-        // the legacy paths unchanged so this is safe to ship before the index lands.
-        if (bboxSQL === null && !searchIsActive() && TREE_DIM_KEYS.every(treeActive)) {
+        // #304/#305: route facet counts through the complete per-pid index whenever
+        // there is ANY constraint (cross-filter, viewport, or search) in full tree
+        // mode. Phase 2 covered global+no-search multi-filter; Phase 3 extends the
+        // SAME index path to VIEWPORT (JOIN samples_map_lite + padded bbox) and
+        // SEARCH (search_pids semi-join). The single-filter cube above stays a pure
+        // global fast-path optimization.
+        //
+        // FALLBACK SEMANTICS differ by view (the honesty rule):
+        //  - GLOBAL + no search: the legacy slow path reverts tree dims to the
+        //    unfiltered BASELINE — that IS the #304 bug. So on index miss/error here
+        //    we show the "unavailable" dash, NEVER baseline. (__onFacetIndexReady
+        //    re-runs this when readiness flips, turning the dash into real counts.)
+        //  - VIEWPORT or SEARCH: the legacy membership path below computes the
+        //    CORRECT viewport/search counts, so an index miss/error falls through to
+        //    it (both paths coexist until the membership path is retired — Phase 3
+        //    end, after characterization confirms parity).
+        if (TREE_DIM_KEYS.every(treeActive)) {
             const snap = directFilterSnapshot();
-            if (snap.activeCount >= 1) {
-                const res = await applyMaskIndexCounts(snap, dims, myReq);
+            const atGlobalNoSearch = (bboxSQL === null && !searchIsActive());
+            const hasConstraint = snap.activeCount >= 1 || bboxSQL !== null || searchIsActive();
+            if (hasConstraint) {
+                const res = await applyMaskIndexCounts(snap, dims, myReq, bboxSQL);
                 if (res === 'superseded' || res === 'ok') return;
-                // res is 'unavailable' OR 'fallthrough' (index not ready / a node has
-                // no bit). At global + full tree + active filters there is NO correct
-                // legacy path — the legacy slow path reverts tree dims to the
-                // unfiltered BASELINE, which IS the #304 bug. Honesty rule: show the
-                // dash, never baseline. (When readiness flips true later,
-                // __onFacetIndexReady re-runs this and the dash becomes real counts.)
                 if (myReq !== facetCountsReqId) return;   // defensive stale check (Codex)
-                markFacetCountsUnavailable();
-                return;
+                if (atGlobalNoSearch) {
+                    // no correct legacy path here → honest dash, never baseline (#304)
+                    markFacetCountsUnavailable();
+                    return;
+                }
+                // viewport/search: fall through to the correct legacy membership path
             }
         }
 

From 4aa989bef0295f2deefe4d353110608a0518fec1 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Sat, 20 Jun 2026 06:23:53 -0700
Subject: [PATCH 14/14] #304/#305 Phase 3: fix global-search honesty fallback
 (Codex P3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

At global view WITH active search, an index miss/error fell through (atGlobalNoSearch
was false because search was active) to the legacy tree path, which hits
`if (treeActive && !bboxSQL) applyFacetCounts(d.key, null)` — the unfiltered
BASELINE, ignoring search. Honesty violation.

The correct pivot is whether a VIEWPORT BBOX exists, not whether search is active:
at global (bboxSQL === null) there is no correct legacy tree path regardless of
search, so an index miss now shows the dash. Fall through to the legacy membership
path ONLY when a viewport bbox exists (where it is correct). Drops atGlobalNoSearch
in favor of `bboxSQL === null`.

Codex confirmed everything else sound: grain (lite join one-row-per-pid under the
validated PID-set-equality contract), column binding, bbox regex (strips only the
first leading AND, anti-meridian inner ANDs intact), escaping, viewport-only
histograms, concurrency/atomicity.

The requested regression test (global+search + forced index-unavailable → tree
counts "(—)" not baseline) is recorded as a Phase 4 Playwright case (#43) — the
count engine runs in-browser, so it needs the Playwright harness, not a headless
unit test.

Refs #305 #304
---
 explorer.qmd | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/explorer.qmd b/explorer.qmd
index 881cf9e..412d372 100644
--- a/explorer.qmd
+++ b/explorer.qmd
@@ -3772,29 +3772,32 @@ zoomWatcher = {
         // SEARCH (search_pids semi-join). The single-filter cube above stays a pure
         // global fast-path optimization.
         //
-        // FALLBACK SEMANTICS differ by view (the honesty rule):
-        //  - GLOBAL + no search: the legacy slow path reverts tree dims to the
-        //    unfiltered BASELINE — that IS the #304 bug. So on index miss/error here
-        //    we show the "unavailable" dash, NEVER baseline. (__onFacetIndexReady
-        //    re-runs this when readiness flips, turning the dash into real counts.)
-        //  - VIEWPORT or SEARCH: the legacy membership path below computes the
-        //    CORRECT viewport/search counts, so an index miss/error falls through to
-        //    it (both paths coexist until the membership path is retired — Phase 3
-        //    end, after characterization confirms parity).
+        // FALLBACK SEMANTICS differ by view (the honesty rule). The pivot is whether
+        // a viewport bbox exists, NOT whether search is active (Codex P3):
+        //  - GLOBAL (bboxSQL === null), with OR without search: the legacy tree path
+        //    below hits `if (treeActive && !bboxSQL) applyFacetCounts(d.key, null)` —
+        //    the unfiltered BASELINE — which IGNORES search and the cross-filters
+        //    (#304). There is NO correct legacy global tree path. So on index
+        //    miss/error at global we show the "unavailable" dash, NEVER baseline.
+        //    (__onFacetIndexReady re-runs this when readiness flips → real counts.)
+        //  - VIEWPORT (bboxSQL !== null), ±search: the legacy membership path JOINs
+        //    samples_map_lite + bbox + the search semi-join and is CORRECT, so an
+        //    index miss/error falls through to it (both paths coexist until the
+        //    membership path is retired — after in-app parity is confirmed).
         if (TREE_DIM_KEYS.every(treeActive)) {
             const snap = directFilterSnapshot();
-            const atGlobalNoSearch = (bboxSQL === null && !searchIsActive());
             const hasConstraint = snap.activeCount >= 1 || bboxSQL !== null || searchIsActive();
             if (hasConstraint) {
                 const res = await applyMaskIndexCounts(snap, dims, myReq, bboxSQL);
                 if (res === 'superseded' || res === 'ok') return;
                 if (myReq !== facetCountsReqId) return;   // defensive stale check (Codex)
-                if (atGlobalNoSearch) {
-                    // no correct legacy path here → honest dash, never baseline (#304)
+                if (bboxSQL === null) {
+                    // global (incl. global+search): no correct legacy tree path →
+                    // honest dash, never baseline (#304).
                     markFacetCountsUnavailable();
                     return;
                 }
-                // viewport/search: fall through to the correct legacy membership path
+                // viewport (±search): fall through to the correct legacy membership path
             }
         }