feat(#1425): strict_provenance config flag for runtime enforcement #1474

Open
dimitri-yatsenko wants to merge 1 commit into `feat/1424-self-upstream` from `feat/1425-strict-provenance`

@dimitri-yatsenko

Member

Summary

T2.2.c of the provenance trinity — completes the trio.

When `dj.config["strict_provenance"] = True`, runtime gates enforce the upstream-only convention inside `make()`:

  • Reads must target a table in the active trace's allowed set (declared ancestors + self + self's Parts).
  • Writes must target self or self's Parts.
  • Inserted rows' PK columns that overlap with the current key must equal the key's values.

Default is False. Existing `make()` bodies are unaffected.
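
As a rough illustration of the three rules, the sketch below models tables as plain name strings. The names `ProvenanceError`, `allowed_set`, `check_read`, and `check_write` are hypothetical stand-ins chosen for this example; they are not the functions this PR adds, and the real gates operate on DataJoint tables and query expressions.

```python
class ProvenanceError(Exception):
    """Hypothetical error type used only in this sketch."""


def allowed_set(ancestors, self_name, parts):
    # Rule 1: declared ancestors + self + self's Parts.
    return set(ancestors) | {self_name} | set(parts)


def check_read(table, allowed):
    # A read is rejected when its table is outside the allowed set.
    if table not in allowed:
        raise ProvenanceError(f"read from undeclared table {table}")


def check_write(target, self_name, parts, row, key):
    # Rule 2: writes must go to self or one of self's Parts.
    if target != self_name and target not in parts:
        raise ProvenanceError(f"write to {target} is not permitted")
    # Rule 3: columns the row shares with the current key must match it.
    # The key holds primary-key attributes only, so the overlap is PK columns.
    for column in row.keys() & key.keys():
        if row[column] != key[column]:
            raise ProvenanceError(
                f"{column}={row[column]!r} does not match the current make() key"
            )
```

With `key = {"id": 1}`, a row `{"id": 1, "value": 2}` written to self passes, while `{"id": 2, "value": 2}` is rejected by the key check.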

Closes #1425. Slated for DataJoint 2.3.

Branch stack

| Layer | PR | Branch |
|---|---|---|
| T1.4 cascade fix | #1468 | `fix/1429-cascade-part-part-renamed-fk` |
| T2.2.a `Diagram.trace()` | #1471 | `feat/1423-diagram-trace` |
| T2.2.b `self.upstream` | #1473 | `feat/1424-self-upstream` |
| T2.2.c `strict_provenance` (this) | this PR | `feat/1425-strict-provenance` |

Will rebase onto master after the chain merges in order.

What's added

| Component | File |
|---|---|
| Runtime context module: `ContextVar` + push/pop helpers + read/write gates | `src/datajoint/provenance.py` (new) |
| Config flag `strict_provenance: bool` (default False) + env-var `DJ_STRICT_PROVENANCE` | `src/datajoint/settings.py` |
| Push/pop context in `_populate_one` around the `make()` invocation | `src/datajoint/autopopulate.py` |
| Read gate in `QueryExpression.cursor` | `src/datajoint/expression.py` |
| Write gate in `Table.insert` (after existing `_allow_insert`) | `src/datajoint/table.py` |
| 6 integration tests | `tests/integration/test_strict_provenance.py` (new) |

Implementation notes

  • ContextVar (not threading.local) so the active-make context propagates correctly across `contextvars`-aware boundaries (asyncio tasks, or threads run under a copied context).
  • Part-table detection uses class `__dict__` traversal filtered to `Part` subclasses; `dir`/`getattr` would trigger the `_JobsDescriptor` and try to lazy-declare `~~table` inside the populate transaction (caught on the first test).
  • Read gate is a no-op outside make(): it checks whether `ContextVar.get()` returns None.
  • Write gate is a no-op outside make(), for the same reason.

Documented limitation (deferred)

The read gate does not distinguish reads that came through `self.upstream` from reads of the same ancestor via a direct expression. Both are allowed if the table is in the allowed set. The intent is to catch reads from undeclared dependencies; tightening the "must come through self.upstream" path requires propagating an attribution marker through QueryExpression composition and is left for a follow-up.

Tests

  • 6/6 new strict-mode tests pass.
  • 17/17 existing autopopulate tests pass with strict default-off — confirms no regression.
  • 8/8 trace + 9/9 cascade tests unaffected.

Test plan

  • 6/6 new tests
  • 17/17 default-off regression
  • CI green after rebase onto master
  • Manual: enable in a staging pipeline and verify legitimate `make()`s still run while violations raise with clear messages

Implements T2.2.c of the provenance trinity, completing the trio
(Diagram.trace → self.upstream → strict_provenance).

When dj.config["strict_provenance"] = True, runtime gates enforce the
upstream-only convention inside make():
- Reads must target a table in the active trace's allowed set
  (declared ancestors + self + self's Parts).
- Writes must target self or self's Parts.
- Inserted rows' PK columns that overlap with the current key must
  equal the key's values (key-consistency rule).

Default is False. Existing make() bodies are unaffected.

Branch stacked on feat/1424-self-upstream (#1473) → feat/1423-diagram-trace
(#1471) → fix/1429-cascade-part-part-renamed-fk (#1468). Will rebase
onto master after the chain merges.

What's added:

- src/datajoint/provenance.py (new): the runtime context module.
  - `_active_strict_make` ContextVar holding (target, allowed_tables,
    key) for the currently-executing make() invocation. ContextVar
    chosen over threading.local to propagate correctly across
    contextvars-aware concurrency boundaries.
  - `push_strict_make_context` / `pop_strict_make_context` — context
    lifecycle managed by `_populate_one`'s try/finally.
  - `assert_read_allowed(query_expression)` — read gate. Recursively
    discovers base tables via the QueryExpression's `_support` chain
    and checks each against the allowed set.
  - `assert_write_allowed(target_table, rows)` — write gate. Verifies
    the target is self or one of self's Part tables, and checks the
    key-consistency rule on each dict row.

- src/datajoint/settings.py: new `strict_provenance: bool` field on
  Config (default False), env-var `DJ_STRICT_PROVENANCE`, ENV_VAR_MAPPING
  entry.

- src/datajoint/autopopulate.py: in `_populate_one`, push the strict
  context (when the flag is on) just before the make() invocation
  block. The allowed table set = trace's ancestor nodes ∪ {self.full_table_name}
  ∪ {self's Parts}. Pop in the existing `finally` block.

- src/datajoint/expression.py: `QueryExpression.cursor` now calls
  `assert_read_allowed(self)` before issuing SQL. No-op outside make().

- src/datajoint/table.py: `Table.insert` calls `assert_write_allowed(self, rows)`
  after the existing `_allow_insert` check. No-op outside make().
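
The recursive base-table discovery used by the read gate can be pictured with a small stand-in. In this sketch, `Expr`, `base_tables`, and `undeclared` are hypothetical names, and the assumption is that each support entry is either a table name or a nested expression; the actual `_support` handling in `assert_read_allowed` may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical stand-in for a query expression.

    Each support entry is either a table name (str) or a nested Expr.
    """

    support: list = field(default_factory=list)


def base_tables(expr):
    # Walk the support chain depth-first, collecting table names.
    found = set()
    for item in expr.support:
        if isinstance(item, Expr):
            found |= base_tables(item)
        else:
            found.add(item)
    return found


def undeclared(expr, allowed):
    # Tables the expression touches that are outside the allowed set.
    return sorted(base_tables(expr) - set(allowed))
```

A gate built on this would raise whenever `undeclared(...)` is non-empty and pass otherwise.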

Part-table detection uses class `__dict__` traversal (filtered to Part
subclasses) instead of `dir/getattr` to avoid triggering the
`_JobsDescriptor` (which would lazy-declare ~~table inside the populate
transaction — caught by the first test iteration).
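
The difference between the two traversal styles can be reproduced without DataJoint. In the sketch below, `Part`, `LazyDescriptor`, `Result`, and `part_classes` are hypothetical stand-ins; `LazyDescriptor` only imitates a descriptor with a side effect on access and is not the real `_JobsDescriptor`.

```python
class Part:
    """Hypothetical stand-in for a Part base class."""


class LazyDescriptor:
    """Stand-in for a descriptor whose access has a side effect."""

    touched = False

    def __get__(self, instance, owner):
        LazyDescriptor.touched = True
        return "side effect happened"


class Result:
    jobs = LazyDescriptor()

    class Item(Part):
        pass

    class Helper:  # nested, but not a Part
        pass


def part_classes(cls):
    # vars(cls) reads the class __dict__ directly, so descriptors
    # stored there are returned as objects and never invoked.
    return [
        value
        for value in vars(cls).values()
        if isinstance(value, type) and issubclass(value, Part)
    ]
```

Here `part_classes(Result)` returns only `Result.Item` and leaves `LazyDescriptor.touched` as False, whereas `getattr(Result, "jobs")` flips it to True. One trade-off of this approach is that `vars(cls)` covers only attributes defined on the class itself, not inherited ones.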

Documented limitation (deferred): the read gate does not distinguish
reads that came through `self.upstream` from reads of the same ancestor
via a direct expression. Both are allowed if the table is in the
allowed set. The intent is to catch reads from *undeclared*
dependencies; tightening the "must come through self.upstream" path
requires propagating an attribution marker through QueryExpression
composition and is left for a follow-up release.

Tests in tests/integration/test_strict_provenance.py (6 new):

- test_strict_compliant_make_passes — make() reading via self.upstream
  and writing self.insert1 with matching key runs cleanly under strict.
- test_strict_blocks_read_from_undeclared_table — read from an unrelated
  table raises with "strict_provenance ... undeclared" message.
- test_strict_blocks_write_to_other_table — insert into a non-self,
  non-Part target raises "not permitted".
- test_strict_blocks_write_with_mismatched_key — row PK that disagrees
  with the current key raises "does not match the current make() key".
- test_strict_writes_to_part_table_pass — self.PartName.insert(...) works.
- test_strict_off_by_default_no_change — default-off regression check;
  the canonical "direct (Ancestor & key).fetch1()" pattern still works
  when strict_provenance is unset.

Regression: 17/17 autopopulate tests pass with strict_provenance unset
(default). 6/6 new strict tests pass with strict_provenance=True.
8/8 trace tests + 9/9 cascade tests unaffected.

Slated for DataJoint 2.3.
@dimitri-yatsenko dimitri-yatsenko force-pushed the feat/1425-strict-provenance branch from 9aa7784 to f60495b Compare June 23, 2026 13:23
@dimitri-yatsenko dimitri-yatsenko force-pushed the feat/1424-self-upstream branch from 86b1ae7 to 7e4130f Compare June 23, 2026 13:23
@dimitri-yatsenko dimitri-yatsenko marked this pull request as ready for review June 23, 2026 13:39