Skip to content

Fix structural annotation document visibility#2089

Merged
JSv4 merged 6 commits into
mainfrom
codex/fix-vulnerability-in-structural-annotations
Jul 1, 2026
Merged

Fix structural annotation document visibility#2089
JSv4 merged 6 commits into
mainfrom
codex/fix-vulnerability-in-structural-annotations

Conversation

@JSv4

@JSv4 JSv4 commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Motivation

  • The explicit AnnotationType.resolve_document path (enabled by @bypass_get_queryset) could return a Document from a shared StructuralAnnotationSet without enforcing Document visibility, allowing private documents or signed URLs to leak when an unrelated shared document is visible.
  • The unscoped annotations browsing path prefetched unfiltered structural-set documents, which could cause the resolver to pick an unauthorized document when the prefetch included private rows.

Description

  • Update config/graphql/annotation_types.py::AnnotationType.resolve_document to gate returned documents through BaseService.filter_visible_qs(...) for the direct document_id case, the prefetched structural-set candidate list, and the fallback documents queryset so only visible documents are returned.
  • Preserve deterministic/context-scoped ordering by intersecting the prefetched candidates with the permission-filtered set and returning the first visible prefetched document while keeping prefetch order.
  • Add a regression in opencontractserver/tests/test_corpus_cards_structural_document_resolution.py that creates other_user and a private_doc that shares the same structural set and would sort first in an unscoped prefetch, along with _UNSCOPED_QUERY and test_unscoped_structural_resolution_skips_private_shared_documents to assert no private document is returned.

Testing

  • Compiled the modified files with python -m compileall config/graphql/annotation_types.py opencontractserver/tests/test_corpus_cards_structural_document_resolution.py which succeeded.
  • Attempted to run python -m pytest opencontractserver/tests/test_corpus_cards_structural_document_resolution.py -q but test execution was blocked by the environment due to ModuleNotFoundError: No module named 'django' so the new test was not executed here.
  • The changes were committed as Fix structural annotation document visibility and include the code and test updates in the described files.

Codex Task

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown

Code Review

This PR fixes a real IDOR vulnerability: structural annotations sharing a StructuralAnnotationSet with a private document could leak that document through an unscoped prefetch. The approach — intersecting the prefetch with BaseService.filter_visible_qs — is conceptually correct, and the new test exercises the private-doc scenario. However the implementation introduces two N+1 query regressions and a correctness gap that need addressing before merge.


Finding 1 — N+1 queries in the document_id branch (High)

config/graphql/annotation_types.py:663

The old code:

if self.document_id:
    return self.document

returned the already-loaded FK object at zero cost. The outer annotation list resolver (in annotation_queries.py:520-524) unconditionally applies .select_related("document", "document__creator"), which populates self.document in the Python instance dict via a SQL JOIN — no extra query needed.

The new code:

return (
    BaseService.filter_visible_qs(
        Document.objects.filter(pk=self.document_id),
        info.context.user, request=info.context,
    )
    .select_related("creator")
    .first()
)

ignores the cached self.document entirely and always issues a fresh SELECT … WHERE pk = ? LIMIT 1. For a response with N non-structural annotations this is N extra round-trips; in a corpus with hundreds of annotations the latency impact is significant.

Fix: For annotations that already have document_id set, check self.document first (Django caches it); only fall through to the DB if the FK is not populated. A guarded approach:

if self.document_id:
    cached = self.__dict__.get("document")
    if cached is not None:
        return cached
    # rare: document was not select_related — do the safe fetch
    from opencontractserver.documents.models import Document
    return (
        BaseService.filter_visible_qs(
            Document.objects.filter(pk=self.document_id),
            info.context.user, request=info.context,
        ).first()
    )

Or, more simply: because every code path in annotation_queries.py applies select_related("document"), the permission re-check on an already-user-scoped queryset is redundant — self.document can be returned directly (annotations are already fetched through visible_to_user).


Finding 2 — N+1 visibility queries per structural annotation (High)

config/graphql/annotation_types.py:685

resolve_document is called once per annotation. Inside the prefetch path:

visible_ids = set(
    BaseService.filter_visible_qs(
        Document.objects.filter(pk__in=prefetched_ids),
        info.context.user, request=info.context,
    ).values_list("pk", flat=True)
)

For N structural annotations that all share the same structural_set, prefetched_ids is identical on every call, but Document.objects.filter(pk__in=prefetched_ids)… fires a fresh DB query each time — N round-trips instead of 1.

The right fix is to scope the prefetch itself at queryset build time (so the Prefetch queryset already contains only visible documents), or to memoize the visibility result keyed on (structural_set_id, user.pk) for the duration of a single request. Moving the permission gate into AnnotationService.structural_document_prefetch is the cleanest: pass a user argument and have it construct Prefetch("structural_set__documents", queryset=Document.objects.visible_to_user(user).filter(...)).


Finding 3 — Prefetch short-circuits to None when all prefetched docs are invisible (Medium)

config/graphql/annotation_types.py:692

When the prefetch list is non-empty but visible_ids turns out to be empty:

return next(
    (doc for doc in prefetched if doc.id in visible_ids),
    None,   # ← exits the function here
)

the function returns None unconditionally, and the fallback DB query at lines 707-724 (which searches the full structural set for any visible document) is never reached. A user whose corpus copy is not in the prefetch but does exist as a visible document in the structural set gets null where a valid document resolution would exist.

This can happen today when structural_document_prefetch is called with document_id=some_doc that the user cannot see but another visible document exists in the set.

Fix: Change return next(...) to assign the result, then fall through if it is None:

result = next(
    (doc for doc in prefetched if doc.id in visible_ids),
    None,
)
if result is not None:
    return result
# fall through to the DB fallback

Finding 4 — New test only excludes the private doc, not the expected doc (Low)

opencontractserver/tests/test_corpus_cards_structural_document_resolution.py:307

self.assertNotEqual(
    resolved_doc["id"],
    to_global_id("DocumentType", self.private_doc.id),
    "resolve_document returned a private document from the shared set",
)

This assertion confirms the private doc is excluded, but a regression that returned doc_b (the corpus-B copy) or source_doc (the standalone source with no active corpus path) would still pass. Add a positive assertion for the expected document:

allowed_doc_ids = {
    to_global_id("DocumentType", d.id)
    for d in (self.source_doc, self.doc_a, self.doc_b)
}
self.assertIn(resolved_doc["id"], allowed_doc_ids)

Minor

  • The from opencontractserver.documents.models import Document import appears twice in the same method (once in each branch). Since it's not the cycle-prone document_types import, it can be hoisted to a single deferred import at the top of the function body.
  • request=info.context passed to filter_visible_qs has no effect today (base.py:130-131 explicitly does not forward it to visible_to_user). Passing it creates a false impression that the Tier-2 per-object cache is engaged for queryset filters — consider removing or adding a comment.

JSv4 added 2 commits June 30, 2026 06:57
resolve_document (config/graphql/annotation_types.py) could leak a private
document that shares a content-hash-deduplicated StructuralAnnotationSet.

- structural_document_prefetch now requires a keyword-only user and scopes the
  prefetch with Document.objects.visible_to_user(user), so the visibility gate
  runs once per page; resolve_document trusts the user-scoped prefetch and no
  longer issues a per-row visibility SELECT (review finding 2).
- document_id branch returns the already select_related-cached FK instead of a
  fresh per-row query; falls back to a single permission-scoped fetch only when
  the FK was not cached (review finding 1).
- An empty (fully filtered) prefetch falls through to the corpus-scoped DB
  fallback instead of short-circuiting to null (review finding 3).
- SemanticSearchResultType.resolve_document (config/graphql/social_types.py)
  delegates to the gated AnnotationType.resolve_document, closing the
  convenience field's raw-FK bypass and adding structural resolution.
- Update both prefetch callers (annotation_queries, search_queries) to pass
  user; positive assertion added to the regression test.

Consolidates the fix from the closed duplicate PR #2088.
@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown

Code Review

This PR closes a real security gap: StructuralAnnotationSet rows are content-hash-deduplicated and therefore shared across documents and owners, so the prior unscoped prefetch/fallback could surface a private document to a user who never had READ on it. The core fix—requiring user= in structural_document_prefetch and gating the fallback through filter_visible_qs—is correct and well-motivated. A few things worth fixing before merge:


1. filter_visible_qs fallback for non-structural annotations is unreachable dead code

File: config/graphql/annotation_types.py, lines 674-683

_meta.get_field("document") returns the ForeignKey field object. Field.is_cached(instance) (Django base class) checks "document_id" in instance.__dict__ — i.e., whether the integer PK column was loaded, not whether the related Document object is in the select_related cache. Since document_id is a concrete column that is always loaded with the annotation row (unless explicitly deferred), this check returns True in 100% of normal fetches, and the filter_visible_qs fallback path (lines 677-683) is dead code.

The guard was intended to detect "FK was NOT select_related, so the Document object isn't in memory." The correct idiom is Annotation.document.is_cached(self) (using the descriptor at the class level, not the field object from _meta.get_field), or checking "document" in self.__dict__. As written, the filter_visible_qs safety net for uncached FKs can never fire.

The practical risk today is low because the annotation list resolver enforces document visibility before returning annotations, but this guard was added explicitly as defense-in-depth—and it doesn't work.


2. Empty permission-filtered prefetch triggers a redundant fallback DB query

File: config/graphql/annotation_types.py, lines 700-703 and 712-721

When structural_document_prefetch runs and the user has no READ on any member of the shared set, the prefetch cache key "documents" is present but evaluates to an empty list. Lines 700-703 detect the empty list and fall through to the DB fallback at line 712, which will also find nothing (same user, same visibility filter). This is two queries—prefetch + fallback—where the prefetch result was already definitive. The comment says "fall through rather than short-circuiting to None" but an empty permission-gated prefetch is already a definitive answer:

if "documents" in prefetched_cache:
    prefetched = list(structural_set.documents.all())
    return prefetched[0] if prefetched else None  # prefetch was permission-gated; trust it

3. _prefetched_objects_cache is a private Django internal; silent N+1 regression risk

File: config/graphql/annotation_types.py, line 699

_prefetched_objects_cache is a private attribute Django can rename or restructure without a deprecation warning. If it disappears in a future Django upgrade, the getattr default is {}, so "documents" in prefetched_cache silently returns False—bypassing the prefetch entirely and issuing one per-row fallback DB query per structural annotation on every page. This would be a silent N+1 performance regression that no test would catch.

The existing codebase already uses the public prefetch-probe idiom (prefetch_cache_name on the descriptor). Consider probing via the descriptor—or simply call structural_set.documents.all() unconditionally and let Django serve the prefetch cache when available.


4. Fallback with no corpus context may return a cross-corpus document

File: config/graphql/annotation_types.py, lines 712-721

When the fallback fires and self.corpus_id is None (structural annotations not created in a corpus context), the corpus filter at line 715 is skipped. filter_visible_qs still prevents privilege escalation, but the result can be a document from a completely different corpus (or the standalone import source), ordered only by slug. If returning None is preferable to a cross-corpus result, an explicit guard like if not self.corpus_id: return None would be cleaner.


5. Inline Document.objects.filter() in GraphQL resolver bypasses the service layer

File: config/graphql/annotation_types.py, lines 712-721

CLAUDE.md requires: "Code with a user context (GraphQL resolvers ...) must reach models through opencontractserver/<app>/services/—never compose visible_to_user / user_can / user_has_permission_for_obj inline." The fallback path constructs a queryset directly on Document.objects and calls BaseService.filter_visible_qs inline. The automated system check (E001) misses this because it only flags specific patterns, so it passes the gate silently. The fix is to delegate to an AnnotationService or DocumentService method.


6. New regression test only covers the fallback path, not the prefetch path

File: opencontractserver/tests/test_corpus_cards_structural_document_resolution.py, line 291

test_unscoped_structural_resolution_skips_private_shared_documents sends an unscoped annotations(structural: true) query without a corpus prefetch, exercising only the DB fallback. The primary production path—the user-scoped prefetch applied by structural_document_prefetch in the corpus cards and semantic search resolvers—has no test asserting it suppresses a private set member when the user lacks READ on that document. Adding a corpus-scoped variant (using the existing _nodes_by_annotation_id helper with self.private_doc in scope) would close this gap.


Summary: The security fix is sound and the approach is right. The two items most worth addressing before merge are the unreachable filter_visible_qs gate (#1) and the private-Django-internal probe (#3)—both undermine the correctness guarantees the PR is trying to establish. Items #2, #4, #5, and #6 are lower-priority cleanup.

…rough service layer

Follow-up to eca04a3's structural-annotation visibility fix, addressing
second-round review findings on config/graphql/annotation_types.py:

- An empty (fully permission-filtered) structural_set__documents prefetch
  now returns None directly instead of falling through to a second,
  identically-scoped DB query — the prefetch result was already
  definitive.
- The structural DB fallback now requires corpus_id and returns None
  without it, rather than risking a cross-corpus document pick when no
  corpus context is available to scope against.
- Both the non-structural and structural fallback branches move out of
  the resolver and into AnnotationService.resolve_owned_document /
  resolve_structural_document_fallback, replacing inline
  Document.objects.filter() + BaseService.filter_visible_qs composition.
- Adds a regression test exercising the corpus-scoped prefetch path
  (not just the DB fallback) with a private shared document, plus a
  captured-query-count assertion that fails if the
  _prefetched_objects_cache detection ever silently degrades to a
  per-row fallback query.

Verified the FK is_cached() check flagged as "dead code" in review is
correct as written: Field.is_cached() (Django's FieldCacheMixin) checks
instance._state.fields_cache, not the raw document_id column, confirmed
against the pinned Django 5.2.15 source and an empirical select_related
test. Left in place with a clarifying comment instead of a no-op change.
@claude

claude Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review

This is a well-scoped, well-documented fix for a real IDOR/privacy leak: StructuralAnnotationSet is content-hash-deduplicated across owners, so the old structural_set.documents.first() / Document.objects.filter(structural_annotation_set_id=...) paths could hand back another user's private document. The final iteration (after two rounds of self-review baked into the commit history) looks correct and reasonably thorough. Specific notes below.

Correctness

  • is_cached() check (config/graphql/annotation_types.py:684): document_field.is_cached(self) is FieldCacheMixin.is_cached, which checks instance._state.fields_cache — i.e. whether select_related("document") actually populated the FK object, not just whether document_id is loaded. This is correctly distinguished in the comment and matches Django's actual implementation (django.db.models.fields.mixins.FieldCacheMixin). Good catch preserving this as intentional rather than "dead code."
  • Hot-path safety: Both GraphQL entry points that produce AnnotationType instances (resolve_annotations in annotation_queries.py, and the semantic-search re-fetch in search_queries.py) already select_related("document"), so the new gated fallback (AnnotationService.resolve_owned_document) is a genuinely defensive path, not something exercised on every row — good for the N+1 concern the tests are guarding against.
  • SemanticSearchResultType.resolve_document delegation (config/graphql/social_types.py): calling AnnotationType.resolve_document(self.annotation, info) is correct — graphene resolvers bind against the underlying model instance (self.annotation is an Annotation model instance, matching what AnnotationType.resolve_document's self expects), not the GraphQL wrapper type. This closes a real bypass since the old code did a raw self.annotation.document FK read with no structural-set handling at all.
  • resolve_structural_document_fallback requiring corpus_id: returning None rather than guessing across corpora when there's no corpus_id to scope against is the right conservative call for a security fix — avoids resurrecting the "arbitrary member of the shared set" bug in a different shape.

Minor / non-blocking

  • Style inconsistency: structural_document_prefetch filters with Document.objects.visible_to_user(user) directly, while the two new fallback methods use cls.filter_visible_qs(...) (the BaseService helper). Both are fine functionally (this file is the service layer itself, not config/graphql/, so it's outside the scope of the opencontracts.E001 inline-Tier-0 check), but going through filter_visible_qs/filter_visible consistently in the same file would read more uniformly and match the pattern the CLAUDE.md service-layer guidance recommends elsewhere in this class.
  • No direct unit test for resolve_owned_document's corpus-shared-but-not-document-shared edge case: the two new tests (test_unscoped_structural_resolution_skips_private_shared_documents, test_corpus_scoped_prefetch_skips_private_shared_documents) thoroughly cover the structural leak this PR targets, including a query-count regression guard. There's no equivalent test that drives the plain document_id fallback (resolve_owned_document) through a scenario where a document is visible via corpus sharing but not direct document permission — worth a follow-up test if that fallback is ever expected to be hit by a real caller (today it looks unreachable from the two production call sites since both select_related("document")).
  • Semantic note for future readers: Document.objects.visible_to_user(user) is direct-document-permission semantics only (creator/public/direct guardian grant) — it does not independently re-derive corpus-level sharing the way CorpusDocumentService.get_corpus_documents_visible_to_user does. That's fine here because everything upstream of these resolvers already gates on document-level user_can/visible_to_user (confirmed in AnnotationService.get_corpus_annotations and _compute_effective_permissions), so there's no regression path where a user who can see an Annotation only via corpus access would now see document: null. Just worth being explicit about, since "corpus-as-gate vs MIN" is a documented sharp edge in this codebase (issue Reconcile CorpusObjsService corpus-as-gate vs GraphQL MIN-permission semantic #1682).

Test coverage

Good — the added tests specifically construct a document that sorts first alphabetically in an unscoped/unfiltered prefetch (aaa-private-* slugs) to prove the fix isn't accidentally passing due to ordering, and the query-count delta assertion is a solid regression guard against the _prefetched_objects_cache detection silently degrading to N+1.

Nits

  • The changelog fragment (changelog.d/2089-structural-annotation-visibility.security.md) is thorough and correctly typed as security. Consider trimming it — it currently reads more like an engineering changelog than a user-facing security advisory bullet, though this is a matter of house style preference.

Overall: solid fix, correctly reasoned through multiple review rounds, no blocking issues found.

AnnotationService.resolve_owned_document and
resolve_structural_document_fallback only run when a caller fetches
Annotation without select_related("document") / the structural
prefetch — every production query path always applies both, so these
defensive fallbacks were unreachable from the existing GraphQL-level
tests and codecov/patch failed at 41.67%. Exercises both methods
directly and through AnnotationType.resolve_document with
deliberately uncached/unprefetched Annotation instances.
@claude

claude Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review: Fix structural annotation document visibility

This is a well-scoped, well-tested security fix. I traced the full call graph (both GraphQL entry points, the service layer, and the new/updated tests) and the fix is architecturally sound.

What's good

  • Correctly identifies and closes the leak: StructuralAnnotationSet is content-hash deduplicated across documents and owners, so an unscoped .documents.first() could return another user's private document. Gating both the prefetch (AnnotationService.structural_document_prefetch, now requiring a keyword-only user) and the DB fallback (resolve_structural_document_fallback) through Document.objects.visible_to_user(user) / BaseService.filter_visible_qs closes this correctly.
  • SemanticSearchResultType.resolve_document (config/graphql/social_types.py) previously returned self.annotation.document via a raw FK with zero visibility check — a second, independent leak point. Delegating to AnnotationType.resolve_document closes it and also picks up structural resolution for free. I verified this delegation works correctly even though a Django model instance (self.annotation) is passed where a graphene type instance would normally be self — that's safe because graphene-django resolvers already receive the underlying model instance as root, so self._meta.get_field(...) resolves against Django's model _meta, not graphene's type meta, in both call sites.
  • Verified the visibility-inheritance assumption holds: the new "trust the cached FK when document_field.is_cached(self)" fast path for non-structural annotations relies on Annotation.objects.visible_to_user() already inheriting document/corpus permission (per AnnotationQuerySet.visible_to_user, confirmed via opencontractserver/annotations/migrations/0077_...). I confirmed both GraphQL callers (resolve_annotations in annotation_queries.py, and the defensive re-fetch in resolve_semantic_search in search_queries.py) unconditionally select_related("document") on every branch, so the "trust the cache" fast path is never reachable with an unfiltered FK in production. Good.
  • Fallback behavior is a genuine improvement over the pre-fix code, not just equivalent: previously a structural annotation with no corpus_id would resolve via unscoped documents.order_by("slug").first() across the entire shared set — now resolve_structural_document_fallback returns None outright when there's no corpus to scope against, rather than guessing. Nice defensive tightening.
  • Test coverage is thorough: unscoped-browse leak, corpus-scoped-prefetch leak (with a private_doc_in_a deliberately sorting first via an "aaa-" slug to catch a silently-broken filter), a query-count-delta assertion guarding against the _prefetched_objects_cache detection silently degrading into an N+1 fallback, plus direct unit coverage of the two new AnnotationService methods and the uncached-FK path. The commit history shows this was already iterated through two review rounds (dead-code question on is_cached(), unreachable-fallback coverage triggering low codecov/patch) — good process.
  • Follows repo conventions well: routes through the service layer (opencontractserver/annotations/services/annotation_service.py) per the CLAUDE.md service-layer rule, uses BaseService.filter_visible_qs/Document.objects.visible_to_user rather than ad hoc composition, and ships a changelog.d/ fragment with the security type tag.

Minor / non-blocking observations

  1. document_field.is_cached(self) relies on private-ish Django internals (FieldCacheMixin/instance._state.fields_cache). The PR description says this was verified against pinned Django 5.2.15 source plus an empirical test, and the code comment documents the reasoning clearly — I'd just flag that a Django upgrade could theoretically change this behavior. The existing _prefetched_objects_cache check a few lines down carries the identical trade-off (and is explicitly noted as already-accepted precedent in extract_types.py::resolve_document_count), and the new query-count-delta test would catch a silent regression in either detection, so this seems adequately guarded.

  2. Out of scope for this PR, but related: opencontractserver/mcp/tools.py::search_corpus (~line 476) resolves structural passages' document slug/title via Document.objects.filter(structural_annotation_set_id__in=..., path_records__corpus_id=corpus.id, ...) without .visible_to_user(user) — only the corpus itself is visibility-checked. Per CLAUDE.md this may be intentional "corpus-as-gate" semantics (documented as the default for MCP/pipeline callers), but since it's the same code shape as the bug fixed here (structural-set → document resolution without a per-document visibility gate), it might be worth a follow-up confirmation that this is deliberate rather than the same class of leak in a different resolver. Not something this PR needs to fix.

  3. Small nit: AnnotationService.resolve_owned_document / resolve_structural_document_fallback don't accept/forward request=info.context. I checked BaseService.filter_visible_qs and confirmed request is intentionally not consumed there (it's only meaningful for the Tier-2 single-object cache used by get_or_none/user_can), so this isn't a real gap — just noting I checked it since CLAUDE.md calls out the request=info.context convention for GraphQL resolvers.

No blocking issues found. The security fix is correct, the regression tests meaningfully exercise both the leak and the N+1 risk introduced by the fix, and the code follows the project's established permissioning patterns.

@JSv4 JSv4 merged commit 0d2d1a7 into main Jul 1, 2026
13 checks passed
@JSv4 JSv4 deleted the codex/fix-vulnerability-in-structural-annotations branch July 1, 2026 06:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant