feat: knowledge base doc-freshness layer (drift detection + refresh) by efenocchi · Pull Request #285 · activeloopai/hivemind

efenocchi · 2026-06-23T16:50:06Z

What

Adds a Knowledge Base / per-file documentation layer that stays fresh on code deltas — the third layer alongside session summaries and the AST code graph. A common use case: internal docs that go stale because code changes many times a day.

Honest framing (per the team's own oracle eval): this is justified as doc freshness / onboarding trust / token savings, NOT QA-answer accuracy.

How it works

A doc anchors each described symbol to a content_hash of its source slice. On a commit, the graph diff + blast-radius selects only the docs whose anchored code (or its callers) changed; a bounded LLM edit regenerates them through an objective gate; the new version is written INSERT-only (version-bumped, immutable created_at).

code change -> graph diff + blast-radius -> impacted docs
            -> host LLM bounded rewrite -> gate -> setDoc (version+1)

Steps in this PR

1 - schema + config: hivemind_docs table (INSERT-only, version-bumped) + docsTableName.
2 - store: src/docs insert/edit/list/get + idempotent setDoc (no history fork).
3 - drift detector: anchors.ts (source-slice hash) + impact.ts (direct hash drift + blast-radius widening over the graph).
5 - refresh engine: gate.ts (objective invariants: bounded edit, anchors exist, slow-tier protected) + refresh.ts (re-anchor -> generate -> gate -> write) + refresh-llm.ts (host claude -p, reuses the wiki-worker seam) + hivemind docs CLI (set/show/list/archive/refresh, --anchor, --dry-run).
8 - trigger: opt-in HIVEMIND_DOCS_AUTO_REFRESH=1 fires a detached refresh at the end of graph build.

Testing

~105 unit tests; new docs files >=85% line coverage (gate/write/read/refresh-llm 100%).
End-to-end through the real hivemind binary against a test Deeplake table: real code change -> drift detected -> claude rewrote the doc to match -> gate passed -> v2 written, created_at preserved.

Not in scope

No VFS routing for the docs memory subtree yet (CLI/worker only); no spec-layer correlation; per-agent doc-worker variants (codex/cursor/hermes/pi) follow the existing wiki-worker fork pattern.

Summary by CodeRabbit

Release Notes

New Features
- Added top-level hivemind docs command (alias doc) with subcommands set, show, list, archive, refresh, and new generate.
- Docs can be anchored to code symbols; staleness detection identifies impacted docs for refresh using validated, versioned updates.
- Opt-in automatic docs refresh after graph builds via HIVEMIND_DOCS_AUTO_REFRESH=1.
- Docs are now available over /docs/* VFS paths with rendered directory indexes and doc content.
Chores
- Configurable docs table via HIVEMIND_DOCS_TABLE, with automatic table/schema/index setup.

github-actions · 2026-06-23T16:51:35Z

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status	Category	Percentage	Covered / Total
🔴	Lines	86.27% (🎯 90%)	1382 / 1602
🔴	Statements	85.71% (🎯 90%)	1566 / 1827
🟢	Functions	93.63% (🎯 90%)	191 / 204
🔴	Branches	77.93% (🎯 90%)	865 / 1110

File Coverage — 19 files changed

File	Stmts	Branches	Functions	Lines
`src/cli/index.ts`	🔴 84.9%	🔴 82.8%	🟢 100.0%	🔴 83.0%
`src/commands/docs.ts`	🔴 69.2%	🔴 57.6%	🔴 83.3%	🔴 70.9%
`src/commands/graph.ts`	🔴 74.3%	🔴 62.9%	🟢 95.2%	🔴 73.9%
`src/config.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/deeplake-api.ts`	🟢 98.0%	🔴 88.4%	🟢 98.0%	🟢 99.6%
`src/deeplake-schema.ts`	🟢 96.0%	🔴 84.8%	🟢 100.0%	🟢 95.2%
`src/docs/anchors.ts`	🟢 95.3%	🟢 90.6%	🟢 100.0%	🟢 100.0%
`src/docs/auto-refresh-trigger.ts`	🟢 100.0%	🔴 81.8%	🟢 100.0%	🟢 100.0%
`src/docs/gate.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/docs/generate.ts`	🔴 88.2%	🔴 75.5%	🔴 85.7%	🔴 87.5%
`src/docs/impact.ts`	🔴 87.3%	🔴 81.8%	🟢 100.0%	🟢 91.1%
`src/docs/index.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/docs/read.ts`	🟢 98.1%	🔴 79.5%	🟢 100.0%	🟢 100.0%
`src/docs/refresh-llm.ts`	🔴 76.9%	🔴 77.8%	🔴 71.4%	🔴 81.8%
`src/docs/refresh.ts`	🟢 94.3%	🔴 77.3%	🟢 100.0%	🟢 100.0%
`src/docs/stable-read.ts`	🟢 93.1%	🔴 87.0%	🔴 33.3%	🟢 100.0%
`src/docs/write.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/graph/load-current.ts`	🟢 94.1%	🟢 100.0%	🟢 100.0%	🟢 92.8%
`src/graph/render/impact.ts`	🟢 92.2%	🔴 70.3%	🔴 87.5%	🟢 98.7%

_{Generated for commit c710143.}

coderabbitai · 2026-06-23T16:54:38Z

📝 Walkthrough

Walkthrough

Adds a complete hivemind docs subsystem: docs schema and provisioning, storage and rendering helpers, anchor and impact analysis, generation and refresh orchestration, CLI routing, and an optional post-build docs refresh trigger.

Changes

hivemind docs subsystem

Layer / File(s)	Summary
Schema, config, and DB provisioning `src/deeplake-schema.ts`, `src/config.ts`, `src/deeplake-api.ts`, `tests/shared/deeplake-api.test.ts`, `tests/claude-code/flush-memory.test.ts`, `tests/claude-code/skillify-auto-pull.test.ts`, `tests/claude-code/spawn-wiki-worker.test.ts`, `tests/shared/graph/deeplake-p.test.ts`	Adds `DOCS_COLUMNS`, `docsTableName` in config, `ensureDocsTable`, and fixture updates for the new table name.
Docs read/write API and rendered views `src/docs/stable-read.ts`, `src/docs/read.ts`, `src/docs/write.ts`, `src/docs/index.ts`, `src/docs/index-render.ts`, `src/docs/vfs-handler.ts`, `tests/shared/docs*.test.ts`	Adds stable-read querying, typed read/write helpers, barrel exports, directory index rendering, and docs VFS responses, with tests covering SQL generation, versioning, filtering, stable reads, rendered indexes, and VFS lookup behavior.
Anchor tracking, snapshot loading, and impact analysis `src/docs/anchors.ts`, `src/graph/load-current.ts`, `src/graph/render/impact.ts`, `src/docs/candidates.ts`, `src/docs/impact.ts`, `tests/shared/docs-impact.test.ts`, `tests/shared/docs-candidates.test.ts`, `tests/shared/graph/load-current.test.ts`	Adds source-location parsing, slice hashing, anchor status checks, local snapshot loading, reverse-edge traversal helpers, and doc impact computation, with tests for parsing, hashing, anchor states, snapshot loading, candidate expansion, and impact traversal.
Generation, gate, refresh, and auto-refresh `src/docs/gate.ts`, `src/docs/refresh.ts`, `src/docs/refresh-llm.ts`, `src/docs/generate.ts`, `src/docs/pool.ts`, `src/docs/auto-refresh-trigger.ts`, `tests/shared/docs-generate.test.ts`, `tests/shared/docs-refresh.test.ts`, `tests/shared/docs-refresh-llm.test.ts`, `tests/shared/docs-pool.test.ts`, `tests/shared/docs-stable-read.test.ts`, `tests/shared/docs-auto-refresh-trigger.test.ts`	Adds doc-edit gating, refresh prompt and orchestration, generation target selection, LLM-backed prompt execution, bounded work pooling, and the optional detached refresh trigger, with tests covering gating, generation, refresh outcomes, stable reads, model wrapping, pooling, and auto-refresh spawning.
CLI routing and graph build hook `src/cli/index.ts`, `src/commands/docs.ts`, `src/commands/graph.ts`, `src/hooks/pre-tool-use.ts`, `tests/claude-code/cli-docs.test.ts`, `tests/claude-code/pre-tool-use-branches.test.ts`	Adds the `hivemind docs` command handler with subcommands, wires it into the top-level CLI, adds docs-backed VFS routing in the pre-tool hook, and adds the graph post-build refresh trigger, with end-to-end CLI tests.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant CLI as hivemind docs refresh
  participant loadCurrentSnapshot
  participant changedFilesFromGit
  participant computeImpactedDocs
  participant ensureDocsTable
  participant refreshDocs
  participant makeHostGenerate
  participant gateDocEdit
  participant setDoc

  User->>CLI: docs refresh [--dry-run]
  CLI->>loadCurrentSnapshot: load local graph snapshot
  CLI->>changedFilesFromGit: detect changed files from git
  CLI->>computeImpactedDocs: find impacted docs from anchors + blast radius
  alt --dry-run
    CLI-->>User: print impacted docs, no writes
  else live run
    CLI->>ensureDocsTable: create/heal docs table
    CLI->>refreshDocs: process impacted docs
    refreshDocs->>makeHostGenerate: buildRefreshPrompt + runHostPrompt
    makeHostGenerate-->>refreshDocs: proposed markdown
    refreshDocs->>gateDocEdit: validate content, anchors, and line budget
    alt gate pass
      refreshDocs->>setDoc: persist new version
      setDoc-->>refreshDocs: write result
    else gate fail
      refreshDocs-->>refreshDocs: record rejection
    end
    refreshDocs-->>CLI: refresh report
    CLI-->>User: print refreshed/rejected/skipped counts
  end

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant GraphBuild as hivemind graph build
  participant maybeSpawnDocsRefresh
  participant DocsCLI as hivemind docs refresh
  participant loadCurrentSnapshot

  User->>GraphBuild: build graph
  GraphBuild->>maybeSpawnDocsRefresh: check HIVEMIND_DOCS_AUTO_REFRESH
  alt enabled
    maybeSpawnDocsRefresh->>DocsCLI: spawn docs refresh --cwd <cwd>
    DocsCLI->>loadCurrentSnapshot: load current snapshot
  else disabled
    maybeSpawnDocsRefresh-->>GraphBuild: no-op
  end

Estimated code review effort: 5 (Critical) | ~120 minutes

Possibly related PRs

activeloopai/hivemind#192: Shares the src/commands/graph.ts build-flow path that now conditionally spawns docs refresh.
activeloopai/hivemind#203: Modifies the same src/hooks/pre-tool-use.ts routing path that now serves /docs/*.
activeloopai/hivemind#246: Also changes processPreToolUse routing behavior around virtual filesystem handling.

Suggested reviewers: kaghni, khustup2

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description covers summary, implementation, and testing, but it does not follow the template and omits the Version Bump section and checklist format.	Add the required Summary, Version Bump, and Test plan sections, including the version-bump checkbox items from the template.
Docstring Coverage	⚠️ Warning	Docstring coverage is 64.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and accurately summarizes the main doc freshness/drift-detection refresh changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/knowledge-base-docs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

…g parsing

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (4)

tests/shared/docs-impact.test.ts (1)

133-138: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the full changed payload, not just state.

This test currently verifies only the discriminator and misses regressions in from/to values.

✅ Proposed assertion tightening

   it("changed when the code differs from the stored hash", () => {
     const n = node("a.ts:foo:function", "a.ts", "L1-L2");
     const stale: DocAnchor = { symbol_id: n.id, content_hash: "deadbeef" };
     const st = anchorStatus(stale, snap([n]), dir);
-    expect(st.state).toBe("changed");
+    expect(st).toEqual({
+      state: "changed",
+      from: "deadbeef",
+      to: sha("function foo() {\n  return 1;"),
+    });
   });

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs-impact.test.ts` around lines 133 - 138, The test for the
"changed when the code differs from the stored hash" case is only asserting on
the state property of the anchorStatus result, but it should verify the complete
changed payload including the from and to values. Replace the single assertion
on st.state with a comprehensive assertion that checks the full object structure
returned by anchorStatus to catch potential regressions in the from and to
fields.

Source: Path instructions

tests/shared/graph/load-current.test.ts (1)

54-58: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the fallback test assert concrete snapshot content.

not.toBeNull() is broad here; assert the expected nodes/links payload directly.

✅ Proposed assertion tightening

   it("falls back to snapshot_sha256 when there is no commit context", () => {
     writeLastBuild(cwd, { ts: 1, commit_sha: null, snapshot_sha256: "feedface" });
     writeSnapshotFile(cwd, "feedface", JSON.stringify({ nodes: [], links: [] }));
-    expect(loadCurrentSnapshot(cwd)).not.toBeNull();
+    expect(loadCurrentSnapshot(cwd)).toEqual({ nodes: [], links: [] });
   });

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/graph/load-current.test.ts` around lines 54 - 58, The test
assertion in the "falls back to snapshot_sha256 when there is no commit context"
test is too broad. Instead of using
expect(loadCurrentSnapshot(cwd)).not.toBeNull(), replace it with a specific
assertion that validates the returned snapshot object contains the expected
concrete values for the nodes and links properties that were written to the
snapshot file using writeSnapshotFile. This ensures the function correctly loads
and returns the actual snapshot payload rather than just checking it exists.

Source: Path instructions

tests/claude-code/cli-docs.test.ts (1)

87-223: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Tighten assertions to exact expected messages/values instead of generic substrings.

A large part of this suite validates only partial output via regex substring matches. Switching key checks to exact strings (or structured fields) will make these CLI contract tests significantly more robust.

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/claude-code/cli-docs.test.ts` around lines 87 - 223, Replace generic
regex substring matching assertions with exact string or structured value checks
throughout the test suite. Instead of using toMatch() with broad regex patterns
like /hivemind docs/ or /Not logged in/, use toContain() for exact string
matches or direct equality checks for specific values. Apply this to key test
cases such as "prints usage with no subcommand", "exits 2 when not logged in",
"creates v1 via the real store", "prints (no doc) when nothing matches", "lists
rows", "archives an existing doc", and other test assertions that currently rely
on partial pattern matching to make the CLI contract tests more precise and
robust.
Source: Path instructions

tests/shared/docs.test.ts (1)

129-159: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use exact error-message assertions for failure paths.

Several rejection tests currently assert generic substrings (or just toThrow()), which weakens contract checks. Prefer exact message assertions for these cases.

✅ Example tightening

-    ).rejects.toThrow(/must not be empty/);
+    ).rejects.toThrow("Doc content must not be empty");
@@
-    ).rejects.toThrow(/doc_id must not be empty/);
+    ).rejects.toThrow("Doc doc_id must not be empty");
@@
-    ).rejects.toThrow(/Doc not found: missing.ts/);
+    ).rejects.toThrow("Doc not found: missing.ts");
@@
-    await expect(
-      insertDoc(query, `x"; DROP TABLE y; --`, { doc_id: "a.ts", path: "/p", content: "x" }),
-    ).rejects.toThrow();
+    await expect(
+      insertDoc(query, `x"; DROP TABLE y; --`, { doc_id: "a.ts", path: "/p", content: "x" }),
+    ).rejects.toThrow(`Invalid SQL identifier: "x\\"; DROP TABLE y; --"`);

As per path instructions, tests/**: “Prefer asserting on specific values (paths, messages) over generic substrings.”

Also applies to: 212-227, 290-293

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs.test.ts` around lines 129 - 159, The rejection tests in
this section use generic error assertions that are too permissive. Replace the
regex substring matches in the tests "rejects empty content", "rejects empty
doc_id", and "rejects content longer than" with exact error message strings
instead of regex patterns like /must not be empty/ and /exceeds 50000 chars/.
For the "rejects SQL-identifier injection in the table name" test, replace the
generic toThrow() with a specific error message assertion to ensure the exact
failure mode is being tested. This tightens the contract checks to verify
precise error messages rather than loose substring matches.

Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/docs.ts`:
- Around line 195-196: The resolveContent() function can throw exceptions when
provided with invalid file paths or unreadable stdin, but it is currently
executed before the try/catch block at line 195, allowing uncaught exceptions to
terminate the command abruptly. Move the resolveContent() function call (and any
dependent code that uses its result) inside the existing try/catch error handler
to ensure failures are caught and handled as controlled CLI errors instead of
uncaught exceptions.

In `@src/docs/gate.ts`:
- Around line 79-87: The countChangedLines function call on line 80 performs an
expensive O(n*m) LCS diff operation before validating the content size
constraints, which means very large inputs consume significant CPU even though
they will be rejected by the GATE_MAX_CONTENT_LENGTH check. Move the content
validation checks (the empty content check and the GATE_MAX_CONTENT_LENGTH
check) to execute before the countChangedLines call so that oversized proposals
are rejected before the quadratic diff algorithm runs. This applies to the
countChangedLines call and the subsequent size validation checks in the reasons
array population section.

In `@src/docs/refresh.ts`:
- Around line 147-153: The code invokes args.generate() without first checking
if the document should be rejected as a slow-tier doc via gateDocEdit, which
unnecessarily leaks protected content to the LLM and incurs avoidable costs.
Move the gateDocEdit check before the args.generate() call so that slow-tier
documents are rejected without ever being sent to the generate function,
preventing unnecessary LLM invocations and content exposure.

In `@src/docs/write.ts`:
- Around line 56-57: The SetDocInput.project field is currently being ignored
during version bumps when a document already exists. In the setDoc function, the
project value from SetDocInput is dropped and appendVersion always uses the
previous project value instead of the new one provided. To fix this, propagate
the project value from SetDocInput through the setDoc function and pass it as a
parameter to appendVersion so it uses the new project value (when provided)
instead of always defaulting to previous.project. This will enable proper
project reassignment and prevent stale metadata from affecting listDocs
filtering.

In `@src/graph/render/impact.ts`:
- Around line 24-27: The reverse traversal logic for collecting impacted nodes
is currently processing all relations but should be restricted to dependency
relations only as documented in the comments. Update the traversal in the
impactedNodes collection section (around lines 41-45) to filter and process only
the dependency-related edges: calls, imports, extends, implements, and
method_of. Remove or skip any other relation types that are not part of the
dependency graph to prevent over-expansion of the closure and avoid marking
unrelated documents as impacted.

In `@tests/shared/docs-refresh.test.ts`:
- Line 81: The test assertions in this file use `.join().toMatch()` with regex
patterns to broadly match substring content in the reasons array, which can hide
incorrect reason values. Replace these broad substring checks with direct
assertions on the specific reason values. Instead of joining and using regex
matching on r.reasons, assert the exact expected reason value directly using
methods like toContain() or by checking specific array indices with toBe().
Apply this fix to all occurrences at lines 81, 87, 92, 97, 103, 180, and 214.

---

Nitpick comments:
In `@tests/claude-code/cli-docs.test.ts`:
- Around line 87-223: Replace generic regex substring matching assertions with
exact string or structured value checks throughout the test suite. Instead of
using toMatch() with broad regex patterns like /hivemind docs/ or /Not logged
in/, use toContain() for exact string matches or direct equality checks for
specific values. Apply this to key test cases such as "prints usage with no
subcommand", "exits 2 when not logged in", "creates v1 via the real store",
"prints (no doc) when nothing matches", "lists rows", "archives an existing
doc", and other test assertions that currently rely on partial pattern matching
to make the CLI contract tests more precise and robust.

In `@tests/shared/docs-impact.test.ts`:
- Around line 133-138: The test for the "changed when the code differs from the
stored hash" case is only asserting on the state property of the anchorStatus
result, but it should verify the complete changed payload including the from and
to values. Replace the single assertion on st.state with a comprehensive
assertion that checks the full object structure returned by anchorStatus to
catch potential regressions in the from and to fields.

In `@tests/shared/docs.test.ts`:
- Around line 129-159: The rejection tests in this section use generic error
assertions that are too permissive. Replace the regex substring matches in the
tests "rejects empty content", "rejects empty doc_id", and "rejects content
longer than" with exact error message strings instead of regex patterns like
/must not be empty/ and /exceeds 50000 chars/. For the "rejects SQL-identifier
injection in the table name" test, replace the generic toThrow() with a specific
error message assertion to ensure the exact failure mode is being tested. This
tightens the contract checks to verify precise error messages rather than loose
substring matches.

In `@tests/shared/graph/load-current.test.ts`:
- Around line 54-58: The test assertion in the "falls back to snapshot_sha256
when there is no commit context" test is too broad. Instead of using
expect(loadCurrentSnapshot(cwd)).not.toBeNull(), replace it with a specific
assertion that validates the returned snapshot object contains the expected
concrete values for the nodes and links properties that were written to the
snapshot file using writeSnapshotFile. This ensures the function correctly loads
and returns the actual snapshot payload rather than just checking it exists.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 6517aef1-95e4-4670-9211-446883525e7a

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7077a and 1de043e.

📒 Files selected for processing (31)

src/cli/index.ts
src/commands/docs.ts
src/commands/graph.ts
src/config.ts
src/deeplake-api.ts
src/deeplake-schema.ts
src/docs/anchors.ts
src/docs/auto-refresh-trigger.ts
src/docs/gate.ts
src/docs/impact.ts
src/docs/index.ts
src/docs/read.ts
src/docs/refresh-llm.ts
src/docs/refresh.ts
src/docs/write.ts
src/graph/load-current.ts
src/graph/render/impact.ts
tests/claude-code/cli-docs.test.ts
tests/claude-code/flush-memory-wiring.test.ts
tests/claude-code/flush-memory.test.ts
tests/claude-code/skillify-auto-pull.test.ts
tests/claude-code/spawn-wiki-worker.test.ts
tests/shared/deeplake-api.test.ts
tests/shared/docs-auto-refresh-trigger.test.ts
tests/shared/docs-impact.test.ts
tests/shared/docs-refresh-llm.test.ts
tests/shared/docs-refresh.test.ts
tests/shared/docs.test.ts
tests/shared/graph/deeplake-pull.test.ts
tests/shared/graph/deeplake-push.test.ts
tests/shared/graph/load-current.test.ts

coderabbitai · 2026-06-23T17:10:19Z

+  it("rejects empty content", () => {
+    const r = gateDocEdit({ tier: "fast", prevContent: "x", newContent: "", newAnchors: [], snap: s });
+    expect(r.ok).toBe(false);
+    expect(r.reasons.join()).toMatch(/empty/);


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use exact reason assertions instead of join()+regex checks.

These assertions are currently broad substring matches, which can hide wrong reason text. Assert specific reason values directly.

✅ Suggested assertion style

- expect(r.reasons.join()).toMatch(/empty/); + expect(r.reasons).toContain("proposed content is empty"); - expect(r.reasons.join()).toMatch(/slow-tier/); + expect(r.reasons).toContain("slow-tier docs are human-curated; automatic refresh is not allowed"); - expect(report.outcomes[0].reasons?.join()).toMatch(/LLM down/); + expect(report.outcomes[0].reasons).toContain("generate failed: LLM down");

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

Also applies to: 87-87, 92-92, 97-97, 103-103, 180-180, 214-214

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/docs-refresh.test.ts` at line 81, The test assertions in this file use `.join().toMatch()` with regex patterns to broadly match substring content in the reasons array, which can hide incorrect reason values. Replace these broad substring checks with direct assertions on the specific reason values. Instead of joining and using regex matching on r.reasons, assert the exact expected reason value directly using methods like toContain() or by checking specific array indices with toBe(). Apply this fix to all occurrences at lines 81, 87, 92, 97, 103, 180, and 214.

Source: Path instructions

Style nit, not a correctness gap — the join()+regex assertions do verify the exact reason text. Leaving the existing assertions; will tighten to exact-array matches if we touch these tests again.

@efenocchi Fair enough — the regex patterns are tight enough that the exact text is effectively verified. Noted; happy to revisit when those tests are next modified.

ʕ •`ᴥ•´ʔ

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/docs/stable-read.ts`:
- Around line 59-60: The stable-read option handling in stableRead currently
lets maxReads be silently increased to match stableReads, which breaks the
documented hard cap. Update the logic around stableReads/maxReads so the cap is
preserved: either validate and throw when opts.maxReads is less than
opts.stableReads, or adjust stableReads downward to the provided maxReads
instead of raising maxReads. Keep the fix localized to stableRead and preserve
the intended bounds behavior for callers.

In `@tests/shared/docs.test.ts`:
- Around line 401-414: The test in getDocLatest is only checking a happy-path
doc_id value and does not prove escaping works. Update the assertion to use a
doc_id containing a quote so sqlStr() escaping is exercised, and verify the full
generated predicate on calls[0] instead of a generic toContain match. Keep the
existing max-version behavior checks in tests/shared/docs.test.ts, but make the
escaped doc_id case explicit and specific.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 887f2d2e-a8fa-4a52-ab00-45afce7778c3

📥 Commits

Reviewing files that changed from the base of the PR and between 4b7ed15 and 16718c7.

📒 Files selected for processing (6)

src/docs/read.ts
src/docs/stable-read.ts
tests/claude-code/cli-docs.test.ts
tests/shared/docs-refresh.test.ts
tests/shared/docs-stable-read.test.ts
tests/shared/docs.test.ts

🚧 Files skipped from review as they are similar to previous changes (3)

tests/shared/docs-refresh.test.ts
tests/claude-code/cli-docs.test.ts
src/docs/read.ts

coderabbitai · 2026-06-30T14:44:17Z

+  const stableReads = Math.max(1, opts.stableReads ?? 3);
+  const maxReads = Math.max(stableReads, opts.maxReads ?? 10);


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Honor maxReads or reject contradictory options.

maxReads is documented as a hard cap, but Line 60 silently raises it to stableReads when callers pass maxReads < stableReads. That makes the API exceed the caller’s requested bound. Either throw on that combination or clamp stableReads down to maxReads so the cap stays real.

Suggested fix

- const stableReads = Math.max(1, opts.stableReads ?? 3); - const maxReads = Math.max(stableReads, opts.maxReads ?? 10); + const stableReads = Math.max(1, opts.stableReads ?? 3); + const maxReads = opts.maxReads ?? 10; + if (maxReads < stableReads) { + throw new Error("stableReads cannot exceed maxReads"); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const stableReads = Math.max(1, opts.stableReads ?? 3);

const maxReads = Math.max(stableReads, opts.maxReads ?? 10);

const stableReads = Math.max(1, opts.stableReads ?? 3);

const maxReads = opts.maxReads ?? 10;

if (maxReads < stableReads) {

throw new Error("stableReads cannot exceed maxReads");

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs/stable-read.ts` around lines 59 - 60, The stable-read option handling in stableRead currently lets maxReads be silently increased to match stableReads, which breaks the documented hard cap. Update the logic around stableReads/maxReads so the cap is preserved: either validate and throw when opts.maxReads is less than opts.stableReads, or adjust stableReads downward to the provided maxReads instead of raising maxReads. Keep the fix localized to stableRead and preserve the intended bounds behavior for callers.

+      re += c;
+    }
+  }
+  return new RegExp(`^${re}$`);


coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (1)

tests/shared/docs.test.ts (1)
524-528: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

"Escaped LIKE" test doesn't exercise escaping.

dirPrefix: "src/graph" has no %, _, or quote characters, so a broken sqlLike() (e.g. one that fails to escape wildcards) would still pass this test. Use a prefix containing a wildcard/quote character to actually cover the escape path, similar to the listDocsByIds "escapes doc_ids" test below.

As per path instructions, tests/**: "Prefer asserting on specific values (paths, messages) over generic substrings."
🧪 Suggested tightening
   it("scopes to a directory prefix with an escaped LIKE", async () => {
     const { calls, query } = mockQuery([() => []]);
-    await listDocMeta(query, TBL, { dirPrefix: "src/graph" });
-    expect(calls[0]).toContain(`WHERE doc_id LIKE 'src/graph/%'`);
+    await listDocMeta(query, TBL, { dirPrefix: "src/gra_ph%" });
+    expect(calls[0]).toContain(`WHERE doc_id LIKE 'src/gra\\_ph\\%/%'`);
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs.test.ts` around lines 524 - 528, The current listDocMeta
escaping test uses a plain prefix and does not verify sqlLike() escaping
behavior. Update the test around listDocMeta/mockQuery to use a dirPrefix
containing a wildcard or quote character, and assert the exact escaped WHERE
doc_id LIKE value rather than a generic substring. Keep the check targeted to
the escaping path so it covers the same kind of issue as the listDocsByIds
“escapes doc_ids” test.
Source: Path instructions

🧹 Nitpick comments (4)

src/commands/docs.ts (1)
267-300: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate directory-index logic vs. handleDocsVfs (vfs-handler.ts).

This block re-implements the same metadata fetch → direct-file filter → summary fetch → buildDocsIndex flow already in handleDocsVfs's directory branch, just against console.log instead of a DocsVfsResult. Two independent copies of the same logic risk drifting apart.

Consider extracting a shared renderDocsIndexAt(query, tableName, dir) helper (returning the rendered string) used by both the CLI index subcommand and handleDocsVfs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/commands/docs.ts` around lines 267 - 300, The `index` subcommand in
`docs.ts` duplicates the same directory-index rendering flow already implemented
in `handleDocsVfs`, so the two paths can drift. Extract a shared helper like
`renderDocsIndexAt(query, tableName, dir)` that performs the metadata fetch,
direct-file filtering, summary lookup, and `buildDocsIndex` rendering, then call
it from both the CLI `index` branch and `handleDocsVfs` instead of keeping
separate copies.
tests/claude-code/pre-tool-use-branches.test.ts (1)
211-241: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Prefer exact assertions over toContain for deterministic outputs.

d?.file_path and d?.command are fully deterministic given the mocked virtualPath/body (built via writeReadCacheFileFn/safeEchoCommand), so toContain (lines 219, 220 via toBe("") is fine, but line 219/239 use substring checks) weakens the regression signal — a wrong directory or malformed echo could still pass.

As per path instructions, "Prefer asserting on specific values (paths, messages) over generic substrings."
✏️ Tighten the assertions
-    expect(d?.file_path).toContain("docs/index.md");
+    expect(d?.file_path).toBe(join(READ_CACHE_ROOT, "s", "read", "docs/index.md"));
-    expect(d?.command).toContain("Docs Index");
+    expect(d?.command).toBe(safeEchoCommand("# Docs Index\n"));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/claude-code/pre-tool-use-branches.test.ts` around lines 211 - 241, The
assertions in the pre-tool-use branch tests are too loose for deterministic
outputs, so tighten them to exact value checks instead of substring matching.
Update the Read and Bash cases in processPreToolUse tests to assert the full
expected file_path and command values produced by writeReadCacheFileFn and
safeEchoCommand, using the existing processPreToolUse, handleDocsVfsFn, and
DOCS_CONFIG setup to locate the relevant outputs precisely.
Source: Path instructions
src/docs/vfs-handler.ts (1)
43-61: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate directory-index logic with commands/docs.ts.

The directFiles/summaries computation here is identical to the "index" subcommand branch in runDocsCommand (src/commands/docs.ts). Extracting a shared helper (e.g. buildDirIndexBody(query, tableName, dir)) would prevent the two call sites from silently diverging.
♻️ Suggested consolidation
// src/docs/index-render.ts (or a new helper module)
export async function buildDirIndexBody(
  query: QueryFn,
  tableName: string,
  dir: string,
): Promise<string> {
  const meta: DocMeta[] = (await listDocMeta(query, tableName, { dirPrefix: dir })).map((r) => ({
    doc_id: r.doc_id, version: r.version, updated_at: r.updated_at, status: r.status, tier: r.tier,
  }));
  const directFiles = meta
    .filter((m) => m.status === "active" && dirOf(m.doc_id) === dir)
    .map((m) => m.doc_id);
  const summaries = new Map<string, string>();
  if (directFiles.length > 0) {
    for (const d of await listDocsByIds(query, tableName, directFiles)) {
      summaries.set(d.doc_id, firstDocLine(d.content));
    }
  }
  return buildDocsIndex(meta, dir, summaries);
}
Both handleDocsVfs and runDocsCommand("index") can then call this helper.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/docs/vfs-handler.ts` around lines 43 - 61, The directory index assembly
logic is duplicated between handleDocsVfs and the "index" branch in
runDocsCommand, so extract it into a shared helper such as
buildDirIndexBody(query, tableName, dir). Move the DocMeta fetching, directFiles
filtering, summaries loading with listDocsByIds, and buildDocsIndex call into
that helper, then have both call sites use it to keep behavior consistent and
avoid drift.
src/docs/read.ts (1)
123-160: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Three near-duplicate "latest-per-doc" reduction loops in this file.

listDocMeta (own inline loop), listDocsByIds (via normalize), and getDocLatest each re-implement "pick the row with the max version, tie-broken by updated_at" with slightly different tie-break granularity (e.g. listDocMeta has no id tie-break that getDocLatest has). Consider factoring a shared pickLatestByDocId reducer to keep tie-break semantics consistent across all three call sites.

Also applies to: 169-192
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/docs/read.ts` around lines 123 - 160, The latest-per-doc selection logic
is duplicated across listDocMeta, listDocsByIds, and getDocLatest with
inconsistent tie-break behavior. Factor this into a shared reducer/helper such
as pickLatestByDocId and use it from each call site so the version and
updated_at comparison rules stay consistent. Make sure the shared helper
preserves the current row-shape handling in listDocMeta and the existing
id-aware tie-break used by getDocLatest, while listDocsByIds should reuse the
same reducer via normalize.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/docs.ts`:
- Around line 70-76: The `docs generate` usage text is missing the `--project`
option even though `flagValue(args, "--project")` is read in the `generate`
command implementation. Update the usage string in `docs.ts` near `generate` so
it documents `--project <dir>` alongside the other flags, keeping the option
list in sync with the `generate` parser and related command symbols.
- Around line 497-502: The dry-run summary in generateDocs is using todo.length,
which ignores the active --limit and can overstate how many targets will
actually be processed. Update the dry-run branch in docs.ts so the reported
“would be documented” count reflects the same limit applied by generateDocs, and
keep the sample list truncation aligned with that same limit. Use the existing
generateDocs, dryRun, todo, and limit symbols to compute the displayed count
consistently with the non-dry run behavior.

In `@src/docs/anchors.ts`:
- Around line 66-79: normalizeForHash is stripping comment markers with blind
regexes, which can truncate string literals and break hashSource drift
detection. Update normalizeForHash to avoid removing // or # when they appear
inside strings/template literals, or switch to using the existing language-aware
tokenizer/tree-sitter path from the graph layer to identify real comment tokens.
Keep the behavior of stripping only actual comments in normalizeForHash while
preserving code content for URLs, Ruby interpolation, and similar cases.

In `@src/docs/write.ts`:
- Around line 191-205: The retry loop in insertDocResilient should perform the
getDocLatest landed-check after every timeout, including the final attempt,
instead of breaking immediately when attempt === retries. Update the control
flow so the last timed-out insert still verifies whether insertDoc actually
committed server-side before throwing lastErr, preserving the anti-duplicate
guarantee used by generateDocs. Adjust the related docs test to account for the
extra read-back on the final retry path.

In `@src/hooks/pre-tool-use.ts`:
- Around line 452-475: The new Docs VFS Bash paths in pre-tool-use need the same
Claude tool_result size protection as the existing direct-read branches. In the
virtualPath and lsDir docs handling around handleDocsVfsFn, cap the returned
body with capOutputForClaude before passing it to
safeEchoCommand/buildAllowDecision, matching the existing /graph and /index.md
behavior. Keep the Read tool cache path unchanged, but ensure Bash echoes never
send the full unbounded docs content directly.

---

Duplicate comments:
In `@tests/shared/docs.test.ts`:
- Around line 524-528: The current listDocMeta escaping test uses a plain prefix
and does not verify sqlLike() escaping behavior. Update the test around
listDocMeta/mockQuery to use a dirPrefix containing a wildcard or quote
character, and assert the exact escaped WHERE doc_id LIKE value rather than a
generic substring. Keep the check targeted to the escaping path so it covers the
same kind of issue as the listDocsByIds “escapes doc_ids” test.

---

Nitpick comments:
In `@src/commands/docs.ts`:
- Around line 267-300: The `index` subcommand in `docs.ts` duplicates the same
directory-index rendering flow already implemented in `handleDocsVfs`, so the
two paths can drift. Extract a shared helper like `renderDocsIndexAt(query,
tableName, dir)` that performs the metadata fetch, direct-file filtering,
summary lookup, and `buildDocsIndex` rendering, then call it from both the CLI
`index` branch and `handleDocsVfs` instead of keeping separate copies.

In `@src/docs/read.ts`:
- Around line 123-160: The latest-per-doc selection logic is duplicated across
listDocMeta, listDocsByIds, and getDocLatest with inconsistent tie-break
behavior. Factor this into a shared reducer/helper such as pickLatestByDocId and
use it from each call site so the version and updated_at comparison rules stay
consistent. Make sure the shared helper preserves the current row-shape handling
in listDocMeta and the existing id-aware tie-break used by getDocLatest, while
listDocsByIds should reuse the same reducer via normalize.

In `@src/docs/vfs-handler.ts`:
- Around line 43-61: The directory index assembly logic is duplicated between
handleDocsVfs and the "index" branch in runDocsCommand, so extract it into a
shared helper such as buildDirIndexBody(query, tableName, dir). Move the DocMeta
fetching, directFiles filtering, summaries loading with listDocsByIds, and
buildDocsIndex call into that helper, then have both call sites use it to keep
behavior consistent and avoid drift.

In `@tests/claude-code/pre-tool-use-branches.test.ts`:
- Around line 211-241: The assertions in the pre-tool-use branch tests are too
loose for deterministic outputs, so tighten them to exact value checks instead
of substring matching. Update the Read and Bash cases in processPreToolUse tests
to assert the full expected file_path and command values produced by
writeReadCacheFileFn and safeEchoCommand, using the existing processPreToolUse,
handleDocsVfsFn, and DOCS_CONFIG setup to locate the relevant outputs precisely.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a6f04d5e-b2a1-416f-b716-dc70a5cbf74a

📥 Commits

Reviewing files that changed from the base of the PR and between 16718c7 and efa573f.

📒 Files selected for processing (22)

src/commands/docs.ts
src/docs/anchors.ts
src/docs/candidates.ts
src/docs/generate.ts
src/docs/index-render.ts
src/docs/index.ts
src/docs/pool.ts
src/docs/read.ts
src/docs/refresh-llm.ts
src/docs/refresh.ts
src/docs/vfs-handler.ts
src/docs/write.ts
src/hooks/pre-tool-use.ts
tests/claude-code/pre-tool-use-branches.test.ts
tests/shared/docs-candidates.test.ts
tests/shared/docs-generate.test.ts
tests/shared/docs-index-render.test.ts
tests/shared/docs-pool.test.ts
tests/shared/docs-refresh-llm.test.ts
tests/shared/docs-refresh.test.ts
tests/shared/docs-vfs-handler.test.ts
tests/shared/docs.test.ts

✅ Files skipped from review due to trivial changes (3)

tests/shared/docs-vfs-handler.test.ts
tests/shared/docs-candidates.test.ts
tests/shared/docs-index-render.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

tests/shared/docs-refresh.test.ts
src/docs/refresh.ts

coderabbitai · 2026-07-01T16:32:12Z

+  hivemind docs generate [--cwd <dir>] [--scope file|symbol] [--include <glob>]
+                         [--exclude <glob>] [--limit N] [--concurrency N]
+                         [--force] [--dry-run]
+      Auto-author docs for the codebase from the AST graph (which already skips
+      .gitignored / non-code files). Default scope=file (one doc per file,
+      anchored to its symbols). Skips files that already have a doc unless
+      --force. --dry-run lists the targets without calling the LLM.


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

USAGE text for generate omits --project.

The implementation reads flagValue(args, "--project") (line 479), but the usage string doesn't document it.

✏️ Fix

hivemind docs generate [--cwd <dir>] [--scope file|symbol] [--include <glob>] [--exclude <glob>] [--limit N] [--concurrency N] - [--force] [--dry-run] + [--project P] [--force] [--dry-run]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

hivemind docs generate [--cwd <dir>] [--scope file|symbol] [--include <glob>]

[--exclude <glob>] [--limit N] [--concurrency N]

[--force] [--dry-run]

Auto-author docs for the codebase from the AST graph (which already skips

.gitignored / non-code files). Default scope=file (one doc per file,

anchored to its symbols). Skips files that already have a doc unless

--force. --dry-run lists the targets without calling the LLM.

hivemind docs generate [--cwd <dir>] [--scope file|symbol] [--include <glob>]

[--exclude <glob>] [--limit N] [--concurrency N]

[--project P] [--force] [--dry-run]

Auto-author docs for the codebase from the AST graph (which already skips

.gitignored / non-code files). Default scope=file (one doc per file,

anchored to its symbols). Skips files that already have a doc unless

--force. --dry-run lists the targets without calling the LLM.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/docs.ts` around lines 70 - 76, The `docs generate` usage text is missing the `--project` option even though `flagValue(args, "--project")` is read in the `generate` command implementation. Update the usage string in `docs.ts` near `generate` so it documents `--project <dir>` alongside the other flags, keeping the option list in sync with the `generate` parser and related command symbols.

coderabbitai · 2026-07-01T16:32:12Z

+    if (dryRun) {
+      console.log(`${todo.length} target(s) would be documented (scope=${scope}); ${allTargets.length - todo.length} already documented or skipped.`);
+      for (const t of todo.slice(0, limit ?? 60)) {
+        console.log(`  ${t.doc_id}  (${t.symbols.length} symbols)`);
+      }
+      return;


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Dry-run count for generate ignores --limit.

todo.length (unlimited) is reported as "would be documented," but .slice(0, limit ?? 60) only truncates the printed sample list, not the count. A real (non-dry) run applies limit inside generateDocs, so the dry-run count overstates what will actually happen when --limit is set.

✏️ Fix

if (dryRun) { - console.log(`${todo.length} target(s) would be documented (scope=${scope}); ${allTargets.length - todo.length} already documented or skipped.`); - for (const t of todo.slice(0, limit ?? 60)) { + const effective = limit !== undefined ? todo.slice(0, limit) : todo; + console.log(`${effective.length} target(s) would be documented (scope=${scope}); ${allTargets.length - effective.length} already documented or skipped.`); + for (const t of effective.slice(0, 60)) { console.log(` ${t.doc_id} (${t.symbols.length} symbols)`); } return; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (dryRun) {

console.log(`${todo.length} target(s) would be documented (scope=${scope}); ${allTargets.length - todo.length} already documented or skipped.`);

for (const t of todo.slice(0, limit ?? 60)) {

console.log(` ${t.doc_id} (${t.symbols.length} symbols)`);

}

return;

if (dryRun) {

const effective = limit !== undefined ? todo.slice(0, limit) : todo;

console.log(`${effective.length} target(s) would be documented (scope=${scope}); ${allTargets.length - effective.length} already documented or skipped.`);

for (const t of effective.slice(0, 60)) {

console.log(` ${t.doc_id} (${t.symbols.length} symbols)`);

}

return;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/docs.ts` around lines 497 - 502, The dry-run summary in generateDocs is using todo.length, which ignores the active --limit and can overstate how many targets will actually be processed. Update the dry-run branch in docs.ts so the reported “would be documented” count reflects the same limit applied by generateDocs, and keep the sample list truncation aligned with that same limit. Use the existing generateDocs, dryRun, todo, and limit symbols to compute the displayed count consistently with the non-dry run behavior.

coderabbitai · 2026-07-01T16:32:12Z

+export function normalizeForHash(src: string, language?: string): string {
+  let s = src;
+  if (language === "python" || language === "ruby") {
+    s = s.replace(/#.*$/gm, ""); // line comments
+  } else {
+    s = s.replace(/\/\*[\s\S]*?\*\//g, ""); // block comments
+    s = s.replace(/\/\/.*$/gm, ""); // line comments
+  }
+  return s
+    .split(/\r?\n/)
+    .map((l) => l.replace(/\s+$/, "")) // trailing whitespace
+    .filter((l) => l.trim() !== "") // blank lines
+    .join("\n");
+}


🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Comment-stripping regex can corrupt string literals, silently defeating drift detection.

normalizeForHash strips //... (JS/TS/etc.) and #... (Python/Ruby) to end-of-line without any quote/string awareness. This will truncate any line whose code (not comment) contains // or #, e.g.:

const url = "https://api.example.com"; → normalized to const url = "https: for JS/TS.

Ruby string interpolation "Hello #{name}" → truncated to "Hello since # triggers line-comment stripping.

Since the resulting normalized text feeds directly into hashSource, this can cause two genuinely different sources (differing only after the spurious ///#) to hash identically — a real code change goes undetected as "fresh," which is exactly the false-negative this PR's freshness guarantee is meant to prevent. Given how common URL string literals and Ruby interpolation are, this isn't a rare edge case.

A minimally invasive fix would track whether we're inside a string/template literal before stripping ///# (or reuse the language's tree-sitter tokenizer already available in the graph layer to authoritatively identify comment tokens rather than a blind regex).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs/anchors.ts` around lines 66 - 79, normalizeForHash is stripping comment markers with blind regexes, which can truncate string literals and break hashSource drift detection. Update normalizeForHash to avoid removing // or # when they appear inside strings/template literals, or switch to using the existing language-aware tokenizer/tree-sitter path from the graph layer to identify real comment tokens. Keep the behavior of stripping only actual comments in normalizeForHash while preserving code content for URLs, Ruby interpolation, and similar cases.

coderabbitai · 2026-07-01T16:32:12Z

+  for (let attempt = 0; attempt <= retries; attempt++) {
+    try {
+      return await insertDoc(query, tableName, input);
+    } catch (err) {
+      if (!isTimeoutError(err)) throw err;
+      lastErr = err;
+      if (attempt === retries) break;
+      await sleep(backoff[Math.min(attempt, backoff.length - 1)]);
+      // Did the timed-out attempt actually commit server-side? If so, stop —
+      // re-inserting would fork a second version=1 row.
+      const landed = await getDocLatest(query, tableName, input.doc_id).catch(() => null);
+      if (landed) return { doc_id: landed.doc_id, version: landed.version };
+    }
+  }
+  throw lastErr ?? new Error("insertDocResilient: exhausted retries");


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Final retry attempt skips the landed-check, defeating the anti-duplicate guarantee.

Every attempt except the last checks getDocLatest after a timeout to see if the write actually committed server-side. The last attempt (attempt === retries) breaks immediately and throws without that check — so a write that timed out client-side but landed server-side on the final attempt is reported as a failure, even though the row now exists. Callers (e.g. generateDocs) will log this as "failed" for a doc that actually succeeded, and any later un-forced regeneration path could re-insert it and fork a duplicate version=1 row — the exact bug this function exists to prevent.

🔧 Proposed fix (check landed status before giving up on every attempt, including the last)

} catch (err) { if (!isTimeoutError(err)) throw err; lastErr = err; - if (attempt === retries) break; - await sleep(backoff[Math.min(attempt, backoff.length - 1)]); const landed = await getDocLatest(query, tableName, input.doc_id).catch(() => null); if (landed) return { doc_id: landed.doc_id, version: landed.version }; + if (attempt === retries) break; + await sleep(backoff[Math.min(attempt, backoff.length - 1)]); }

Note: the "gives up after the retry budget" test in tests/shared/docs.test.ts will need an extra scripted read-back response added to match the new call sequence.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for (let attempt = 0; attempt <= retries; attempt++) {

try {

return await insertDoc(query, tableName, input);

} catch (err) {

if (!isTimeoutError(err)) throw err;

lastErr = err;

if (attempt === retries) break;

await sleep(backoff[Math.min(attempt, backoff.length - 1)]);

// Did the timed-out attempt actually commit server-side? If so, stop —

// re-inserting would fork a second version=1 row.

const landed = await getDocLatest(query, tableName, input.doc_id).catch(() => null);

if (landed) return { doc_id: landed.doc_id, version: landed.version };

}

}

throw lastErr ?? new Error("insertDocResilient: exhausted retries");

for (let attempt = 0; attempt <= retries; attempt++) {

try {

return await insertDoc(query, tableName, input);

} catch (err) {

if (!isTimeoutError(err)) throw err;

lastErr = err;

const landed = await getDocLatest(query, tableName, input.doc_id).catch(() => null);

if (landed) return { doc_id: landed.doc_id, version: landed.version };

if (attempt === retries) break;

await sleep(backoff[Math.min(attempt, backoff.length - 1)]);

// Did the timed-out attempt actually commit server-side? If so, stop —

// re-inserting would fork a second version=1 row.

}

}

throw lastErr ?? new Error("insertDocResilient: exhausted retries");

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs/write.ts` around lines 191 - 205, The retry loop in insertDocResilient should perform the getDocLatest landed-check after every timeout, including the final attempt, instead of breaking immediately when attempt === retries. Update the control flow so the last timed-out insert still verifies whether insertDoc actually committed server-side before throwing lastErr, preserving the anti-duplicate guarantee used by generateDocs. Adjust the related docs test to account for the extra read-back on the final retry path.

coderabbitai · 2026-07-01T16:32:12Z

+    // Docs VFS dispatch — the browsable per-directory docs index under
+    // <memory>/docs/. Separate surface from the memory index (no retrieval
+    // pollution). Backed by the docs TABLE, so this is async via api.query.
+    // `/docs` (bare) and `/docs/index.md` both render the root index.
+    if (virtualPath && (virtualPath === "/docs" || virtualPath.startsWith("/docs/")) && !virtualPath.endsWith("/")) {
+      const subpath = virtualPath === "/docs" ? "" : virtualPath.slice("/docs/".length);
+      logFn(`docs vfs: ${subpath || "(root)"}`);
+      const result = await handleDocsVfsFn(subpath, (sql) => api.query(sql), config.docsTableName);
+      const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`;
+      if (input.tool_name === "Read") {
+        const file_path = writeReadCacheFileFn(input.session_id, virtualPath, body);
+        return buildReadDecision(file_path, `[hivemind docs] ${virtualPath}`);
+      }
+      return buildAllowDecision(safeEchoCommand(body), `[hivemind docs] /docs/${subpath}`);
+    }
+    if (lsDir === "/docs" || lsDir === "/docs/") {
+      const result = await handleDocsVfsFn("", (sql) => api.query(sql), config.docsTableName);
+      const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`;
+      if (input.tool_name === "Read") {
+        const file_path = writeReadCacheFileFn(input.session_id, "/docs/_listing.txt", body);
+        return buildReadDecision(file_path, "[hivemind docs] ls /docs");
+      }
+      return buildAllowDecision(safeEchoCommand(body), `[hivemind docs] ls /docs`);
+    }


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Docs VFS Bash echoes are not capped to the Claude tool_result size limit.

The comment at lines 507-509 explains that Bash intercepts must go through capOutputForClaude before the 16KB threshold, and the existing /graph and /index.md "direct read" branches follow it. Both new docs branches (lines 459-465 and 468-474) skip this and echo body/full doc content raw — a large doc's cat will likely exceed the Claude Code tool_result limit.

🛡️ Proposed fix

const result = await handleDocsVfsFn(subpath, (sql) => api.query(sql), config.docsTableName); const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`; if (input.tool_name === "Read") { const file_path = writeReadCacheFileFn(input.session_id, virtualPath, body); return buildReadDecision(file_path, `[hivemind docs] ${virtualPath}`); } - return buildAllowDecision(safeEchoCommand(body), `[hivemind docs] /docs/${subpath}`); + return buildAllowDecision(safeEchoCommand(capOutputForClaude(body, { kind: "cat" })), `[hivemind docs] /docs/${subpath}`); } if (lsDir === "/docs" || lsDir === "/docs/") { const result = await handleDocsVfsFn("", (sql) => api.query(sql), config.docsTableName); const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`; if (input.tool_name === "Read") { const file_path = writeReadCacheFileFn(input.session_id, "/docs/_listing.txt", body); return buildReadDecision(file_path, "[hivemind docs] ls /docs"); } - return buildAllowDecision(safeEchoCommand(body), `[hivemind docs] ls /docs`); + return buildAllowDecision(safeEchoCommand(capOutputForClaude(body, { kind: "ls" })), `[hivemind docs] ls /docs`); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Docs VFS dispatch — the browsable per-directory docs index under

// <memory>/docs/. Separate surface from the memory index (no retrieval

// pollution). Backed by the docs TABLE, so this is async via api.query.

// `/docs` (bare) and `/docs/index.md` both render the root index.

if (virtualPath && (virtualPath === "/docs" || virtualPath.startsWith("/docs/")) && !virtualPath.endsWith("/")) {

const subpath = virtualPath === "/docs" ? "" : virtualPath.slice("/docs/".length);

logFn(`docs vfs: ${subpath || "(root)"}`);

const result = await handleDocsVfsFn(subpath, (sql) => api.query(sql), config.docsTableName);

const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`;

if (input.tool_name === "Read") {

const file_path = writeReadCacheFileFn(input.session_id, virtualPath, body);

return buildReadDecision(file_path, `[hivemind docs] ${virtualPath}`);

}

return buildAllowDecision(safeEchoCommand(body), `[hivemind docs] /docs/${subpath}`);

}

if (lsDir === "/docs" || lsDir === "/docs/") {

const result = await handleDocsVfsFn("", (sql) => api.query(sql), config.docsTableName);

const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`;

if (input.tool_name === "Read") {

const file_path = writeReadCacheFileFn(input.session_id, "/docs/_listing.txt", body);

return buildReadDecision(file_path, "[hivemind docs] ls /docs");

}

return buildAllowDecision(safeEchoCommand(body), `[hivemind docs] ls /docs`);

}

// Docs VFS dispatch — the browsable per-directory docs index under

// <memory>/docs/. Separate surface from the memory index (no retrieval

// pollution). Backed by the docs TABLE, so this is async via api.query.

// `/docs` (bare) and `/docs/index.md` both render the root index.

if (virtualPath && (virtualPath === "/docs" || virtualPath.startsWith("/docs/")) && !virtualPath.endsWith("/")) {

const subpath = virtualPath === "/docs" ? "" : virtualPath.slice("/docs/".length);

logFn(`docs vfs: ${subpath || "(root)"}`);

const result = await handleDocsVfsFn(subpath, (sql) => api.query(sql), config.docsTableName);

const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`;

if (input.tool_name === "Read") {

const file_path = writeReadCacheFileFn(input.session_id, virtualPath, body);

return buildReadDecision(file_path, `[hivemind docs] ${virtualPath}`);

}

return buildAllowDecision(safeEchoCommand(capOutputForClaude(body, { kind: "cat" })), `[hivemind docs] /docs/${subpath}`);

}

if (lsDir === "/docs" || lsDir === "/docs/") {

const result = await handleDocsVfsFn("", (sql) => api.query(sql), config.docsTableName);

const body = result.kind === "ok" ? result.body : `(${result.kind}) ${result.message}`;

if (input.tool_name === "Read") {

const file_path = writeReadCacheFileFn(input.session_id, "/docs/_listing.txt", body);

return buildReadDecision(file_path, "[hivemind docs] ls /docs");

}

return buildAllowDecision(safeEchoCommand(capOutputForClaude(body, { kind: "ls" })), `[hivemind docs] ls /docs`);

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hooks/pre-tool-use.ts` around lines 452 - 475, The new Docs VFS Bash paths in pre-tool-use need the same Claude tool_result size protection as the existing direct-read branches. In the virtualPath and lsDir docs handling around handleDocsVfsFn, cap the returned body with capOutputForClaude before passing it to safeEchoCommand/buildAllowDecision, matching the existing /graph and /index.md behavior. Keep the Read tool cache path unchanged, but ensure Bash echoes never send the full unbounded docs content directly.

efenocchi added 11 commits June 22, 2026 10:28

feat: add hivemind_docs schema and config

da6a0ed

feat: add insert-only hivemind_docs store

30a87e0

fix: add idempotent setDoc to prevent doc history fork

10443fd

feat: add ensureDocsTable and hivemind docs command

0ce3c1b

refactor: extract reusable impactedNodes from renderImpact

2f9518e

feat: add doc drift detection via anchors and blast-radius

266bb36

feat: add doc-edit gate and refresh engine

7e2a1c8

feat: add loadCurrentSnapshot for non-vfs callers

68a141f

feat: add docs refresh command and anchor authoring

71c8b0d

feat: trigger doc refresh from graph build opt-in

2ddbcd6

test: cover docs refresh engine, cli, and snapshot loader

7822f38

test: cover ensureDocsTable create and heal paths

eb92ff8

coderabbitai Bot requested a review from khustup2 June 23, 2026 16:55

fix: keep refresh dry-run read-only, unwrap model fences, boolean fla…

1de043e

…g parsing

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

efenocchi added 5 commits June 24, 2026 15:16

fix: short-circuit slow-tier and oversized edits before expensive work

8e5d220

fix: propagate project on doc version bumps

6783553

fix: guard docs set content read inside error handler

4b7ed15

fix: add read-stability gate for deeplake partial reads

c7ec190

test: stub read-stability gate in doc unit tests

16718c7

coderabbitai Bot reviewed Jun 30, 2026

View reviewed changes

efenocchi added 2 commits June 30, 2026 16:57

fix: normalize source before hashing to reduce doc churn

2d17808

feat: add selective docs generate command

9c2d878

github-advanced-security AI found potential problems Jun 30, 2026

View reviewed changes

Comment thread src/docs/generate.ts

re += c;

}

}

return new RegExp(`^${re}$`);

efenocchi added 4 commits July 1, 2026 10:48

fix: retry timed-out doc writes so bulk generate survives backend load

4a961e7

fix: archive docs for deleted or renamed files instead of rewriting

c32ae51

feat: add browsable per-directory docs index

2d2d2ce

feat: serve the docs index as a virtual filesystem under docs/

2b1a2c4

efenocchi added 5 commits July 1, 2026 15:14

refactor: update docs in place instead of insert-only append

f7f5c82

perf: scope docs refresh to the git diff, not the whole corpus

041ce52

feat: make the docs LLM invocation pluggable per host agent

f7108ff

feat: auto-generate docs for newly added files on refresh

9edaed8

perf: refresh docs in parallel with rate-limit backoff

efa573f

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

		const stableReads = Math.max(1, opts.stableReads ?? 3);
		const maxReads = Math.max(stableReads, opts.maxReads ?? 10);

Uh oh!

Conversation

efenocchi commented Jun 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How it works

Steps in this PR

Testing

Not in scope

Summary by CodeRabbit

Release Notes

Uh oh!

github-actions Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Sequence Diagram(s)

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

efenocchi Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

efenocchi commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented Jun 23, 2026 •

edited

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

coderabbitai Bot Jun 23, 2026 •

edited

Loading