Skip to content

Fix corrupted VCS command output (diffs, blobs, errors)#538

Open
tony wants to merge 7 commits into
masterfrom
subprocess-trimming
Open

Fix corrupted VCS command output (diffs, blobs, errors)#538
tony wants to merge 7 commits into
masterfrom
subprocess-trimming

Conversation

@tony

@tony tony commented Jun 20, 2026

Copy link
Copy Markdown
Member

Summary

  • Fix libvcs._internal.run.run corrupting whitespace-significant output: it stripped every captured line, dropped blank lines, and removed the trailing newline — so a captured git diff was rejected by git apply, git cat-file blob lost indentation and blank lines, and multi-line stderr was concatenated into one run-on string. Scalar reads like rev-parse HEAD hid the defect.
  • Add a trim parameter (default True). The default trims only the whole output (str.rstrip), preserving leading indentation, blank lines, and interior structure while keeping the established no-trailing-newline contract. trim=False returns byte-faithful output for diffs destined for git apply and exact blob contents.
  • Fix stderr assembly so multi-line errors keep their line structure in libvcs.exc.CommandError.output.
  • Document the decision and the measured alternatives in ADR 0001 (docs/project/adr/0001-faithful-subprocess-output-capture.md).

Before / After

Before (default .run(["diff"])) — leading spaces gone, blank context line dropped, no trailing newline:

@@ -1,5 +1,5 @@
def greet(name):
-    message = "hello, " + name
+    message = "hi, " + name
print(message)
return message

git applyerror: corrupt patch at line 6.

After (.run(["diff"], trim=False)) — byte-identical to git diff, applies cleanly:

@@ -1,5 +1,5 @@
 def greet(name):
-    message = "hello, " + name
+    message = "hi, " + name
 
     print(message)
     return message

Changes by area

  • src/libvcs/_internal/run.py: Capture stdout/stderr as raw bytes, decode once, and trim only the whole output when trim=True. Adds the trim parameter (reachable through the Git/Hg/Svn .run() **kwargs passthrough). Removes the per-line strip + blank-line filter and the separator-less stderr join.
  • src/libvcs/cmd/svn.py: Correct the Svn.blame doctest, which expected the previously-stripped (corrupted) output.
  • tests/cmd/test_git.py: Functional tests for trim=False diff fidelity + git apply, byte-exact blob round-trip, the default no-trailing-newline contract, and stderr line preservation.
  • docs/project/adr/: New ADR section (index + ADR 0001) wired into the Project docs.

Design decisions

  • Per-call trim, default True (not verbatim-by-default): returning fully verbatim output breaks the no-trailing-newline contract that callers, doctests, and downstream vcspull rely on for scalar reads. Keeping the trimmed default preserves that contract; trim=False is the explicit opt-in for byte fidelity.
  • Whole-output rstrip, not per-line strip: matches the convention of every tool surveyed in the ADR (pip, uv, mise, mercurial, gitoxide, jujutsu) — capture pristine, transform at the edge. The per-line strip was the outlier and the source of the corruption.
  • Scope is Phase 1 of ADR 0001. The structured CompletedProcess backend (libvcs._internal.subprocess.SubprocessCommand) and retiring console_to_str are Phase 2, deliberately deferred to keep this change additive and back-compatible.

Verification

The old per-line strip/filter is gone:

$ rg "line.strip\(\) for line in" src/libvcs/_internal/run.py

(returns zero matches)

Test plan

  • test_run_trim_false_preserves_diff — captured diff is byte-identical to git diff and passes git apply --check
  • test_run_trim_false_preserves_blobcat-file blob round-trips file contents byte-for-byte
  • test_run_default_trims_trailing_newline — default .run() keeps the no-trailing-newline contract
  • test_run_failure_preserves_stderr_lines — failed command keeps stderr line structure in CommandError.output
  • uv run pytest — full suite green
  • uv run ruff format --check . && uv run ruff check .
  • uv run mypy src tests

why: The legacy run() helper post-processes captured output
(per-line strip, drops blank lines, no trailing newline), which
corrupts whitespace-significant output -- captured diffs no
longer apply, blob reads lose content, multi-line errors smash
together. The decision and rejected alternatives need a durable
record so the behavior isn't "simplified" back into the bug.

what:
- Record the decision: capture pristine, trim/decode at the
  edge, phased migration onto SubprocessCommand.
- Capture the measured alternatives matrix and prior-art lessons
  (pip, uv, mise, mercurial, gitoxide, jujutsu, git) as evidence.
- Start a docs/project/adr/ section with an index and toctree.
@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.73684% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.15%. Comparing base (163a945) to head (c42d2d6).

Files with missing lines Patch % Lines
src/libvcs/_internal/run.py 66.66% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
+ Coverage   60.98%   61.15%   +0.16%     
==========================================
  Files          40       40              
  Lines        6531     6564      +33     
  Branches     1101     1103       +2     
==========================================
+ Hits         3983     4014      +31     
- Misses       1951     1952       +1     
- Partials      597      598       +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

why: The runner trimmed each captured line and dropped blank lines,
so whitespace-significant output was corrupted: captured diffs were
rejected by `git apply`, `cat-file blob` lost indentation and blank
lines, and multi-line stderr was concatenated into one run-on line.
Single-token reads like `rev-parse HEAD` hid the defect.

what:
- Capture stdout/stderr verbatim; decode once; trim only the whole
  output (str.rstrip), preserving leading indentation, blank lines,
  and interior structure.
- Add `trim` (default True). Pass `trim=False` for byte-faithful
  output -- diffs that feed `git apply`, exact blob contents.
- Rejoin stderr without smashing lines together.
- Correct the Svn.blame doctest, which had encoded the stripped output.
- Cover trim=False fidelity, the default trim contract, and stderr
  line preservation with functional tests.
@tony tony force-pushed the subprocess-trimming branch from 7d956cd to 9619dba Compare June 20, 2026 23:49
tony added 5 commits June 20, 2026 19:57
why: Trimming in the capture path corrupted whitespace-significant
output. Returning verbatim by default makes the plain
`.run(["diff"])` faithful and applyable, fixing the original bug
without an opt-in. Bare-value reads opt in with `trim=True`.

what:
- Flip `trim` default from True to False (verbatim).
- Rewrite the docstring around the verbatim default.
why: get_revision() is a scalar API downstream compares to bare
SHAs; with verbatim run() it would carry a trailing newline.

what:
- Strip the rev-parse result so get_revision() stays bare.
why: With verbatim run() output, example output retains its
trailing newline.

what:
- Update git/hg/svn command doctests to the real verbatim output.
- Mark affected docstrings raw so the literal newlines render.
why: Tests that read scalars via the low-level run() or thin cmd
wrappers relied on the old implicit trim.

what:
- Strip rev-parse / symbolic-ref / is-shallow / version reads.
- Invert the default-behavior test to assert verbatim output.
why: The chosen design flipped to verbatim-by-default during review.

what:
- Update ADR 0001 Phase 1, alternatives, and consequences to
  reflect verbatim default with `trim=True` opt-in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant