PDF: expose object structure as a browsable filesystem #557

Draft
andiwand wants to merge 1 commit into main from pdf-filesystem-objects

andiwand commented Jun 24, 2026

🤖 Generated with Claude Code

What

Make a PDF's internal object structure browsable by exposing it through the filesystem API and wiring it to the existing filesystem HTML viewer.

New pdf::PdfFilesystem — a read-only ReadableFilesystem adapter over a PDF:

/trailer              the trailer dictionary
/objects/<id>_<gen>   each indirect object's value (stream dict for stream objects)
/streams/<id>_<gen>   each stream object's bytes (filter-decoded when possible, raw otherwise)
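
To make the `<id>_<gen>` naming concrete, here is a minimal sketch of how such entry names could be built and parsed. `ObjectRef`, `entry_path`, and `parse_entry_name` are hypothetical names chosen for illustration and are not taken from this PR; the actual implementation may format or validate names differently.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a PDF indirect object reference.
struct ObjectRef {
  std::uint32_t id{};
  std::uint32_t gen{};
};

// Builds "/<dir>/<id>_<gen>", e.g. "/objects/12_0" or "/streams/12_0".
std::string entry_path(std::string_view dir, ObjectRef ref) {
  return "/" + std::string(dir) + "/" + std::to_string(ref.id) + "_" +
         std::to_string(ref.gen);
}

// Parses "<id>_<gen>"; returns std::nullopt if either number is missing
// or followed by anything other than the separator or end of input.
std::optional<ObjectRef> parse_entry_name(std::string_view name) {
  const auto sep = name.find('_');
  if (sep == std::string_view::npos) {
    return std::nullopt;
  }
  const char *begin = name.data();
  const char *mid = begin + sep;
  const char *end = begin + name.size();

  ObjectRef ref;
  const auto [id_end, id_err] = std::from_chars(begin, mid, ref.id);
  if (id_err != std::errc{} || id_end != mid) {
    return std::nullopt;
  }
  const auto [gen_end, gen_err] = std::from_chars(mid + 1, end, ref.gen);
  if (gen_err != std::errc{} || gen_end != end) {
    return std::nullopt;
  }
  return ref;
}
```

Under these assumptions, `entry_path("objects", ObjectRef{12, 0})` yields `/objects/12_0`, and `parse_entry_name("12_0")` recovers the same reference, while names such as `12` or `12_` are rejected.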

create_object_filesystem(pdf) wraps a PDF into an odr::Filesystem, so it browses and renders through the existing filesystem HTML viewer (html::translate(filesystem, …)) — no viewer changes needed.

How

  • The object set is enumerated from the cross-reference table at construction (free entries and unreadable objects are skipped).
  • Content is serialized lazily on open; the backing DocumentParser is held for the filesystem's lifetime, so an instance is transient — one per browse — matching the parser's usage contract.
  • Stream decode failures (image codecs / unsupported filters) fall back to the raw stream bytes so the object stays inspectable.
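
The three points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `XrefEntry`, `RawStream`, `decode`, `stream_content`, `list_object_paths`, and `LazyEntries` are all hypothetical stand-ins, and the stand-in decoder simply rejects every named filter to exercise the fallback path.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cross-reference entry; in_use is false for free entries.
struct XrefEntry {
  std::uint32_t id{};
  std::uint32_t gen{};
  bool in_use{};
};

// Hypothetical stream: filter name plus the bytes as stored in the file.
struct RawStream {
  std::string filter;
  std::string bytes;
};

// Stand-in decoder: passes unfiltered data through, throws otherwise.
std::string decode(const RawStream &stream) {
  if (stream.filter.empty()) {
    return stream.bytes;
  }
  throw std::runtime_error("unsupported filter: " + stream.filter);
}

// Decode when possible; on failure fall back to the raw bytes so the
// entry stays inspectable.
std::string stream_content(const RawStream &stream) {
  try {
    return decode(stream);
  } catch (const std::exception &) {
    return stream.bytes;
  }
}

// Enumerate paths up front from the xref table, skipping free entries.
std::vector<std::string> list_object_paths(const std::vector<XrefEntry> &xref) {
  std::vector<std::string> paths;
  for (const auto &entry : xref) {
    if (!entry.in_use) {
      continue;
    }
    paths.push_back("/objects/" + std::to_string(entry.id) + "_" +
                    std::to_string(entry.gen));
  }
  return paths;
}

// Paths are registered eagerly; content is only produced inside open().
class LazyEntries {
public:
  using Producer = std::function<std::string()>;

  void add(std::string path, Producer producer) {
    m_entries[std::move(path)] = std::move(producer);
  }

  std::optional<std::string> open(const std::string &path) const {
    const auto it = m_entries.find(path);
    if (it == m_entries.end()) {
      return std::nullopt;
    }
    return it->second();  // serialization happens here, not in add()
  }

private:
  std::map<std::string, Producer> m_entries;
};
```

In this sketch the listing is cheap because only paths are computed at construction, and a stream with an unsupported filter (for example an image codec) still opens, just with its stored bytes.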

Tests

test/src/internal/pdf/pdf_filesystem.cpp covers the object/stream listing, serialized object and trailer content, decoded stream bytes, and end-to-end rendering through the filesystem viewer.

Add `pdf::PdfFilesystem`, a read-only `ReadableFilesystem` adapter over a
PDF's low-level object structure:

  /trailer              the trailer dictionary
  /objects/<id>_<gen>   each indirect object's value
  /streams/<id>_<gen>   each stream object's bytes (filter-decoded when
                        possible, raw otherwise)

The object set comes from the cross-reference table at construction;
content is serialized lazily on `open`. `create_object_filesystem` wraps
a PDF into an `odr::Filesystem` so it browses and renders through the
existing filesystem HTML viewer with no viewer changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011jPrCgocCTnKhMi3k6VBjU