PDF stage 4.5: image XObjects — JPEG (DCTDecode) pass-through by andiwand · Pull Request #563 · opendocument-app/OpenDocument.core

andiwand · 2026-06-25T19:53:33Z

Stage 4.5: render image XObjects invoked by Do, starting with JPEGs the browser decodes itself (ISO 32000-1 8.9 / 8.10.5). Stacked on #562 (4.4).

What changed

- The filter framework already returns a `DCTDecode` payload undecoded. `parse_x_object` now reads an image XObject's bytes at parse time and, for a non-`/ImageMask` JPEG, keeps the raw payload and the `image/jpeg` MIME type on the `XObject`. Other codecs (Flate/LZW raster, masks) leave it empty for later stages, so `Do` skips them.
- A new `ImageElement` joins the page-element variant: the CTM at `Do` (its unit square maps to user space), the snapshotted clip, and the encoded bytes. `invoke_x_object` emits one for a pass-through image.
- HTML: an `<image>` in the page `<svg>` (so it layers by paint order), placed by `flip * CTM * to_box` (the vertical flip accounts for the image's first row being its top), with a `data:image/jpeg;base64` href and the clip applied via `clip-path`.
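
To make the placement concrete, here is a minimal sketch of how `flip * CTM * to_box` and the data URI could be assembled. It is not the code from this PR: `Matrix`, `multiply`, `base64_encode` and `svg_image` are hypothetical names, matrices are assumed to follow the PDF row-vector convention, `to_box` is assumed to map user space to SVG coordinates, and the `clip-path` attribute is left out for brevity.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Affine matrix [a b c d e f] in the PDF convention: points are row vectors,
// so (x, y) maps to (a*x + c*y + e, b*x + d*y + f). SVG's matrix(a b c d e f)
// uses the same six numbers.
using Matrix = std::array<double, 6>;

// Row-vector product: the result applies `first`, then `second`.
Matrix multiply(const Matrix &first, const Matrix &second) {
  return {first[0] * second[0] + first[1] * second[2],
          first[0] * second[1] + first[1] * second[3],
          first[2] * second[0] + first[3] * second[2],
          first[2] * second[1] + first[3] * second[3],
          first[4] * second[0] + first[5] * second[2] + second[4],
          first[4] * second[1] + first[5] * second[3] + second[5]};
}

// Standard base64 (RFC 4648) with '=' padding.
std::string base64_encode(const std::vector<std::uint8_t> &bytes) {
  static const char alphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve((bytes.size() + 2) / 3 * 4);
  std::size_t i = 0;
  for (; i + 2 < bytes.size(); i += 3) {
    const std::uint32_t n =
        (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += alphabet[(n >> 6) & 63];
    out += alphabet[n & 63];
  }
  if (i + 1 == bytes.size()) { // one trailing byte
    const std::uint32_t n = bytes[i] << 16;
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += "==";
  } else if (i + 2 == bytes.size()) { // two trailing bytes
    const std::uint32_t n = (bytes[i] << 16) | (bytes[i + 1] << 8);
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += alphabet[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

// Emits a unit-square <image> and lets the transform do all the placement.
std::string svg_image(const Matrix &ctm, const Matrix &to_box,
                      const std::vector<std::uint8_t> &jpeg) {
  // An SVG image's first row sits at y = 0, a PDF image's at y = 1 of the
  // unit square, so flip maps (x, y) to (x, 1 - y) before the CTM applies.
  const Matrix flip = {1, 0, 0, -1, 0, 1};
  const Matrix m = multiply(multiply(flip, ctm), to_box);
  std::ostringstream out;
  out << "<image width=\"1\" height=\"1\" preserveAspectRatio=\"none\" "
      << "transform=\"matrix(" << m[0] << ' ' << m[1] << ' ' << m[2] << ' '
      << m[3] << ' ' << m[4] << ' ' << m[5] << ")\" "
      << "href=\"data:image/jpeg;base64," << base64_encode(jpeg) << "\" />";
  return out.str();
}
```

As a worked case under those assumptions: with a CTM of `[100 0 0 50 10 20]` and a `to_box` of `[1 0 0 -1 0 200]` (a 200-unit-tall page), the combined matrix is `[100 0 0 50 10 130]`, so the image's top-left corner lands at SVG `(10, 130)`, which is 200 minus the PDF top edge at `y = 70`. A JPEG always starts with the bytes `FF D8 FF`, which base64-encode to the `/9j/` prefix mentioned under Tests.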

Tests

Extractor-level: placement by the CTM, the no-pass-through skip, clip snapshot, paint-order interleaving. Verified end to end: the odr engine renders speisekarte_helmarts.pdf with four <image> tags carrying real JPEG (/9j/…) data URIs. Full suite green.

🤖 Generated with Claude Code

Render image XObjects invoked by `Do`, starting with JPEGs the browser can decode itself (ISO 32000-1 8.9 / 8.10.5).

- The filter framework already hands back a `DCTDecode` payload undecoded (`stopped_at_filter`). `parse_x_object` now reads an image XObject's bytes at parse time and, for a non-`/ImageMask` JPEG, keeps the raw payload + `image/jpeg` mime on the `XObject`. Other codecs (Flate/LZW raster, masks) leave it empty for later stages.
- A new `ImageElement` joins the page-element variant: the CTM in effect at `Do` (its unit square maps to user space), the snapshotted clip and the encoded bytes. `invoke_x_object` emits one for a pass-through image.
- HTML: an `<image>` in the page `<svg>` (so it layers by paint order), placed by `flip * CTM * to_box` (the vertical flip accounts for the image's first row being its top), with a `data:image/jpeg;base64` href and the clip applied via `clip-path`.

Tested at the extractor level (placement by the CTM, the no-pass-through skip, clip snapshot, paint-order interleaving) and verified end to end: the odr engine renders `speisekarte_helmarts.pdf` with four `<image>` tags carrying JPEG data URIs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01F7Lp7cZPX84gvGmaGA6bHq

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c61143729f


chatgpt-codex-connector · 2026-06-25T19:56:32Z

+        decode_parms =
+            parser.deep_resolve_object_copy(dictionary["DecodeParms"]);
+      }
+      DecodeResult result = decode(filter, decode_parms, std::move(raw));


Avoid decoding images that will be skipped

When a page resource contains a non-JPEG Image XObject with a filter chain the generic decoder does not support for raster images, this unconditional decode() throws during parse_resources even though the image is supposed to be skipped. For example, a /FlateDecode image using TIFF predictor parameters that apply_tiff_predictor rejects will now abort rendering of the whole PDF, including text and paths; before this change image XObjects were ignored, and the nearby comment says non-JPEG codecs should just leave image_data empty. Check the filter chain for a pass-through DCT case, or catch non-pass-through decode failures, before decoding skipped images.
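
One way to read this suggestion is sketched below: inspect the filter chain first and only run the decoder when the image really is a pass-through JPEG. This is an illustration rather than the project's code. `is_jpeg_pass_through`, `jpeg_payload` and the `decode_prefix` callback are hypothetical names, the filter chain is assumed to be available as a list of filter names, and `decode_prefix` stands in for whatever decodes the filters that precede `DCTDecode`.

```cpp
#include <cstdint>
#include <exception>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Filters = std::vector<std::string>;

// Assumption: an image is a pass-through JPEG when it is not an /ImageMask
// and the last filter in its chain is DCTDecode.
bool is_jpeg_pass_through(const Filters &filters, bool image_mask) {
  return !image_mask && !filters.empty() && filters.back() == "DCTDecode";
}

// Returns the still-encoded JPEG bytes, or nothing when the image is skipped.
// Skipped images never reach the decoder, so an unsupported predictor or
// filter on them cannot abort parsing of the page's other resources.
std::optional<Bytes>
jpeg_payload(const Filters &filters, bool image_mask, Bytes raw,
             const std::function<Bytes(const Filters &, Bytes)> &decode_prefix) {
  if (!is_jpeg_pass_through(filters, image_mask)) {
    return std::nullopt; // left empty for later stages
  }
  // Only the filters in front of DCTDecode (e.g. ASCII85Decode) are undone.
  const Filters prefix(filters.begin(), filters.end() - 1);
  try {
    return decode_prefix(prefix, std::move(raw));
  } catch (const std::exception &) {
    // A broken JPEG stream drops this one image instead of the whole page.
    return std::nullopt;
  }
}
```

The check-then-decode order covers the first option in the comment above, and the `try`/`catch` around the remaining decode covers the second. Either on its own would already stop a rejected `/FlateDecode` image from aborting rendering.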


andiwand wants to merge 1 commit into pdf-stage-4.4-functions-color from pdf-stage-4.5-jpeg-images
