Skip to content

perf(dashboard): avoid full note reads when building excerpts#1886

Open
joshtrichards wants to merge 3 commits into
mainfrom
jtr/perf-dashboard-excerpt
Open

perf(dashboard): avoid full note reads when building excerpts#1886
joshtrichards wants to merge 3 commits into
mainfrom
jtr/perf-dashboard-excerpt

Conversation

@joshtrichards

@joshtrichards joshtrichards commented Jun 4, 2026

Copy link
Copy Markdown
Member

Summary

Optimize note excerpt generation by reading only a small prefix of the file instead of loading the full note content.

Changes

  • add a lightweight best-effort excerpt source reader by switching from the File API's getContent to own fopen/fread
  • read only a bounded prefix sized for excerpt generation
  • trim incomplete trailing UTF-8 bytes with mb_strcut()
  • preserve existing excerpt formatting behavior after markdown stripping
  • return an empty excerpt if the lightweight preview read fails, instead of falling back to a full file read
  • additional optimization: use pre-compiled string literal for BOM stripping eliminating need for pack() at runtime
  • also fixes a small bug: $title is typed as string and empty() ends up being true on things like "0".

Why

Excerpt generation only needs roughly the first 100 visible characters, so reading the entire note is unnecessary work. This change reduces I/O and memory usage for note list rendering, especially for larger notes or slower storage backends.

Notes

  • this is intentionally a performance-oriented heuristic, not a full-fidelity content read
  • markdown-heavy prefixes may occasionally produce shorter previews than before
  • full note reads remain unchanged via getContent()

@joshtrichards joshtrichards added bug Something isn't working enhancement New feature or request feature: dashboard Related to Nextcloud dashboard 3. to review performance 🚀 labels Jun 4, 2026
@joshtrichards

Copy link
Copy Markdown
Member Author

Lint php-cs failures are unrelated - see #1905

@joshtrichards joshtrichards changed the title perf(dashboard): use lightweight streamed reads for note excerpts perf(dashboard): avoid full note reads when building excerpts Jun 19, 2026
Signed-off-by: Josh <josh.t.richards@gmail.com>
Signed-off-by: Josh <josh.t.richards@gmail.com>
(not using ISimpleFile oops)

Signed-off-by: Josh <josh.t.richards@gmail.com>
@enjeck enjeck force-pushed the jtr/perf-dashboard-excerpt branch from 93364d6 to 9a9f8b1 Compare June 22, 2026 06:05

@enjeck enjeck left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments:

Comment thread lib/Service/Note.php
Comment on lines +67 to +68
$excerpt = $this->noteUtil->stripMarkdown($this->getExcerptContent($maxlen));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this no longer goes through getContent(), it loses the non-UTF-8 handling done there. A UTF-16-encoded note that produced a readable excerpt before will now be read as raw bytes and decoded as UTF-8 below, yielding garbage?

Comment thread lib/Service/Note.php
// Over-read bytes assuming worst-case UTF-8 size (up to 4 bytes per
// character). This is only a heuristic for preview generation; markdown
// stripping may reduce the visible character count further.
$bytesToRead = max(512, $maxlen * 4);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe * 6 is better. With the default maxlen=100 this reads 512 bytes. After stripMarkdown() and the leading-title strip (lines 71–76), a long first line / URL / long title can push the visible excerpt below maxlen, where the old full-read produced a complete one

Comment thread lib/Service/Note.php
Comment on lines +108 to +109
// Remove any partial trailing multibyte character from the truncated read.
$content = mb_strcut($content, 0, strlen($content), 'UTF-8');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand. The comment says this removes a partial trailing multibyte char, but passing strlen($content) (the full byte length) as the cut length makes it effectively a no-op for that purpos?

Comment thread lib/Service/Note.php
Comment on lines +111 to +112
// Strip Byte Order Marks (BOM) for UTF-8, UTF-16 BE, and UTF-16 LE
$content = str_replace(["\xEF\xBB\xBF", "\xFE\xFF", "\xFF\xFE"], '', $content);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UTF-16 BOMs (\xFE\xFF, \xFF\xFE) are stripped here, but since the body isn't transcoded from UTF-16 (a i said at https://github.com/nextcloud/notes/pull/1886/changes#r3450190129) , the rest of a UTF-16 note is still mis-decoded?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3. to review bug Something isn't working enhancement New feature or request feature: dashboard Related to Nextcloud dashboard performance 🚀

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants