perf(dashboard): avoid full note reads when building excerpts #1886
joshtrichards wants to merge 3 commits into
Lint php-cs failures are unrelated - see #1905
Signed-off-by: Josh <josh.t.richards@gmail.com>
Signed-off-by: Josh <josh.t.richards@gmail.com>
(not using ISimpleFile oops) Signed-off-by: Josh <josh.t.richards@gmail.com>
force-pushed from 93364d6 to 9a9f8b1
$excerpt = $this->noteUtil->stripMarkdown($this->getExcerptContent($maxlen));
Since this no longer goes through getContent(), it loses the non-UTF-8 handling done there. A UTF-16-encoded note that produced a readable excerpt before will now be read as raw bytes and decoded as UTF-8 below. Won't that yield garbage?
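One way to keep UTF-16 notes readable would be to transcode the prefix based on its BOM before any UTF-8 processing. This is only a rough sketch and not taken from the PR: `prefixToUtf8()` is a hypothetical helper, and it assumes the note starts with a BOM and that the mbstring extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the PR): convert a raw byte prefix to
 * UTF-8 when it starts with a UTF-16 BOM. Otherwise the bytes are returned
 * unchanged, apart from dropping a leading UTF-8 BOM.
 */
function prefixToUtf8(string $bytes): string {
    $utf16 = ["\xFF\xFE" => 'UTF-16LE', "\xFE\xFF" => 'UTF-16BE'];
    foreach ($utf16 as $bom => $encoding) {
        if (strncmp($bytes, $bom, 2) === 0) {
            $body = substr($bytes, 2);
            // A truncated read can end in the middle of a 2-byte code unit;
            // drop the dangling byte so the conversion sees whole units.
            $body = substr($body, 0, strlen($body) - (strlen($body) % 2));
            return mb_convert_encoding($body, 'UTF-8', $encoding);
        }
    }
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return substr($bytes, 3);
    }
    return $bytes;
}
```

A prefix cut in the middle of a surrogate pair would still need separate handling; the sketch only guards against a dangling single byte.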
// Over-read bytes assuming worst-case UTF-8 size (up to 4 bytes per
// character). This is only a heuristic for preview generation; markdown
// stripping may reduce the visible character count further.
$bytesToRead = max(512, $maxlen * 4);
Maybe * 6 is better. With the default maxlen=100 this reads 512 bytes. After stripMarkdown() and the leading-title strip (lines 71–76), a long first line, URL, or title can push the visible excerpt below maxlen, where the old full read produced a complete one.
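To make the difference concrete, the read size under both factors can be computed as below. `excerptReadBudget()` is a hypothetical helper that mirrors the `max(512, $maxlen * 4)` expression from the diff, with the per-character factor exposed as a parameter.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring the PR's max(512, $maxlen * 4) heuristic,
// with the per-character factor exposed so 4 and 6 can be compared.
function excerptReadBudget(int $maxlen, int $bytesPerChar = 4, int $floor = 512): int {
    return max($floor, $maxlen * $bytesPerChar);
}

foreach ([100, 200] as $maxlen) {
    printf(
        "maxlen=%d: x4 -> %d bytes, x6 -> %d bytes\n",
        $maxlen,
        excerptReadBudget($maxlen, 4),
        excerptReadBudget($maxlen, 6)
    );
}
```

For maxlen=100 the 512-byte floor dominates with a factor of 4, while a factor of 6 raises the read to 600 bytes.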
// Remove any partial trailing multibyte character from the truncated read.
$content = mb_strcut($content, 0, strlen($content), 'UTF-8');
I don't understand. The comment says this removes a partial trailing multibyte character, but passing strlen($content) (the full byte length) as the cut length makes it effectively a no-op for that purpose?
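If the intent is to drop an incomplete trailing sequence, one option that does not depend on how mb_strcut() treats a length equal to the full byte length is to back off byte by byte until the string validates. A minimal sketch, assuming the mbstring extension; `trimPartialUtf8Tail()` is a hypothetical name, not something in the PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: drop an incomplete multibyte sequence from the end of
 * a truncated UTF-8 read. An incomplete sequence is at most 3 bytes long
 * (a 4-byte character missing its last byte), so at most 3 bytes are removed.
 */
function trimPartialUtf8Tail(string $bytes): string {
    for ($drop = 0; $drop <= 3 && $drop <= strlen($bytes); $drop++) {
        $candidate = substr($bytes, 0, strlen($bytes) - $drop);
        if (mb_check_encoding($candidate, 'UTF-8')) {
            return $candidate;
        }
    }
    // Still invalid after removing 3 bytes, so the problem is not a
    // truncated tail; leave the input untouched.
    return $bytes;
}
```

Input that is already valid UTF-8 is returned unchanged on the first iteration.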
// Strip Byte Order Marks (BOM) for UTF-8, UTF-16 BE, and UTF-16 LE
$content = str_replace(["\xEF\xBB\xBF", "\xFE\xFF", "\xFF\xFE"], '', $content);
The UTF-16 BOMs (\xFE\xFF, \xFF\xFE) are stripped here, but since the body isn't transcoded from UTF-16 (as I said at https://github.com/nextcloud/notes/pull/1886/changes#r3450190129), the rest of a UTF-16 note is still mis-decoded?
Summary
Optimize note excerpt generation by reading only a small prefix of the file instead of loading the full note content.
Changes
mb_strcut() pack() at runtime $title is typed as string and empty() ends up being true on things like "0".
Why
Excerpt generation only needs roughly the first 100 visible characters, so reading the entire note is unnecessary work. This change reduces I/O and memory usage for note list rendering, especially for larger notes or slower storage backends.
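The idea of a bounded prefix read can be illustrated with plain PHP streams. This is a sketch under assumptions, not the PR's code: `readPrefix()` is a hypothetical helper working on a local path, whereas the PR itself operates on Nextcloud file objects.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: read at most $bytesToRead bytes from the start of a
 * file instead of loading the whole file into memory.
 */
function readPrefix(string $path, int $bytesToRead): string {
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return '';
    }
    try {
        // stream_get_contents() stops at $bytesToRead bytes or at EOF,
        // whichever comes first.
        $data = stream_get_contents($handle, $bytesToRead);
        return $data === false ? '' : $data;
    } finally {
        fclose($handle);
    }
}
```

For a note of several megabytes, only the requested prefix is requested from the stream; how much the storage backend fetches underneath depends on that backend.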
Notes
getContent()