run checks for zimcheck quality #348

Open
elfkuzco wants to merge 4 commits into main from detect-zimcheck-errors

@elfkuzco

Contributor

Rationale

This PR adds a function that checks zimcheck results for errors and calls it from update_book_issues. This way, zimcheck errors are recomputed whenever a book is processed for the first time or re-processed.

Changes

  • add a function that runs the zimcheck quality check in the body of update_book_issues
  • add a zimcheck_summary column to book (to store the aggregated results for reuse)

This closes #346

elfkuzco self-assigned this Jun 18, 2026
elfkuzco requested a review from benoit74 June 18, 2026 11:46
@elfkuzco

Contributor Author

@benoit74, should we add a migration script to populate the zimcheck summary for existing books?

Also, regarding the pyright downgrade to 1.1.400: my editor (Vim) doesn't sync well with 1.1.409 and doesn't report errors unless I run pyright in a new shell, which is a bit of a pain compared to having the errors in the editor itself.

@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

❌ Patch coverage is 55.66038% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.99%. Comparing base (07be618) to head (67d872f).
⚠️ Report is 6 commits behind head on main.

Files with missing lines                    Patch %   Lines
backend/src/cms_backend/utils/zim.py        34.37%    19 Missing and 2 partials ⚠️
backend/src/cms_backend/db/book.py          61.11%    9 Missing and 5 partials ⚠️
backend/src/cms_backend/utils/requests.py   54.54%    10 Missing ⚠️
backend/src/cms_backend/context.py          71.42%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #348      +/-   ##
==========================================
- Coverage   81.92%   80.99%   -0.94%     
==========================================
  Files          60       61       +1     
  Lines        3203     3309     +106     
  Branches      333      354      +21     
==========================================
+ Hits         2624     2680      +56     
- Misses        484      525      +41     
- Partials       95      104       +9     

@benoit74

Contributor

> should we add a migration script to populate the zimcheck summary for existing books?

Yes please

@elfkuzco

Contributor Author

Added a maintenance script to populate the summaries. It appears that a lot of books have no summaries because they have no zimcheck URL. The DB is a mirror of prod.

[2026-06-19 08:52:11,676: WARNING] Book 22166e0a-1318-3520-b4f3-075b3a98cfb9 has no zimcheck URL. Skipping...
[2026-06-19 08:52:11,680: WARNING] Book 221922e9-38e6-bc51-67ca-2f6acecc5ee3 has no zimcheck URL. Skipping...
[2026-06-19 08:52:11,683: WARNING] Book 221af22c-99e1-f28e-a5d5-90d10b28b051 has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,226: INFO] Computed zimcheck summary for book 2224872d-0e3c-4865-4987-df5e78e6bd8b
[2026-06-19 08:52:13,230: WARNING] Book 2225c363-a464-620f-eabf-2cbfca9b00e9 has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,233: WARNING] Book 22301b04-4fca-56cd-47bc-0cfd0991cdff has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,235: WARNING] Book 2232c978-604b-bb4a-7b0a-10a6969ee779 has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,237: WARNING] Book 22343719-f1d8-df77-375a-7bd9601c5d5c has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,240: WARNING] Book 223aa124-cf47-1f9d-df23-3f7f9675c6af has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,244: WARNING] Book 22436a7b-02a0-a44f-6a81-09a0a8ba4d4b has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,247: WARNING] Book 2244b5bc-74dd-def4-fa91-3f7c7c7dae3f has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,251: WARNING] Book 224931f2-b996-737e-56a4-45d55a8edeae has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,254: WARNING] Book 224dea47-62c5-4848-beaf-7589d2531191 has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,257: WARNING] Book 2256ad44-307d-2b06-1cf2-b9eab79e13e1 has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,260: WARNING] Book 22580ebc-519c-81f1-a090-f2a86f4c30be has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,262: WARNING] Book 22660129-28d2-7a4a-ebde-5431c06b9d79 has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,265: WARNING] Book 227b9081-ed5c-956e-04c5-4b41f18f5f16 has no zimcheck URL. Skipping...
[2026-06-19 08:52:13,268: WARNING] Book 2282b958-3859-7878-4773-c1fc74431ba8 has no zimcheck URL. 

@elfkuzco

Contributor Author

[2026-06-19 09:20:38,883: INFO] Finished populating zimcheck summaries: nb_success=575, nb_failure=3, nb_skipped=7915

@benoit74

Contributor

@elfkuzco I forgot to mention that we've discussed the matter with kelson42 and we do not need to recompute historical values. Let's drop the maintenance script. I will review this tomorrow.

Comment thread backend/src/cms_backend/db/book.py Outdated
If book is missing zimcheck summary, results will be downloaded from URL and set.
Returns true if it is impossible to run checks or there are no errors.
"""
if book.zim_metadata.get("Scraper") is None:

Contributor

This should not prevent the quality check from running; we just won't be able to whitelist the scraper, which is OK. You can drop this check.

Comment thread backend/src/cms_backend/db/book.py Outdated
zimcheck_summary: ZimcheckSummarySchema
if missing_summary_keys:
    if not book.zimcheck_result_url:
        return True

Contributor

This should be considered a quality issue as well (return False); zimcheck is mandatory now.

Contributor

Add a book event explaining the issue
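
A minimal sketch of what the two comments above ask for, under stated assumptions: `Book` here is a hypothetical stand-in for the real model (only `zimcheck_result_url` and `events` come from the diff), and `has_zimcheck_url`, the event wording, and the timestamp format are illustrative choices, not the PR's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Book:
    """Hypothetical stand-in for the real book model."""

    zimcheck_result_url: Optional[str] = None
    events: list = field(default_factory=list)


def has_zimcheck_url(book: Book, *, update_events: bool = True) -> bool:
    """Return False (a quality issue) when the book has no zimcheck result URL."""
    if book.zimcheck_result_url:
        return True
    if update_events:
        # Assumed timestamp format; the real code uses its own getnow() helper.
        now = datetime.now(timezone.utc).isoformat()
        book.events.append(f"{now}: book has no zimcheck result URL")
    return False
```

Returning False here means a missing URL is reported like any other quality issue, and the event leaves a trace of why.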

Comment thread backend/src/cms_backend/db/book.py Outdated
logger.debug(message)
if raise_exceptions:
    raise ValueError(message)
return True

Contributor

This should not happen, so it is a quality issue as well (return False)

Comment thread backend/src/cms_backend/db/book.py Outdated
if zimcheck_summary.error_count is not None and zimcheck_summary.error_count > 0:
    if update_events:
        book.events.append(
            f"{getnow()}: book has zimcheck {zimcheck_summary.error_count} errors"

Contributor

=> "book has {n} error(s) in zimcheck"
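
For illustration, the suggested wording could be produced by a small helper like the one below. `zimcheck_error_event` is a hypothetical name, and the timestamp is passed in as a plain string because the project's `getnow()` helper is not reproduced here.

```python
def zimcheck_error_event(timestamp: str, error_count: int) -> str:
    """Build the event text using the wording suggested in this review comment."""
    return f"{timestamp}: book has {error_count} error(s) in zimcheck"
```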

if Context.ignored_scrapers_regex is not None and re.search(
    Context.ignored_scrapers_regex, scraper
):
    return True

Contributor

Add a book event saying the scraper is whitelisted for zimcheck quality.
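
One way this could look, as a sketch only: `is_scraper_whitelisted` and its parameters are hypothetical, the event text is illustrative, and the timestamp prefix used by real book events is left out. A missing scraper name simply means the whitelist cannot apply, so the quality check still runs.

```python
import re
from typing import Optional


def is_scraper_whitelisted(
    scraper: Optional[str],
    ignored_scrapers_regex: Optional[str],
    events: list,
) -> bool:
    """Return True and record an event when the scraper matches the ignore regex."""
    if scraper is None or ignored_scrapers_regex is None:
        # Nothing to match against: not whitelisted, the check proceeds as usual.
        return False
    if re.search(ignored_scrapers_regex, scraper):
        events.append(f"scraper {scraper!r} is whitelisted for zimcheck quality")
        return True
    return False
```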

    "Unable to retrieve zimcheck results from "
    f"{book.zimcheck_result_url}: {zimcheck_response.json}"
)
logger.debug(message)

Contributor

Rather, add a book event with the message (without the response).
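
A hedged sketch of that split: the book event carries only the short message, while the response body, if kept at all, stays in the debug log. `report_retrieval_failure` and its parameters are hypothetical, and keeping the debug log is an assumption rather than something this comment requests.

```python
import logging

logger = logging.getLogger(__name__)


def report_retrieval_failure(
    events: list, url: str, response_body: str, timestamp: str
) -> None:
    """Record a short book event; keep the raw response out of it."""
    message = f"Unable to retrieve zimcheck results from {url}"
    events.append(f"{timestamp}: {message}")
    # Assumption: the response body may still be useful in debug logs.
    logger.debug("%s: %s", message, response_body)
```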

elfkuzco requested a review from benoit74 June 23, 2026 14:16
Development

Successfully merging this pull request may close these issues.

Detect books with zimcheck errors

2 participants