Let admins mask classifier predictions to a species list by mohamedelabbas1996 · Pull Request #999 · RolnickLab/antenna

mohamedelabbas1996 · 2025-10-14T19:04:33Z

Summary

This adds class masking, a post-processing task an admin can run to clean up classifier output after a pipeline has finished.

Class masking is for the common case where you know which species are actually possible at a site (a regional checklist) but the classifier was trained on a much larger label set. Given a taxa list, it re-weights a classifier's predictions so that only species on the list can contribute, then writes new terminal classifications and recomputes the affected occurrence determinations.

It runs through the post-processing framework that merged in #1289, so it is triggered the same way as the existing small-size filter: select rows in the Django admin, pick the task, fill in its options, and the work runs as a background Job you can follow from the admin. Class masking can be run on a single occurrence (the fast path for iterating on a checklist) or on a whole capture set.

This PR is a clean rework of the original class-masking work (which predated the framework and carried a hand-rolled admin action). The branch was reset onto current main, so the diff shows only the class-masking additions.

Rank roll-up has moved to its own follow-up PR (#1361). It was originally part of this branch, but class masking is validated end to end and well tested, while roll-up still has an open design question (whether the per-rank pick should be constrained to the source taxon's ancestors). Splitting lets class masking merge on its own; roll-up continues in #1361, stacked on this branch.

List of Changes

#	Change (what it does)	How
1	Admins can re-weight a classifier's predictions to a species checklist and have determinations recomputed	`ClassMaskingTask(BasePostProcessingTask)` + `make_post_processing_action` wiring on `OccurrenceAdmin` (single occurrence) and `SourceImageCollectionAdmin` (capture set)
2	An excluded species can no longer win the masked prediction, even if it had the highest raw logit	the winner is the argmax over the kept classes only — the model's softmax is recomputed from `logits`, excluded classes are set to exactly 0, and the top of what remains wins; raises if the taxa list excludes every class in the category map
3	Masked classifications point at a real, reusable algorithm and category map	one masking `Algorithm` per `(source algorithm, taxa list)` with its category map persisted (was set in memory only, leaving a null map)
4	The API exposes where a masked classification came from	`applied_to` populated on new masked classifications and added to the Classification serializers, with `applied_to__algorithm` prefetched in the occurrence list/detail and classification viewset (no N+1)
5	On a single occurrence the "Source classifier" dropdown offers only classifiers that actually ran on it; on a capture set it offers the full list	`ClassMaskingActionForm` scopes the algorithm field for the occurrence path via a small generic seam on the action form base. The capture-set path keeps the full list on purpose: the equivalent lookup is an unbounded `DISTINCT` over every classification in the collection and can time out while the form renders — an over-broad option is harmless because masking a classifier that produced nothing in scope changes nothing
6	Class masking can also be run from the command line on a capture set	`run_class_masking` management command
7	Class masking runs on a whole capture set without stalling or running out of memory, and shows live progress	the task commits in batches (default 200) and, on each batch, updates the Job's progress and surfaces stage counters — classifications checked, classifications masked, and occurrences updated — so an operator can watch it move; rows are streamed with `iterator()` to bound memory. Mirrors the small-size filter. `occurrences_updated` counts only occurrences whose determination actually changed, not every row re-saved
8	An admin can choose whether to renormalise the kept scores	a Reweight checkbox on the form (default on) threads through to the task. On, the kept classes are renormalised to sum to 1; off, the model's original absolute probabilities are kept. The chosen species is identical either way — the argmax is taken over the kept classes regardless — so this only affects the confidence number shown. The two modes write to distinct masking algorithms (`…_reweighted` / `…_absolute`) so a masked classification's provenance records which mode produced it
9	Long post-processing Jobs are no longer revoked while they are still working	progress heartbeats now bump `Job.updated_at`. The stale-job reaper revokes running jobs whose `updated_at` is older than 10 minutes; a narrowed `save(update_fields=["progress"])` did not move the `auto_now` `updated_at`, so any run over 10 minutes was reaped mid-flight even while streaming progress. This also fixes the same latent issue for the small-size filter

Detailed Description

No migration. No model fields change — applied_to already exists on Classification, and the task runs under the existing post_processing job type, so the registry addition does not touch any field definition. makemigrations --check is clean.
Masking is derived from logits alone. The masked score vector is recomputed from the stored logits (softmax over all classes, then excluded classes zeroed and optionally renormalised); the task never reads the stored scores field, which is on a path to being retired. The persisted logits are written back unchanged (JSON-safe) so the masked classification keeps the raw model output; the mask is reflected in the new scores (excluded classes → 0). While scores == softmax(logits) this is numerically identical to the previous behaviour, so the chosen taxon and confidence are unchanged.
Batching and observability (change 7). A single-transaction run over a large capture set could exhaust memory and produced no visible progress until it finished. The task now flushes each batch (bulk-update the source classifications, bulk-create the masked ones, then recompute the touched occurrences) and reports progress plus stage counters per batch, matching how the small-size filter reports. Verified end to end from the admin action on a ~3,800-classification capture set: the Job streamed counts to completion and finished as SUCCESS.
Reaper heartbeat (change 9). BasePostProcessingTask.update_progress and report_stage_metrics now save update_fields=["progress", "updated_at"], mirroring the async-job progress save in ami/jobs/models.py that already does this for the same reason. Django does not auto-add auto_now fields to a narrowed update_fields, so the timestamp had to be listed explicitly. A regression test freezes updated_at past the reaper cutoff and asserts each heartbeat path moves it forward.

How to test

Class masking on a single occurrence (fast path)

Django admin → Occurrences → filter to a project that has classifications with stored logits.
Select one occurrence → Action → the Class Masking post-processing action → Go.
On the intermediate form, pick a Taxa list and a Source classifier (the dropdown lists only classifiers that produced classifications on that occurrence), optionally toggle Reweight, → run.
A background Job is created; the success message links to it. When it finishes, the occurrence's classifications include a new terminal classification re-weighted to the list, and the determination is recomputed.

Class masking on a capture set

Django admin → Capture Sets (SourceImageCollection) → select one → choose the Class Masking action → fill in options → run.
Follow the Job from the admin: the progress bar and the stage counters (classifications checked / masked, occurrences updated) update as it works, and a large set runs to completion rather than being revoked partway.

Verify provenance in the API

GET /api/v2/occurrences/{id}/

Under detections[].classifications[], the new masked classification carries an applied_to object describing the source classification it was derived from.

Let admins roll up low-confidence predictions to a coarser rank #1361 — rank roll-up, split out and stacked on this PR
Measure the effect of post-processing filters on final predictions (offline eval) #1360 — offline evaluation to quantify how each post-processing filter moves final predictions vs ground truth (admin-only-deploy follow-up)
Framework for admins to trigger and review post-processing methods #1289 — post-processing framework this builds on (merged)
Add class masking (draft) #915 — original class-masking discussion
Implement a reusable post-processing framework #957

netlify · 2025-10-14T19:05:10Z

✅ Deploy Preview for antenna-preview canceled.

Name	Link
🔨 Latest commit	`1afe75a`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-preview/deploys/6a45a03814b6430008c5bf57

Copilot

Pull Request Overview

Implements class masking as a post-processing task that recalculates classifications by masking out classes not present in a provided taxa list and updates occurrences accordingly.

Adds ClassMaskingTask to the post-processing framework and registers it.
Filters and recalculates logits/scores per taxa list, creates new terminal classifications, and updates occurrences.
Minor logging update in job runner.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File	Description
ami/ml/post_processing/class_masking.py	New class masking task and supporting functions to filter classifications by taxa list and recompute softmax.
ami/ml/post_processing/init.py	Registers the new class_masking task module.
ami/jobs/models.py	Improves log line to print only the task config for post-processing.

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

mihow · 2025-10-15T15:17:23Z

+    # Update the occurrence determinations
+    logger.info(f"Updating the determinations for {len(occurrences_to_update)} occurrences")
+    for occurrence in occurrences_to_update:
+        occurrence.save(update_determination=True)


@mohamedelabbas1996 here is how I updated all of the determinations previously

netlify · 2026-02-18T11:37:00Z

✅ Deploy Preview for antenna-ssec canceled.

Name	Link
🔨 Latest commit	`1afe75a`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-ssec/deploys/6a45a0383965040008e8407c

coderabbitai · 2026-02-18T11:37:10Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new ClassMaskingTask post-processing feature that re-scores terminal classifications by masking classes outside a provided taxa list, with associated config, admin actions/forms, a management command, registry wiring, applied_to provenance in serializers/prefetches, job heartbeat fixes, and tests.

Changes

Class masking post-processing feature

Layer / File(s)	Summary
Config and core rescoring logic `ami/ml/post_processing/class_masking.py`	`ClassMaskingConfig` enforces exactly one scope; `make_classifications_filtered_by_taxa_list` masks/renormalizes logits, demotes old classifications, creates new terminal ones, and updates occurrence determinations in batched transactions.
Task orchestration and registry `ami/ml/post_processing/class_masking.py`, `ami/ml/post_processing/registry.py`, `ami/ml/post_processing/__init__.py`	`ClassMaskingTask` resolves scope, creates/reuses a masking algorithm, and runs rescoring with progress/metrics reporting; registered in `POSTPROCESSING_TASKS` and package exports.
Admin action and form wiring `ami/ml/post_processing/admin/forms.py`, `ami/ml/post_processing/admin/actions.py`, `ami/ml/post_processing/admin/class_masking_form.py`, `ami/main/admin.py`	`ClassMaskingActionForm` exposes algorithm/taxa list/reweight fields, scoped by `scope_queryset`; `run_class_masking` actions added to `OccurrenceAdmin` and `SourceImageCollectionAdmin`.
Management command `ami/ml/management/commands/run_class_masking.py`	New `run_class_masking` command validates inputs, reports counts, supports `--dry-run`, and runs `ClassMaskingTask`.
Applied-to provenance in API `ami/main/api/serializers.py`, `ami/main/api/views.py`, `ami/main/models_future/occurrence.py`	`ClassificationAppliedToSerializer` and `applied_to` fields added to classification serializers; `select_related`/prefetch updated to eagerly load the provenance relation.
Job heartbeat fix `ami/ml/post_processing/base.py`, `ami/ml/tests.py`	`update_progress` and `report_stage_metrics` now also save `updated_at` to avoid stale-job reaping.
Tests `ami/ml/post_processing/tests/test_class_masking.py`, `ami/ml/post_processing/tests/test_class_masking_admin.py`	New tests cover masking logic, task runs, batching, metrics, reweight toggle, config validation, and admin action/form behavior.

Estimated code review effort: 4 (Complex) | ~60 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Admin as Django Admin User
  participant Form as ClassMaskingActionForm
  participant Task as ClassMaskingTask
  participant Fn as make_classifications_filtered_by_taxa_list
  participant DB as Database

  Admin->>Form: select occurrence/collection, submit action
  Form-->>Admin: render algorithm, taxa_list, reweight fields
  Admin->>Task: confirm (creates Job with ClassMaskingConfig)
  Task->>Task: resolve scope, get/create masking algorithm
  Task->>Fn: run rescoring on scoped classifications
  Fn->>DB: bulk_update demote originals
  Fn->>DB: bulk_create new terminal classifications
  Fn->>DB: save occurrences (update_determination=True)
  Fn-->>Task: metrics (masked, occurrences_updated)
  Task-->>Admin: job progress and completion

Possibly related PRs

RolnickLab/antenna#1289: Uses the same make_post_processing_action/base admin form-task framework that the new run_class_masking action wires into.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: adding admin class-masking of classifier predictions to a species list.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The description covers summary, changes, details, testing, and related issues; only optional screenshots, deployment notes, and checklist are omitted.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/postprocessing-class-masking

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 12

🧹 Nitpick comments (3)

ami/ml/post_processing/class_masking.py (2)

149-161: Variable shadowing: logits and scores are reassigned, hiding the originals from line 126.

logits is first bound from classification.logits (line 126), then rebound to logits_np.tolist() (line 151). Similarly scores is rebound at line 161. This makes the code confusing—use distinct names like masked_logits and masked_scores.

Proposed fix

-        logits: list[float] = logits_np.tolist()
+        masked_logits: list[float] = logits_np.tolist()
         ...
-        scores: list = scores_np.tolist()
+        masked_scores: list[float] = scores_np.tolist()

Then update all downstream references accordingly (lines 163, 166, 174, 186-188).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/class_masking.py` around lines 149 - 161, The code
rebinds existing variables logits and scores (originally from
classification.logits) which causes variable shadowing and confusion; rename the
converted/filtered values (e.g., change logits_np.tolist() to masked_logits and
scores_np.tolist() to masked_scores), keep logits_np and scores_np as-is, and
update all downstream uses that currently reference logits or scores to use
masked_logits and masked_scores (also check any other places using logits or
scores after the masking block and adjust them to the new names).

48-75: job parameter is accepted but unused.

update_occurrences_in_collection accepts a job parameter (line 55) but never uses it. Either remove it or pass it along for progress tracking.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/class_masking.py` around lines 48 - 75, The function
update_occurrences_in_collection currently accepts a job parameter but never
uses it; either remove the unused parameter or propagate it for progress
tracking—preferably pass job through to downstream processing so callers can
monitor progress. Modify update_occurrences_in_collection (and any TODO block
that creates the AlgorithmCategoryMap) to accept and forward job to
make_classifications_filtered_by_taxa_list by adding job=job to that call and,
if needed, update make_classifications_filtered_by_taxa_list's signature (and
any other helper functions) to accept and use job for progress/log updates;
alternatively, if you choose to remove it, delete the job parameter from
update_occurrences_in_collection and update all callers accordingly.

ami/ml/post_processing/rank_rollup.py (1)

104-106: Remove commented-out debugger import.

Proposed fix

                     threshold = thresholds.get(rank, 1.0)
-                    # import pdb
-
-                    # pdb.set_trace()
                     candidates = {t: s for t, s in taxon_scores.items() if t.rank == rank}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/rank_rollup.py` around lines 104 - 106, Remove the
leftover commented debugger lines in the rank rollup module: delete the lines
containing "# import pdb" and "# pdb.set_trace()" in
ami/ml/post_processing/rank_rollup.py (look for the commented debugger import
and trace calls near the post-processing/rank rollup logic) so no commented
debugger statements remain; ensure no other pdb references are left in functions
or methods within that module.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/admin.py`:
- Around line 738-757: The DEFAULT_THRESHOLDS in the admin action
run_rank_rollup uses lowercase keys ("species","genus","family") but
RankRollupTask expects uppercase keys via RankRollupTask.ROLLUP_ORDER; update
DEFAULT_THRESHOLDS to use the same uppercase keys (e.g.,
"SPECIES","GENUS","FAMILY") or transform keys before creating the Job params so
they match RankRollupTask.ROLLUP_ORDER/lookup; ensure the symbol
DEFAULT_THRESHOLDS and the params payload in run_rank_rollup are consistent with
RankRollupTask.ROLLUP_ORDER used in rank_rollup.py.
- Around line 429-474: The admin action update_with_newfoundland_species
hardcodes TaxaList and Algorithm names and uses a bare except Exception; change
it to accept configurable taxa_list and algorithm (e.g., read from settings,
admin action parameters, or a constant at module top) instead of using the
literal "Newfoundland Species" and "Quebec & Vermont Species Classifier - Apr
2024", and add a TODO or link to your tracking issue to avoid shipping hardcoded
values; also replace the broad except in the loop with a narrower exception (or
catch Exception only to log full traceback via logging.exception and then
re-raise or continue) so programming errors aren't silently swallowed—update
references: update_with_newfoundland_species, TaxaList.objects.get,
Algorithm.objects.get, and update_single_occurrence accordingly.

In `@ami/ml/post_processing/class_masking.py`:
- Line 129: Move the numpy imports out of the loop and to module level: add
"import numpy as np", "from numpy import exp", and "from numpy import sum as
np_sum" at the top of the file, then delete the in-loop statements "import numpy
as np", "from numpy import exp", and "from numpy import sum" (the imports
currently inside the for loop). Ensure subsequent code uses np, exp, and np_sum
consistently after the change.
- Around line 170-171: Remove the two debug print statements that output "Top
taxon" and "Top index" in ami/ml/post_processing/class_masking.py; either delete
the lines that call print(...) or replace them with a proper logger.debug(...)
call (using the module/class logger) so the variables
category_map_with_taxa[top_index] and top_index are not printed to stdout in
production.
- Around line 206-208: Replace the root-logger call logging.info(...) with the
module-level logger call logger.info(...) where the message "Adding new
classification for Taxon {top_taxon} to occurrence
{new_classification.detection.occurrence}" is emitted; locate this call in the
function that creates new_classification and ensure the module-level logger
(logger defined at top of the module) is used so module-specific logging
configuration is respected.
- Around line 183-197: The new Classification is being created with
category_map=None which breaks methods like predictions() and top_n() because
Classification.save() (which auto-populates category_map from
algorithm.category_map) isn't called when later using bulk_create; update the
creation logic in class_masking.py so the new Classification's category_map is
explicitly set from new_algorithm.category_map (or otherwise ensure
Classification.save() runs before bulk_create) when constructing
new_classification (referencing Classification, new_classification,
new_algorithm, and Classification.save()) so downstream calls to
.predictions()/.top_n() have a valid category_map.
- Around line 84-112: The queryset taxa_in_list is iterated with "not in"
causing repeated DB hits and len(classifications) forces full evaluation; change
taxa_in_list = taxa_list.taxa.all() to a concrete set (e.g., a set of taxon PKs
or Taxon instances) and update the exclusion check in the comprehension that
builds excluded_category_map_with_taxa to test membership against that set (use
category["taxon"].pk or category["taxon"] depending on which identity you
store), and replace len(classifications) with classifications.count() to avoid
loading the whole QuerySet into memory; update any related variables like
category_map_with_taxa and excluded_category_indices to work with the chosen
identity (pk vs instance).

In `@ami/ml/post_processing/rank_rollup.py`:
- Around line 37-38: DEFAULT_THRESHOLDS and ROLLUP_ORDER use uppercase rank keys
while external callers pass lowercase, causing thresholds.get(rank, 1.0) to miss
matches; fix by normalizing incoming threshold keys to the same case as
ROLLUP_ORDER/DEFAULT_THRESHOLDS (e.g., convert provided thresholds to upper()
with something like thresholds = {k.upper(): v for k, v in thresholds.items()}
before any lookups/merging), and ensure any merging logic that references
DEFAULT_THRESHOLDS (and places where thresholds.get(rank, 1.0) is used) operates
on the normalized dict so rollup comparisons use the intended values.
- Around line 138-143: The code assumes clf.detection.occurrence is always
present and will raise when it's None; update the block around
updated_occurrences.append(occurrence) and the self.logger.info call to guard
against a missing occurrence: retrieve occurrence = clf.detection.occurrence,
then if occurrence is not None append it and log the rolled-up message using
occurrence.pk, otherwise handle the null case (e.g., skip appending and log a
different message referencing clf.detection.pk or clf.pk and the
new_taxon/new_score) so no AttributeError is raised when occurrence is None.
- Around line 70-71: The log formatting crashes when Classification.score is
None; update the logger.info call inside the loop (the for i, clf in
enumerate(qs.iterator(), start=1) block) to handle a nullable clf.score by
rendering a safe value—e.g., compute a short_score variable that is
f"{clf.score:.3f}" if clf.score is not None else "N/A" (or 0.0) and use that in
the message instead of directly formatting clf.score with : .3f so None never
raises TypeError.
- Around line 82-96: The per-label DB hit comes from
Taxon.objects.filter(name=label).first() inside the loop over
clf.scores/clf.category_map.labels; build a single mapping of labels to Taxon
before iterating classifications by collecting all labels (from
clf.category_map.labels across classifications), querying
Taxon.objects.filter(name__in=labels) once to produce a dict label_to_taxon,
then replace Taxon.objects.filter(...).first() with label_to_taxon.get(label)
and keep the existing handling (log skip when None) while leaving the rest of
the logic (rollup_order, find_ancestor_by_parent_chain, taxon_scores
accumulation, and debug logging) unchanged.

In `@ami/ml/tests.py`:
- Around line 973-1009: The thresholds dict keys are case-sensitive causing
lookups in RankRollupTask to miss lowercase keys and fall back to 1.0; fix this
by normalizing threshold keys or lookup to be case-insensitive — e.g., in the
RankRollupTask initializer or wherever you read thresholds (the thresholds
parameter and the code that uses thresholds.get(...)), convert keys to a
canonical form (upper-case) or use thresholds.get(taxon.rank.upper(),
default_threshold) when computing aggregate scores; after fixing, update the
test inputs if needed so genus/species scores exercise the intended non-1.0
thresholds.

---

Nitpick comments:
In `@ami/ml/post_processing/class_masking.py`:
- Around line 149-161: The code rebinds existing variables logits and scores
(originally from classification.logits) which causes variable shadowing and
confusion; rename the converted/filtered values (e.g., change logits_np.tolist()
to masked_logits and scores_np.tolist() to masked_scores), keep logits_np and
scores_np as-is, and update all downstream uses that currently reference logits
or scores to use masked_logits and masked_scores (also check any other places
using logits or scores after the masking block and adjust them to the new
names).
- Around line 48-75: The function update_occurrences_in_collection currently
accepts a job parameter but never uses it; either remove the unused parameter or
propagate it for progress tracking—preferably pass job through to downstream
processing so callers can monitor progress. Modify
update_occurrences_in_collection (and any TODO block that creates the
AlgorithmCategoryMap) to accept and forward job to
make_classifications_filtered_by_taxa_list by adding job=job to that call and,
if needed, update make_classifications_filtered_by_taxa_list's signature (and
any other helper functions) to accept and use job for progress/log updates;
alternatively, if you choose to remove it, delete the job parameter from
update_occurrences_in_collection and update all callers accordingly.

In `@ami/ml/post_processing/rank_rollup.py`:
- Around line 104-106: Remove the leftover commented debugger lines in the rank
rollup module: delete the lines containing "# import pdb" and "#
pdb.set_trace()" in ami/ml/post_processing/rank_rollup.py (look for the
commented debugger import and trace calls near the post-processing/rank rollup
logic) so no commented debugger statements remain; ensure no other pdb
references are left in functions or methods within that module.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (5)

ami/ml/post_processing/class_masking.py (2)
51-75: job and params parameters are accepted but never used.

job is accepted (line 58) but never referenced for progress reporting, and params (line 55) is only logged. If progress tracking is intended, wire job into the loop. Otherwise, remove unused parameters to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/class_masking.py` around lines 51 - 75, The function
update_occurrences_in_collection accepts job and params but never uses them;
either remove these parameters or wire them up: if progress reporting is
intended, pass the job into make_classifications_filtered_by_taxa_list (or use
job.update_progress / task_logger within the loop inside
make_classifications_filtered_by_taxa_list) and use params to control behavior
(e.g., filter options or batch sizes) when building classifications; otherwise
remove job and params from the signature and all call sites to avoid dead API
surface. Ensure you update references to update_occurrences_in_collection and
make_classifications_filtered_by_taxa_list so their signatures match and
document the chosen behavior.
184-186: assert statements will be stripped under python -O; use explicit checks.

Lines 184–186 use assert for runtime validation of detection/occurrence presence. These are silently removed when Python runs with optimizations enabled. Since this guards data integrity before bulk operations, use explicit if/raise instead.
Proposed fix
-        assert new_classification.detection is not None
-        assert new_classification.detection.occurrence is not None
-        occurrences_to_update.add(new_classification.detection.occurrence)
+        if new_classification.detection is None:
+            raise ValueError(f"New classification for {classification.pk} has no detection.")
+        if new_classification.detection.occurrence is None:
+            raise ValueError(f"Detection {new_classification.detection.pk} has no occurrence.")
+        occurrences_to_update.add(new_classification.detection.occurrence)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/class_masking.py` around lines 184 - 186, Replace the
runtime asserts that check new_classification.detection and
new_classification.detection.occurrence with explicit checks that raise a clear
exception (e.g., ValueError or RuntimeError) so they are not stripped under
python -O; specifically, in the block where new_classification is processed (the
code referencing new_classification.detection,
new_classification.detection.occurrence and occurrences_to_update.add(...)),
change the two assert lines to if-not checks that raise including contextual
information (e.g., include new_classification id or index) before adding to
occurrences_to_update.
ami/ml/tests.py (1)
1182-1187: Missing logits__isnull=False filter — inconsistent with production code path.

update_occurrences_in_collection in class_masking.py (line 67) filters with logits__isnull=False, but this test queryset at lines 1182–1187 omits it. While the test data always has logits, adding the filter would keep the test closer to the production query and help catch regressions.
Proposed fix
         classifications = Classification.objects.filter(
             detection__occurrence=occ,
             terminal=True,
             algorithm=self.algorithm,
             scores__isnull=False,
+            logits__isnull=False,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/tests.py` around lines 1182 - 1187, The test queryset using
Classification.objects.filter(detection__occurrence=occ, terminal=True,
algorithm=self.algorithm, scores__isnull=False) should mirror production by also
filtering out null logits; update the filter call in the test to include
logits__isnull=False so the query matches the one used in
update_occurrences_in_collection in class_masking.py and will catch regressions
related to missing logits.
ami/ml/post_processing/rank_rollup.py (2)
13-26: Parent-chain traversal triggers a DB query per hop.

Each call to current.parent is a lazy FK lookup — one query per ancestor level. When called inside the nested loop (line 95), this multiplies with the number of labels × ranks × classifications.

Consider prefetching or caching the parent chain. For example, if Taxon stores parents_json or a materialized path, use that instead. Otherwise, a single upfront query building a taxon_id → parent map would eliminate the repeated hits.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/rank_rollup.py` around lines 13 - 26, The parent-chain
traversal in find_ancestor_by_parent_chain causes a DB query per current.parent
access; modify the implementation to avoid lazy FK hops by either using
Taxon.parents_json or a materialized path if available, or change the function
signature to accept a prebuilt mapping (e.g., taxon_id → parent_id/object) and
traverse that map instead of current.parent; update callers (the nested loop
that invokes find_ancestor_by_parent_chain) to pass the preloaded map or
parents_json after performing a single upfront query that fetches all taxa and
parent relationships, so ancestor lookups are O(depth) in-memory rather than
multiple DB queries.
70-152: Long-running transaction wraps the entire classification loop including per-row DB writes.

The transaction.atomic() block spans the full iteration over potentially thousands of classifications, each performing an UPDATE (line 126) and a CREATE (line 127). This holds a database transaction open for the entire duration, which can cause lock contention and connection pool exhaustion under load.

Consider batching: accumulate the updates/creates and perform them in smaller atomic batches (similar to how class_masking.py uses bulk_update/bulk_create outside the loop).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/rank_rollup.py` around lines 70 - 152, The
transaction.atomic() currently wraps the entire loop causing long transactions;
change to process classifications in small batches: iterate the queryset without
a global atomic, accumulate pending Classification updates (e.g., mark
non-terminal) and new Classification instances (for create) into lists
(reference the Classification.objects.filter(...).update(...) and
Classification.objects.create(...) usages), and every N items (configurable
batch_size) run a short transaction.atomic() that performs bulk_update or
bulk_create for the accumulated items and clears the buffers; ensure you still
append updated occurrences to updated_occurrences and call update_progress at
the same batch boundary so behavior remains the same while avoiding one long
transaction.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/ml/post_processing/class_masking.py`:
- Around line 238-250: self.algorithm.category_map is only updated in memory
(assignment of algorithm.category_map to self.algorithm.category_map) so the
Algorithm DB row stays stale; after the in-memory assignment you should persist
the change by saving the Algorithm model (e.g., call
self.algorithm.save(update_fields=["category_map"])) or alternatively avoid
mutating self.algorithm and pass algorithm.category_map explicitly into
update_occurrences_in_collection/Classification creation; update the code around
the assignment and before calling update_occurrences_in_collection to ensure the
Algorithm DB record's category_map FK is saved.
- Around line 148-155: The equality check using classification.scores == scores
and classification.logits == logits is fragile due to floating-point
differences; replace it with a tolerance-based comparison (e.g., numpy.allclose
or an elementwise isclose with a small rtol/atol) when comparing
classification.scores to scores and classification.logits to logits inside the
block around top_index/top_taxon; alternatively or additionally, check whether
any indices that would be masked had non-negligible original logits (e.g.,
abs(logit) > tiny_threshold) before deciding to skip updating to avoid spurious
updates from rounding noise.
- Around line 109-113: The logger.info call is forcing materialization of the
classifications QuerySet by using len(classifications); replace that with the
previously computed classification_count (or call classifications.count()) so
the QuerySet is not evaluated into memory—update the f-string in the logging
statement where logger.info is called (referencing category_map_with_taxa,
excluded_category_map_with_taxa, and classifications/classification_count) to
use classification_count instead of len(classifications).

In `@ami/ml/post_processing/rank_rollup.py`:
- Around line 59-63: The query building variable qs uses
Classification.objects.filter(...) joining through
detection__source_image__collections which can produce duplicate Classification
rows; update the query in rank_rollup.py (the qs assignment using
Classification.objects.filter with terminal=True, taxon__isnull=False,
detection__source_image__collections__id=collection_id) to call .distinct()
after the filter to deduplicate results (match the pattern used by
update_occurrences_in_collection in class_masking.py).

In `@ami/ml/tests.py`:
- Around line 942-971: The category map is built from species_taxa using the
model's default ordering which can differ from the tests that explicitly use
.order_by("name") — update the query in _create_category_map_with_algorithm to
fetch species_taxa =
list(self.project.taxa.filter(rank=TaxonRank.SPECIES.name).order_by("name")[:3])
(and apply the same .order_by("name") change to any other similar query that
builds category maps or compares indices, e.g., the other occurrence around
lines 1025-1037) so the AlgorithmCategoryMap.data and labels indices align with
the test's expected species ordering; keep building AlgorithmCategoryMap and
Algorithm exactly as before.

---

Duplicate comments:
In `@ami/main/admin.py`:
- Around line 429-474: The admin action update_with_newfoundland_species
currently swallows all exceptions with a broad "except Exception as e" around
update_single_occurrence calls; when you generalize this action replace that
broad handler with a narrower one such as "except (ValueError, RuntimeError,
TaxaList.DoesNotExist) as e" and keep the existing self.message_user(request,
f'Error processing occurrence {occurrence.pk}: {str(e)}', level='error') pattern
(or add logging/traceback) so single failures still report but you avoid hiding
unexpected exceptions from update_single_occurrence and related code paths.

In `@ami/ml/post_processing/rank_rollup.py`:
- Around line 84-98: The loop issues an N+1 by calling
Taxon.objects.filter(name=label).first() for every label; instead, gather the
set of all labels used by clf.category_map.labels before iterating
classifications and fetch all matching Taxon rows once (e.g.,
Taxon.objects.filter(name__in=labels) and build a dict mapping name→Taxon), then
replace Taxon.objects.filter(name=label).first() with a lookup into that mapping
(handle missing names by logging and continuing). Update references in the
function containing clf.scores and clf.category_map.labels so taxon lookup uses
the preloaded dict.

---

Nitpick comments:
In `@ami/ml/post_processing/class_masking.py`:
- Around line 51-75: The function update_occurrences_in_collection accepts job
and params but never uses them; either remove these parameters or wire them up:
if progress reporting is intended, pass the job into
make_classifications_filtered_by_taxa_list (or use job.update_progress /
task_logger within the loop inside make_classifications_filtered_by_taxa_list)
and use params to control behavior (e.g., filter options or batch sizes) when
building classifications; otherwise remove job and params from the signature and
all call sites to avoid dead API surface. Ensure you update references to
update_occurrences_in_collection and make_classifications_filtered_by_taxa_list
so their signatures match and document the chosen behavior.
- Around line 184-186: Replace the runtime asserts that check
new_classification.detection and new_classification.detection.occurrence with
explicit checks that raise a clear exception (e.g., ValueError or RuntimeError)
so they are not stripped under python -O; specifically, in the block where
new_classification is processed (the code referencing
new_classification.detection, new_classification.detection.occurrence and
occurrences_to_update.add(...)), change the two assert lines to if-not checks
that raise including contextual information (e.g., include new_classification id
or index) before adding to occurrences_to_update.

In `@ami/ml/post_processing/rank_rollup.py`:
- Around line 13-26: The parent-chain traversal in find_ancestor_by_parent_chain
causes a DB query per current.parent access; modify the implementation to avoid
lazy FK hops by either using Taxon.parents_json or a materialized path if
available, or change the function signature to accept a prebuilt mapping (e.g.,
taxon_id → parent_id/object) and traverse that map instead of current.parent;
update callers (the nested loop that invokes find_ancestor_by_parent_chain) to
pass the preloaded map or parents_json after performing a single upfront query
that fetches all taxa and parent relationships, so ancestor lookups are O(depth)
in-memory rather than multiple DB queries.
- Around line 70-152: The transaction.atomic() currently wraps the entire loop
causing long transactions; change to process classifications in small batches:
iterate the queryset without a global atomic, accumulate pending Classification
updates (e.g., mark non-terminal) and new Classification instances (for create)
into lists (reference the Classification.objects.filter(...).update(...) and
Classification.objects.create(...) usages), and every N items (configurable
batch_size) run a short transaction.atomic() that performs bulk_update or
bulk_create for the accumulated items and clears the buffers; ensure you still
append updated occurrences to updated_occurrences and call update_progress at
the same batch boundary so behavior remains the same while avoiding one long
transaction.

In `@ami/ml/tests.py`:
- Around line 1182-1187: The test queryset using
Classification.objects.filter(detection__occurrence=occ, terminal=True,
algorithm=self.algorithm, scores__isnull=False) should mirror production by also
filtering out null logits; update the filter call in the test to include
logits__isnull=False so the query matches the one used in
update_occurrences_in_collection in class_masking.py and will catch regressions
related to missing logits.

mihow · 2026-02-18T12:19:02Z

Browser Testing: Class Masking & Admin Intermediate Page

1. Management Command Test (end-to-end)

Ran run_class_masking against real data in local dev:

Collection:     VCE - Mothra, Every 10 minutes (id=26)
Taxa list:      Taxa returned by Quebec & Vermont Species Classifier - Apr 2024 (id=10, 1688 taxa)
Algorithm:      Quebec & Vermont Species Classifier - Apr 2024 (id=10, 2497 labels)
Classifications to process: 470

Category map: 2497 categories total, 809 excluded by taxa list
Result: 470 existing classifications marked non-terminal
        470 new classifications created with masked logits
        421 occurrences had determinations updated

Dry-run mode (--dry-run) also works correctly — reports counts without making changes.

2. Admin Intermediate Page (replaces hardcoded action)

The hardcoded update_with_newfoundland_species action has been replaced with a generic run_class_masking action that shows an intermediate form:

User selects occurrences in the changelist
Chooses "Run class masking (select taxa list & algorithm)" from the action dropdown
An intermediate page shows:
- How many occurrences are selected
- What the action will do (mark existing as non-terminal, create new masked classifications, update determinations)
- Taxa list dropdown — all taxa lists with >0 taxa, showing taxa count
- Source algorithm dropdown — all algorithms with category maps, showing label count
- Submit and Cancel buttons

The dropdowns are populated dynamically from the database, so any new taxa lists or algorithms are automatically available.

3. Classifications After Masking (admin view)

After running class masking, the Classification admin shows the new entries:

Algorithm column shows #27 "Class masking" (class_masking) v1
Each classification has the correct species, scores recalculated from masked logits
Original classifications were marked as non-terminal

Changes in this update

ami/main/admin.py: Replaced hardcoded update_with_newfoundland_species with run_class_masking using Django admin intermediate page pattern
ami/templates/admin/main/class_masking_confirmation.html: New template for the intermediate form
Import ordering cleaned up (Count, TemplateResponse)

mihow · 2026-02-18T12:24:14Z

Browser Testing: Class Masking in the UI

Management Command

Ran class masking on real data:

Collection 26: VCE - Mothra (470 classifications with logits)
Algorithm 10: Quebec & Vermont Species Classifier - Apr 2024 (2497 labels)
TaxaList 10: Taxa returned by QC&VT Species Classifier (1688 taxa, 809 categories excluded)
Result: 470 new classifications created, 421 occurrence determinations updated

Admin: Intermediate Page

Replaced the hardcoded update_with_newfoundland_species action with a generic run_class_masking action that shows an intermediate form to select any TaxaList and Algorithm. The form filters to only show TaxaLists with taxa and Algorithms with category maps.

Occurrence Detail — Identifications Tab

Tested affected occurrences (e.g., /projects/18/occurrences/43498?tab=identification).

What works well:

"Class masking" shows as the algorithm name on the prediction card
"Terminal classification" / "Intermediate classification" labels are correct
Predictions appear in chronological order (newest first)
Original classifier prediction preserved as "Intermediate" (not deleted)
Agree/Confirm buttons work on the new class masking prediction

Observations / potential improvements:

No visual distinction for post-processing: Class masking predictions look identical to primary classifier predictions. Consider a badge, icon, or subtitle indicating this is a post-processing step.
applied_to lineage not exposed: The Classification.applied_to FK exists in the backend but isn't in ClassificationNestedSerializer. The UI can't show "derived from [original classification]".
Score displays as "(0.00)": After masking 809 of 2497 categories, probability mass is redistributed. Top scores drop from ~0.94 to ~0.001, which formats as "(0.00)" in detection labels. Mathematically correct but looks odd.
No undo mechanism: Original classifications are preserved as non-terminal, so reversal is possible, but there's no UI for it.

Sample Results

Occurrence	Before (QC&VT)	After (Class Masking)
43498	Campaea perlata (0.94)	Protoboarmia porcelaria (0.001)
43500	Lymantria dispar (0.94)	Lymantria dispar (0.002)
43501	Platynota exasperatana	Platynota exasperatana (unchanged)

mihow · 2026-02-18T12:24:33Z

Browser Testing: Identification Tab After Class Masking

Tested occurrence detail pages

Navigated to the occurrence detail page (Identifications tab) for occurrences affected by class masking.

Occurrence 43498 — determination changed from Campaea perlata to Protoboarmia porcelaria:

The Identifications tab shows three predictions in order:

Class masking — "Terminal classification" — Protoboarmia porcelaria (Feb 18, 2026)
Quebec & Vermont Species Classifier - Apr 2024 — "Intermediate classification" — Protoboarmia porcelaria (Jan 26, 2025)
Moth / Non-Moth Classifier — "Intermediate classification" — Moth (duplicate) (Jan 26, 2025)

Occurrence 43501 — determination stayed the same (Platynota exasperatana):
Same structure — Class masking terminal on top, original classifier demoted to intermediate.

Key observations

What works well:

"Class masking" appears as the algorithm name, clickable → links to algorithm detail page
"Terminal classification" / "Intermediate classification" labels are correct
Chronological ordering is correct (newest first)
Agree/Confirm buttons appear on the terminal classification
Taxonomy breadcrumbs (Family › Tribe › Genus) display correctly

Issues / improvement ideas:

Scores are very low — detection label shows "(0.00)" because masked softmax redistributes probability. The original score was 0.94, now it's 0.0015. This is technically correct but could confuse users.
No visual distinction for post-processing vs. ML classifier predictions. "Class masking" looks identical to "Quebec & Vermont Species Classifier". A badge or icon could help indicate post-processing origin.
applied_to not surfaced — the Classification.applied_to FK tracks which original classification was masked, but ClassificationNestedSerializer doesn't include it. The UI can't show lineage ("this result was derived from X").
Algorithm name "Class masking" is generic — doesn't indicate which taxa list was used. Consider including taxa list name in algorithm name or adding metadata.

Admin intermediate page

The hardcoded update_with_newfoundland_species action has been replaced with a dynamic form:

Dropdown for Taxa list (filtered to lists with >0 taxa, shows count)
Dropdown for Source algorithm (filtered to algorithms with category maps, shows label count)
Explanation of what will happen, submit + cancel buttons

Screenshots taken locally at docs/claude/screenshots/.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

ami/main/admin.py (3)

763-782: DEFAULT_THRESHOLDS is a hidden magic constant buried inside the action.

The uppercase keys now match what RankRollupTask expects (the previous bug is fixed). However, defining DEFAULT_THRESHOLDS as a local variable makes the knobs invisible to operators who want to tune rollup behaviour without touching code. Promoting it to a module-level constant (or a class attribute of SourceImageCollectionAdmin) makes it immediately discoverable.

♻️ Suggested improvement

+RANK_ROLLUP_DEFAULT_THRESHOLDS = {"SPECIES": 0.8, "GENUS": 0.6, "FAMILY": 0.4}
+
 ...
     `@admin.action`(description="Run Rank Rollup post-processing task (async)")
     def run_rank_rollup(self, request: HttpRequest, queryset: QuerySet[SourceImageCollection]) -> None:
         """Trigger the Rank Rollup post-processing job asynchronously."""
         jobs = []
-        DEFAULT_THRESHOLDS = {"SPECIES": 0.8, "GENUS": 0.6, "FAMILY": 0.4}
 
         for collection in queryset:
             job = Job.objects.create(
                 ...
-                params={"task": "rank_rollup", "config": {"source_image_collection_id": collection.pk, "thresholds": DEFAULT_THRESHOLDS}},
+                params={"task": "rank_rollup", "config": {"source_image_collection_id": collection.pk, "thresholds": RANK_ROLLUP_DEFAULT_THRESHOLDS}},
             )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/main/admin.py` around lines 763 - 782, DEFAULT_THRESHOLDS is defined
locally inside the run_rank_rollup admin action, making tuning invisible; move
it to a discoverable location (either a module-level constant named
DEFAULT_THRESHOLDS or as a class attribute on SourceImageCollectionAdmin) and
update run_rank_rollup to reference that constant instead of the local variable
so operators can adjust thresholds without editing the method; ensure the
constant uses the same uppercase keys expected by RankRollupTask.

478-484: N+1 queries: missing select_related("category_map") on the algorithms queryset.

Algorithm.objects.filter(category_map__isnull=False) does not prefetch category_map. The loop on line 483 therefore issues one additional SELECT per algorithm to lazy-load its AlgorithmCategoryMap. Adding select_related collapses this to a single JOIN.

⚡ Proposed fix

-        algorithms = Algorithm.objects.filter(category_map__isnull=False).order_by("name")
+        algorithms = Algorithm.objects.filter(category_map__isnull=False).select_related("category_map").order_by("name")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/main/admin.py` around lines 478 - 484, The algorithms queryset is causing
N+1 queries because Algorithm.objects.filter(category_map__isnull=False) does
not fetch the related AlgorithmCategoryMap; update the queryset used when
building alg_list to use select_related("category_map") so the category_map is
joined in the same query, then keep the loop that assigns alg.labels_count =
len(alg.category_map.labels) if alg.category_map else 0 (using the
already-fetched relation) to avoid per-algorithm lazy loads; look for the
variables/functions Algorithm, category_map, algorithms, alg_list and the loop
assigning labels_count to implement this change.

503-506: Ruff RUF012: mutable class-level attributes should be annotated with ClassVar.

Both inlines and actions are plain mutable lists assigned as class attributes, which Ruff flags. The standard Django admin idiom uses plain lists, but the warning can be silenced cleanly with a ClassVar annotation.

🛠️ Proposed fix

+from typing import ClassVar
 ...
     # Add classifications as inline
-    inlines = [DetectionInline]
+    inlines: ClassVar[list] = [DetectionInline]
 
-    actions = [run_class_masking]
+    actions: ClassVar[list] = [run_class_masking]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/main/admin.py` around lines 503 - 506, Ruff flags the mutable class-level
attributes inlines and actions; update the admin class to annotate these
attributes as ClassVar to silence RUF012: add "from typing import ClassVar,
List" (or appropriate typing imports) and change the annotations for inlines
(e.g., inlines: ClassVar[List[type[DetectionInline]]]) and actions (e.g.,
actions: ClassVar[List] or ClassVar[List[callable]]/List[str] as appropriate)
while keeping the assigned values (inlines = [DetectionInline], actions =
[run_class_masking]).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/templates/admin/main/class_masking_confirmation.html`:
- Around line 21-30: The markup currently nests a <ul> inside a <p> in
class_masking_confirmation.html which is invalid; fix by either replacing the
surrounding <p> with a block-level container (e.g., <div>) so the intro sentence
and the list are both within valid block content, or split into a standalone <p>
for the intro sentence followed by the <ul>; update the element(s) containing
"This will:" and the following list accordingly to preserve the same text and
semantics.

---

Duplicate comments:
In `@ami/main/admin.py`:
- Around line 466-471: The current except block in the admin handler uses a bare
"except Exception as e" which can swallow programming errors; update the
exception handling around the occurrence processing (the block that calls
self.message_user with occurrence.pk) to either catch only the specific expected
exceptions (e.g., ValueError, DatabaseError, etc.) or, if you intend a broad
catch, log the full traceback before reporting to the user (use
logging.exception or capture traceback.format_exc()) and then re-raise
unexpected exceptions as needed; ensure you still call
self.message_user(request, f"Error processing occurrence {occurrence.pk}: {e}",
level="error") after logging the detailed error.

---

Nitpick comments:
In `@ami/main/admin.py`:
- Around line 763-782: DEFAULT_THRESHOLDS is defined locally inside the
run_rank_rollup admin action, making tuning invisible; move it to a discoverable
location (either a module-level constant named DEFAULT_THRESHOLDS or as a class
attribute on SourceImageCollectionAdmin) and update run_rank_rollup to reference
that constant instead of the local variable so operators can adjust thresholds
without editing the method; ensure the constant uses the same uppercase keys
expected by RankRollupTask.
- Around line 478-484: The algorithms queryset is causing N+1 queries because
Algorithm.objects.filter(category_map__isnull=False) does not fetch the related
AlgorithmCategoryMap; update the queryset used when building alg_list to use
select_related("category_map") so the category_map is joined in the same query,
then keep the loop that assigns alg.labels_count = len(alg.category_map.labels)
if alg.category_map else 0 (using the already-fetched relation) to avoid
per-algorithm lazy loads; look for the variables/functions Algorithm,
category_map, algorithms, alg_list and the loop assigning labels_count to
implement this change.
- Around line 503-506: Ruff flags the mutable class-level attributes inlines and
actions; update the admin class to annotate these attributes as ClassVar to
silence RUF012: add "from typing import ClassVar, List" (or appropriate typing
imports) and change the annotations for inlines (e.g., inlines:
ClassVar[List[type[DetectionInline]]]) and actions (e.g., actions:
ClassVar[List] or ClassVar[List[callable]]/List[str] as appropriate) while
keeping the assigned values (inlines = [DetectionInline], actions =
[run_class_masking]).

mihow · 2026-02-18T13:17:41Z

Proposed `applied_to` API schema for UI

applied_to is now a nested object on all Classification endpoints. When a classification was produced by post-processing (class masking, rank rollup), it links back to the original classification it was derived from.

Response shape:

{
  "id": 12345,
  "taxon": 42,
  "score": 0.87,
  "terminal": true,
  "algorithm": {
    "id": 15,
    "name": "Quebec & Vermont Species Classifier (filtered by taxa list Newfoundland Species)",
    ...
  },
  "applied_to": {
    "id": 12300,
    "created_at": "2026-01-15T10:30:00Z",
    "algorithm": {
      "id": 10,
      "name": "Quebec & Vermont Species Classifier - Apr 2024",
      ...
    }
  },
  "created_at": "2026-02-18T..."
}

For original (non-post-processed) classifications, applied_to is null.

UI integration idea: When applied_to is non-null, the identification tab could show something like:

"Derived from Quebec & Vermont Species Classifier - Apr 2024 prediction"

Or show the lineage as a small chain/badge next to the prediction.

Endpoints where this appears:

GET /api/v2/classifications/ (list)
GET /api/v2/classifications/{id}/ (detail)
GET /api/v2/occurrences/{id}/ → nested in detections[].classifications[] and predictions[]

@mihow @mohamedelabbas1996 — does this schema work for the UI? Open to feedback on the shape.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (8)

ami/ml/post_processing/class_masking.py (3)

233-235: logging.exception already includes the exception message — the {e} interpolation is redundant.

self.logger.exception(...) automatically appends the current exception info. Interpolating {e} duplicates it.

♻️ Proposed fix

-            self.logger.exception(f"Failed to load objects: {e}")
+            self.logger.exception("Failed to load objects")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/class_masking.py` around lines 233 - 235, The log call
in the exception handler redundantly interpolates the exception because
self.logger.exception already logs the exception info; in the except block that
currently does self.logger.exception(f"Failed to load objects: {e}") simply
remove the "{e}" interpolation and call self.logger.exception("Failed to load
objects") so the message and full traceback are logged once (update the except
block where self.logger.exception is used for loading objects).

129-129: Downgrade per-iteration sum logs from INFO to DEBUG.

These execute once per classification (potentially thousands) and will flood logs at INFO level. Use logger.debug for both.

♻️ Proposed fix

-        logger.info(f"Previous totals: {sum(scores)} scores, {sum(logits)} logits")
+        logger.debug(f"Previous totals: {sum(scores)} scores, {sum(logits)} logits")

-        logger.info(f"New totals: {sum(scores)} scores, {sum(logits)} logits")
+        logger.debug(f"New totals: {sum(scores)} scores, {sum(logits)} logits")

Also applies to: 145-145

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/class_masking.py` at line 129, Change the two
per-iteration log calls that currently use logger.info to logger.debug so they
don't flood INFO logs: replace the logger.info(f"Previous totals: {sum(scores)}
scores, {sum(logits)} logits") call and the other similar logger.info call (at
the second occurrence around line 145) with logger.debug using the same message
formatting; keep the sums computation and formatting identical, only change the
logging level.

51-58: job parameter is accepted but never used.

The job argument is passed at the call-site (line 248 of run()) but never referenced in update_occurrences_in_collection. Either forward it to make_classifications_filtered_by_taxa_list if needed for progress reporting, or remove it.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/class_masking.py` around lines 51 - 58, The job
parameter on update_occurrences_in_collection is accepted but unused; either
remove it entirely or forward it into the downstream call that likely needs it —
update_occurrences_in_collection(collection, taxa_list, algorithm, params,
new_algorithm, task_logger=logger, job=None) should pass job through to
make_classifications_filtered_by_taxa_list (or whatever internal call performs
progress reporting) so the progress reporter receives the same job object;
locate the call to make_classifications_filtered_by_taxa_list inside
update_occurrences_in_collection and add job as an argument (and update that
callee signature if necessary), or if job is genuinely unused, remove the job
parameter from update_occurrences_in_collection and from the run() call site to
keep signatures consistent.

ami/ml/tests.py (2)

919-940: Rename unused loop variable i to _i.

Ruff B007: i is never referenced in the loop body.

♻️ Proposed fix

-        for i in range(num):
+        for _i in range(num):

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/tests.py` around lines 919 - 940, The loop in
_create_occurrences_with_classifications uses an unused variable named i which
triggers Ruff B007; rename i to _i in the for loop header (for _i in
range(num):) to indicate the variable is intentionally unused, leaving the loop
body (Detection.objects.create, Occurrence.objects.create,
Classification.objects.create and occurrences.append) unchanged.

1034-1034: Move import math to the top of the file.

import math appears inside four separate test methods (lines 1034, 1144, 1217, 1266). Python caches module imports, so there's no runtime cost, but it's non-idiomatic. A single import math at the top of the file is cleaner.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/tests.py` at line 1034, Move the duplicate "import math" statements
out of the individual test functions and add a single "import math" at the
top-level imports of the test module; remove the four in-function imports so all
tests use the top-level math import (search for the in-function "import math"
occurrences to update them). Ensure no other name conflicts occur and run tests
to confirm no behavioral changes.

ami/ml/post_processing/rank_rollup.py (3)

37-38: Mutable class-level defaults should be annotated with ClassVar.

♻️ Proposed fix

+    from typing import ClassVar
+
-    DEFAULT_THRESHOLDS = {"SPECIES": 0.8, "GENUS": 0.6, "FAMILY": 0.4}
-    ROLLUP_ORDER = ["SPECIES", "GENUS", "FAMILY"]
+    DEFAULT_THRESHOLDS: ClassVar[dict[str, float]] = {"SPECIES": 0.8, "GENUS": 0.6, "FAMILY": 0.4}
+    ROLLUP_ORDER: ClassVar[list[str]] = ["SPECIES", "GENUS", "FAMILY"]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/rank_rollup.py` around lines 37 - 38,
DEFAULT_THRESHOLDS and ROLLUP_ORDER are mutable class-level defaults and should
be declared as ClassVar to avoid accidental per-instance mutation; update the
class that defines DEFAULT_THRESHOLDS and ROLLUP_ORDER to annotate them like
DEFAULT_THRESHOLDS: ClassVar[Dict[str, float]] and ROLLUP_ORDER:
ClassVar[List[str]] (or the appropriate typing aliases), add the necessary "from
typing import ClassVar, Dict, List" import if missing, and keep the existing
constant values unchanged.

116-118: t.rank == rank should normalize case to match find_ancestor_by_parent_chain.

find_ancestor_by_parent_chain defensively calls .upper() on current.rank before comparing (line 22). The candidates filter uses a direct equality check, creating a silent failure if any Taxon.rank value is ever lowercase — rollup candidates would always be empty.

♻️ Proposed fix

-                    candidates = {t: s for t, s in taxon_scores.items() if t.rank == rank}
+                    candidates = {t: s for t, s in taxon_scores.items() if t.rank.upper() == rank}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/rank_rollup.py` around lines 116 - 118, The candidates
filter silently fails on case mismatch because it compares Taxon.rank directly;
update the comprehension that builds candidates (using rollup_order, thresholds,
taxon_scores) to normalize case before comparing (e.g., compare t.rank.upper()
to rank or normalize both sides) so it matches the defensive behavior used in
find_ancestor_by_parent_chain; ensure you only change the equality check in the
comprehension that constructs candidates = {t: s for t, s in
taxon_scores.items() if ... } to use the normalized rank comparison.

83-157: Multiple N+1 FK accesses inside qs.iterator() — select_related is not effective here.

clf.category_map (lines 90, 97), clf.detection (lines 138–140), and clf.detection.occurrence (line 149) each trigger a separate DB query per iteration when using iterator(), since select_related() is not compatible with iterator() in older Django.

Suggested mitigations:

Category map: Build a {pk: AlgorithmCategoryMap} cache from the already-computed cat_map_ids (see previous comment) and look up by clf.category_map_id.
Detection FK: Use clf.detection_id (the raw FK column, no DB hit) in the filter(detection_id=...) and create(detection_id=...) calls.
Occurrence: Pre-fetch occurrence PKs or consolidate occurrence updates after the loop.

♻️ Illustrative patch for category_map cache and detection_id usage

+        category_map_cache = {cm.pk: cm for cm in AlgorithmCategoryMap.objects.filter(pk__in=cat_map_ids)}

         with transaction.atomic():
             for i, clf in enumerate(qs.iterator(), start=1):
                 ...
-                if not clf.category_map:
+                cm = category_map_cache.get(clf.category_map_id)
+                if not cm:
                     self.logger.info(f"Skipping classification #{clf.pk}: no category_map assigned")
                     continue

                 ...
                 for idx, score in enumerate(clf.scores):
-                    label = clf.category_map.labels[idx]
+                    label = cm.labels[idx]
                     ...

                 ...
-                Classification.objects.filter(detection=clf.detection).update(terminal=False)
+                Classification.objects.filter(detection_id=clf.detection_id).update(terminal=False)
                 Classification.objects.create(
-                    detection=clf.detection,
+                    detection_id=clf.detection_id,
                     ...
                 )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/ml/post_processing/rank_rollup.py` around lines 83 - 157, The loop over
qs.iterator() causes N+1 DB hits for clf.category_map, clf.detection and
clf.detection.occurrence because select_related() is ineffective with
iterator(); fix by 1) building an in-memory cache of AlgorithmCategoryMap
objects keyed by pk from the precomputed cat_map_ids and replace accesses to
clf.category_map with a lookup using clf.category_map_id, 2) avoid resolving the
Detection FK by using clf.detection_id in DB calls (replace
Classification.objects.filter(detection=clf.detection).update(...) with
filter(detection_id=clf.detection_id) and pass detection_id=clf.detection_id to
Classification.objects.create), and 3) stop accessing clf.detection.occurrence
inside the loop — instead collect detection_ids (or classification ids) during
iteration and after the loop bulk-fetch occurrences (or occurrence pks) and
append to updated_occurrences or perform consolidated updates to avoid per-row
queries.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/ml/post_processing/class_masking.py`:
- Around line 162-174: The new Classification created in class_masking.py is
missing the applied_to field, losing post-processing lineage; update the
new_classification instantiation to include applied_to referencing the original
classification (e.g., applied_to=classification) similar to how RankRollupTask
sets applied_to=clf so the created classification preserves its source lineage.

In `@ami/ml/post_processing/rank_rollup.py`:
- Around line 68-78: qs.only("category_map") still triggers N+1 because
accessing clf.category_map performs a deferred fetch per row; instead, collect
distinct category_map_id from the queryset (use
qs.values_list("category_map_id", flat=True).distinct()), fetch all
AlgorithmCategoryMap objects for those ids in one query, then iterate over the
pre-fetched maps to gather labels (avoid accessing clf.category_map directly).
Populate all_labels from the cached AlgorithmCategoryMap instances, build
label_to_taxon as before, and log using the same self.logger.info call;
reference qs.only("category_map"), clf.category_map, and AlgorithmCategoryMap to
locate and replace the logic.

---

Nitpick comments:
In `@ami/ml/post_processing/class_masking.py`:
- Around line 233-235: The log call in the exception handler redundantly
interpolates the exception because self.logger.exception already logs the
exception info; in the except block that currently does
self.logger.exception(f"Failed to load objects: {e}") simply remove the "{e}"
interpolation and call self.logger.exception("Failed to load objects") so the
message and full traceback are logged once (update the except block where
self.logger.exception is used for loading objects).
- Line 129: Change the two per-iteration log calls that currently use
logger.info to logger.debug so they don't flood INFO logs: replace the
logger.info(f"Previous totals: {sum(scores)} scores, {sum(logits)} logits") call
and the other similar logger.info call (at the second occurrence around line
145) with logger.debug using the same message formatting; keep the sums
computation and formatting identical, only change the logging level.
- Around line 51-58: The job parameter on update_occurrences_in_collection is
accepted but unused; either remove it entirely or forward it into the downstream
call that likely needs it — update_occurrences_in_collection(collection,
taxa_list, algorithm, params, new_algorithm, task_logger=logger, job=None)
should pass job through to make_classifications_filtered_by_taxa_list (or
whatever internal call performs progress reporting) so the progress reporter
receives the same job object; locate the call to
make_classifications_filtered_by_taxa_list inside
update_occurrences_in_collection and add job as an argument (and update that
callee signature if necessary), or if job is genuinely unused, remove the job
parameter from update_occurrences_in_collection and from the run() call site to
keep signatures consistent.

In `@ami/ml/post_processing/rank_rollup.py`:
- Around line 37-38: DEFAULT_THRESHOLDS and ROLLUP_ORDER are mutable class-level
defaults and should be declared as ClassVar to avoid accidental per-instance
mutation; update the class that defines DEFAULT_THRESHOLDS and ROLLUP_ORDER to
annotate them like DEFAULT_THRESHOLDS: ClassVar[Dict[str, float]] and
ROLLUP_ORDER: ClassVar[List[str]] (or the appropriate typing aliases), add the
necessary "from typing import ClassVar, Dict, List" import if missing, and keep
the existing constant values unchanged.
- Around line 116-118: The candidates filter silently fails on case mismatch
because it compares Taxon.rank directly; update the comprehension that builds
candidates (using rollup_order, thresholds, taxon_scores) to normalize case
before comparing (e.g., compare t.rank.upper() to rank or normalize both sides)
so it matches the defensive behavior used in find_ancestor_by_parent_chain;
ensure you only change the equality check in the comprehension that constructs
candidates = {t: s for t, s in taxon_scores.items() if ... } to use the
normalized rank comparison.
- Around line 83-157: The loop over qs.iterator() causes N+1 DB hits for
clf.category_map, clf.detection and clf.detection.occurrence because
select_related() is ineffective with iterator(); fix by 1) building an in-memory
cache of AlgorithmCategoryMap objects keyed by pk from the precomputed
cat_map_ids and replace accesses to clf.category_map with a lookup using
clf.category_map_id, 2) avoid resolving the Detection FK by using
clf.detection_id in DB calls (replace
Classification.objects.filter(detection=clf.detection).update(...) with
filter(detection_id=clf.detection_id) and pass detection_id=clf.detection_id to
Classification.objects.create), and 3) stop accessing clf.detection.occurrence
inside the loop — instead collect detection_ids (or classification ids) during
iteration and after the loop bulk-fetch occurrences (or occurrence pks) and
append to updated_occurrences or perform consolidated updates to avoid per-row
queries.

In `@ami/ml/tests.py`:
- Around line 919-940: The loop in _create_occurrences_with_classifications uses
an unused variable named i which triggers Ruff B007; rename i to _i in the for
loop header (for _i in range(num):) to indicate the variable is intentionally
unused, leaving the loop body (Detection.objects.create,
Occurrence.objects.create, Classification.objects.create and occurrences.append)
unchanged.
- Line 1034: Move the duplicate "import math" statements out of the individual
test functions and add a single "import math" at the top-level imports of the
test module; remove the four in-function imports so all tests use the top-level
math import (search for the in-function "import math" occurrences to update
them). Ensure no other name conflicts occur and run tests to confirm no
behavioral changes.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

ami/main/api/serializers.py (1)

877-884: ClassificationAppliedToSerializer embeds the full AlgorithmSerializer despite being described as "lightweight"

AlgorithmSerializer exposes 13+ fields including nested category_map (itself a nested serializer), uri, version_name, task_type, description, etc. For the stated purpose of showing "Derived from [algorithm]" lineage, a minimal inline would be more appropriate and avoids pulling unnecessary data into every nested classification response.

♻️ Suggested minimal algorithm representation

+class MinimalAlgorithmSerializer(serializers.ModelSerializer):
+    class Meta:
+        model = Algorithm
+        fields = ["id", "name", "key", "version"]
+

 class ClassificationAppliedToSerializer(serializers.ModelSerializer):
     """Lightweight nested representation of the parent classification this was derived from."""

-    algorithm = AlgorithmSerializer(read_only=True)
+    algorithm = MinimalAlgorithmSerializer(read_only=True)

     class Meta:
         model = Classification
         fields = ["id", "created_at", "algorithm"]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ami/main/api/serializers.py` around lines 877 - 884,
ClassificationAppliedToSerializer currently embeds the full AlgorithmSerializer
(too heavy); create a lightweight nested serializer (e.g.,
MinimalAlgorithmSerializer) with only the minimal fields needed for lineage
(such as id and a human label like name or version_name and optionally uri) and
replace algorithm = AlgorithmSerializer(read_only=True) with algorithm =
MinimalAlgorithmSerializer(read_only=True) inside the same module; ensure the
new MinimalAlgorithmSerializer references the Algorithm model and is listed
before ClassificationAppliedToSerializer so imports/resolution succeed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/api/serializers.py`:
- Around line 877-884: This is a RUF012 false positive for the mutable default
class attribute on the Meta.fields list in ClassificationAppliedToSerializer; if
CI enforces RUF012, suppress it by adding a local noqa comment on the
Meta.fields line (i.e., add "# noqa: RUF012" to the fields = ["id",
"created_at", "algorithm"] line in the ClassificationAppliedToSerializer.Meta
block) so the linter ignores this diff-scoped artifact.

In `@ami/main/api/views.py`:
- Around line 1228-1238: OccurrenceListSerializer.get_determination_details() is
triggering N+1 queries by accessing Detection.best_prediction (a cached_property
that calls Detection.predictions().order_by(...).first()) without eager-loading
applied_to and applied_to.algorithm; fix by ensuring the queryset used when
building detections for list view (or inside Detection.predictions()) prefetches
applied_to__algorithm for Classification objects, or alternatively adjust
OccurrenceListSerializer to avoid serializing applied_to in list action (e.g.,
use a lightweight nested serializer for list views); target symbols:
OccurrenceListSerializer.get_determination_details, Detection.best_prediction,
Detection.predictions, and ClassificationNestedSerializer (and the
applied_to__algorithm prefetch) so the top prediction’s applied_to and its
algorithm are loaded eagerly.

---

Nitpick comments:
In `@ami/main/api/serializers.py`:
- Around line 877-884: ClassificationAppliedToSerializer currently embeds the
full AlgorithmSerializer (too heavy); create a lightweight nested serializer
(e.g., MinimalAlgorithmSerializer) with only the minimal fields needed for
lineage (such as id and a human label like name or version_name and optionally
uri) and replace algorithm = AlgorithmSerializer(read_only=True) with algorithm
= MinimalAlgorithmSerializer(read_only=True) inside the same module; ensure the
new MinimalAlgorithmSerializer references the Algorithm model and is listed
before ClassificationAppliedToSerializer so imports/resolution succeed.

annavik · 2026-03-16T14:16:46Z

Here is a summary of the plan we discussed @mihow, let me know if I remember it wrong :)

First we get this PR merged (let me know if you need my help with anything here, I removed myself from this PR now?)
After that, we setup a pipeline for Newfoundland with class masking pre configured
After this, we can test this pipeline on sample data and I can work on the FE related tasks that was selected for this month: Present details in UI related to post processing #1165

mihow · 2026-04-30T02:26:20Z

Claude says: Non-blocking follow-up note from the tracking-revival branch (claude/revive-tracking-feature-OyMO3):

The run_class_masking admin action runs the masking loop synchronously in the request thread. Worth a follow-up to convert it to a ClassMaskingTask(BasePostProcessingTask) dispatched via Job.objects.create(job_type_key="post_processing", params={"task": "class_masking", "config": {...}}) + job.enqueue(), mirroring the pattern adopted for Occurrence Tracking (intermediate confirmation page → async Job → log stream on the Job admin page).

Benefits:

Logs and progress visible on the Job detail page instead of being lost to the request thread.
No request-thread timeout risk for large queryset selections.
Tunables surfaced as a Django form on the intermediate page rather than hard-coded.

Not a blocker for this PR — the pattern lands here, and class masking can fold in later.

Adds the shared infrastructure that PRs #999 (class masking) and #1272 (tracking) each grew independently. Uses SmallSizeFilterTask as the migration consumer to prove the pattern without carving up another open PR. Changes: - BasePostProcessingTask gains required pydantic config_schema class attr; validates Job.params['config'] at task __init__ via self.config_schema(**config). self.config is now a BaseModel instance (was: freeform dict). - SmallSizeFilterTask: adds SmallSizeFilterConfig schema (size_threshold + source_image_collection_id, validators for 0<x<1, extra=forbid). Existing default 0.0008 preserved. - New admin form base BasePostProcessingActionForm with to_config() contract, plus SmallSizeFilterActionForm exposing size_threshold knob (currently unreachable — admin trigger hardcoded empty config). - Parameterized confirmation template at admin/post_processing/confirmation.html with overridable {% block intro %} and a _form_fieldset.html partial. - run_small_size_filter admin action rewrites onto new pattern: intermediate confirmation page on first POST, validate + enqueue on confirm. Per-collection Job creation preserved (each Job gets correct project FK). - 17 tests covering schema contract, form validation, intermediate page render, multi-collection partitioning by project FK. Pydantic v1 syntax throughout (repo pins pydantic<2.0). Memory note: container is v1, .dict() / .__fields__ are correct here. Out of scope (explicitly deferred): - Project-partitioning helper (defer to whichever multi-scope adopter lands first, likely #1272's tracking which partitions events across projects) - REST API trigger surface (eventual primary; admin not future-primary) - Rank rollup, class masking, tracking tasks themselves — stay in their PRs Design doc: docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md Co-Authored-By: Claude <noreply@anthropic.com>

Address review feedback on the repetitive run_/render pattern. The confirm -> render -> validate -> enqueue flow is now built by make_post_processing_action() in ami/ml/post_processing/admin/actions.py, so each task declares only what varies: its task class, knob form, and how a selected row maps to a Job (scope_resolver / project_resolver / name_resolver). Tasks whose row->Job mapping isn't one-Job-per-row (e.g. PR #1272's per-project event partitioning) pass a custom build_jobs callable; PR #999 likewise inherits the form/template instead of hand-rolling HTML. Config validation now has a single source of truth: the task's pydantic config_schema. The knob form no longer re-encodes the (0, 1) bound, and the FloatField min/max that contradicted the exclusive interval is gone. Schema errors raised while building Jobs are mapped back onto the form so the operator sees them inline on the confirmation page (non-field errors now render too). This also removes the near-dead per-row try/except in the old admin action. Adds test_action_factory.py (action registration, default builder, schema mapping, all-or-nothing, build_jobs override hook) and updates the form tests to reflect that range enforcement moved to the schema layer. Co-Authored-By: Claude <noreply@anthropic.com>

mihow · 2026-06-05T23:49:04Z

Claude says: (rebase guidance drafted on @mihow's behalf)

Rebase path onto #1289 (post-processing admin scaffolding)

#1289 landed a shared admin-trigger pattern for post-processing tasks (head abcb446e). It's worth rebasing this PR onto it once it merges — the class-masking trigger, which currently uses hand-rolled <select> HTML in the action, becomes a small declarative form. Concrete steps:

1. Schemas. Add ClassMaskingConfig and RankRollupConfig pydantic models (v1 syntax — validator, .dict(), class Config: extra = "forbid") and set them as config_schema on the respective task classes. Read typed self.config in each task instead of pulling values out of a freeform dict. The base class validates config at construction now, so a bad Job.params['config'] fails fast in the worker logs with a clear pydantic error.

2. Form. Replace the hand-rolled <select name="taxa_list"> / <select name="algorithm"> HTML with a ClassMaskingActionForm(BasePostProcessingActionForm) (in ami/ml/post_processing/admin/class_masking_form.py) using ModelChoiceField(queryset=...) for the taxa list and algorithm. The form declares fields only; validation rules live on the schema. Schema errors raised while building Jobs are mapped back onto the matching form field automatically, so you get inline error display for free.

3. Admin action. Replace the hand-written action in ami/main/admin.py with a make_post_processing_action(...) call (from ami/ml/post_processing/admin/actions.py). Class masking is single-row-scoped, so the default one-Job-per-row builder likely fits — pass a scope_resolver that maps a selected row to its config fields:

run_class_masking = make_post_processing_action(
    ClassMaskingTask,
    ClassMaskingActionForm,
    scope_resolver=lambda obj: {"<scope_field>": obj.pk},
)

If a row maps to more than one Job (or needs grouping), pass a custom build_jobs= callable instead and raise ConfigValidationErrors([(field, message), ...]) to re-render the form. Drop the direct request.POST reads and any manual validation — the factory handles confirm/render/validate/enqueue, the select_across guard, and all-or-nothing creation.

4. Template. class_masking_confirmation.html becomes a thin override of the shared shell at admin/post_processing/confirmation.html, overriding {% block intro %} for the masking-specific preview. The _form_fieldset.html partial renders fields, help text, and errors uniformly.

5. Expected conflicts (small): ami/ml/post_processing/registry.py and ami/main/admin.py. The run_class_masking management command can keep validating via the same ClassMaskingConfig schema, so the CLI and admin paths share one contract.

The rank-rollup task is unaffected beyond adding its config_schema; only the class-masking trigger needs reshaping.

Make post-processing filters easier to develop against and observe. Per-occurrence trigger (the dev/spot path): SmallSizeFilterConfig now accepts either source_image_collection_id or occurrence_id (discriminated scope, exactly-one enforced by a pydantic root_validator), and the task resolves its detection queryset from whichever scope is set. OccurrenceAdmin gains a run_small_size_filter action via the same make_post_processing_action factory, so an operator can run a filter on a single selected occurrence without standing up a whole capture set. This discriminated-scope shape is the pattern future per-occurrence / per-event tasks (#999, #1272) copy. Job stage metrics (run observability): BasePostProcessingTask.report_stage_metrics writes named counters onto the job's post_processing stage (falling back to a log line when run without a job). The small size filter emits detections_checked, detections_flagged and occurrences_updated at each flush and on completion, so the Jobs admin page shows what a run examined and changed. Tests: schema scope cases (occurrence-only valid, both-scopes raise, no-scope raises); the OccurrenceAdmin action enqueues a job scoped to occurrence_id; run-time occurrence-scope isolation (sibling occurrences untouched); and stage-metric reporting on a real job. Co-Authored-By: Claude <noreply@anthropic.com>

…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>

…1289) * docs(post-processing): design spec for admin scaffolding precursor PR Context: PRs #999 (class masking) and #1272 (tracking) each grew their own admin-trigger plumbing (registry registration, intermediate confirmation page with form, freeform dict config). This precursor extracts the shared pattern using SmallSizeFilterTask as the migration consumer — proves the abstraction without carving up another contributor's open PR. Spec covers: pydantic config_schema contract on BasePostProcessingTask, BasePostProcessingActionForm class, parameterized confirmation template + partial, SmallSizeFilterTask migration (exposes existing size_threshold knob via admin), test plan, rebase impact for #999 + #1272. Co-Authored-By: Claude <noreply@anthropic.com> * feat(post-processing): admin scaffolding precursor Adds the shared infrastructure that PRs #999 (class masking) and #1272 (tracking) each grew independently. Uses SmallSizeFilterTask as the migration consumer to prove the pattern without carving up another open PR. Changes: - BasePostProcessingTask gains required pydantic config_schema class attr; validates Job.params['config'] at task __init__ via self.config_schema(**config). self.config is now a BaseModel instance (was: freeform dict). - SmallSizeFilterTask: adds SmallSizeFilterConfig schema (size_threshold + source_image_collection_id, validators for 0<x<1, extra=forbid). Existing default 0.0008 preserved. - New admin form base BasePostProcessingActionForm with to_config() contract, plus SmallSizeFilterActionForm exposing size_threshold knob (currently unreachable — admin trigger hardcoded empty config). - Parameterized confirmation template at admin/post_processing/confirmation.html with overridable {% block intro %} and a _form_fieldset.html partial. - run_small_size_filter admin action rewrites onto new pattern: intermediate confirmation page on first POST, validate + enqueue on confirm. Per-collection Job creation preserved (each Job gets correct project FK). - 17 tests covering schema contract, form validation, intermediate page render, multi-collection partitioning by project FK. Pydantic v1 syntax throughout (repo pins pydantic<2.0). Memory note: container is v1, .dict() / .__fields__ are correct here. Out of scope (explicitly deferred): - Project-partitioning helper (defer to whichever multi-scope adopter lands first, likely #1272's tracking which partitions events across projects) - REST API trigger surface (eventual primary; admin not future-primary) - Rank rollup, class masking, tracking tasks themselves — stay in their PRs Design doc: docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md Co-Authored-By: Claude <noreply@anthropic.com> * docs(post-processing): add PR #1289 admin smoke screenshots Confirmation page (intermediate page rendering with size_threshold form field) and success message (after submitting form, Job 1571 enqueued with typed config payload). Co-Authored-By: Claude <noreply@anthropic.com> * chore: host PR screenshots on S3 instead of committing to repo Screenshots for the PR description now live in the ami-media-staging object store rather than docs/claude/screenshots/. Keeps binary assets out of the repo history. Co-Authored-By: Claude <noreply@anthropic.com> * refactor(post-processing): extract shared admin-action factory Address review feedback on the repetitive run_/render pattern. The confirm -> render -> validate -> enqueue flow is now built by make_post_processing_action() in ami/ml/post_processing/admin/actions.py, so each task declares only what varies: its task class, knob form, and how a selected row maps to a Job (scope_resolver / project_resolver / name_resolver). Tasks whose row->Job mapping isn't one-Job-per-row (e.g. PR #1272's per-project event partitioning) pass a custom build_jobs callable; PR #999 likewise inherits the form/template instead of hand-rolling HTML. Config validation now has a single source of truth: the task's pydantic config_schema. The knob form no longer re-encodes the (0, 1) bound, and the FloatField min/max that contradicted the exclusive interval is gone. Schema errors raised while building Jobs are mapped back onto the form so the operator sees them inline on the confirmation page (non-field errors now render too). This also removes the near-dead per-row try/except in the old admin action. Adds test_action_factory.py (action registration, default builder, schema mapping, all-or-nothing, build_jobs override hook) and updates the form tests to reflect that range enforcement moved to the schema layer. Co-Authored-By: Claude <noreply@anthropic.com> * fix(post-processing): refuse "select all across pages" in admin trigger Addresses review feedback. When an operator uses Django admin's "select all across pages", the action receives the entire filtered table as its queryset, which would serialize every pk into hidden _selected_action inputs on the confirmation page (a very large POST body, potentially over request limits). This trigger is for explicit, bounded selections, so the across-pages case is now refused with a clear message instead of rendering an unbounded form. Also folds in two cosmetic review nits: replace the Unicode multiplication sign in the size-threshold help text with a plain "x", and add a language tag to the module-layout code fence in the design doc. Co-Authored-By: Claude <noreply@anthropic.com> * refactor(post-processing): address re-review on the action factory Follow-up to the head-commit CodeRabbit re-review: - Wrap the Job-creation loop in default_build_jobs in transaction.atomic(). Admin requests are already atomic (ATOMIC_REQUESTS=True), but this helper can also be called outside a request (e.g. a management command), so the explicit block keeps creation all-or-nothing there too. Job.enqueue() uses transaction.on_commit, so the (millisecond) block only guards row creation; the long-running task still runs asynchronously in the worker after commit. - Guard BasePostProcessingTask.__init_subclass__ with inspect.isabstract so an abstract intermediary task class isn't forced to declare key/name/config_schema. - Resolve the selection once in render_confirmation (len of the materialized pk list) instead of a separate .count() query. - Omit scope_resolver from the build_jobs kwargs when it isn't set, so a custom build_jobs supplied without one never receives a None it might call. - Derive the admin action's labels (title, submit, dropdown description) from task.name instead of hardcoding title-case strings, so the operator-facing label and the Job name stay consistent and there are fewer per-task strings. - Make the build_jobs override-hook test actually invoke the action and assert the custom builder ran (and that scope_resolver was not forwarded as None). 24/24 post_processing tests green. Co-Authored-By: Claude <noreply@anthropic.com> * test(post-processing): prune redundant tests, cover atomicity and abstract guard End-of-PR test audit (separate from the TDD scaffolding): - Remove test_default_threshold_applies_when_form_uses_initial: size_threshold is a required field and can't be omitted, so it just re-POSTed the default value — identical mechanism to the existing valid-post test. The default is already covered by the form-initial and schema-default unit tests. - Rewrite the mislabeled "all-or-nothing" test into a real one. The old version gave every row the same out-of-range threshold, so both failed at validation and the creation loop was never reached. The new test passes validation, forces a failure while creating the second Job, and asserts the first is rolled back — actually exercising the transaction.atomic() wrap. - Add coverage for the inspect.isabstract guard in __init_subclass__: an abstract intermediary may defer key/name/config_schema to its concrete subclasses. 24 post_processing tests green. Co-Authored-By: Claude <noreply@anthropic.com> * feat(post-processing): per-occurrence trigger and job stage metrics Make post-processing filters easier to develop against and observe. Per-occurrence trigger (the dev/spot path): SmallSizeFilterConfig now accepts either source_image_collection_id or occurrence_id (discriminated scope, exactly-one enforced by a pydantic root_validator), and the task resolves its detection queryset from whichever scope is set. OccurrenceAdmin gains a run_small_size_filter action via the same make_post_processing_action factory, so an operator can run a filter on a single selected occurrence without standing up a whole capture set. This discriminated-scope shape is the pattern future per-occurrence / per-event tasks (#999, #1272) copy. Job stage metrics (run observability): BasePostProcessingTask.report_stage_metrics writes named counters onto the job's post_processing stage (falling back to a log line when run without a job). The small size filter emits detections_checked, detections_flagged and occurrences_updated at each flush and on completion, so the Jobs admin page shows what a run examined and changed. Tests: schema scope cases (occurrence-only valid, both-scopes raise, no-scope raises); the OccurrenceAdmin action enqueues a job scoped to occurrence_id; run-time occurrence-scope isolation (sibling occurrences untouched); and stage-metric reporting on a real job. Co-Authored-By: Claude <noreply@anthropic.com> * fix(post-processing): dedup occurrences_updated across flush batches The metric incremented from a batch-local set that is cleared each flush, so an occurrence whose detections span more than one batch was counted once per batch. Track occurrence ids in a persistent set and report its size instead. Detections never recur across batches, so detections_flagged was already correct. Addresses CodeRabbit review on beacb3f. Co-Authored-By: Claude <noreply@anthropic.com> * test: cut post-processing test fixture cost Replace per-test setup_test_project calls with minimal Project/Collection/ Occurrence rows and class-level setUpTestData in the admin-action and factory tests; the admin flow only reads FKs, so the full fixture (storage source, deployment, processing service per call) was wasted cost. Convert the pure form tests to SimpleTestCase (no DB) and drop one redundant valid-path form test already covered end to end by the admin and factory flows. TestPostProcessingTasks now builds its images/events/collection once per class instead of once per test. Module wall-clock roughly halves locally (8.8s -> 4.5s warm), and the double setup_test_project in the multi-collection admin test that intermittently hit a statement timeout in CI is gone. Co-Authored-By: Claude <noreply@anthropic.com> * fix(post-processing): render admin field errors as errorlist, clarify select_across refusal The post-processing confirmation form rendered each field's validation errors with the page-level `errornote` banner style. Switch the form-fieldset partial to Django's native `{{ field.errors }}` and `{{ form.non_field_errors }}` so errors render inline as `errorlist`, matching Django admin conventions. The invalid-threshold test now asserts the inline `errorlist` class. Also reframe the "select all across pages" comment: the reason to refuse it is that the action would apply to the entire filtered table rather than the rows the operator explicitly selected. The oversized POST body is a secondary symptom, not the primary concern. Co-Authored-By: Claude <noreply@anthropic.com> * fix(post-processing): count only occurrences whose determination changed The small size filter reported `occurrences_updated` as the number of occurrences it re-saved, but re-saving recomputes the determination in place. An occurrence pinned to a human identification keeps its taxon when its detection is flagged "Not identifiable", so it was counted even though nothing about it actually changed. Capture each occurrence's determination before the save and count it only when the determination changes after. Add a regression test pairing an occurrence that carries a human identification (determination unchanged, not counted) with an un-identified one (determination flips, counted). Co-Authored-By: Claude <noreply@anthropic.com> * feat(post-processing): link the admin action result to the created job(s) After an operator runs a post-processing task from the admin, the success message now links each created Job to its admin change page instead of just printing the raw pks. From there the operator can follow the job's progress and read any failure reason, which is otherwise only visible in the worker log. The link points at the admin change page because that page is always reachable from this admin action; the public UI host is not reliably known in this context. The message is built with format_html so the links render as anchors. Co-Authored-By: Claude <noreply@anthropic.com> * perf(admin): speed up the occurrence changelist and add id search The occurrence admin changelist is the entry point for the per-occurrence post-processing trigger, but on a large table (~1.3M occurrences) it took ~15s to load a single page. Two causes, both fixed here: - get_queryset counted detections with a JOIN + GROUP BY. A grouped count must aggregate the whole occurrence x detection join before the changelist's ORDER BY ... LIMIT can take a page, so it scanned every row just to show 25. Replaced with a correlated subquery (Coalesced to 0 for occurrences with no detections, matching the old JOIN count), which runs only for the rows on the page. Measured 14.9s -> 0.04s for one page on the large table. - The list ordered by -created_at, which has no index, forcing a full sort of the table to find the newest page. Switched to -id, the indexed primary key, which increases with insertion time so newest-first is preserved. Also let an all-digit search term act as an exact occurrence-id lookup, so an operator can jump straight to an occurrence by id; non-numeric terms still use the determination-name search. Co-Authored-By: Claude <noreply@anthropic.com> * perf(admin): speed up the detection changelist and link detections from occurrences The detection admin changelist counted classifications with Count() + GROUP BY and ordered by an unindexed -created_at. On a large table (~1.4M detections, ~2.2M classifications) the grouped aggregate runs over the whole detection x classification join before the page LIMIT applies — slow enough to exhaust work_mem and error out. Replaced with a correlated subquery (Coalesced to 0) and -id ordering, mirroring the occurrence changelist fix. Also set show_change_link on the detection inline of the occurrence page, so an operator can open a detection's change page — where the classifications inline shows which algorithms were applied, including post-processing — without searching for its id. Co-Authored-By: Claude <noreply@anthropic.com> * feat(admin): search detections by id in the admin Add a search box to the detection changelist: an all-digit term is an exact id lookup (jump straight to a detection by id), other terms search the source image path. Mirrors the occurrence changelist search. Co-Authored-By: Claude <noreply@anthropic.com> * feat(admin): speed up the classification admin and recompute determination More admin niceties for reviewing post-processing results on a large database: - Classification change page: FK fields (taxon, detection, algorithm, category_map, applied_to) render as autocomplete widgets instead of <select>s preloaded with every row, which made the page unusable. - Classification changelist: count the scores/logits arrays in SQL (cardinality) and defer the arrays, so the list no longer transfers thousands of floats per row; order by -id; add an id / taxon-name search box. - Link the detection page's classification inline rows to their change page. - New OccurrenceAdmin action "Recompute determination": editing classifications by hand doesn't refresh the occurrence determination (only Occurrence and Identification saves do), so this re-derives it for the selected occurrences. Co-Authored-By: Claude <noreply@anthropic.com> * refactor(admin): share id search in a mixin, skip the full changelist count, guard huge ids Three review follow-ups on the admin niceties added for reviewing post-processing: - Extract the duplicated digit-term id search from the Occurrence, Detection, and Classification admins into a shared IdSearchAdminMixin. The three copies were identical; the mixin is now the single place that behavior lives. - Guard an out-of-range numeric term. A digit string larger than the bigint primary key raised a database DataError (500) before; it now returns no results. - Set show_full_result_count = False on the three large-table admins, so the changelist no longer runs a second, unfiltered COUNT(*) over the whole table for its footer total -- as expensive as the page query itself on millions of rows. Tests: add the out-of-range id case. The shared mixin is now tested once in OccurrenceAdminChangelistTest, so the redundant per-admin copy on the Detection admin is removed (its docstring points to the canonical test). Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>

The roll-up picks the global argmax over every taxon at each rank, not just ancestors of the source classification's taxon. On a diffuse, low-confidence distribution this can reparent a detection to a family outside its own lineage. Document the behavior at the candidate-pick loop and record the open design choice (lineage-constrained vs distribution roll-up) as a TODO; no behavior change. Co-Authored-By: Claude <noreply@anthropic.com>

…ection When an operator triggers class masking on an occurrence (or collection), the "Source classifier" dropdown now lists only the classification algorithms that actually produced classifications within the selected scope. Masking any other algorithm would be a no-op for those rows, so offering every classifier was misleading. The admin action hands the knob form the selected queryset (a small generic seam on the form base, ignored by forms that don't need it), and ClassMaskingActionForm uses it to filter the algorithm field by classifications__detection__occurrence / source_image__collections. Co-Authored-By: Claude <noreply@anthropic.com>

mihow · 2026-06-30T02:25:55Z

Claude says: This branch has been reworked and force-pushed, so the earlier 49-commit history and its inline review threads are now outdated. It is a clean port onto the post-processing framework that merged in #1289, rebased onto current main, and the PR description has been rewritten to match.

The correctness findings from the earlier review rounds are addressed in the rework:

Argmax could pick an excluded class. The top class is now taken from an -inf-masked softmax, so a species excluded by the taxa list scores exactly 0 and cannot win even when it had the highest raw logit. A taxa list that excludes every class in the category map now raises instead of silently mis-classifying.
The masking algorithm's category map was set in memory but never saved. There is now one masking Algorithm per (source algorithm, taxa list) with its category map persisted, so masked classifications no longer reference a null map.
applied_to was left blank. It is populated on each new masked classification and exposed in the Classification serializers, with applied_to__algorithm prefetched in the occurrence list/detail and the classification viewset to avoid an N+1 on render.
Rank rollup N+1 on the category map. Labels are preloaded in two queries and the per-row relations are select_related.
print() debugging, root logger, and imports inside the processing loop. Replaced with the task logger, module-level imports, and explicit raises; asserts removed.

One item is intentionally not fixed here and is documented as a follow-up: rank rollup picks the global argmax across all taxa at each rank rather than constraining to ancestors of the source taxon, so a diffuse, low-confidence distribution can roll a detection up to a family outside its own lineage. The trade-off is recorded as a TODO at the candidate-pick loop for a team decision rather than changed silently.

Re-review is welcome on the new diff.

…llections The class-mask admin form narrows its "Source classifier" dropdown to the algorithms that actually produced classifications in the selected scope, so an operator cannot pick a classifier whose masking would be a no-op. On the occurrence path that lookup is cheap (a handful of classifications), but on the collection path it became an unbounded DISTINCT over every classification in the collection. On a large collection that join runs for tens of seconds or times out while the intermediate form is still rendering, before the operator can do anything. Scope the dropdown only for the occurrence path. A collection scope now keeps the full classifier list. Offering a classifier that produced nothing in the collection is harmless — masking it changes no rows — so the narrowing was a convenience, not a correctness guard, and is not worth a query that can hang the page. Co-Authored-By: Claude <noreply@anthropic.com>

Rank roll-up moves out of this PR so class masking can merge on its own. Class masking is validated end to end against a real classifier and is well covered by tests; rank roll-up still has an open design question (whether the per-rank pick should be constrained to the source taxon's ancestors or remain a global distribution roll-up) and thinner tests. Keeping them together would force reviewers to either block ready class-masking work or sign off on the less-settled roll-up. This removes the rank-roll-up task, its admin form and action, its registry entry, and its tests. The roll-up feature returns in its own PR, stacked on this one, where the lineage decision and fuller tests can land before it merges. Co-Authored-By: Claude <noreply@anthropic.com>

mihow · 2026-07-01T15:23:11Z

Claude says: Two follow-up changes coming to the class-masking task, both surfaced by testing it against a large (~29k-class) classifier at scale:

Batched commit + progress heartbeats + per-batch stage metrics. The task currently builds the whole scope in memory and commits once at the end, with no heartbeat during that final commit. On a large classifier this means it can OOM on the final bulk_create, and — run through the admin/Job path — it gets REVOKED by the job health-check reaper before it ever commits, so the occurrences-changed metrics never reach the job page. Fix mirrors SmallSizeFilterTask: process in batches, commit per batch, and call update_progress() + report_stage_metrics({classifications_checked, classifications_masked, occurrences_updated}) per batch. That makes it survive the reaper, show the same live logs as the size filter, and bound memory. (Occurrence count will also switch to "only when the determination actually changed", matching the size filter.)
reweight toggle (default on) — TODO. Renormalising the masked softmax does not change the determination (argmax over the kept classes is identical) — it only rescales the confidence. Default stays on for score-threshold compatibility, but an admin-form option to disable it lets us evaluate with the model's raw absolute confidence. This unblocks running the offline eval (Measure the effect of post-processing filters on final predictions (offline eval) #1360) with and without reweighting.

…le, add reweight toggle Class masking now commits in batches and reports progress and stage metrics per batch, instead of building the whole scope in memory and committing once at the end. On a large classifier (tens of thousands of classes, thousands of classifications) the single final commit could exhaust memory, and run through the admin/Job path the task was revoked by the job health-check reaper before it committed — so nothing persisted and the occurrences-changed metrics never reached the job page. It now mirrors SmallSizeFilterTask: process in batches, commit each batch in its own transaction, and call update_progress() plus report_stage_metrics() per batch. Memory stays bounded (the source cursor and the per-batch buffers are capped at batch_size), the heartbeat keeps the reaper satisfied, and the live counts appear on the job page. occurrences_updated now counts only occurrences whose determination actually changed, matching the size filter. Also add a reweight toggle (default on) to the config and the admin form. Renormalising the kept classes to sum to 1 does not change which species is chosen — argmax over the kept classes is identical either way — so it affects only the stored confidence. With reweighting off, the kept classes keep their original absolute scores, which is useful for evaluating the model's raw confidence. Co-Authored-By: Claude <noreply@anthropic.com>

… spares long runs The stale-job reaper (check_stale_jobs) revokes running jobs whose updated_at is older than STALLED_JOBS_MAX_MINUTES (10). The post-processing progress heartbeat saved with update_fields=["progress"], and Django does not auto-add auto_now fields to a narrowed update_fields, so updated_at never moved. Any post-processing run longer than 10 minutes looked frozen and was revoked mid-flight even while streaming progress and stage metrics — observed on a large class-masking run (revoked at 42%, ~1600/3814 classifications). Add updated_at to both save paths in BasePostProcessingTask.update_progress and report_stage_metrics, mirroring the async-job progress save in jobs/models.py that already does this for the same reason. Add a regression test that freezes updated_at past the cutoff and asserts each heartbeat path drags it forward. Co-Authored-By: Claude <noreply@anthropic.com>

mihow · 2026-07-01T17:18:41Z

@coderabbitai full review

coderabbitai · 2026-07-01T17:18:47Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

ami/ml/management/commands/run_class_masking.py (1)

29-42: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider chaining the original exception.

raise CommandError(...) inside each except ... DoesNotExist block discards the original traceback. Chaining with from err preserves it for debugging, independent of whether the repo's lint config enforces B904.

♻️ Proposed fix

         try:
             collection = SourceImageCollection.objects.get(pk=collection_id)
-        except SourceImageCollection.DoesNotExist:
-            raise CommandError(f"SourceImageCollection {collection_id} does not exist.")
+        except SourceImageCollection.DoesNotExist as err:
+            raise CommandError(f"SourceImageCollection {collection_id} does not exist.") from err

         try:
             taxa_list = TaxaList.objects.get(pk=taxa_list_id)
-        except TaxaList.DoesNotExist:
-            raise CommandError(f"TaxaList {taxa_list_id} does not exist.")
+        except TaxaList.DoesNotExist as err:
+            raise CommandError(f"TaxaList {taxa_list_id} does not exist.") from err

         try:
             algorithm = Algorithm.objects.get(pk=algorithm_id)
-        except Algorithm.DoesNotExist:
-            raise CommandError(f"Algorithm {algorithm_id} does not exist.")
+        except Algorithm.DoesNotExist as err:
+            raise CommandError(f"Algorithm {algorithm_id} does not exist.") from err

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/management/commands/run_class_masking.py` around lines 29 - 42, The
exception handling in run_class_masking’s object lookups drops the original
traceback when SourceImageCollection, TaxaList, or Algorithm is missing. Update
each except ... DoesNotExist block to chain the caught exception when raising
CommandError, using the existing collection, taxa_list, and algorithm lookup
sections so the original traceback is preserved for debugging.

Source: Linters/SAST tools

ami/ml/post_processing/base.py (1)

76-82: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Correct fix — updated_at now explicitly included in update_fields.

This matches Django's semantics: pre_save() (which bumps auto_now fields) only runs for fields present in update_fields, so omitting updated_at previously left it untouched during these narrowed saves. Verified by the accompanying test_progress_save_bumps_updated_at_for_reaper test.

Both call sites duplicate the same save(update_fields=["progress", "updated_at"]) line and near-identical comment. Consider a tiny shared helper to avoid drift if a third heartbeat call site is added later:
♻️ Optional: extract shared heartbeat save
+    def _save_job_heartbeat(self, extra_fields: list[str]) -> None:
+        """Persist ``extra_fields`` alongside ``updated_at`` so the stale-job
+        reaper (``check_stale_jobs``) sees an actively-progressing run."""
+        self.job.save(update_fields=[*extra_fields, "updated_at"])
+
     def update_progress(self, progress: float):
         if self.job:
             self.job.progress.update_stage(self.job.job_type_key, progress=progress)
-            self.job.save(update_fields=["progress", "updated_at"])
+            self._save_job_heartbeat(["progress"])
Also applies to: 103-108
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/post_processing/base.py` around lines 76 - 82, The heartbeat save
logic in the post-processing base flow is duplicated across both call sites,
which can drift if another progress update is added later. Extract the shared
“bump progress and updated_at” behavior in the base post-processing path into a
small helper or private method in the same class, and have both call sites use
it while keeping the `self.job.save(update_fields=["progress", "updated_at"])`
behavior and the stale-job reaper comment aligned.

ami/ml/post_processing/tests/test_class_masking.py (1)

253-269: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Use .get()/count assertion instead of .first() to catch duplicate-creation regressions.

Elsewhere in this file (e.g. lines 145, 173, 198) the new terminal classification is fetched with .get(...), which fails loudly if more than one row matches. Here, .exclude(algorithm=self.algorithm).first() will silently pass even if masking accidentally creates multiple new terminal classifications for the same detection.

💡 Proposed fix

-        new_clf = Classification.objects.filter(detection=det, terminal=True).exclude(algorithm=self.algorithm).first()
-        self.assertIsNotNone(new_clf)
+        new_clf = Classification.objects.get(detection=det, terminal=True, algorithm__ne=self.algorithm)
+        # or, if `__ne` lookup isn't available:
+        new_clfs = Classification.objects.filter(detection=det, terminal=True).exclude(algorithm=self.algorithm)
+        self.assertEqual(new_clfs.count(), 1)
+        new_clf = new_clfs.get()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/post_processing/tests/test_class_masking.py` around lines 253 - 269,
In test_task_run_occurrence_scope, replace the lenient
Classification.objects.filter(...).exclude(...).first() check with a stricter
assertion pattern like the other tests in this file that use .get(), so
duplicate terminal classifications cannot slip by unnoticed. Use the same
detection/terminal filters on Classification and identify the non-original row
by algorithm exclusion, then assert exactly one result and validate its taxon
via the fetched object from the ClassMaskingTask flow.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/ml/management/commands/run_class_masking.py`:
- Around line 49-58: The dry-run/validation count in run_class_masking.py is
using a broader filter than the actual task scope, so it can report
classifications that ClassMaskingTask._scoped_classifications will skip. Update
the classification_count query to match the same constraints as
_scoped_classifications by requiring both scores and logits to be present, and
keep the guard/--dry-run reporting aligned with that shared scope.

In `@ami/ml/post_processing/class_masking.py`:
- Around line 231-249: The masking algorithm identity in
_get_or_create_masking_algorithm currently only distinguishes source_algorithm
and taxa_list, so reweight=True and reweight=False reuse the same Algorithm row.
Update the Algorithm.objects.get_or_create key (and any related name/description
if needed) to include the reweight mode, so the provenance stored for masked
outputs uniquely reflects whether scores were renormalized or preserved. Use the
existing _get_or_create_masking_algorithm symbol to locate the change and make
sure both modes create or reuse separate algorithm records.
- Around line 129-147: The `class_masking` reweight path is recomputing
kept-class confidence from `logits`, but `reweight=True` should renormalize the
already stored score vector instead. Update the masking logic in `mask_classes`
so the renormalized values are derived from `scores` (with excluded indices
zeroed and the remainder normalized to sum to 1), while preserving the existing
winner selection behavior via the relevant `top_index`/`score` handling.

---

Nitpick comments:
In `@ami/ml/management/commands/run_class_masking.py`:
- Around line 29-42: The exception handling in run_class_masking’s object
lookups drops the original traceback when SourceImageCollection, TaxaList, or
Algorithm is missing. Update each except ... DoesNotExist block to chain the
caught exception when raising CommandError, using the existing collection,
taxa_list, and algorithm lookup sections so the original traceback is preserved
for debugging.

In `@ami/ml/post_processing/base.py`:
- Around line 76-82: The heartbeat save logic in the post-processing base flow
is duplicated across both call sites, which can drift if another progress update
is added later. Extract the shared “bump progress and updated_at” behavior in
the base post-processing path into a small helper or private method in the same
class, and have both call sites use it while keeping the
`self.job.save(update_fields=["progress", "updated_at"])` behavior and the
stale-job reaper comment aligned.

In `@ami/ml/post_processing/tests/test_class_masking.py`:
- Around line 253-269: In test_task_run_occurrence_scope, replace the lenient
Classification.objects.filter(...).exclude(...).first() check with a stricter
assertion pattern like the other tests in this file that use .get(), so
duplicate terminal classifications cannot slip by unnoticed. Use the same
detection/terminal filters on Classification and identify the non-original row
by algorithm exclusion, then assert exactly one result and validate its taxon
via the fetched object from the ClassMaskingTask flow.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9112f8f-e20b-4fb9-98f2-c6904c0c27a8

📥 Commits

Reviewing files that changed from the base of the PR and between cdcb3b1 and 3072dae.

📒 Files selected for processing (15)

ami/main/admin.py
ami/main/api/serializers.py
ami/main/api/views.py
ami/main/models_future/occurrence.py
ami/ml/management/commands/run_class_masking.py
ami/ml/post_processing/__init__.py
ami/ml/post_processing/admin/actions.py
ami/ml/post_processing/admin/class_masking_form.py
ami/ml/post_processing/admin/forms.py
ami/ml/post_processing/base.py
ami/ml/post_processing/class_masking.py
ami/ml/post_processing/registry.py
ami/ml/post_processing/tests/test_class_masking.py
ami/ml/post_processing/tests/test_class_masking_admin.py
ami/ml/tests.py

…ing, reweight in algorithm identity, dry-run filter Three CodeRabbit findings on the class-masking task: - Derive all masked scores from `logits` alone. The reweight=False branch and the no-change short-circuit still read the stored `scores` field; recompute the model's full softmax from logits instead. The `scores` field is being retired, so masking must not depend on it. Mathematically identical while `scores == softmax(logits)` (the winner is unchanged), and it drops the last read of `scores` from the masking math. (Argmax stays on the masked logits — an excluded class still cannot win.) - Include the reweight mode in the masking algorithm's identity. reweight=True and reweight=False persist different score semantics, so keying the output Algorithm on (source algorithm, taxa list) alone made both modes share one row and lose provenance. The key/name now carry a "reweighted" / "absolute" suffix; a new test pins that the two modes resolve to distinct algorithms. - Match the management command's dry-run count to the task scope by also requiring `logits__isnull=False`, so `--dry-run` and the "no classifications" guard no longer overstate what the run will process. Co-Authored-By: Claude <noreply@anthropic.com>

mohamedelabbas1996 force-pushed the feat/postprocessing-class-masking branch from 0b77504 to 88ffba8 Compare October 14, 2025 19:08

mihow requested a review from Copilot October 15, 2025 03:20

Copilot AI reviewed Oct 15, 2025

View reviewed changes

mihow reviewed Oct 15, 2025

View reviewed changes

Base automatically changed from feat/postprocessing-framework to main October 16, 2025 06:06

mihow marked this pull request as ready for review February 18, 2026 11:45

coderabbitai Bot reviewed Feb 18, 2026

View reviewed changes

Comment thread ami/ml/post_processing/class_masking.py Outdated

Comment thread ami/ml/post_processing/class_masking.py Outdated

Comment thread ami/ml/post_processing/class_masking.py Outdated

Comment thread ami/ml/post_processing/rank_rollup.py Outdated

Comment thread ami/ml/tests.py Outdated

coderabbitai Bot reviewed Feb 18, 2026

View reviewed changes

Comment thread ami/templates/admin/main/class_masking_confirmation.html Outdated

mihow force-pushed the feat/postprocessing-class-masking branch from f9ca78f to fc3f9e1 Compare February 18, 2026 13:00

coderabbitai Bot reviewed Feb 18, 2026

View reviewed changes

Comment thread ami/ml/post_processing/class_masking.py Outdated

Comment thread ami/ml/post_processing/rank_rollup.py Outdated

coderabbitai Bot reviewed Feb 18, 2026

View reviewed changes

Comment thread ami/main/api/serializers.py

Comment thread ami/main/api/views.py Outdated

annavik mentioned this pull request Mar 2, 2026

Present details in UI related to post processing #1165

Open

mihow assigned mihow and annavik Mar 16, 2026

annavik removed their assignment Mar 16, 2026

mihow mentioned this pull request May 1, 2026

Framework for admins to trigger and review post-processing methods #1289

Merged

6 tasks

mihow mentioned this pull request Jun 23, 2026

Validate the scope object belongs to the job's project in post-processing tasks #1345

Open

mihow added the post-processing Post-processing task framework and tasks: size filter, class masking, rank rollup, tracking label Jun 23, 2026

mihow force-pushed the feat/postprocessing-class-masking branch from daed538 to 6e8eac8 Compare June 26, 2026 05:04

mihow and others added 3 commits June 29, 2026 19:15

mihow force-pushed the feat/postprocessing-class-masking branch from 6e8eac8 to ba796c1 Compare June 30, 2026 02:16

mihow changed the title ~~Implement class masking using the post-processing framework~~ Let admins mask classifier predictions to a species list and roll up low-confidence IDs Jun 30, 2026

mihow mentioned this pull request Jun 30, 2026

Measure the effect of post-processing filters on final predictions (offline eval) #1360

Open

mihow mentioned this pull request Jun 30, 2026

Let admins roll up low-confidence predictions to a coarser rank #1361

Draft

mihow changed the title ~~Let admins mask classifier predictions to a species list and roll up low-confidence IDs~~ Let admins mask classifier predictions to a species list Jun 30, 2026

mihow and others added 2 commits July 1, 2026 08:37

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

Comment thread ami/ml/management/commands/run_class_masking.py

Comment thread ami/ml/post_processing/class_masking.py Outdated

Comment thread ami/ml/post_processing/class_masking.py Outdated

Uh oh!

Conversation

mohamedelabbas1996 commented Oct 14, 2025 • edited by mihow Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

List of Changes

Detailed Description

How to test

Class masking on a single occurrence (fast path)

Class masking on a capture set

Verify provenance in the API

Related

Uh oh!

netlify Bot commented Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for antenna-preview canceled.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mihow Oct 15, 2025

Choose a reason for hiding this comment

Uh oh!

netlify Bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for antenna-ssec canceled.

Uh oh!

coderabbitai Bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mihow commented Feb 18, 2026

Browser Testing: Class Masking & Admin Intermediate Page

1. Management Command Test (end-to-end)

2. Admin Intermediate Page (replaces hardcoded action)

3. Classifications After Masking (admin view)

Changes in this update

Uh oh!

mihow commented Feb 18, 2026

Browser Testing: Class Masking in the UI

Management Command

Admin: Intermediate Page

Occurrence Detail — Identifications Tab

mohamedelabbas1996 commented Oct 14, 2025 •

edited by mihow

Loading

netlify Bot commented Oct 14, 2025 •

edited

Loading

netlify Bot commented Feb 18, 2026 •

edited

Loading

coderabbitai Bot commented Feb 18, 2026 •

edited

Loading

Proposed `applied_to` API schema for UI

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading