Skip to content

fix(qdrant)!: clean up orphaned data PVC on delete (#3059)#3075

Open
scooby87 wants to merge 3 commits into
mainfrom
fix/qdrant-data-pvc-orphaned
Open

fix(qdrant)!: clean up orphaned data PVC on delete (#3059)#3075
scooby87 wants to merge 3 commits into
mainfrom
fix/qdrant-data-pvc-orphaned

Conversation

@scooby87

@scooby87 scooby87 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

What this PR does

After deleting a Qdrant app CR, the StatefulSet data PVC
(qdrant-storage-<release>-0) was left orphaned in the tenant namespace,
consuming storage permanently. StatefulSet-templated PVCs are never
garbage-collected on StatefulSet deletion by default.

The vendored upstream Qdrant chart does not support
persistentVolumeClaimRetentionPolicy (verified: its statefulset.yaml never
renders that field), so the OpenBAO-style value passthrough is not possible
here. Instead this adds a post-delete PVC cleanup hook (modeled on the MariaDB
hook) that deletes the release's PVCs by label.

The chart's volumeClaimTemplate only stamps app: qdrant (shared by every
Qdrant in a namespace), so to select PVCs safely per release this also passes
persistence.additionalLabels.app.kubernetes.io/instance: <release> through the
HelmRelease — giving the data PVC a release-unique label the hook selects on.

The cleanup Job is hardened: runs as non-root (uid 65534), read-only root fs,
all capabilities dropped, seccompProfile: RuntimeDefault; RBAC scoped to
persistentvolumeclaims. A real cleanup failure is logged (not silently
swallowed) but never blocks namespace teardown.

Warning

Deleting a Qdrant app now permanently removes its data PVC and the vector data it holds.
This is intentional and fixes the storage leak; the hook only fires on full
release deletion (not on scale-down). Back up before deleting.

Fixes #3059.

Release note

fix(qdrant): garbage-collect the StatefulSet data PVC when a Qdrant app is deleted, instead of leaving it orphaned. WARNING: deleting a Qdrant app now permanently removes its data PVC and the data it holds.

Summary by CodeRabbit

  • New Features

    • Added automated cleanup after deleting a Qdrant deployment to remove leftover data volumes.
    • Included security-hardened Kubernetes resources for the cleanup task.
  • Tests

    • Added chart coverage for the post-delete cleanup workflow.
    • Extended end-to-end checks to confirm the data volume is removed after deletion.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where Qdrant data PVCs were left orphaned in the namespace after an application deletion. By introducing a post-delete Helm hook and applying unique instance labels to the PVCs, the system now ensures that storage is reclaimed correctly when a Qdrant release is removed. This change prevents long-term storage leaks while maintaining strict security practices for the cleanup process.

Highlights

  • PVC Cleanup Hook: Added a post-delete Helm hook that automatically removes orphaned persistent volume claims (PVCs) associated with a Qdrant release.
  • Labeling Strategy: Implemented a release-unique label (app.kubernetes.io/instance) on the data PVC to ensure the cleanup hook targets only the relevant resources.
  • Security Hardening: The cleanup Job is configured with non-root execution, read-only root filesystem, and minimal RBAC permissions to ensure secure operation.
  • Testing: Added comprehensive unit tests to verify the label passthrough and the correct rendering of the cleanup hook and its associated RBAC.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment Gemini (@gemini-code-assist) Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on Gemini (@gemini-code-assist) comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions Bot added area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review kind/bug Categorizes issue or PR as related to a bug size/L This PR changes 100-499 lines, ignoring generated files labels Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a Helm post-delete hook to the Qdrant chart that removes orphaned StatefulSet data PVCs by name-prefix matching. The hook includes a batch/v1 Job with hardened security settings plus a ServiceAccount, Role, and RoleBinding. Unit tests and an e2e test verify the cleanup.

Qdrant PVC cleanup hook

Layer / File(s) Summary
Post-delete cleanup Job and RBAC
packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml
Introduces a post-delete Job that lists PVCs in the release namespace, filters by qdrant-storage-{{ .Release.Name }}- prefix, and deletes them with --ignore-not-found. Paired with a ServiceAccount, Role (get/list/delete on persistentvolumeclaims), and RoleBinding, all with hook weight ordering and hardened pod security.
Unit tests
packages/apps/qdrant/tests/cleanup_pvc_test.yaml
Helm unit test suite asserting the rendered Job carries the post-delete hook annotation and correct service account, and that the Role grants the expected RBAC rules on persistentvolumeclaims.
E2e verification
hack/e2e-apps/qdrant.bats
After CR deletion, the e2e test polls for disappearance of qdrant-storage-qdrant-$name-0; on timeout it prints PVC status and hook Job/Pod info before failing.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

area/kubernetes

Suggested reviewers

  • kvaps
  • lllamnyp
  • androndo
  • IvanHunters

Poem

🐇 Hop hop, the PVCs flee,
No orphaned volumes for you or me!
Post-delete hooks sweep the floor,
kubectl deletes, then deletes some more.
Cozy storage, clean and free! 🌱

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly states the Qdrant PVC cleanup-on-delete fix and matches the main change.
Linked Issues check ✅ Passed The post-delete hook removes the orphaned Qdrant data PVC, matching the issue's expected cleanup behavior.
Out of Scope Changes check ✅ Passed No unrelated code changes are apparent; the hook, RBAC, tests, and e2e update all support PVC cleanup.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/qdrant-data-pvc-orphaned

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@dosubot dosubot Bot added the area/database Issues or PRs related to managed databases (postgres, mariadb, redis, etcd, kafka, clickhouse) label Jun 25, 2026
@scooby87 scooby87 removed the area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review label Jun 25, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a post-delete cleanup hook to delete orphaned PersistentVolumeClaims (PVCs) when a Qdrant release is deleted, along with corresponding unit tests. Feedback on the changes points out that the upstream Qdrant Helm chart uses persistence.labels rather than persistence.additionalLabels to apply labels to the PVC volume claim templates. Consequently, the HelmRelease values and the associated unit tests need to be updated to use labels to ensure the PVCs are correctly labeled and subsequently cleaned up.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +42 to +43
additionalLabels:
app.kubernetes.io/instance: {{ .Release.Name }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The upstream Qdrant Helm chart uses persistence.labels to apply labels to the PVC volume claim templates, not persistence.additionalLabels. Using additionalLabels will result in the label being ignored, meaning the PVC will not be labeled and the post-delete cleanup hook will fail to find and delete the orphaned PVC.

Please change additionalLabels to labels.

        labels:
          app.kubernetes.io/instance: {{ .Release.Name }}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This finding is incorrect — no change needed.

The vendored upstream Qdrant chart applies volumeClaimTemplates labels from persistence.additionalLabels, not persistence.labels:

https://github.com/cozystack/cozystack/blob/main/packages/system/qdrant/charts/qdrant/templates/statefulset.yaml#L264-L271

  volumeClaimTemplates:
    - metadata:
        name: {{ .Values.persistence.storageVolumeName | default "qdrant-storage" }}
        labels:
          app: {{ template "qdrant.name" . }}
          {{- with .Values.persistence.additionalLabels }}
          {{- toYaml . | nindent 10 }}
          {{- end }}

There is no persistence.labels key in this chart. Switching to labels would drop the app.kubernetes.io/instance label from the PVC and break the post-delete cleanup hook. The current additionalLabels passthrough is correct, and the helm unittest suite (cleanup_pvc_test.yaml) passes against the rendered output.

value: HelmRelease
asserts:
- equal:
path: spec.values.qdrant.persistence.additionalLabels["app.kubernetes.io/instance"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Since the field in templates/qdrant.yaml should be changed from additionalLabels to labels, this test assertion needs to be updated accordingly to verify the correct path.

          path: spec.values.qdrant.persistence.labels["app.kubernetes.io/instance"]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same root cause as the additionalLabels thread above — the upstream chart uses persistence.additionalLabels (statefulset.yaml#L264-L271), so this test asserts the correct path and needs no change.

@github-actions github-actions Bot added the area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review label Jun 25, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml`:
- Around line 5-95: The cleanup PVC Helm template has unquoted Helm scalars that
strict YAML parsers treat as syntax errors. Update the qdrant cleanup manifest
so every field using template expressions like {{ .Release.Name }} is quoted,
including the resource names, serviceAccountName, roleRef.name, and
subjects.name. Use the cleanup job, ServiceAccount, Role, RoleBinding, and
related identifiers to locate all affected scalar values and keep the rendered
YAML valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a426d42b-fa22-407f-82ad-ba6e3a2bce8d

📥 Commits

Reviewing files that changed from the base of the PR and between 63f9926 and 065581a.

📒 Files selected for processing (4)
  • packages/apps/qdrant/Makefile
  • packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml
  • packages/apps/qdrant/templates/qdrant.yaml
  • packages/apps/qdrant/tests/cleanup_pvc_test.yaml

Comment on lines +5 to +95
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-weight": "10"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
template:
metadata:
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
policy.cozystack.io/allow-to-apiserver: "true"
spec:
serviceAccountName: {{ .Release.Name }}-qdrant-cleanup
restartPolicy: Never
securityContext:
runAsNonRoot: true
runAsUser: 65534
runAsGroup: 65534
seccompProfile:
type: RuntimeDefault
containers:
- name: cleanup
image: docker.io/clastix/kubectl:v1.32
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
env:
- name: HOME
value: /tmp
command:
- /bin/sh
- -c
- |
echo "Deleting orphaned PVCs for {{ .Release.Name }}..."
kubectl delete pvc -n {{ .Release.Namespace }} -l app.kubernetes.io/instance={{ .Release.Name }} --ignore-not-found \
|| echo "WARNING: PVC cleanup failed for {{ .Release.Name }}; orphaned PVCs may remain" >&2
echo "PVC cleanup complete."
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
helm.sh/hook-weight: "0"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-weight": "5"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
rules:
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
helm.sh/hook-weight: "5"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ .Release.Name }}-qdrant-cleanup
subjects:
- kind: ServiceAccount
name: {{ .Release.Name }}-qdrant-cleanup

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

Quote templated scalar values to resolve YAML syntax errors.

yamllint fails on this file with syntax error: expected <block end> at line 5. Unquoted Helm template scalars starting with {{ are misinterpreted as YAML flow collections by strict parsers.

Quote all line items containing {{ .Release.Name }} (e.g., name, serviceAccountName, roleRef.name, subjects.name). For example:
name: {{ .Release.Name }}-qdrant-cleanupname: "{{ .Release.Name }}-qdrant-cleanup".

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 5-5: syntax error: expected , but found ''

(syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml` around lines 5 - 95,
The cleanup PVC Helm template has unquoted Helm scalars that strict YAML parsers
treat as syntax errors. Update the qdrant cleanup manifest so every field using
template expressions like {{ .Release.Name }} is quoted, including the resource
names, serviceAccountName, roleRef.name, and subjects.name. Use the cleanup job,
ServiceAccount, Role, RoleBinding, and related identifiers to locate all
affected scalar values and keep the rendered YAML valid.

The StatefulSet data PVC (qdrant-storage-<release>-0) was left orphaned
after a Qdrant app CR was deleted. The vendored upstream chart does not
support persistentVolumeClaimRetentionPolicy, so add a post-delete PVC
cleanup hook (modeled on the MariaDB hook) and pass persistence.additionalLabels
through the HelmRelease so the data PVC carries a release-unique
app.kubernetes.io/instance label the hook selects on. The cleanup Job runs
non-root with a read-only root fs and dropped capabilities; failures are
logged but never block namespace teardown.

Signed-off-by: Алексей Артамонов <aleksei.artamonov@aenix.io>
@scooby87 scooby87 force-pushed the fix/qdrant-data-pvc-orphaned branch from 065581a to 178b836 Compare June 25, 2026 15:48
The .PHONY: test tweak is out of scope for the PVC cleanup fix; remove it
to keep this PR focused (flagged by review).

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Алексей Артамонов <alexeyartamonov1987@gmail.com>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOT LGTM — the PVC-label mechanism breaks reconciliation for already-deployed Qdrant instances.

Blocker — StatefulSet volumeClaimTemplates is immutable. This PR routes the release-unique label through qdrant.persistence.additionalLabels, which the upstream chart renders into volumeClaimTemplates[].metadata.labels (packages/system/qdrant/charts/qdrant/templates/statefulset.yaml ~L264, L269). For any Qdrant that already exists, this is a change to an immutable StatefulSet field, which the API server rejects (only replicas, ordinals, template, updateStrategy, persistentVolumeClaimRetentionPolicy, minReadySeconds are mutable). The <release>-system HelmRelease (retries: -1) then retry-fails indefinitely on upgrade — no data loss, but reconciliation is permanently broken, and the cleanup still can't reclaim those instances' PVCs (volumeClaimTemplate label changes don't apply retroactively to existing PVCs). This is the opposite of the openbao case, where the change worked precisely because persistentVolumeClaimRetentionPolicy is in the mutable list.

Suggested fix: drop the additionalLabels passthrough and have the hook delete by deterministic PVC name-prefix (qdrant-storage-<release>-). PVC names derive from the release, so this needs no StatefulSet change (upgrade-safe) and also reclaims PVCs from instances created before this change.

Also: hack/e2e-apps/qdrant.bats deletes the Qdrant but never asserts the data PVC is gone afterward, so the behavior this PR exists to fix isn't covered end-to-end — please add a post-delete (async-waited) not-found check on qdrant-storage-<name>-0.

Notes: the unit test can only assert the value reaches the HelmRelease, not that the subchart consumes it (I confirmed consumption separately at statefulset.yaml L269 — it is not a no-op). Consider the fix(...)! marker for the data-loss-on-delete behavior change, to match the sibling ClickHouse PR. The automated "use persistence.labels" suggestions are incorrect — the chart uses persistence.additionalLabels (no persistence.labels key exists). The security hardening on the cleanup Job is a clear improvement over the hook it is modeled on.

…e label

The previous approach routed a release-unique label through
qdrant.persistence.additionalLabels, which the upstream chart renders into
the StatefulSet's volumeClaimTemplates[].metadata.labels. That field is
immutable on an existing StatefulSet, so for any already-deployed Qdrant the
API server would reject the upgrade and the <release>-system HelmRelease
(retries: -1) would retry-fail forever — permanently wedging reconciliation.
Label changes on a volumeClaimTemplate also never apply to PVCs that already
exist, so the cleanup hook could not reclaim those instances anyway.

Switch the post-delete hook to select PVCs by the deterministic StatefulSet
naming prefix (qdrant-storage-<release>-) instead. PVC names derive purely
from the release name, so this needs no StatefulSet change (upgrade-safe) and
also reclaims PVCs from instances created before the hook existed.

- qdrant.yaml: drop the additionalLabels passthrough entirely.
- hooks/cleanup-pvc.yaml: delete by name prefix via kubectl get -o name | grep.
- tests: assert the name-prefix command instead of the label passthrough.
- e2e: assert qdrant-storage-<name>-0 is gone after delete (async-waited).

BREAKING CHANGE: deleting a Qdrant app now permanently removes its data PVC
and the vector data it holds.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Алексей Артамонов <alexeyartamonov1987@gmail.com>
@scooby87 scooby87 changed the title fix(qdrant): clean up orphaned data PVC on delete (#3059) fix(qdrant)!: clean up orphaned data PVC on delete (#3059) Jun 29, 2026
@github-actions github-actions Bot added the kind/breaking-change Indicates the change introduces a breaking API or behaviour change label Jun 29, 2026
@scooby87

Copy link
Copy Markdown
Contributor Author

Aleksei Sviridkin (@lexfrei) thanks for the thorough review — you're right on the blocker, fixed in 2587458.

Root cause accepted. volumeClaimTemplates is immutable, so routing the release label through qdrant.persistence.additionalLabels (rendered into volumeClaimTemplates[].metadata.labels) would have made the API server reject the StatefulSet update on every already-deployed instance, and with retries: -1 the <release>-system HelmRelease would retry-fail indefinitely. And since volumeClaimTemplate label changes never apply retroactively, the hook couldn't have reclaimed those PVCs anyway. This is indeed the opposite of the OpenBAO case, where it worked only because persistentVolumeClaimRetentionPolicy is in the mutable set.

What changed (your suggested fix):

  1. Dropped the additionalLabels passthrough in templates/qdrant.yaml entirely — no StatefulSet mutation, so existing instances reconcile unchanged.
  2. Hook now deletes by deterministic PVC name prefix qdrant-storage-<release>- instead of by label:
    pvcs=$(kubectl get pvc -n <ns> -o name \
      | grep "^persistentvolumeclaim/qdrant-storage-<release>-" || true)
    [ -n "$pvcs" ] && echo "$pvcs" | xargs kubectl delete -n <ns> --ignore-not-found
    PVC names derive purely from the release name (volumeClaimTemplate qdrant-storage + StatefulSet name = fullnameOverride = <release> + ordinal), so this is upgrade-safe and also reclaims PVCs from instances created before this hook existed. The trailing - prevents a release named foo from matching qdrant-storage-foobar-*.
  3. e2e coverage in hack/e2e-apps/qdrant.bats: after deleting the Qdrant it now async-waits for qdrant-storage-<name>-0 to be not-found and fails loud (dumping PVC + cleanup Job/Pod state) if it lingers — so the regression this PR fixes is now actually covered end-to-end.
  4. Unit test rewritten to assert the name-prefix command rather than the (now-removed) label passthrough.
  5. fix(qdrant)! — marked the PR breaking for the data-loss-on-delete behavior, matching the sibling ClickHouse PR. RBAC stays scoped to persistentvolumeclaims (get,list,delete), which the get -o name | xargs delete flow needs.

helm unittest passes (60/60). The bot "use persistence.labels" suggestions remain incorrect as you noted — there is no persistence.labels key; the chart consumes persistence.additionalLabels at statefulset.yaml L269 — and that passthrough is now gone regardless.

PTAL when you have a moment.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hack/e2e-apps/qdrant.bats`:
- Around line 34-35: The PVC wait check in qdrant.bats currently treats any
kubectl get pvc failure as success, so update the delete-verification logic to
only exit cleanly when kubectl reports NotFound for
qdrant-storage-qdrant-$name-0 and fail the script on all other errors. Adjust
the timeout loop and its surrounding shell condition so it inspects kubectl’s
stderr/exit status, keeping the existing regression message but ensuring
transient API/auth issues are surfaced instead of masked.

In `@packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml`:
- Around line 59-65: The PVC cleanup in the cleanup logic should only target
StatefulSet-owned PVCs for the current release and must not treat list-command
errors as “no PVCs found.” Update the matching in the pvc collection block so it
binds to the exact qdrant StatefulSet ordinal naming used by the release (not a
broad prefix match), and handle the kubectl get pvc failure path explicitly in
this cleanup script instead of suppressing it with the current fallback. Use the
existing cleanup flow around pvcs, kubectl get pvc, and kubectl delete to locate
and adjust the behavior.

In `@packages/apps/qdrant/tests/cleanup_pvc_test.yaml`:
- Around line 41-43: The cleanup PVC test is still asserting only a broad prefix
match, so it won’t catch accidental matches like qdrant-test-other. Tighten the
expectation in cleanup_pvc_test.yaml for the hook command built by the cleanup
selector so it verifies the ordinal-bounded PVC pattern used by qdrant-test, and
update the matchRegex on spec.template.spec.containers[0].command[2] to reflect
the selector fix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4356b894-8433-4423-a75a-6d449b99c242

📥 Commits

Reviewing files that changed from the base of the PR and between 26dc006 and 2587458.

📒 Files selected for processing (3)
  • hack/e2e-apps/qdrant.bats
  • packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml
  • packages/apps/qdrant/tests/cleanup_pvc_test.yaml

Comment thread hack/e2e-apps/qdrant.bats
Comment on lines +34 to +35
timeout 120 sh -ec "while kubectl -n tenant-test get pvc qdrant-storage-qdrant-$name-0 >/dev/null 2>&1; do sleep 2; done" \
|| { echo "❌ data PVC qdrant-storage-qdrant-$name-0 still present after Qdrant delete (issue #3059 regression)"; kubectl -n tenant-test get pvc -o wide; kubectl -n tenant-test get jobs,pods -l app.kubernetes.io/instance=qdrant-$name -o wide; false; }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Only treat NotFound as success here.

This loop passes on any kubectl get pvc failure, so a transient API/auth error can hide a real cleanup regression. Please distinguish NotFound from other non-zero exits and fail on the latter.

Suggested fix
-  timeout 120 sh -ec "while kubectl -n tenant-test get pvc qdrant-storage-qdrant-$name-0 >/dev/null 2>&1; do sleep 2; done" \
+  timeout 120 sh -ec '
+    while true; do
+      out=$(kubectl -n tenant-test get pvc qdrant-storage-qdrant-'"$name"'-0 2>&1) && {
+        sleep 2
+        continue
+      }
+      case "$out" in
+        *"(NotFound)"*) exit 0 ;;
+        *) echo "$out" >&2; exit 1 ;;
+      esac
+    done
+  ' \
     || { echo "❌ data PVC qdrant-storage-qdrant-$name-0 still present after Qdrant delete (issue `#3059` regression)"; kubectl -n tenant-test get pvc -o wide; kubectl -n tenant-test get jobs,pods -l app.kubernetes.io/instance=qdrant-$name -o wide; false; }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
timeout 120 sh -ec "while kubectl -n tenant-test get pvc qdrant-storage-qdrant-$name-0 >/dev/null 2>&1; do sleep 2; done" \
|| { echo "❌ data PVC qdrant-storage-qdrant-$name-0 still present after Qdrant delete (issue #3059 regression)"; kubectl -n tenant-test get pvc -o wide; kubectl -n tenant-test get jobs,pods -l app.kubernetes.io/instance=qdrant-$name -o wide; false; }
timeout 120 sh -ec '
while true; do
out=$(kubectl -n tenant-test get pvc qdrant-storage-qdrant-'"$name"'-0 2>&1) && {
sleep 2
continue
}
case "$out" in
*"(NotFound)"*) exit 0 ;;
*) echo "$out" >&2; exit 1 ;;
esac
done
' \
|| { echo "❌ data PVC qdrant-storage-qdrant-$name-0 still present after Qdrant delete (issue `#3059` regression)"; kubectl -n tenant-test get pvc -o wide; kubectl -n tenant-test get jobs,pods -l app.kubernetes.io/instance=qdrant-$name -o wide; false; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hack/e2e-apps/qdrant.bats` around lines 34 - 35, The PVC wait check in
qdrant.bats currently treats any kubectl get pvc failure as success, so update
the delete-verification logic to only exit cleanly when kubectl reports NotFound
for qdrant-storage-qdrant-$name-0 and fail the script on all other errors.
Adjust the timeout loop and its surrounding shell condition so it inspects
kubectl’s stderr/exit status, keeping the existing regression message but
ensuring transient API/auth issues are surfaced instead of masked.

Comment on lines +59 to +65
pvcs=$(kubectl get pvc -n {{ .Release.Namespace }} -o name \
| grep "^persistentvolumeclaim/qdrant-storage-{{ .Release.Name }}-" || true)
if [ -z "$pvcs" ]; then
echo "No data PVCs found for release {{ .Release.Name }}; nothing to delete."
else
echo "$pvcs" | xargs kubectl delete -n {{ .Release.Namespace }} --ignore-not-found \
|| echo "WARNING: PVC cleanup failed for {{ .Release.Name }}; orphaned PVCs may remain" >&2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Bound PVC matching to StatefulSet ordinals and log list failures.

The current prefix match can delete another release’s PVC when release names overlap, e.g. cleanup for foo also matches qdrant-storage-foo-bar-0. Also, || true masks kubectl get pvc failures as “no PVCs found,” so cleanup can silently fail.

Proposed fix
-              pvcs=$(kubectl get pvc -n {{ .Release.Namespace }} -o name \
-                | grep "^persistentvolumeclaim/qdrant-storage-{{ .Release.Name }}-" || true)
+              if ! pvc_names=$(kubectl get pvc -n {{ .Release.Namespace }} -o name); then
+                echo "WARNING: PVC cleanup failed for {{ .Release.Name }}; unable to list PVCs" >&2
+                exit 0
+              fi
+              pvcs=$(echo "$pvc_names" \
+                | grep -E "^persistentvolumeclaim/qdrant-storage-{{ .Release.Name }}-[0-9]+$" || true)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pvcs=$(kubectl get pvc -n {{ .Release.Namespace }} -o name \
| grep "^persistentvolumeclaim/qdrant-storage-{{ .Release.Name }}-" || true)
if [ -z "$pvcs" ]; then
echo "No data PVCs found for release {{ .Release.Name }}; nothing to delete."
else
echo "$pvcs" | xargs kubectl delete -n {{ .Release.Namespace }} --ignore-not-found \
|| echo "WARNING: PVC cleanup failed for {{ .Release.Name }}; orphaned PVCs may remain" >&2
if ! pvc_names=$(kubectl get pvc -n {{ .Release.Namespace }} -o name); then
echo "WARNING: PVC cleanup failed for {{ .Release.Name }}; unable to list PVCs" >&2
exit 0
fi
pvcs=$(echo "$pvc_names" \
| grep -E "^persistentvolumeclaim/qdrant-storage-{{ .Release.Name }}-[0-9]+$" || true)
if [ -z "$pvcs" ]; then
echo "No data PVCs found for release {{ .Release.Name }}; nothing to delete."
else
echo "$pvcs" | xargs kubectl delete -n {{ .Release.Namespace }} --ignore-not-found \
|| echo "WARNING: PVC cleanup failed for {{ .Release.Name }}; orphaned PVCs may remain" >&2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml` around lines 59 - 65,
The PVC cleanup in the cleanup logic should only target StatefulSet-owned PVCs
for the current release and must not treat list-command errors as “no PVCs
found.” Update the matching in the pvc collection block so it binds to the exact
qdrant StatefulSet ordinal naming used by the release (not a broad prefix
match), and handle the kubectl get pvc failure path explicitly in this cleanup
script instead of suppressing it with the current fallback. Use the existing
cleanup flow around pvcs, kubectl get pvc, and kubectl delete to locate and
adjust the behavior.

Comment on lines +41 to +43
- matchRegex:
path: spec.template.spec.containers[0].command[2]
pattern: "grep \"\\^persistentvolumeclaim/qdrant-storage-qdrant-test-\""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the ordinal-bounded PVC selector.

This test currently preserves the broad prefix match, so it would still pass if cleanup for qdrant-test matched qdrant-test-other PVCs. Update it alongside the hook selector fix.

Proposed fix
       - matchRegex:
           path: spec.template.spec.containers[0].command[2]
-          pattern: "grep \"\\^persistentvolumeclaim/qdrant-storage-qdrant-test-\""
+          pattern: "grep -E \"\\^persistentvolumeclaim/qdrant-storage-qdrant-test-\\[0-9\\]\\+\\$\""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- matchRegex:
path: spec.template.spec.containers[0].command[2]
pattern: "grep \"\\^persistentvolumeclaim/qdrant-storage-qdrant-test-\""
- matchRegex:
path: spec.template.spec.containers[0].command[2]
pattern: "grep -E \"\\^persistentvolumeclaim/qdrant-storage-qdrant-test-\\[0-9\\]\\+\\$\""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/apps/qdrant/tests/cleanup_pvc_test.yaml` around lines 41 - 43, The
cleanup PVC test is still asserting only a broad prefix match, so it won’t catch
accidental matches like qdrant-test-other. Tighten the expectation in
cleanup_pvc_test.yaml for the hook command built by the cleanup selector so it
verifies the ordinal-bounded PVC pattern used by qdrant-test, and update the
matchRegex on spec.template.spec.containers[0].command[2] to reflect the
selector fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/database Issues or PRs related to managed databases (postgres, mariadb, redis, etcd, kafka, clickhouse) area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review kind/breaking-change Indicates the change introduces a breaking API or behaviour change kind/bug Categorizes issue or PR as related to a bug size/L This PR changes 100-499 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

qdrant: StatefulSet data PVC is orphaned after deletion

2 participants