Skip to content

ARTEMIS-6046 Kubernetes LockManager implementation#6409

Open
clebertsuconic wants to merge 1 commit into
apache:mainfrom
clebertsuconic:kube-lock
Open

ARTEMIS-6046 Kubernetes LockManager implementation#6409
clebertsuconic wants to merge 1 commit into
apache:mainfrom
clebertsuconic:kube-lock

Conversation

@clebertsuconic

@clebertsuconic clebertsuconic commented May 1, 2026

Copy link
Copy Markdown
Contributor
This commit introduces a Kubernetes-based distributed lock implementation using a generic HTTP REST client abstracted from KubernetesLoginModule.

Key changes:
- Extracted reusable Kubernetes HTTP client to artemis-commons
- Implemented KubernetesLockManager using Kubernetes Lease API
- Implemented KubeMutableLong using Kubernetes ConfigMap for distributed counters
- Moved PemSupport and extracted KeyStoreSupport to artemis-commons to avoid circular dependencies
- Added AbstractDistributedLockManager base class with parameter validation
- Added tests using LockCoordinatorTest against real Kubernetes (via Minikube) and FakeMinikube MockServer
- Added user manual documentation with RBAC configuration examples
- Added smoke test configurations for Kubernetes-based dual-mirror setup

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@clebertsuconic clebertsuconic marked this pull request as draft May 1, 2026 23:34
@clebertsuconic clebertsuconic force-pushed the kube-lock branch 2 times, most recently from c557799 to 361bcf9 Compare May 2, 2026 00:10
@clebertsuconic clebertsuconic marked this pull request as ready for review May 2, 2026 00:10
@clebertsuconic clebertsuconic force-pushed the kube-lock branch 4 times, most recently from 2345717 to de9eeb9 Compare May 4, 2026 20:58

@gtully gtully left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, but I think we need to reduce the dependencies, with cve's popping up all over the place, we need to be reducing our surface area dependencies as much as possible.

Peek at the http client in the oidcloginmodule for inspiration, or the token review calls in the kubelogin module.
see: 98b24f02ae#diff-c0c7084d3381d18956a831a896c9124b9636a297630fa138513d3fc720560f45

@clebertsuconic

Copy link
Copy Markdown
Contributor Author

I will try. There are date operations here that are not easy to manage.

@gemmellr

gemmellr commented May 6, 2026

Copy link
Copy Markdown
Member

Mostly havent looked at this, but would follow Gary's comment on the deps/shading-thereof, seem horrid, as likely could be maintaining the shading going forward (and the 12 seconds it takes isnt ideal either). With all the deps (at least some of which seem like dupes) I'd guess its not tiny either, so if its large I'm not necessarily seeing it as obvious we should add it to the distribution.

I cant comment on the actual lines, as GitHub is having an outage preventing new code comments on PRs lol, but...

The bom changes are broken. The entries are in the relocations/old-modules section of the bom, and as brand new modules these ones would not be expected to have any relocations (indeed you didnt add any, so the GAVs listed there wont ever exist and are just junk entries, which can break things inspecting them). As the only changes to the bom that also means that there are no bom entries for the actual module GAVs either, which means you must have have put the version fields in place elsewhere in the build instead of relying on the bom-managed entries (doing which would point out that the bom is currently broken).

@clebertsuconic

Copy link
Copy Markdown
Contributor Author

@gemmellr I will see if I can connect directly to REST bits..

The kubernetes client has a few dependencies. and that's I am bringing.

I could also "fork" the code for what I need.. since all the kubernetes client does on this is to execute rest calls. The difficult part here is the dates operation and initial certificate validation.

Let me see what I can do and i will come back here.

@clebertsuconic clebertsuconic marked this pull request as draft May 7, 2026 00:37
@clebertsuconic clebertsuconic force-pushed the kube-lock branch 6 times, most recently from 0390a9d to e0fc2b2 Compare June 6, 2026 16:58
@clebertsuconic clebertsuconic force-pushed the kube-lock branch 8 times, most recently from d67a963 to dc8faa6 Compare June 18, 2026 01:10
@clebertsuconic clebertsuconic force-pushed the kube-lock branch 9 times, most recently from fa876af to fca624b Compare June 19, 2026 03:20
@gtully

gtully commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

I don't like the commit message email at the end. I am using a simpler "AssistedBy: Claude" trailer. you are still the author and responsible entity.

@gtully

gtully commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

I think it would be valuable to track the spec leaseTransitions attribute. It is good information to be visible in case there is flip/flop.
https://kubernetes.io/docs/reference/kubernetes-api/coordination/lease-v1/#LeaseSpec:~:text=last%20observed%20renewTime.-,leaseTransitions,-integer

@gemmellr gemmellr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the changes beyond the pom files, but I notice some of the things I commented on before are still needing updated so I commented on the actual files/lines this time (file/line commenting was having an outage last time I looked so I couldn’t then)

Comment thread artemis-bom/pom.xml Outdated
Comment thread tests/smoke-tests/pom.xml Outdated
@clebertsuconic clebertsuconic force-pushed the kube-lock branch 14 times, most recently from c7a07b0 to d6603ef Compare June 20, 2026 00:20
@clebertsuconic

Copy link
Copy Markdown
Contributor Author

I have executed LockCoordinatorTest (testWithMinikube and testWithFakekube) at least 1K times each without any failures.

I had a few failures that I had to fix and harden the implementation. I tested this quite a lot.

And that includes a new test I added to validate skewed time between servers.

This commit introduces a Kubernetes-based distributed lock implementation using a generic HTTP REST client abstracted from KubernetesLoginModule.

Key changes:
- Extracted reusable Kubernetes HTTP client to artemis-commons
- Implemented KubernetesLockManager using Kubernetes Lease API with time skew prevention
- Time skew prevention uses local observation timestamps (Kubernetes client-go approach) to avoid clock drift issues between nodes
- Implemented KubeMutableLong using Kubernetes ConfigMap for distributed counters
- Moved PemSupport and extracted KeyStoreSupport to artemis-commons to avoid circular dependencies
- Added AbstractDistributedLockManager base class with parameter validation
- Added tests using LockCoordinatorTest against real Kubernetes (via Minikube) and FakeMinikube MockServer
- Added time skew test extensions (KubeLockTimeSkew, KubeLockManagerTimeSkew) for testing clock drift scenarios
- Added testTimeSkewAffectsLockAcquisition test to verify lock behavior with simulated clock skew
- Added user manual documentation with RBAC configuration examples
- Added smoke test configurations for Kubernetes-based dual-mirror setup

Time skew prevention details:
- Locks use local observation cache to track when lease state was last seen
- Expiration is determined by elapsed local time since observation, not by comparing remote timestamps
- This prevents premature lock acquisition when nodes have clock differences
- Protected currentTime() method allows test subclasses to simulate clock skew

AssistedBy: Claude
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants