CLOUDP-337356 - static support #333
Draft: nammn wants to merge 33 commits into `multi-arch-pipeline-combined` from `static-support`
# Summary

**Add Not-Ready Handling for Ongoing Auth Transitions**: This patch refines our readiness logic so that it correctly reflects the state of authentication transitions. Previously, we treated `LastGoalVersionAchieved == GoalVersion` as a signal that the cluster was "Running", but this assumption breaks down while auth transitions are still in progress. It happened because we returned "ready" during a wait step (`WaitAuthCanUpdate`), and [we generally return ready for all wait steps](https://github.com/mongodb/mongodb-kubernetes/blob/f0050b8942545701e8cb9e42d54d14f0cb58ee6a/mongodb-community-operator/cmd/readiness/main.go#L139), regardless of whether auth has fully transitioned.

Example status:

```json
{
  "step": "WaitAuthUpdate",
  "stepDoc": "Wait to update Auth",
  "isWaitStep": true,
  "started": "2025-08-07T14:59:40.213178437Z",
  "attempts": 512,
  "latestAttempt": "2025-08-07T15:09:20.966699961Z",
  "completed": null,
  "result": "wait"
}
```

**Why this is implemented in the operator and not in the readinessProbe**: I didn't fix the readinessProbe but rather the operator, because:

* if the readinessProbe blocks, new nodes do not come up
* we want new nodes to come up
* but we also want to block new configurations from being applied, which the `automation_status` check in the operator does

**The core idea:**

* Configuration applied ≠ transition fully complete.
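The distinction between "probe ready" and "cluster Running" can be sketched in Go. This is a minimal, hypothetical sketch, not the operator's actual code: the `PlanStep` and `ProcessStatus` types, the `probeReady` and `clusterRunning` helpers, and the rule that any incomplete step whose name contains "Auth" counts as an ongoing auth transition are all assumptions made for illustration. It decodes the example status and shows that a probe treating every wait step as ready reports `true`, while a stricter operator-side check keeps the cluster out of "Running" even though every process has reached the goal version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// PlanStep mirrors the fields of the example automation status.
type PlanStep struct {
	Step          string  `json:"step"`
	StepDoc       string  `json:"stepDoc"`
	IsWaitStep    bool    `json:"isWaitStep"`
	Started       *string `json:"started"`
	Attempts      int     `json:"attempts"`
	LatestAttempt *string `json:"latestAttempt"`
	Completed     *string `json:"completed"` // null while the step is still running
	Result        string  `json:"result"`
}

// ProcessStatus is a hypothetical per-process view of the automation status.
type ProcessStatus struct {
	Name                    string
	LastGoalVersionAchieved int
	CurrentStep             *PlanStep // nil when no plan step is in progress
}

// probeReady mimics the lenient probe behaviour: every wait step counts as
// ready, so new pods are still allowed to come up.
func probeReady(step *PlanStep) bool {
	return step == nil || step.IsWaitStep
}

// authTransitionPending is an ASSUMED heuristic: an incomplete step whose
// name contains "Auth" is treated as an ongoing auth transition.
func authTransitionPending(step *PlanStep) bool {
	return step != nil && step.Completed == nil && strings.Contains(step.Step, "Auth")
}

// clusterRunning is the stricter operator-side check: every process must have
// reached the goal version AND none may still be in an auth transition.
func clusterRunning(goalVersion int, processes []ProcessStatus) bool {
	for _, p := range processes {
		if p.LastGoalVersionAchieved != goalVersion || authTransitionPending(p.CurrentStep) {
			return false
		}
	}
	return true
}

func main() {
	raw := `{"step": "WaitAuthUpdate", "stepDoc": "Wait to update Auth", "isWaitStep": true,
		"started": "2025-08-07T14:59:40.213178437Z", "attempts": 512,
		"latestAttempt": "2025-08-07T15:09:20.966699961Z", "completed": null, "result": "wait"}`

	var step PlanStep
	if err := json.Unmarshal([]byte(raw), &step); err != nil {
		panic(err)
	}

	// Both processes report the goal version, so the old
	// LastGoalVersionAchieved == GoalVersion check alone would say "Running".
	processes := []ProcessStatus{
		{Name: "node-0", LastGoalVersionAchieved: 7},
		{Name: "config-0", LastGoalVersionAchieved: 7, CurrentStep: &step},
	}

	fmt.Println("probe ready for config-0:", probeReady(&step))
	fmt.Println("cluster running:", clusterRunning(7, processes))
}
```

Running it prints `probe ready for config-0: true` followed by `cluster running: false`. Keeping the probe lenient lets new pods come up, while the operator-side gate holds back the next configuration until the auth transition has settled on every process.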
**What happened in our tests**:

* we update auth via the CR from x509 to SCRAM
* `node-0` completed its auth transition (it now uses SCRAM instead of x509)
* the `config server` hasn't finished its auth transition yet
* we hit a race condition where clusters were marked as "Running" too early, so the rolling restart of `node-0` continued
* `node-0` restarted with the old x509 config (see the agent code comment below)
* the x509 process couldn't access the SCRAM automation user
* this leads to the error: "process...doesn't have the automation user"

The mms-automation code also contains a comment showing that the agent handles the edge case of an unsuccessful auth transition by restarting the process with the old auth to "finish" it. But this is exactly what causes our race condition:

```
// If a process went down unexpectedly in the middle of an auth transition,
// we want to restart it with the old auth args.
// Otherwise, it could be upgraded to the new auth state too soon,
// and not be able to communicate with other shard members.
```

tl;dr: first `node-0` moved to the new auth while `config` had not yet; then `node-0` restarted, and during the restart `config` transitioned to the new auth while `node-0` was again running the old auth.

## Proof of Work

- auth change tests are passing multiple times in a row: [Link](http://spruce.mongodb.com/version/6894b98218a2e90007437e99/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC)
- the flakiest auth test, from the patch: [Link2](https://spruce.mongodb.com/task/mongodb_kubernetes_e2e_static_mdb_kind_ubi_cloudqa_e2e_sharded_cluster_x509_to_scram_transition_patch_b29fb4ace63eec7102f8f034fd6c553b5d75c1a1_6894c0785c119f0007a58f3c_25_08_07_15_04_26/logs?execution=0)

## Checklist

- [ ] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you added changelog file?
    - use `skip-changelog` label if not needed
    - refer to [Changelog files and Release Notes](https://github.com/mongodb/mongodb-kubernetes/blob/master/CONTRIBUTING.md#changelog-files-and-release-notes) section in CONTRIBUTING.md for more details
…341)

# Summary

This patch adds a separate task in the `init_test_run` variant for building the operator image with the race-checker enabled.

This patch also moves the tests under `scripts/release`, following [standard pytest best practices](https://doc.pytest.org/en/latest/explanation/goodpractices.html#tests-as-part-of-application-code), because our Python test targets were not capturing all the tests.

## Proof of Work

[EVG Job](https://spruce.mongodb.com/version/689b8a097e5e41000778b82e)

## Checklist

- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you added changelog file?
    - use `skip-changelog` label if not needed
    - refer to [Changelog files and Release Notes](https://github.com/mongodb/mongodb-kubernetes/blob/master/CONTRIBUTING.md#changelog-files-and-release-notes) section in CONTRIBUTING.md for more details
# Summary

- When refactoring and removing the daily builds, we accidentally also removed parts of the teardown.

## Proof of Work

- the periodic build is getting triggered: https://spruce.mongodb.com/version/689dc0de0e8148000791e683/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

## Checklist

- [ ] Have you linked a jira ticket and/or is the ticket in the title?
- [ ] Have you checked whether your jira ticket required DOCSP changes?
- [ ] Have you added changelog file?
    - use `skip-changelog` label if not needed
    - refer to [Changelog files and Release Notes](https://github.com/mongodb/mongodb-kubernetes/blob/master/CONTRIBUTING.md#changelog-files-and-release-notes) section in CONTRIBUTING.md for more details
…ubernetes into remove-static-pipeline