-
Notifications
You must be signed in to change notification settings - Fork 14
CLOUDP-338084 - removing and refactoring agent matrix from pipeline.py and atomic_pipeline.py #346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
MCK 1.3.0 Release NotesBug Fixes
Other Changes
|
- name: build_readiness_probe_image | ||
variant: init_test_run | ||
- name: build_upgrade_hook_image | ||
variant: init_test_run | ||
- name: build_mco_test_image | ||
variant: init_test_run | ||
- name: build_agent_images_ubi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we still use all run this on every patch, the script just checks whether its required and potentially skips it then if there are no changes.
Why still run it?
On CM and OM bump prs we still need the agent in ecr first. This ensures we build it to ecr first
@@ -251,6 +252,11 @@ def main(): | |||
type=int, | |||
help="Number of agent builds to run in parallel, defaults to number of cores", | |||
) | |||
parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaults to false
…ubernetes into remove-static-pipeline
.evergreen-functions.yml
Outdated
@@ -540,7 +540,7 @@ functions: | |||
shell: bash | |||
<<: *e2e_include_expansions_in_env | |||
working_dir: src/github.com/mongodb/mongodb-kubernetes | |||
binary: scripts/dev/run_python.sh scripts/release/pipeline_main.py --parallel ${image_name} | |||
binary: scripts/dev/run_python.sh scripts/release/pipeline_main.py --parallel ${image_name} ${all_agents} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_agents expansion is empty, but in the manual release agents on ecr variant it will be set to --all-agents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in scripts/release/tests otherwise it is not run in CI. @mircea-cosbuc knows more about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should move scripts/release/tests
to scripts/tests/release
?
so this test would go to scripts/tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# We only need [latest agents (for each OM major version and for CM) x patch ID] for patches | ||
else: | ||
agent_versions_to_build = gather_latest_agent_versions(release_json, build_configuration.agent_to_build) | ||
def build_agent(build_configuration: BuildConfiguration): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we still using the legacy pipeline anywhere for agents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe - right now i want things to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this still needs answering. Since you're changing the image build for the agent there should be a decision on whether moving to atomic pipeline is happening. The title and the description of the PR imply that we're getting rid of pipeline.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes i was unclear. What I meant is - we should move to atomic_pipeline to make sure we are not changing 2 files at the same time (for the release as well as for patches).
Due to this reasons I've made pipeline.py and atomic_pipeline.py agent handling the same to not have that edge-case that we overlook something and still call pipeline.py
we plan to migrate to atomic for releases: #344
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but that PR does not touch agent images at all.
If we are in fact moving away from agent image building in legacy pipeline code, we should not make changes to it or remove redundant code completely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem now is that we have duplicated code in pipeline and atomic.
one is for releasing and one for patches but for agent releases both are the same duplicated code. I think its better to just use atomic with scenarion release 937e953. If you strongly disagree i can move back to pipeline and have them duplicated
# We only need [latest agents (for each OM major version and for CM) x patch ID] for patches | ||
else: | ||
agent_versions_to_build = gather_latest_agent_versions(release_json, build_configuration.agent_to_build) | ||
def build_agent(build_configuration: BuildConfiguration): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this still needs answering. Since you're changing the image build for the agent there should be a decision on whether moving to atomic pipeline is happening. The title and the description of the PR imply that we're getting rid of pipeline.py
.
@@ -517,30 +517,36 @@ functions: | |||
# docker buildx needs the moby/buildkit image when setting up a builder so we pull it from our mirror | |||
docker buildx create --driver=docker-container --driver-opt=image=268558157000.dkr.ecr.eu-west-1.amazonaws.com/docker-hub-mirrors/moby/buildkit:buildx-stable-1 --use | |||
docker buildx inspect --bootstrap | |||
- command: ec2.assume_role |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is cherry-picked from https://github.com/mongodb/mongodb-kubernetes/pull/344/files#diff-ad8722e626fc7bc08be6765b8268550446b1fb934c1a7eb6a5766d6446f92ad1 and i need to use this for the agent. We can merge either and I can fix the merge conflict - but this ensures we will get the correct merge either way
binary: scripts/dev/run_python.sh scripts/release/pipeline_main.py --parallel ${image_name} | ||
env: | ||
git_tag: ${triggered_by_git_tag} | ||
binary: scripts/dev/run_python.sh scripts/release/pipeline_main.py ${image_name} --build-scenario release ${git_tag|--version ${git_tag}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my change here is to make version optional, but this is hacky and we should make it properly for om and the agent in a next step https://jira.mongodb.org/browse/CLOUDP-338152
|
||
current_release = load_current_release_json() | ||
if not current_release: | ||
print("ERROR: Could not load current local release.json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we use a logger in this file?
["git", "show", f"{commit}:{file_path}"], capture_output=True, text=True, check=True, timeout=30 | ||
) | ||
return result.stdout | ||
except (subprocess.CalledProcessError, subprocess.TimeoutExpired): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to properly log these exceptions?
LGTM! |
Summary
Why we do this
release.json
. Instead we should only release the agent if PCT adds a new agent. That happens during OM and CM bumps, the new detection script should handle this and release the imagesWhat changes
local
vsorigin/master
forrelease.json
and uses that as a base to do the releaseEvergreen configuration cleanup and simplification:
release_agent_operator_release
,release_agents_on_ecr_conditional
, andinit_release_agents_on_ecr
, to streamline the release process.Pipeline logic refactoring:
build_agent_default_case
function inpipeline.py
to use the newdetect_ops_manager_changes
function for determining which agent versions to build, and eliminated the separatebuild_agent_on_agent_bump
logic.pipeline.py
to matchatomic_pipeline.py
"agent"
and"agent-pct"
use the unifiedbuild_agent_default_case
function.Proof of Work
Example Cases
A new OM/CM bump workflow
Checklist
skip-changelog
label if not needed