Skip to content

Commit aa388b8

Browse files
committed
KEP-4368: Job Managed By; Promote to GA
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
1 parent 253a46c commit aa388b8

File tree

3 files changed

+99
-31
lines changed

3 files changed

+99
-31
lines changed

keps/prod-readiness/sig-apps/4368.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,5 @@ alpha:
33
approver: "@wojtek-t"
44
beta:
55
approver: "@wojtek-t"
6+
stable:
7+
approver: "@wojtek-t"

keps/sig-apps/4368-support-managed-by-for-batch-jobs/README.md

Lines changed: 93 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,8 @@
3434
- [Alpha](#alpha)
3535
- [Beta](#beta)
3636
- [GA](#ga)
37+
- [Ecosystem Assessment](#ecosystem-assessment)
38+
- [Follow-up Work](#follow-up-work)
3739
- [Deprecation](#deprecation)
3840
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
3941
- [Upgrade](#upgrade)
@@ -92,11 +94,11 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
9294
- [x] (R) KEP approvers have approved the KEP status as `implementable`
9395
- [x] (R) Design details are appropriately documented
9496
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
95-
- [ ] e2e Tests for all Beta API Operations (endpoints)
96-
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
97-
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
97+
- [x] e2e Tests for all Beta API Operations (endpoints)
98+
- [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
99+
- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
98100
- [x] (R) Graduation criteria is in place
99-
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
101+
- [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
100102
- [x] (R) Production readiness review completed
101103
- [x] (R) Production readiness review approved
102104
- [x] "Implementation History" section is up-to-date for milestone
@@ -140,7 +142,7 @@ controller, and delegate the status synchronization to the Kueue controller.
140142
### Non-Goals
141143

142144
- passing custom parameters to the external controller
143-
- Introduce a new concurrency policy for CronJobs (eg. `ForbidActive` or `SoftForbid`)
145+
- Introduce a new concurrency policy for CronJobs (e.g. `ForbidActive` or `SoftForbid`)
144146
to replace a Job that is about to complete, but still has terminating pods.
145147

146148
## Proposal
@@ -278,7 +280,7 @@ a typo in the field value, or the cluster administrator may not install the
278280
custom controller (like MultiKueue) on that cluster.
279281

280282
In such cases the user may not observe any progress by the job for a long time
281-
and my need to debug the Job.
283+
and may need to debug the Job.
282284

283285
In order to allow for debugging of situations like this the Job controller will
284286
put a log line indicating the synchronization is delegated to another controller
@@ -293,9 +295,15 @@ providing a user readable information if the Job is synchronized by a custom
293295
controller.
294296

295297
Alternative ideas considered were
296-
[a dedicated condition](#condition-to-indicated-job-is-skipped)
298+
[a dedicated condition](#condition-to-indicate-job-is-skipped)
297299
and [events](#event-indicating-the-job-is-skipped).
298300

301+
Based on beta feedback, kubectl extensions are deferred as follow-up enhancements.
302+
The "managedBy" field is already visible via standard kubectl commands
303+
(`kubectl get job <name> -o yaml`), and MultiKueue users have not reported
304+
difficulties debugging Jobs with this field. We will reconsider these enhancements
305+
based on user feedback post-GA.
306+
299307
#### Custom controllers not compatible with API assumptions by CronJob
300308

301309
Currently, the validation of the Job status API is rather relaxed, allowing for
@@ -352,9 +360,6 @@ type JobSpec struct {
352360
// all characters before the first "/" must be a valid subdomain as defined
353361
// by RFC 1123. All characters trailing the first "/" must be valid HTTP Path
354362
// characters as defined by RFC 3986. The value cannot exceed 64 characters.
355-
//
356-
// This field is alpha-level. The job controller accepts setting the field
357-
// when the feature gate JobManagedBy is enabled (disabled by default).
358363
// +optional
359364
ManagedBy *string
360365
}
@@ -373,6 +378,11 @@ We will re-evaluate for [GA](#ga) to also skip the reconciliation within the
373378
`enqueueSyncJobInternal` for optimal performance. See discussion in the
374379
[Skip reconciliation in the event handler](#skip-reconciliation-in-the-event-handler).
375380

381+
Skipping reconciliation in event handlers is deferred as a performance
382+
optimization for post-GA. This is considered a premature optimization
383+
without evidence of performance issues from MultiKueue users on the
384+
management cluster.
385+
376386
There is no validation for a value of the field beyond its format as described
377387
in the [API](#API) comment above.
378388

@@ -493,9 +503,9 @@ The following scenarios are covered:
493503
- it does not add the Suspended condition for a Job with `.spec.suspend=true` ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2059)).
494504
- the Job controller reconciles jobs with custom "managedBy" field when the feature gate is disabled ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2030))
495505
- the Job controller handles correctly re-enablement of the feature gate [link](https://github.com/kubernetes/kubernetes/blob/169a952720ebd75fcbcb4f3f5cc64e82fdd3ec45/test/integration/job/job_test.go#L1691)
496-
- the `job_by_external_controller_total` metric is incremented when a new Job with custom "managedBy" is created ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2044-L2058))
497-
- the `job_by_external_controller_total` metric is not incremented for a new Job without "managedBy" or with default value ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029))
498-
- the `job_by_external_controller_total` metric is not incremented for Job updates (regardless of the "managedBy") (tested indirectly as [here](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029) the Job controller updates the Job status)
506+
- the `jobs_by_external_controller_total` metric is incremented when a new Job with custom "managedBy" is created ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2044-L2058))
507+
- the `jobs_by_external_controller_total` metric is not incremented for a new Job without "managedBy" or with default value ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029))
508+
- the `jobs_by_external_controller_total` metric is not incremented for Job updates (regardless of the "managedBy") (tested indirectly as [here](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029) the Job controller updates the Job status)
499509

500510
The following scenarios related to [Terminating pods and terminal Job conditions](#terminating-pods-and-terminal-job-conditions) are covered:
501511
- `Failed` or `Complete` conditions are not added while there are still terminating pods ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L1183))
@@ -513,6 +523,8 @@ We propose a single e2e test for the following scenario:
513523
- the Job controller does not reconcile a job with any other value of the "managedBy" field. In particular,
514524
it does not reset the status for an unsuspended Job.
515525

526+
[Job should allow to delegate reconciliation to external controller](https://github.com/kubernetes/kubernetes/blob/b393d87d16f225f873f72a79734b3409323b4a05/test/e2e/apps/job.go#L1310-L1333): [SIG Apps](https://testgrid.k8s.io/sig-apps#gce&include-filter-by-regex=external%20controller&include-filter-by-regex=Job%20should%20allow%20to%20delegate%20reconciliation%20to%20external%20controller), [triage search](https://storage.googleapis.com/k8s-triage/index.html?sig=apps&test=Job%20should%20allow%20to%20delegate%20reconciliation%20to%20external%20controller)
527+
516528
### Graduation Criteria
517529

518530
#### Alpha
@@ -523,7 +535,7 @@ We propose a single e2e test for the following scenario:
523535
API fields affected by the new validation rules
524536
- make CronJob more resilient by checking the Job condition is `Complete` when using `CompletionTime` (see [here](#custom-controllers-not-compatible-with-api-assumptions-by-cronjob))
525537
- The feature flag disabled by default
526-
- implement the `job_by_external_controller_total` metric
538+
- implement the `jobs_by_external_controller_total` metric
527539

528540
Second Alpha (1.31):
529541
- preparatory fix to address all known inconsistencies between validation and the
@@ -546,10 +558,60 @@ Second Alpha (1.31):
546558
#### GA
547559

548560
- Address reviews and bug reports from Beta users
549-
- Re-evaluate the ideas of improving debuggability (like [extended `kubectl`](#debuggability), [dedicated condition](#condition-to-indicated-job-is-skipped), or [events](#event-indicating-the-job-is-skipped))
561+
- So far, with the adoption in MultiKueue we have not got users requesting extra debuggability. With the "managedBy" field being immutable it seems the feature is not causing issues requiring extensive debug. The metric and immutable field should be sufficient as a source of truth.
562+
563+
- Re-evaluate the ideas of improving debuggability (like [extended `kubectl`](#debuggability), [dedicated condition](#condition-to-indicate-job-is-skipped), or [events](#event-indicating-the-job-is-skipped))
564+
- Keep these ideas as follow-up features and not blockers for the graduation.
565+
550566
- Re-evaluate the need to skip reconciliation in the event handlers to optimize performance
551-
- Assess the fragmentation of the ecosystem. Look for other implementations of a job controller and asses their conformance with k8s.
567+
- This could be a follow-up performance improvement, but does not seem required. We have no record of MultiKueue users hitting performance issues on the management cluster due to Job controller.
568+
569+
- Assess the fragmentation of the ecosystem. Look for other implementations of a job controller and assess their conformance with k8s.
570+
- See [Ecosystem Assessment](#ecosystem-assessment) section below.
571+
552572
- Lock the feature gate
573+
- This will be done in follow-up implementation PR for 1.35.
574+
575+
##### Ecosystem Assessment
576+
577+
We already have a couple of implementations of the "managedBy" field in order to integrate with MultiKueue:
578+
- JobSet
579+
- Kubeflow Trainer v1
580+
- Kubeflow Trainer v2
581+
- KubeRay
582+
- Tekton Pipelines
583+
- AppWrapper
584+
585+
These implementations follow the k8s core design, particularly making the field immutable.
586+
587+
**Conformance Analysis:**
588+
589+
| Aspect | k8s Job | JobSet | Kubeflow Trainer v1 | Kubeflow Trainer v2 | KubeRay | Tekton Pipelines | AppWrapper |
590+
|----------------------------|---------|--------|----------------------|----------------------|-------------------|--------------------|------------|
591+
| Open set of allowed values | y | y | n | n | n | y | y |
592+
| Max length 63 chars | y | y | y (by closed set) | y (by closed set) | y (by closed set) | n | n |
593+
| Specific format validation | y | y | y (by closed set) | y (by closed set) | y (by closed set) | n | n |
594+
| Immutability | y | y | y | y | y | y | y |
595+
| Status validation | y | n | n | n | n | n | n |
596+
| Skip inside Reconcile | y | y | y | y | y | y (also filtering) | y |
597+
598+
**Key Findings:**
599+
- There is a bit inconsistency about the "open set of values" vs "closed set of values"
600+
- Kubeflow Trainer v1 puts the field under `spec.runPolicy.managedBy`. This is because the code for different controllers is deduplicated at the level of the runPolicy field
601+
- Kubeflow Trainer locks the set to only "known controllers" like MultiKueue: `kueue.x-k8s.io/multikueue`. This was done to be "on the safe side", and avoid complex status validations as in the core
602+
- The main part is that all of them follow the key aspects: **immutability** and **skip inside Reconcile**
603+
- All projects consistently wanted to avoid the effort of validating the status which we took in the k8s Job
604+
605+
**Validation inconsistencies:**
606+
- Some controllers (Tekton, AppWrapper) don't enforce the 63-character length limit. If they allow more than 63 characters, then nothing will break for them. The idea of restricting in the core k8s was to make it more constrained
607+
- Most controllers skip the status validation that k8s Job implements to avoid the extra work, since "managedBy" is immutable
608+
609+
##### Follow-up Work
610+
611+
We may reconsider dropping the code gracefully handling Jobs with invalid statuses after 1.39 behind a dedicated feature gate. All Jobs created in 1.32+ won't require the graceful handling as the rules prevent entering invalid state. We will wait at least 4 releases before locking such a feature gate.
612+
613+
**Metric Stability:**
614+
The `jobs_by_external_controller_total` metric will be promoted to STABLE in 1.35.
553615

554616
#### Deprecation
555617

@@ -569,7 +631,7 @@ A downgrade to a version which does not support this feature (1.29 and below)
569631
does not require any additional configuration changes. All jobs, including these
570632
that specified a custom value for "managedBy", will be handled in the default
571633
way by the Job controller. However, this introduces the risk of
572-
[two controllers running at the same time](#two-controllers-running-at-the-same-time-on-old-version).
634+
[two controllers running at the same time](#two-controllers-running-when-feature-is-disabled).
573635

574636
In order to prepare the risk the admins may want to make sure the custom controllers
575637
using the "managedBy" field are disabled before the downgrade.
@@ -598,7 +660,7 @@ is not supported on the old version.
598660
In case the version of the kube-controller-manager leader is skewed (old), the
599661
built-in Job controller would reconcile the Jobs with custom "managedBy" field,
600662
running into the risk of
601-
[two controllers running at the same time](#two-controllers-running-at-the-same-time-on-old-version).
663+
[two controllers running at the same time](#two-controllers-running-when-feature-is-disabled).
602664
It is recommended the users don't create jobs with custom "managedBy" field
603665
during an ongoing upgrade.
604666

@@ -682,7 +744,7 @@ Yes.
682744

683745
However, when the feature is disabled and there are Jobs external controllers by
684746
using "managedBy" field there is a risk of
685-
[two controller running at the same time](#two-controllers-running-at-the-same-time-on-old-version).
747+
[two controller running at the same time](#two-controllers-running-when-feature-is-disabled).
686748
Thus, it is recommended administrators make sure there are no Jobs using external
687749
controllers before rollback.
688750

@@ -752,7 +814,7 @@ being flipped between two owners.
752814
The feature is opt-in so in case of such problems the custom "managedBy" field
753815
should not be used.
754816

755-
Also, an admin could check if the value of the `job_by_external_controller_total`
817+
Also, an admin could check if the value of the `jobs_by_external_controller_total`
756818
matches the expectations. For example, if the value of the metric does not increase
757819
when new jobs are being added with a custom "managedBy" field, it might be
758820
indicative that the feature is not working correctly.
@@ -831,7 +893,7 @@ checking if there are objects with field X set) may be a last resort. Avoid
831893
logs or events for this purpose.
832894
-->
833895

834-
Check the `job_by_external_controller_total` metric. If the value is non-zero
896+
Check the `jobs_by_external_controller_total` metric. If the value is non-zero
835897
for a field, it means there were Jobs using the custom controller created, so
836898
the feature is in use.
837899

@@ -885,22 +947,24 @@ Pick one more of these and delete the rest.
885947

886948
- [x] Metrics
887949
- Metric name:
888-
- `job_by_external_controller_total` (new), with the `controller_name` label
950+
- `jobs_by_external_controller_total` (new), with the `controller_name` label
889951
corresponding to the custom value of the "managedBy" field. The metric is
890952
incremented by the built-in Job controller on each ADDED Job event,
891953
corresponding to a Job with custom value of the "managedBy" field.
892954
This metric can be helpful to determine the health of a job and its controller
893955
in combination with already existing metrics (see below).
956+
- Components exposing the metric: kube-controller-manager
894957
- `apiserver_request_total[code=409, resource=job, group=batch]` (existing):
895-
substantial increase of this metric, when additionally `job_by_external_controller_total>0`
958+
substantial increase of this metric, when additionally `jobs_by_external_controller_total>0`
896959
may be indicative of two controllers stepping onto each-other causing
897-
conflicts (see [here](#two-controllers-running-at-the-same-time-on-old-version)).
960+
conflicts (see [here](#two-controllers-running-when-feature-is-disabled)).
961+
- Components exposing the metric: kube-apiserver
898962
- `kube_cronjob_status_active` (existing), substantial increase of this
899963
metric, may suggest that there are accumulating non-progressing jobs controlled
900-
by `CronJob`. If additionally `job_by_external_controller_total>0` it may suggest
964+
by `CronJob`. If additionally `jobs_by_external_controller_total>0` it may suggest
901965
that the Jobs are getting stuck due to not being synchronized by the custom
902966
controller.
903-
- Components exposing the metric: kube-apiserver
967+
- Components exposing the metric: kube-apiserver
904968

905969
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
906970

@@ -1080,7 +1144,7 @@ N/A.
10801144

10811145
## Implementation History
10821146

1083-
- 2023-12.20 - First version of the KEP
1147+
- 2023-12-20 - First version of the KEP
10841148
- 2024-03-05 - Merged implementation PR [Support for the Job managedBy field (alpha)](https://github.com/kubernetes/kubernetes/pull/123273)
10851149
- 2024-03-07 - Merged [Update Job conformance test for job status updates](https://github.com/kubernetes/kubernetes/pull/123751)
10861150
- 2024-03-08 - Merged [Follow up fix to the job status update test](https://github.com/kubernetes/kubernetes/pull/123815)
@@ -1090,6 +1154,7 @@ N/A.
10901154
- 2024-06-21 - Merged [Update the count of ready pods when deleting pods](https://github.com/kubernetes/kubernetes/pull/125546)
10911155
- 2024-07-12 - Merged [Delay setting terminal Job conditions until all pods are terminal](https://github.com/kubernetes/kubernetes/pull/125510)
10921156
- 2024-07-30 - Merged [Update the docs for JobManagedBy and JobPodReplacementPolicy related to pod termination](https://github.com/kubernetes/website/pull/46808)
1157+
- 2024-10-17 - Merged [Graduate JobManagedBy to Beta in 1.32](https://github.com/kubernetes/kubernetes/pull/127402)
10931158

10941159
<!--
10951160
Major milestones in the lifecycle of a KEP should be tracked in this section.
@@ -1369,7 +1434,7 @@ It makes them not that useful to debug situations when the Job didn't make
13691434
progress for long time. So, they would not give a reliable signal for debugging
13701435
based on playbooks.
13711436

1372-
Renewing the even on every Job update seems excessive from the performance
1437+
Renewing the event on every Job update seems excessive from the performance
13731438
perspective.
13741439

13751440
## Infrastructure Needed (Optional)

0 commit comments

Comments
 (0)