Skip to content

Commit

Permalink
Update "KEP-4368 Job API managed-by mechanism" targeting Beta in 1.32 (
Browse files Browse the repository at this point in the history
…#4856)

* KEP-4368 Job managed-by field update for Beta

* Review remarks and other updates

* Update keps/sig-apps/4368-support-managed-by-for-batch-jobs/README.md

Co-authored-by: Andrey Velichkevich <[email protected]>

---------

Co-authored-by: Andrey Velichkevich <[email protected]>
  • Loading branch information
mimowo and andreyvelich authored Sep 24, 2024
1 parent 2c50c98 commit e6475c7
Show file tree
Hide file tree
Showing 3 changed files with 66 additions and 33 deletions.
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-apps/4368.yaml
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
kep-number: 4368
alpha:
approver: "@wojtek-t"
beta:
approver: "@wojtek-t"
94 changes: 62 additions & 32 deletions keps/sig-apps/4368-support-managed-by-for-batch-jobs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Skip reconciliation in the event handler](#skip-reconciliation-in-the-event-handler)
- [Reserved controller name value](#reserved-controller-name-value)
- [Defaulting of the for newly created jobs](#defaulting-of-the-for-newly-created-jobs)
- [Alternative names for field](#alternative-names-for-field)
Expand Down Expand Up @@ -99,8 +100,8 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
- [x] (R) Production readiness review completed
- [x] (R) Production readiness review approved
- [x] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
Expand Down Expand Up @@ -207,8 +208,15 @@ a blocker.

It would also complicate debuggability of the feature.

We decide to keep the field immutable, at least for [Alpha](#alpha), we will
re-evaluate the decision for [Beta](#beta).
Also, we already observe the adoption of the mechanism in other batch projects,
such as:
- [JobSet](https://github.com/kubernetes-sigs/jobset/blob/665bc42e0a33a0ebdf7fc09b2b6ae5d88eb7d33c/api/jobset/v1alpha2/jobset_types.go#L121-L133)
- [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/blob/da11d1116c29322c481d0b8f174df8d6f05004aa/pkg/apis/kubeflow.org/v1/common_types.go#L238-L239).

These projects for now follow the decision taken in the core k8s to make the
field immutable to avoid complication of the support for mutability.

All together, we decide to keep the field immutable.

#### Use for MultiKueue

Expand Down Expand Up @@ -358,17 +366,15 @@ We skip synchronization of the Jobs with the "managedBy" field, if it has any
different value than `kubernetes.io/job-controller`. When the synchronization is skipped,
the name of the controller managing the Job object is logged.

We leave the particular place at which the synchronization is skipped as
implementation detail which can be determined during the implementation phase,
however, two candidate places are:
1. inside `syncJob` function
2. inside `enqueueSyncJobInternal` function
We skip the reconciliation inside the `syncJob` function
(see [here](https://github.com/kubernetes/kubernetes/blob/15d08bf7c8813b0533dc147a03d9f42aae735ecd/pkg/controller/job/job_controller.go#L819-L822)).

Note that, if we skip inside `enqueueSyncJobInternal` we may save on some memory
needed to needlessly enqueue the Job keys.
We will re-evaluate for [GA](#ga) to also skip the reconciliation within the
`enqueueSyncJobInternal` for optimal performance. See discussion in the
[Skip reconciliation in the event handler](#skip-reconciliation-in-the-event-handler).

There is no validation for the values of the field beyond that of standard
permitted field values.
There is no validation for a value of the field beyond its format as described
in the [API](#API) comment above.

#### Job status validation

Expand All @@ -393,7 +399,7 @@ For that we plan to follow the approach described [below](#terminating-pods-and-
which extend the scope of the interim `FailureTarget` and `SuccessCriteriaMet`
conditions. We will also validate that the transition to `Failed` or `Complete`
condition is preceded by adding the `FailureTarget` or `SuccessCriteriaMet`
condition, respecively.
condition, respectively.

Additionally, we are going to introduce a validation rule that the count of
ready `status.ready` pods is lower or equal than the number of active `status.active`
Expand Down Expand Up @@ -480,21 +486,21 @@ The following scenarios related to [Terminating pods and terminal Job conditions
##### Integration tests

The following scenarios are covered:
- the Job controller reconciles jobs with the "managedBy" field equal to `kubernetes.io/job-controller`
- the Job controller reconciles jobs without the "managedBy" field
- the Job controller reconciles jobs with the "managedBy" field equal to `kubernetes.io/job-controller` ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2016))
- the Job controller reconciles jobs without the "managedBy" field ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000))
- the Job controller does not reconcile a job with any other value of the "managedBy" field. In particular:
- it does not reset the status for a Job with `.spec.suspend=false`,
- it does not add the Suspended condition for a Job with `.spec.suspend=true`.
- the Job controller reconciles jobs with custom "managedBy" field when the feature gate is disabled
- it does not reset the status for a Job with `.spec.suspend=false` ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2044)),
- it does not add the Suspended condition for a Job with `.spec.suspend=true` ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2059)).
- the Job controller reconciles jobs with custom "managedBy" field when the feature gate is disabled ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2030))
- the Job controller handles correctly re-enablement of the feature gate [link](https://github.com/kubernetes/kubernetes/blob/169a952720ebd75fcbcb4f3f5cc64e82fdd3ec45/test/integration/job/job_test.go#L1691)
- the `job_by_external_controller_total` metric is incremented when a new Job with custom "managedBy" is created
- the `job_by_external_controller_total` metric is not incremented for a new Job without "managedBy" or with default value
- the `job_by_external_controller_total` metric is not incremented for Job updates (regardless of the "managedBy")
- the `job_by_external_controller_total` metric is incremented when a new Job with custom "managedBy" is created ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2044-L2058))
- the `job_by_external_controller_total` metric is not incremented for a new Job without "managedBy" or with default value ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029))
- the `job_by_external_controller_total` metric is not incremented for Job updates (regardless of the "managedBy") (tested indirectly as [here](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029) the Job controller updates the Job status)

The following scenarios related to [Terminating pods and terminal Job conditions](#terminating-pods-and-terminal-job-conditions) are covered:
- `Failed` or `Complete` conditions are not added while there are still terminating pods
- `FailureTarget` is added when backoffLimitCount is exceeded, or activeDeadlineSeconds timeout is exceeded
- `SuccessCriteriaMet` is added when the `completions` are satisfied
- `Failed` or `Complete` conditions are not added while there are still terminating pods ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L1183))
- `FailureTarget` is added when backoffLimitCount is exceeded, or activeDeadlineSeconds timeout is exceeded ([link](https://github.com/kubernetes/kubernetes/blob/master/test/integration/job/job_test.go#L1253))
- `SuccessCriteriaMet` is added when the `completions` are satisfied ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L1355))

During the implementation more scenarios might be covered.

Expand Down Expand Up @@ -541,8 +547,8 @@ Second Alpha (1.31):

- Address reviews and bug reports from Beta users
- Re-evaluate the ideas of improving debuggability (like [extended `kubectl`](#debuggability), [dedicated condition](#condition-to-indicated-job-is-skipped), or [events](#event-indicating-the-job-is-skipped))
- Re-evaluate the support for mutability of the field
- Asses the fragmentation of the ecosystem. Look for other implementations of a job controller and asses their conformance with k8s.
- Re-evaluate the need to skip reconciliation in the event handlers to optimize performance
- Assess the fragmentation of the ecosystem. Look for other implementations of a job controller and asses their conformance with k8s.
- Lock the feature gate

#### Deprecation
Expand Down Expand Up @@ -775,10 +781,10 @@ Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
The Upgrade->downgrade->upgrade testing will be done manually prior to release
as Beta, with the following steps:
The Upgrade->downgrade->upgrade was tested manually using the 1.31 release
(Alpha), with the following steps:

1. Start the cluster with the `JobManagedBy` enabled for api server and control-plane.
1. Start the cluster with the `JobManagedBy` enabled for kube-apiserver and kube-controller-manager.

Then, create two-long running Jobs:
- `job-managed` with custom value of the "managedBy" field
Expand All @@ -788,13 +794,13 @@ Then, verify that:
- the `job-managed` does not get status updates from built-in controller. Update the status manually and observe it is not reset by the built-in controller.
- the `job-regular` starts making progress (creates pods and updates the status accordingly by the built-in controller)

2. Simulate downgrade by disabling the feature for api server and control-plane.
2. Simulate downgrade by disabling the feature for kube-apiserver and kube-controller-manager.

Then, verify that:
- the `job-managed` starts to make progress, the status is reset, and updated to some new values
- the `job-regular` continues making progress

3. Simulate upgrade by re-enabling the feature for api server and control-plane.
3. Simulate upgrade by re-enabling the feature for kube-apiserver and kube-controller-manager.

Then, verify that:
- the `job-managed` stops getting status updates from the built-in controller. Update the status manually and observe it is not reset by the built-in controller.
Expand Down Expand Up @@ -1080,6 +1086,10 @@ N/A.
- 2024-03-08 - Merged [Follow up fix to the job status update test](https://github.com/kubernetes/kubernetes/pull/123815)
- 2024-03-11 - Merged [Adjust the Job field API comments and validation to the current state](https://github.com/kubernetes/kubernetes/pull/123792)
- 2024-05-16 - Merged [Fix the comment for the Job managedBy field](https://github.com/kubernetes/kubernetes/pull/124793)
- 2024-06-11 - Merged [Count terminating pods when deleting active pods for failed jobs](https://github.com/kubernetes/kubernetes/pull/1251753)
- 2024-06-21 - Merged [Update the count of ready pods when deleting pods](https://github.com/kubernetes/kubernetes/pull/125546)
- 2024-07-12 - Merged [Delay setting terminal Job conditions until all pods are terminal](https://github.com/kubernetes/kubernetes/pull/125510)
- 2024-07-30 - Merged [Update the docs for JobManagedBy and JobPodReplacementPolicy related to pod termination](https://github.com/kubernetes/website/pull/46808)

<!--
Major milestones in the lifecycle of a KEP should be tracked in this section.
Expand All @@ -1100,6 +1110,26 @@ Why should this KEP _not_ be implemented?

## Alternatives

### Skip reconciliation in the event handler

We discussed to skip the reconciliation only within the
[`enqueueSyncJobInternal`](https://github.com/kubernetes/kubernetes/blob/15d08bf7c8813b0533dc147a03d9f42aae735ecd/pkg/controller/job/job_controller.go#L575).

However, it was noted that it would cause race conditions when the Job with the
same name and namespace is re-created, but with the `managedBy` field. The
race condition was reproduced by the
[TestManagedBy_RecreatedJob](https://github.com/kubernetes/kubernetes/blob/15d08bf7c8813b0533dc147a03d9f42aae735ecd/test/integration/job/job_test.go#L2229)
integration test which demonstrated the issue with such an implementation.

Still, it is a potential improvement to skip the reconciliation inside
`syncJob` and skip queuing within the `enqueueSyncJobInternal` function for
optimal performance (by saving memory and off-loading the reconciliation queue).

**Reasons for discarding/deferring**

Potentially a premature optimization which would complicate the code. We will
prefer to base the introduction of the optimization on users' feedback.

### Reserved controller name value

We could also use just `job-controller` for the reserved value of the field
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ see-also:
- "https://github.com/kubernetes/enhancements/pull/4073" # closed PR

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha
stage: beta

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
Expand All @@ -28,6 +28,7 @@ latest-milestone: "v1.31"
# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.30"
beta: "v1.32"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
Expand Down

0 comments on commit e6475c7

Please sign in to comment.