-
Notifications
You must be signed in to change notification settings - Fork 1.5k
KEP-5237: watch-based route controller reconciliation using informers #5289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
KEP-5237: watch-based route controller reconciliation using informers #5289
Conversation
lukasmetzner
commented
May 8, 2025
- One-line PR description: Introduce a feature gate to enable informer-based reconciliation in the routes' controller of cloud-controller-manager, reducing API calls and improving efficiency.
- Issue link: CCM: watch-based route controller reconciliation using informers #5237
- Other comments:
Welcome @lukasmetzner! |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: lukasmetzner The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @lukasmetzner. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cc @elmiko @JoelSpeed |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the core concepts make sense to me, i think we should clean up the language around the term "cloud provider". in some places we use it to mean the controllers (ie the ccm), and other places we use it to mean the infrastructure provider (eg aws, azure, gcp), and we also refer to the framework as well.
we also need to make some decisions about the open questions. i wonder if we should go over these questions at the next sig meeting?
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
|
||
#### Story 3 | ||
|
||
As a cluster operator I need to use the API rate limits from my Cloud Provider effectively. Sending frequent API requests even though nothing changed causes me to deplete the API rate limits faster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we aren't going to capitalize "Cloud Provider" in other places, we shouldn't capitalize here.
alternatively, if we want to distinguish the "cloud provider" (eg aws, gcp, etc), then i think we should a less overloaded term, like "infrastructure provider".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine settling for the term infrastructure provider here.
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
|
||
We currently use the `Node.Status.Addresses` and `PodCIDRs` fields to trigger updates in the route reconciliation mechanism. However, relying solely on these fields may be insufficient, potentially causing missed route reconciliations when updates are necessary. This depends on the specific cloud-provider implementations. Using these fields works for the CCM maintained by the authors, but we do not know the details of other providers. | ||
|
||
This is mitigated by a feature gate, which allows other cloud providers to test it and provide feedback on the fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which "cloud providers" are we talking about here: the cloud-controller-manager, the infrastructure provider, or the maintainers of other ccm projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both infrastructure providers and maintainers of other ccm projects are welcome to test this feature and provide feedback on the fields, which were choosen to trigger a reconcile.
|
||
Other controllers rely on “Owner” references to make sure that the resource is only deleted when the controller had the chance to run any cleanup. This is currently not implemented for any controller in [`k8s.io/cloud-provider`](http://k8s.io/cloud-provider). Because of this, Nodes may get deleted without the possibility to process the event in the route controller. | ||
|
||
This can cause issues with limits on the number of routes in from the cloud provider, as well as invalid routes being advertised as valid, causing possible networking reliability or confidentiality issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this "cloud provider" referring to the cloud-controller-manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is referring to the infrastructure provider. If an infrastructure provider has a limit on the max. amount of routes for a private network we can get rid of unused routes. This is at least the case for Hetzner.
Founds also a small typo here in from the cloud provider
, which I am going to clean up.
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Michael McCune <msm@opbstudios.com>
@elmiko If possible, I’d love to work through the open questions here in the PR so we can keep things moving, and then have a discussion in the next SIG meeting. Otherwise, we might end up losing a two of weeks of time. What do you think? Open Questions:
|
i'm fine to continue the discussions here.
i think 12h sounds fine to me.
i'll have to think about this a little more, i have a feeling that those are good to start with.
i like the idea of adjusting the default for the my only issue with using |