-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Refactor cloud provider creation options #8583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
this is an alternate solution instead of #8531 |
/retest |
not sure why the tests aren't triggering. |
1afa484
to
a5f07fe
Compare
rebased on master |
i'm not sure how i've broken this, but these error lines are not making sense to me:
|
@towca does this error looks familiar to you at all? |
ok, think my last commit on this chain fixed the circular import. |
We have a conversation going on here that will potentially improve the gpu node label concern that this PR encountered: /lgtm for @towca to give his thumbs up |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: elmiko, jackfrancis The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
i'm fine to change the label in the test for this PR, i was just trying to stop the circular import error. |
This change helps to prevent circular dependencies between the core and builder packages as we start to pass the AutoscalerOptions to the cloud provider builder functions.
this changes the options input to the cloud provider builder function so that the full autoscaler options are passed. This is being proposed so that cloud providers will have new options for injecting behavior into the core parts of the autoscaler.
this change adds a method to the AutoscalerOptions struct to allow registering a scale down node processor.
This change adds a custom scale down node processor for cluster api to reject nodes that are undergoing upgrade.
this change removes the import from the gce module in favor of using the string value directly.
b82aa40
to
aea176f
Compare
New changes are detected. LGTM label has been removed. |
rebased |
@jackfrancis @towca , how is this patch looking? i'd like to get it merged soon if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the non-cluster-api parts of the PR
} | ||
|
||
func (a *AutoscalerOptions) RegisterScaleDownNodeProcessor(np nodes.ScaleDownNodeProcessor) { | ||
a.Processors.ScaleDownNodeProcessor.(*scaledowncandidates.CombinedScaleDownCandidatesProcessor).Register(np) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method only works for a particular configuration of AutoscalerOptions and panics otherwise. IMO this logic belongs in cloud-provider-specific code where you can assume which processors are used. I'd convert this into a CombinedScaleDownCandidatesProcessor
-specific util function and call it directly from your NewCloudProvider()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i opted for the method here so that we could keep the details about the Processors
structure internal to the AutoscalerOptions
, but if you are ok with having the individual providers do the injection directly through a passed function, i can change it.
edit: i misread your comment, i'll try it another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant adding an util like this to where the combined processor is implemented:
func AddScaleDownProcessorAssumingCombinedImplementation(combinedProcessor, processorToAdd nodes.ScaleDownNodeProcessor) error {
combinedProcessorConcrete, ok := combinedProcessor.(*scaledowncandidates.CombinedScaleDownCandidatesProcessor)
if !ok {
return fmt.Errorf("wrong concrete ScaleDownNodeProcessor type: want *scaledowncandidates.CombinedScaleDownCandidatesProcessor, got %T", combinedProcessor)
}
combinedProcessorConcrete.Register(processorToAdd)
}
And calling it from your NewCloudProvider
like so:
...
err := addScaleDownProcessorAssumingCombinedImplementation(opts.Processors.ScaleDownNodeProcessor, clusterApiSpecificProcessor)
...
The whole logic could also just be inlined into your NewCloudProvider()
if we don't expect that other cloud provider integrations would want to use it in the future.
The difference between this util and your proposed method is that I see AutoscalerOptions
and its associated code as "core" CA, and core CA shouldn't assume anything about concrete types behind the processors. Out-of-tree cloud-provider integrations can for example use a completely different implementation for the top-level ScaleDownNodeProcessor
. Then such integrations have to avoid using this method even though from the outside it looks like it provides a generic way of adding another ScaleDownNodeProcessor
to AutoscalerOptions
. The util on the other hand is specific to a concrete implementation of ScaleDownNodeProcessor
, so it should be clear not to use it on other implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me, i will update to use a util function style.
The whole logic could also just be inlined into your NewCloudProvider() if we don't expect that other cloud provider integrations would want to use it in the future.
i would prefer a function that could be utilized by other providers. i will make a function in scale_down_candidates_processor.go and see how that looks.
} | ||
if opts.CloudProvider == nil { | ||
opts.CloudProvider = cloudBuilder.NewCloudProvider(opts.AutoscalingOptions, informerFactory) | ||
opts.CloudProvider = cloudBuilder.NewCloudProvider(opts, informerFactory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to explicitly put this at the end of the defaulting in this function, otherwise we can have the changes to AutoscalerOptions
made in NewCloudProvider()
be overwritten by further logic here (e.g. if NewCloudProvider()
wanted to configure opts.Backoff
). It seems that ExpanderStrategy
has a dependency on CloudProvider
, so that either needs to be removed or explained wherever we describe how NewCloudProvider()
should be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to be clear, you want to see something like?
NewCloudProvider(opts config.AutoscalingOptions, informerFactory informers.SharedInformerFactory, autoscalerOpts core.AutoscalerOptions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or did you mean move the function call further down inside of initializeDefaultOptions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter, I meant to put the function call as close to the bottom of initializeDefaultOptions
as possible (it can't be at the very bottom because the ExpanderStrategy
logic needs CloudProvider
already set). More generally, IMO the NewCloudProvider()
call should be as late in CA init logic as possible (but initializeDefaultOptions
already gets called at the end to fill out the fields that weren't set by the main.go
logic before).
My reasoning was slightly wrong though - the fields wouldn't be overwritten if NewCloudProvider
set them because of the nil checks in initializeDefaultOptions
. But IMO it still makes sense to put it as late as possible, so that NewCloudProvider
can assume that ~all fields on the AutoscalerOptions
it gets are already set and can be read/modified.
This isn't an issue for your particular problem because opts.Processors
are set before the NewCloudProvider
call. I'm just trying to make sure the mechanism is as flexible as possible. For example if a NewCloudProvider
implementation in the future wants to interact with the opts.EstimatorBuilder
field and we keep the current order, the field would be unset at the time of NewCloudProvider()
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, makes sense to me. i will update this.
|
||
var snapshotStore clustersnapshot.ClusterSnapshotStore = store.NewDeltaSnapshotStore(autoscalingOptions.ClusterSnapshotParallelism) | ||
opts := core.AutoscalerOptions{ | ||
opts := coreoptions.AutoscalerOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing - it seems that there is already existing cloud-provider-specific init logic in this function. Now we're adding a better mechanism for the same purpose, which might be confusing down the road. Could you add a TODO (and ideally an associated issue on Github) near the provider-specific logic to migrate it to NewCloudProvider()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will do.
What type of PR is this?
/kind cleanup
/kind api-change
What this PR does / why we need it:
This patch series changes an argument to the
NewCloudProvider
function to use anAutoscalerOptions
struct instead ofAutoscalingOptions
. This change allows cloud providers to have more control over the core functionality of the cluster autoscaler.In specific, this patch series also adds a method named
RegisterScaleDownNodeProcessor
to theAutoscalerOptions
so that cloud providers can inject a custom scale down processor.Lastly, this change adds a custom scale down processor to the clusterapi provider to help it avoid removing the wrong instance during scale down operations that occur during a cluster upgrade.
Which issue(s) this PR fixes:
Fixes #8494
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: