CalicoNodeStatus can't be updated when etcd performance is (temporarily) degraded #8715
Comments
This error tends to mean that the object in the API was modified by another client between the time Calico queried the resource and the time Calico attempted to update it. Looking at that code, it does appear to refresh its internal state when it sees CalicoNodeStatus changes, so I would expect this case to be handled. Do you know if there is another entity that is modifying these resources?
Are you doing something in particular to trigger this?
Hi @caseydavenport
It seems there isn't any controller that might change
Decrease etcd storage IOPS, if possible
Hi
For Ubuntu 22.04 it's something similar to that one
Degraded etcd performance is going to impact every Kubernetes API. The interesting piece is that it doesn't recover after etcd is functioning again, which suggests we might not be handling an error case correctly.
Guys, any updates on this issue?
This issue is stale because it is kind/enhancement or kind/bug and has been open for 180 days with no activity.
This issue was closed because it has been inactive for 30 days since being marked as stale.
Managed to reproduce the issue in a simple synthetic way (but a bit dangerous):
# change node-name to one of your nodes' names (in two places)
etcdctl get /registry/crd.projectcalico.org/caliconodestatuses/node-name | grep -v /registry/crd.projectcalico.org/caliconodestatuses/node-name > status.json
# edit status.json a little, e.g. add fake annotation {"test": "abc"}
vi status.json
etcdctl put /registry/crd.projectcalico.org/caliconodestatuses/node-name "$(cat status.json)"
After that, the calico-node pod shows the following errors
So the assumption that "we are behind syncer" is not always true: CalicoNodeStatus (and other k8s resources) may be updated in a way that the syncer never sees, e.g. during apiserver/etcd performance issues.
@caseydavenport is it possible to re-open this issue, please? Maybe we will be able to contribute a fix ourselves.
Expected Behavior
The CalicoNodeStatus resource is updated according to the updatePeriodSeconds option.
Current Behavior
calico-node stops updating the CalicoNodeStatus resource due to the error:
As far as I understand, the issue is triggered by a temporary etcd performance degradation, and it does not recover after etcd performance recovers.
Possible Solution
The reconciliation loop should handle the "Operation cannot be fulfilled..." error.
Steps to Reproduce
I'm not sure it is so easy
CalicoNodeStatus resource
Context
We are using CalicoNodeStatus as a source for monitoring external BGP sessions.
Your Environment