-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Avoid stack overflow in IndicesClusterStateService applyClusterState #132536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hi @albertzaharovits, I've created a changelog YAML for you. |
Honestly, I think I prefer that every chained listener be executed on a generic thread, for code simplicity's sake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather we didn't extend the chain in the (overwhelmingly common) case where the cluster state update doesn't close any more shards.
Also can you cover this in a test?
@@ -274,8 +275,26 @@ public synchronized void applyClusterState(final ClusterChangedEvent event) { | |||
lastClusterStateShardsClosedListener = new SubscribableListener<>(); | |||
currentClusterStateShardsClosedListeners = new RefCountingListener(lastClusterStateShardsClosedListener); | |||
try { | |||
previousShardsClosedListener.addListener(currentClusterStateShardsClosedListeners.acquire()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm are you sure we should move all this listener stuff below doApplyClusterState()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of any impact to execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I've put it back at the original place.
Pushed 3a00599 |
f4c5977
to
b6d6742
Compare
@DaveCTurner can you take another look please? I've changed the code to avoid linking listeners when the applied cluster state doesn't close any shards. |
Every cluster state applied in the
IndicesClusterStateService
has the potential to chain a newRefCountingListener
to a chain of such listeners. If the chain is too long, the unlucky thread that decreases the ref count to0
for the head of the listeners chain, ends up calling each listener in turn, and, assuming all ref counts are hence decreased to0
, traversing the whole chain on its thread stack, possibly resulting in aStackoverflow
exception.This fix chains max 8
RefCountingListener
, the 11th one is forked on a generic thread when it gets to execution.