-
Notifications
You must be signed in to change notification settings - Fork 482
[ore] Infallible mz_ore::task::JoinHandle
#34180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a8d3e24 to
917d47b
Compare
mz_ore::task::JoinHandle
Tasks can fail in theory for two ways: - Panic! However, we disable panics in production. (And for tests, we percolate the panic upward.) - Task is explicitly aborted. This is statically ruled out by the wrapper. - Runtime is shutting down. This is not interesting - it means the process is shutting down - and there's no value to percolating errors in that case. Here, we just remain pending indefinitely until the runtime dies.
917d47b to
b8dd190
Compare
|
Thansk for triggering nightly, please ignore the orchestratord failure, it's already fixed on main! |
teskje
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplification at the call sites and the reasoning seems sound to me. The only thing that's not great is that in non-abort-on-panic contexts you now need to remember to call into_tokio_handle or risk hanging forever when a task panics. As you say, we should have these contexts only in tests and hanging in a test is better than panicking in production.
| let (handle, _idx, remaining) = future::select_all(handles).await; | ||
| handles = remaining; | ||
| let (id, global_id, res) = handle.expect("must join"); | ||
| let (id, global_id, res) = handle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to the changes you made, so feel free to ignore but: Calling this variable handle is misleading, since it is not a handle but the result of awaiting on one.
| AbortOnDropHandle(self) | ||
| } | ||
|
|
||
| /// Checks if the task associated with this [`JoinHandle`] has finished.a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// Checks if the task associated with this [`JoinHandle`] has finished.a | |
| /// Checks if the task associated with this [`JoinHandle`] has finished. |
(Also not your fault!)
| /// Checks if the task associated with this [`JoinHandle`] has finished.a | ||
| pub fn into_tokio_handle(self) -> TokioJoinHandle<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment got confused here!
| self.0.abort(); | ||
| let _ = self.await; | ||
| } | ||
| // Note: adding an `abort(&self)` method here is incorrect, please see `unpack_join_result`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unpack_join_result doesn't exist anymore I think?
https://github.com/MaterializeInc/database-issues/issues/9729 is an example of a recurring class of issue:
mz_orehad a utility for this:JoinHandle::wait_and_assert_finished. This code takes advantage of the fact that the only reason for cancellation is runtime shutdown, and just hang out and wait for itself to be shut down as well. We've been adding this method to join points whack-a-mole as new errors crop up.This patch just makes the
wait_and_assert_finishedbehaviour the overall default for ourJoinHandlewrapper. This should remove a class of bug and eliminates a bunch of sketch error handling across the codebase.Motivation
Tired of bugs like https://github.com/MaterializeInc/database-issues/issues/9729!
Tips for reviewer
One case where join errors are still meaningful: catching panics in async tests. In those cases we just unwrap the join handle to expose underlying Tokio handle, which seems fine.