vdev_disk_close: take disk write lock before destroying it #17719
Conversation
I may be wrong, but shouldn't the config lock prevent vdev close while I/Os are running? Considering that vdev_disk_io_start() reads vdev_tsd before taking the vd_lock, I am not sure this patch can completely close whatever the race is there, unless I misunderstand it.
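For context, the ordering being referred to looks roughly like the sketch below. It is a simplified illustration based on the names used in this thread (vdev_tsd, vd_lock), not a verbatim copy of vdev_disk.c; the function name is hypothetical.

```c
/*
 * Simplified sketch of the issue-path ordering under discussion; not the
 * literal vdev_disk.c source. The point: vdev_tsd is dereferenced before
 * vd_lock is taken, so the lock alone cannot protect that first read.
 */
static void
vdev_disk_io_start_sketch(zio_t *zio)
{
	vdev_t *v = zio->io_vd;
	vdev_disk_t *vd = v->vdev_tsd;		/* (1) read the private data */

	if (vd == NULL) {
		zio->io_error = SET_ERROR(ENXIO);
		zio_interrupt(zio);
		return;
	}

	rw_enter(&vd->vd_lock, RW_READER);	/* (2) only now take the lock */
	/* build and submit BIOs; submission is asynchronous */
	rw_exit(&vd->vd_lock);
}
```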
Sorry, I didn't explain at all, really; I shouldn't post PRs when sleepy :) I believe it looks roughly like this: the issuing task takes the read lock and submits the IO, then sleeps before dropping the lock; the IO completes and the follow-up work closes the vdev, destroying the lock while it is still held (see the sketch below).

With #17718 in place, the crash changed to the assert firing instead. Now, I am not sure of exactly the circumstances that would cause the ISSUE thread to sleep for that long; in my case it happened reliably. I don't know if there's any particular reason for the (relatively) long sleep; possibly to do with the small number of flush taskq threads, or just the workload difference.

But, looking closer, it seems like this isn't just my code: all IO types theoretically have this problem. We even already sort-of knew about the shape of it; there's a comment about it at the bottom of the issue path.

So anyway, I thought about reloading the pointer instead, but that means having to ensure that the lock will survive until we're done, and the simplest way to do that seemed to be to just take the write lock in vdev_disk_close().
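An illustrative interleaving of the race being described (a reconstruction from this discussion, not the original trace from the report):

```c
/*
 * Illustrative interleaving of the race described above (reconstructed
 * from this discussion, not the original trace):
 *
 *   issuing ("ISSUE") task                completion / follow-up work
 *   ----------------------                ----------------------------
 *   vd = v->vdev_tsd
 *   rw_enter(&vd->vd_lock, RW_READER)
 *   submit BIOs (async), then sleeps
 *                                         BIOs complete, zio finishes
 *                                         follow-up work closes the vdev:
 *                                           vdev_disk_close()
 *                                             rw_destroy(&vd->vd_lock)
 *                                             v->vdev_tsd = NULL; free vd
 *   wakes, rw_exit(&vd->vd_lock)  <----- lock (and vd) already destroyed
 */
```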
OK. I guess it makes sense.

I haven't looked deep into the code, but as an alternative thought: do we really need to hold the lock while requests are being issued, or could we rely on some other means to be sure the device is still there? FreeBSD's implementation comes to mind.
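Purely for illustration, the kind of "other means" being suggested could look like a hold count that the close path drains, rather than the issuer holding vd_lock across submission. Nothing below is OpenZFS or FreeBSD code; all names are hypothetical.

```c
/* Hypothetical sketch only; not OpenZFS or FreeBSD code. */
typedef struct disk_state {
	kmutex_t	ds_lock;
	kcondvar_t	ds_cv;
	int		ds_holds;	/* in-flight issuers */
	boolean_t	ds_closing;
} disk_state_t;

/* Issuer: pin the disk state for the duration of submission. */
static boolean_t
disk_state_hold(disk_state_t *ds)
{
	mutex_enter(&ds->ds_lock);
	if (ds->ds_closing) {
		mutex_exit(&ds->ds_lock);
		return (B_FALSE);	/* device is going away */
	}
	ds->ds_holds++;
	mutex_exit(&ds->ds_lock);
	return (B_TRUE);
}

/* Issuer: drop the pin once submission is done. */
static void
disk_state_rele(disk_state_t *ds)
{
	mutex_enter(&ds->ds_lock);
	if (--ds->ds_holds == 0)
		cv_broadcast(&ds->ds_cv);
	mutex_exit(&ds->ds_lock);
}

/* Close path: refuse new holds, then wait for outstanding ones to drain. */
static void
disk_state_drain(disk_state_t *ds)
{
	mutex_enter(&ds->ds_lock);
	ds->ds_closing = B_TRUE;
	while (ds->ds_holds > 0)
		cv_wait(&ds->ds_cv, &ds->ds_lock);
	mutex_exit(&ds->ds_lock);
}
```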
Many IO operations are submitted to the kernel asynchronously, so the zio can complete and follow-up actions can run before the submission call returns. If one of those follow-up actions closes the disk (e.g. during pool create/import), the initiator may be left holding a lock on the disk at the moment it is destroyed.

Instead, take the write lock before finishing up and decoupling the disk state from the vdev proper. The close will then wait until all IO is submitted and the locks are released.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#17719
[Sponsors: Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
I had a surprising crash in a client project and traced it to an A-B-A problem in vdev_disk, where the vdev could be closed and the private data, including an rwlock, destroyed while the lock was still held.
IO is issued asynchronously. If the kernel deschedules the initiating task, and the IO is then scheduled, completes, and goes on to call vdev_disk_close() and close the vdev before the initiating task runs again, then the vdev_disk_t in vdev_tsd will have been destroyed, rendering the caller's vd invalid.

I have not been able to trigger this in stock OpenZFS, but I can easily in my client project. Since I hope to upstream that work soon, fixing this becomes important.
Description
Take the write lock in vdev_disk_close(). If any read locks are outstanding, the task will sleep, and the issuing tasks will get an opportunity to release their read locks. Once they have, vdev_disk_close() will gain the write lock and be able to null vdev_tsd.

I don't think it's possible for this to deadlock, because the read lock should always be taken in an "issue" task, while completions redispatch the zio to an "interrupt" task.
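As a rough sketch of the shape of the change described here (simplified and hypothetical, not the exact patch):

```c
/*
 * Simplified sketch of the fix described above; not the exact patch.
 * vdev_disk_close() takes vd_lock as writer before detaching and
 * destroying the per-disk state, so it sleeps until any issuing task
 * still holding the read lock has dropped it.
 */
static void
vdev_disk_close_sketch(vdev_t *v)
{
	vdev_disk_t *vd = v->vdev_tsd;

	if (vd == NULL)
		return;

	/* Blocks while any issuer still holds the read lock. */
	rw_enter(&vd->vd_lock, RW_WRITER);

	/* ... release the underlying block device ... */

	v->vdev_tsd = NULL;	/* decouple the disk state from the vdev */

	rw_exit(&vd->vd_lock);
	rw_destroy(&vd->vd_lock);
	kmem_free(vd, sizeof (vdev_disk_t));
}
```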
How Has This Been Tested?
ZTS run completed. In my own code, this fixes my crash.