Add non-legacy exceptions support (try_table) to the asyncify pass #7846
This introduces a "relaxed" mode to the flatten pass, which allows it to process try_table expressions. In this mode we preserve block return values when those blocks are used as catch destinations, and we also preserve breaks with values that target these blocks. To keep this useful for asyncify, block return values are still saved into locals.
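As a rough illustration of the difference between the strict and relaxed modes, here is a toy flattener in Python. It is only a sketch under heavy assumptions: the tuple-based IR, the `flatten` and `flatten_function` helpers, and the `catch_targets` parameter are all hypothetical and are not Binaryen's actual data structures or API. It also does not model try_table itself, only the shape of the output for a block that is (or is not) a catch destination.

```python
from itertools import count


def flatten(expr, catch_targets, out, fresh):
    """Flatten `expr`, appending statements to `out`; return a flat operand.

    Toy IR (hypothetical, not Binaryen's):
      ("const", n) | ("local.get", name) | ("add", a, b) | ("block", name, body)
    """
    kind = expr[0]
    if kind in ("const", "local.get"):
        return expr  # already a flat operand
    if kind == "add":
        lhs = flatten(expr[1], catch_targets, out, fresh)
        rhs = flatten(expr[2], catch_targets, out, fresh)
        tmp = f"t{next(fresh)}"
        out.append(("local.set", tmp, ("add", lhs, rhs)))
        return ("local.get", tmp)
    if kind == "block":
        _, name, body = expr
        inner = []
        value = flatten(body, catch_targets, inner, fresh)
        tmp = f"t{next(fresh)}"
        if name in catch_targets:
            # Relaxed mode: the block keeps its result, because a catch
            # clause has to deliver a value to it. The value is still
            # saved into a local: "local.set with a block".
            out.append(("local.set", tmp, ("block", name, inner, value)))
        else:
            # Strict mode: the value moves into a local inside the block,
            # and the block itself no longer returns anything.
            inner.append(("local.set", tmp, value))
            out.append(("block", name, inner, None))
        return ("local.get", tmp)
    raise ValueError(f"unknown expression kind: {kind}")


def flatten_function(expr, catch_targets=frozenset()):
    """Return (statements, operand holding the final value)."""
    out = []
    result = flatten(expr, catch_targets, out, count())
    return out, result
```

Calling `flatten_function(expr)` gives the strict form, while `flatten_function(expr, {"b"})` keeps the value on block `b` and wraps the block in a `local.set`.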
This uses the new "relaxed" flat IR which supports try_table. To make it work with asyncify we add support for the new flat IR "local.set with a block" expression, where we need to add a dummy local.get at the end to make the catch block reachable when rewinding.
Asyncify saves and restores locals from memory, but since reference types can't be stored in memory, they need to be stored in tables. What gets saved to memory are their indices, allowing us to continue supporting multiple stacks. The trickier part is how to keep track of free slots in these tables. That could be done using extra memory, but in order to not depend on multiple memories, a funcref table is used as a bitmap, where a non-null value signals that the slot is in use. Notably, this adds support for neither externref nor anyref; they would need separate tables.
Interesting work here! We have been considering some changes to Flatten, including relaxing it, so this may help inform that. Btw, do you still need Asyncify, given JSPI is in the process of shipping?
@kripken, this is for QEMU coroutines which use fibers which in turn need Asyncify.. I'm not sure how JSPI would help here (hmm, could some module reentry hacks help?), I guess we'd need full stack switching support..
To use JSPI you would need to call out to JS, then back in, but JS can then pause/resume you just like Asyncify. Is this QEMU port for an environment without JS perhaps?
I'm experimenting with QEMU running in the browser, trying to optimize this: https://github.com/ktock/qemu-wasm
I'm not sure about the implications of using JS for coroutines, but does it mean JSPI gives us support for multiple stacks for a given module instance? If so, I could also look into that (rewriting fiber to use JSPI).
Yes, you can have multiple stacks using JSPI. This is a nice overview: See also the Emscripten docs which talk about using JSPI as an Asyncify alternative, https://emscripten.org/docs/porting/asyncify.html An easy way to see JSPI code in action is to compile a small suspending program with |
That's some good news :) In the page you wrote:
but I assume that was about C-level safety, right? Could you please give me some tips for implementing fibers using JSPI? I'm asking because for each second you'd spend answering I'd need to spend hours figuring it out.. I'm planning on looking into that in the near future.
Yes, I think that's right. But looping in @brendandahl who would know best the exact functionality of JSPI, also for the Fibers question. (Emscripten has a Fibers API with Asyncify, but I believe it doesn't run with JSPI atm, and I'm not sure if that is just because it wasn't updated, or there is something more fundamental.)
I don't think there's anything preventing fibers from being implemented using JSPI. IIRC, someone is already doing this in a different language. I don't think fibers will be that efficient using JSPI, since each re-entry into wasm is going to allocate another stack. I believe there was some work in V8 to minimize the cost of this, though.
Unfortunately it appears I've just hit one major limitation with JSPI coroutines - it's not possible to continue a coroutine in a different thread (worker), unless this can somehow be worked around.. Stack-switching proposal won't have that limitation, right?
I would be quite surprised if wasm stack switching plans to allow suspending in one Web Worker and resuming in another. But @brendandahl @tlively can correct me if I am wrong. If wasm had some form of lightweight thread, as has been discussed, that all runs in the same process, I can imagine it would be possible there - in theory.
Hopefully this PR finally adds try_table exceptions support to the asyncify pass :)
Unlike #5475 it doesn't add support for legacy exceptions, but there's no restriction on unwinding from catch blocks since in the new proposal they're ordinary blocks.
I made this thinking it'd be a simple patch but.. well it wasn't.. while I was able to finish it, it didn't really speed up qemu as much as I believed, so I most likely won't be able to polish it further..
There are 3 parts of this proposed PR:
Flatten pass
As mentioned in #6814 (comment) the flatten pass doesn't support try_table either, since the guarantee was that all block return types would be removed. Since that is impossible to achieve with try_table, this PR introduces a new opt-in relaxed flat IR mode which permits blocks with return values / breaks with values where they're necessary. For it to be useful for asyncify it also needs to save return values to locals (so we can "if" it out..)
Basic support for exceptions with tags
The next step is to add support for this relaxed flat IR to asyncify - we handle the new "local set with a block" expression, where we need to ensure that we can also reach the catch block without actually throwing anything - this is achieved by adding an unconditional local.get instruction to be used when rewinding (the value will be discarded anyway).
Supporting catch blocks with exnref
As mentioned in #3739, reference types can't be stored in memory, so they need to be stored in tables. However, the restriction from doedrop@449dd40 that we could only support one pause at a time was not acceptable for qemu which uses fibers extensively.
Therefore this PR introduces a hacky solution - we store refs in tables, but store their indices in memory. Additionally we use a dummy ref table as a "bitmap" so we can reuse table indices.
(normally I'd do this in a separate memory and not via dummy table with null/nonnull references, but of course safari doesn't support multiple memories, so..)
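The slot bookkeeping described above can be modelled in a few lines of Python. This is a minimal sketch, not the code the pass emits: the `RefSlots` class, the `save_ref` / `restore_ref` helpers, and the use of a `bytearray` per stack are all assumptions made for illustration. Two Python lists stand in for the exnref table and the funcref "bitmap" table.

```python
import struct

_IN_USE = object()  # stands in for any non-null funcref in the bitmap table


class RefSlots:
    """Toy model: one table holds the refs, a second marks occupied slots."""

    def __init__(self):
        self.refs = []  # stands in for the exnref table
        self.used = []  # stands in for the funcref table; None means free

    def store(self, ref):
        """Put `ref` into the first free slot and return its index."""
        for index, mark in enumerate(self.used):
            if mark is None:
                break
        else:
            # No free slot: grow both tables by one entry.
            index = len(self.used)
            self.refs.append(None)
            self.used.append(None)
        self.refs[index] = ref
        self.used[index] = _IN_USE
        return index

    def load(self, index):
        """Take the ref back out and free its slot for reuse."""
        if self.used[index] is None:
            raise ValueError(f"slot {index} is not in use")
        ref = self.refs[index]
        self.refs[index] = None
        self.used[index] = None
        return ref


def save_ref(slots, memory, offset, ref):
    """Unwind: only the table index is written to the stack's memory."""
    struct.pack_into("<I", memory, offset, slots.store(ref))


def restore_ref(slots, memory, offset):
    """Rewind: read the index back from memory and fetch the ref."""
    (index,) = struct.unpack_from("<I", memory, offset)
    return slots.load(index)
```

Because each paused stack only records indices in its own memory region, several stacks can be paused at once, and in this sketch a null reference can be saved without its slot looking free, since occupancy is tracked in the separate table.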
Unfortunately this doesn't solve #3739 because it only works with exnref.. at first I thought that "any"ref really meant "any" reference, but then I realized there are disjoint type hierarchies. So to solve that issue we'd need a separate table for each such hierarchy.. in this PR there's only a table for exnrefs.