@zb3 zb3 commented Aug 20, 2025

Hopefully this PR finally adds try_table exceptions support to the asyncify pass :)

Unlike #5475 it doesn't add support for legacy exceptions, but there's no restriction on unwinding from catch blocks since in the new proposal they're ordinary blocks.

I made this thinking it'd be a simple patch but.. well it wasn't.. while I was able to finish it, it didn't really speed up qemu as much as I believed, so I most likely won't be able to polish it further..

There are 3 parts of this proposed PR:

Flatten pass

As mentioned in #6814 (comment), the flatten pass doesn't support try_table either, since the guarantee was that all block return types would be removed. Since that is impossible to achieve with try_table, this PR introduces a new opt-in relaxed flat IR mode which permits blocks with return values / breaks with values where they're necessary. For it to be useful for asyncify it also needs to save return values to locals (so we can "if" it out..)

Basic support for exceptions with tags

The next step is to add support for this relaxed flat IR to asyncify - we handle the new "local set with a block" expression, where we need to ensure that we can also reach the catch block without actually throwing anything - this is achieved by adding an unconditional local.get instruction to be used when rewinding (the value will be discarded anyway).

Supporting catch blocks with exnref

As mentioned in #3739, reference types can't be stored in memory, so they need to be stored in tables. However, the restriction from doedrop@449dd40 that we could only support one pause at a time was not acceptable for qemu which uses fibers extensively.
Therefore this PR introduces a hacky solution - we store refs in tables, but store their indices in memory. Additionally we use a dummy ref table as a "bitmap" so we can reuse table indices.
(normally I'd do this in a separate memory and not via a dummy table with null/nonnull references, but of course Safari doesn't support multiple memories, so..)
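
A rough sketch of the slot bookkeeping described above, modeled in plain C++ rather than taken from the pass itself. Everything here is an assumption made for illustration: `RefSlots`, `save` and `restore` are made-up names, a `std::shared_ptr` stands in for an opaque exnref, and two vectors stand in for the exnref table and the dummy "bitmap" table. The point it tries to show is that only the returned index needs to reach linear memory, and that a freed index can be handed out again.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an opaque reference: it can sit in a table,
// but it cannot be written to linear memory as bytes.
using Ref = std::shared_ptr<std::string>;

class RefSlots {
public:
  // Park a reference in the first free slot and return its index.
  // The index is a plain integer, so it can be saved to memory.
  uint32_t save(Ref ref) {
    for (uint32_t i = 0; i < inUse.size(); ++i) {
      if (!inUse[i]) {
        inUse[i] = true;
        refs[i] = std::move(ref);
        return i;
      }
    }
    // No free slot: grow both tables in lockstep.
    inUse.push_back(true);
    refs.push_back(std::move(ref));
    return static_cast<uint32_t>(refs.size() - 1);
  }

  // Take the reference back out and mark the slot as reusable.
  Ref restore(uint32_t index) {
    Ref ref = std::move(refs.at(index));
    refs[index] = nullptr;
    inUse[index] = false;
    return ref;
  }

  std::size_t capacity() const { return refs.size(); }

private:
  std::vector<Ref> refs;   // models the exnref table
  std::vector<bool> inUse; // models the dummy table: true = non-null entry
};
```

In this model the in-use marker is kept separate from the reference itself because a saved reference may be null, so a null entry in `refs` alone would not tell a free slot from an occupied one. Since each paused stack only remembers its own indices, several stacks can be saved at the same time.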

Unfortunately this doesn't solve #3739 because it only works with exnref.. at first I thought that "any"ref really meant "any" reference, but then I realized there are disjoint type hierarchies. So to solve that issue we'd need a separate type for each such hierarchy.. in this PR there's only a table for exnrefs.

zb3 added 4 commits August 20, 2025 02:27
This introduces a "relaxed" mode to the flatten pass, which allows it to process try_table expressions. In this mode we preserve block return values if those blocks are used as catch destinations, and we also preserve breaks with values if they target these blocks. To make this useful for asyncify, block return values are still saved into locals.
This uses the new "relaxed" flat IR which supports try_table. To make it work with asyncify we add support for the new flat IR "local.set with a block" expression, where we need to add a dummy local.get at the end to make the catch block reachable when rewinding.
Asyncify saves and restores locals from memory, but since reference types can't be stored in memory, they need to be stored in tables. What gets saved to memory are their indices, allowing us to continue supporting multiple stacks. The trickier part is how to keep track of free slots in these tables, and while that could be done using extra memory, in order to not depend on multiple memories, a bitmap funcref table is utilized, where a non-null value signals the slot being in use.

Notably this doesn't add support for either externref or anyref; they would need separate tables.
@kripken
Member

kripken commented Aug 20, 2025

Interesting work here! We have been considering some changes to Flatten, including relaxing it, so this may help inform that.

Btw, do you still need Asyncify, given JSPI is in the process of shipping?

@zb3
Author

zb3 commented Aug 20, 2025

@kripken, this is for QEMU coroutines which use fibers which in turn need Asyncify.. I'm not sure how JSPI would help here (hmm, could some module reentry hacks help?), I guess we'd need full stack switching support..

@kripken
Member

kripken commented Aug 20, 2025

To use JSPI you would need to call out to JS, then back in, but JS can then pause/resume you just like Asyncify. Is this QEMU port for an environment without JS perhaps?

@zb3
Author

zb3 commented Aug 20, 2025

I'm experimenting with QEMU running in the browser, trying to optimize this https://github.com/ktock/qemu-wasm
(there's room for improvement in the JIT generation, but that's beyond my capabilities for now)

I'm not sure about the implications of using JS for coroutines, but does it mean JSPI gives us support for multiple stacks for a given module instance? If so, I could also look into that (rewriting fiber to use JSPI).

@kripken
Member

kripken commented Aug 21, 2025

Yes, you can have multiple stacks using JSPI. This is a nice overview:

https://v8.dev/blog/jspi

See also the Emscripten docs which talk about using JSPI as an Asyncify alternative,

https://emscripten.org/docs/porting/asyncify.html

An easy way to see JSPI code in action is to compile a small suspending program with emcc with Asyncify vs JSPI.

@zb3
Author

zb3 commented Aug 21, 2025

That's some good news :)

In the page you wrote:

> If that handler calls into compiled code, then it can be confusing, since it starts to look like coroutines or multithreading, with multiple executions interleaved.
> It is not safe to start an async operation while another is already running. The first must complete before the second begins.

but I assume that was about C-level safety, right?
QEMU uses fibers for coroutines hence I need to preserve that functionality.

Could you please give me some tips for implementing fibers using JSPI? I'm asking because for each second you'd spend answering I'd need to spend hours figuring it out.. I'm planning on looking into that in the near future.

@kripken
Member

kripken commented Aug 21, 2025

> but I assume that was about C-level safety, right?

Yes, I think that's right.

But looping in @brendandahl who would know best the exact functionality of JSPI, also for the Fibers question. (Emscripten has a Fibers API with Asyncify, but I believe it doesn't run with JSPI atm, and I'm not sure if that is just because it wasn't updated, or there is something more fundamental.)

@brendandahl
Collaborator

I don't think there's anything preventing fibers from being implemented using JSPI. IIRC, someone is already doing this in a different language. I don't think fibers will be that efficient using JSPI, since each re-entry into wasm is going to allocate another stack. I believe there was some work in V8 to minimize the cost of this, though.

@zb3
Author

zb3 commented Aug 24, 2025

Unfortunately it appears I've just hit one major limitation with JSPI coroutines - it's not possible to continue a coroutine in a different thread (worker), unless this can somehow be worked around..
If these reference tables are per module instance, it also means my PR here has this limitation too when reference types are present on the stack, although QEMU worked since these weren't used for setjmp/longjmp.

The stack-switching proposal won't have that limitation, right?

@kripken
Member

kripken commented Aug 26, 2025

I would be quite surprised if wasm stack switching plans to allow suspending in one Web Worker and resuming in another. But @brendandahl @tlively can correct me if I am wrong.

If wasm had some form of lightweight thread, as has been discussed, that all runs in the same process, I can imagine it would be possible there - in theory.
