Skip to content

Conversation

arjunr2
Copy link

@arjunr2 arjunr2 commented Jul 18, 2025

Brief

This PR is intended to support deterministic record and replay (RR) of Wasm components intrinsically in Wasmtime, and received an initial round of discussion in the Wasmtime bi-weekly meeting on 07/17

Motivation

RR is a very useful primitive for improving the debugging story of Wasm in Wasmtime. Bugs that are often encountered in modules during deployment can be deterministically reproduced. In particular, it provides the foundation for the following (to name a few):

  • Reverse-execution (a.k.a, time-travel) debugging
  • Offline static/dynamic analysis of prior module executions
  • Profiling of module/runtime components
  • Automatic extraction of differential unit-tests for system interfaces (e.g. WASI)
  • Interposition points for targeted fuzzing of system interfaces and/or modules

Scope

This initial PR provides the base primitives for recording and replay events. It supports RR at all import function boundaries and lowering rules for component types. The RR event infrastructure is intended to be easily extensible to new event types as new use-cases emerge.

Primary Goals

  • Enabling low overhead (memory, compute, and trace size) recording and high-performance replay,
  • Full determinism during replay that can run outside the embedder context.
  • An engine-agnostic trace recording format -- the goal is to purely capture guest-host boundary crossings (import calls and component model interactions) that can be reasonably interpreted by another component model compliant engine.

Non-Goals (Subject to Discussion)

  • A human readable trace format. This belong better in something like wit-bindgen, and/or as an independent tool over the low-level trace
  • Replay support for updated versions of a recorded module -- this requires a much more coordinated effort from the producers as well to make this practically useful.

Initial Performance Numbers

Some initial runs on compression libraries like zstd show a 4-5% overhead on recording logic, excluding the disk I/O. This seems reasonable at the moment and likely doesn't need further optimization unless there are explicit use-cases.

Minor Todo

The following (minor) additions will be made in the coming days prior to potential merging:

  • Encoding hashes of modules in the recorded trace for validation
  • Generic writers/readers for RecordBuffer and ReplayBuffer
  • Feature gating all of RR

Questions for Maintainers

  • Do we get wasip1 for free by recording/replay at this level?
  • What's typically the most idiomatic way to serialize anyhow:Errors?

arjunr2 added 30 commits June 5, 2025 13:51
* Integrated `Recorder` and `Replayer` traits
* Supported validation in configs and CLI
No replay support is included yet, and the recording still doesn't
cover all cases surrounding overriden lowering implementations
Todos:
* Fix some interfaces for Recorder/Replayer and Stores
* Include feature gating
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review -- this is looking really good overall!

My main high-level thought around the heart of the record/replay (the interaction with the component hostcall logic and lifting/lowering): there are a lot of fine-grained conditionals, and it's somewhat hard to trace, even though there are only three modes (running + recording, replaying, or just running). It feels like somehow we should be able to hoist the conditionals a bit more and then have straight-line sequences for each of those modes.

A few other misc thoughts:

  • We'll want to make sure we either plug all soundness holes or document them for the initial landing. We already know about component builtins (and their memory effects); we should ask around to ensure we catch any others.
  • For core-wasm recording, I suspect we'll want to validate that there are no exported memories or tables.
  • We should validate that the module is exactly the same on replay as on record -- I know you're working on adding hashing for this.
  • Some comments below about std dependence -- we should ensure the core can work, serializing into memory, in a no-std build.
  • We'll want at least @alexcrichton to give this a review as well, maybe after we try some of the refactors mentioned here.

Thanks again!

/// enforced during replay by default (NaN canonicalization, deterministic relaxed SIMD)
#[arg(short = 'P', long = "replay", value_name = "KEY[=VAL[,..]]")]
#[serde(skip)]
replay_raw: Vec<opt::CommaSeparated<Replay>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level thoughts on command-line options:

  • It makes sense to me that record is a general option that the user should be able to set on any Wasmtime subcommand that executes WebAssembly; this includes wasmtime run, wasmtime serve, and maybe others. (Curious: have you tried it on an HTTP server component?)
  • However, replay is a specific and separate thing: for example, if I am replaying an HTTP server execution, I should not be spawning an actual HTTP server on the host side. Basically, the idea of replay is that it is replacing the "real" host with a trace, so it should be properly seen (I think) as a separate top-level driver.

With that in mind, what do you think about turning replay into a command, rather than an option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @cfallin said about replay being a command, not an option.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replay in practice operates very similar to the run command and shares a lot of the top level command line options with it as well.

The replay command as a result would be just a minor super-set of the run setup. It sort of makes sense to do one of two things:

  • Have a top-level replay command, that just adds additional replay options, and calls run under the hood with it
  • Add replay options perhaps just for the run command (as opposed to general options)

Leaning towards the former since maybe in the future replay could support different modes of operation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replay in practice operates very similar to the run command and shares a lot of the top level command line options with it as well.

Shouldn't all the top-level options pretty much be fixed and effectively identical to whatever they were when the trace was recorded? When does it make sense to specify different options than were used when recording? If the answer is "it doesn't make sense" then I don't think we want to force users to manually provide the same options from before in the replay command.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's true for most options. I think the ones that might be usable are perhaps just profiling

Copy link
Member

@fitzgen fitzgen Jul 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah great point, very cool to profile an execution after the fact and ignore I/O and the external world.

I think we can probably add CLI options one at a time as they make sense to the replay command, and have the default1 be "whatever was configured during the recording".

Footnotes

  1. I think this is implicit/automatic in that the trace will reflect whatever configuration settings were in effect during recording? At least that is true for anything that affects execution.

@arjunr2 arjunr2 requested a review from a team as a code owner July 24, 2025 23:18
@alexcrichton
Copy link
Member

We'll want at least @alexcrichton to give this a review as well, maybe after we try some of the refactors mentioned here.

Happy to help out! (and agreed I'd like to once-over at some point)

How about scheduling a call when y'all are ready with the 3 of us? That'd probably be best to draw attention to any various areas and for me to ask some questoins in a high-bandwidth way before going off to review on my own.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super excited for this!

I think we should adjust the #[cfg(...)] options / cargo features a little bit. Instead of having a cfg feature for just type validation, but always including the general record-replay code in the build, I think we should have a #[cfg(feature = "rr")] cargo feature for controlling whether we are building in the ability to do any kind of record-replay at all. (We don't want to force the smallest embedded builds of Wasmtime to include record-replay infrastructure, for example.) And then, when record-replay support was included at compile time, we can have runtime boolean knobs for whether to do type validation and divergence-checking when replaying.

Finally, in order to make it so that the whole core runtime isn't littered with #[cfg(feature = "rr")], we can do something similar to what we do with #[cfg(feature = "gc")] and the wasmtime::runtime::gc submodule:

// crates/wasmtime/src/runtime/rr.rs

#[cfg(feature = "rr")]
mod enabled;
#[cfg(feature = "rr")]
pub use enabled::*;

#[cfg(not(feature = "rr"))]
mod disabled;
#[cfg(not(feature = "rr"))]
pub use disabled::*;

The wasmtime::runtime::rr::enabled submodule would have the actual record-replay implementation, and the wasmtime::runtime::rr::disabled submodule would have a stubbed out version of the same public API but without constructors and with every public type being a newtype of wasmtime::runtime::uninhabited::Uninhabited. This pattern lets the core runtime usage of record-replay APIs look the same regardless whether we actually build in the code to do record-replay.

The other thing I think we need before this can land is some kind of testing or fuzzing story. At minimum, we should (1) make it so that the fuzzer can turn on recording during our existing fuzzing, and (2) we should have a smoke test that records some kind of Wasm program and then replays it back again and asserts that we get the same result again. As a follow up, I think we should also generalize that smoke test into a new fuzz target where we take an arbitrary Wasm module generated by the fuzzer, record its execution, and then replay that execution and assert that we get the same results (we can mostly rely on enabling the internal type validation and divergence checks for these assertions).

Let me know if any of this isn't clear or if I've overlooked something.

Thanks!

/// enforced during replay by default (NaN canonicalization, deterministic relaxed SIMD)
#[arg(short = 'P', long = "replay", value_name = "KEY[=VAL[,..]]")]
#[serde(skip)]
replay_raw: Vec<opt::CommaSeparated<Replay>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @cfallin said about replay being a command, not an option.

Comment on lines +60 to +62
// Uninitialized data is assumed and serialized, so hence
// may contain some undefined values
unsafe { self.assume_init() }.to_valraw_bytes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is sound. Instead, I think we'll want something like this:

union SerializedValRaw {
    bytes: ValRawBytes,
    val: ValRaw,
}

impl SerializedValRaw {
    pub fn new(val: ValRaw) -> Self {
        // Zero-initialize `self.bytes` to ensure that there are
        // no undefined bytes in `self` and we don't ever try to
        // read and serialize undefined data.
        let ret = Self {
            // SAFETY: it is safe to zero-initialize `u8` arrays.
            bytes: unsafe { mem::zeroed() },
        };
        ret.val = val;
        ret
    }

    pub fn get_val(&self) -> ValRaw {
        // SAFETY: `self` is always initialized in `new` such
        // that there is a valid `ValRaw` inside.
        unsafe { self.val }
    }

    pub fn get_bytes(&self) -> ValRawBytes {
        // SAFETY: We take care to ensure that `self` has no
        // undefined bytes in the constructor, so accessing the
        // raw bytes representation of `ValRaw` is safe.
        unsafe { self.bytes }
    }
}

@cfallin
Copy link
Member

cfallin commented Jul 25, 2025

How about scheduling a call when y'all are ready with the 3 of us? That'd probably be best to draw attention to any various areas and for me to ask some questoins in a high-bandwidth way before going off to review on my own.

That'd be great! FWIW, I'm on PTO next week and the week after; please feel free to talk with Arjun directly before then if you both want, or I can join after Aug 11...

@arjunr2
Copy link
Author

arjunr2 commented Jul 26, 2025

@alexcrichton @fitzgen I'll take a pass through the comments early next week, and perhaps we can find a time later next week that works

@alexcrichton
Copy link
Member

Sounds good! Feel free to ping me on Zulip when ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:config Issues related to the configuration of Wasmtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants