
fix(body): skip empty chunks during gzip decoding #480

Merged

Conversation

acme
Collaborator

@acme acme commented May 27, 2025

  • Process more chunks if we receive just the gzip header

Addresses #479

@cceckman-at-fastly
Contributor

The fix looks reasonable, though let me make sure I'm following the problem correctly:

The goal is to never get a chunk with zero bytes. (Is that right? It seems like there are still cases where a zero-byte chunk may be produced, e.g. one could be passed through from downstream.)

The problem in this case is that, if the gzip header and body land in different chunks during Viceroy's processing, consuming the header updates the decoder state but leaves no bytes "ready", so an empty decompressed chunk is emitted. The next chunk would then contain the decompressed data.
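The scenario above can be sketched with a toy stand-in for the gzip decoder (`ToyDecoder` and `next_nonempty_chunk` are illustrative names, not Viceroy's actual API): when an input chunk only advances decoder state, e.g. it contains just the gzip header, the decoder yields zero output bytes, and the loop pulls the next input chunk instead of emitting an empty one.

```rust
struct ToyDecoder {
    header_seen: bool,
}

impl ToyDecoder {
    /// Consume one input chunk, returning decoded bytes (possibly empty).
    fn decode(&mut self, chunk: &[u8]) -> Vec<u8> {
        if !self.header_seen {
            self.header_seen = true; // this chunk held only the header
            Vec::new()
        } else {
            chunk.to_vec() // toy "decoding": pass body bytes through
        }
    }
}

/// Yield the next non-empty decoded chunk, or None at end of input.
fn next_nonempty_chunk<I>(input: &mut I, dec: &mut ToyDecoder) -> Option<Vec<u8>>
where
    I: Iterator<Item = Vec<u8>>,
{
    for chunk in input {
        let out = dec.decode(&chunk);
        if !out.is_empty() {
            return Some(out);
        }
        // Zero decoded bytes: keep reading input rather than yielding
        // an empty chunk downstream.
    }
    None
}

fn main() {
    // gzip magic bytes in one chunk, body in the next
    let chunks = vec![vec![0x1f, 0x8b], b"hello".to_vec()];
    let mut it = chunks.into_iter();
    let mut dec = ToyDecoder { header_seen: false };
    assert_eq!(next_nonempty_chunk(&mut it, &mut dec), Some(b"hello".to_vec()));
    assert_eq!(next_nonempty_chunk(&mut it, &mut dec), None);
    println!("ok");
}
```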

It would be good to have some tests to capture the behavior we're worried about & any regressions - I've started sketching some proptest tests for these cases, will ping you to go over them.

@cceckman-at-fastly cceckman-at-fastly self-requested a review May 27, 2025 19:24
@acme
Collaborator Author

acme commented May 28, 2025

Thanks for investigating. I'm not familiar with the Viceroy codebase, but this change certainly fixes #479 and allowed me to provide a working solution for a customer.

@cceckman-at-fastly
Contributor

I added some additional investigation on #479; based on my testing, I think Viceroy is still vulnerable to a related issue in some cases (if the origin feeds in a zero-length chunk).

That said, yes, this makes the implementation more resilient & fixes an issue for the customer, so let's get it in; I'll follow up on the broader issue.

@cceckman-at-fastly cceckman-at-fastly merged commit c8c0ef2 into main May 28, 2025
15 checks passed
@cceckman-at-fastly cceckman-at-fastly deleted the acme/skip-empty-chunks-during-gzip-decoding branch May 28, 2025 17:09
cceckman-at-fastly added a commit that referenced this pull request May 28, 2025
This demonstrates the underlying issue in #479. The origin writes
maliciously-sized chunks - zero-length for the uncompressed body, two
bytes at a time for the compressed body - and the test fixture asserts
that the expected data are read.

This passes for the uncompressed body (yay!); without #480, this fails
for the compressed body (i.e. it's a reproducer for #479).
cceckman-at-fastly added a commit that referenced this pull request May 28, 2025
A more general set of tests for #479. Assume that the origin can send
chunks of arbitrary length - in the case of #479, a gzip header in a
separate chunk from the content.

These show up as separate chunks in a `hyper::Body`. Ensure that, when
splitting a body at arbitrary chunk lengths, the output still reproduces
the body. This causes failures without #480 (or similar changes).

This validates the fix in #480 (the gzip test fails without #480).
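The test idea can be sketched in plain Rust (proptest would randomize the split point; here it is enumerated, and `reassemble_via_chunks` is an illustrative stand-in for reading the body back through the chunked path, not the real test fixture): for every split point, including splits that produce a zero-length chunk, reading the chunks back must reproduce the original body.

```rust
/// Stand-in for the code under test: read a sequence of chunks back into
/// one buffer, treating only exhaustion of the input (never an empty
/// chunk) as end-of-stream.
fn reassemble_via_chunks(chunks: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for c in chunks {
        // An empty chunk contributes nothing but must not end the read.
        out.extend_from_slice(c);
    }
    out
}

fn main() {
    let body = b"example body bytes";
    for split in 0..=body.len() {
        let (a, b) = body.split_at(split);
        // split == 0 and split == body.len() exercise zero-length chunks
        assert_eq!(reassemble_via_chunks(&[a, b]), body.to_vec());
    }
    println!("ok");
}
```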
cceckman-at-fastly added a commit that referenced this pull request May 29, 2025
Tests and a slightly broader preventative measure for #479.

The Rust [`Read`](https://doc.rust-lang.org/std/io/trait.Read.html) trait uses a return value of `Ok(0)` to indicate end-of-stream (which may be permanent, e.g. end-of-file, or temporary).
We use the same convention for the `fastly_http_body::read` hostcall.

The `Body` [trait](https://docs.rs/http-body/latest/http_body/trait.Body.html) is a little more careful, with a separate `is_end_stream` method distinct from the "length of frame" property.

However, this means if our `Body` implementations _ever_ yield a zero-length chunk, the `fastly_http_body::read` hostcall will look like an end-of-stream to the guest. That's at the root of #479.

#480 addresses the observed cause: that the gzip decoder in particular can yield zero-length chunks when the header and body are split across chunks. However, it's in principle possible for another (current or future) variant of `Body` to reintroduce the issue.

-   Test for zero-length bodies, as an integration test and as a property test. If #480 is patched out, both groups of tests reproduce the issue with the gzip body.
-   Tackle the issue where it occurs, at the conversion from the `http::Body` stream-style API to the `std::io::Read`-style API. Add a `Body::read` method that acts more like the `AsyncRead` traits; use it from the hostcalls, instead of putting `Body` internals in those hostcalls.
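The conversion-point fix can be sketched as follows (`ChunkBody` is an illustrative stand-in for Viceroy's `Body`, not its real type): at the stream-to-`Read` boundary, loop past zero-length chunks so that `Ok(0)` is only ever returned at true end-of-stream, never because a frame happened to be empty.

```rust
use std::io::{self, Read};

struct ChunkBody {
    chunks: std::vec::IntoIter<Vec<u8>>,
    pending: Vec<u8>, // bytes from the current chunk not yet handed out
}

impl Read for ChunkBody {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        while self.pending.is_empty() {
            match self.chunks.next() {
                // Skip empty frames instead of returning Ok(0) here,
                // which the guest would misread as end-of-stream.
                Some(c) if c.is_empty() => continue,
                Some(c) => self.pending = c,
                None => return Ok(0), // true end-of-stream
            }
        }
        let n = self.pending.len().min(buf.len());
        buf[..n].copy_from_slice(&self.pending[..n]);
        self.pending.drain(..n);
        Ok(n)
    }
}

fn main() {
    // A zero-length chunk in the middle must not look like EOF.
    let mut body = ChunkBody {
        chunks: vec![b"he".to_vec(), Vec::new(), b"llo".to_vec()].into_iter(),
        pending: Vec::new(),
    };
    let mut out = Vec::new();
    body.read_to_end(&mut out).unwrap();
    assert_eq!(out, b"hello");
    println!("ok");
}
```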

Fixed #479 (to my satisfaction)