fix(body): skip empty chunks during gzip decoding #480
Conversation
- Process more chunks if we receive just the gzip header. Fixes #479
The fix looks reasonable, though let me make sure I'm following the problem correctly: the goal is to never yield a chunk with zero bytes. (Is that right? It seems like there are still cases where a zero-byte chunk may be produced, e.g. one could be passed through from downstream.) The problem in this case is that, if the gzip header and body land in different chunks during Viceroy's processing, the header is consumed and the decoder state updated, but no decompressed bytes are "ready", so an empty decompressed chunk is emitted; the next chunk would then (could) contain the decompressed data. It would be good to have some tests that capture the behavior we're worried about and any regressions - I've started sketching some.
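For illustration, here's a small standalone sketch of that split-header behavior using the `flate2` crate (an assumption on my part for the example; it may not match the decoder Viceroy actually uses). Feeding only the 10-byte gzip header produces no decompressed output yet, which is exactly the "empty chunk" situation described above:

```rust
use std::io::Write;

use flate2::write::{GzDecoder, GzEncoder};
use flate2::Compression;

fn main() -> std::io::Result<()> {
    // Compress a small payload so we can split it at a known boundary.
    let mut enc = GzEncoder::new(Vec::new(), Compression::default());
    enc.write_all(b"hello from the origin")?;
    let compressed = enc.finish()?;

    // With flate2's default options the gzip header is exactly 10 bytes;
    // feed only that first, mimicking an origin that sends the header in
    // its own chunk.
    let (header, rest) = compressed.split_at(10);

    let mut dec = GzDecoder::new(Vec::new());
    dec.write_all(header)?;
    // The header was consumed and the decoder state updated, but there is
    // no decompressed output yet -- the "chunk" of decoded data is empty.
    assert!(dec.get_ref().is_empty());

    // The remaining chunk then yields all of the decompressed data.
    dec.write_all(rest)?;
    let decoded = dec.finish()?;
    assert_eq!(decoded, b"hello from the origin".to_vec());
    Ok(())
}
```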
Thanks for investigating. I'm not familiar with the Viceroy codebase, but this change certainly fixes #479 and lets me provide a working solution for a customer.
I added some additional investigation on #479; based on my testing, I think Viceroy is still vulnerable to a related issue in some cases (if the origin feeds in a zero-length chunk). That said, yes, this makes the implementation more resilient and fixes an issue for the customer, so let's get it in; I'll follow up on the broader issue.
This demonstrates the underlying issue in #479. The origin writes maliciously-sized chunks - zero-length for the uncompressed body, two bytes at a time for the compressed body - and the test fixture asserts that the expected data are read. This passes for the uncompressed body (yay!); without #480, it fails for the compressed body (i.e. it's a reproducer for #479).
A more general set of tests for #479. Assume that the origin can send chunks of arbitrary length - in the case of #479, a gzip header in a separate chunk from the content - which show up as separate chunks in a `hyper::Body`. These tests ensure that, when a body is split at arbitrary chunk lengths, the output still reproduces the body. This validates the fix in #480: the gzip test fails without #480 (or a similar change).
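For reference, a hedged sketch of how such an adversarially-chunked body might be constructed (this is not the actual test fixture; it assumes hyper 0.14 with the `stream` feature, plus the `bytes` and `futures` crates, and the helper name is made up):

```rust
use bytes::Bytes;
use futures::stream;
use hyper::Body;

/// Split `payload` into `chunk_len`-byte pieces and insert a zero-length
/// chunk before each one, mimicking an origin that sends arbitrary (and
/// occasionally empty) chunks.
fn adversarial_body(payload: &[u8], chunk_len: usize) -> Body {
    let mut chunks: Vec<Result<Bytes, std::io::Error>> = Vec::new();
    for piece in payload.chunks(chunk_len.max(1)) {
        chunks.push(Ok(Bytes::new())); // zero-length chunk
        chunks.push(Ok(Bytes::copy_from_slice(piece))); // real data
    }
    Body::wrap_stream(stream::iter(chunks))
}
```

A test can then pipe `adversarial_body(...)` through the decoding path and assert that reading it back reproduces `payload` exactly.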
Tests and a slightly broader preventative measure for #479.

The Rust [`Read`](https://doc.rust-lang.org/std/io/trait.Read.html) trait uses a return value of `Ok(0)` to indicate a (permanent/temporary) end-of-stream, e.g. end-of-file. We use the same convention for the `fastly_http_body::read` hostcall. The `Body` [trait](https://docs.rs/http-body/latest/http_body/trait.Body.html) is a little more careful, with a separate `is_end_stream` method distinct from the "length of frame" property. However, this means that if our `Body` implementations _ever_ yield a zero-length chunk, the `fastly_http_body::read` hostcall will look like an end-of-stream to the guest. That's at the root of #479. #480 addresses the observed cause: the gzip decoder in particular can yield zero-length chunks when the header and body are split. However, it's in principle possible for another (current or future) variant of `Body` to reintroduce the issue. This change:

- Tests for zero-length bodies, as an integration test and as a property test. If #480 is patched out, both groups of tests reproduce the issue with the gzip body.
- Tackles the issue where it occurs, at the conversion from the `http_body::Body` stream-style API to the `std::io::Read`-style API: add a `Body::read` method that acts more like the `AsyncRead` traits, and use it from the hostcalls instead of putting `Body` internals in those hostcalls.

Fixes #479 (to my satisfaction)
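To show the shape of that `Read`-style conversion, here's a minimal self-contained sketch; `ChunkStream` and `next_chunk` are hypothetical stand-ins for Viceroy's internal `Body` type, not its real API. The point is that `Ok(0)` is only returned at true end-of-stream, never for an empty intermediate frame:

```rust
use std::collections::VecDeque;

use bytes::Bytes;

struct ChunkStream {
    chunks: VecDeque<Bytes>,
}

impl ChunkStream {
    /// Pull the next frame from the underlying stream; `None` means the
    /// stream has genuinely ended.
    fn next_chunk(&mut self) -> Option<Bytes> {
        self.chunks.pop_front()
    }

    /// `std::io::Read`-style API: only report 0 bytes at true end-of-stream.
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        loop {
            match self.next_chunk() {
                None => return Ok(0), // genuine end-of-stream
                Some(chunk) if chunk.is_empty() => continue, // skip empty frames
                Some(chunk) => {
                    let n = chunk.len().min(buf.len());
                    buf[..n].copy_from_slice(&chunk[..n]);
                    // A real implementation would keep the unread tail of
                    // `chunk` for the next call; omitted to keep this short.
                    return Ok(n);
                }
            }
        }
    }
}
```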
Addresses #479