ikreymer

  • separate reading the streamed response while the browser is waiting (not really async) from actual async loading; this is not handled via fetchResponseBody()
  • unify async fetch: first try browser networking for a regular GET, then fall back to regular fetch()
  • load headers and body separately in async fetch, so a request can be cancelled after the headers arrive
  • refactor direct fetch of non-HTML pages: load headers, then handle loading the body and adding the page asynchronously, so the worker can continue loading browser-based pages (should allow more parallelization in the future)
  • unify WARC writing in preparation for dedup: a unified serializeWARC() is called for all paths, the WARC digest is computed, and additional payload checks are added for streaming loading
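
As a rough illustration of the header/body split described above (this is not the code in this PR; `fetchHeadersThenBody`, `wantBody`, and `fetchFn` are hypothetical names), a minimal sketch using the standard fetch() and AbortController APIs might look like this:

```typescript
// Hypothetical sketch: all names here are illustrative, not taken from the PR.
type HeaderCheck = (status: number, headers: Headers) => boolean;

interface HeadersThenBodyResult {
  status: number;
  headers: Headers;
  // null when the request was cancelled after the headers arrived
  body: Uint8Array | null;
}

async function fetchHeadersThenBody(
  url: string,
  wantBody: HeaderCheck,
  fetchFn: typeof fetch = fetch, // injectable so the sketch can be tested offline
): Promise<HeadersThenBodyResult> {
  const controller = new AbortController();

  // fetch() resolves once the response headers are available;
  // the body has not necessarily been read at this point.
  const resp = await fetchFn(url, { signal: controller.signal });
  const { status, headers } = resp;

  // Decide from the headers alone whether the body is worth loading.
  if (!wantBody(status, headers)) {
    controller.abort(); // cancel the request instead of downloading the body
    return { status, headers, body: null };
  }

  // Second step: read the body only when it is wanted.
  const body = new Uint8Array(await resp.arrayBuffer());
  return { status, headers, body };
}
```

A caller could, for example, pass a `wantBody` check that rejects non-200 responses or unwanted content types, so that only the headers are ever transferred for those URLs.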

ikreymer requested a review from tw4l on August 25, 2025 16:54
Comment on lines +1268 to +1271
// not yet finished
if (data.asyncLoading) {
  return;
}

I assume we'd end up here if the page worker timeout is hit (or the worker crashes) before the page has finished loading async. Is there any tidying up we want to do in that case rather than just returning?

tw4l left a comment

Working very well in testing, haven't noticed any regressions. Thanks for the test updates as well.
