Skip to content

Conversation

brettsam
Copy link
Member

@brettsam brettsam commented Aug 25, 2025

Fixes #11263

The bug here is that during a request, we register a DI child scope container with the request... then we dispose it mid-request, so when anything in ASP.NET tries to get a service later, it throws. It's a race b/c it's possible that the request returns before the old container is disposed. Despite this, everything seems to be working b/c all the hard work has already been done by the time the exception is thrown. The JobHost is restarted and everything continues on happily even though we return a 500 to the caller.

I have a preliminary PR that I've just pushed up but it's not truly complete. It'll work for what we need but it's quite tricky to get right with how DI works.

Some background:

What I've got should be good enough for the very specific scenario we need it for, but there still seems like there's gaps that may bite us later.

Want to discuss with @fabiocav later whether we should use this same approach? Or abandon it and just skip registering services for /admin calls? Or if there's other approaches.

Will leave it in Draft for now.

Pull request checklist

IMPORTANT: Currently, changes must be backported to the in-proc branch to be included in Core Tools and non-Flex deployments.

  • Backporting to the in-proc branch is not required
    • Otherwise: Link to backporting PR
  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • My changes do not require diagnostic events changes
    • Otherwise: I have added/updated all related diagnostic events and their documentation (Documentation issue linked to PR)
  • I have added all required tests (Unit tests, E2E tests)

{
// This forces the hosts to be stopped and disposed before a new one starts.
// There was a bug hiding here originally, so we'll run all these tests this way.
{ ConfigurationSectionNames.SequentialJobHostRestart, "true" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this behavior has some major differences. It doesn't call something regarding worker process cleanup. Any concern with missing that logic in this test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place that I see it's called

if (ShouldEnforceSequentialRestart())
{
stopTask = Orphan(previousHost, cancellationToken);
await stopTask;
startTask = UnsynchronizedStartHostAsync(activeOperation);

Which is awaiting the call to Orphan() so that we're guaranteed the previous host is disposed before we start another one. Otherwise we fire-and-forget that call.

The bug here is a race and it always passes for me locally (and apparently in CI) because the new host is started and the request returns while we're still waiting to dispose the orphaned one.

@brettsam
Copy link
Member Author

Added a bunch more context to the description; leaving this in Draft now until I can discuss with @fabiocav. It's possible this was a purposeful omission and he had another plan (or maybe not :-))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

/admin/host/resume throws ObjectDisposedException
2 participants