Skip to content

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Aug 19, 2025

Re: #114581

It looks like under GC stress a couple of self-rearming GC-finalizer callbacks can keep finalizer loop continuously occupied by re-arming each other and causing a GC. That adds very little to the coverage, but can consume considerable resources through frequent GCs and make everything else slower.

Throttling finalization rate to 1 item/msec when running with GC stress reduces the rate of uninteresting GCs and makes the tests move faster.

@VSadov
Copy link
Member Author

VSadov commented Aug 19, 2025

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Aug 19, 2025

The effect of this change on gcstress0x3-gcstress0xc

Platform Last test pass This PR Notes
osx-arm64 4h 30min + timeout failure 34min > 4× improvement
win-arm64 2h 18min + timeout failure 1h 17min > 2× improvement
lin-arm64 1h 53min 1h 3min ~2× improvement
win-x64 2h 5min 1h 44min ~20% improvement
lin-x64 2h 48min 1h 13min ~30% improvement
win-x86 2h 42min 2h 40min No change
lin-arm 1h 40min 1h 43min No change

I suspect, but do not have direct evidence, that the change has higher effect on arm64 platforms because we issue more process-wide barriers per GC.
In particular on OSX the process-wide barrier has no OS API and we walk and ping every thread in the process, so it is likely slower.

I do not have a good explanation why 32bit tests are not affected.

@VSadov
Copy link
Member Author

VSadov commented Aug 21, 2025

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Aug 21, 2025

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov VSadov marked this pull request as ready for review August 21, 2025 16:23
@Copilot Copilot AI review requested due to automatic review settings August 21, 2025 16:23
@VSadov
Copy link
Member Author

VSadov commented Aug 21, 2025

I have run GC stress once more. It looks like results are repeatable.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses performance issues during GC stress testing by throttling the finalization rate to prevent continuous occupation of the finalizer loop. The change replaces the existing quiescence-based approach with a simple time-based throttling mechanism that limits finalization to approximately 1 item per millisecond when running under GC stress levels greater than 1.

Key changes:

  • Replace the existing GC quiescence-based throttling with a simpler sleep-based approach
  • Move throttling logic from the main finalizer worker loop to occur per finalized object
  • Reduce overhead from continuous GC count checking and thread switching

@VSadov VSadov enabled auto-merge (squash) August 21, 2025 16:25
@VSadov VSadov disabled auto-merge August 21, 2025 16:25
@VSadov
Copy link
Member Author

VSadov commented Aug 21, 2025

There was a failure in second try. Not sure if related.

@VSadov
Copy link
Member Author

VSadov commented Aug 21, 2025

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Aug 23, 2025

Lots of new failures, which seem unrealted to the change.

Things seems to be quite broken now.
Hard to make a change like this.

Example:

Assert failure(PID 8244 [0x00002034], Thread: 8248 [0x2038]): Consistency check failed: hit privileged instruction!
FAILED: !ExecutionManager::IsManagedCode(GetIP(pContext))

CORECLR! CHECK::Trigger + 0x1E9 (0x00007ff8`5920d989)
CORECLR! IsGcMarker + 0x185 (0x00007ff8`594bc665)
CORECLR! CLRVectoredExceptionHandlerShim + 0x9D (0x00007ff8`594ac0fd)
NTDLL! RtlInitializeCriticalSection + 0x190 (0x00007ff8`8cef6dc0)
NTDLL! RtlImageNtHeaderEx + 0x17B (0x00007ff8`8cec46cb)
NTDLL! KiUserExceptionDispatcher + 0x3A (0x00007ff8`8cf39c9a)
<no module>! <no symbol> + 0x0 (0x00007ff7`fa700078)
CORECLR! `string' + 0x0 (0x00007ff8`59ced298)
<no module>! <no symbol> + 0x0 (0x0000004a`000007a2)
CORECLR! `string' + 0x0 (0x00007ff8`59cb4a18)
    File: D:\a\_work\1\s\src\coreclr\vm\excep.cpp:5742
    Image: C:\h\w\A8110990\p\corerun.exe

Another:

/datadisks/disk1/work/A51108F1/p/watchdog 299 /datadisks/disk1/work/A51108F1/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true Directed_1.dll ''
ASSERT FAILED
	Expression: pMD->HasUnmanagedCallersOnlyAttribute()
	Location:   line 1817 in /__w/1/s/src/coreclr/vm/prestub.cpp
	Function:   PreStubWorker_Preemptive
	Process:    62552

@VSadov
Copy link
Member Author

VSadov commented Aug 23, 2025

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jkotas
Copy link
Member

jkotas commented Aug 23, 2025

Assert failure(PID 8244 [0x00002034], Thread: 8248 [0x2038]): Consistency check failed: hit privileged instruction!

#119028 has the fix for this one.

@VSadov
Copy link
Member Author

VSadov commented Aug 24, 2025

Assert failure(PID 8244 [0x00002034], Thread: 8248 [0x2038]): Consistency check failed: hit privileged instruction!

#119028 has the fix for this one.

I have tried combining this PR with the fix and nearly everything passed. - see: #119037

There were some failures on arm32, but I did not see any failures in the previous runs with this PR, so I assume they are not related to this change.

@VSadov
Copy link
Member Author

VSadov commented Aug 24, 2025

This is a test-only change that so far was not found to cause regressions.
I think we have enough confidence that this change can be merged.

@jkotas jkotas merged commit e65701d into dotnet:main Aug 24, 2025
110 of 130 checks passed
@VSadov VSadov deleted the strOsx branch August 24, 2025 05:27
@VSadov
Copy link
Member Author

VSadov commented Aug 25, 2025

/backport to release/10.0

Copy link
Contributor

Started backporting to release/10.0: https://github.com/dotnet/runtime/actions/runs/17196515733

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants