Improve GPU performance with shifting #974
Pull Request Overview
This PR improves GPU performance for particle shifting operations by refactoring from mutating to non-mutating functions and adding @propagate_inbounds annotations. The benchmark shows a ~9x performance improvement (from 3.117 ms to 347.360 μs) for ConsistentShiftingSun2019 on an A4500 GPU.
Key changes:
- Refactored `continuity_equation_shifting!` to non-mutating `continuity_equation_shifting_term` that returns values instead of mutating arrays
- Added `@propagate_inbounds` annotations to performance-critical functions
- Reorganized timing instrumentation for better profiling granularity
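The first two changes can be illustrated with a minimal sketch. It is not the TrixiParticles.jl implementation: the function names `add_shifting_term!`, `shifting_term`, `velocity_difference`, and `density_rate`, their arguments, and the simplified formula are all hypothetical stand-ins chosen for illustration. It contrasts a helper that writes into an array with one that returns its contribution, and shows a helper marked `@propagate_inbounds` so that the caller's `@inbounds` also applies to the indexing inside it.

```julia
using Base: @propagate_inbounds

# Mutating style (hypothetical): accumulate directly into the array `drho`.
@inline function add_shifting_term!(drho, particle, delta_v, grad_kernel, m_b)
    drho[particle] += m_b * sum(map(*, delta_v, grad_kernel))
    return drho
end

# Non-mutating style (hypothetical): return the scalar contribution and let
# the caller decide where to accumulate it.
@inline function shifting_term(delta_v, grad_kernel, m_b)
    return m_b * sum(map(*, delta_v, grad_kernel))
end

# `@propagate_inbounds` inlines this helper and lets an `@inbounds` in the
# caller remove the bounds checks on `v[...]` here as well.
@propagate_inbounds function velocity_difference(v, a, b)
    return (v[1, a] - v[1, b], v[2, a] - v[2, b])
end

# Assumes `grad_kernels` and `neighbors` have the same length and that all
# indices are valid; `@inbounds` skips the checks that would catch mistakes.
function density_rate(v, grad_kernels, masses, neighbors, particle)
    drho = zero(eltype(v))
    @inbounds for k in eachindex(neighbors)
        b = neighbors[k]
        delta_v = velocity_difference(v, particle, b)
        drho += shifting_term(delta_v, grad_kernels[k], masses[b])
    end
    return drho
end

# Usage: three particles in 2D, particle 1 interacting with particles 2 and 3.
v = [1.0 0.0 2.0; 0.0 1.0 0.0]
grad_kernels = [(1.0, 0.0), (1.0, 1.0)]
masses = [1.0, 2.0, 3.0]
rate = density_rate(v, grad_kernels, masses, [2, 3], 1)
```

In this sketch the non-mutating variant keeps the running sum in a local variable and leaves the single array write to the caller, which is one plausible reason such a refactoring can help inside a GPU kernel.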
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/schemes/fluid/shifting_techniques.jl | Refactored continuity equation shifting from mutating to non-mutating pattern, added @propagate_inbounds annotations, and split timing blocks |
| src/schemes/fluid/fluid.jl | Integrated non-mutating continuity_equation_shifting_term into continuity equation calculation |
| src/schemes/boundary/open_boundary/system.jl | Added timing instrumentation wrapper around open boundary update |
| src/preprocessing/particle_packing/system.jl | Added timing instrumentation for particle packing operations |
| src/callbacks/update.jl | Removed redundant timing wrappers (moved to individual functions) |
…iParticles.jl into shifting-gpu-performance
Codecov Report ❌
Coverage diff:
| | main | #974 | +/- |
|---|---|---|---|
| Coverage | 64.84% | 64.83% | -0.01% |
| Files | 120 | 120 | |
| Lines | 8565 | 8566 | +1 |
| Hits | 5554 | 5554 | |
| Misses | 3011 | 3012 | +1 |
Here is a benchmark of `interact!` on an A4500 with FP32. I ran the `periodic_channel_2d.jl` example with 125k particles and benchmarked with:
- `ConsistentShiftingSun2019`
- `continuity_equation_shifting!`
- `@propagate_inbounds`
- `vdiff_shifting` instead of `continuity_equation_shifting!`
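For readers who want to compare variants in a similar way, here is a rough, CPU-only timing sketch. It is not the benchmark setup used in this PR: `work!` is a hypothetical stand-in for the function under test, and `median_time` is an illustrative helper. On a GPU the measured call would additionally need to be synchronized (for example with `CUDA.@sync` from CUDA.jl), because kernel launches return before the kernel has finished.

```julia
# Hypothetical stand-in for the function being benchmarked.
function work!(out, x)
    out .= x .* x
    return out
end

# Time `f(args...)` several times and return the median sample in seconds.
# For an even number of samples this returns the lower of the two middle values.
function median_time(f, args...; samples=5)
    f(args...)  # warm-up call so that compilation time is not measured
    times = [@elapsed(f(args...)) for _ in 1:samples]
    return sort(times)[(samples + 1) ÷ 2]
end

x = rand(Float32, 125_000)
out = similar(x)
t = median_time(work!, out, x)
```

Taking the median rather than a single measurement reduces the influence of occasional outliers, such as a garbage collection pause during one of the samples.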