You can set the thread block size for `ParallelFor` by passing it as a template argument: `amrex::ParallelFor<128>(...);`. The default is 256 (set by `AMREX_GPU_MAX_THREADS`); values between 32 and 1024 could be worth trying. Dynamic shared memory is not usable with `ParallelFor`, because in the last thread block only some of the threads might execute. Instead, you would have to use `amrex::launch`, which gives direct access to the dynamic shared memory amount, the thread block size, and the grid size. However, this is not available when compiling for CPUs.
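To make the point about the last thread block concrete, here is a minimal sketch in plain C++ (no AMReX required). It assumes the usual one-thread-per-item mapping with a rounded-up block count; `LaunchShape` and `launch_shape` are hypothetical names for illustration, not part of the AMReX API, and the actual launch logic inside `ParallelFor` may differ.

```cpp
#include <cassert>

// Hypothetical helper (not part of AMReX): describes how n work items
// map onto thread blocks of a fixed size, one thread per item.
struct LaunchShape {
    long num_blocks;      // blocks needed to cover all n items
    long active_in_last;  // threads in the last block that have work
    long idle_in_last;    // threads in the last block without work
};

inline LaunchShape launch_shape(long n, int threads_per_block) {
    assert(threads_per_block > 0);
    if (n <= 0) {
        return {0, 0, 0};  // nothing to launch
    }
    // Round up so that every item is covered by some thread.
    const long num_blocks = (n + threads_per_block - 1) / threads_per_block;
    // All blocks except the last are full; the last one takes the remainder.
    const long active = n - (num_blocks - 1) * threads_per_block;
    return {num_blocks, active, threads_per_block - active};
}
```

For example, with 1000 items and the default block size of 256, this gives 4 blocks, and the last block has 232 threads with work and 24 without. With a block size of 128 it gives 8 blocks, and the last block has 104 threads with work. Whenever the item count is not a multiple of the block size, the last block is only partially filled.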

For the equations, I can only give some general advice, which is to use profiling tools such as AMReX TinyProfiler or NVIDIA Nsight Systems and to …

Answer selected by lwJi