MDRange iteration order transformation #49

Draft
wants to merge 37 commits into
base: main
from

Conversation

cwpearson
Member

@cwpearson cwpearson commented Dec 19, 2024

Note

Work in progress
I'll squash / tidy this up once it's ready to go.

Improves memory access locality for GPUs.

  • The Cost of a memref access is modeled as its reuse distance, scaled by the loop trip count and weighted by whether the access is a load or a store
  • The reuse distance of a memref is the partial derivative of its offset w.r.t. the right-most induction variable of the enclosing scf.parallel
    1. Compute d(offset)/d(memref index variable) for all of the memref's index variables
    2. Compute d(memref index variable)/d(scf.parallel induction variable) for all of the memref's index variables
      • may not be computable
    3. Multiply (1) by (2) for each index variable and sum the products (chain rule) to get the desired partial derivative d(offset)/d(scf.parallel induction variable)
      • may not be computable due to (2)
    4. The resulting Cost will usually include unknown memref extents and loop trip counts
      • or may not be computable due to (2)
  • Phase 1: walk the module and compute a map of (scf.parallel) -> trip count
  • Phase 2: walk the module and compute a map of (memref, induction variable) -> Cost, to refer to in phase 3
    • incorporate reuse distance
    • incorporate load vs store
    • incorporate loop trip count
  • Phase 3: consider all possible choices for right-most induction variable for all scf.parallel
    • iterate all choices of right-most induction variable
    • Add up the evaluated cost model for all memrefs
    • Evaluate the cost model with Monte-Carlo method for unknowns
      • principled "sampling" method?
        • log random
      • principled "aggregation" method?
        • median
  • Phase 4: generate a new module
    • clone original module
    • modify each scf.parallel to use the chosen right-most induction variable
  • Phase 5: does it work?
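
The reuse-distance steps above can be sketched in a few lines of Python. This is an illustration only, not the code in this PR: it assumes a dense row-major memref with concrete extents (the PR notes that extents are often unknown) and index expressions that are affine in the induction variables. The names `row_major_strides` and `reuse_distance`, and the dict encoding of index expressions, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical encoding: each memref index is {induction variable: coefficient},
# or None when d(index)/d(induction variable) is not computable.
IndexExpr = Optional[Dict[str, int]]


def row_major_strides(extents: Sequence[int]) -> List[int]:
    """Step 1: d(offset)/d(index k) for a dense row-major memref (assumed layout)."""
    strides = [1] * len(extents)
    for k in range(len(extents) - 2, -1, -1):
        strides[k] = strides[k + 1] * extents[k + 1]
    return strides


def reuse_distance(extents: Sequence[int],
                   index_exprs: Sequence[IndexExpr],
                   iv: str) -> Optional[int]:
    """Steps 2 and 3: chain rule over all index variables of the memref."""
    total = 0
    for stride, expr in zip(row_major_strides(extents), index_exprs):
        if expr is None:
            return None  # step 2 is not computable, so neither is the result
        total += stride * expr.get(iv, 0)  # d(offset)/d(idx) * d(idx)/d(iv)
    return total


# A[i][j] on a 64 x 128 memref: j is the cheaper right-most variable.
a_ij = [{"i": 1}, {"j": 1}]
print(reuse_distance((64, 128), a_ij, "j"))  # 1
print(reuse_distance((64, 128), a_ij, "i"))  # 128
```

With symbolic extents the same sum would be carried as an expression rather than an integer, which is what makes the later Monte-Carlo evaluation necessary.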

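The Monte-Carlo selection step could look roughly like the sketch below. It is not the PR's implementation: it assumes "log random" means sampling each unknown so that its logarithm is uniform over a range, uses the median as the aggregate, and represents each candidate's summed cost as a plain Python callable. The function names, the sampling range, and the sample count are all hypothetical.

```python
import math
import random
import statistics
from typing import Callable, Dict, Sequence

Env = Dict[str, int]
CostFn = Callable[[Env], float]


def log_random(rng: random.Random, lo: int = 1, hi: int = 2 ** 20) -> int:
    """Sample an integer whose logarithm is uniform on [log(lo), log(hi)]."""
    return int(round(math.exp(rng.uniform(math.log(lo), math.log(hi)))))


def median_cost(cost_fn: CostFn, unknowns: Sequence[str],
                samples: int = 1001, seed: int = 0) -> float:
    """Evaluate cost_fn on random values of the unknowns; aggregate by median."""
    rng = random.Random(seed)  # fixed seed: every candidate sees the same samples
    costs = []
    for _ in range(samples):
        env = {name: log_random(rng) for name in unknowns}
        costs.append(cost_fn(env))
    return statistics.median(costs)


def best_rightmost(candidates: Dict[str, CostFn],
                   unknowns: Sequence[str]) -> str:
    """Pick the right-most induction variable with the lowest median cost."""
    return min(candidates, key=lambda iv: median_cost(candidates[iv], unknowns))


# Hypothetical costs for A[i][j] with unknown extent N1 and trip counts T_i, T_j:
# reuse distance is N1 when i is right-most and 1 when j is right-most.
candidates = {
    "i": lambda env: env["N1"] * env["T_i"],
    "j": lambda env: 1 * env["T_j"],
}
choice = best_rightmost(candidates, ["N1", "T_i", "T_j"])
```

Reusing one seed for every candidate means all candidates are compared on identical samples, so differences in the medians come from the cost model rather than from sampling noise.
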
Signed-off-by: Carl Pearson <cwpears@sandia.gov>
@cwpearson cwpearson force-pushed the feature/mdrange-memory-access branch from 224c3e2 to 9f938d0 Compare January 24, 2025 17:09
@cwpearson cwpearson force-pushed the feature/mdrange-memory-access branch from 1a84988 to 856f904 Compare January 24, 2025 17:42
@cwpearson cwpearson force-pushed the feature/mdrange-memory-access branch from faf3874 to e878b59 Compare January 24, 2025 19:30
@vmiheer vmiheer requested review from brian-kelley and removed request for brian-kelley January 26, 2025 08:11
2 participants