MDRange iteration order transformation #49

cwpearson · 2024-12-19T20:14:35Z

Note

Work in progress
I'll squash / tidy this up once it's ready to go.

Improves memory access locality for GPUs.

The Cost of a memref is modeled as its reuse distance, scaled by the loop trip count and by whether the access is a load or a store.
The reuse distance of a memref is the partial derivative of its offset w.r.t. the right-most induction variable of the enclosing scf.parallel. It is computed as follows:
1. Compute d(offset)/d(memref index variable) for each of the memref's index variables
2. Compute d(memref index variable)/d(scf.parallel induction variable) for each of the memref's index variables
  - may not be computable
3. Combine (1) and (2) by the chain rule, summing the product of the two quantities over the memref's index variables, to get the desired partial derivative d(offset)/d(scf.parallel induction variable)
  - may not be computable due to (2)
4. The resulting Cost will usually include unknown memref extents and loop trip counts
  - or may not be computable due to (2)
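
The steps above can be sketched on concrete numbers. This is only an illustration, not the code in the pass: it assumes a row-major (LayoutRight) memref with known integer extents, whereas the pass works with symbolic expressions. All names (`Access`, `rowMajorStrides`, `reuseDistance`, `accessCost`) and the `storeWeight` parameter are hypothetical, and taking the absolute value of the distance is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <vector>

// Hypothetical model of one memref access; these names are not from the pass.
struct Access {
  std::vector<int64_t> extents;  // memref extents, outermost dimension first
  // dIndexDIv[k][j] = d(index variable k)/d(induction variable j).
  // std::nullopt marks a derivative that is not computable (step 2).
  std::vector<std::vector<std::optional<int64_t>>> dIndexDIv;
  bool isStore;
};

// Step 1: for a row-major layout, d(offset)/d(index k) is the stride of
// dimension k, i.e. the product of the extents to its right.
std::vector<int64_t> rowMajorStrides(const std::vector<int64_t>& extents) {
  std::vector<int64_t> strides(extents.size(), 1);
  for (std::size_t k = extents.size(); k-- > 1;) {
    strides[k - 1] = strides[k] * extents[k];
  }
  return strides;
}

// Steps 2 and 3: chain rule, summing stride[k] * d(index k)/d(iv) over k.
// Returns std::nullopt if any required derivative is not computable.
std::optional<int64_t> reuseDistance(const Access& a, std::size_t iv) {
  const std::vector<int64_t> strides = rowMajorStrides(a.extents);
  int64_t total = 0;
  for (std::size_t k = 0; k < strides.size(); ++k) {
    const std::optional<int64_t>& d = a.dIndexDIv[k][iv];
    if (!d) return std::nullopt;
    total += strides[k] * *d;
  }
  return total;
}

// Step 4: scale the distance by the trip count and a load/store factor.
// The form of the scaling (a plain product, stores weighted by storeWeight)
// is an assumption made for this sketch.
std::optional<int64_t> accessCost(const Access& a, std::size_t iv,
                                  int64_t tripCount, int64_t storeWeight) {
  const std::optional<int64_t> dist = reuseDistance(a, iv);
  if (!dist) return std::nullopt;
  const int64_t scale = a.isStore ? storeWeight : 1;
  return std::abs(*dist) * tripCount * scale;
}
```

For an access `A[i][j]` with extents `{4, 8}` and induction variables `(i, j)`, this sketch gives a reuse distance of 1 w.r.t. `j` and 8 w.r.t. `i`, which is why iterating over `j` fastest would be preferred under this model.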

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

mlir/lib/Dialect/Kokkos/Transforms/KokkosMdrangeIterationPass.cpp

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

cwpearson added 4 commits December 19, 2024 13:13

mdrange: Initial skeleton

a911162

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

mdrange: missing header

aca4a69

mdrange: remove unneeded defs

6322fbe

mdrange: no-op test of no-op pass

bf2e025

vmiheer reviewed Dec 20, 2024

mlir/lib/Dialect/Kokkos/Transforms/KokkosMdrangeIterationPass.cpp (outdated, resolved)

cwpearson added 15 commits January 14, 2025 10:59

find some relevant stuff in the module

c1f6e48

Symbolic reuse distance and additional utilities

c0f7fac

Walk all possible parallel loop configurations

ba6217a

primitive monte-carlo

b9d6325

remove unused walk_selections

c98872f

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

Add Sub and Div (useful for trip counts)

9ceabce

Fix div symbol

0a33688

expressions for parallel trip counts

9780455

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

Incorporate trip count into cost model

02273e6

use llvm::DenseMap for ParallelTripCounts

641a3c9

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

use llvm::DenseMap for MemrefInductionCosts

da5942a

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

use llvm::DenseMap in ParallelConfig

5090616

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

remove unused VecMap

cf6c5e7

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

better stack scoping

4801b95

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

factor out memref cost generation

9f938d0

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

cwpearson force-pushed the feature/mdrange-memory-access branch from 224c3e2 to 9f938d0 January 24, 2025 17:09

cwpearson added 2 commits January 24, 2025 10:12

replace loop with DenseMap::insert

e5f777a

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

Improve names & comments, SmallVector for parallel op stack

856f904

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

cwpearson force-pushed the feature/mdrange-memory-access branch from 1a84988 to 856f904 January 24, 2025 17:42

cwpearson added 3 commits January 24, 2025 10:47

clone and replace module ops

187a0c3

fix redundant scf.reduce, permute scf parallel

fa9a22b

add nested parallel test

e878b59

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

cwpearson force-pushed the feature/mdrange-memory-access branch from faf3874 to e878b59 January 24, 2025 19:30

cwpearson added 2 commits January 24, 2025 14:34

Improve naming, use SmallVector in ParallelConfig

60fc12e

Fix walk_configurations

67196be

cwpearson added 2 commits January 24, 2025 15:58

helper fn for iterating over nested ops

757ab30

remove redundant modeling calls

1ac37d7

vmiheer requested review from brian-kelley and removed request for brian-kelley January 26, 2025 08:11

cwpearson added 9 commits May 15, 2025 14:01

incorporate enclosing parallel trip count

ed1602e

simplify parallel region traversal

403dd7d

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

handle nesting in cost table, left-most induction variable in cost

218d8e1

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

LayoutRight cost, model cost of nested loops

cc9eec0

Simplify building costs of memrefs

a46ace9

incorporate load/store scale factor, guard prints with macro

7c40ba5

More logging improvements

88f48fb

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

more MDRange loop ordering tests

fc9ea8d

prevent overflow in Expr eval, more consistent MC simulations

297b31a

Signed-off-by: Carl Pearson <cwpears@sandia.gov>
