v0.4.3

Latest

github-actions released this 23 Jul 19:26

c146374

AcceleratedKernels v0.4.3

Diff since v0.4.2

Made ScanPrefixes the default algorithm for accumulate / cumsum / cumprod. It is almost always faster than DecoupledLookback on real-world data and does not depend on cross-block communication, even though DecoupledLookback has better theoretical asymptotic scalability.
Prepared AcceleratedKernels for the upcoming PoCL backend becoming the default KernelAbstractions CPU backend; the Threads-based algorithms will remain the defaults until the PoCL ones become faster.
A lot of housekeeping.
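
To illustrate why a prefix-scan approach needs no cross-block communication, here is a minimal plain-Python sketch of the common "scan each block, scan the block totals, add the offsets" scheme. It is an assumption that ScanPrefixes follows this general shape; the notes above do not spell out its implementation, and this is not AcceleratedKernels code. The name `blocked_inclusive_scan` and the `block_size` parameter are hypothetical, chosen only for this example.

```python
from itertools import accumulate
from operator import add


def blocked_inclusive_scan(values, op=add, block_size=4):
    """Inclusive scan in three passes over fixed-size blocks (illustrative only)."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    # Pass 1: scan each block on its own; no block reads another block's data.
    blocks = [
        list(accumulate(values[i:i + block_size], op))
        for i in range(0, len(values), block_size)
    ]
    # Pass 2: scan the per-block totals to get the running prefix at each block's end.
    totals = list(accumulate((b[-1] for b in blocks), op))
    # Pass 3: combine each later block with the prefix of all blocks before it.
    out = list(blocks[0]) if blocks else []
    for prefix, block in zip(totals, blocks[1:]):
        out.extend(op(prefix, x) for x in block)
    return out


result = blocked_inclusive_scan(list(range(1, 11)), block_size=4)
print(result)
```

Each pass only consumes results that the previous pass has fully written, so blocks never have to wait on one another while running. A decoupled-lookback scan instead finishes in a single pass by having each block poll the status of its predecessors, which is where the cross-block communication (and the need for atomic orderings) comes from.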

Merged pull requests:

Typo in accumulate benchmarks (#42) (@christiangnrd)
Use UnsafeAtomics to fix race in accumulate (#44) (@vchuravy)
Stop relying on backend type to determine algorithm used (#45) (@christiangnrd)
Test both 1d accumulate algorithms when supported (#49) (@christiangnrd)
neutral_element fixes (#52) (@christiangnrd)
Deduplicate reduce_group (#55) (@christiangnrd)
Tweak backend selection (#56) (@christiangnrd)
Vc/accumulate alg: made ScanPrefixes the default accumulate algorithm; added atomic orderings to DecoupledLookback. (#57) (@anicusan)

Closed issues:

Port over GPUArrays neutral_element fixes (#51)

Contributors

vchuravy, anicusan, and christiangnrd
