Matmul performance optimization with cuBlasLt #46431
Merged: Xreki merged 76 commits into PaddlePaddle:develop from JamesLim-sy:add_autotune_kernel_tool on Feb 26, 2023.
Commits (76):
- 7f42952 for 1st time interface combine. (JamesLim-sy)
- 1dba1a6 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (JamesLim-sy)
- 96cf58a another first commit (JamesLim-sy)
- 07677e3 first commit (JamesLim-sy)
- ee801c3 first commit (JamesLim-sy)
- 67bf57c merge alloc together (JamesLim-sy)
- 64ee6d7 remove the autotune.h file (JamesLim-sy)
- de873b4 add CheckEighResult for both sysej and evd kernel (JamesLim-sy)
- 3aa505a profile reduce kernel for fp16 and reduceHigherdim (zhangbopd)
- c6c5ca2 Merge branch 'PaddlePaddle:develop' into develop (zhangbopd)
- 4187e28 Merge branch 'PaddlePaddle:develop' into develop (zhangbopd)
- 14180cc use reinterpret_cast (zhangbopd)
- 2c6eaa5 fix for CI on ROCm (zhangbopd)
- 1eaf75f add Macro for ROCm (zhangbopd)
- 5f8c72b ROCm CI config (zhangbopd)
- 444b1c4 ROCm CI config (zhangbopd)
- fbb8361 unit test repair (zhangbopd)
- 19de67a Merge branch 'PaddlePaddle:develop' into develop (zhangbopd)
- 427e98c Merge branch 'PaddlePaddle:develop' into develop (zhangbopd)
- cbf1f3d Merge branch 'PaddlePaddle:develop' into develop (zhangbopd)
- c6dbe30 pull (zhangbopd)
- 2a9ef0a add common_funcs.h (zhangbopd)
- ba99367 reduceType (zhangbopd)
- d326b58 Update reduce_function.h (zhangbopd)
- 2ccb0ea not higher (zhangbopd)
- 2a14bdb conflict fix (zhangbopd)
- ff38003 rename (zhangbopd)
- 3c7e544 Merge branch 'PaddlePaddle:develop' into develop (zhangbopd)
- 66475ea Merge branch 'PaddlePaddle:develop' into develop (zhangbopd)
- e3fd59b implement of matmul using cublasLt instead of cublas (zhangbopd)
- 9c2b658 Merge branch 'PaddlePaddle:develop' into matmul-autotune (zhangbopd)
- 218990e cublasLt bugfix (zhangbopd)
- 40d66f9 Merge branch 'matmul-autotune' of https://github.com/zhangbopd/Paddle… (zhangbopd)
- f49b23d Update matmul_kernel_impl.h (zhangbopd)
- 8e8dda6 Update matmul_kernel_impl_via_blasLt.h (zhangbopd)
- 75e83bb for-loop-algo (zhangbopd)
- 192a1a8 PR comments changes (zhangbopd)
- e636886 add macro (zhangbopd)
- 5783696 ci unused variable isCublasLt (zhangbopd)
- 11bf150 ci unused variable isCublasLt macro (zhangbopd)
- 8eb3aa8 split matmul to autotune (zhangbopd)
- 9405067 Merge branch 'matmul-autotune' of https://github.com/zhangbopd//Paddl… (JamesLim-sy)
- c780a94 [WIP]: temporary storage of codes (zhangbopd)
- 307c89e [WIP] temp storage (zhangbopd)
- 3dae0f9 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (zhangbopd)
- e0c40bc temp storage (zhangbopd)
- 2e8a684 temp storage (zhangbopd)
- 65a77d9 add some changes (zhangbopd)
- d876e7e add some changes (zhangbopd)
- faaa937 temp storage for changing cublasLtWithBatch computation (zhangbopd)
- 5285192 temp storage of compile-time debug (zhangbopd)
- f02d2e9 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (zhangbopd)
- adec3bd Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (zhangbopd)
- 4372c0d add some changes (zhangbopd)
- 6791ff5 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (zhangbopd)
- 7f5b526 fix bugs for ci (zhangbopd)
- 80fafc5 revert the case number written style (zhangbopd)
- 8fe0afe revert the case number written style (zhangbopd)
- fbda72c add some changes (JamesLim-sy)
- 6ec9106 add some changes for matmul_auto_tune (JamesLim-sy)
- ad58d06 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (zhangbopd)
- c4a540d revise the data format (JamesLim-sy)
- fea6614 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (JamesLim-sy)
- 335c134 fix according to CI and review advices (JamesLim-sy)
- cb7c608 fix bugs according to CI (JamesLim-sy)
- 9cdfa2c change according to ci (JamesLim-sy)
- e2ed925 Merge branch 'develop' into add_autotune_kernel_tool (Xreki)
- c1a7448 Polish codes. (Xreki)
- eef4555 Warp the matmul function and revert the change of matmul_grad_kernel. (Xreki)
- 9044737 Simplify the codes. (Xreki)
- 16864be Fix typo. (Xreki)
- cc539d7 Fix compiling error. (Xreki)
- cf85133 Merge branch 'develop' into add_autotune_kernel_tool (Xreki)
- c35bdea Fix compiling error when no gpu. (Xreki)
- e863cbe Add the missing argument. (Xreki)
- febeb01 Merge branch 'develop' into add_autotune_kernel_tool (Xreki)
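The branch name (add_autotune_kernel_tool) and commits such as "for-loop-algo" and "split matmul to autotune" suggest that the PR selects a cuBLASLt matmul algorithm by trying candidates. The following is a minimal sketch of that core step, assuming the tuner simply times each candidate once and keeps the fastest. In a real cuBLASLt setting the candidates might come from `cublasLtMatmulAlgoGetHeuristic`, and `measure` would wrap a timed `cublasLtMatmul` call. `PickFastestAlgo` and the `measure` callback are hypothetical stand-ins, not Paddle's API, so the selection logic can run without a GPU.

```cpp
#include <cassert>
#include <functional>
#include <limits>

// Hypothetical helper (not Paddle's API): evaluate every candidate algorithm
// once via `measure`, which returns the elapsed time for candidate i (for
// example, a CUDA-event timer around a cublasLtMatmul call), and return the
// index of the fastest candidate. Returns -1 if there are no candidates.
inline int PickFastestAlgo(int num_candidates,
                           const std::function<double(int)>& measure) {
  int best = -1;
  double best_time = std::numeric_limits<double>::infinity();
  for (int i = 0; i < num_candidates; ++i) {
    const double t = measure(i);  // elapsed time of candidate i
    if (t < best_time) {          // keep the strictly faster candidate
      best_time = t;
      best = i;
    }
  }
  return best;
}
```

Timing each candidate once keeps the sketch short; a production tuner would more likely warm up and average several runs per candidate, because single GPU timings are noisy.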
Review comment: It seems that when AutoTune is not enabled, this adds one extra cache lookup.
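Reply: This is hard to avoid. The AutoTune-off state occurs both before tuning is enabled and after it has finished, and the logic here is the same as in conv_udnn_v7.h.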
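The exchange above can be illustrated with a hypothetical algorithm cache. `MatmulKey`, `MatmulKeyHash` and `MatmulAlgoCache` are illustrative names, not Paddle's types, and the key fields are an assumption. Because tuned results should still be reused after the tuning window closes, `Select` consults the cache on every call, including when tuning is off. That lookup is the extra cost the reviewer pointed out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical cache key describing one matmul problem.
struct MatmulKey {
  int64_t m, n, k;
  bool trans_x, trans_y;
  bool operator==(const MatmulKey& o) const {
    return m == o.m && n == o.n && k == o.k && trans_x == o.trans_x &&
           trans_y == o.trans_y;
  }
};

struct MatmulKeyHash {
  std::size_t operator()(const MatmulKey& key) const {
    std::size_t h = std::hash<int64_t>()(key.m);
    // Mix in the remaining fields with a boost-style hash_combine.
    auto combine = [&h](std::size_t v) {
      h ^= v + 0x9e3779b9 + (h << 6) + (h >> 2);
    };
    combine(std::hash<int64_t>()(key.n));
    combine(std::hash<int64_t>()(key.k));
    combine(std::hash<bool>()(key.trans_x));
    combine(std::hash<bool>()(key.trans_y));
    return h;
  }
};

class MatmulAlgoCache {
 public:
  static constexpr int kDefaultAlgo = -1;  // "let the library choose"

  // The lookup runs even when tuning is disabled: after the tuning window
  // ends, previously tuned results must still be found and reused.
  int Select(const MatmulKey& key, bool tuning_enabled,
             const std::function<int()>& tune) {
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // tuned earlier
    if (!tuning_enabled) return kDefaultAlgo;   // before the window: no entry
    const int algo = tune();                    // inside the window: tune once
    cache_.emplace(key, algo);
    return algo;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  std::unordered_map<MatmulKey, int, MatmulKeyHash> cache_;
};
```

Before the window the lookup misses and the default algorithm is returned. Inside the window `tune` runs once and its result is stored. Afterwards the cached result is returned without re-tuning, even though tuning is off again.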