SgdSparseCpuTraining is much slower at 93e4d0cce than at 45c81a414f #366
Comments
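@linrongyi Thanks for reporting this. Could you post the steps you used to measure the speed of SgdSparseCpuTraining? We will ask colleagues on the Paddle team to reproduce the problem as well, and then look further into the cause and a fix.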
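@linrongyi We are confirming the purpose of PR #359. A performance gap on the order of 10x is most likely a usage problem; once we have confirmed what mkl_lapacke.h does, we will discuss how to resolve it. Related issues: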
@backyes Adding mkl_lapacke.h to the MKL package is enough.
@linrongyi The problem has been resolved:
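When compiling today I checked out the latest commit, 93e4d0c, and found that SgdSparseCpuTraining is much slower than at 45c81a4: roughly 10 times slower, or more.
The MKL bundled with @backyes's internal tooling does not include mkl_lapacke.h, so 93e4d0cce fails to compile. I therefore found an MKL package that does include mkl_lapacke.h, and built both versions against that same MKL.
Because the problem of AVX not being enabled in the GPU build has not been fixed yet, I compiled with -DWITH_GPU=OFF.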
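For anyone trying to reproduce the comparison, here is a minimal sketch of how the slowdown between the two builds could be measured. It is not the procedure used in this report. The commands in `old_cmd` and `new_cmd` are hypothetical placeholders and must be replaced with the actual SgdSparseCpuTraining invocation from each build; the helper names `time_command` and `slowdown` are likewise made up for illustration.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(baseline_seconds, candidate_seconds):
    """Return how many times slower the candidate is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return candidate_seconds / baseline_seconds


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the real training command
    # from the 45c81a4 build and from the 93e4d0c build.
    old_cmd = [sys.executable, "-c", "pass"]
    new_cmd = [sys.executable, "-c", "pass"]

    old_t = time_command(old_cmd)
    new_t = time_command(new_cmd)
    print(f"45c81a4: {old_t:.3f}s  93e4d0c: {new_t:.3f}s  "
          f"ratio: {slowdown(old_t, new_t):.1f}x")
```

Taking the best of several runs reduces noise from other processes on the machine. Both builds should use the same MKL package and the same CMake options (for example -DWITH_GPU=OFF, as above) so that only the commit differs.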