SgdSparseCpuTraining is much slower at 93e4d0cce than at 45c81a414f #366

linrongyi · 2016-11-05T14:22:06Z

When building today I checked out the latest commit, 93e4d0c, and found that SgdSparseCpuTraining is much slower than at 45c81a4, by roughly 10x or more.

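The MKL in @backyes's internal toolchain does not ship mkl_lapacke.h, so 93e4d0cce fails to compile. I therefore found an MKL package that does include mkl_lapacke.h and built both versions against that MKL.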

Because the bug where the GPU build does not enable AVX has not been fixed yet, I compiled with -DWITH_GPU=OFF.
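The missing-header condition described above could be checked before starting a build. The following is only a minimal sketch: the helper name `missing_mkl_headers` and the default list of required headers are assumptions made for illustration, not part of Paddle's build system.

```python
from pathlib import Path


def missing_mkl_headers(include_dir, required=("mkl.h", "mkl_lapacke.h")):
    """Return the required header names that are absent from include_dir.

    Both the function name and the default `required` tuple are
    illustrative assumptions; adjust them to the headers your build needs.
    """
    include_path = Path(include_dir)
    # A header counts as present only if it exists as a regular file.
    return [name for name in required if not (include_path / name).is_file()]
```

Calling it on an MKL include directory (for example a hypothetical `/path/to/mkl/include`) returns an empty list when every listed header is present, and the names of the absent ones otherwise.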


wangkuiyi · 2016-11-06T01:44:43Z

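@linrongyi Thanks for reporting this. Could you post the steps you used to measure the speed of SgdSparseCpuTraining? We will ask the paddle team to reproduce the problem as well, and then look for the cause and a fix.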
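One way to make such a speed report reproducible is to time the same command under both builds and report the ratio. The sketch below is an assumption about how that could be done, not the procedure used in this thread; `time_command` and `slowdown` are hypothetical helpers, and the command in the example is a placeholder rather than a real Paddle training invocation.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a failed run is never timed as a success.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(baseline_seconds, candidate_seconds):
    """Return how many times slower the candidate is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return candidate_seconds / baseline_seconds


if __name__ == "__main__":
    # Placeholder command: substitute the training command of each build here.
    placeholder = [sys.executable, "-c", "pass"]
    old_build = time_command(placeholder)
    new_build = time_command(placeholder)
    print(f"slowdown: {slowdown(old_build, new_build):.2f}x")
```

Taking the best of several runs reduces noise from caching and other processes; both builds should be timed with the same data, configuration, and thread count.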

backyes · 2016-11-06T04:09:33Z

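@linrongyi We are confirming the purpose of PR #359. A performance difference on the order of 10x is most likely caused by a usage problem; we will discuss a fix once we have confirmed what mkl_lapacke.h is for.
@lzhao4ever FYI.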

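Related issue:
The GPU AVX bug has already been fixed in @hedaoyuan's PR #239. The bug is fairly urgent; @hedaoyuan, could you check whether you can provide a standalone patch for the fix?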

emailweixu · 2016-11-07T01:11:50Z

@backyes Just add mkl_lapacke.h to the MKL.

luotao1 · 2016-11-07T02:26:17Z

In composer_xe_2013.0.079/mkl/include/mkl.h, it includes mkl_lapack.h:

backyes · 2016-11-07T03:40:57Z

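@linrongyi The problems have been resolved:

Performance problem: the root cause is that @linrongyi used a third-party MKL library that itself has performance problems.
Compile problem introduced by mkl_lapacke.h: fixed by adding the mkl_lapacke.h header file. (The MKL we provide was missing the header file, not the function implementations.)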


wangkuiyi assigned backyes Nov 6, 2016

backyes mentioned this issue Nov 6, 2016

include mkl_lapacke.h #359

Merged

backyes closed this as completed Nov 7, 2016

backyes reopened this Nov 7, 2016

backyes closed this as completed Nov 7, 2016

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this issue Dec 9, 2021

rm models module (PaddlePaddle#366)

fa6b699

* rm models module

AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this issue Sep 19, 2022

add stargan pretrain model (PaddlePaddle#366)

f930083

WAYKEN-TSE pushed a commit to WAYKEN-TSE/Paddle that referenced this issue Dec 6, 2024

Reproduce forward inference for the AudioLDM2 model (PaddlePaddle#366)

f049e2d

Task: PaddlePaddle/PaddleMIX#250 - text-to-audio inference now runs end to end

lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Mar 26, 2025

Merge pull request PaddlePaddle#366 from graphcore/yzx-0804-mv

18e9c48

[IPU] update docs and custom ops demo

tianyuzhou668 pushed a commit to tianyuzhou668/Paddle that referenced this issue May 12, 2025

[Zero-Dim] add median, where 0-dim ut (PaddlePaddle#366)

b411c4d
