[BIT] nonzero #72244
Your PR was submitted successfully. Thank you for contributing to the open-source project!
Reviewed. Chen, could you please make the requested changes? Thanks 😉
LGTM
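Since this change touches InferMeta, please read up on symbolic shape inference and update the symbolic inference rule for this op as well.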
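The stated reason for having no backward fix should be: nonzero is a forward-only op, so it neither needs nor is able to run a backward pass~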
LGTM
* nonzero-size * fix * fix * fix * fix * fix * fix
PR Category
Execute Infrastructure
PR Types
New features
Description
Add 0-size tensor support to nonzero.
1. Error log:
2025-03-03 18:44:56.987675 test begin: paddle.nonzero(Tensor([0, 2, 28, 28],"float32"), )
[cuda error] paddle.nonzero(Tensor([0, 2, 28, 28],"float32"), )
(External) CUDA error(9), invalid configuration argument.
[Hint: 'cudaErrorInvalidConfiguration'. This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requestingmore shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks.See cudaDeviceProp for more device limitations.] (at ../paddle/fluid/pybind/eager_functions.cc:138
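A plausible reading of the log above (an assumption, not confirmed by this PR) is that the launch configuration is derived from the element count, so a 0-size input yields a grid of zero blocks, which CUDA rejects as an invalid configuration. The arithmetic can be sketched in Python; `blocks_for` and `should_launch` are hypothetical names used only for illustration:

```python
def blocks_for(numel, threads_per_block=256):
    """Ceil-divide the element count by the block size (a common launch pattern)."""
    return (numel + threads_per_block - 1) // threads_per_block


def should_launch(numel):
    """Hypothetical guard: skip the kernel launch entirely for 0-size inputs."""
    return numel > 0


numel = 0 * 2 * 28 * 28          # shape [0, 2, 28, 28] from the failing case
print(blocks_for(numel))         # zero blocks: not a valid CUDA grid size
print(should_launch(numel))      # the guard says to return early instead
```

With a guard of this kind, the kernel returns an empty output before any launch is attempted.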
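2. Forward fix:
a. The core call is the _C_ops.nonzero function
b. Added shape inference for the 0-size case to InferMeta
c. Updated the symbolic shape inference to handle the 0-size case
d. The XPU, GPU, and CPU kernels now return an `out` for 0-size input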
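The shape rule in steps b and c can be illustrated with a small sketch. It assumes the usual convention that nonzero returns one row of coordinates per nonzero element, so the output is `[N, rank]` with `N` unknown until runtime (written as `-1` here); for a 0-size input `N` is known to be 0. The helper name `infer_nonzero_out_shape` is hypothetical and is not Paddle's InferMeta code:

```python
from math import prod


def infer_nonzero_out_shape(x_shape):
    """Hypothetical static shape rule for nonzero's output.

    Returns [0, rank] when the input has no elements, otherwise [-1, rank],
    where -1 marks a dimension that is only known at runtime.
    """
    rank = len(x_shape)
    if prod(x_shape) == 0:
        return [0, rank]
    return [-1, rank]


print(infer_nonzero_out_shape([0, 2, 28, 28]))  # 0-size input
print(infer_nonzero_out_shape([3, 4]))          # regular input
```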
3. Backward fix:
None needed: the API has no backward operation
4. Added unit tests:
a. Added verification tests for both is_tuple = True and is_tuple = False
b. Added a symbolic inference verification test for the 0-size case
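The expected results for such tests can be sketched with a pure-Python reference. This is only an illustration of the two output modes, not the test code added in this PR; `reference_nonzero` and its flat row-major input format are assumptions made for the example:

```python
from itertools import product
from math import prod


def reference_nonzero(flat, shape, is_tuple=False):
    """Coordinates of nonzero entries of a row-major flattened tensor.

    is_tuple=False: list of [i0, i1, ...] rows, one per nonzero element.
    is_tuple=True:  tuple with one index list per axis.
    """
    assert len(flat) == prod(shape)
    all_coords = product(*(range(d) for d in shape))  # row-major order
    coords = [c for c, v in zip(all_coords, flat) if v != 0]
    if is_tuple:
        return tuple([c[axis] for c in coords] for axis in range(len(shape)))
    return [list(c) for c in coords]


# 0-size input: no coordinates in either mode.
print(reference_nonzero([], [0, 2, 28, 28]))
print(reference_nonzero([], [0, 2, 28, 28], is_tuple=True))
# Regular 2x2 input [[0, 1], [2, 0]].
print(reference_nonzero([0, 1, 2, 0], [2, 2]))
```

For a 0-size input the tuple mode still returns one (empty) entry per axis, which is the property a 0-size test would want to pin down.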
5. Regression testing:
Tested the APIs that appear in the 0size-tensor and merged6 files; all pass