[PHI] fix paddle.Tensor.logit for big tensor #73046
Merged
PR Category
Execute Infrastructure
PR Types
Bug fixes
Description
The error reported for paddle.Tensor.logit is as follows:

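The error shows that the result of the logit backward computation does not match torch: some values in torch's backward result are NAN, whereas paddle initializes them to 0. According to the API description at https://www.paddlepaddle.org.cn/documentation/docs/zh/3.0-beta/api/paddle/logit_cn.html#logit, when the eps parameter is set to None, the outputs for inputs > 1 or < 0 are set to NAN. Testing showed, however, that torch also sets the backward result for these inputs to NAN, while paddle sets it to 0. The problem is not limited to big tensors; it also occurs with other, smaller shapes.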
Fix: modify the LogitGradFunctor.operator method called by logitGradkernel so that, when eps is not set, the fill value is changed from 0 to NAN.
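The intended backward behavior can be sketched in plain Python. This is only an illustrative scalar reference, not the kernel code from this PR: the function name `logit_grad_reference` is hypothetical, and the eps-set branch (zero gradient outside `[eps, 1 - eps]`) is an assumption based on torch's behavior rather than something stated in this description.

```python
import math


def logit_grad_reference(x, grad_out, eps=None):
    """Hypothetical scalar reference for the gradient of logit(x).

    Assumed semantics:
      * eps is None: inputs outside [0, 1] get a NAN gradient
        (the value this PR switches to, instead of 0).
      * eps is set:  inputs outside [eps, 1 - eps] get a 0 gradient
        (assumption, mirroring torch).
    Otherwise the gradient is grad_out / (x * (1 - x)).
    """
    if eps is None:
        if x < 0.0 or x > 1.0:
            return math.nan
    elif x < eps or x > 1.0 - eps:
        return 0.0

    denom = x * (1.0 - x)
    if denom == 0.0:
        # Python raises on float division by zero, so emulate the
        # IEEE result (signed infinity, or NAN for 0 / 0) explicitly.
        if grad_out == 0.0:
            return math.nan
        return math.copysign(math.inf, grad_out)
    return grad_out / denom


if __name__ == "__main__":
    for x in (-0.5, 0.25, 0.5, 1.5):
        print(x, logit_grad_reference(x, 1.0), logit_grad_reference(x, 1.0, eps=1e-6))
```

A check of this kind against the framework output would need a NAN-aware comparison (for example `math.isnan`), since NAN never compares equal to itself.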
Comparison between paddle and torch:
When the eps parameter is set:

When the eps parameter is not set (eps=None):

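Both the forward and backward results match torch.
All previously failing configs in paddleAPITest now pass.
The fix applied in #72973 is essentially the same as the one in this PR. Because #72973 was merged into the develop branch first, this PR serves to improve the unit tests.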
pcard-67164