[BIT] Fix fused_linear, fused_multi_head_attention doc #7336
paddle.incubate.nn.functional.fused_linear
The parameter description for trans_x was missing, so this PR adds it.
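For reference, a minimal plain-Paddle sketch of what `trans_x` controls (the helper name and the reference implementation are mine, based on this PR's description, not the fused cuBLASLt kernel itself): when `trans_x=True`, the input `x` is transposed before the matmul.

```python
import paddle

def fused_linear_reference(x, weight, bias=None, trans_x=False):
    # Plain-Paddle sketch of the documented semantics (not the fused
    # kernel): optionally transpose x, then matmul + bias add.
    out = paddle.matmul(x, weight, transpose_x=trans_x)
    if bias is not None:
        out = out + bias
    return out

x = paddle.randn([3, 4])          # with trans_x=True, treated as [4, 3]
weight = paddle.randn([3, 5])
bias = paddle.zeros([5])
print(fused_linear_reference(x, weight, bias, trans_x=True).shape)  # [4, 5]
```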
paddle.incubate.nn.functional.fused_multi_head_attention
One line of the original document contains what looks like an accidentally typed extra `qkv)`, and the original description also omits the qkv matrix multiplication. The pseudo-code in the source-code docstring is better, but its description of the linear layer applied after the attention score is computed does not match the kernel implementation: in paddle/phi/kernels/fusion/gpu/fused_attention_kernel.cu, in order to use the fused operators fused_dropout_layernorm_helper.ResidualDropoutBias and fused_dropout_layernorm_helper.LayernormResidualDropoutBias, the linear is bias-free, with the bias moved into the fused operator's arguments. The pseudo-code is updated accordingly, which better matches the operator implementation.
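To make the kernel behavior concrete, here is a hedged plain-Paddle sketch of the post-attention tail implied by fused_attention_kernel.cu (the function name, argument list, and layout are mine for illustration and are not the actual doc diff): the output projection carries no bias, and linear_bias is instead consumed inside the fused residual-dropout step.

```python
import paddle
import paddle.nn.functional as F

def attention_tail_reference(out, residual, linear_weight, linear_bias,
                             dropout_rate=0.0, pre_layer_norm=True,
                             ln_weight=None, ln_bias=None):
    # Bias-free projection: the kernel runs a plain GEMM here so that
    # linear_bias can be handled by ResidualDropoutBias /
    # LayernormResidualDropoutBias instead of the GEMM epilogue.
    out = paddle.matmul(out, linear_weight)
    # Fused step: bias add + dropout + residual add in one pass.
    out = residual + F.dropout(out + linear_bias, p=dropout_rate)
    if not pre_layer_norm:
        # Post-layernorm variant, corresponding to
        # LayernormResidualDropoutBias in the kernel.
        out = F.layer_norm(out, out.shape[-1:], ln_weight, ln_bias)
    return out

out = paddle.randn([2, 8, 64])
residual = paddle.randn([2, 8, 64])
w = paddle.randn([64, 64])
b = paddle.zeros([64])
print(attention_tail_reference(out, residual, w, b).shape)  # [2, 8, 64]
```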