PaddlePaddle · zhwesky2010 · Sep 29, 2024 · Sep 22, 2024 · Sep 24, 2024 · Sep 24, 2024
diff --git a/...el_convert/convert_from_pytorch/api_difference/cuda/torch.cuda.Stream__upper.md b/...el_convert/convert_from_pytorch/api_difference/cuda/torch.cuda.Stream__upper.md
@@ -6,10 +6,10 @@
 torch.cuda.Stream(device=None, priority=0, **kwargs)
 ```
 
-### [paddle.device.cuda.Stream](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/device/cuda/Stream_cn.html)
+### [paddle.device.Stream](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.6/api/paddle/device/Stream_cn.html#stream)
 
 ```python
-paddle.device.cuda.Stream(device=None, priority=None)
+paddle.device.Stream(device=None, priority=None)
 ```
 
 The two APIs provide the same functionality but use their parameters differently, as detailed below:
@@ -34,5 +34,5 @@ y = torch.cuda.Stream(priority=default_priority)
 # Paddle usage
 high_priority = 1
 default_priority = 2
-y = paddle.device.cuda.Stream(priority=default_priority)
+y = paddle.device.Stream(priority=default_priority)
 ```
diff --git a/...des/model_convert/convert_from_pytorch/api_difference/cuda/torch.cuda.stream.md b/...des/model_convert/convert_from_pytorch/api_difference/cuda/torch.cuda.stream.md
@@ -6,10 +6,10 @@
 torch.cuda.stream(stream)
 ```
 
-### [paddle.device.cuda.stream_guard](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/device/cuda/stream_guard_cn.html)
+### [paddle.device.stream_guard](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.6/api/paddle/device/stream_guard_cn.html#stream-guard)
 
 ```python
-paddle.device.cuda.stream_guard(stream)
+paddle.device.stream_guard(stream)
 ```
 
 The functionality and the parameters are identical, as detailed below:

diff --git a/...nvert_from_pytorch/api_difference/distributions/torch.distributions.Binomial.md b/...nvert_from_pytorch/api_difference/distributions/torch.distributions.Binomial.md
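@@ -22,7 +22,7 @@ PyTorch supports more parameters than Paddle, as detailed below:
 
 | PyTorch       | PaddlePaddle | Notes                                                        |
 | ------------- | ------ | ------------------------------------------------------------ |
-| total_count        | total_count      | Sample size.                         |
+| total_count        | total_count      | Sample size. When it is not specified in torch, Paddle should set this value to 1.                         |
 | probs           | probs      | Probability that the event occurs in each Bernoulli trial.         |
 | logits         | -  | Log-odds of sampling 1. Paddle has no such parameter, and there is currently no conversion. |
 | validate_args        | -      | Whether to enable validation. Paddle has no such parameter; it generally has little effect on training results and can simply be removed. |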
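
As a rough illustration of the `total_count` row above, the rewrite can be sketched as a keyword-argument transformation. `convert_binomial_kwargs` is a hypothetical helper introduced only for this note (it is not part of PaddlePaddle or of any conversion tool), and it assumes the PyTorch call passed its arguments by keyword.

```python
def convert_binomial_kwargs(torch_kwargs):
    """Hypothetical helper: rewrite torch.distributions.Binomial keyword
    arguments into the form the Paddle Binomial constructor expects,
    following the mapping table above."""
    kwargs = dict(torch_kwargs)  # work on a copy, leave the caller's dict alone
    # validate_args has no Paddle counterpart and can simply be dropped.
    kwargs.pop("validate_args", None)
    # The table lists no conversion for logits, so refuse rather than guess.
    if kwargs.get("logits") is not None:
        raise NotImplementedError("logits has no documented Paddle equivalent")
    kwargs.pop("logits", None)
    # torch falls back to total_count=1; Paddle needs the value spelled out.
    kwargs.setdefault("total_count", 1)
    return kwargs
```

For example, `convert_binomial_kwargs({"probs": 0.3})` yields a dict that also carries `total_count=1`, while an explicit `total_count` is passed through unchanged.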
diff --git a/docs/guides/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.GRU.md b/docs/guides/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.GRU.md
@@ -7,7 +7,9 @@ torch.nn.GRU(input_size,
              bias=True,
              batch_first=False,
              dropout=0,
-             bidirectional=False)
+             bidirectional=False,
+             device=None,
+             dtype=None)
 ```
 
 ### [paddle.nn.GRU](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/GRU_cn.html#gru)
@@ -36,6 +38,8 @@ paddle.nn.GRU(input_size,
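 | batch_first   | time_major   | PyTorch indicates whether batch size is the first dimension; PaddlePaddle indicates whether time steps is the first dimension. The meanings are opposite, so conversion is required.  |
 | dropout   | dropout   | The dropout probability.  |
 | bidirectional | direction    | PyTorch indicates whether the GRU is bidirectional; Paddle uses a string to select a bidirectional GRU (`bidirectional`) or a unidirectional GRU (`forward`). |
+| device   | -   | Device of the Tensor. Paddle has no such parameter; it generally has little effect on training results and can simply be removed.  |
+| dtype   | -   | Desired data type of the Tensor. Paddle has no such parameter; it generally has little effect on training results and can simply be removed. |
 | -             |weight_ih_attr| Parameter attribute of weight_ih. PyTorch has no such parameter; keep the Paddle default.  |
 | -             |weight_hh_attr| Parameter attribute of weight_hh. PyTorch has no such parameter; keep the Paddle default.  |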
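
The `batch_first`, `bidirectional`, `device` and `dtype` rows above (the same pattern recurs in the LSTM and RNN tables of this diff) can be sketched as a small keyword-argument rewrite. `convert_rnn_kwargs` is a hypothetical name used only for illustration, and the sketch assumes keyword-style calls; parameters not covered by these rows are passed through untouched.

```python
def convert_rnn_kwargs(torch_kwargs):
    """Hypothetical helper: rewrite torch.nn.GRU/LSTM/RNN keyword arguments
    into their Paddle form for the rows shown in the table above."""
    kwargs = dict(torch_kwargs)  # work on a copy
    # device and dtype have no Paddle counterpart in this table; drop them.
    kwargs.pop("device", None)
    kwargs.pop("dtype", None)
    # batch_first and time_major have opposite meanings. torch defaults to
    # batch_first=False, so an omitted argument still becomes time_major=True.
    kwargs["time_major"] = not kwargs.pop("batch_first", False)
    # The bidirectional flag (bool) becomes the direction string.
    bidirectional = kwargs.pop("bidirectional", False)
    kwargs["direction"] = "bidirectional" if bidirectional else "forward"
    return kwargs
```

Note the default case: because the two flags are inverted, a PyTorch call that never mentions `batch_first` still needs `time_major=True` on the Paddle side.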
 

diff --git a/docs/guides/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.LSTM.md b/docs/guides/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.LSTM.md
@@ -9,7 +9,9 @@ torch.nn.LSTM(input_size,
               batch_first=False,
               dropout=0,
               bidirectional=False,
-              proj_size=0)
+              proj_size=0,
+              device=None,
+              dtype=None)
 ```
 
 ### [paddle.nn.LSTM](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/LSTM_cn.html#lstm)
@@ -42,6 +44,8 @@ paddle.nn.LSTM(input_size,
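 | dropout   | dropout   | The dropout probability.  |
 | bidirectional | direction    | PyTorch indicates whether the LSTM is bidirectional; Paddle uses a string to select a bidirectional LSTM (`bidirectional`) or a unidirectional LSTM (`forward`). |
 | proj_size     | proj_size            | Projects the LSTM `hidden state` to the given size. |
+| device   | -   | Device of the Tensor. Paddle has no such parameter; it generally has little effect on training results and can simply be removed.  |
+| dtype   | -   | Desired data type of the Tensor. Paddle has no such parameter; it generally has little effect on training results and can simply be removed. |
 | -             |weight_ih_attr| Parameter attribute of weight_ih. PyTorch has no such parameter; keep the Paddle default.  |
 | -             |weight_hh_attr| Parameter attribute of weight_hh. PyTorch has no such parameter; keep the Paddle default.  |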
 

diff --git a/...convert/convert_from_pytorch/api_difference/nn/torch.nn.Module.named_buffers.md b/...convert/convert_from_pytorch/api_difference/nn/torch.nn.Module.named_buffers.md
@@ -1,4 +1,4 @@
-## [ Only parameter names differ ]torch.nn.Module.named_buffers
+## [Only parameter names differ]torch.nn.Module.named_buffers
 
 ### [torch.nn.Module.named_buffers](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.named_buffers)
 
@@ -20,4 +20,4 @@ paddle.nn.Layer.named_buffers(prefix='', include_sublayers=True, remove_duplicat
 | -------------- | ------------ | ------------------------------------------------------------- |
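 | prefix         | prefix       | Prefix prepended to all parameter names.                                            |
 | recurse        | include_sublayers     | Yields the buffers of this module and all submodules; only the parameter name differs.                               |
-| remove_duplicate   | remove_duplicate  | Whether to remove duplicate module instances from the result.                                        |
+| remove_duplicate   | remove_duplicate  | Whether to remove duplicate module instances from the result. |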
diff --git a/...convert/convert_from_pytorch/api_difference/nn/torch.nn.Module.named_modules.md b/...convert/convert_from_pytorch/api_difference/nn/torch.nn.Module.named_modules.md
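@@ -21,4 +21,4 @@ Paddle supports more parameters than PyTorch, as detailed below:
 | memo          | layers_set   | Set that records the sublayers already added to the result; only the parameter name differs.                               |
 | prefix   | prefix  | Prefix prepended to all parameter names.                                            |
 | remove_duplicate   | remove_duplicate  | Whether to remove duplicate module instances from the result.                                            |
-| -         | include_self      | Whether to include the layer itself. PyTorch has no such parameter; keep the Paddle default.                                                |
+| -         | include_self      | Whether to include the layer itself. PyTorch has no such parameter; Paddle must set it to True to match PyTorch.           |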
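
The changed `include_self` row can be read as: always add `include_self=True` on the Paddle side, and rename `memo` to `layers_set`. A minimal sketch of that rewrite follows; `convert_named_modules_kwargs` is a hypothetical helper, and the sketch only covers the rows visible in this hunk.

```python
def convert_named_modules_kwargs(torch_kwargs):
    """Hypothetical helper: rewrite torch.nn.Module.named_modules keyword
    arguments for the Paddle counterpart, per the rows shown above."""
    kwargs = dict(torch_kwargs)  # work on a copy
    # memo and layers_set play the same role; only the name differs.
    if "memo" in kwargs:
        kwargs["layers_set"] = kwargs.pop("memo")
    # PyTorch yields the module itself, so Paddle must be told to do the same.
    kwargs["include_self"] = True
    return kwargs
```

Setting the flag unconditionally is a design choice of this sketch: PyTorch offers no way to exclude the root module, so there is nothing on the PyTorch side to read the value from.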
diff --git a/...vert/convert_from_pytorch/api_difference/nn/torch.nn.Module.named_parameters.md b/...vert/convert_from_pytorch/api_difference/nn/torch.nn.Module.named_parameters.md
@@ -1,4 +1,4 @@
-## [ Only parameter names differ ]torch.nn.Module.named_parameters
+## [Only parameter names differ]torch.nn.Module.named_parameters
 
 ### [torch.nn.Module.named_parameters](https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=torch+nn+module+named_parameters#torch.nn.Module.named_parameters)
 
@@ -20,4 +20,4 @@ paddle.nn.Layer.named_parameters(prefix='', include_sublayers=True, remove_dupli
 | -------------- | ------------ | ------------------------------------------------------------- |
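 | prefix   | prefix  | Prefix prepended to all parameter names.                                            |
 | recurse   | include_sublayers  | Yields the parameters of this module and all submodules; only the parameter name differs.                                            |
-| remove_duplicate   | remove_duplicate  | Whether to remove duplicate parameters from the result.                                        |
+| remove_duplicate   | remove_duplicate  | Whether to remove duplicate parameters from the result.|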
diff --git a/docs/guides/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.RNN.md b/docs/guides/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.RNN.md
@@ -8,7 +8,9 @@ torch.nn.RNN(input_size,
              bias=True,
              batch_first=False,
              dropout=0,
-             bidirectional=False)
+             bidirectional=False,
+             device=None,
+             dtype=None)
 ```
 
 ### [paddle.nn.SimpleRNN](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/SimpleRNN_cn.html#simplernn)
@@ -29,6 +31,8 @@ paddle.nn.SimpleRNN(input_size, hidden_size, num_layers=1, activation='tanh', di
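 | batch_first   | time_major   | PyTorch indicates whether batch size is the first dimension; PaddlePaddle indicates whether time steps is the first dimension. The meanings are opposite, so conversion is required.  |
 | dropout   | dropout   | The dropout probability.  |
 | bidirectional | direction    | PyTorch indicates whether the RNN is bidirectional; Paddle uses a string to select a bidirectional RNN (`bidirectional`) or a unidirectional RNN (`forward`). |
+| device   | -   | Device of the Tensor. Paddle has no such parameter; it generally has little effect on training results and can simply be removed.  |
+| dtype   | -   | Desired data type of the Tensor. Paddle has no such parameter; it generally has little effect on training results and can simply be removed. |
 | -             |weight_ih_attr| Parameter attribute of weight_ih. PyTorch has no such parameter; keep the Paddle default.  |
 | -             |weight_hh_attr| Parameter attribute of weight_hh. PyTorch has no such parameter; keep the Paddle default.  |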
 

diff --git a/...es/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.Transformer.md b/...es/model_convert/convert_from_pytorch/api_difference/nn/torch.nn.Transformer.md
@@ -3,7 +3,7 @@
 ### [torch.nn.Transformer](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html#torch.nn.Transformer)
 
 ```python
-torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation=<function relu>, custom_encoder=None, custom_decoder=None, layer_norm_eps=1e-05, batch_first=False, norm_first=False, device=None, dtype=None)
+torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation=<function relu>, custom_encoder=None, custom_decoder=None, layer_norm_eps=1e-05, batch_first=False, norm_first=False, bias=True, device=None, dtype=None)
 ```
 
 ### [paddle.nn.Transformer](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/Transformer_cn.html)
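@@ -30,12 +30,13 @@ PyTorch supports more parameters than Paddle, as detailed below:
 | layer_norm_eps     | -                  | The eps value of the layer normalization component. Paddle has no such parameter, and there is currently no conversion.                     |
 | batch_first        | -                  | Whether dimension 0 of the input data is batch_size. Paddle has no such parameter, and there is currently no conversion.           |
 | norm_first         | normalize_before   | Whether LayerNorm is applied before attention and feedforward; only the parameter name differs.                |
+| bias                 | bias_attr          | Object specifying the bias parameter attributes; only the parameter name differs.                     |
 | device             | -                  | Device of the Tensor. Paddle has no such parameter; conversion is required.                                      |
 | dtype              | -                  | Data type of the Tensor. Paddle has no such parameter; conversion is required.                                  |
 | -                  | attn_dropout       | Dropout rate applied to the attention targets in multi-head self-attention. PyTorch has no such parameter; keep the Paddle default. |
 | -                  | act_dropout        | Dropout applied after the activation in the feedforward network. PyTorch has no such parameter; keep the Paddle default.         |
 | -                  | weight_attr        | Object specifying the weight parameter attributes. PyTorch has no such parameter; keep the Paddle default.                     |
-| -                  | bias_attr          | Object specifying the bias parameter attributes. PyTorch has no such parameter; keep the Paddle default.                     |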
+
 
 ### Conversion example
 

diff --git a/...vert/convert_from_pytorch/api_difference/nn/torch.nn.TransformerDecoderLayer.md b/...vert/convert_from_pytorch/api_difference/nn/torch.nn.TransformerDecoderLayer.md
@@ -10,6 +10,7 @@ torch.nn.TransformerDecoderLayer(d_model,
                                  layer_norm_eps=1e-05,
                                  batch_first=False,
                                  norm_first=False,
+                                 bias=True,
                                  device=None,
                                  dtype=None)
 ```
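@@ -42,7 +43,7 @@ PyTorch supports more parameters than Paddle, as detailed below:
 | layer_norm_eps | layer_norm_eps       | The eps value of the layer normalization layer.  |
 | batch_first     | -      | Shape of the input and output tensors. Paddle has no such parameter, and there is currently no conversion.  |
 | norm_first             | normalize_before  | Controls how each sublayer's input and output are processed. If True, layer normalization is applied to each sublayer's input, and dropout and a residual connection are applied to its output. Otherwise (False), each sublayer's input is left unprocessed, and dropout, a residual connection and layer normalization are applied only to its output. Default: False. Only the parameter name differs. |
+| bias                 | bias_attr          | Object specifying the bias parameter attributes; only the parameter name differs.                     |
 | device        | -            | Device type. Paddle has no such parameter; it generally has little effect on training results and can simply be removed.        |
 | dtype         | -            | Parameter type. Paddle has no such parameter; it generally has little effect on training results and can simply be removed.        |
 | -             | weight_attr  | Attributes of the weight parameters. PyTorch has no such parameter; keep the Paddle default. |
-| -             | bias_attr    | Attributes of the bias parameters. PyTorch has no such parameter; keep the Paddle default. |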
diff --git a/...vert/convert_from_pytorch/api_difference/nn/torch.nn.TransformerEncoderLayer.md b/...vert/convert_from_pytorch/api_difference/nn/torch.nn.TransformerEncoderLayer.md
@@ -3,7 +3,7 @@
 ### [torch.nn.TransformerEncoderLayer](https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoderLayer.html#torch.nn.TransformerEncoderLayer)
 
 ```python
-torch.nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=<function relu>, layer_norm_eps=1e-05, batch_first=False, norm_first=False, device=None, dtype=None)
+torch.nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=<function relu>, layer_norm_eps=1e-05, batch_first=False, norm_first=False, bias=True, device=None, dtype=None)
 ```
 
 ### [paddle.nn.TransformerEncoderLayer](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/TransformerEncoderLayer_cn.html)
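@@ -26,12 +26,12 @@ PyTorch supports more parameters than Paddle, as detailed below:
 | layer_norm_eps  | layer_norm_eps   | The eps value of the layer normalization component.                                                  |
 | batch_first     | -                | Whether dimension 0 of the input data is batch_size. Paddle has no such parameter, and there is currently no conversion.           |
 | norm_first      | normalize_before | Whether LayerNorm is applied before attention and feedforward; only the parameter name differs.                |
+| bias                 | bias_attr          | Object specifying the bias parameter attributes; only the parameter name differs.                     |
 | device          | -                | Device of the Tensor. Paddle has no such parameter; conversion is required.                                      |
 | dtype           | -                | Data type of the Tensor. Paddle has no such parameter; conversion is required.                                  |
 | -               | attn_dropout     | Dropout rate applied to the attention targets in multi-head self-attention. PyTorch has no such parameter; keep the Paddle default. |
 | -               | act_dropout      | Dropout applied after the activation in the feedforward network. PyTorch has no such parameter; keep the Paddle default.         |
 | -               | weight_attr      | Object specifying the weight parameter attributes. PyTorch has no such parameter; keep the Paddle default.                     |
-| -               | bias_attr        | Object specifying the bias parameter attributes. PyTorch has no such parameter; keep the Paddle default.                     |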
 
 ### Conversion example
 

diff --git a/...hird_party/fairscale/fairscale.nn.model_parallel.layers.ColumnParallelLinear.md b/...hird_party/fairscale/fairscale.nn.model_parallel.layers.ColumnParallelLinear.md
@@ -5,10 +5,10 @@
 ```python
 fairscale.nn.model_parallel.layers.ColumnParallelLinear(in_features: int, out_features: int, bias: bool = True, gather_output: bool = True, init_method: Callable[[torch.Tensor], torch.Tensor] = init.xavier_normal_, stride: int = 1, keep_master_weight_for_test: bool = False)
 ```
-### [paddle.distributed.meta_parallel.parallel_layers.mp_layers.ColumnParallelLinear](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L153)
+### [paddle.distributed.fleet.meta_parallel.ColumnParallelLinear](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L153)
 
 ```python
-paddle.distributed.meta_parallel.parallel_layers.mp_layers.ColumnParallelLinear(in_features, out_features, weight_attr=None, has_bias=None, gather_output=True, fuse_matmul_bias=False, mp_group=None, name=None)
+paddle.distributed.fleet.meta_parallel.ColumnParallelLinear(in_features, out_features, weight_attr=None, has_bias=None, gather_output=True, fuse_matmul_bias=False, mp_group=None, name=None)
 ```
 
 PyTorch supports more parameters than Paddle, as detailed below:

diff --git a/...e_third_party/fairscale/fairscale.nn.model_parallel.layers.ParallelEmbedding.md b/...e_third_party/fairscale/fairscale.nn.model_parallel.layers.ParallelEmbedding.md
@@ -5,10 +5,10 @@
 ```python
 fairscale.nn.model_parallel.layers.ParallelEmbedding(num_embeddings: int, embedding_dim: int ,padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, init_method: Callable[[torch.Tensor], torch.Tensor] = init.xavier_normal_, keep_master_weight_for_test: bool = False)
 ```
-### [paddle.distributed.meta_parallel.parallel_layers.mp_layers.VocabParallelEmbedding](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L37)
+### [paddle.distributed.fleet.meta_parallel.VocabParallelEmbedding](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L37)
 
 ```python
-paddle.distributed.meta_parallel.parallel_layers.mp_layers.VocabParallelEmbedding(num_embeddings, embedding_dim, weight_attr=None, mp_group=None, name=None)
+paddle.distributed.fleet.meta_parallel.VocabParallelEmbedding(num_embeddings, embedding_dim, weight_attr=None, mp_group=None, name=None)
 ```
 
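 The two are broadly equivalent in functionality but differ in internal implementation: ParallelEmbedding is partitioned along the embedding dimension, whereas VocabParallelEmbedding is partitioned along the vocab (vocabulary) dimension. In multi-GPU training, loaded parameters must therefore be adjusted manually to match the different partitioning scheme.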
@@ -28,3 +28,4 @@ paddle.distributed.meta_parallel.parallel_layers.mp_layers.VocabParallelEmbeddin
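 | keep_master_weight_for_test  | -              | Returns the master parameters for testing. Paddle has no such parameter; it generally has little effect on training results and can simply be removed. |
 | -                            | mp_group       | The model-parallel group. PyTorch has no such parameter; keep the Paddle default. |
 | -                            | name           | Name of the network layer. PyTorch has no such parameter; keep the Paddle default. |
+| -                            | weight_attr           | Attributes of the weight parameters. PyTorch has no such parameter; in Paddle, set it to paddle.nn.initializer.Constant(0). |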
diff --git a/...e_third_party/fairscale/fairscale.nn.model_parallel.layers.RowParallelLinear.md b/...e_third_party/fairscale/fairscale.nn.model_parallel.layers.RowParallelLinear.md
@@ -6,10 +6,10 @@
 fairscale.nn.model_parallel.layers.RowParallelLinear(in_features: int, out_features: int, bias: bool = True, input_is_parallel: bool = False, init_method: Callable[[torch.Tensor], torch.Tensor] = init.xavier_normal_, stride: int = 1, keep_master_weight_for_test: bool = False)
 ```
 
-### [paddle.distributed.meta_parallel.parallel_layers.mp_layers.RowParallelLinear](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L291)
+### [paddle.distributed.fleet.meta_parallel.RowParallelLinear](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L291)
 
 ```python
-paddle.distributed.meta_parallel.parallel_layers.mp_layers.RowParallelLinear(in_features, out_features, weight_attr=None, has_bias=True, input_is_parallel=False, fuse_matmul_bias=False, mp_group=None, name=None)
+paddle.distributed.fleet.meta_parallel.RowParallelLinear(in_features, out_features, weight_attr=None, has_bias=True, input_is_parallel=False, fuse_matmul_bias=False, mp_group=None, name=None)
 ```
 
 PyTorch supports more parameters than Paddle, as detailed below: