OP move task from ernie-core to framework #72957


Merged: 79 commits merged into PaddlePaddle:develop on Jun 4, 2025

Conversation

@A-nnonymous (Contributor) commented May 27, 2025:

PR Category

User Experience

PR Types

Improvements

Description

OP migration task from ernie-core to the framework. Locally cherry-picked from migration PRs #72835, #72875, and #72909 by @pesionzhao, @feixi21, and @zhenghuaijin; verified and merged by @A-nnonymous.

Ops ready list (see the usage sketch below):

  • paddle.incubate.nn.functional.cal_aux_loss (and grad).
  • paddle.incubate.nn.functional.moe_combine (and grad).
  • paddle.incubate.nn.functional.expand_modality_expert_id.
  • paddle.incubate.nn.functional.build_src_rank_and_local_expert_id.
  • paddle.incubate.nn.functional.int_bincount.
  • paddle.incubate.nn.functional.fused_rms_norm_ext.
  • paddle.incubate.nn.functional.moe_gate_dispatch (and grad).
  • paddle.incubate.nn.functional.moe_gate_dispatch_permute (and grad).
  • paddle.incubate.nn.functional.moe_gate_dispatch_partial_nosoftmaxtopk (and grad).

pcard-91067
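
For orientation, a minimal usage sketch of one of the migrated ops; the signature of fused_rms_norm_ext is taken from the diff reviewed below, while the shapes and the single captured output are illustrative assumptions:

import paddle
from paddle.incubate.nn.functional import fused_rms_norm_ext

# Illustrative shapes only; epsilon matches the default shown in the diff.
x = paddle.randn([4, 128], dtype="float32")       # [tokens, hidden]
scale = paddle.ones([128], dtype="float32")       # per-channel scale
out = fused_rms_norm_ext(x, scale, epsilon=1e-5)  # the op may also return auxiliary tensors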

@A-nnonymous (Contributor Author):

/re-run all-failed

@phlrain self-requested a review June 4, 2025 06:10
@SigureMo (Member) left a comment:

The related type hints can be fixed in the next PR.

from paddle.base.layer_helper import LayerHelper


def fused_rms_norm_ext(x, scale, epsilon=1e-5, name=None):
Member:

Please add type hints here.
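
For example, a possible annotated signature; the parameter types follow the current defaults, and the return type is an assumption to be checked against the op's outputs:

from __future__ import annotations

from paddle import Tensor


def fused_rms_norm_ext(
    x: Tensor,
    scale: Tensor,
    epsilon: float = 1e-5,
    name: str | None = None,
) -> Tensor: ...  # return type assumed; verify against the kernel's outputs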

Contributor Author:

OK, got it. We will improve this in the next phase; in this phase we are not touching its code.


def build_src_rank_and_local_expert_id(
expert_num_global_tensor: Tensor,
expert_num_global: list,
Member:

Suggested change
expert_num_global: list,
expert_num_global: list[Xxx],

The element type of the generic needs to be filled in here — something like list[Tensor] or list[int].
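
For instance, if these are per-expert token counts, the annotation might read (hypothetical; the actual element type needs confirming):

expert_num_global: list[int],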

Contributor Author:

OK, got it.

from paddle.base.layer_helper import LayerHelper


def int_bincount(x, low, high, dtype=None, name=None):
Member:

Type hints need to be added.
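
A possible annotated signature, in the same spirit as the suggestions above; the dtype parameter's type and the return type are assumptions:

from __future__ import annotations

import paddle
from paddle import Tensor


def int_bincount(
    x: Tensor,
    low: int,
    high: int,
    dtype: paddle.dtype | None = None,  # assumed; the op may also accept strings
    name: str | None = None,
) -> Tensor: ...  # return type assumed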

Contributor Author:

OK, got it.

@wanghuancoder (Contributor) left a comment:

LGTM

@XiaoguangHu01 (Contributor) left a comment:

LGTM

@sneaxiy (Collaborator) left a comment:

Should polish the code ASAP.

@A-nnonymous (Contributor Author):

/re-run all-failed

@@ -0,0 +1,25 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Contributor:

2023 -> 2025

Comment on lines +15 to +36
#pragma once
#ifdef PADDLE_WITH_CUDA
#include "paddle/common/exception.h"
#include "paddle/phi/kernels/funcs/aligned_vector.h"
#include "paddle/phi/kernels/moe_kernel_impl.h"

namespace phi {

template <typename T, int64_t vec_size>
__global__ void gather_with_mask_permute_kernel(
const T* dy, // [s*k, d]
const int* scatter_index, // [s, k]
const float* combine_weights, // [s, k]
T* dx, // [s, d]
int64_t num_rows, // s
int64_t k, // k
int64_t dim, // d
int64_t N,
int64_t num_active, // when specified, positions > num_active are skipped
int64_t s_shared_num,
int64_t capacity,
int64_t world_size,
Contributor:

Put the CUDA implementation under the gpu directory (i.e., paddle/phi/kernels/gpu/).

Contributor:

Same as above.

@@ -0,0 +1,31 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Contributor:

2023 -> 2025

Contributor:

Move this to the gpu directory.

@@ -0,0 +1,649 @@
// NOLINT
/* Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor:

2022 -> 2025

@time: 2024/09/21 15:11:10
@Copyright (c) 2024 Baidu.com, Inc. All Rights Reserved

Write the description and explanation of this file starting from this line
Contributor:

What is this?

)
weight_lm = prob_lm[batch_idx, expert_id_lm] # use correct bias

# when num_expert_per_modality == 0, only perform group-expert expand, not multimodal-expand
Contributor:

Switch to English comments.

Comment on lines +122 to +129
Initialize the MoE layer.

Args:
    gate (nn.Layer): gating layer used to select which experts to use.
    experts (List[nn.Layer]): list of experts to use.
    layer_idx (int): index of the current MoE layer.
    group (Group): distributed communication group. Defaults to None.
    recompute (bool): whether to recompute the MoE output in each training iteration. Defaults to False.
Contributor:

Switch to English.

Comment on lines +150 to +165
"""
对`gate_prob` 进行 softmax 并根据结果选取 topk 路由expert。 最后根据 expert 号对 `x` 进行重排。
Args:
x: [s, d] 输入的 activateion
gate_prob: [s, e]
k: int
capacity: int #no use
Returns:
y: [s*k, d] 将所有 `x` 根据其路由的 `expert-id` 升序的排序,融合到 s 维度。
当截断发生时 s 会比输入 s 小。
combine_weights: [s, k], float: 每个 token 第 k 选择的 expert 的权重。
当截断发生时 s 会比输入 s 小。
scatter_index: [k, s] : 每个 token 第 k 次选择对应到 `y` 中的位置。
expert_offset: [e]: `y`中每个 expert-id 的分割位置。
expert_id: [s] `x` 中激活的 expert 号
"""
Contributor:

Same as above.
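
For readers, a minimal dense sketch of the routing semantics described in the docstring above, ignoring capacity truncation and the scatter_index/expert_offset outputs; gate_dispatch_reference is an illustrative helper, not the fused op:

import paddle
import paddle.nn.functional as F


def gate_dispatch_reference(x, gate_prob, k):
    # x: [s, d], gate_prob: [s, e]
    prob = F.softmax(gate_prob, axis=-1)                          # [s, e]
    combine_weights, expert_id = paddle.topk(prob, k=k, axis=-1)  # both [s, k]
    order = paddle.argsort(expert_id.flatten())                   # ascending expert id
    y = x[order // k]                                             # [s*k, d], rows grouped by expert
    return y, combine_weights, expert_id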

@zyfncg (Contributor) left a comment:

The related issues will be addressed in a follow-up PR once conclusions are reached.

@A-nnonymous (Contributor Author):

/re-run all-failed

@A-nnonymous closed this Jun 4, 2025
@A-nnonymous reopened this Jun 4, 2025
@PaddlePaddle locked as off-topic and limited conversation to collaborators Jun 4, 2025
@PaddlePaddle unlocked this conversation Jun 4, 2025
@phlrain merged commit 308e758 into PaddlePaddle:develop Jun 4, 2025
135 of 183 checks passed
10 participants