[Auto Parallel] Add spmd rule No.2-3, 8 for (log_softmax, cummax, cummin) and their backward ops #72720


Merged
12 commits merged into PaddlePaddle:develop on May 22, 2025

Conversation

@ooooo-create ooooo-create commented May 14, 2025

PR Category

Auto Parallel

PR Types

New features

Description

Added sharding (SPMD) inference rules for the following operators:

  1. log_softmax and log_softmax_grad: reuse the softmax inference rules, which were already registered in rules.cc, so this PR only adds the yaml configuration and no new unit tests.
  2. Fixed the existing sharding inference rule for topk_grad.
  3. cummax and cummax_grad: reuse the topk inference logic, likewise forcing the axis dimension to Replicated (see the sketch after this list).
  4. cummin and cummin_grad: handled the same way as cummax.
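The sketch below is a minimal, self-contained illustration (my own example, not the PR's code) of what forcing the operated axis to Replicated means for cummax/cummin: whatever sharding the input carries on that axis is dropped to -1 (replicated) in the inferred dims_mapping, while the other axes keep the input's sharding.

```cpp
// Illustrative only: mimics the effect of the cummax/cummin SPMD rule on a
// dims_mapping, where -1 means "replicated" and any other value is the id of
// the mesh dimension the tensor axis is sharded on. Not Paddle's actual code.
#include <cstdint>
#include <iostream>
#include <vector>

std::vector<int64_t> InferCumOpDimsMapping(std::vector<int64_t> x_dims_mapping,
                                           int64_t axis) {
  // The cumulative scan runs along `axis`, so that dimension must not be
  // sharded; force it to replicated and keep every other axis unchanged.
  x_dims_mapping[axis] = -1;
  return x_dims_mapping;
}

int main() {
  // x is sharded on mesh dim 0 along tensor axis 0 and mesh dim 1 along axis 1.
  std::vector<int64_t> x_dims_mapping{0, 1, -1};
  auto out_dims_mapping = InferCumOpDimsMapping(x_dims_mapping, /*axis=*/1);
  for (auto d : out_dims_mapping) std::cout << d << ' ';  // prints: 0 -1 -1
  std::cout << '\n';
  return 0;
}
```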


paddle-bot bot commented May 14, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label May 14, 2025
@ooooo-create
Contributor Author

/re-run all-failed

@ooooo-create
Contributor Author

@Yeenyeong This PR is ready for review, thanks!

@@ -86,6 +86,10 @@ std::unordered_map<std::string, int64_t> ShardingMergeForTensors(
   for (auto& pair : tensor_axes_to_dim_pairs) {
     for (size_t i = 0; i < pair.second.size(); ++i) {
       auto tensor_axis = pair.first.substr(i, 1);
+      // Cooperate with GetDimsMappingForAxes to deal "1" for replicated.
+      if (tensor_axis == "1") {
Contributor

What scenario do these lines of code work in? Has any bug actually happened?

@ooooo-create ooooo-create (Contributor Author) May 20, 2025

GetDimsMappingForAxes automatically treats "1" as replicated, and the input axis_to_dim_map always comes from the output of ShardingMergeForTensors. I was thinking of a situation: with x_axes="a1b" and dims_mapping={0,1,-1}, and y_axes="a1b" with dims_mapping={0,-1,1}, the merge currently outputs {a:0, 1:1, b:-1}, so b is mapped to -1, but I think b could also be mapped to 1 (because when an axis is set to "1", its mapping should be -1). I'm not sure my modification is necessary; this case might not come up in actual scenarios. Do I need to change it back?

@Yeenyeong Yeenyeong (Contributor) May 20, 2025

In the situation you assumed, the output of merging should be {a:0, 1:-1, b:1}, which follows from the rules that the ShardingMergeForAxis function applies.

In addition, an axis named "1" should always be in Replicate mode, so if dims_mapping["1"] != -1 as in your assumed situation, something has probably already gone wrong. It is better not to skip checking an axis named "1".

At the very least, this modification could affect every other spmd-rule inference function that calls ShardingMergeForTensors, so we'd better not modify the code unless we have to.

Thanks!
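For readers following along, here is a small self-contained sketch (my own simplification, not the real ShardingMergeForTensors / ShardingMergeForAxis) that reproduces the outcome described above for the assumed inputs: ordinary axes take a sharded mapping over -1, and an axis named "1" always ends up Replicated, giving {a:0, 1:-1, b:1}.

```cpp
// Simplified illustration of the merge outcome discussed above; it is not the
// actual Paddle implementation and skips all conflict checks.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using AxesToDims = std::pair<std::string, std::vector<int64_t>>;

std::map<std::string, int64_t> MergeAxes(const std::vector<AxesToDims>& inputs) {
  std::map<std::string, int64_t> merged;
  for (const auto& [axes, dims_mapping] : inputs) {
    for (size_t i = 0; i < axes.size(); ++i) {
      const std::string axis = axes.substr(i, 1);
      if (axis == "1") {
        merged[axis] = -1;  // an axis named "1" stays Replicated
        continue;
      }
      auto it = merged.find(axis);
      if (it == merged.end() || it->second == -1) {
        merged[axis] = dims_mapping[i];  // a sharded mapping wins over -1
      }
    }
  }
  return merged;
}

int main() {
  // x: axes "a1b", dims_mapping {0, 1, -1};  y: axes "a1b", dims_mapping {0, -1, 1}
  auto merged = MergeAxes({{"a1b", {0, 1, -1}}, {"a1b", {0, -1, 1}}});
  for (const auto& [axis, dim] : merged) {
    std::cout << axis << ":" << dim << "  ";  // prints: 1:-1  a:0  b:1
  }
  std::cout << '\n';
  return 0;
}
```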

Contributor Author

Thank you for such a clear and logical answer!

@ooooo-create ooooo-create requested a review from Yeenyeong May 21, 2025 02:32
@@ -95,24 +91,29 @@ SpmdInfo TopkGradInferSpmd(const DistMetaTensor& x,
                        axis));
   // Build einsum notation
   std::string alphabet = "abcdefghijlopqrstuvwxyz";
-  std::string x_axes = alphabet.substr(0, x_ndim - 1);
+  std::string x_axes = alphabet.substr(0, x_ndim);
   x_axes[axis] = '1';
@jeff41404 jeff41404 (Contributor) May 21, 2025

Why do we need this line of code? Can we delete L95?

Contributor Author

Thanks! It has been deleted. Because every dim_mapping[axis] is set to -1 afterwards anyway, that line was redundant.
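To make the reasoning above concrete, here is a small standalone sketch (an assumed example, not the exact PR code) of the notation trick used by the topk/cummax/cummin rules: the operated axis is renamed to '1' in the einsum-style notation so the merge step treats it as Replicated, and the inferred dims_mapping for that axis is forced to -1 afterwards anyway, which is why the deleted reset had become redundant.

```cpp
// Standalone illustration of building the einsum-style notation with the
// operated axis marked as '1'; not the PR's actual TopkGradInferSpmd code.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

int main() {
  const int64_t x_ndim = 3;
  const int64_t axis = 1;  // the topk / cummax / cummin axis
  const std::string alphabet = "abcdefghijlopqrstuvwxyz";

  // One letter per tensor dimension, then rename the operated axis to '1'
  // so the sharding-merge step treats it as replicated.
  std::string x_axes = alphabet.substr(0, x_ndim);  // "abc"
  x_axes[axis] = '1';                               // "a1c"

  // Regardless of the input's sharding on `axis`, the inferred mapping for
  // that axis is forced to -1 (replicated) at the end.
  std::vector<int64_t> dims_mapping{0, 1, -1};  // input sharding
  dims_mapping[axis] = -1;

  std::cout << x_axes << '\n';                        // prints: a1c
  for (auto d : dims_mapping) std::cout << d << ' ';  // prints: 0 -1 -1
  std::cout << '\n';
  return 0;
}
```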

Comment on lines 757 to 759
PD_REGISTER_SPMD_RULE(cummax,
                      PD_INFER_SPMD(phi::distributed::TopkInferSpmdBase),
                      PD_INFER_SPMD(phi::distributed::TopkGradInferSpmdBase));
Contributor

Even if CummaxInferSpmd ultimately calls TopkInferSpmdBase, it is best to register CummaxInferSpmd rather than TopkInferSpmdBase; also, the parameters of CummaxInferSpmd and TopkInferSpmdBase are different. The same issue applies to CummaxGradInferSpmd and TopkGradInferSpmdBase.

Contributor Author

Thanks! I adapted InferSpmdContext::AttrAt in inferspmd_utils.cc and parse_attr in auto_parallel_py.cc to support a new DataType attribute, and registered CummaxInferSpmd instead of TopkInferSpmdBase.
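A rough sketch of the kind of op-specific wrapper being described here; the header paths and signatures below are my assumptions for illustration and may differ from what the PR actually adds.

```cpp
// Hypothetical sketch only: an op-specific CummaxInferSpmd that forwards to the
// shared topk base rule. Header paths and signatures are assumptions; the real
// code added in this PR may differ.
#include "paddle/phi/common/data_type.h"            // assumed path
#include "paddle/phi/core/distributed/type_defs.h"  // assumed path
#include "paddle/phi/infermeta/spmd_rules/topk.h"   // assumed path

namespace phi {
namespace distributed {

// `dtype` only selects the element type of the indices output; it has no
// effect on sharding, so the shared topk base logic can be reused unchanged.
SpmdInfo CummaxInferSpmd(const DistMetaTensor& x, int axis, DataType dtype) {
  return TopkInferSpmdBase(x, axis);
}

// A similar CummaxGradInferSpmd wrapper would forward to TopkGradInferSpmdBase.

}  // namespace distributed
}  // namespace phi
```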

Comment on lines 761 to 763
PD_REGISTER_SPMD_RULE(cummin,
                      PD_INFER_SPMD(phi::distributed::TopkInferSpmdBase),
                      PD_INFER_SPMD(phi::distributed::TopkGradInferSpmdBase));
Contributor

same issue as above

@ooooo-create ooooo-create requested a review from jeff41404 May 21, 2025 10:08
@jeff41404 jeff41404 (Contributor) left a comment

LGTM

@luotao1 luotao1 merged commit 77bf836 into PaddlePaddle:develop May 22, 2025
53 of 55 checks passed
Labels
contributor (External developers), HappyOpenSource Pro (advanced Happy Open Source program with more challenging tasks)
5 participants