[Auto Parallel] Add spmd rule No.2-3、8 for (log_softmax, cummax, cummin) and their backward ops #72720
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open-source project!
/re-run all-failed
@Yeenyeong This PR is ready to be reviewed, thanks!
@@ -86,6 +86,10 @@ std::unordered_map<std::string, int64_t> ShardingMergeForTensors(
   for (auto& pair : tensor_axes_to_dim_pairs) {
     for (size_t i = 0; i < pair.second.size(); ++i) {
       auto tensor_axis = pair.first.substr(i, 1);
+      // Cooperate with GetDimsMappingForAxes to treat "1" as replicated.
+      if (tensor_axis == "1") {
What scenario do these lines of code work in? Has any bug actually happened here?
GetDimsMappingForAxes automatically treats "1" as replicated, and the input axis_to_dim_map always comes from the output of ShardingMergeForTensors. I was thinking of a situation like this: when x_axes="a1b" with dims_mapping={0,1,-1} and y_axes="a1b" with dims_mapping={0,-1,1}, the current merge outputs {a:0, 1:1, b:-1}, so b maps to -1. I think b could also map to 1 (because an axis we name "1" should map to -1 anyway). I'm not sure my modification is necessary; this case might never arise in practice. Should I change it back?
In the situation you assumed, the output of merging should be {a:0, 1:-1, b:1}, which follows from the rules that the ShardingMergeForAxis function implements.
In addition, an axis named "1" should always be in Replicate mode, so if dims_mapping["1"] != -1 as assumed in your situation, something has likely already gone wrong. We'd better not skip checking an axis named "1".
At the very least, this modification may affect every other spmd-rule-inference function that calls this one, so we'd better not modify the code unless we have to.
Thanks!
Thank you for such a clear and logical answer!
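To make the merge rule discussed above concrete, here is a minimal standalone sketch — not Paddle's actual implementation; MergeAxis is a simplified stand-in for ShardingMergeForAxis — that reproduces the reviewer's example: a concrete mesh dim wins over -1 (replicate), and an axis literally named "1" is forced back to replicate afterwards.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for ShardingMergeForAxis: prefer the sharded
// mapping over -1 (replicate). Conflict handling is omitted.
int64_t MergeAxis(int64_t lhs, int64_t rhs) {
  return lhs == -1 ? rhs : lhs;
}

int main() {
  // x_axes = "a1b", dims_mapping = {0, 1, -1}
  // y_axes = "a1b", dims_mapping = {0, -1, 1}
  std::string axes = "a1b";
  std::vector<int64_t> x_dims = {0, 1, -1};
  std::vector<int64_t> y_dims = {0, -1, 1};

  std::map<std::string, int64_t> merged;
  for (size_t i = 0; i < axes.size(); ++i) {
    std::string axis = axes.substr(i, 1);
    int64_t m = MergeAxis(x_dims[i], y_dims[i]);
    if (axis == "1") m = -1;  // an axis named "1" must stay replicated
    merged[axis] = m;
  }
  for (auto& kv : merged) {
    std::cout << kv.first << " -> " << kv.second << "\n";
  }
  // Prints: 1 -> -1, a -> 0, b -> 1, i.e. the merge yields {a:0, 1:-1, b:1}.
}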
@@ -95,24 +91,29 @@ SpmdInfo TopkGradInferSpmd(const DistMetaTensor& x,
       axis));
   // Build einsum notation
   std::string alphabet = "abcdefghijlopqrstuvwxyz";
-  std::string x_axes = alphabet.substr(0, x_ndim - 1);
+  std::string x_axes = alphabet.substr(0, x_ndim);
   x_axes[axis] = '1';
Why do we need this line of code? Can we delete L95?
Thanks! It has been deleted. Because all dims_mapping[axis] entries are set to -1 afterwards, it was redundant.
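For illustration, a tiny standalone sketch of the notation-building step after the fix, assuming x_ndim = 3 and axis = 1 (both values are illustrative): every tensor dim gets a letter, and the reduced axis is renamed to '1' so the merge step treats it as replicated.

#include <iostream>
#include <string>

int main() {
  int x_ndim = 3;  // assumed tensor rank
  int axis = 1;    // assumed operation axis
  std::string alphabet = "abcdefghijlopqrstuvwxyz";
  std::string x_axes = alphabet.substr(0, x_ndim);  // "abc": one letter per dim
  x_axes[axis] = '1';                               // "a1c": axis 1 replicated
  std::cout << x_axes << "\n";                      // prints a1c
}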
PD_REGISTER_SPMD_RULE(cummax,
                      PD_INFER_SPMD(phi::distributed::TopkInferSpmdBase),
                      PD_INFER_SPMD(phi::distributed::TopkGradInferSpmdBase));
Even if CummaxInferSpmd actually calls TopkInferSpmdBase, it is best to register CummaxInferSpmd instead of TopkInferSpmdBase. Also, the parameters of CummaxInferSpmd and TopkInferSpmdBase are different. The same issue applies to CummaxGradInferSpmd and TopkGradInferSpmdBase.
Thanks! I adapted InferSpmdContext::AttrAt in inferspmd_utils.cc and parse_attr in auto_parallel_py.cc to add support for a new DataType attribute, and used CummaxInferSpmd instead of TopkInferSpmdBase.
PD_REGISTER_SPMD_RULE(cummin,
                      PD_INFER_SPMD(phi::distributed::TopkInferSpmdBase),
                      PD_INFER_SPMD(phi::distributed::TopkGradInferSpmdBase));
Same issue as above.
LGTM
PR Category
Auto Parallel
PR Types
New features
Description
Added sharding (SPMD) inference rules for the following operators: