
[Auto Parallel] add main_grad for sharding in auto dy #72493


Open · wants to merge 11 commits into develop from FLAGS_enable_inplace_master_grad

Conversation

Contributor

@Xing-lil Xing-lil commented Apr 25, 2025

PR Category

Auto Parallel

PR Types

New features

Description

  • Add an in-place param.main_grad that replaces the old master_grad in auto dynamic mode.
  • Enable it by setting export FLAGS_enable_tensor_fusion=1 (a usage sketch follows at the end of this description).

Related sharding tensor_fusion PR: #72508

Pcard-70448
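
A minimal usage sketch of how the flag and the new attribute fit together in dynamic mode. Only FLAGS_enable_tensor_fusion and param.main_grad come from this PR's description; the model, optimizer, and AMP wiring below are illustrative assumptions, and the auto-parallel sharding setup is omitted.

```python
import os

# Opt in before building optimizer/AMP state (per the PR description).
os.environ["FLAGS_enable_tensor_fusion"] = "1"

import paddle

model = paddle.nn.Linear(8, 8)
opt = paddle.optimizer.AdamW(parameters=model.parameters())

with paddle.amp.auto_cast():
    loss = model(paddle.randn([4, 8])).mean()
loss.backward()

# With the flag on, each parameter is expected to expose an fp32
# accumulation buffer as param.main_grad instead of a separate master_grad.
for p in model.parameters():
    if getattr(p, "main_grad", None) is not None:
        print(p.name, p.main_grad.dtype)

opt.step()
opt.clear_grad()
```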


paddle-bot bot commented Apr 25, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@Xing-lil Xing-lil force-pushed the FLAGS_enable_inplace_master_grad branch from 900f276 to 388364d on May 12, 2025 11:20
@Xing-lil Xing-lil changed the title from "[Auto Parallel] add Flags_enable_inplace_master_grad" to "[Auto Parallel] add main_grad for sharding in auto dy" on May 13, 2025
@codecov-commenter

Codecov Report

Attention: Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (develop@441816a). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
python/paddle/amp/auto_cast.py | 96.55% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #72493   +/-   ##
==========================================
  Coverage           ?   97.43%           
==========================================
  Files              ?        3           
  Lines              ?       39           
  Branches           ?        0           
==========================================
  Hits               ?       38           
  Misses             ?        1           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.

Contributor

@liym27 liym27 left a comment


LGTM.

Why use main_grad instead of master_grad here?

if param._grad_ivar() is not None:
    grad_var = param._grad_ivar()
    params_grads.append((param, grad_var))
    if os.getenv("FLAGS_enable_tensor_fusion") == '1':
Contributor


  1. The flag check also needs to handle the values true and True (a sketch of a tolerant check follows below).
  2. Rather than checking FLAGS_enable_tensor_fusion directly in auto_cast.py, api.py, and optimizer.py, consider turning it into a configuration option.
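
A sketch of the kind of tolerant check the first point asks for; _tensor_fusion_enabled is a hypothetical helper name, not something introduced by this PR.

```python
import os

def _tensor_fusion_enabled() -> bool:
    # Accept '1', 'true', 'True' (any casing) rather than comparing
    # the raw string against '1' only.
    value = os.getenv("FLAGS_enable_tensor_fusion", "0")
    return value.strip().lower() in ("1", "true")
```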

Contributor Author


OK, I will address both points together in a follow-up PR.

@Xing-lil
Contributor Author

> LGTM.
>
> Why use main_grad instead of master_grad here?

tensor_fusion fuses the per-parameter grads into one contiguous fuse_grad, which requires in-place grads to avoid a concat at every step.
Differences between main_grad and master_grad (illustrated in the sketch after this list):

  1. main_grad uses an in-place cast, whereas master_grad does not.
  2. main_grad is cast after each grad node, while master_grad is cast after the entire backward pass.
  3. main_grad is accessed via param.main_grad, while master_grad is accessed through the grad itself.
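
A small contrast of the two accumulation styles using plain tensors; this is my own illustration of the semantics described above, not code from this PR.

```python
import paddle

# A low-precision gradient produced by one grad node.
fp16_grad = paddle.ones([4], dtype="float16")

# master_grad style: keep fp16 grads through backward, then materialize a
# separate fp32 copy afterwards (an out-of-place cast).
master_grad = fp16_grad.astype("float32")

# main_grad style: the parameter owns a persistent fp32 buffer that is
# updated in place right after each grad node, so a fused, contiguous
# fuse_grad over these buffers stays valid without a per-step concat.
main_grad = paddle.zeros([4], dtype="float32")
main_grad.add_(fp16_grad.astype("float32"))
```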
