Flux ControlNet Training Multi-GPU DeepSpeed Stage-3 doesn't reduce memory compared to Single GPU #10027
enesmsahin asked this question in Q&A (Unanswered)
Describe the bug
I am running a slightly modified version of the Flux ControlNet training script in diffusers (the script is attached below), using DeepSpeed Stage-3 with the accelerate config below.
When I use only 1 GPU (configured via the accelerate config file below), training takes around 42GB of GPU memory. When I use all 8 GPUs in a single node, it still takes around 42GB per GPU.
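For reference, a minimal sketch (not part of the attached script; the helper name is illustrative) of logging the per-process peak memory seen by the PyTorch allocator, which can be compared against whatever tool produced the ~42GB readings above (nvidia-smi, for instance, additionally counts the CUDA context and the allocator's cached-but-unused memory):

```python
import torch

def log_peak_memory(tag: str) -> None:
    """Print peak allocated/reserved CUDA memory for the current process.

    Illustrative helper, not part of the attached script. nvidia-smi will
    generally report more than `max_memory_allocated`, since it also counts
    the CUDA context, allocator cache, and non-PyTorch allocations.
    """
    dev = torch.cuda.current_device()
    allocated = torch.cuda.max_memory_allocated(dev) / 1024**3
    reserved = torch.cuda.max_memory_reserved(dev) / 1024**3
    print(f"[{tag}] device {dev}: peak allocated {allocated:.1f} GiB, "
          f"peak reserved {reserved:.1f} GiB")
```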
I don't know the parallelization details of DeepSpeed, but I would expect DeepSpeed Stage-3 to shard the model weights across the GPUs and therefore reduce the per-GPU memory usage in the 8-GPU case compared to the single-GPU case.
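To check whether Stage-3 is actually partitioning the weights, here is a minimal diagnostic sketch. It relies on DeepSpeed ZeRO-3 internals (`ds_numel`, `ds_tensor`), which are not a public API and may differ across DeepSpeed versions; the helper name is illustrative:

```python
import torch

def report_zero3_sharding(model: torch.nn.Module) -> None:
    """Compare the full parameter count with what is stored on this rank.

    Under ZeRO Stage-3, DeepSpeed replaces each managed parameter's storage
    with a placeholder and keeps only a local shard in `p.ds_tensor`, so the
    locally stored fraction should be roughly 1/world_size per rank. A
    fraction near 100% means the weights are replicated, not partitioned.
    """
    full, local = 0, 0
    for p in model.parameters():
        if hasattr(p, "ds_numel"):        # parameter managed by ZeRO-3
            full += p.ds_numel
            local += p.ds_tensor.numel()
        else:                             # unmanaged, fully materialized
            full += p.numel()
            local += p.numel()
    pct = 100.0 * local / max(full, 1)
    print(f"full model: {full / 1e9:.2f}B params, "
          f"stored on this rank: {local / 1e9:.2f}B ({pct:.1f}%)")
```

Calling this on the model returned by `accelerator.prepare(...)` should show the locally stored fraction shrinking roughly with the number of ranks if Stage-3 partitioning is in effect.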
PS: I am not sure whether this issue is related to the ControlNet training script in diffusers or to accelerate. I have opened the same issue in accelerate.
Reproduction
Link to the script: https://pastebin.com/SdQZcQR8
Command used to run the script:
Accelerate Config File
Logs
No response
System Info
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
Who can help?
@PromeAIpro @sayakpaul