Controlnet Inference example, CUDA OOM #11363

@roguxivlo

Describe the bug

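Running the ControlNet inference example on a single RTX 2080 Ti fails with a CUDA out-of-memory error. The error is raised while the pipeline is being moved to the GPU (`pipe.to("cuda", torch.float16)`), before any inference step runs.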

Reproduction

# simple_inference.py

from diffusers import StableDiffusion3ControlNetPipeline, SD3ControlNetModel
from diffusers.utils import load_image
import torch

base_model_path = "stabilityai/stable-diffusion-3-medium-diffusers"
controlnet_path = "DavyMorgan/sd3-controlnet-out"

controlnet = SD3ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet
)
pipe.to("cuda", torch.float16)


control_image = load_image("./conditioning_image_1.png").resize((1024, 1024))
prompt = "pale golden rod circle with old lace background"

# generate image
generator = torch.manual_seed(0)
image = pipe(
    prompt, num_inference_steps=20, generator=generator, control_image=control_image
).images[0]
image.save("./output.png")

Launch command:

accelerate launch simple_inference.py

Logs

accelerate launch simple_inference.py
Loading pipeline components...:  44%|████████████████████████████▍                                   | 4/9 [00:07<00:12,  2.47s/it]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████| 2/2 [00:21<00:00, 10.65s/it]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████| 9/9 [00:31<00:00,  3.48s/it]
Traceback (most recent call last):
  File "/home/jroguwski/simple_inference.py", line 12, in <module>
    pipe.to("cuda", torch.float16)
  File "/opt/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 482, in to
    module.to(device, dtype)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3712, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1329, in convert
    return t.to(
           ^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacity of 10.57 GiB of which 81.12 MiB is free. Including non-PyTorch memory, this process has 10.49 GiB memory in use. Of the allocated memory 10.22 GiB is allocated by PyTorch, and 119.81 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
  File "/opt/miniconda/envs/control/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1194, in launch_command
    simple_launcher(args)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/accelerate/commands/launch.py", line 780, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/miniconda/envs/control/bin/python', 'simple_inference.py']' returned non-zero exit status 1.
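
For readers who want the numbers from the OOM message side by side, here is a small sketch that parses the message quoted in the log above and converts everything to MiB. The helper names (`parse_oom`, `to_mib`) are hypothetical and only for illustration; the message text is copied from the log, trimmed after the "reserved but unallocated" figure.

```python
import re

# Message text copied from the traceback above (trimmed after the last figure).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 80.00 MiB. "
    "GPU 0 has a total capacity of 10.57 GiB of which 81.12 MiB is free. "
    "Including non-PyTorch memory, this process has 10.49 GiB memory in use. "
    "Of the allocated memory 10.22 GiB is allocated by PyTorch, "
    "and 119.81 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# One regex per figure; each captures (number, unit).
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"total capacity of ([\d.]+) (MiB|GiB)",
    "free": r"of which ([\d.]+) (MiB|GiB) is free",
    "in_use": r"this process has ([\d.]+) (MiB|GiB) memory in use",
    "allocated": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch",
}


def to_mib(value: str, unit: str) -> float:
    """Convert a '10.57', 'GiB' pair into MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def parse_oom(message: str) -> dict[str, float]:
    """Extract the memory figures from a PyTorch CUDA OOM message, in MiB."""
    figures = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in the message")
        figures[name] = to_mib(*match.groups())
    return figures


figures = parse_oom(OOM_MESSAGE)
for name, mib in figures.items():
    print(f"{name:>22}: {mib:10.2f} MiB")

# Share of the card already used by this process when the allocation failed.
used_fraction = figures["in_use"] / figures["total"]
print(f"process share of GPU memory: {used_fraction:.1%}")
```

The figures are consistent with each other: memory in use by the process plus free memory adds up to the reported total capacity within rounding, so the card was essentially full when `pipe.to` tried to move the next tensor.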

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux-6.8.0-53-generic-x86_64-with-glibc2.39
  • Running on Google Colab?: No
  • Python version: 3.12.9
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.29.3
  • Transformers version: 4.50.3
  • Accelerate version: 1.5.2
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 2080 Ti, 11264 MiB
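
A possible lower-memory variant of the reproduction script is sketched below. It is not a confirmed fix for this issue. It differs from the script above in two ways: the base pipeline is loaded with `torch_dtype=torch.float16` instead of being cast afterwards, and `pipe.to("cuda", ...)` is replaced by `enable_model_cpu_offload()`, which keeps components on the CPU and moves them to the GPU only while they are needed. Whether the largest single component then fits on an 11 GiB card is an assumption that has not been tested here. The function name `build_low_memory_pipeline` is hypothetical.

```python
def build_low_memory_pipeline(base_model_path: str, controlnet_path: str):
    """Load the SD3 ControlNet pipeline in fp16 with model-level CPU offload.

    Imports are deferred so the function can be defined on a machine
    without torch or diffusers installed.
    """
    import torch
    from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

    controlnet = SD3ControlNetModel.from_pretrained(
        controlnet_path, torch_dtype=torch.float16
    )
    # Load every component directly in fp16 rather than casting after loading.
    pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
        base_model_path, controlnet=controlnet, torch_dtype=torch.float16
    )
    # Do not call pipe.to("cuda") here: offloading manages device placement
    # and moves one component at a time to the GPU (requires accelerate).
    pipe.enable_model_cpu_offload()
    return pipe
```

It would be called with the same paths as in the reproduction, for example `build_low_memory_pipeline("stabilityai/stable-diffusion-3-medium-diffusers", "DavyMorgan/sd3-controlnet-out")`, and the returned pipeline used exactly as `pipe` is used there. If a single component still does not fit, `enable_sequential_cpu_offload()` is a slower alternative that offloads at a finer granularity.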

Who can help?

@sayakpaul

Labels

bug: Something isn't working
stale: Issues that haven't received updates
