ControlNet inference example, CUDA OOM (#11363)

### Describe the bug

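Running the [inference example](https://github.com/huggingface/diffusers/blob/main/examples%2Fcontrolnet%2FREADME_sd3.md) on a single RTX 2080 Ti fails with a CUDA out-of-memory error. The error is raised while the pipeline is being moved to the GPU (`pipe.to("cuda", torch.float16)`), before any inference step runs.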

### Reproduction

```python
# simple_inference.py

from diffusers import StableDiffusion3ControlNetPipeline, SD3ControlNetModel
from diffusers.utils import load_image
import torch

base_model_path = "stabilityai/stable-diffusion-3-medium-diffusers"
controlnet_path = "DavyMorgan/sd3-controlnet-out"

controlnet = SD3ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet
)
pipe.to("cuda", torch.float16)


control_image = load_image("./conditioning_image_1.png").resize((1024, 1024))
prompt = "pale golden rod circle with old lace background"

# generate image
generator = torch.manual_seed(0)
image = pipe(
    prompt, num_inference_steps=20, generator=generator, control_image=control_image
).images[0]
image.save("./output.png")
```

```shell
accelerate launch simple_inference.py
```

### Logs

```shell
accelerate launch simple_inference.py
Loading pipeline components...:  44%|████████████████████████████▍                                   | 4/9 [00:07<00:12,  2.47s/it]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████| 2/2 [00:21<00:00, 10.65s/it]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████| 9/9 [00:31<00:00,  3.48s/it]
Traceback (most recent call last):
  File "/home/jroguwski/simple_inference.py", line 12, in <module>
    pipe.to("cuda", torch.float16)
  File "/opt/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 482, in to
    module.to(device, dtype)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3712, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1329, in convert
    return t.to(
           ^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacity of 10.57 GiB of which 81.12 MiB is free. Including non-PyTorch memory, this process has 10.49 GiB memory in use. Of the allocated memory 10.22 GiB is allocated by PyTorch, and 119.81 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
  File "/opt/miniconda/envs/control/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1194, in launch_command
    simple_launcher(args)
  File "/opt/miniconda/envs/control/lib/python3.12/site-packages/accelerate/commands/launch.py", line 780, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/miniconda/envs/control/bin/python', 'simple_inference.py']' returned non-zero exit status 1.
```
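The traceback passes through `transformers/modeling_utils.py`, so the allocation that fails belongs to one of the text encoders being moved to the GPU. A possible workaround, sketched below and not verified on this GPU, is to load everything in half precision and let Diffusers offload idle components to the CPU instead of calling `pipe.to("cuda", ...)`. The function names `weights_gib` and `load_pipeline_with_offload` are hypothetical helpers introduced for illustration; `enable_model_cpu_offload()` is the existing Diffusers pipeline method and requires `accelerate`.

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Activations, CUDA context and allocator overhead are not included.
    """
    return num_params * bytes_per_param / 2**30


def load_pipeline_with_offload(base_model_path: str, controlnet_path: str):
    """Hypothetical helper: build the SD3 ControlNet pipeline with CPU offload."""
    # Imports are local so weights_gib() stays usable without the GPU stack.
    import torch
    from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

    controlnet = SD3ControlNetModel.from_pretrained(
        controlnet_path, torch_dtype=torch.float16
    )
    # Pass torch_dtype here too, so the base weights are loaded as fp16
    # rather than in the default precision.
    pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
        base_model_path, controlnet=controlnet, torch_dtype=torch.float16
    )
    # Replaces pipe.to("cuda", torch.float16): each component is moved to
    # the GPU only while it is in use, then returned to the CPU.
    pipe.enable_model_cpu_offload()
    return pipe
```

As a rough guide, `weights_gib(5e9, 2)` gives about 9.3 GiB for a hypothetical 5-billion-parameter component in fp16, which alone would nearly fill the 10.57 GiB reported in the log. If model-level offload is still not enough, `pipe.enable_sequential_cpu_offload()` is a slower but more aggressive alternative.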

### System Info

- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Linux-6.8.0-53-generic-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.12.9
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.29.3
- Transformers version: 4.50.3
- Accelerate version: 1.5.2
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 2080 Ti, 11264 MiB

### Who can help?

@sayakpaul 
