Minimum requirements for inference #31

Open
ShaochengShen opened this issue Nov 27, 2024 · 6 comments

Comments

@ShaochengShen

Hi!
I'm trying to use the inference code to upscale some examples on an A40 48 GB GPU, but I keep hitting an OOM error.
Could you tell me the minimum requirements for inference, or suggest some ways to reduce GPU memory usage?
Thanks a lot!

@C00reNUT


I was hoping my 4090 would manage it :) Thank you for providing the reference, but it's strange: given how many stars this repo has, one would think it was usable in real life...

@codecowboy

@sczhou I'd be grateful if you could advise on the minimum VRAM requirements. I've tried it on an RTX A6000 and also get an OOM error.

Loading Upscale-A-Video
[1/1] Processing video:  testclip
Traceback (most recent call last):
  File "/home/Upscale-A-Video/inference_upscale_a_video.py", line 215, in <module>
    output = vframes.new_zeros(output_shape)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.59 GiB (GPU 0; 47.53 GiB total capacity; 9.55 GiB already allocated; 37.55 GiB free; 9.64 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(.venv) root@488a72695c82:/home/Upscale-A-Video# 

I've tried the following:
export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128

which didn't resolve the problem.
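
Worth noting why the allocator tweak can't help here: the traceback shows a single 50.59 GiB request from vframes.new_zeros(output_shape), i.e. the script pre-allocates the entire upscaled output tensor on the GPU at once. max_split_size_mb only mitigates fragmentation across many smaller blocks; it cannot satisfy one request larger than the whole card. A minimal workaround sketch, assuming the surrounding code already loops over chunks (pipeline and chunk_size are hypothetical names, not the repo's actual code):

output = vframes.new_zeros(output_shape, device="cpu")   # same dtype, but allocated in host RAM
for start in range(0, vframes.shape[0], chunk_size):     # chunk_size: hypothetical loop step
    chunk = vframes[start:start + chunk_size].to("cuda:0")
    upscaled = pipeline(chunk)                           # placeholder for the repo's model call
    output[start:start + chunk_size] = upscaled.cpu()    # stream each result back chunk by chunk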

@jahwanoh

40GB fails too T_T

@doesmyemployercareaboutmyusername

doesmyemployercareaboutmyusername commented May 14, 2025

Meanwhile I'm sitting here with my RTX 3080 12GB VRAM wondering why it doesn't work. This thread explains a lot.

Here's what I did so far to try to get this software running (a combined command is sketched after this list):

Disabling memory-intensive features:

  • Use --no_llava to disable the image-captioning model
  • Remove the -p parameter to skip the flow-propagation steps
  • Use --color_fix None to avoid additional post-processing

Tiling:

  • Enable tiling with --perform_tile so large frames are processed in overlapping patches
  • Decrease the tile size with --tile_size 192 or lower (the default is 256)

Quality vs. performance tradeoffs:

  • Lower -s (inference steps) from 30 to 20 for faster processing
  • Reduce -g (guidance scale) from 6 to 4 for less computation
  • Decrease -n (noise level), e.g. from the default 150 to 50
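
Putting those flags together, a hypothetical starting-point command (the input path is a placeholder; every flag is taken from the list above or from the runs later in this thread):

python inference_upscale_a_video.py -i input.mp4 --no_llava --color_fix None --perform_tile --tile_size 192 -s 20 -g 4 -n 50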

The hardware requirements are described in their paper: Upscale-A-Video was trained on “32 NVIDIA A100-80G GPUs with a batch size of 384”, which indicates very high hardware requirements for training. For inference, the requirements are not explicitly stated, but there are hints:

  • The software uses a Latent Diffusion Model (LDM) framework, which is typically resource-intensive.
  • To work around memory constraints, they use a strategy of “splitting the input video into multiple overlapping patches, processing them separately, and finally merging the improved patches” (a sketch of this idea follows below).

This suggests that inference also requires powerful GPUs with plenty of VRAM, even though exact minimums are not specified.
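
For intuition, here is a minimal sketch of that overlapping-patch idea in plain PyTorch. Everything in it (the function names, the 4x scale factor, blending overlaps by averaging) is an illustrative assumption, not the repo's actual implementation:

import torch

def upscale_tiled(frame, upscale_patch, tile=256, overlap=32, scale=4):
    # frame: (C, H, W) tensor; upscale_patch: assumed callable returning a scale-x patch
    c, h, w = frame.shape
    out = torch.zeros(c, h * scale, w * scale)
    weight = torch.zeros_like(out)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            patch = frame[:, y:y + tile, x:x + tile]   # slicing clamps at the borders
            up = upscale_patch(patch)                  # only one small patch is in flight at a time
            ys, xs = y * scale, x * scale
            out[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += up
            weight[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += 1
    return out / weight.clamp(min=1)                   # average the overlapping regions

Peak memory is then bounded by the patch size rather than the full frame, which is why shrinking --tile_size helps.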

Hope it helps some of you.
It would be very helpful if someone who has gotten this software to work could describe which hardware and which settings they used for it.

Edit: In this thread someone apparently got it working with an H200 with 140 GB of VRAM...

@twobob

twobob commented May 24, 2025

Using #42, I estimated it would require around 35 GB, still out of reach for home GPUs.

However... it seems it can be done in “around” 24 GB.

[screenshot attached]

It needs a little tuning to stop the peaks going above 24 GB, but I am trying the following on a 3090 on Windows:

(UAV) D:\repo\Upscale-A-Video>python inference_upscale_a_video.py -i "C:\Users\new\THE REAL DEAL - CroppedFinal.mp4" --fp16 --load_8bit_llava --output_path output -n 50 -g 4 -s 10 --tile_size 96 --perform_tile --color_fix None
Detected 1 GPU. Using cuda:0 for all models.
Using FP16 precision for main pipeline on GPU.


[Upscale-A-Video ASCII art banner]

Upscale-A-Video Device: cuda:0 (torch.float16)
LLaVA Device: cuda:0 (8-bit: True)
Processing in chunks of: 8 frames
Tiling: Enabled, Tile Size: 96, Overlap: 32
Color Correction: None
Propagation Steps: Disabled

Loading Upscale-A-Video Pipeline components...

  • Pipeline structure loaded.
  • Using 3D VAE
  • VAE loaded to CPU.
  • UNet loaded to CPU.
  • Scheduler loaded.
    RAFT model loaded. Propagator initialized.
  • Moving pipeline components to cuda:0 with torch.float16...
  • Pipeline components moved and configured.

Loading LLaVA...
Loading vision tower: openai/clip-vit-large-patch14-336
bin C:\Users\new\AppData\Roaming\Python\Python310\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:13<00:00, 4.44s/it]
LLaVA loaded on cuda:0.
Found 1 video(s) for captioning.
Generating caption for: THE REAL DEAL - CroppedFinal.mp4 (using first frame)...
Caption: The image is a black and white photograph of a dark forest with a very
dark background. The scene is captured in a sepia-toned style, giving it a
vintage and nostalgic feel. The forest is filled with trees, some of which are
closer to the foreground, while others are further
Upscale-A-Video Pipeline ready.

Starting processing for 1 video(s)...

[1/1] Processing video: THE REAL DEAL - CroppedFinal
Input path: C:\Users\new\THE REAL DEAL - CroppedFinal.mp4
Detected ~171233 frames.
Video resolution: 660x540, FPS: 50.00
Using generated caption: The image is a black and white photograph of a dark forest with [...]
Output path: output\video\THE REAL DEAL - CroppedFinal_n50_g4_s10_t96o32.mp4

[1/1] Reading chunk 1...
[1/1] Processing chunk 1 (8 frames)...
Processing chunk w/ tile patches [7x6], size=96x96, overlap=32...
Propagation steps: None
Denoising: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00, 1.35it/s]
Decoding: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.06it/s]
Propagation steps: None
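
Two of the levers in that run show up in the log: --fp16 for the main pipeline and --load_8bit_llava for the captioner. For a feel of what half precision alone buys, here is a generic torch illustration using this run's chunk shape (8 frames at 660x540); the same halving applies to every weight and activation the pipeline holds:

import torch

x = torch.randn(8, 3, 540, 660)                        # one 8-frame chunk in fp32
print(x.element_size() * x.nelement() / 2**20)         # ~32.6 MiB
x16 = x.half()                                         # fp16: same tensor, half the bytes
print(x16.element_size() * x16.nelement() / 2**20)     # ~16.3 MiB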

@twobob

twobob commented May 24, 2025

python inference_upscale_a_video.py -i "C:\Users\new\THE REAL DEAL - CroppedFinal.mp4" --fp16 --load_8bit_llava --output_path output -n 50 -g 4 -s 10 --tile_size 80 --perform_tile --color_fix None

Video resolution: 660x540, FPS: 50.00

It seems to fit so far. Obviously the quality will be terrible, but some “hey, this is a place to start” numbers might help someone.

[screenshot attached]
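
For anyone tuning tile size against the 24 GB ceiling, torch exposes peak-allocation counters; a hedged sketch (process_one_chunk is a placeholder for whatever runs a single chunk of the pipeline):

import torch

torch.cuda.reset_peak_memory_stats()
process_one_chunk()                                    # placeholder: run one chunk end to end
peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"peak VRAM this chunk: {peak_gib:.2f} GiB")     # compare against the 24 GiB budget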
