Minimum requirements for inference #31

Open
ShaochengShen opened this issue Nov 27, 2024 · 6 comments

Comments

@ShaochengShen

Hi!
I'm trying to use the inference code to upscale some examples on an A40 48 GB GPU, but I keep hitting an OOM error.
Could you tell me the minimum requirements for inference, or suggest some ways to reduce GPU memory usage?
Thanks a lot!

@C00reNUT


I was hoping my 4090 would manage it :) Thank you for providing the reference, but it's strange: given how many stars this repo has, one would think it was usable in real life...

@codecowboy

@sczhou I'd be grateful if you could advise on the minimum VRAM requirements. I've tried it on an RTX A6000 and also get an OOM error.

Loading Upscale-A-Video
[1/1] Processing video:  testclip
Traceback (most recent call last):
  File "/home/Upscale-A-Video/inference_upscale_a_video.py", line 215, in <module>
    output = vframes.new_zeros(output_shape)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.59 GiB (GPU 0; 47.53 GiB total capacity; 9.55 GiB already allocated; 37.55 GiB free; 9.64 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(.venv) root@488a72695c82:/home/Upscale-A-Video# 

I've tried the following:
export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128

which didn't resolve the problem.
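
Worth noting why the allocator tweak can't help here: the traceback shows a single 50.59 GiB request from vframes.new_zeros(output_shape), i.e. the script pre-allocates the entire upscaled output tensor on the GPU at once. max_split_size_mb only mitigates fragmentation across many smaller blocks; it cannot satisfy one request larger than the whole card. A minimal workaround sketch, assuming the surrounding code already loops over chunks (pipeline and chunk_size are hypothetical names, not the repo's actual code):

output = vframes.new_zeros(output_shape, device="cpu")   # same dtype, but allocated in host RAM
for start in range(0, vframes.shape[0], chunk_size):     # chunk_size: hypothetical loop step
    chunk = vframes[start:start + chunk_size].to("cuda:0")
    upscaled = pipeline(chunk)                           # placeholder for the repo's model call
    output[start:start + chunk_size] = upscaled.cpu()    # stream each result back chunk by chunk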

@jahwanoh

40GB fails too T_T

@doesmyemployercareaboutmyusername

doesmyemployercareaboutmyusername commented May 14, 2025

Meanwhile I'm sitting here with my RTX 3080 12GB VRAM wondering why it doesn't work. This thread explains a lot.

Here's what I did so far to try to get this software running (a combined command is sketched after this list):

Disabling memory-intensive features:

  • Use --no_llava to disable the image-captioning model
  • Remove the -p parameter to skip the flow-propagation steps
  • Use --color_fix None to avoid additional post-processing

Tiling:

  • Enable tiling with --perform_tile so large frames are processed in overlapping patches
  • Decrease the tile size with --tile_size 192 or lower (the default is 256)

Quality vs. performance tradeoffs:

  • Lower -s (inference steps) from 30 to 20 for faster processing
  • Reduce -g (guidance scale) from 6 to 4 for less computation
  • Decrease -n (noise level), e.g. from the default 150 to 50
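
Putting those flags together, a hypothetical starting-point command (the input path is a placeholder; every flag is taken from the list above or from the runs later in this thread):

python inference_upscale_a_video.py -i input.mp4 --no_llava --color_fix None --perform_tile --tile_size 192 -s 20 -g 4 -n 50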

The hardware requirements are described in their paper: Upscale-A-Video was trained on “32 NVIDIA A100-80G GPUs with a batch size of 384”, which indicates very high hardware requirements for training. For inference, the requirements are not explicitly stated, but there are hints:

  • The software uses a Latent Diffusion Model (LDM) framework, which is typically resource-intensive.
  • To work around memory constraints, they use a strategy of “splitting the input video into multiple overlapping patches, processing them separately, and finally merging the improved patches” (a sketch of this idea follows below).

This suggests that inference also requires powerful GPUs with plenty of VRAM, even though exact minimums are not specified.
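
For intuition, here is a minimal sketch of that overlapping-patch idea in plain PyTorch. Everything in it (the function names, the 4x scale factor, blending overlaps by averaging) is an illustrative assumption, not the repo's actual implementation:

import torch

def upscale_tiled(frame, upscale_patch, tile=256, overlap=32, scale=4):
    # frame: (C, H, W) tensor; upscale_patch: assumed callable returning a scale-x patch
    c, h, w = frame.shape
    out = torch.zeros(c, h * scale, w * scale)
    weight = torch.zeros_like(out)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            patch = frame[:, y:y + tile, x:x + tile]   # slicing clamps at the borders
            up = upscale_patch(patch)                  # only one small patch is in flight at a time
            ys, xs = y * scale, x * scale
            out[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += up
            weight[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += 1
    return out / weight.clamp(min=1)                   # average the overlapping regions

Peak memory is then bounded by the patch size rather than the full frame, which is why shrinking --tile_size helps.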

Hope it helps some of you.
It would be very helpful if someone who has gotten this software to work could describe which hardware and which settings they used for it.

Edit: In this thread someone apparently got it working with an H200 with 140 GB of VRAM...

@twobob

twobob commented May 24, 2025

Using #42, I estimated it would require around 35 GB, still out of reach for home GPUs.

However... it seems it can be done in “around” 24 GB.

[screenshot attached]

It needs a little tuning to stop the peaks going above 24 GB, but I am trying the following on a 3090 on Windows:

(UAV) D:\repo\Upscale-A-Video>python inference_upscale_a_video.py -i "C:\Users\new\THE REAL DEAL - CroppedFinal.mp4" --fp16 --load_8bit_llava --output_path output -n 50 -g 4 -s 10 --tile_size 96 --perform_tile --color_fix None
Detected 1 GPU. Using cuda:0 for all models.
Using FP16 precision for main pipeline on GPU.


[Upscale-A-Video ASCII art banner]

Upscale-A-Video Device: cuda:0 (torch.float16)
LLaVA Device: cuda:0 (8-bit: True)
Processing in chunks of: 8 frames
Tiling: Enabled, Tile Size: 96, Overlap: 32
Color Correction: None
Propagation Steps: Disabled

Loading Upscale-A-Video Pipeline components...

  • Pipeline structure loaded.
  • Using 3D VAE
  • VAE loaded to CPU.
  • UNet loaded to CPU.
  • Scheduler loaded.
    RAFT model loaded. Propagator initialized.
  • Moving pipeline components to cuda:0 with torch.float16...
  • Pipeline components moved and configured.

Loading LLaVA...
Loading vision tower: openai/clip-vit-large-patch14-336
bin C:\Users\new\AppData\Roaming\Python\Python310\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:13<00:00, 4.44s/it]
LLaVA loaded on cuda:0.
Found 1 video(s) for captioning.
Generating caption for: THE REAL DEAL - CroppedFinal.mp4 (using first frame)...
Caption: The image is a black and white photograph of a dark forest with a very
dark background. The scene is captured in a sepia-toned style, giving it a
vintage and nostalgic feel. The forest is filled with trees, some of which are
closer to the foreground, while others are further
Upscale-A-Video Pipeline ready.

Starting processing for 1 video(s)...

[1/1] Processing video: THE REAL DEAL - CroppedFinal
Input path: C:\Users\new\THE REAL DEAL - CroppedFinal.mp4
Detected ~171233 frames.
Video resolution: 660x540, FPS: 50.00
Using generated caption: The image is a black and white photograph of a dark forest with [...]
Output path: output\video\THE REAL DEAL - CroppedFinal_n50_g4_s10_t96o32.mp4

[1/1] Reading chunk 1...
[1/1] Processing chunk 1 (8 frames)...
Processing chunk w/ tile patches [7x6], size=96x96, overlap=32...
Propagation steps: None
Denoising: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00, 1.35it/s]
Decoding: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.06it/s]
Propagation steps: None
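
Two of the levers in that run show up in the log: --fp16 for the main pipeline and --load_8bit_llava for the captioner. For a feel of what half precision alone buys, here is a generic torch illustration using this run's chunk shape (8 frames at 660x540); the same halving applies to every weight and activation the pipeline holds:

import torch

x = torch.randn(8, 3, 540, 660)                        # one 8-frame chunk in fp32
print(x.element_size() * x.nelement() / 2**20)         # ~32.6 MiB
x16 = x.half()                                         # fp16: same tensor, half the bytes
print(x16.element_size() * x16.nelement() / 2**20)     # ~16.3 MiB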

@twobob

twobob commented May 24, 2025

python inference_upscale_a_video.py -i "C:\Users\new\THE REAL DEAL - CroppedFinal.mp4" --fp16 --load_8bit_llava --output_path output -n 50 -g 4 -s 10 --tile_size 80 --perform_tile --color_fix None

Video resolution: 660x540, FPS: 50.00

It seems to fit so far. Obviously the quality will be terrible, but some “hey, this is a place to start” numbers might help someone.

[screenshot attached]
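
For anyone tuning tile size against the 24 GB ceiling, torch exposes peak-allocation counters; a hedged sketch (process_one_chunk is a placeholder for whatever runs a single chunk of the pipeline):

import torch

torch.cuda.reset_peak_memory_stats()
process_one_chunk()                                    # placeholder: run one chunk end to end
peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"peak VRAM this chunk: {peak_gib:.2f} GiB")     # compare against the 24 GiB budget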
