
Conversation

@city96 (Contributor) commented Aug 19, 2025

This PR is an attempt at logic similar to what llama.cpp uses (ggml-org/llama.cpp#11397).

In the case of ComfyUI, it means forcing some weights to use lowvram by default. The idea is that when you're low on VRAM to begin with, the default logic ends up marking every tensor past a certain cutoff as lowvram. If we instead allocate some of the larger tensors as lowvram from the start, we can reduce the total number of lowvram weights.
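A rough sketch of that selection, outside of ComfyUI's actual patcher code (the `use_lowvram` flag and the `vram_budget_bytes` argument are placeholders, not real ComfyUI attributes):

```python
import torch

def premark_lowvram(model: torch.nn.Module, vram_budget_bytes: int) -> int:
    """Sketch: offload the largest weight-holding modules first, so that
    whatever remains fits inside the VRAM budget. `use_lowvram` stands in
    for whatever flag the real lowvram path checks."""
    # Collect leaf-level parameter sizes per module, largest first.
    sized = []
    for module in model.modules():
        size = sum(p.numel() * p.element_size() for p in module.parameters(recurse=False))
        if size > 0:
            sized.append((module, size))
    sized.sort(key=lambda item: item[1], reverse=True)

    remaining = sum(size for _, size in sized)
    marked = 0
    for module, size in sized:
        if remaining <= vram_budget_bytes:
            break  # everything left fits on the GPU, stop offloading
        module.use_lowvram = True  # placeholder for the real lowvram marker
        remaining -= size
        marked += 1
    return marked
```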

Testing with Qwen Image and marking the two FFN blocks (img_mlp + txt_mlp) manually, I get a total of 324 lowvram weights instead of the 640 I get with the default logic.
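The manual marking for that test looked roughly like this (simplified; the name match assumes Qwen Image's img_mlp/txt_mlp naming, and `use_lowvram` is the same placeholder flag as above):

```python
import torch

def mark_ffn_lowvram(model: torch.nn.Module) -> int:
    """Sketch: flag the FFN blocks by name before the default logic runs."""
    marked = 0
    for name, module in model.named_modules():
        if "img_mlp" in name or "txt_mlp" in name:
            module.use_lowvram = True  # placeholder lowvram marker
            marked += 1
    return marked
```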

Now, diffusion models are almost all compute bound, and lowvram weights are still moved to the GPU, so this probably doesn't help much (I get ~2 seconds faster generations at best). The main use case would be if, for some reason, each individual CPU<->CUDA copy op has high-ish overhead (might be the case on PCIe 3.0 or with weird cross-NUMA access?).
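One way to check whether per-copy overhead actually matters on a given machine is to compare many small host-to-GPU copies against a few large ones moving the same total payload, e.g. (sizes picked arbitrarily):

```python
import time
import torch

def time_copies(tensors):
    """Time pinned-memory host->CUDA copies, returning seconds."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    for t in tensors:
        t.to("cuda", non_blocking=True)  # result discarded, only timing the transfer
    torch.cuda.synchronize()
    return time.perf_counter() - start

# Same total payload (~512 MiB of fp16), split differently.
small = [torch.empty(1024, 1024, dtype=torch.float16).pin_memory() for _ in range(256)]
large = [torch.empty(16, 1024, 1024, dtype=torch.float16).pin_memory() for _ in range(16)]

print("256 small copies:", time_copies(small))
print("16 large copies: ", time_copies(large))
```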

This may still be useful for debugging lowvram without restarts, though.
