It seems that with tensor overrides it is possible to selectively restrict certain tensors from being offloaded, saving GPU memory and keeping only the most frequently used and important tensors in VRAM.
I wonder: when there is no significant amount of VRAM available, is it possible in a similar manner to force the most frequently used layers, experts, etc. to stay in RAM, while whatever is less frequently used or less important is kept on a hard drive or SSD, freeing up RAM?
Since smaller MoE models already run quite well on CPU when everything fits in RAM, this approach should make some bigger MoE models significantly faster.
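For reference, here is a rough sketch of the VRAM-side technique I mean, using llama.cpp's `--override-tensor` / `-ot` flag. The regex is illustrative and would need adjusting to the actual tensor names of a given model:

```bash
# Offload as many layers as possible to the GPU (-ngl 99), but pin the
# MoE expert FFN tensors (the bulk of the weights) to CPU/RAM with a
# tensor override, so VRAM holds only the frequently-used shared tensors.
# The regex matches the usual expert tensor naming (blk.N.ffn_*_exps.*).
./llama-server -m model.gguf \
  -ngl 99 \
  -ot "blk\..*\.ffn_.*_exps\.=CPU"
```

On the RAM/disk side, llama.cpp memory-maps GGUF files by default, so the OS page cache already gives a rough version of this tiering: frequently touched pages stay in RAM while cold ones are read back from disk on demand (and `--mlock` / `--no-mmap` change that behavior). What I am asking about is whether this could be steered explicitly, the way tensor overrides steer VRAM placement.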