lshpku commented Sep 2, 2025

This PR implements two very useful small HuggingFace utilities:

  1. HuggingFace Cache: saves the HuggingFace weight files needed by the current machine in its local /dev/shm, so the next load reads them locally at high speed.

  2. Shrunken-expert loading: when the number of experts is less than 256, the expert dimension is automatically sliced.

Note 1: If the program is accidentally interrupted while loading, you must manually run rm -f /dev/shm/lshrun_*.lock before the next start; otherwise the cache will deadlock.
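The cache-plus-lock behavior described above (including why a stale lock deadlocks the next run) can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `cached_load`, the `lshrun_` file prefix, and the polling loop are assumptions based on the description.

```python
import os
import shutil
import time

def cached_load(src_path, cache_dir="/dev/shm", prefix="lshrun_"):
    """Copy src_path into cache_dir on first use; later loads read the local copy.

    A lock file created with O_CREAT | O_EXCL guards the copy so that only one
    process populates the cache. If the program is killed mid-copy, the stale
    lock file survives and every later call spins in the wait loop below --
    this is the deadlock that note 1 says must be cleared by hand with
    `rm -f /dev/shm/lshrun_*.lock` (names here are hypothetical).
    """
    cached = os.path.join(cache_dir, prefix + os.path.basename(src_path))
    lock = cached + ".lock"
    if not os.path.exists(cached):
        while True:
            try:
                # O_EXCL makes creation atomic: exactly one process wins the lock.
                fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                break
            except FileExistsError:
                time.sleep(0.1)  # another process is copying; a stale lock hangs here
        try:
            if not os.path.exists(cached):  # re-check after winning the lock
                shutil.copy(src_path, cached)
        finally:
            os.close(fd)
            os.remove(lock)
    return cached
```

Since /dev/shm is a tmpfs backed by RAM, reads from the cached copy avoid both network fetches and disk I/O on subsequent loads.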

Note 2: Layer shrinking also requires changing special_cases, for example:

# With 29 layers, change it to:
special_cases = {(0, 0): "model", (28, 2): "model.layers.61", (28, 3): "model", (28, 4): "lm_head"}

# With 21 layers, change it to:
special_cases = {(0, 0): "model", (21, 1): "model.layers.61", (21, 2): "model", (21, 3): "lm_head"}
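Separately, the shrunken-expert loading from item 2 can be sketched as below. This is a minimal illustration under assumed conventions: the function name `shrink_experts`, the substring match on "experts" in parameter names, and an expert-first weight layout are all hypothetical, not taken from the PR's code.

```python
def shrink_experts(state_dict, num_experts, total_experts=256):
    """Slice expert-dimension weights down to num_experts.

    Any parameter whose name mentions "experts" and whose leading dimension
    equals the full expert count (256 in this PR) is truncated to the first
    num_experts entries; all other weights pass through unchanged.
    (Name matching and layout are illustrative assumptions.)
    """
    shrunk = {}
    for name, weight in state_dict.items():
        if "experts" in name and len(weight) == total_experts:
            shrunk[name] = weight[:num_experts]  # keep only the first num_experts slices
        else:
            shrunk[name] = weight
    return shrunk
```

Slicing at load time lets a small-expert debug configuration reuse the full 256-expert checkpoint without re-exporting the weights.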


paddle-bot commented Sep 2, 2025

Thanks for your contribution!

lshpku force-pushed the hf-cache-shrink branch 4 times, most recently from 255abc5 to 23c0add on September 8, 2025
lshpku changed the title from "Implement HuggingFace Cache as well as shrunken-expert loading" to "Implement HuggingFace Cache and shrunken-expert loading" on Sep 17, 2025