How merge/fusion several files .safetensors ( diffusers format ) to a whole .safetensors checkpoint ? #9319

BenDes21 · 2024-08-29T14:32:30Z

BenDes21
Aug 29, 2024

Hi there, I got diffusion_pytorch_model-00001-of-00003.safetensors, diffusion_pytorch_model-00002-of-00003.safetensors and diffusion_pytorch_model-00003-of-00003.safetensors after full finetuned Flux with Ostris AI tool kit, I guess at this point I will have to fusion/merge theses 3 parts for created the whole checkpoint usable for a1111 Forge . Is anyone know a script or a tool who can allow this last step ?

Thanks a lot

bmaxdk · 2024-08-29T21:58:58Z

bmaxdk
Aug 29, 2024

To merge the split .safetensors files into a single chckpt, you can usesafetensors library in huggingface.
pip install safetensors

import safetensors.torch

merge_state_dict ={}
files = ["name1.safetensors", "name2.safetensors"...] #file you want to be merged
merged_file = "merged_file.safetensors"
for file in files:
    load_files_dict = safetensors.torch.load_file(your.safetensors files)
merge_state_dict.update(load_files_dict)

# save
safetensors.torch.save_file(merge_state_dict, merged_file)

this merged_file will be now saved to the combined state dictionary with .safetensors file

2 replies

BenDes21 Aug 30, 2024
Author

To merge the split .safetensors files into a single chckpt, you can usesafetensors library in huggingface. pip install safetensors

import safetensors.torch

merge_state_dict ={}
files = ["name1.safetensors", "name2.safetensors"...] #file you want to be merged
merged_file = "merged_file.safetensors"
for file in files:
    load_files_dict = safetensors.torch.load_file(your.safetensors files)
merge_state_dict.update(load_files_dict)

# save
safetensors.torch.save_file(merge_state_dict, merged_file)

this merged_file will be now saved to the combined state dictionary with .safetensors file

Hi there, unfortunately the .safetensors merged cannot be read by a1111, got : ValueError: Failed to recognize model type!

bmaxdk Aug 30, 2024

It seem you are having merging issue/. can you make sure your file loaded correct and the your .safetensors are not corrupted.

let's try:

from safetensors.torch import load_file, save_file 
from transformers import AutoTokenizer, AutoModel 
from diffusers import StableDiffusionPipeline 
import torch
load1 = load_file("diffusion_pytorch_model-00001-of-00003.safetensors") 
load2 = load_file("diffusion_pytorch_model-00002-of-00003.safetensors") 
load3 = load_file("diffusion_pytorch_model-00003-of-00003.safetensors") 
# Debug 
print(load1.keys())
print(load2.keys())
print(load3.keys())
# If here you got and issue, you may facing file corrupted issue..

# Once everything looks right lets unpack this 
merged_state_dict = {**load1, **load2, **load3}


save_file(merged_state_dict, "merged_diffusion_torch.safetensors")


your_model_name_path = "merged_diffusion_torch.safetensors" 

 # Now, let's try loading the model to ensure it's valid 
merged_load = load_file(your_model_name_path)
print(merged_load.keys())
try: # Load the tokenizer and model 
    tokenizer = AutoTokenizer.from_pretrained(your_model_name_path) 
    model = AutoModel.from_pretrained(your_model_name_path, from_tf=False, from_safetensors=True)
    
    pipeline = StableDiffusionPipeline.from_pretrained(your_model_name_path, torch_dtype=torch.float16) 
    print("it is successfully loaded!") 
    
except Exception as e: 
    print(f"Error: {e}")

I got issue having a111 when my model architecture mismatch.. I am not sure your case but it's possible when file corrupted it could cuase the issue.

NotTheStallion · 2025-06-10T11:47:24Z

NotTheStallion
Jun 10, 2025

While finetuning an LLM on a local cluster, I needed to reshard safetensor files, so I created a code that merges them then resplits them into N parts.

This might prove useful to you if you're still looking for a solution, or for other people who might refer to this page when facing the same problem.

Here is the code i made for it : https://github.com/NotTheStallion/reshard-safetensors

0 replies

JamesClarke7283 · 2025-06-29T09:52:06Z

JamesClarke7283
Jun 29, 2025

None of the non-indexed methods will work well, because many models rely on a index.json file to map the different layers, etc to the relevant files.

You will end up with a corrupted or "functional", but damaged model-file (depending on if the layer order is sequential or not to each file, which is unlikely for larger models).

Example:
https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P/blob/main/diffusion_pytorch_model.safetensors.index.json

~~I bet~~ its really simple to do and in a huggingface doc somewhere, its just not easily searchable for this usecase.

~~Might report back if i find anything, (Came here for the same issue of needing to merge split files)~~
##UPDATE

I havent found a unified cli tool that is official to huggingface/diffusers yet, but the solution just consists of loading the model and running ".save_pretrained" on it.

General Solution (For any Huggingface Pipeline compatible model)

Load Model weights (AutoModel, DiffusionPipeline.from_pretrained, etc..)
run: model.save_pretrained("[out_path]")

You can set ~~max_shard_size=0~~ max_shard_size="2TB" and safe_serialization=True in your .save_pretrained call explicitly if you want.

If it has errors, make sure you load the correct pipeline, for diffusers:
https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading
might be useful.

All i i know is .save_pretrained is a primitive, so it will work on transformers,diffusers, etc...

UPDATE: Errata: max_shard_size=0 is actually meant to be max_shard_size="2TB" (or any max size you would still want un-sharded)

2 replies

NotTheStallion Jun 29, 2025

The code i wrote does also write a index.json.
I have also tested it multiple times.
The problem i had with the method you gave is that it needs to load the model first and then save it again however, i made the mistake of loading it in GPU which i didn't have enough memory in so it OOM every time.

I was forced to create my own method to resplit the model weights and layers while maintaining the index.json to be able to distribute them across all machines for model parallel training.

Thank you for the update !

JamesClarke7283 Jun 29, 2025

Thank you for the update !

You are very welcome! (:

I was forced to create my own method to resplit the model weights and layers while maintaining the index.json to be able to distribute them across all machines for model parallel training.

I thought HF docs have things like accelerate to solve this problem?
(for distributed training see here: https://huggingface.co/docs/accelerate/main/en/usage_guides/deepspeed)

The code i wrote does also write a index.json. I have also tested it multiple times. The problem i had with the method you gave is that it needs to load the model first and then save it again however, i made the mistake of loading it in GPU which i didn't have enough memory in so it OOM every time.

I was forced to create my own method to resplit the model weights and layers while maintaining the index.json to be able to distribute them across all machines for model parallel training.

Why one would need to make their own safetensors.index.json, to my knowledge this is usually abstracted away by hf_hub and the underlying libraries it depends on(when you push a model, etc, the code below might help), for my use-case (merging a pre-existing model on HF into a single .safetensors for inference), if a model author is using a standard way to shard(the default way when they push it), a "safetensors.index.json" file will be provided for any sharded model (if that file is omitted from a sharded repo it would likely be in error or they are using a non-standard way of doing it).

Reguarding the OOMs, Loading fully into the GPU wont work unless you have enough VRAM, you can easily load into CPU, you can tweak the 'device_map' parameter in your .from_pretrained, if you have a Discrete GPU you might want to explicitly set device_map it to cpu.

For users that want to avoid OOM errors and utilise GPU's, This may help if you use CUDA: https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf
In fact the OOM errors will reference this env-var. Of course for most users(even those doing parallel training with modern frameworks like deepspeed), memory optimization/management will be abstracted away.

Side Note: For those who are training/finetuning these models in parallel, and only using pytorch/tensorflow directly for parallel training runs, they may need to worry a little more about these details, i haven't seen your training code so i wouldn't know if this applies to you, but *largely, its a solved problem if you are using established abstractions like accelerate/DeepSpeed you wont have to worry about sharding/unsharding in a parallel training scenario, underlying libraries support distributed training https://docs.pytorch.org/tutorials/beginner/dist_overview.html.

Maybe our use-cases are different, mine is to un-shard the original model weights of a safetensors formatted model i wish to use directly(no quantization), because the ComfyUI distributable uses bf16, and i wanted to keep it fp16 or whatever the original precision was, for this use-case i think my example is ideal as it utilises the official diffusers library within its scope/low margin for error.

I wanted to do it myself to ensure there are no mistakes and preserve the full integrity of the model.

If you dont like loading already split models into memory, i assume it can be done for any already split model without doing that (my assumption being that the sector-start-end/generic level info corresponds to each join).

probably the umbrella library that creates the split and safetensors.index.json file for all HF models will allow this, inspecting the file format leads me to think loading full weights.

Additionally, while i am sure your script is helpful for generating shards, for collapsing them it appears to rely on the safetensors having the correct metadata, while your tool embeds this information, its not guaranteed it will be for any shard-set as its not designed to store metadata(even if its possible, its not recommended for reliability reasons):

https://github.com/NotTheStallion/reshard-safetensors/blob/16294928f43371ffbb0f0205fe7e56a608fa28ef/safetensors_manager/splitter.py#L57

Safetensors are supposed to be for raw tensor values by design(preserving raw weights for training purposes), it's suppose to replace pickle-like formats which were known to be a security risk("increased attack surface"), for weights with metadata GGUF is purpose built for inference (converting to a unquantized GGUF, may be what you need instead).

See Here:
https://github.com/ggml-org/ggml/blob/master/docs/gguf.md

https://github.com/NotTheStallion/reshard-safetensors/blob/16294928f43371ffbb0f0205fe7e56a608fa28ef/safetensors_manager/merger.py#L15

Most models i see on HF, already have a safetensors.index.json file, and that is the official supported way to get that information.
https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P/blob/main/diffusion_pytorch_model.safetensors.index.json

Untested example to try(My weights are still downloading)

from diffusers import DiffusionPipeline

# Sharded diffusers to single file example, Details: HuggingFace library/pipeline agnostic, read the HF docs: https://huggingface.co/docs

model = DiffusionPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-14B-Diffusers")
print("Loaded model, saving...")

model.save_pretrained("./unsharded-out-dir", max_shard_size="2TB", safe_serialization=True)
print("Saved Model...")

Conclusion

From reading the safetensors docs & some safetensors.index.json files i have, it appears this is likely a solved problem(for any safetensor shard joining usecase), and its a classic case of us all to RTFM (:

I do find it annoying that there isn't a universal, easily searchable entry on https://huggingface.co/docs about joining arbitrary safetensors from a depth_map (.index.json file), ~~it's highly in-plausable there isn't a official way to do it, my example seems to be closest to what the devs intend. I will see if i find anything definitive, i will update here.~~

UPDATE: I had a read at your reshard-safetensors code more in-depth, it appears to be doing the same underlying method that HF libraries are doing (but replicated at a lower level), ModelHubMixin.save_pretrained does, just at a lower level and likely at the cost of reliability, i think you will find the save_pretrained Mixin more reliable, it will likely also be better optimised because its the underlying primitive(you can tweak the params to tailor it to your use-case), i would guess the same should apply to the split/sharding (as that's where most models shards/splits are generated from), it may make your life easier.

JamesClarke7283 · 2025-06-29T14:45:58Z

JamesClarke7283
Jun 29, 2025

Solution

The simplest, most reliable,officially supported method ive found (for purposes of un-sharding models) is to load and save the model with a really high max_shard_size, so it wont shard it at all, Code below.

Note: Your model does not need to be on HF, for this to work, as it relies on the core Mixins/Helpers.

Method

Load model with the models pipeline (whatever model it is).
.save_pretrained("unsharded-out-directory", max_shard_size="2TB")

Result

Then you get this:

❯ tree . -h
[ 114]  .
├── [ 449]  model_index.json
├── [  35]  scheduler
│   └── [ 746]  scheduler_config.json
├── [  50]  text_encoder
│   ├── [ 811]  config.json
│   └── [ 21G]  model.safetensors
├── [  88]  tokenizer
│   ├── [6.9K]  special_tokens_map.json
│   ├── [ 60K]  tokenizer_config.json
│   └── [ 16M]  tokenizer.json
├── [  68]  transformer
│   ├── [ 651]  config.json
│   └── [ 53G]  diffusion_pytorch_model.safetensors
└── [  68]  vae
    ├── [ 872]  config.json
    └── [484M]  diffusion_pytorch_model.safetensors

Whole model repo is un-sharded, (VAE, Transformers, etc)

Note: You can easily make it only do one component, but for some users this way may actually be better, as all included components(VAE, Transformers, text-encoders) required for it to word are now each(separately) in a single file.

Note: if you want to read from local filesystem, look at the .from_pretrained or if you are not loading a pre-trained model there are other relevent helpers (model agnostic).

https://huggingface.co/docs/huggingface_hub/main/en/guides/integrations#frompretrained

Code

from diffusers import DiffusionPipeline

# Sharded diffusers to single file example, Details: HuggingFace library/pipeline agnostic, read the HF docs: https://huggingface.co/docs

model = DiffusionPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-14B-Diffusers")
print("Loaded model, saving...")

model.save_pretrained("./unsharded-out-dir", max_shard_size="2TB", safe_serialization=True)
print("Saved Model...")

Areas for improvement

If someone could find a pipeline/task/framework agnostic solution, that would be nice, lossless interoperable weight format's like Onnx will eventually supersede safetensors in wide adoption, for ML raw model storage entirely, for model authors and users alike (because needing to think about which standard/tool/framework to use all the time, is no fun). However, for the people using raw safetensors on HF, you can do this:

Pipeline agnostic hack (Transformers, Diffusers, or any of these)

HowTo: Replace with any pipeline/model-type, from model page's

1. Visit Model Page and click on `Use Model` and pick your framework:

2. Click on your framework, in this case: `transformers` :

3. Copy the required imports and pipeline code.

Example Code Result:

from transformers import pipeline

pipe = pipeline("text-generation", model="MiniMaxAI/MiniMax-M1-80k", trust_remote_code=True)

# Leave this untouched (Its HF pipeline agnostic)
model.save_pretrained("./unsharded-out-dir", max_shard_size="2TB", safe_serialization=True)
print("Saved Model...")

Footnote

I had a hot take about Safetensors, should someone disagree with interoperable ML weight formats replacing .safetensors, please at least read the safetensors docs, and for comparison, read an interoperable standard like Onnx: https://onnx.ai/

In that instance, would be interested to hear thoughts,

Context of my argument

The main point i have is they solve different problems safetensors is for storing tensors (not ML specific use case, even if HF's implementation is for this), Onnx is an interoperable standard for ML model storage (it can interoperate *in principle, with any training/inference/etc... framework/engine), it came about because for ML model storage, pickle and other binary formats where prone to attacks through the deseralization & embedded code execution vectors (raw binary format's dont have a schema, pickling ML models leads to high complexity outputs by nature, therefore designing a secure parser/tooling securely for every conceivable platform/framework/etc, was unfeasible, attack surface was too high, this is the major reason safetensors are ubiquitous), See some examples here: https://github.com/huggingface/safetensors/tree/main/attacks

Therefore i argue .safetensors is not designed for ML Model storage, it was just something we made to solve a problem with what we had before.

Addendum

I would argue that model authors should store in an interoperable format first, then export to safetensors.
(still publishing with both available, just original copy should not be .safetensors, its not reliable, because without all the metadata/other files, you just got a pile of numbers within certain dimensions).

There are so many things that make .safetensors not suited for our current usecases, instead of listing all of them i can think of, i will list one that's a little out there (for fun):

For long-term archival purposes too, (Getting a bit more out there):
The tensor components and minimal metadata make it more inscrutable than hieroglyphics, because at least those followed Zipf's law, the large proportion of content, wont be easily decipherable by our distant descendants, the knowledge contained in the model could be lost. the word "pytorch" is based on metaphors we understand, even if the word "python" and "torch" meaning were known, the context of it's use might never be known.

In contrast, interoperable formats, contain far richer metadata, and information which could be used to re-construct inference engines/interpretability (i am sure, once the pattern is found, it would be done).

If we haven't yet figured out danger signs on nuclear waste storage, to warn future humans/intelligent life of the danger, interpreting a high entropy tensor file, with a small amount of __metadata__, wont be useful.

Many thanks,
Enjoyed writing this piece :)
James Clarke

0 replies

How merge/fusion several files .safetensors ( diffusers format ) to a whole .safetensors checkpoint ? #9319

Uh oh!

Replies: 4 comments · 4 replies

Uh oh!

Uh oh!

BenDes21 Aug 30, 2024 Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

General Solution (For any Huggingface Pipeline compatible model)

Uh oh!

Uh oh!

Uh oh!

Conclusion

Uh oh!

Uh oh!

Solution

Method

Result

Code

Areas for improvement

Pipeline agnostic hack (Transformers, Diffusers, or any of these)

HowTo: Replace with any pipeline/model-type, from model page's

1. Visit Model Page and click on Use Model and pick your framework:

2. Click on your framework, in this case: transformers :

3. Copy the required imports and pipeline code.

Example Code Result:

Footnote

Context of my argument

Addendum

Replies: 4 comments 4 replies

BenDes21 Aug 30, 2024
Author

1. Visit Model Page and click on `Use Model` and pick your framework:

2. Click on your framework, in this case: `transformers` :