
Commit 79166dc

Merge branch 'main' into modular-diffusers
2 parents f95c320 + 01240fe

File tree: 55 files changed (+246, -10683 lines)

docs/source/en/using-diffusers/other-formats.md

Lines changed: 17 additions & 26 deletions
@@ -70,41 +70,32 @@ pipeline = StableDiffusionPipeline.from_single_file(
 </hfoption>
 </hfoptions>
 
-#### LoRA files
+#### LoRAs
 
-[LoRA](https://hf.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) is a lightweight adapter that is fast and easy to train, making them especially popular for generating images in a certain way or style. These adapters are commonly stored in a safetensors file, and are widely popular on model sharing platforms like [civitai](https://civitai.com/).
+[LoRAs](../tutorials/using_peft_for_inference) are lightweight checkpoints fine-tuned to generate images or video in a specific style. If you are using a checkpoint trained with a Diffusers training script, the LoRA configuration is automatically saved as metadata in a safetensors file. When the safetensors file is loaded, the metadata is parsed to correctly configure the LoRA and avoid missing or incorrect LoRA configurations.
 
-LoRAs are loaded into a base model with the [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] method.
+The easiest way to inspect the metadata, if available, is by clicking on the Safetensors logo next to the weights.
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/safetensors_lora.png"/>
+</div>
+
+For LoRAs that aren't trained with Diffusers, you can still save metadata with the `transformer_lora_adapter_metadata` and `text_encoder_lora_adapter_metadata` arguments in [`~loaders.FluxLoraLoaderMixin.save_lora_weights`] as long as it is a safetensors file.
 
 ```py
-from diffusers import StableDiffusionXLPipeline
 import torch
+from diffusers import FluxPipeline
 
-# base model
-pipeline = StableDiffusionXLPipeline.from_pretrained(
-    "Lykon/dreamshaper-xl-1-0", torch_dtype=torch.float16, variant="fp16"
+pipeline = FluxPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
 ).to("cuda")
-
-# download LoRA weights
-!wget https://civitai.com/api/download/models/168776 -O blueprintify.safetensors
-
-# load LoRA weights
-pipeline.load_lora_weights(".", weight_name="blueprintify.safetensors")
-prompt = "bl3uprint, a highly detailed blueprint of the empire state building, explaining how to build all parts, many txt, blueprint grid backdrop"
-negative_prompt = "lowres, cropped, worst quality, low quality, normal quality, artifacts, signature, watermark, username, blurry, more than one bridge, bad architecture"
-
-image = pipeline(
-    prompt=prompt,
-    negative_prompt=negative_prompt,
-    generator=torch.manual_seed(0),
-).images[0]
-image
+pipeline.load_lora_weights("linoyts/yarn_art_Flux_LoRA")
+pipeline.save_lora_weights(
+    transformer_lora_adapter_metadata={"r": 16, "lora_alpha": 16},
+    text_encoder_lora_adapter_metadata={"r": 8, "lora_alpha": 8}
+)
 ```
 
-<div class="flex justify-center">
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/blueprint-lora.png"/>
-</div>
-
 ### ckpt
 
 > [!WARNING]
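An aside on the new example above: the adapter metadata written by `save_lora_weights` can also be inspected programmatically instead of through the Hub UI. The sketch below is not part of the commit; it assumes the LoRA was saved locally under a hypothetical `my_lora/pytorch_lora_weights.safetensors` path and relies only on `safetensors.safe_open`.

```py
# Sketch: read LoRA adapter metadata back out of a safetensors file.
# The path is illustrative; Diffusers typically saves adapters as
# pytorch_lora_weights.safetensors inside the chosen directory.
import json

from safetensors import safe_open

with safe_open("my_lora/pytorch_lora_weights.safetensors", framework="pt") as f:
    metadata = f.metadata() or {}

# safetensors metadata is a flat dict of strings; LoRA configs are often JSON-encoded.
for key, value in metadata.items():
    try:
        value = json.loads(value)
    except (TypeError, ValueError):
        pass
    print(key, value)
```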

examples/dreambooth/README_flux.md

Lines changed: 49 additions & 3 deletions
@@ -263,9 +263,19 @@ This reduces memory requirements significantly w/o a significant quality loss. N
 ## Training Kontext
 
 [Kontext](https://bfl.ai/announcements/flux-1-kontext) lets us perform image editing as well as image generation. Even though it can accept both image and text as inputs, one can use it for text-to-image (T2I) generation, too. We
-provide a simple script for LoRA fine-tuning Kontext in [train_dreambooth_lora_flux_kontext.py](./train_dreambooth_lora_flux_kontext.py) for T2I. The optimizations discussed above apply this script, too.
+provide a simple script for LoRA fine-tuning Kontext in [train_dreambooth_lora_flux_kontext.py](./train_dreambooth_lora_flux_kontext.py) for both T2I and I2I. The optimizations discussed above apply to this script, too.
 
-Make sure to follow the [instructions to set up your environment](#running-locally-with-pytorch) before proceeding to the rest of the section.
+**Important**
+
+> [!NOTE]
+> To make sure you can successfully run the latest version of the Kontext example script, we highly recommend installing from source, specifically from the commit mentioned below.
+> To do this, execute the following steps in a new virtual environment:
+> ```
+> git clone https://github.com/huggingface/diffusers
+> cd diffusers
+> git checkout 05e7a854d0a5661f5b433f6dd5954c224b104f0b
+> pip install -e .
+> ```
 
 Below is an example training command:
 
@@ -294,6 +304,42 @@ accelerate launch train_dreambooth_lora_flux_kontext.py \
 Fine-tuning Kontext on the T2I task can be useful when working with specific styles/subjects where it may not
 perform as expected.
 
+Image-guided fine-tuning (I2I) is also supported. To start, you must have a dataset containing triplets:
+
+* Condition image
+* Target image
+* Instruction
+
+[kontext-community/relighting](https://huggingface.co/datasets/kontext-community/relighting) is a good example of such a dataset. If you are using such a dataset, you can use the command below to launch training:
+
+```bash
+accelerate launch train_dreambooth_lora_flux_kontext.py \
+  --pretrained_model_name_or_path=black-forest-labs/FLUX.1-Kontext-dev \
+  --output_dir="kontext-i2i" \
+  --dataset_name="kontext-community/relighting" \
+  --image_column="output" --cond_image_column="file_name" --caption_column="instruction" \
+  --mixed_precision="bf16" \
+  --resolution=1024 \
+  --train_batch_size=1 \
+  --guidance_scale=1 \
+  --gradient_accumulation_steps=4 \
+  --gradient_checkpointing \
+  --optimizer="adamw" \
+  --use_8bit_adam \
+  --cache_latents \
+  --learning_rate=1e-4 \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=200 \
+  --max_train_steps=1000 \
+  --rank=16 \
+  --seed="0"
+```
+
+More generally, when performing I2I fine-tuning, we expect you to:
+
+* Have a dataset similar to `kontext-community/relighting`
+* Supply `image_column`, `cond_image_column`, and `caption_column` values when launching training
+
 ### Misc notes
 
 * By default, we use `mode` as the value of `--vae_encode_mode` argument. This is because Kontext uses `mode()` of the distribution predicted by the VAE instead of sampling from it.
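An aside on the I2I dataset requirements documented in this hunk: before launching the command, it can help to confirm the dataset actually exposes the columns passed via `--image_column`, `--cond_image_column`, and `--caption_column`. A minimal sketch, assuming the `datasets` library and a `train` split; the column names simply mirror the example command above.

```py
# Sketch: sanity-check an I2I triplet dataset before launching training.
# Column names mirror the flags in the example command above; the "train"
# split is an assumption, so adjust both for your own dataset.
from datasets import load_dataset

ds = load_dataset("kontext-community/relighting", split="train")

expected = {"file_name", "output", "instruction"}  # condition image, target image, instruction
missing = expected - set(ds.column_names)
print("columns:", ds.column_names)
print("missing:", missing or "none")

sample = ds[0]
print(sample["instruction"])                              # caption / edit instruction
print(type(sample["file_name"]), type(sample["output"]))  # condition and target entries
```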
@@ -307,4 +353,4 @@ To enable aspect ratio bucketing, pass `--aspect_ratio_buckets` argument with a
 Since Flux Kontext finetuning is still an experimental phase, we encourage you to explore different settings and share your insights! 🤗
 
 ## Other notes
-Thanks to `bghira` and `ostris` for their help with reviewing & insight sharing ♥️
+Thanks to `bghira` and `ostris` for their help with reviewing & insight sharing ♥️
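For completeness, an aside on using the result: once training finishes, the LoRA saved to `--output_dir` can be loaded for image editing. A rough sketch, assuming a recent Diffusers release that ships `FluxKontextPipeline`; the `kontext-i2i` directory matches the example command above, and the input image path and prompt are placeholders.

```py
# Sketch: load the fine-tuned Kontext LoRA and run an image edit.
# "kontext-i2i" matches --output_dir from the example command; the input
# image path and the prompt are placeholders.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipeline = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipeline.load_lora_weights("kontext-i2i")

condition = load_image("condition.png")  # placeholder condition image
image = pipeline(image=condition, prompt="relight the scene with warm sunset lighting").images[0]
image.save("relit.png")
```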
