Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers 🎨

[Project Page] [Paper]

🔥 News

📝 Introduction

A training-free method for DiT-based models (e.g., FLUX.1-dev, FLUX.1-schnell, SD v3.5) that lets users precisely place instances and accurately render their attributes in detailed multi-instance layouts from preliminary sketches, while preserving overall image quality.

✅ To-Do List

  • arXiv paper with supplementary material
  • Inference code
  • More demos (coming soon, stay tuned! 🚀)
  • ComfyUI support
  • Hugging Face Space support

🛠️ Installation

💻 Environment Setup

```bash
git clone https://github.com/bitzhangcy/MIS_DiT.git
cd MIS_DiT
conda create -n ast python=3.10
conda activate ast
pip install -r requirements.txt
```

🚀 Checkpoints

The default checkpoint is FLUX.1-dev (link). FLUX.1-schnell and SD v3.5 are also supported; FLUX.1-schnell uses different hyperparameters, and SD v3.5 has a distinct model architecture and parameter set.

Obtain an access token for FLUX.1-dev and set it at line 63 of flux_hcrt.py as hf_token = "your_access_token".
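
For orientation, this is roughly how a FLUX.1-dev checkpoint is loaded with Hugging Face diffusers. It is a minimal sketch assuming a standard diffusers FluxPipeline; the repository's actual loading code in flux_hcrt.py may differ:

```python
import torch
from diffusers import FluxPipeline

hf_token = "your_access_token"  # Hugging Face access token (see above)

# Load the default FLUX.1-dev checkpoint; bfloat16 keeps VRAM usage manageable.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    token=hf_token,
)
pipe.to("cuda")
```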

🎨 Inference

Launch the Gradio interface below and follow the user instructions to quickly perform precise multi-instance synthesis:

```bash
python flux_hcrt.py
```
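
As a rough illustration of how the demo's controls fit together (a hypothetical sketch, not the actual flux_hcrt.py code; component names and slider ranges are assumptions), a Gradio app exposing the layout, the prompt, and the three modulation strengths described below might look like:

```python
import gradio as gr

def generate(layout, prompt, wc, wd, wf):
    # Hypothetical placeholder: the real script runs the tuned FLUX pipeline
    # on the user-drawn layout with the three modulation strengths.
    ...

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Image(label="Layout sketch"),
        gr.Textbox(label="Text prompt"),
        gr.Slider(0.0, 1.0, label="wc (T2T)"),
        gr.Slider(0.0, 1.0, label="wd (I2T)"),
        gr.Slider(0.0, 1.0, label="wf (I2I)"),
    ],
    outputs=gr.Image(label="Generated image"),
)
demo.launch()
```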

User Instructions:

  • Create the image layout.

  • Enter the text prompt and label each segment.

  • Check the generated images and tune the hyperparameters if needed (see the sketch after this list):
    wc : strength of the text-to-text (T2T) attention modulation.
    wd : strength of the image-to-text (I2T) attention modulation.
    wf : strength of the image-to-image (I2I) attention modulation.
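
To make the roles of wc, wd, and wf concrete, here is a minimal conceptual sketch of layout-guided attention modulation: attention logits are biased toward key tokens in the same user-drawn segment as the query and away from other segments, scaled by the relevant strength. This is illustrative only; the paper's actual scheme is hierarchical and step/layer-wise, and the names below are ours, not the repository's:

```python
import torch

def modulate_attention(scores: torch.Tensor, same_segment: torch.Tensor, w: float) -> torch.Tensor:
    """Bias attention logits using a segment-agreement mask.

    scores:       raw attention logits, shape (batch, heads, q_len, k_len)
    same_segment: 1.0 where query and key tokens belong to the same
                  user-drawn segment, 0.0 elsewhere (q_len, k_len)
    w:            modulation strength (wc, wd, or wf, depending on whether
                  the block mixes T2T, I2T, or I2I attention)
    """
    # Positive bias inside a segment, negative bias across segments.
    bias = w * (2.0 * same_segment - 1.0)
    return torch.softmax(scores + bias, dim=-1)
```

Intuitively, raising a given strength pushes the corresponding attention pattern harder toward the drawn layout, while lowering it lets the model fall back to its default attention.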

📊 Comparison with Other Models

(Comparison figure: qualitative results against other models.)

🤝 Acknowledgement

We sincerely thank the authors of DenseDiffusion for their open-source code, which serves as the foundation of our project.

📚 Citation

If you find this repository useful, please cite using the following BibTeX entry:

```bibtex
@misc{zhang2025hierarchical,
      title={Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers},
      author={Zhang, Chunyang and Sun, Zhenhong and Zhang, Zhicheng and Wang, Junyan and Zhang, Yu and Gong, Dong and Mo, Huadong and Dong, Daoyi},
      year={2025},
      eprint={2504.10148},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.10148},
}
```

📬 Contact

If you have any questions or suggestions, please feel free to contact us 😆!
