Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers 🎨

[Project Page] [Paper]

🔥 News

📝 Introduction

A training-free method for DiT-based models (e.g., FLUX.1-dev, FLUX.1-schnell, SD v3.5) that lets users precisely place instances and accurately render their attributes in detailed multi-instance layouts from preliminary sketches, while preserving overall image quality.

✅ To-Do List

  • arXiv paper with supplementary material
  • Inference code
  • More demos (coming soon, stay tuned! 🚀)
  • ComfyUI support
  • Hugging Face Space support

🛠️ Installation

💻 Environment Setup

```bash
git clone https://github.com/bitzhangcy/MIS_DiT.git
cd MIS_DiT
conda create -n ast python=3.10
conda activate ast
pip install -r requirements.txt
```

🚀 Checkpoints

The default checkpoint is FLUX.1-dev (link). FLUX.1-schnell and SD v3.5 are also supported; FLUX.1-schnell uses different hyperparameters, and SD v3.5 has a distinct model architecture and parameter set.

Obtain an access token for FLUX.1-dev and set it at line 63 of flux_hcrt.py as hf_token = "your_access_token".
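
For orientation, this is roughly how a FLUX.1-dev checkpoint is loaded with Hugging Face diffusers. It is a minimal sketch assuming a standard diffusers FluxPipeline; the repository's actual loading code in flux_hcrt.py may differ:

```python
import torch
from diffusers import FluxPipeline

hf_token = "your_access_token"  # Hugging Face access token (see above)

# Load the default FLUX.1-dev checkpoint; bfloat16 keeps VRAM usage manageable.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    token=hf_token,
)
pipe.to("cuda")
```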

🎨 Inference

Launch the Gradio interface below and follow the user instructions to quickly perform precise multi-instance synthesis:

```bash
python flux_hcrt.py
```
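
As a rough illustration of how the demo's controls fit together (a hypothetical sketch, not the actual flux_hcrt.py code; component names and slider ranges are assumptions), a Gradio app exposing the layout, the prompt, and the three modulation strengths described below might look like:

```python
import gradio as gr

def generate(layout, prompt, wc, wd, wf):
    # Hypothetical placeholder: the real script runs the tuned FLUX pipeline
    # on the user-drawn layout with the three modulation strengths.
    ...

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Image(label="Layout sketch"),
        gr.Textbox(label="Text prompt"),
        gr.Slider(0.0, 1.0, label="wc (T2T)"),
        gr.Slider(0.0, 1.0, label="wd (I2T)"),
        gr.Slider(0.0, 1.0, label="wf (I2I)"),
    ],
    outputs=gr.Image(label="Generated image"),
)
demo.launch()
```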

User Instructions:

  • Create the image layout.

  • Enter the text prompt and label each segment.

  • Check the generated images and tune the hyperparameters if needed (see the sketch after this list):
    wc : strength of the text-to-text (T2T) attention modulation.
    wd : strength of the image-to-text (I2T) attention modulation.
    wf : strength of the image-to-image (I2I) attention modulation.
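
To make the roles of wc, wd, and wf concrete, here is a minimal conceptual sketch of layout-guided attention modulation: attention logits are biased toward key tokens in the same user-drawn segment as the query and away from other segments, scaled by the relevant strength. This is illustrative only; the paper's actual scheme is hierarchical and step/layer-wise, and the names below are ours, not the repository's:

```python
import torch

def modulate_attention(scores: torch.Tensor, same_segment: torch.Tensor, w: float) -> torch.Tensor:
    """Bias attention logits using a segment-agreement mask.

    scores:       raw attention logits, shape (batch, heads, q_len, k_len)
    same_segment: 1.0 where query and key tokens belong to the same
                  user-drawn segment, 0.0 elsewhere (q_len, k_len)
    w:            modulation strength (wc, wd, or wf, depending on whether
                  the block mixes T2T, I2T, or I2I attention)
    """
    # Positive bias inside a segment, negative bias across segments.
    bias = w * (2.0 * same_segment - 1.0)
    return torch.softmax(scores + bias, dim=-1)
```

Intuitively, raising a given strength pushes the corresponding attention pattern harder toward the drawn layout, while lowering it lets the model fall back to its default attention.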

📊 Comparison with Other Models

(Comparison figure: qualitative results against other models.)

🤝 Acknowledgement

We sincerely thank the authors of DenseDiffusion for their open-source code, which serves as the foundation of our project.

📚 Citation

If you find this repository useful, please cite using the following BibTeX entry:

```bibtex
@misc{zhang2025hierarchical,
      title={Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers},
      author={Zhang, Chunyang and Sun, Zhenhong and Zhang, Zhicheng and Wang, Junyan and Zhang, Yu and Gong, Dong and Mo, Huadong and Dong, Daoyi},
      year={2025},
      eprint={2504.10148},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.10148},
}
```

📬 Contact

If you have any questions or suggestions, please feel free to contact us 😆!
