Validate Input Data as Precomputed DINOv2 Embeddings #13

Open
NavairaRehman opened this issue Mar 25, 2025 · 1 comment

@NavairaRehman

From the config of train_dit, it appears that the conditioner uses a frozen DINOv2 ViT-B/14 model (Dinov2Wrapper) to process image-based conditioning signals. However, I would like to confirm the exact input data format expected by the conditioner and whether the model relies on precomputed DINOv2 embeddings during training.

It would be great if you could clarify the following points:

  • Are the files at cond_url_template expected to contain precomputed DINOv2 image embeddings (e.g., extracted offline using DINOv2 and saved as tensors)?
  • Does the model avoid processing raw images during training and instead rely on these precomputed features?
  • Is there a script or guideline for generating these DINOv2 embeddings from raw images?

Thank you in advance!

@FrozenBurning
Collaborator

Thanks for your interest in our work! Let me respond to your questions as follows:

  • cond_url_template points directly to the path of the precomputed DINO features or text embeddings, which we cached before training.
  • Yes, we directly load precomputed PrimX and corresponding condition features when training the DiT.
  • We provide detailed instructions and caching scripts for the VAE and DINOv2 embeddings here.
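
For readers who want to sanity-check their own cache, the "encode once, load from disk at training time" pattern described above can be sketched roughly as follows. This is not the repository's caching script. The helper names (`cache_features`, `load_cached_feature`, `fake_encoder`), the `{}`-style template, the `.npy` file format, and the 1370-token shape are all assumptions made for illustration. Only the 768-dimensional embedding width of DINOv2 ViT-B/14 is a known property of the model. The stand-in encoder returns random values so the sketch runs without downloading any weights.

```python
import tempfile
from pathlib import Path

import numpy as np

DINOV2_VITB14_DIM = 768  # embedding width of DINOv2 ViT-B/14


def cache_features(template, sample_ids, encode):
    """Run `encode` once per sample and save the result to template.format(id).

    `template` is a hypothetical "{}"-style path pattern, used here as a
    stand-in for whatever cond_url_template looks like in the real config.
    """
    for sample_id in sample_ids:
        path = Path(template.format(sample_id))
        path.parent.mkdir(parents=True, exist_ok=True)
        np.save(path, np.asarray(encode(sample_id), dtype=np.float32))


def load_cached_feature(template, sample_id, dim=DINOV2_VITB14_DIM):
    """Load one cached feature and check that it looks like token embeddings."""
    feat = np.load(template.format(sample_id))
    if feat.ndim != 2 or feat.shape[-1] != dim:
        raise ValueError(f"expected [tokens, {dim}], got {feat.shape}")
    if not np.isfinite(feat).all():
        raise ValueError("feature contains NaN or Inf")
    return feat


def fake_encoder(sample_id):
    """Stand-in for a frozen DINOv2 forward pass (assumption, not the real model).

    1370 tokens = 37 x 37 patches + 1 CLS token for a 518 x 518 input with
    patch size 14. Whether the real cache keeps the CLS token is not stated
    in this thread.
    """
    rng = np.random.default_rng(0)
    return rng.standard_normal((1370, DINOV2_VITB14_DIM))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        template = str(Path(tmp) / "cond_fea" / "{}.npy")
        cache_features(template, ["sample_a", "sample_b"], fake_encoder)
        feat = load_cached_feature(template, "sample_a")
        print(feat.shape)
```

In a real setup, `fake_encoder` would be replaced by the frozen DINOv2 model applied to a preprocessed image, and the training loop would only ever call the loading side. The shape check is deliberately loose (any number of tokens, last dimension 768), since the exact token layout of the cached files should be confirmed against the official caching scripts.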
