Official implementation of the paper: "Delta Velocity Rectified Flow for Text-to-Image Editing".
arXiv: [2509.05342](https://arxiv.org/abs/2509.05342)
DVRF is a text-guided image editing method that optimizes the latent of a pre-trained diffusion model (SD3 / SD3.5) using a rectified-flow objective on the delta of predicted velocities between a source and a target prompt. It provides high-fidelity, localized edits while preserving the structure of the source image.
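Schematically (a paraphrase of the idea, not the paper's exact formulation), the latent update direction is the difference between the velocities the model predicts under the target and source prompts:

$$\Delta v \;=\; v_\theta(z_t, t, c_{\mathrm{tgt}}) \;-\; v_\theta(z_t, t, c_{\mathrm{src}}),$$

where $z_t$ is the source latent noised to time $t$ and $c_{\mathrm{src}}$, $c_{\mathrm{tgt}}$ are the source and target prompt embeddings.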
- Models: Stable Diffusion 3 (SD3), Stable Diffusion 3.5 (medium, large, large-turbo)
- Pipelines: Diffusers pipelines (Hugging Face)
- Input: Source image + source prompt + target prompt(s)
- Output: Edited image and optimization trajectory frames
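For orientation, the backbones are the standard Diffusers pipelines. A minimal load of one of them looks like the sketch below (the Hugging Face model IDs are the public ones, assumed here; the repo's actual entry point is `edit.py`):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load one supported backbone (SD3.5-medium shown; swap the model ID for
# "stabilityai/stable-diffusion-3-medium-diffusers",
# "stabilityai/stable-diffusion-3.5-large", or
# "stabilityai/stable-diffusion-3.5-large-turbo").
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
)
pipe.to("cuda")
```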
```text
DeltaVelocityRectifiedFlow/
├── assets/                   # Paper figures and results
│   ├── DVRF.png              # Method schematic
│   ├── DVRF_results.png      # Qualitative results
│   ├── DVRF_comparaison.png  # Comparison results 1
│   └── DVRF_comparaison2.png # Comparison results 2
├── images/                   # Example images and dataset config
│   ├── mapping_file.yaml     # Dataset configuration
│   ├── a_cat_sitting_on_a_table.png
│   ├── city-street.jpg
│   ├── fallow-deer.jpg
│   ├── ...
├── models/                   # Core DVRF implementation
│   ├── __init__.py
│   └── DVRF.py               # Main DVRF algorithm
├── edit.py                   # Main script for running experiments
├── exp.yaml                  # Experiment configuration
├── dvrf_environment.yml      # Conda environment
├── .gitignore                # Git ignore rules
└── README.md                 # This file
```
The DVRF objective optimizes the latent by contrasting the velocity predicted under the target prompt with the velocity predicted under the source prompt, inspired by Delta Denoising Score (DDS) for diffusion models. We further introduce a shift term to improve editing performance, yielding Delta Velocity Rectified Flow (DVRF), a trajectory-driven editing objective that operates in the velocity space of rectified flows. DVRF obtains state-of-the-art results on the PIE benchmark. See the method schematic:

![DVRF method schematic](assets/DVRF.png)
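As a rough sketch of the update this objective induces (illustrative only; `velocity` is a stand-in for the SD3/SD3.5 velocity prediction, not this repo's API, and the paper's shift term is omitted):

```python
import torch

def dvrf_step(velocity, z, emb_src, emb_tgt, t, lr=0.02):
    """One schematic DVRF update; see models/DVRF.py for the real implementation.

    velocity(z_t, t, emb) stands in for the SD3/SD3.5 velocity prediction.
    The paper's shift term (with the progressive coefficient c_t described in
    exp.yaml) is omitted here for brevity.
    """
    # Noise the latent to time t along the straight rectified-flow path
    # z_t = (1 - t) * z + t * eps, with t in (0, 1].
    eps = torch.randn_like(z)
    z_t = (1.0 - t) * z + t * eps
    with torch.no_grad():
        v_tgt = velocity(z_t, t, emb_tgt)  # velocity under the target prompt
        v_src = velocity(z_t, t, emb_src)  # velocity under the source prompt
    # The delta of predicted velocities plays the role of the gradient on z.
    return z - lr * (v_tgt - v_src)
```

In the actual script this update is applied for `num_steps` iterations with the optimizer, learning rate, and timestep schedule chosen in `exp.yaml`.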
Selected qualitative results demonstrating localized edits and structure preservation:

![DVRF qualitative results](assets/DVRF_results.png)
Additional comparisons:

![Comparison results 1](assets/DVRF_comparaison.png)
![Comparison results 2](assets/DVRF_comparaison2.png)
Clone the repo:

```bash
git clone https://github.com/gaspardbd/DeltaVelocityRectifiedFlow.git
cd DeltaVelocityRectifiedFlow
```

Create the conda environment:

```bash
conda env create -f dvrf_environment.yml
conda activate dvrf_env
```

Configure your experiment in `exp.yaml`:
```yaml
- exp_name: "DVRF_SD3"
  dataset_yaml: images/mapping_file.yaml
  model_type: "SD3"                 # or "SD3.5", "SD3.5-medium", "SD3.5-large", "SD3.5-large-turbo"
  T_steps: 50                       # diffusion timesteps
  B: 1                              # batch size for averaging the gradient
  src_guidance_scale: 6
  tgt_guidance_scale: 16.5
  num_steps: 50                     # optimization steps
  seed: 41
  eta: 1.0                          # progressive c_t = k/T * t described in the paper
  scheduler_strategy: "descending"  # "random" or "descending"
  lr: "custom"                      # or a float, e.g. 0.02
  optimizer: "SGD"                  # SGD, Adam, AdamW, RMSprop, SGD_Nesterov
```
Prepare `images/mapping_file.yaml` with your images and prompts:
```yaml
- input_img: images/a_cat_sitting_on_a_table.png
  source_prompt: A cat sitting on a table.
  target_prompts:
    - A lion sitting on a table.
```

Run editing:
```bash
python edit.py --exp_yaml exp.yaml
```

Outputs are saved under `outputs/<exp_name>/<model_type>/src_<image_name>/tgt_<index>/`, including the side-by-side image and the trajectory frames.
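The `scheduler_strategy` field controls which diffusion timestep each optimization step sees. A small sketch of the two strategies as we read them (an assumption about the semantics; the authoritative version lives in `models/DVRF.py`):

```python
import torch

def timestep_schedule(T_steps: int, num_steps: int, strategy: str,
                      generator: torch.Generator | None = None) -> torch.Tensor:
    """Pick one diffusion timestep per optimization step (hypothetical helper)."""
    if strategy == "descending":
        # Sweep from the noisiest timestep down to the cleanest.
        return torch.linspace(T_steps - 1, 0, num_steps).long()
    if strategy == "random":
        # Draw a timestep uniformly at random for each optimization step.
        return torch.randint(0, T_steps, (num_steps,), generator=generator)
    raise ValueError(f"unknown scheduler_strategy: {strategy}")
```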
If you use this code, please cite our paper:
```bibtex
@misc{beaudouin2025deltavelocityrectifiedflow,
      title={Delta Velocity Rectified Flow for Text-to-Image Editing},
      author={Gaspard Beaudouin and Minghan Li and Jaeyeon Kim and Sung-Hoon Yoon and Mengyu Wang},
      year={2025},
      eprint={2509.05342},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.05342},
}
```

License: [CC BY 4.0](http://creativecommons.org/licenses/by/4.0/)
- Built on top of Hugging Face Diffusers pipelines and Stable Diffusion 3/3.5.
- Thanks to the research community for open-source models and tooling.



