The diffusion model is implemented from scratch in this repository. I used the Swiss roll dataset, and the algorithm basically follows Denoising Diffusion Probabilistic Models (DDPM).
To implement the diffusion model, the following parts are needed. I will explain each of these steps.
- Forward process
- Neural network for training
- Training
- Sampling
You can create a local environment by running the following commands.
```
conda create -n diff python=3.11
conda activate diff
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```

The forward process of adding Gaussian noise to the input data is expressed by the following equation. The diffusion model gradually adds Gaussian noise to the data according to a variance schedule $\beta_1, \dots, \beta_T$:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)$$
A notable property of the forward process is that it admits sampling $x_t$ at an arbitrary timestep $t$ in closed form. Using the notation $\alpha_t := 1 - \beta_t$ and $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$,

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) \mathbf{I}\right)$$
Thus, we can get $x_t$ directly from $x_0$ by the reparameterization

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I})$$
The code to get $\beta_t$, $\alpha_t$, and $\bar{\alpha}_t$ is as follows.
```python
import torch


def calculate_parameters(diffusion_steps, min_beta, max_beta):
    # Variance schedule beta_t increasing linearly from min_beta to max_beta,
    # with exactly diffusion_steps values
    beta_ts = torch.linspace(min_beta, max_beta, diffusion_steps)
    alpha_ts = 1 - beta_ts
    # Cumulative products: bar_alpha_t = prod_{s <= t} alpha_s
    bar_alpha_ts = torch.cumprod(alpha_ts, dim=0)
    return beta_ts, alpha_ts, bar_alpha_ts
```

The code to retrieve a noised sample $x_t$ and the noise $\epsilon$ at a certain timestep $t$ is as follows.
```python
def calculate_data_at_certain_time(x_0, bar_alpha_ts, t):
    # Sample Gaussian noise with the same shape as the input data
    eps = torch.randn(size=x_0.shape)
    # Closed-form forward process: x_t = sqrt(bar_alpha_t) * x_0 + sqrt(1 - bar_alpha_t) * eps
    noised_x_t = (
        torch.sqrt(bar_alpha_ts[t]) * x_0 + torch.sqrt(1 - bar_alpha_ts[t]) * eps
    )
    return noised_x_t, eps
```

You can try the forward process with the Swiss roll dataset by running the following commands. We set the forward process variances to constants increasing linearly from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$.
```
cd srcs
python3 forward_process.py
```

The original paper uses a U-Net backbone, but I used a simple neural network for training this time because it is sufficient for this data. It has 4 hidden layers and uses ReLU as the activation function.
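As a rough illustration, here is a minimal sketch of what such a network might look like. The hidden width (128), the way the timestep is fed in (appended to the 2-D input as one extra feature), and the class name are my assumptions for illustration, not necessarily the exact architecture in this repository; run `simple_nn.py` below to see the real one.

```python
import torch
import torch.nn as nn


class SimpleDenoiser(nn.Module):
    """Hypothetical sketch: an MLP with 4 hidden layers and ReLU activations."""

    def __init__(self, data_dim=2, hidden_dim=128):
        super().__init__()
        # Input is the noised 2-D point concatenated with the timestep
        self.net = nn.Sequential(
            nn.Linear(data_dim + 1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, data_dim),  # predicts the noise eps
        )

    def forward(self, x_t, t):
        # t: (batch,) integer timesteps, appended as an extra input feature
        t = t.float().unsqueeze(-1)
        return self.net(torch.cat([x_t, t], dim=-1))
```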
If you want to check the architecture of the model, you can run the following command.

```
cd srcs
python3 simple_nn.py
```

In the reverse process, we calculate $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\right)$.
The variance is fixed to untrained time-dependent constants, so we only need to predict the mean $\mu_\theta(x_t, t)$.
Therefore, we train a neural network $\epsilon_\theta(x_t, t)$ that predicts the noise $\epsilon$ added in the forward process, and the mean $\mu_\theta$ is computed from this prediction.
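For reference, the simplified training objective from the DDPM paper (its Eq. 14), which corresponds to training $\epsilon_\theta$ to predict the noise, is:

$$L_{\mathrm{simple}}(\theta) = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t\right)\right\rVert^2\right]$$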
I followed the training algorithm (Algorithm 1) of the original paper.
The parameters I used during training are as follows; a minimal sketch of the training loop follows the list.
- Optimizer -> Adam
- Batch size -> 128
- Epochs -> 30
- Diffusion timesteps -> 50
- Minimum beta -> 1e-4
- Maximum beta -> 0.02
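As a rough illustration, the training loop (Algorithm 1 of the DDPM paper) under these settings might look like the following sketch. It reuses `calculate_parameters` from above and the hypothetical `SimpleDenoiser` sketch; loading the Swiss roll via scikit-learn and the rescaling factor are my assumptions. The actual training code lives in `train.py`.

```python
import torch
from sklearn.datasets import make_swiss_roll

# Hypothetical assembly of the pieces sketched above; see train.py for the real code.
diffusion_steps, min_beta, max_beta = 50, 1e-4, 0.02
beta_ts, alpha_ts, bar_alpha_ts = calculate_parameters(diffusion_steps, min_beta, max_beta)

# 2-D Swiss roll (x and z coordinates), rescaled to a small range (assumption)
points, _ = make_swiss_roll(n_samples=10_000, noise=0.5)
data = torch.tensor(points[:, [0, 2]], dtype=torch.float32) / 10.0

model = SimpleDenoiser()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(30):
    for x_0 in torch.split(data[torch.randperm(len(data))], 128):
        # Algorithm 1: sample t and eps, noise the data, and predict the noise
        t = torch.randint(0, diffusion_steps, (len(x_0),))
        eps = torch.randn_like(x_0)
        x_t = (
            torch.sqrt(bar_alpha_ts[t]).unsqueeze(-1) * x_0
            + torch.sqrt(1 - bar_alpha_ts[t]).unsqueeze(-1) * eps
        )
        loss = ((eps - model(x_t, t)) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```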
You can train the diffusion model by running the following commands.

```
cd srcs
python3 train.py
```

To sample $x_{t-1} \sim p_\theta(x_{t-1} \mid x_t)$, we compute

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, \mathbf{I})$$
I followed the sampling algorithm (Algorithm 2) of the original paper. We initialize $x_T \sim \mathcal{N}(0, \mathbf{I})$ and apply the above update from $t = T$ down to $t = 1$, with $z = 0$ at the final step.
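As a rough illustration, this sampling loop might look like the following sketch. It reuses the hypothetical `SimpleDenoiser` and the schedule tensors from earlier, and it fixes $\sigma_t^2 = \beta_t$, which is one of the two variance choices discussed in the paper; the actual implementation is in `sampling.py`.

```python
import torch


@torch.no_grad()
def sample(model, beta_ts, alpha_ts, bar_alpha_ts, n_samples=1000, data_dim=2):
    # Algorithm 2: start from pure Gaussian noise and denoise step by step
    x_t = torch.randn(n_samples, data_dim)
    for t in reversed(range(len(beta_ts))):
        t_batch = torch.full((n_samples,), t, dtype=torch.long)
        eps_pred = model(x_t, t_batch)
        # Posterior mean computed from the predicted noise
        mean = (
            x_t - (1 - alpha_ts[t]) / torch.sqrt(1 - bar_alpha_ts[t]) * eps_pred
        ) / torch.sqrt(alpha_ts[t])
        if t > 0:
            # sigma_t^2 = beta_t (assumption; the paper also discusses the posterior variance)
            x_t = mean + torch.sqrt(beta_ts[t]) * torch.randn_like(x_t)
        else:
            x_t = mean  # no noise is added at the final step
    return x_t
```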
You can try sampling by running the following commands.

```
cd srcs
python3 sampling.py
```

I will show the derivation of the loss function of the diffusion model.
First, the diffusion model has a forward process and a reverse process. In the forward process, noise is added to the input data step by step. In the reverse process, the reverse of the forward process is performed to recover the original image from the noisy data. The diagram below makes these processes easy to understand.
Diffusion models are latent variable models of the form

$$p_\theta(x_0) := \int p_\theta(x_{0:T})\, dx_{1:T}$$

where $x_1, \dots, x_T$ are latents of the same dimensionality as the data $x_0 \sim q(x_0)$.
The forward process is fixed to a Markov chain that gradually adds Gaussian noise to the data according to a variance schedule $\beta_1, \dots, \beta_T$:

$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) := \mathcal{N}\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)$$
The probability the generative model assigns to the data is as follows, where the reverse process is a Markov chain starting at $p(x_T) = \mathcal{N}(x_T; 0, \mathbf{I})$:

$$p_\theta(x_0) = \int p_\theta(x_{0:T})\, dx_{1:T}, \qquad p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$$
In the original paper, this integral is intractable, so the formula is transformed as follows.

$$p_\theta(x_0) = \int p_\theta(x_{0:T})\, \frac{q(x_{1:T} \mid x_0)}{q(x_{1:T} \mid x_0)}\, dx_{1:T} = \int q(x_{1:T} \mid x_0)\, \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\, dx_{1:T}$$
Although this is not described in detail in the paper, I personally think the transformation rewrites the integral in terms of the forward process, whose distribution is known, because the distribution of the reverse process can be complicated and the integral would be difficult to calculate directly.
Training is performed by optimizing the usual variational bound on the negative log likelihood. The equation below is an upper bound provided by Jensen's inequality.

$$\mathbb{E}\left[-\log p_\theta(x_0)\right] \le \mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] =: L$$
This equation can be further transformed as follows.
In the above equation transformation,
In the course of the above equation transformation, the following relationship is used.
I also used the following equation transformation.
We will now simplify Eq(6).
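For reference, the decomposition that this simplification arrives at in the DDPM paper (its Eq. 5) is the following, written with one KL term per timestep:

$$L = \mathbb{E}_q\left[ \underbrace{D_{\mathrm{KL}}\!\left(q(x_T \mid x_0)\,\|\,p(x_T)\right)}_{L_T} + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\!\left(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\right)}_{L_{t-1}} \underbrace{-\log p_\theta(x_0 \mid x_1)}_{L_0} \right]$$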
We ignore the fact that the forward process variances $\beta_t$ are learnable and instead fix them to constants, so the approximate posterior $q$ has no learnable parameters and $L_T$ is a constant during training.
Now we discuss our choices in $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$. The variance is set to untrained time-dependent constants, $\Sigma_\theta(x_t, t) = \sigma_t^2 \mathbf{I}$.
We can get
By using the above
We use below
The above equation reveals that $\mu_\theta$ must predict $\frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon\right)$ given $x_t$,
where
We can simplify the equation of
To summarize, we can train the reverse process mean function approximator $\mu_\theta$ to predict $\tilde{\mu}_t$, or by modifying its parameterization, we can train it to predict $\epsilon$.
We assume that image data consists of integers in $\{0, 1, \dots, 255\}$ scaled linearly to $[-1, 1]$,
where
From Eq(17) and Eq(18), we can simplify the training objective further.
The
The conditional distribution $q(x_{t-1} \mid x_t, x_0)$ can be rewritten with Bayes' rule in terms of $q(x_t \mid x_{t-1})$ and $q(x_{t-1} \mid x_0)$.
Those two distributions are defined as follows.
When calculating the variance, we use the product property of Gaussian distributions: the inverse of the variance of a product of Gaussian densities is the sum of the inverse variances of the individual distributions.
If we invert both sides to obtain the variance
Here, we use the following property.
Substitute this into the variance formula to transform it.
Next, we consider the derivation of the mean. Letting
Therefore, we can calculate the mean as follows.
Here, we use the following property again.
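For comparison, the closed-form posterior reported in the DDPM paper (its Eq. 6 and 7), which this derivation should reproduce, is:

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right)$$

$$\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t, \qquad \tilde{\beta}_t := \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t$$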
Honestly, I'm not sure if this derivation is correct. I also don't know why




