Minimal text diffusion

A minimal implementation of diffusion models for text: it learns a diffusion model of a given text corpus and lets you generate text samples from the learned model.


Diffusion in action: a DDPM model gradually denoising random text "hotnutggy pi greentedsty rawyaented to the white eggplant is dried and mac clement star fe honey spin theapple purpleip to the brown radicchio is sour"

This repo is a refactoring that takes a large amount of code from https://github.com/XiangLi1999/Diffusion-LM (which in turn includes some code from https://github.com/openai/glide-text2im); thanks to the authors for their work!

The main idea was to retain just enough code to allow training a simple diffusion model and generating samples, remove image-related terms, and make it easier to use.

I've included an extremely simple corpus (data/simple-{train,test}.txt) I used for quick iterations and testing.


Table of Contents

- Getting started
- Training from scratch on the greetings dataset
- Experiments with pre-trained models and embeddings
- Controllable Generation
- Gory details
- TODO
- Acknowledgements
- License


Getting started

Setup

    conda install mpi4py
    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Preparing dataset

    python src/utils/custom_tokenizer.py train-word-level data/simple/simple.txt

Training

Inference

    bash scripts/text_sample.sh ckpts/simple/ema_0.9999_025000.pt 2000 10

Training from scratch on the greetings dataset

Experiments with pre-trained models and embeddings

Controllable Generation

Gory details

Training

The training loss is the sum of three terms (a code sketch follows the list):

  1. The difference between the actual x_start and the output of the transformer model. This is the MSE loss term.

  2. The mean of x_T should be close to zero. This is the tT_loss term. It is obtained by calling q_mean_variance with t = T. q_mean_variance is like q_sample, but it returns the mean and variance of the distribution q(x_t | x_0) instead of a sample.

  3. The decoder NLL loss. This is the decoder_nll term. It is obtained by calling token_discrete_loss, which calls get_logits; get_logits in turn uses the embeddings to convert hidden representations into logits over the vocabulary. The logits are then used to compute the NLL loss against the input tokens. Essentially, this is how the embeddings are trained.
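
Putting the three pieces together, here is a rough sketch of the combined objective. The function and signature names (training_losses, diffusion.q_sample, diffusion.q_mean_variance, model.get_logits) follow the descriptions above but are illustrative, not the repo's exact API:

    import torch
    import torch.nn.functional as F

    def training_losses(model, diffusion, x_start, input_ids, t, T):
        # Term 1: MSE between the true x_start and the model's prediction.
        x_t = diffusion.q_sample(x_start, t)        # forward-noise x_start to step t
        model_out = model(x_t, t)                   # transformer predicts x_start
        mse = ((model_out - x_start) ** 2).mean()

        # Term 2: the mean of q(x_T | x_0) should be close to zero.
        t_T = torch.full_like(t, T - 1)
        mean_T, _, _ = diffusion.q_mean_variance(x_start, t_T)
        tT_loss = (mean_T ** 2).mean()

        # Term 3: decoder NLL, computed from logits produced by the embeddings.
        logits = model.get_logits(model_out)        # (batch, seq_len, vocab)
        decoder_nll = F.cross_entropy(
            logits.view(-1, logits.size(-1)), input_ids.view(-1)
        )
        return mse + tT_loss + decoder_nll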


Zooming in on term 3: get_logits reuses the input embeddings as the output projection.

    def get_logits(self, hidden_repr):
        # Project hidden states to vocabulary logits via lm_head.
        return self.lm_head(hidden_repr)

    # lm_head and word_embedding share the same weights:
    print(model.lm_head.weight == model.word_embedding.weight)  # all True
    print(model.lm_head.weight.shape, model.word_embedding.weight.shape)

They are identical! Intuitively, the model is trained to predict the embedded input, so a linear layer whose weights come from word_embedding amounts to a nearest-neighbor search over the vocabulary. During initialization, lm_head's weights are assigned from word_embedding inside a torch.no_grad() block, so the assignment itself is not tracked by autograd.
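
A minimal, self-contained sketch of this weight tying (the sizes are made up; word_embedding and lm_head mirror the attribute names above):

    import torch
    import torch.nn as nn

    vocab_size, hidden_dim = 1000, 128  # illustrative sizes
    word_embedding = nn.Embedding(vocab_size, hidden_dim)
    lm_head = nn.Linear(hidden_dim, vocab_size, bias=False)

    # Assign the embedding matrix as the output projection. Under
    # torch.no_grad() the assignment itself is not recorded by autograd;
    # afterwards both modules share the same Parameter object.
    with torch.no_grad():
        lm_head.weight = word_embedding.weight

    # Logits are inner products between a hidden state and every embedding,
    # i.e., a nearest-neighbor search by dot product.
    hidden = torch.randn(2, 16, hidden_dim)  # (batch, seq_len, hidden)
    logits = lm_head(hidden)                 # (batch, seq_len, vocab)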

Evolving input

Sampling

  1. Starting from noise x_T, a (noisy) estimate of x_start is first generated using the model.

  2. x_t and the estimated x_start are then used to generate x_{t-1} via q_posterior_mean_variance (x_{t-1} ~ q(x_{t-1} | x_t, x_start)).

The process is repeated until x_0 is generated; a sketch of the loop follows.
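
A minimal sketch of this reverse loop, assuming the model directly predicts x_start and that diffusion.q_posterior_mean_variance returns the posterior mean, variance, and log-variance; the names and signatures are illustrative, not the repo's exact API:

    import torch

    def sample(model, diffusion, shape, T):
        x_t = torch.randn(shape)  # start from pure noise x_T
        for t in reversed(range(T)):
            t_batch = torch.full((shape[0],), t, dtype=torch.long)
            x_start = model(x_t, t_batch)  # predict a (noisy) x_start
            mean, _, log_var = diffusion.q_posterior_mean_variance(
                x_start=x_start, x_t=x_t, t=t_batch
            )
            # x_{t-1} ~ q(x_{t-1} | x_t, x_start); no noise at the final step.
            noise = torch.randn_like(x_t) if t > 0 else 0.0
            x_t = mean + (0.5 * log_var).exp() * noise
        return x_t  # x_0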


TODO

Opportunities for further minimization


Acknowledgements

- Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications (https://www.youtube.com/watch?v=cS6JQpEY9cs)
- Composable Text Control Operations in Latent Space with Ordinary Differential Equations (http://arxiv.org/abs/2208.00638)
- Diffusion-LM Improves Controllable Text Generation (http://arxiv.org/abs/2205.14217)
- Step-unrolled Denoising Autoencoders for Text Generation (http://arxiv.org/abs/2112.06749)
- Latent Diffusion Energy-Based Model for Interpretable Text Modeling (http://arxiv.org/abs/2206.05895)
- Parti - Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (Paper Explained) (https://www.youtube.com/watch?v=qS-iYnp00uc)
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics (http://arxiv.org/abs/1503.03585)
- lucidrains/denoising-diffusion-pytorch (https://github.com/lucidrains/denoising-diffusion-pytorch)
- Guidance: a cheat code for diffusion models (https://benanne.github.io/2022/05/26/guidance.html)
- Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise (http://arxiv.org/abs/2208.09392)
- Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning (http://arxiv.org/abs/2208.04202)
- Diffusion Maps for Textual Network Embedding (https://proceedings.neurips.cc/paper/2018/hash/211a7a84d3d5ce4d80347da11e0c85ed-Abstract.html)
- Diffusion-LM Improves Controllable Text Generation (https://github.com/XiangLi1999/Diffusion-LM)
- Denoising Diffusion Probabilistic Models (http://arxiv.org/abs/2006.11239)
- Variational Diffusion Models (http://arxiv.org/abs/2107.00630)
- Elucidating the Design Space of Diffusion-Based Generative Models (http://arxiv.org/abs/2206.00364)
- Diffusion Models Beat GANs on Image Synthesis (http://arxiv.org/abs/2105.05233)
- guided-diffusion (https://github.com/openai/guided-diffusion)
- Minimal implementation of diffusion models ⚛ (https://github.com/VSehwag/minimal-diffusion)
- minDiffusion (https://github.com/cloneofsimo/minDiffusion)
- What are Diffusion Models? (https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
- High-Resolution Image Synthesis with Latent Diffusion Models (http://arxiv.org/abs/2112.10752)
- Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song (https://yang-song.net/blog/2021/score/)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (http://arxiv.org/abs/2112.10741)
- Blended Diffusion for Text-driven Editing of Natural Images (http://arxiv.org/abs/2111.14818)
- Generative Modeling by Estimating Gradients of the Data Distribution (http://arxiv.org/abs/1907.05600)
- Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling (http://arxiv.org/abs/2106.01357)
- Score-based Generative Modeling in Latent Space (http://arxiv.org/abs/2106.05931)
- A Connection Between Score Matching and Denoising Autoencoders (https://direct.mit.edu/neco/article/23/7/1661-1674/7677)
- Maximum Likelihood Training of Score-Based Diffusion Models (http://arxiv.org/abs/2101.09258)

License