Awesome
Mukh-Oboyob
Official Implementation of the IJACSA paper : Mukh-Oboyob: Stable Diffusion and BanglaBERT enhanced Bangla Text-to-Face Synthesis
Environment Setup
Run the following commands on your anaconda environment.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.22.0
pip install transformers==4.29.2
pip install matplotlib==3.7.2
Library Modifications
- I modified the stable diffusion pipeline to load the `BanglaBERT`` text encoder along with setting the max sequence length to 150 (reason explained in my paper). So you have to take the file from
C:\Users\USER_NAME\anaconda3\envs\ENVIRONMENT_NAME\Lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py
and replace it with the file from this github repository
Codernob\Mukh-Oboyob\library codes\pipeline_stable_diffusion.py
Don't forget to replace USER_NAME
with the user name in which your anaconda distribution is installed and replace ENVIRONMENT_NAME
with your intended anaconda environment.
- The BanglaBERT text encoder is not supported natively by the stable diffusion pipeline, as it expects a CLIP text encoder by default. So I had to comment out some code that checks for CLIP in pipeline_utils.py. Therefore, replace the file from
C:\Users\USER_NAME\anaconda3\envs\ENVIRONMENT_NAME\Lib\site-packages\diffusers\pipelines\pipeline_utils.py
and replace it with the file from my github repository
Codernob\Mukh-Oboyob\library codes\pipeline_utils.py
- From my experience, the safety checker gives many false positives. So I turned it off. If you want to do so, replace this file
C:\Users\USER_NAME\anaconda3\envs\ENVIRONMENT_NAME\Lib\site-packages\diffusers\pipelines\stable_diffusion\safety_checker.py
with
Codernob\Mukh-Oboyob\library codes\safety_checker.py
Inference
Now you should be able to run Codernob\Mukh-Oboyob\inference\sample inference.ipynb
Inference using Huggingface
Alternatively you can use huggingface, pretrained model is uploaded to Hugging Face. To run from Huggingface, use the code snippet below.
from diffusers import DiffusionPipeline
device="cuda"
pipeline = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
custom_pipeline="gr33nr1ng3r/Mukh-Oboyob"
)
pipeline.unet.load_attn_procs("gr33nr1ng3r/Mukh-Oboyob")
pipeline.to(device)
prompt = "মেয়েটির কালো চুল ছিল। মেয়েটির মুখে ভারী মেকাপ ছিল। মেয়েটির উঁচু গালের হাড় ছিল। মেয়েটির মুখ কিছুটা খোলা ছিল। মেয়েটির চেহারা ডিম্বাকৃতির। মেয়েটির চোখা নাক ছিল। মেয়েটির ঢেউ খেলানো চুল ছিল। মেয়েটির কানে দুল পরা ছিল। মেয়েটির লিপস্টিক পরা ছিল। "
image = pipeline(prompt, num_inference_steps=200, guidance_scale=7.5,height=128,width=128).images[0]
image
Example google colab notebook : https://colab.research.google.com/drive/1QGoaVVr89htsOx4jndKC_89IQhd7ywmT?usp=sharing
Training
Extract metadata.jsonl
from Mukh-Oboyob/dataset/celeba/train/metadata.zip
and place on Mukh-Oboyob/dataset/celeba/train/
Now go to Mukh-Oboyob/training script/
and run train_text_to_image_lora.py
.
Citation
If you use my code or ideas from my paper in your work, please cite my paper.
@article{Saha2023,
title = {Mukh-Oboyob: Stable Diffusion and BanglaBERT enhanced Bangla Text-to-Face Synthesis},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.01411142},
url = {http://dx.doi.org/10.14569/IJACSA.2023.01411142},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {11},
author = {Aloke Kumar Saha and Noor Mairukh Khan Arnob and Nakiba Nuren Rahman and Maria Haque and Shah Murtaza Rashid Al Masud and Rashik Rahman}
}