Stable Diffusion inference on Intel Arc and Data Center GPUs

The default implementation uses PyTorch; I have also provided a TensorFlow version in the `tensorflow` directory.

Update: If you want to benchmark Stable Diffusion on your Intel dGPUs and CPUs, check out my other repo.

Now that we have our Arc discrete GPU set up on Linux, let's try to run a Stable Diffusion model on it. The GPU we are using is an Arc A770 16 GB card.

A quick recap / updated steps to set up Arc (Intel dGPUs) on Linux

Please follow the documentation on how to set up Intel dGPUs on Linux. After setting up your environment, you can verify that everything works by running `python xpu_test.py` from the xpu_verify repository.
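
As a rough idea of what such a check involves, here is a minimal sketch. It is not the contents of `xpu_test.py`, and the helper name `describe_xpu` is made up for illustration. It imports `intel_extension_for_pytorch`, which registers the `xpu` backend, and asks PyTorch whether any XPU device is visible. When the packages are missing it returns a reason instead of raising.

```python
def describe_xpu() -> dict:
    """Report whether PyTorch can see an Intel XPU device."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (registers the "xpu" backend)
    except Exception as exc:  # ImportError, or a driver/runtime load failure
        return {"available": False, "devices": [], "reason": f"import failed: {exc}"}

    if not (hasattr(torch, "xpu") and torch.xpu.is_available()):
        return {"available": False, "devices": [], "reason": "no XPU device found"}

    # Collect a human-readable name for every visible device.
    names = [torch.xpu.get_device_name(i) for i in range(torch.xpu.device_count())]
    return {"available": True, "devices": names, "reason": ""}


if __name__ == "__main__":
    print(describe_xpu())
```

If `available` is `False`, the `reason` field is usually the quickest pointer to whether the problem is the Python environment or the driver stack.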

Stable Diffusion

There are two versions: a text-to-image version and an image-to-image version. `cd` into the corresponding directory to get the code for each.

Text-to-Image Model Using Stable Diffusion

Stable Diffusion is a fully open-source (thank you, Stability.ai) deep learning model for text-to-image and image-to-image generation. For more information on the model, check out its Wikipedia entry.

PyTorch

To use PyTorch on Intel GPUs (XPUs), we need to install the Intel Extension for PyTorch (IPEX).

Installation

Clone this repository:

git clone https://github.com/rahulunair/stable_diffusion_arc

Change your directory to the cloned repository:
```
cd stable_diffusion_arc
```

Install the necessary packages:

Option 1:

conda env create -f environment.yml

Option 2:

conda create -n diffusion_env python=3.10 -y
conda activate diffusion_env
pip install decorator
# requires the oneAPI Base Toolkit >= 2023.2.0
pip install torch==2.0.1a0 torchvision==0.15.2a0 -f https://developer.intel.com/ipex-whl-stable-xpu
pip install intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
# if using the oneAPI Base Toolkit <= 2023.1.0
# pip install  torch==1.13.0a0+git6c9b55e intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
pip install accelerate transformers diffusers validators

Option 3: If the Intel AI Kit is already installed, set up your environment by running:

conda activate pytorch-gpu &&\
 conda install intelpython3_full &&\
 conda install -c conda-forge jupyterlab &&\
 conda install -c conda-forge diffusers transformers &&\
 conda install accelerate pillow ipywidgets ipython   -c conda-forge &&\
 pip install  invisible-watermark
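
Whichever option you pick, a quick sanity check is to confirm that the key packages can be found in the active environment before running any of the scripts. The sketch below is an illustration only: the `REQUIRED` list and the `missing_packages` helper are assumptions, not part of this repository. It uses `importlib.util.find_spec`, which locates a package without importing it.

```python
import importlib.util

# Import names of the core packages installed by the options above (assumed list).
REQUIRED = ["torch", "intel_extension_for_pytorch", "diffusers", "transformers", "accelerate"]


def missing_packages(names=REQUIRED) -> list:
    """Return the subset of `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("all packages found" if not missing else f"missing: {', '.join(missing)}")
```

This only checks that the packages are present. It does not verify that their versions match each other or the installed oneAPI toolkit.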


Run Stable Diffusion

#### Monolithic version

The monolithic implementation can be found in the `sd_text_to_img.py` file. This version contains all necessary components, including model definition, loading, optimization, and inference code, in a single script. 
You can run the script from the command line as follows:

```bash
python sd_text_to_img.py
```

#### Client-server version

The client-server implementation splits the functionality into two separate scripts: sd_server.py for the server and sd_client.py for the client.

We have two additional requirements for this version:

pip install flask requests

To use this version, first start the server by running:

python sd_server.py

Then, in a separate terminal window, run the client script:

python sd_client.py
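
To illustrate the split between the two scripts, here is a minimal, self-contained sketch of the request flow. It is not the code in `sd_server.py` or `sd_client.py`: it uses only the Python standard library rather than flask and requests, and the `/generate` route, the JSON payload shape, and the `fake_generate` stand-in for the diffusion pipeline are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fake_generate(prompt: str) -> dict:
    """Stand-in for the diffusion pipeline call; returns metadata only."""
    return {"prompt": prompt, "image": f"{prompt[:5]}.png"}


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/generate":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(fake_generate(payload.get("prompt", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), GenerateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_image(port: int, prompt: str) -> dict:
    """Client side: POST the prompt and decode the JSON reply."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/generate",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any configured HTTP proxy for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    server = start_server()
    print(request_image(server.server_address[1], "vivid red hot air balloons"))
    server.shutdown()
    server.server_close()
```

The point of this layout is that the model is loaded once in the long-lived server process, while clients stay lightweight and only exchange prompts and results.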

#### UI version

Run the server first:

python server.py

Then run the Gradio-based web UI:

python sd_ui.py

You should now be able to interact with a screen like this:

All versions offer the same functionality from a user's perspective. The monolithic version may be simpler to set up and run, as it doesn't require running two separate scripts. However, the client-server version could offer better performance for large-scale tasks, as it allows the server to handle multiple requests simultaneously.

Supported Models:

1. CompVis/stable-diffusion-v1-4
2. stabilityai/stable-diffusion-2-1
3. stabilityai/stable-diffusion-xl-base-1.0
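
How the scripts in this repository switch between these models is not shown here. As a hedged sketch, one way to do it with diffusers is below. The `SUPPORTED_MODELS` mapping and both helper names are made up for illustration. The one library detail it relies on is that SDXL checkpoints load with `StableDiffusionXLPipeline`, while the v1.x and v2.x checkpoints load with `StableDiffusionPipeline`.

```python
# Numbering mirrors the list above; the mapping itself is illustrative.
SUPPORTED_MODELS = {
    1: "CompVis/stable-diffusion-v1-4",
    2: "stabilityai/stable-diffusion-2-1",
    3: "stabilityai/stable-diffusion-xl-base-1.0",
}


def pipeline_class_name(model_id: str) -> str:
    """Name of the diffusers text-to-image pipeline class suited to `model_id`."""
    if "stable-diffusion-xl" in model_id:
        return "StableDiffusionXLPipeline"
    return "StableDiffusionPipeline"


def load_pipeline(choice: int, device: str = "xpu"):
    """Load the chosen model; needs torch, intel_extension_for_pytorch and diffusers."""
    import torch
    import intel_extension_for_pytorch  # noqa: F401  (registers the "xpu" backend)
    import diffusers

    model_id = SUPPORTED_MODELS[choice]
    pipeline_cls = getattr(diffusers, pipeline_class_name(model_id))
    return pipeline_cls.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
```

For example, `load_pipeline(3)` would fetch the SDXL base weights and move the pipeline to the XPU, assuming the environment from the installation steps is active.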

This repository contains two versions of a text-to-image model using Stable Diffusion: a monolithic implementation and a client-server implementation. Both versions offer a similar user interface, allowing you to choose the version that best suits your needs.

Gist on how to run on Intel XPUs

PyTorch

For the optimized version use sd_pytorch.py. To get an overall idea of what we are doing, here are the PyTorch and TensorFlow versions:

import intel_extension_for_pytorch  # registers the "xpu" device with PyTorch
import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
prompt = "vivid red hot air balloons over paris in the evening"
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # this can be torch.float32 as well
    revision="fp16",  # load the fp16 weights branch
    use_auth_token="<the token you generated>",
)
pipe = pipe.to("xpu")  # move the pipeline to the Intel GPU
image = pipe(prompt).images[0]
image.save(f"{prompt[:5]}.png")  # saved as "vivid.png"

Executing this, we get the result:

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:03<00:00, 12.70it/s]

The first time you run the model it takes about 35 seconds; subsequent runs take about 10 seconds. Expect these times to roughly double when using fp32.
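
To reproduce this kind of comparison on your own card, it helps to time the first run separately from the later ones. The helpers below are a minimal sketch. The names `time_runs` and `summarize` are made up for illustration, and the `time.sleep` call stands in for the real pipeline call.

```python
import time


def time_runs(fn, runs: int = 3) -> list:
    """Call `fn` `runs` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list) -> dict:
    """Separate the first (warm-up) run from the average of the remaining runs."""
    rest = durations[1:]
    return {
        "first_run_s": durations[0],
        "steady_state_s": sum(rest) / len(rest) if rest else None,
    }


if __name__ == "__main__":
    # Replace the lambda with e.g. `lambda: pipe(prompt)` once a pipeline is loaded.
    print(summarize(time_runs(lambda: time.sleep(0.01))))
```

Reporting the warm-up run on its own keeps one-time costs out of the steady-state figure.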

TensorFlow

Moving on to TensorFlow, we have this awesome repo from divamgupta:

Create a conda environment with Python 3.9 and install tensorflow and intel_extension_for_tensorflow wheels.

~ → conda create -n itex python=3.9 -y

~ → conda activate itex
~ → pip install tensorflow==2.10.0
~ → pip install --upgrade intel-extension-for-tensorflow[gpu]
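
Before installing the model package, you may want to confirm that TensorFlow can see the GPU. The sketch below assumes that the Intel Extension for TensorFlow exposes the card under the `XPU` device type. The helper name `list_xpu_devices` is made up for illustration.

```python
def list_xpu_devices() -> list:
    """Return the names of XPU devices visible to TensorFlow, or [] if none are found."""
    try:
        import tensorflow as tf
        import intel_extension_for_tensorflow  # noqa: F401
    except Exception:  # packages missing or failed to load
        return []
    # Assumes the extension registers devices under the "XPU" device type.
    return [device.name for device in tf.config.list_physical_devices("XPU")]


if __name__ == "__main__":
    devices = list_xpu_devices()
    print(devices if devices else "no XPU devices visible to TensorFlow")
```

An empty list means either the packages did not import or no XPU device was registered. In both cases, revisit the driver and oneAPI setup first.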

Install stable_diffusion_tensorflow package and dependencies

~ → pip install git+https://github.com/divamgupta/stable-diffusion-tensorflow ftfy pillow tqdm regex tensorflow-addons

Run stable diffusion

Running the TensorFlow model is straightforward as there are no user tokens or anything like that required.

import intel_extension_for_tensorflow
import tensorflow
from stable_diffusion_tf.stable_diffusion import StableDiffusion
from PIL import Image

prompt = "vivid red hot air balloons over paris in the evening"
generator = StableDiffusion(
    img_height=512,
    img_width=512,
    jit_compile=False,
)

img = generator.generate(
    prompt,
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
Image.fromarray(img[0]).save("sd_tf_fp32.png")