Home

Awesome

<p align="center"> <!--startmsg--> <a href="#server"><img src="./.github/deprecation-banner.svg?raw=true"></a> <!--endmsg--><p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/banner.svg?raw=true" alt="DALL·E Flow: A Human-in-the-loop workflow for creating HD images from text" width="60%"> <br> <b>A Human-in-the-loop<sup><a href="https://en.wikipedia.org/wiki/Human-in-the-loop">?</a></sup> workflow for creating HD images from text</b> </p> <p align=center> <a href="https://discord.jina.ai"><img src="https://img.shields.io/discord/1106542220112302130?logo=discord&logoColor=white&style=flat-square"></a> <a href="https://colab.research.google.com/github/jina-ai/dalle-flow/blob/main/client.ipynb"><img src="https://img.shields.io/badge/Open-in%20Colab-brightgreen?logo=google-colab&style=flat-square" alt="Open in Google Colab"/></a> <a href="https://hub.docker.com/r/jinaai/dalle-flow"><img alt="Docker Image Size (latest by date)" src="https://img.shields.io/docker/image-size/jinaai/dalle-flow?logo=docker&logoColor=white&style=flat-square"></a> </p>

DALL·E Flow is an interactive workflow for generating high-definition images from text prompt. First, it leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt. The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.

DALL·E Flow is built with Jina in a client-server architecture, which gives it high scalability, non-blocking streaming, and a modern Pythonic interface. Client can interact with the server via gRPC/Websocket/HTTP with TLS.

Why Human-in-the-loop? Generative art is a creative process. While recent advances of DALL·E unleash people's creativity, having a single-prompt-single-output UX/UI locks the imagination to a single possibility, which is bad no matter how fine this single result is. DALL·E Flow is an alternative to the one-liner, by formalizing the generative art as an iterative procedure.

Usage

DALL·E Flow is in client-server architecture.

Updates

Gallery

<img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20realistic%20photo%20of%20a%20muddy%20dog.png?raw=true" width="32%" alt="a realistic photo of a muddy dog" title="a realistic photo of a muddy dog"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/A%20scientist%20comparing%20apples%20and%20oranges%2C%20by%20Norman%20Rockwell.png?raw=true" width="32%" alt="A scientist comparing apples and oranges, by Norman Rockwell" title="A scientist comparing apples and oranges, by Norman Rockwell"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/an%20oil%20painting%20portrait%20of%20the%20regal%20Burger%20King%20posing%20with%20a%20Whopper.png?raw=true" width="32%" alt="an oil painting portrait of the regal Burger King posing with a Whopper" title="an oil painting portrait of the regal Burger King posing with a Whopper"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/Eternal%20clock%20powered%20by%20a%20human%20cranium%2C%20artstation.png?raw=true" width="32%" alt="Eternal clock powered by a human cranium, artstation" title="Eternal clock powered by a human cranium, artstation"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/another%20planet%20amazing%20landscape.png?raw=true" width="32%" alt="another planet amazing landscape" title="another planet amazing landscape"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/The%20Decline%20and%20Fall%20of%20the%20Roman%20Empire%20board%20game%20kickstarter.png?raw=true" width="32%" alt="The Decline and Fall of the Roman Empire board game kickstarter" title="The Decline and Fall of the Roman Empire board game kickstarter"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/A%20raccoon%20astronaut%20with%20the%20cosmos%20reflecting%20on%20the%20glass%20of%20his%20helmet%20dreaming%20of%20the%20stars%2C%20digital%20art.png?raw=true" width="32%" alt="A raccoon astronaut with the cosmos reflecting on the glass of his helmet dreaming of the stars, digital art" title="A raccoon astronaut with the cosmos reflecting on the glass of his helmet dreaming of the stars, digital art"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/A%20photograph%20of%20an%20apple%20that%20is%20a%20disco%20ball%2C%2085%20mm%20lens%2C%20studio%20lighting.png?raw=true" width="32%" alt="A photograph of an apple that is a disco ball, 85 mm lens, studio lighting" title="A photograph of an apple that is a disco ball, 85 mm lens, studio lighting"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20cubism%20painting%20Donald%20trump%20happy%20cyberpunk.png?raw=true" width="32%" alt="a cubism painting Donald trump happy cyberpunk" title="a cubism painting Donald trump happy cyberpunk"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/oil%20painting%20of%20a%20hamster%20drinking%20tea%20outside.png?raw=true" width="32%" alt="oil painting of a hamster drinking tea outside" title="oil painting of a hamster drinking tea outside"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/Colossus%20of%20Rhodes%20by%20Max%20Ernst.png?raw=true" width="32%" alt="Colossus of Rhodes by Max Ernst" title="Colossus of Rhodes by Max Ernst"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/landscape%20with%20great%20castle%20in%20middle%20of%20forest.png?raw=true" width="32%" alt="landscape with great castle in middle of forest" title="landscape with great castle in middle of forest"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/an%20medieval%20oil%20painting%20of%20Kanye%20west%20feels%20satisfied%20while%20playing%20chess%20in%20the%20style%20of%20Expressionism.png?raw=true" width="32%" alt="an medieval oil painting of Kanye west feels satisfied while playing chess in the style of Expressionism" title="an medieval oil painting of Kanye west feels satisfied while playing chess in the style of Expressionism"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/An%20oil%20pastel%20painting%20of%20an%20annoyed%20cat%20in%20a%20spaceship.png?raw=true" width="32%" alt="An oil pastel painting of an annoyed cat in a spaceship" title="An oil pastel painting of an annoyed cat in a spaceship"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/dinosaurs%20at%20the%20brink%20of%20a%20nuclear%20disaster.png?raw=true" width="32%" alt="dinosaurs at the brink of a nuclear disaster" title="dinosaurs at the brink of a nuclear disaster"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/fantasy%20landscape%20with%20medieval%20city.png?raw=true" width="32%" alt="fantasy landscape with medieval city" title="fantasy landscape with medieval city"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/GPU%20chip%20in%20the%20form%20of%20an%20avocado%2C%20digital%20art.png?raw=true" width="32%" alt="GPU chip in the form of an avocado, digital art" title="GPU chip in the form of an avocado, digital art"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20giant%20rubber%20duck%20in%20the%20ocean.png?raw=true" width="32%" alt="a giant rubber duck in the ocean" title="a giant rubber duck in the ocean"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/Paddington%20bear%20as%20austrian%20emperor%20in%20antique%20black%20%26%20white%20photography.png?raw=true" width="32%" alt="Paddington bear as austrian emperor in antique black & white photography" title="Paddington bear as austrian emperor in antique black & white photography"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20rainy%20night%20with%20a%20superhero%20perched%20above%20a%20city%2C%20in%20the%20style%20of%20a%20comic%20book.png?raw=true" width="32%" alt="a rainy night with a superhero perched above a city, in the style of a comic book" title="a rainy night with a superhero perched above a city, in the style of a comic book"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/A%20synthwave%20style%20sunset%20above%20the%20reflecting%20water%20of%20the%20sea%2C%20digital%20art.png?raw=true" width="32%" alt="A synthwave style sunset above the reflecting water of the sea, digital art" title="A synthwave style sunset above the reflecting water of the sea, digital art"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/an%20oil%20painting%20of%20ocean%20beach%20front%20in%20the%20style%20of%20Titian.png?raw=true" width="32%" alt="an oil painting of ocean beach front in the style of Titian" title="an oil painting of ocean beach front in the style of Titian"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/an%20oil%20painting%20of%20Klingon%20general%20in%20the%20style%20of%20Rubens.png?raw=true" width="32%" alt="an oil painting of Klingon general in the style of Rubens" title="an oil painting of Klingon general in the style of Rubens"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/city%2C%20top%20view%2C%20cyberpunk%2C%20digital%20realistic%20art.png?raw=true" width="32%" alt="city, top view, cyberpunk, digital realistic art" title="city, top view, cyberpunk, digital realistic art"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/an%20oil%20painting%20of%20a%20medieval%20cyborg%20automaton%20made%20of%20magic%20parts%20and%20old%20steampunk%20mechanics.png?raw=true" width="32%" alt="an oil painting of a medieval cyborg automaton made of magic parts and old steampunk mechanics" title="an oil painting of a medieval cyborg automaton made of magic parts and old steampunk mechanics"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20watercolour%20painting%20of%20a%20top%20view%20of%20a%20pirate%20ship%20sailing%20on%20the%20clouds.png?raw=true" width="32%" alt="a watercolour painting of a top view of a pirate ship sailing on the clouds" title="a watercolour painting of a top view of a pirate ship sailing on the clouds"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20knight%20made%20of%20beautiful%20flowers%20and%20fruits%20by%20Rachel%20ruysch%20in%20the%20style%20of%20Syd%20brak.png?raw=true" width="32%" alt="a knight made of beautiful flowers and fruits by Rachel ruysch in the style of Syd brak" title="a knight made of beautiful flowers and fruits by Rachel ruysch in the style of Syd brak"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%203D%20render%20of%20a%20rainbow%20colored%20hot%20air%20balloon%20flying%20above%20a%20reflective%20lake.png?raw=true" width="32%" alt="a 3D render of a rainbow colored hot air balloon flying above a reflective lake" title="a 3D render of a rainbow colored hot air balloon flying above a reflective lake"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20teddy%20bear%20on%20a%20skateboard%20in%20Times%20Square%20.png?raw=true" width="32%" alt="a teddy bear on a skateboard in Times Square " title="a teddy bear on a skateboard in Times Square "><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/cozy%20bedroom%20at%20night.png?raw=true" width="32%" alt="cozy bedroom at night" title="cozy bedroom at night"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/an%20oil%20painting%20of%20monkey%20using%20computer.png?raw=true" width="32%" alt="an oil painting of monkey using computer" title="an oil painting of monkey using computer"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/the%20diagram%20of%20a%20search%20machine%20invented%20by%20Leonardo%20da%20Vinci.png?raw=true" width="32%" alt="the diagram of a search machine invented by Leonardo da Vinci" title="the diagram of a search machine invented by Leonardo da Vinci"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/A%20stained%20glass%20window%20of%20toucans%20in%20outer%20space.png?raw=true" width="32%" alt="A stained glass window of toucans in outer space" title="A stained glass window of toucans in outer space"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20campfire%20in%20the%20woods%20at%20night%20with%20the%20milky-way%20galaxy%20in%20the%20sky.png?raw=true" width="32%" alt="a campfire in the woods at night with the milky-way galaxy in the sky" title="a campfire in the woods at night with the milky-way galaxy in the sky"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/Bionic%20killer%20robot%20made%20of%20AI%20scarab%20beetles.png?raw=true" width="32%" alt="Bionic killer robot made of AI scarab beetles" title="Bionic killer robot made of AI scarab beetles"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/The%20Hanging%20Gardens%20of%20Babylon%20in%20the%20middle%20of%20a%20city%2C%20in%20the%20style%20of%20Dal%C3%AD.png?raw=true" width="32%" alt="The Hanging Gardens of Babylon in the middle of a city, in the style of Dalí" title="The Hanging Gardens of Babylon in the middle of a city, in the style of Dalí"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/painting%20oil%20of%20Izhevsk.png?raw=true" width="32%" alt="painting oil of Izhevsk" title="painting oil of Izhevsk"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20hyper%20realistic%20photo%20of%20a%20marshmallow%20office%20chair.png?raw=true" width="32%" alt="a hyper realistic photo of a marshmallow office chair" title="a hyper realistic photo of a marshmallow office chair"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/fantasy%20landscape%20with%20city.png?raw=true" width="32%" alt="fantasy landscape with city" title="fantasy landscape with city"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/ocean%20beach%20front%20view%20in%20Van%20Gogh%20style.png?raw=true" width="32%" alt="ocean beach front view in Van Gogh style" title="ocean beach front view in Van Gogh style"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/An%20oil%20painting%20of%20a%20family%20reunited%20inside%20of%20an%20airport%2C%20digital%20art.png?raw=true" width="32%" alt="An oil painting of a family reunited inside of an airport, digital art" title="An oil painting of a family reunited inside of an airport, digital art"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/antique%20photo%20of%20a%20knight%20riding%20a%20T-Rex.png?raw=true" width="32%" alt="antique photo of a knight riding a T-Rex" title="antique photo of a knight riding a T-Rex"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20top%20view%20of%20a%20pirate%20ship%20sailing%20on%20the%20clouds.png?raw=true" width="32%" alt="a top view of a pirate ship sailing on the clouds" title="a top view of a pirate ship sailing on the clouds"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/an%20oil%20painting%20of%20a%20humanoid%20robot%20playing%20chess%20in%20the%20style%20of%20Matisse.png?raw=true" width="32%" alt="an oil painting of a humanoid robot playing chess in the style of Matisse" title="an oil painting of a humanoid robot playing chess in the style of Matisse"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20cubism%20painting%20of%20a%20cat%20dressed%20as%20French%20emperor%20Napoleon.png?raw=true" width="32%" alt="a cubism painting of a cat dressed as French emperor Napoleon" title="a cubism painting of a cat dressed as French emperor Napoleon"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/a%20husky%20dog%20wearing%20a%20hat%20with%20sunglasses.png?raw=true" width="32%" alt="a husky dog wearing a hat with sunglasses" title="a husky dog wearing a hat with sunglasses"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/A%20mystical%20castle%20appears%20between%20the%20clouds%20in%20the%20style%20of%20Vincent%20di%20Fate.png?raw=true" width="32%" alt="A mystical castle appears between the clouds in the style of Vincent di Fate" title="A mystical castle appears between the clouds in the style of Vincent di Fate"><img src="https://github.com/hanxiao/dalle/blob/gallery/.github/gallery/golden%20gucci%20airpods%20realistic%20photo.png?raw=true" width="32%" alt="golden gucci airpods realistic photo" title="golden gucci airpods realistic photo">

Client

<a href="https://colab.research.google.com/github/jina-ai/dalle-flow/blob/main/client.ipynb"><img src="https://img.shields.io/badge/Open-in%20Colab-orange?logo=google-colab&style=flat-square" alt="Open in Google Colab"/></a>

Using client is super easy. The following steps are best run in Jupyter notebook or Google Colab.

You will need to install DocArray and Jina first:

pip install "docarray[common]>=0.13.5" jina

We have provided a demo server for you to play:

⚠️ Due to the massive requests, our server may be delay in response. Yet we are very confident on keeping the uptime high. You can also deploy your own server by following the instruction here.

server_url = 'grpcs://dalle-flow.dev.jina.ai'

Step 1: Generate via DALL·E Mega

Now let's define the prompt:

prompt = 'an oil painting of a humanoid robot playing chess in the style of Matisse'

Let's submit it to the server and visualize the results:

from docarray import Document

doc = Document(text=prompt).post(server_url, parameters={'num_images': 8})
da = doc.matches

da.plot_image_sprites(fig_size=(10,10), show_index=True)

Here we generate 24 candidates, 8 from DALLE-mega, 8 from GLID3 XL, and 8 from Stable Diffusion, this is as defined in num_images, which takes about ~2 minutes. You can use a smaller value if it is too long for you.

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/client-dalle.png?raw=true" width="70%"> </p>

Step 2: Select and refinement via GLID3 XL

The 24 candidates are sorted by CLIP-as-service, with index-0 as the best candidate judged by CLIP. Of course, you may think differently. Notice the number in the top-left corner? Select the one you like the most and get a better view:

fav_id = 3
fav = da[fav_id]
fav.embedding = doc.embedding
fav.display()
<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/client-select1.png?raw=true" width="30%"> </p>

Now let's submit the selected candidates to the server for diffusion.

diffused = fav.post(f'{server_url}', parameters={'skip_rate': 0.5, 'num_images': 36}, target_executor='diffusion').matches

diffused.plot_image_sprites(fig_size=(10,10), show_index=True)

This will give 36 images based on the selected image. You may allow the model to improvise more by giving skip_rate a near-zero value, or a near-one value to force its closeness to the given image. The whole procedure takes about ~2 minutes.

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/client-glid.png?raw=true" width="60%"> </p>

Step 3: Select and upscale via SwinIR

Select the image you like the most, and give it a closer look:

dfav_id = 34
fav = diffused[dfav_id]
fav.display()
<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/client-select2.png?raw=true" width="30%"> </p>

Finally, submit to the server for the last step: upscaling to 1024 x 1024px.

fav = fav.post(f'{server_url}/upscale')
fav.display()

That's it! It is the one. If not satisfied, please repeat the procedure.

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/client-select3.png?raw=true" width="50%"> </p>

Btw, DocArray is a powerful and easy-to-use data structure for unstructured data. It is super productive for data scientists who work in cross-/multi-modal domain. To learn more about DocArray, please check out the docs.

Server

You can host your own server by following the instruction below.

Hardware requirements

DALL·E Flow needs one GPU with 21GB VRAM at its peak. All services are squeezed into this one GPU, this includes (roughly)

The following reasonable tricks can be used for further reducing VRAM:

It requires at least 50GB free space on the hard drive, mostly for downloading pretrained models.

High-speed internet is required. Slow/unstable internet may throw frustrating timeout when downloading models.

CPU-only environment is not tested and likely won't work. Google Colab is likely throwing OOM hence also won't work.

Server architecture

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/flow.svg?raw=true" width="70%"> </p>

If you have installed Jina, the above flowchart can be generated via:

# pip install jina
jina export flowchart flow.yml flow.svg

Stable Diffusion weights

If you want to use Stable Diffusion, you will first need to register an account on the website Huggingface and agree to the terms and conditions for the model. After logging in, you can find the version of the model required by going here:

CompVis / sd-v1-5-inpainting.ckpt

Under the Download the Weights section, click the link for sd-v1-x.ckpt. The latest weights at the time of writing are sd-v1-5.ckpt.

DOCKER USERS: Put this file into a folder named ldm/stable-diffusion-v1 and rename it model.ckpt. Follow the instructions below carefully because SD is not enabled by default.

NATIVE USERS: Put this file into dalle/stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt after finishing the rest of the steps under "Run natively". Follow the instructions below carefully because SD is not enabled by default.

Run in Docker

Prebuilt image

We have provided a prebuilt Docker image that can be pull directly.

docker pull jinaai/dalle-flow:latest

Build it yourself

We have provided a Dockerfile which allows you to run a server out of the box.

Our Dockerfile is using CUDA 11.6 as the base image, you may want to adjust it according to your system.

git clone https://github.com/jina-ai/dalle-flow.git
cd dalle-flow

docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .

The building will take 10 minutes with average internet speed, which results in a 18GB Docker image.

Run container

To run it, simply do:

docker run -p 51005:51005 \
  -it \
  -v $HOME/.cache:/home/dalle/.cache \
  --gpus all \
  jinaai/dalle-flow

Alternatively, you may also run with some workflows enabled or disabled to prevent out-of-memory crashes. To do that, pass one of these environment variables:

DISABLE_DALLE_MEGA
DISABLE_GLID3XL
DISABLE_SWINIR
ENABLE_STABLE_DIFFUSION
ENABLE_CLIPSEG
ENABLE_REALESRGAN

For example, if you would like to disable GLID3XL workflows, run:

docker run -e DISABLE_GLID3XL='1' \
  -p 51005:51005 \
  -it \
  -v $HOME/.cache:/home/dalle/.cache \
  --gpus all \
  jinaai/dalle-flow

Special instructions for Stable Diffusion and Docker

Stable Diffusion may only be enabled if you have downloaded the weights and make them available as a virtual volume while enabling the environmental flag (ENABLE_STABLE_DIFFUSION) for SD.

You should have previously put the weights into a folder named ldm/stable-diffusion-v1 and labeled them model.ckpt. Replace YOUR_MODEL_PATH/ldm below with the path on your own system to pipe the weights into the docker image.

docker run -e ENABLE_STABLE_DIFFUSION="1" \
  -e DISABLE_DALLE_MEGA="1" \
  -e DISABLE_GLID3XL="1" \
  -p 51005:51005 \
  -it \
  -v YOUR_MODEL_PATH/ldm:/dalle/stable-diffusion/models/ldm/ \
  -v $HOME/.cache:/home/dalle/.cache \
  --gpus all \
  jinaai/dalle-flow

You should see the screen like following once running:

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/docker-run.png?raw=true" width="50%"> </p>

Note that unlike running natively, running inside Docker may give less vivid progressbar, color logs, and prints. This is due to the limitations of the terminal in a Docker container. It does not affect the actual usage.

Run natively

Running natively requires some manual steps, but it is often easier to debug.

Clone repos

mkdir dalle && cd dalle
git clone https://github.com/jina-ai/dalle-flow.git
git clone https://github.com/jina-ai/SwinIR.git
git clone --branch v0.0.15 https://github.com/AmericanPresidentJimmyCarter/stable-diffusion.git
git clone https://github.com/CompVis/latent-diffusion.git
git clone https://github.com/jina-ai/glid-3-xl.git
git clone https://github.com/timojl/clipseg.git

You should have the following folder structure:

dalle/
 |
 |-- Real-ESRGAN/
 |-- SwinIR/
 |-- clipseg/
 |-- dalle-flow/
 |-- glid-3-xl/
 |-- latent-diffusion/
 |-- stable-diffusion/

Install auxiliary repos

cd dalle-flow
python3 -m virtualenv env
source env/bin/activate && cd -
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
pip install numpy tqdm pytorch_lightning einops numpy omegaconf
pip install https://github.com/crowsonkb/k-diffusion/archive/master.zip
pip install git+https://github.com/AmericanPresidentJimmyCarter/stable-diffusion.git@v0.0.15
pip install basicsr facexlib gfpgan
pip install realesrgan
pip install https://github.com/AmericanPresidentJimmyCarter/xformers-builds/raw/master/cu116/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl && \
cd latent-diffusion && pip install -e . && cd -
cd stable-diffusion && pip install -e . && cd -
cd SwinIR && pip install -e . && cd -
cd glid-3-xl && pip install -e . && cd -
cd clipseg && pip install -e . && cd -

There are couple models we need to download for GLID-3-XL if you are using that:

cd glid-3-xl
wget https://dall-3.com/models/glid-3-xl/bert.pt
wget https://dall-3.com/models/glid-3-xl/kl-f8.pt
wget https://dall-3.com/models/glid-3-xl/finetune.pt
cd -

Both clipseg and RealESRGAN require you to set a correct cache folder path, typically something like $HOME/.

Install flow

cd dalle-flow
pip install -r requirements.txt
pip install jax~=0.3.24

Start the server

Now you are under dalle-flow/, run the following command:

# Optionally disable some generative models with the following flags when
# using flow_parser.py:
# --disable-dalle-mega
# --disable-glid3xl
# --disable-swinir
# --enable-stable-diffusion
python flow_parser.py
jina flow --uses flow.tmp.yml

You should see this screen immediately:

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/server-onstart.png?raw=true" width="50%"> </p>

On the first start it will take ~8 minutes for downloading the DALL·E mega model and other necessary models. The proceeding runs should only take ~1 minute to reach the success message.

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/server-wait.png?raw=true" width="50%"> </p>

When everything is ready, you will see:

<p align="center"> <img src="https://github.com/jina-ai/dalle-flow/blob/main/.github/server-success.png?raw=true" width="50%"> </p>

Congrats! Now you should be able to run the client.

You can modify and extend the server flow as you like, e.g. changing the model, adding persistence, or even auto-posting to Instagram/OpenSea. With Jina and DocArray, you can easily make DALL·E Flow cloud-native and ready for production.

Use the CLIP-as-service

To reduce the usage of vRAM, you can use the CLIP-as-service as an external executor freely available at grpcs://api.clip.jina.ai:2096.
First, make sure you have created an access token from console website, or CLI as following

jina auth token create <name of PAT> -e <expiration days>

Then, you need to change the executor related configs (host, port, external, tls and grpc_metadata) from flow.yml.

...
  - name: clip_encoder
    uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
    host: 'api.clip.jina.ai'
    port: 2096
    tls: true
    external: true
    grpc_metadata:
      authorization: "<your access token>"
    needs: [gateway]
...
  - name: rerank
    uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
    host: 'api.clip.jina.ai'
    port: 2096
    uses_requests:
      '/': rank
    tls: true
    external: true
    grpc_metadata:
      authorization: "<your access token>"
    needs: [dalle, diffusion]

You can also use the flow_parser.py to automatically generate and run the flow with using the CLIP-as-service as external executor:

python flow_parser.py --cas-token "<your access token>'
jina flow --uses flow.tmp.yml

⚠️ grpc_metadata is only available after Jina v3.11.0. If you are using an older version, please upgrade to the latest version.

Now, you can use the free CLIP-as-service in your flow.

<!-- start support-pitch -->

Support

Join Us

DALL·E Flow is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers, solution engineers to build the next neural search ecosystem in open-source.

<!-- end support-pitch -->