Aphantasia

<p align='center'><img src='_out/Aphantasia4.jpg' /></p>

This is a collection of text-to-image tools, evolved from the artwork of the same name.
Based on the CLIP model and the Lucent library, with FFT/DWT/RGB parameterizers (no-GAN generation).
Update: the old depth estimation method has been replaced with Depth Anything 2.
Tested on Python 3.7-3.11 with PyTorch 1.7.1 to 2.3.1.
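The FFT parameterization mentioned above can be illustrated with a toy numpy sketch (the actual implementation lives in Lucent and uses PyTorch tensors; all names here are illustrative):

```python
import numpy as np

# Toy sketch of FFT parameterization: the trainable variable is a complex
# frequency spectrum, and the image is decoded from it by an inverse FFT.
# Optimizing in frequency space biases results toward smooth, coherent images.
h, w = 64, 64
rng = np.random.default_rng(0)
shape = (h, w // 2 + 1)  # rfft2 spectrum shape for a real h-by-w image
spectrum = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
image = np.fft.irfft2(spectrum, s=(h, w))  # decode spectrum -> pixel image
```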

Aphantasia is the inability to visualize mental images, the deprivation of visual dreams.
The image in the header is generated by the tool from this word.

Please be kind enough to mention this project if you use it for your masterpieces.

Features

Set up CLIP and the other dependencies:

pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git

Operations

python clip_fft.py -t "the text" --size 1280-720
python clip_fft.py -i theimage.jpg --sync 0.4

If the --sync X argument is greater than 0, LPIPS loss is added to keep the composition similar to the original image.
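How --sync enters the objective can be sketched as a weighted sum (total_loss, clip_loss and lpips_loss are placeholders here for the real CLIP similarity and LPIPS perceptual-distance terms, not names from the repo):

```python
def total_loss(clip_loss, lpips_loss, sync=0.4):
    # hypothetical combination: the LPIPS term is scaled by --sync;
    # sync=0 drops it, larger values pull harder toward the source image
    return clip_loss + sync * lpips_loss
```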

You can combine both text and image prompts.
For non-English languages, use --translate (Google Translate).

python clip_fft.py -t "topic sentence" -t2 "style description" -t0 "avoid this" --size 1280-720 

Text-to-video [continuous mode]

Here are two ways of making a video from text file(s), processing them line by line in one go.

Illustrip

A newer method: topics are interpolated as a continuous flow with constant pan/zoom motion and an optional 3D look.

python illustrip.py --in_txt mycontent.txt --in_txt2 mystyles.txt --size 1280-720 --steps 100
python illustrip.py --in_txt "my super content" --in_txt2 "my super style" --size 1280-720 --steps 500

Prefixes (-pre), postfixes (-post) and "stop words" (--in_txt0) may be loaded as phrases or text files as well.
All text inputs understand weighted syntax like "good prompt :1 | also good prompt :1 | bad prompt :-0.5" (within one line).
One can also use image(s) as references with --in_img argument. Explore other arguments for more explicit control.
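The weight syntax can be parsed roughly like this (a sketch; parse_weighted is an illustrative name, not the repo's actual parser):

```python
import re

def parse_weighted(line):
    """Split 'good :1 | bad :-0.5' into (text, weight) pairs; weight defaults to 1."""
    pairs = []
    for chunk in line.split('|'):
        m = re.search(r':\s*(-?\d+(?:\.\d+)?)\s*$', chunk)
        if m:
            pairs.append((chunk[:m.start()].strip(), float(m.group(1))))
        else:
            pairs.append((chunk.strip(), 1.0))
    return pairs
```

For example, parse_weighted("good prompt :1 | bad prompt :-0.5") yields [('good prompt', 1.0), ('bad prompt', -0.5)].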
This method works best with direct RGB pixels optimization, but can also be used with FFT parameterization:

python illustrip.py ... --gen FFT --smooth --align uniform --colors 1.8 --contrast 1.1

To add a 3D look, add --depth 0.01 to the command.

Illustra

Generates a separate image for every text line (with sequences and training videos, as in the single-image mode above), then renders the final video from those images (mixing them in FFT space), with the duration given in seconds.

python illustra.py -t mysong.txt --size 1280-720 --length 155

The --keep X parameter controls how closely each next line/image generation follows the previous one. 0 means it is randomly initialized; the higher the value, the more strictly the original composition is kept. Safe values are 1~2 (much higher values may cause the imagery to get stuck).
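Conceptually, --keep blends the previous result with fresh noise when initializing the next image; a numpy sketch of that idea (init_next and the exact formula are illustrative, not the repo's code):

```python
import numpy as np

def init_next(prev, keep=1.0, seed=0):
    # keep=0 -> purely random start; larger keep -> closer to the previous image
    noise = np.random.default_rng(seed).standard_normal(prev.shape)
    return (keep * prev + noise) / (keep + 1.0)
```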

python interpol.py -i mydir --length 155
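The FFT-space mixing described above can be sketched in numpy as interpolating the images' spectra (fft_blend is an illustrative name, not the actual function):

```python
import numpy as np

def fft_blend(img_a, img_b, t):
    # linearly interpolate two images in frequency space; unlike pixel-space
    # crossfades, this morphs coarse structure smoothly between the images
    Fa = np.fft.rfft2(img_a)
    Fb = np.fft.rfft2(img_b)
    return np.fft.irfft2((1 - t) * Fa + t * Fb, s=img_a.shape)
```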

Other generators

<p><img src='_out/some_cute_image-VQGAN.jpg' /></p>
python cppn.py -v -t "the text" --aest 0.5
<p><img src='_out/some_cute_image-SIREN.jpg' /></p>

Credits

Based on the CLIP model by OpenAI (paper).
FFT encoding is taken from the Lucent library; 3D depth processing by deKxi.

Thanks to Ryan Murdock, Jonathan Fly, Hannu Toyryla, @eduwatch2, torridgristle for ideas.

<p align='center'><img src='_out/some_cute_image-FFT.jpg' /></p>