Visual Anagrams | Factorized Diffusion

NOTE: This repo contains code for both Visual Anagrams and Factorized Diffusion.

Please see this readme for info about factorized diffusion.

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

CVPR 2024 (Oral)

Daniel Geng, Aaron Park, Andrew Owens

[Arxiv] [Website] [Colab (Free Tier)] [Colab (Pro Tier)]

This repo contains code to generate visual anagrams and other multi-view optical illusions. These are images that change appearance or identity when transformed, such as by a rotation, a color inversion, or a jigsaw rearrangement. Please read our paper or visit our website for more details.

Colab Demos

We provide two Colab demos. One was graciously written by Tamizh N and is memory-efficient enough to run with Colab Free Tier resources (at the cost of being slightly less convenient):

Open In Colab (Free Tier)

For people with, or willing to obtain, a Colab Pro subscription, we also have the following notebook, which requires a High-RAM V100 runtime but is slightly more convenient to use:

Open In Colab (Pro Tier)

Installation

Conda Environment

Create a conda environment (Linux only) by running:

conda env create -f environment.yml

and then activate it by running:

conda activate visual_anagrams

DeepFloyd

Our method uses DeepFloyd IF, a pixel-based diffusion model. We do not use Stable Diffusion because latent diffusion models cause artifacts in illusions (see our paper for more details).

Before using DeepFloyd IF, you must accept its usage conditions. To do so:

  1. Make sure you have a Hugging Face account and are logged in.
  2. Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0. Accepting the license on the stage I model card automatically accepts it for the other IF models.
  3. Log in locally by running
python huggingface_login.py

and entering your Hugging Face Hub access token when prompted. It does not matter how you answer the "Add token as git credential? (Y/n)" question.

Usage

To generate 90-degree rotation illusions, use the command below. It creates 10 samples at three sizes: 64×64, 256×256, and 1024×1024. See the More Examples section for commands that generate other types of multi-view illusions.

python generate.py --name rotate_cw.village.horse --prompts "a snowy mountain village" "a horse" --style "an oil painting of" --views identity rotate_cw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Here is a description of useful arguments:

Upscaling

We use the first two stages of DeepFloyd IF to generate 64×64 and 256×256 multi-view illusions. DeepFloyd IF further uses the Stable Diffusion x4 Upscaler to go from 256×256 to 1024×1024. However, this upscaler operates on latents, so we do not implement multi-view denoising for this stage. Instead, we upsample naively, using only the first prompt. It's important to note that this may affect the quality of the transformed images, but in practice we find that it works quite well.

Animating

To animate the two-view illusion above, run the command below. It should work for all three sizes at which we sample (64×64, 256×256, and 1024×1024), although 64×64 is very small and looks quite bad.

python animate.py --im_path results/rotate_cw.village.horse/0000/sample_1024.png --metadata_path results/rotate_cw.village.horse/metadata.pkl

Here is a description of useful arguments:

The Art of Choosing Prompts

Choosing prompts for illusions can be fairly tricky and unintuitive. Here are some tips:

More Examples

Flipping illusions:

python generate.py --name flip.campfire.man --prompts "an oil painting of people around a campfire" "an oil painting of an old man" --views identity flip --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Jigsaw illusions:

python generate.py --name jigsaw.houseplants.marilyn --prompts "houseplants" "marilyn monroe" --style "an oil painting of" --views identity jigsaw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Inner circle illusions:

python generate.py --name inner.einstein.marilyn --prompts "albert einstein" "marilyn monroe" --style "an oil painting of" --views identity inner_circle --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Color inversion illusions:

python generate.py --name negate.landscape.houseplants --prompts "a landscape" "houseplants" --style "a lithograph of" --views identity negate --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Patch permutation illusions:

python generate.py --name patch.lemur.kangaroo --prompts "a lemur" "a kangaroo" --style "a pencil sketch of" --views identity patch_permute --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Pixel permutation illusions:

python generate.py --name pixel.duck.rabbit --prompts "a duck" "a rabbit" --style "a mosaic of" --views identity pixel_permute --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Skew illusions:

python generate.py --name skew.tudor.skull --prompts "a tudor portrait" "a skull" --style "an oil painting of" --views identity skew --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Three-view illusions:

python generate.py --name threeview.waterfall.teddy.rabbit --prompts "a waterfall" "a teddy bear" "a rabbit" --style "an oil painting of" --views identity rotate_cw rotate_ccw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

"Square Hinge" illusions:

python generate.py --name hinge.duck.rabbit --prompts "a duck" "a rabbit" --style "a water color of" --views identity square_hinge --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0 --generate_1024

Failure Views

We also implement views which fail, as discussed in our paper. These include:

All of the above views fail because they change the statistics of the Gaussian noise. See paper for more details.
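As a rough illustration of what "changing the statistics of the Gaussian noise" means (this is not code from the repo), the sketch below compares the variance of i.i.d. Gaussian noise before and after three toy transformations. The neighbor-averaging "blur" is a hypothetical stand-in chosen for illustration and is not necessarily one of the failure views; all variable names are assumptions.

```python
import random
import statistics

random.seed(0)

# Stand-in for a flattened image of i.i.d. standard Gaussian noise.
noise = [random.gauss(0.0, 1.0) for _ in range(20_000)]

# A permutation only moves values around, so the per-pixel statistics are unchanged.
permuted = noise[:]
random.shuffle(permuted)

# Negating zero-mean noise flips signs but keeps the variance.
negated = [-x for x in noise]

# Averaging neighboring pixels (a crude blur) mixes independent samples,
# which shrinks the variance toward 1/2 and correlates the outputs.
blurred = [(noise[i] + noise[i + 1]) / 2 for i in range(len(noise) - 1)]

var_noise = statistics.pvariance(noise)
var_permuted = statistics.pvariance(permuted)
var_negated = statistics.pvariance(negated)
var_blurred = statistics.pvariance(blurred)

print(f"original: {var_noise:.3f}")
print(f"permuted: {var_permuted:.3f}")
print(f"negated:  {var_negated:.3f}")
print(f"blurred:  {var_blurred:.3f}")
```

In this toy setting, the permuted and negated noise still look like samples from a unit Gaussian, whereas the averaged noise has roughly half the variance and is no longer independent across pixels.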

Custom Views

Views are derived from the base class BaseView. If you want to write your own view, views.py contains many examples of these transformations.
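The general shape of a custom view can be sketched as a transformation paired with its inverse. The code below is only a minimal, self-contained illustration: SketchBaseView is a hypothetical stand-in for the repo's BaseView, the method names view and inverse_view are assumptions, and images are plain nested lists rather than tensors. Consult views.py for the actual interface.

```python
class SketchBaseView:
    """Hypothetical stand-in for the repo's BaseView; the real interface may differ."""

    def view(self, im):
        # Transform an image into this view.
        raise NotImplementedError

    def inverse_view(self, im):
        # Undo the transformation applied by view().
        raise NotImplementedError


class TransposeView(SketchBaseView):
    """Toy view that swaps rows and columns of a 2D image (nested lists)."""

    def view(self, im):
        return [list(row) for row in zip(*im)]

    def inverse_view(self, im):
        # Transposition is its own inverse.
        return [list(row) for row in zip(*im)]


image = [[1, 2, 3],
         [4, 5, 6]]

custom_view = TransposeView()
transformed = custom_view.view(image)
restored = custom_view.inverse_view(transformed)

print(transformed)
print(restored == image)
```

The key property to preserve in a real view is the round trip: applying the inverse after the view should recover the original image.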

Additionally, if your view can be implemented as a permutation of pixels, you can probably get away with just saving a permutation array to disk and passing it to the PermuteView class. See permutations/make_inner_rotation_perm.py and get_view() in views.py for an example of this.
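To make the idea of a permutation array concrete, here is a minimal sketch that builds one for a flattened image, applies it, and inverts it. The helper names (make_half_swap_perm, apply_perm, invert_perm) and the indexing convention are assumptions for illustration; the on-disk format and the arguments that PermuteView expects are not shown here, so check make_inner_rotation_perm.py before adapting this.

```python
def make_half_swap_perm(height, width):
    """Build a permutation that swaps the left and right halves of each row.

    Convention (an assumption): output pixel i takes its value from
    input pixel perm[i], with pixels indexed in row-major order.
    """
    half = width // 2
    perm = []
    for row in range(height):
        for col in range(width):
            src_col = (col + half) % width
            perm.append(row * width + src_col)
    return perm


def apply_perm(flat, perm):
    """Rearrange a flattened image according to perm."""
    return [flat[src] for src in perm]


def invert_perm(perm):
    """Return the permutation that undoes perm."""
    inverse = [0] * len(perm)
    for dst, src in enumerate(perm):
        inverse[src] = dst
    return inverse


height, width = 2, 4
flat_image = list(range(height * width))  # pixel values 0..7 in row-major order

perm = make_half_swap_perm(height, width)
permuted = apply_perm(flat_image, perm)
restored = apply_perm(permuted, invert_perm(perm))

print(perm)
print(permuted)
print(restored == flat_image)
```

Because every index appears exactly once, the array describes a pure rearrangement of pixels, which is the property that makes such a view expressible as a permutation.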