
Vision Transformer Visualization: What Neurons Tell and How Neurons Behave?

This is the official implementation of the ViT visualization tool.


Prepare environment

The Anaconda environment file is vit_visualize.yml. To create the environment, install Anaconda first and then run:

conda env create -f vit_visualize.yml

To make the environment available as a Jupyter notebook kernel, run:

python -m ipykernel install --user --name=vit_visual

This analysis uses the Vision Transformer ViT-B/16 at 224×224 input resolution. To download its pretrained weights (ImageNet-21k + ImageNet-2012) into the weights folder, run the following bash commands:

mkdir -p weights
wget -O weights/ViT-B_16-224.npz https://storage.googleapis.com/vit_models/imagenet21k+imagenet2012/ViT-B_16-224.npz
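Once downloaded, the checkpoint can be sanity-checked with a short NumPy snippet (a minimal sketch; the parameter names printed are simply whatever the .npz archive contains):

```python
# Minimal sketch: verify the downloaded checkpoint loads and list a few
# parameter names/shapes. The path matches the wget command above.
import numpy as np

weights = np.load("weights/ViT-B_16-224.npz")

for name in list(weights.files)[:5]:
    print(name, weights[name].shape)
```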

What Neurons Tell?

The ViT_neuron_visualization notebook contains the code for our neuron-view analysis. Following the "What Neurons Tell" section of the paper, it covers the features and analyses illustrated below:

<center><img src="./fig/layer0_embedding_row.png" alt="View of a patch" width="80%"/></center>
<center><img src="./fig/layer0_good_bad_filters.png" alt="Comparing view of embedding" width="80%"/></center>
<center><img src="./fig/layer_high_object_non_object.png" alt="Global view over layers" width="60%"/></center>
<center><img src="./fig/all_drops.png" alt="Occlusion comparison" width="60%"/></center>
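The notebook itself holds the full analysis; as a rough illustration of the kind of neuron-level inspection involved, the sketch below captures per-layer token embeddings with forward hooks. It assumes the timm library's vit_base_patch16_224 model as a stand-in for the checkpoint above, which is not how the notebook necessarily loads the weights:

```python
# Minimal sketch (not the notebook's code): capture each transformer block's
# output token embeddings with forward hooks. Assumes the timm library.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=False)
model.eval()

activations = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, 1 + num_patches, hidden_dim) token embeddings
        activations[layer_idx] = output.detach()
    return hook

# Register one hook per transformer block.
for i, block in enumerate(model.blocks):
    block.register_forward_hook(make_hook(i))

# Run a dummy image through the model to populate the activations dict.
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

print(activations[0].shape)  # e.g. torch.Size([1, 197, 768])
```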

How Neurons Behave?

The ViT_embedding_visualization notebook implements the code that generates the clustering behavior of embeddings, with full instructions to reproduce the results.
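As a hedged illustration of what clustering patch embeddings looks like in practice (the notebook contains the actual procedure and plots), the sketch below runs k-means over the last block's patch tokens from the hook example above, assuming scikit-learn:

```python
# Minimal sketch (assuming scikit-learn): cluster one layer's patch
# embeddings; `activations` comes from the hook sketch above.
import numpy as np
from sklearn.cluster import KMeans

tokens = activations[11].squeeze(0).numpy()  # last block: (197, 768)
patch_tokens = tokens[1:]                    # drop the [CLS] token

labels = KMeans(n_clusters=5, n_init=10).fit_predict(patch_tokens)

# Reshape cluster labels back onto the 14x14 patch grid for visualization.
print(labels.reshape(14, 14))
```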
