Vision Transformer Visualization: What Neurons Tell and How Neurons Behave?
This is the official implementation of the ViT visualization tool.
Prepare environment
The Anaconda environment file is vit_visualize.yml.
To create the environment, install Anaconda first and then run:
conda env create -f vit_visualize.yml
To register the environment as a Jupyter notebook kernel, run:
python -m ipykernel install --user --name=vit_visual
This analysis uses the Vision Transformer model ViT-B_16/224. To download its weights, pretrained on ImageNet21k + ImageNet2012, and save them to the weights folder, run the commands below:
mkdir -p weights
wget -O weights/ViT-B_16-224.npz https://storage.googleapis.com/vit_models/imagenet21k+imagenet2012/ViT-B_16-224.npz
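To check that the download succeeded, you can load the checkpoint in Python and list a few of the stored parameter arrays. This is a minimal sketch, assuming only the file path used above:

# Minimal sketch: verify the downloaded .npz checkpoint loads and
# inspect the parameter names it contains.
import numpy as np

ckpt = np.load("weights/ViT-B_16-224.npz")
print(len(ckpt.files), "parameter arrays")   # number of arrays in the checkpoint
print(ckpt.files[:5])                        # a few example parameter names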
What Neurons Tell?
The ViT_neuron_visualization notebook contains the code for the neuron-view analysis. Following the What Neurons Tell chapter of the paper, it covers the features and analyses below:
- Visualize the filters and the views of a specific input patch at the 0th layer (see the sketch after this list).
- Compare the views of different filters, concluding that each filter works well for a specific group of images but not for the others.
- Create a global view at the higher layers and compare the global views corresponding to different patches.
- Analyze the views for salient, non-salient, and random occlusion cases across the layers.
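As a rough illustration of the 0th-layer filter view, the patch-embedding kernel can be read directly from the checkpoint and its filters plotted as 16x16 RGB images. This is a minimal sketch, not the notebook's exact code; the key name "embedding/kernel" and its shape are assumptions about the checkpoint layout.

# Sketch: plot the first few patch-embedding filters of ViT-B_16.
import numpy as np
import matplotlib.pyplot as plt

ckpt = np.load("weights/ViT-B_16-224.npz")
kernel = ckpt["embedding/kernel"]             # assumed shape: (16, 16, 3, 768)

fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for i, ax in enumerate(axes):
    f = kernel[..., i]                        # one 16x16x3 filter
    f = (f - f.min()) / (f.max() - f.min())   # normalize to [0, 1] for display
    ax.imshow(f)
    ax.axis("off")
plt.show()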
How Neurons Behave?
The code that visualizes the clustering behavior of embeddings is implemented in the ViT_embedding_visualization notebook, with full instructions to reproduce the results.
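The general idea can be sketched as follows: project per-patch token embeddings to 2-D and color them by cluster. This is a minimal sketch, not the notebook's code; "embeddings" is a hypothetical placeholder for the (num_tokens, 768) array that a forward pass through the ViT encoder would produce.

# Sketch: t-SNE projection of token embeddings, colored by k-means cluster.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

embeddings = np.random.randn(196, 768)        # placeholder for real token embeddings

points = TSNE(n_components=2, init="pca", random_state=0).fit_transform(embeddings)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embeddings)

plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=15)
plt.title("t-SNE of patch embeddings, colored by k-means cluster")
plt.show()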