Understanding Why ViT Doesn’t Perform Well on Small Datasets: An Intuitive Perspective

This repository contains the code for our NYU Tandon ECE7123 Deep Learning course project. We have uploaded our work to arXiv; the paper is available here.

Download Datasets and Trained Checkpoints

First, download CIFAR-10, CIFAR-100, and SVHN into the ./data folder by running:

python3 ./data/downloader.py
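
For reference, the downloader is roughly equivalent to the following torchvision calls (a minimal sketch, not the exact contents of downloader.py):

```python
# Sketch: download the three datasets into ./data via torchvision.
from torchvision import datasets

root = "./data"

# CIFAR-10 / CIFAR-100: fetch both train and test splits.
datasets.CIFAR10(root=root, train=True, download=True)
datasets.CIFAR10(root=root, train=False, download=True)
datasets.CIFAR100(root=root, train=True, download=True)
datasets.CIFAR100(root=root, train=False, download=True)

# SVHN uses a `split` argument instead of `train`.
datasets.SVHN(root=root, split="train", download=True)
datasets.SVHN(root=root, split="test", download=True)
```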

Then download the trained checkpoints from our Hugging Face repo and place them in the ./checkpoint folder.
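
If you prefer to script the download, a huggingface_hub call along these lines should work (a sketch; the repo id below is a placeholder, not the actual checkpoint repo):

```python
# Sketch: fetch the checkpoint files into ./checkpoint.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<username>/<checkpoint-repo>",  # placeholder, substitute the real repo id
    local_dir="./checkpoint",
)
```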

Visualize Layers as in Section 4

Run ./training/vis_vit.py and ./training/vis_resnet.py. Make sure to open the files and set the argument-parser parameters correctly. Below are the exact commands to reproduce the figures in Section 4:

python vis_vit.py --image_idx=2 --dataset="SVHN" --load_checkpoint="../checkpoint/vit_SHVN_e100_b10_lr0.0001.pt" 
python vis_vit.py --image_idx=10 --dataset="CIFAR-10" --load_checkpoint="../checkpoint/vit_CIFAR-10_e500_b100_lr0.0001.pt"
python vis_resnet.py --image_idx=2 --dataset="SVHN" --load_checkpoint="../checkpoint/res18_svhn-4-ckpt.t7"
python vis_resnet.py --image_idx=10 --dataset="CIFAR-10" --load_checkpoint="../checkpoint/res18_CIFAR-10_e500_b100_lr0.0001.pt"
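
For reference, the flags used above suggest an argument parser along these lines (a sketch inferred from the commands; the defaults are assumptions):

```python
# Sketch of the flags vis_vit.py / vis_resnet.py appear to expect.
import argparse

parser = argparse.ArgumentParser(description="Visualize intermediate layers")
parser.add_argument("--image_idx", type=int, default=0,
                    help="index of the test image to visualize")
parser.add_argument("--dataset", type=str, default="CIFAR-10",
                    help='one of "CIFAR-10", "CIFAR-100", "SVHN"')
parser.add_argument("--load_checkpoint", type=str, required=True,
                    help="path to a trained checkpoint under ./checkpoint")
args = parser.parse_args()
```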

Generate CKA Comparison Images

python torch_cka/cka_compare.py --dataset cifar10
python torch_cka/cka_compare.py --dataset cifar100
python torch_cka/cka_compare.py --dataset svhn
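
The comparison relies on the torch_cka package. Below is a minimal, self-contained sketch of the kind of comparison cka_compare.py automates; stock torchvision models stand in for the project's trained ViT and ResNet-18 checkpoints, so swap in the real models before drawing conclusions:

```python
# Sketch: CKA similarity between two models' layer representations on CIFAR-10.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from torch_cka import CKA

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in architectures; the project loads its own ViT / ResNet-18 checkpoints.
vit = models.vit_b_16(weights=None).to(device).eval()
resnet = models.resnet18(weights=None).to(device).eval()

# CIFAR-10 test split, resized to 224x224 so both stand-in models accept it.
transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
loader = DataLoader(
    datasets.CIFAR10("./data", train=False, download=True, transform=transform),
    batch_size=64, shuffle=False,
)

cka = CKA(vit, resnet, model1_name="ViT", model2_name="ResNet-18", device=device)
cka.compare(loader)          # accumulate CKA between the recorded layers
cka.plot_results(save_path="cka_vit_vs_resnet18_cifar10.png")
```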