Home

Awesome

VisionTransformer (Pytorch)

A complete easy to follow implementation of Google's Vision Transformer proposed in "AN IMAGE IS WORTH 16X16 WORDS". This pytorch implementation has comments for better understanding.

An image is worth 16x16 words: transformers for image recognition at scale: Find the original paper here.

<p align="center"> <img src="./ViT.png" width="600" title="Vision transformer"> </p>