Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling (EMNLP 2024)
[Paper][Model Checkpoints][Data][Training]
Requirements
Model Checkpoints
Data
Training
Pretraining
Instruction-tuning
Training logs
All pretraining and fine-tuning logs can be found on wandb. Note that some of the runs were resumed from a previous checkpoint.