Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
[Model Checkpoints][Data][Training]
Requirements
Model Checkpoints
Data
Training
Pretraining
Instruction-tuning
Training logs
All pretraining and fine-tuning logs can be found on wandb. Note that some of the runs were resumed from a previous checkpoint.