# Omnivorous modeling for visual modalities
This repository contains PyTorch pretrained models and inference examples for the following papers (see the usage sketch after the list):
<details>
<summary>
<a href="omnivore/">Omnivore</a> A single vision model for many different visual modalities, CVPR 2022 [<b>bib</b>]
</summary>
@inproceedings{girdhar2022omnivore,
title={{Omnivore: A Single Model for Many Visual Modalities}},
author={Girdhar, Rohit and Singh, Mannat and Ravi, Nikhila and van der Maaten, Laurens and Joulin, Armand and Misra, Ishan},
booktitle={CVPR},
year={2022}
}
</details>
<details>
<summary>
<a href="omnimae/">OmniMAE</a> Single Model Masked Pretraining on Images and Videos [<b>bib</b>]
</summary>
@article{girdhar2022omnimae,
title={OmniMAE: Single Model Masked Pretraining on Images and Videos},
author={Girdhar, Rohit and El-Nouby, Alaaeldin and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
journal={arXiv preprint arXiv:2206.08356},
year={2022}
}
</details>
<details>
<summary>
<a href="omnivision/">OmniVision</a> Our training pipeline supporting the multi-modal vision research.[<b>bib</b>]
</summary>
</details>
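
As a rough idea of how the released checkpoints can be used, here is a minimal sketch, assuming the models are exposed through torch.hub under a name like `omnivore_swinB` and that the forward pass takes an `input_type` argument selecting the modality head; the exact entry points and model names are documented in the per-project folders linked above.

```python
# Hypothetical usage sketch (not verbatim from this repository): load an Omnivore
# checkpoint via torch.hub and classify a single RGB image. The hub model name
# "omnivore_swinB" and the input_type argument are assumptions; consult the
# omnivore/ and omnimae/ READMEs for the exact inference API.
import torch
from torchvision import transforms
from PIL import Image

model = torch.hub.load("facebookresearch/omnivore:main", model="omnivore_swinB")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("example.jpg").convert("RGB"))  # placeholder image path
# Omnivore treats an image as a single-frame video: (batch, channels, time, height, width).
inputs = image.unsqueeze(0).unsqueeze(2)

with torch.no_grad():
    # input_type selects the modality-specific head ("image", "video", or "rgbd").
    logits = model(inputs, input_type="image")
print(logits.topk(5).indices)
```
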
## Contributing
We welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more information.
## License
Omnivore is released under the CC-BY-NC 4.0 license. See LICENSE for additional details. However, the Swin Transformer implementation is additionally licensed under the Apache 2.0 license (see NOTICE for further details).