
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie

We conduct a series of experiments exploring various vision capabilities of multimodal large language models (MLLMs) within the geographic and geospatial domains, focusing in particular on the frontier model GPT-4V, and benchmark its performance against open-source counterparts. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity. The analysis uncovers not only where such models excel, including instances where they outperform humans, but also where they falter, providing a balanced view of their capabilities in the geographic domain.

Repo Overview

Experiments Taxonomy

Key takeaways

Data

Data for the majority of the experiments can be found in the Data directory.

Prompts

Coming soon!

Citation

If you find our work useful in your own research, please consider citing our paper:

@article{roberts2023charting,
  title={{Charting New Territories: Exploring the geographic and geospatial capabilities of multimodal LLMs}},
  author={Roberts, Jonathan and L{\"u}ddecke, Timo and Sheikh, Rehan and Han, Kai and Albanie, Samuel},
  journal={arXiv preprint arXiv:2311.14656},
  year={2023}
}

Questions

If you have any questions about our work, please open an issue in this repository.