
Statewide Visual Geolocalization in the Wild (ECCV 2024)

Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen

Links: Paper | Poster | Examples


Overview

Installation
Dataset
Training
Evaluation (includes pretrained weights)
Examples: A photo album that contains more than 2000 randomly chosen street-view images and the corresponding predictions of our model.

Installation

Install JAX with GPU support: https://jax.readthedocs.io/en/latest/installation.html

Clone this repository:

```
git clone https://github.com/fferflo/statewide-visual-geolocalization
cd statewide-visual-geolocalization
```

Install the remaining dependencies:
```
pip install -r requirements.txt
```

Dataset

We train and evaluate our method on street-view images from the Mapillary platform and on aerial imagery from Massachusetts, Washington DC and North Carolina (US) as well as Berlin-Brandenburg, NRW and Saxony (Germany).

Please follow these instructions to download the data.

Training

Fill in the dataset paths indicated by TODO in the configuration file config/main.yaml. The entries should look something like this:

```
train:
  list:
    - tiles-path: .../data/opennrw
      path: .../data/mapillary-opennrw

test:
  path: .../data/mapillary-boston100km2
  tiles:
    - path: .../data/massgis/utm19
  geojson: .../data/boston100km2.geojson
```
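Before starting a long run, it can help to check that no TODO placeholder is left in the configuration. The following is a minimal stdlib sketch; `find_unfilled_todos` and `check_config_file` are hypothetical helpers that are not part of this repository, and they only scan the raw text of the file rather than parsing the YAML:

```python
from pathlib import Path


def find_unfilled_todos(config_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still contain a TODO marker.

    Hypothetical helper: scans the raw YAML text, it does not parse it.
    """
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if "TODO" in line:
            hits.append((number, line.strip()))
    return hits


def check_config_file(path: str) -> None:
    """Raise a ValueError listing every line of `path` that still contains TODO."""
    hits = find_unfilled_todos(Path(path).read_text())
    if hits:
        details = "\n".join(f"  line {n}: {text}" for n, text in hits)
        raise ValueError(f"Unfilled TODO entries in {path}:\n{details}")
```

Calling `check_config_file("config/main.yaml")` before `train.py` fails fast with the offending line numbers instead of failing later when a dataset path cannot be opened.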

Run the training script:
```
python3 train.py --output .../train --config config/main.yaml
```
The results will be stored in .../train-YYYY-MM-DDTHH-mm-ss. By default, training uses all available GPUs. A training run on 2x H100 takes about 2.5 days.
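Since the evaluation scripts take a run directory via --train, a small helper that picks the newest run can save some copy-pasting. This is a sketch, assuming the runs sit next to each other as `train-YYYY-MM-DDTHH-mm-ss` (i.e. `--output .../train`) and that the suffix follows exactly the `%Y-%m-%dT%H-%M-%S` pattern; `latest_run` and `run_timestamp` are hypothetical names:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed suffix format of run directories, e.g. train-2024-07-01T12-30-00
RUN_TIME_FORMAT = "%Y-%m-%dT%H-%M-%S"


def run_timestamp(directory: Path, prefix: str = "train-") -> Optional[datetime]:
    """Parse the timestamp from a run directory name; return None if it does not match."""
    name = directory.name
    if not name.startswith(prefix):
        return None
    try:
        return datetime.strptime(name[len(prefix):], RUN_TIME_FORMAT)
    except ValueError:
        return None


def latest_run(parent: str, prefix: str = "train-") -> Path:
    """Return the newest '<prefix>YYYY-MM-DDTHH-mm-ss' directory inside `parent`."""
    runs = []
    for path in Path(parent).iterdir():
        stamp = run_timestamp(path, prefix) if path.is_dir() else None
        if stamp is not None:
            runs.append((stamp, path))
    if not runs:
        raise FileNotFoundError(f"No '{prefix}*' run directories found in {parent}")
    # Compare by timestamp only, so ties never fall back to comparing paths
    return max(runs, key=lambda run: run[0])[1]
```

Directories whose suffix does not parse as a timestamp are ignored rather than raising, so unrelated folders in the same parent directory do not get in the way.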

Evaluation

Create a reference database for a search region by running the following script:
```
python create_reference_database.py --train .../train-YYYY-MM-DDTHH-mm-ss --output .../refdb-massgis --tiles .../data/massgis/utm19 .../data/massgis/utm18
```
This will create a division of the region into cells, predict embeddings for all cells, create a FAISS index for efficient retrieval and store everything in the output directory. This might take several days depending on your hardware setup and search region size.
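The first step, dividing the region into cells, can be pictured as a regular grid over the region's bounding box. The numpy sketch below is only an illustration under assumed parameters: the cell size, the stride and the helper name `grid_cell_centers` are hypothetical, and this is not the code that produces `cellregion.npz`, whose division may differ:

```python
import numpy as np


def grid_cell_centers(xmin, ymin, xmax, ymax, cell_size, stride):
    """Centers of square cells on a regular grid covering a metric bounding box.

    Hypothetical helper: coordinates are assumed to be in meters (e.g. UTM), each cell
    of side `cell_size` lies fully inside the box, and neighbouring centers are `stride`
    meters apart (cells overlap when stride < cell_size).
    """
    half = cell_size / 2
    # Small epsilon so that a center lying exactly on the upper bound is kept
    xs = np.arange(xmin + half, xmax - half + 1e-9, stride)
    ys = np.arange(ymin + half, ymax - half + 1e-9, stride)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=-1)  # shape (num_cells, 2)
```

For a 100m x 100m box with 50m cells and a 25m stride this yields a 3x3 grid of overlapping cells; one embedding is then predicted per cell. The number of cells grows with the area divided by the squared stride, which is why building a statewide database can take days.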

By default, the search region is defined to cover all tiles that are specified in --tiles. The argument accepts multiple tile datasets, such as the overlapping UTM18 and UTM19 regions of Massachusetts. Optionally, a geojson file can be passed to the script via --geojson to define a custom search region as a subset of the region covered by the tiles.
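Restricting the search region to a GeoJSON polygon boils down to a point-in-polygon test per cell. Below is a stdlib sketch of that idea using even-odd ray casting (holes are handled because a point inside a hole lies in an even number of rings). It assumes the cell centers and the polygon coordinates are already in the same coordinate system; GeoJSON files usually store longitude/latitude, so real data would need to be reprojected first. All function names are hypothetical:

```python
import json


def point_in_ring(x, y, ring):
    """Even-odd ray casting test against one closed ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][:2]
        x2, y2 = ring[(i + 1) % n][:2]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, rings):
    """Polygon with optional holes: inside if the point lies in an odd number of rings."""
    inside = False
    for ring in rings:
        if point_in_ring(x, y, ring):
            inside = not inside
    return inside


def polygons_from_geojson(data):
    """Collect the rings of all Polygon/MultiPolygon geometries in a GeoJSON object."""
    kind = data.get("type")
    if kind == "FeatureCollection":
        return [p for feature in data["features"] for p in polygons_from_geojson(feature)]
    if kind == "Feature":
        return polygons_from_geojson(data["geometry"] or {})
    if kind == "Polygon":
        return [data["coordinates"]]
    if kind == "MultiPolygon":
        return list(data["coordinates"])
    return []


def filter_cells(centers, geojson_text):
    """Keep only the (x, y) cell centers that fall inside any polygon of the GeoJSON."""
    polygons = polygons_from_geojson(json.loads(geojson_text))
    return [(x, y) for x, y in centers if any(point_in_polygon(x, y, rings) for rings in polygons)]
```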

Pretrained weights can be used by cloning the repository from Hugging Face
```
git clone https://huggingface.co/fferflo/statewide-geoloc-nomassgis
```
and passing its path to the --train argument. These are not the original weights used in the paper; they were retrained with this repository, and the results are slightly better than those reported in the paper (see below).

The output folder will contain the files:
```
aerial_features.bin         # Embeddings for all cells
cellregion.npz              # Division of the region into cells
faiss.index                 # FAISS index that can be loaded via faiss.read_index("faiss.index")
config.yaml                 # Configuration parameters of the search region, model, etc
model_weights.safetensors   # Model weights used to create the embeddings
```

Localize query images against the reference database by running the following script:

```
python localize.py --query .../data/mapillary-boston100km2 --reference .../refdb-massgis --stride 1
```

This will predict embeddings for all street-view photos in the given dataset, and localize them against the reference database. The --stride parameter can be used to localize only a subset of the images (e.g. every 10th image with --stride 10).
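The effect of --stride can be pictured as plain slicing over the ordered list of query images. This is a sketch of the assumed behaviour (in particular, that selection starts with the first image), not the code of localize.py, and `select_queries` is a hypothetical name:

```python
def select_queries(image_ids, stride=1):
    """Keep every `stride`-th query image, starting with the first one."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return image_ids[::stride]
```

With n images, this keeps ceil(n / stride) of them, so --stride 100 evaluates on roughly 1% of the query set.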

The script prints the Recall@k<r of the localization for different radii r and numbers k of top-ranked cells. For example, the pretrained weights from above yield the following results:

```
> python localize.py --query .../data/mapillary-massgis --reference .../refdb-massgis --stride 100

... takes some time ...

Recall@1<0m: 0.2880
Recall@5<0m: 0.5007
Recall@10<0m: 0.5516
Recall@50<0m: 0.6432
Recall@100<0m: 0.6771

Recall@1<25m: 0.4683
Recall@5<25m: 0.6495
Recall@10<25m: 0.6914
Recall@50<25m: 0.7659
Recall@100<25m: 0.7915

Recall@1<50m: 0.6105
Recall@5<50m: 0.7229
Recall@10<50m: 0.7578
Recall@50<50m: 0.8216
Recall@100<50m: 0.8430

Recall@1<100m: 0.6297
Recall@5<100m: 0.7385
Recall@10<100m: 0.7732
Recall@50<100m: 0.8369
Recall@100<100m: 0.8583
```
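To recompute numbers of this kind from your own predictions, here is a sketch of one plausible definition of Recall@k<r: a query counts as a hit if any of its k top-ranked cells lies within r meters of the ground-truth position, with the distance to a cell being zero when the position falls inside it. The square cell geometry, the inclusive comparison and this reading of "<0m" as "inside the predicted cell" are assumptions, not taken from localize.py; the function names are hypothetical:

```python
import math


def distance_to_square(px, py, cx, cy, size):
    """Euclidean distance from point (px, py) to an axis-aligned square cell.

    The cell is centered at (cx, cy) with side length `size`; the distance is 0 inside it.
    """
    half = size / 2
    dx = max(abs(px - cx) - half, 0.0)
    dy = max(abs(py - cy) - half, 0.0)
    return math.hypot(dx, dy)


def recall_at_k(ranked_centers, ground_truth, k, radius, cell_size):
    """Fraction of queries with at least one of the top-k cells within `radius` meters.

    ranked_centers: per query, a list of (x, y) cell centers sorted best-first.
    ground_truth: per query, the true (x, y) position in the same metric coordinates.
    """
    hits = 0
    for centers, (px, py) in zip(ranked_centers, ground_truth):
        if any(
            distance_to_square(px, py, cx, cy, cell_size) <= radius
            for cx, cy in centers[:k]
        ):
            hits += 1
    return hits / len(ground_truth)
```

Under this definition the recall can only grow with k and with r, which is consistent with the pattern in the numbers above.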

Citation

Please consider citing our work if you use the code or data, or build upon the ideas presented in the paper:

```
@inproceedings{fervers2024statewide,
  title     = {Statewide Visual Geolocalization in the Wild},
  author    = {Florian Fervers and Sebastian Bullinger and Christoph Bodensteiner and Michael Arens and Rainer Stiefelhagen},
  booktitle = {ECCV},
  year      = {2024}
}
```

Issues

Feel free to open an issue in this GitHub repository if you have any problems with the code or data.