Overview

Geobacter generates location embeddings on demand. It is an implementation of the Loc2Vec approach from Sentiance's blog post.

A ResNet is trained to embed map-tile renderings of geolocations using the triplet loss. Samples are generated following Tobler's first law of geography:

"Everything is related to everything else, but near things are more related than distant things"

<img src="assets/readme/triplet_loss.svg" width="60%">
Anchor: <img src="assets/readme/anchor.png" width="80%">
Positive: <img src="assets/readme/positive.png" width="80%">
Negative: <img src="assets/readme/negative.png" width="80%">

Setup

Initialise the OpenStreetMap tile volumes and import the map data

docker volume create openstreetmap-data
docker volume create openstreetmap-rendered-tiles

docker run \
    -e THREADS=12 \
    -v $PWD/data/osm/luxembourg-latest.osm.pbf:/data.osm.pbf \
    -v openstreetmap-data:/var/lib/postgresql/12/main \
    overv/openstreetmap-tile-server \
    import

Add the geobacter package to the PYTHONPATH

export PYTHONPATH=$PYTHONPATH:$PWD/geobacter

Create a Python environment (for training)

pipenv install --dev
pipenv shell

Create a Python environment (for inference)

pipenv install
pipenv shell

Start the OpenStreetMap tile server

docker-compose up

Train

Initialise the training and testing samples (this also caches the rendered tiles)

python bin/generate_samples.py --sample-count 100000 --buffer 100 --distance 500 --seed 1 --path data/extents/train_100000.json
python bin/generate_samples.py --sample-count 10000 --buffer 100 --distance 500 --seed 2 --path data/extents/test_10000.json
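
The --distance flag encodes the principle quoted above: the positive sample is drawn close to the anchor, while the negative is drawn from elsewhere in the extent. A hypothetical sketch of that sampling rule follows; the actual logic in bin/generate_samples.py, including the --buffer handling, may differ.

import math
import random

METRES_PER_DEGREE = 111_320  # rough metres per degree of latitude

def offset(lat, lon, max_metres):
    # Move a point a random distance (up to max_metres) in a random direction.
    bearing = random.uniform(0.0, 2.0 * math.pi)
    metres = random.uniform(0.0, max_metres)
    dlat = metres * math.cos(bearing) / METRES_PER_DEGREE
    dlon = metres * math.sin(bearing) / (METRES_PER_DEGREE * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

def make_triplet(extent, distance=500.0):
    # extent: (min_lat, min_lon, max_lat, max_lon) of the imported OSM region.
    anchor = (random.uniform(extent[0], extent[2]), random.uniform(extent[1], extent[3]))
    positive = offset(*anchor, max_metres=distance)       # near thing
    negative = (random.uniform(extent[0], extent[2]),      # distant thing
                random.uniform(extent[1], extent[3]))
    return anchor, positive, negative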

Train the network

python -m geobacter.train

Run

(optional) Check that the OpenStreetMap tile server is up

curl localhost:8080/tile/16/33879/22296.png --output test.png

Start the Python service

export GEOBACTER_TOKEN=<token>
gunicorn -b 0.0.0.0:8000 --workers 4 --timeout 10 geobacter.inference.api:app

(optional) Get the embedding for Notre-Dame

curl "localhost:8000/embeddings?lat=49.609598&lon=6.131606&token=<token>" | jq
{
  "embeddings": [
    0.12629294395446777,
    0.5683436393737793,
    0.9822958111763,
    0.38620898127555847,
    -1.2079272270202637,
    0.16978177428245544,
    -0.3008042275905609,
    0.06522990763187408,
    0.5405853390693665,
    -0.8018991947174072,
    0.42124632000923157,
    0.6691603064537048,
    -0.40959250926971436,
    -0.18567749857902527,
    -0.017753595486283302,
    0.3173545002937317
  ],
  "checkpoint": "checkpoints/ResNetTriplet-OsmTileDataset-e393fd34-aa3c-4743-b270-e7f0d895b0a8_embedding_41450.pth",
  "lon": 6.131606,
  "lat": 49.609598,
  "image_url": "image?lon=6.131606,lat=49.609598,token=<token>"
}
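
The same request can be made from Python, using the parameters shown in the curl example above (the requests library is assumed to be available in the environment):

import os
import requests

# Query the running service for a single embedding.
response = requests.get(
    "http://localhost:8000/embeddings",
    params={
        "lat": 49.609598,
        "lon": 6.131606,
        "token": os.environ["GEOBACTER_TOKEN"],
    },
    timeout=10,
)
response.raise_for_status()
embedding = response.json()["embeddings"]  # 16-dimensional location embedding
print(embedding)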

Results

Semantically similar locations are embedded close together

<img src="assets/readme/embeddings.png" width="80%">

The embedding space can be interpolated

<img src="assets/readme/3-interp-4.png" width="80%"> <img src="assets/readme/9-interp-10.png" width="80%">

Similar locations can be retrieved by querying for nearest neighbours in the embedding space

<img src="assets/readme/19133-5.png" width="80%" > <img src="assets/readme/16798-5.png" width="80%">

Examples

Use the API to characterise a pre-created route.

examples/api.py
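
A rough sketch of what this amounts to: fetch an embedding for each point along the route and stack them into a matrix. The actual script may differ.

import requests

def embed_route(points, token, url="http://localhost:8000/embeddings"):
    # points: iterable of (lat, lon) pairs describing the route.
    route_embeddings = []
    for lat, lon in points:
        response = requests.get(url, params={"lat": lat, "lon": lon, "token": token}, timeout=10)
        response.raise_for_status()
        route_embeddings.append(response.json()["embeddings"])
    return route_embeddings  # one 16-dimensional embedding per route point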

Use a checkpoint to characterise a large number of samples.

examples/checkpoint.py
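
A sketch of the checkpoint route: load the saved weights and embed a batch of rendered tiles directly, without the HTTP service. The network below is the illustrative ResNet-18 head from the Overview sketch; the checkpoint format and the real model class in the geobacter package may differ.

import torch
import torch.nn as nn
import torchvision.models as models

# Illustrative network; swap in the real geobacter embedding model.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 16)

# Assumes the checkpoint stores a plain state dict compatible with the model above.
state = torch.load("checkpoints/<checkpoint>.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    tiles = torch.rand(32, 3, 256, 256)  # stand-in for a batch of rendered tiles
    embeddings = model(tiles)            # (32, 16) matrix of location embeddings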