Home

Awesome

Whereabouts

A light-weight, fast geocoder for Python using DuckDB. Try it out online at Huggingface

Description

Whereabouts is an open-source geocoding library for Python, allowing you to geocode and standardize address data all within your own environment:

Features:

Requirements

Installation: via PIP

whereabouts can be installed either from this repo using pip / uv / conda

pip install whereabouts

Download a geocoder database or create your own

You will need a geocoding database to match addresses against. You can either download a pre-built database or create your own using a dataset of high quality reference addresses for a given country, state or other geographic region.

Option 1: Download a geocoder database

Pre-built geocoding database are available from Huggingface. The list of available databases can be found here

As an example, to install the small size geocoder database for California:

python -m whereabouts download us_ca_sm

or for the small size geocoder database for all of Australia:

python -m whereabouts download au_all_sm

Option 2: Create a geocoder database

Rather than using a pre-built database, you can create your own geocoder database if you have your own address file. This file should be a single csv or parquet file with the following columns:

Column nameDescriptionData type
ADDRESS_DETAIL_PIDUnique identifier for addressint
ADDRESS_LABELThe full addressstr
ADDRESS_SITE_NAMEName of the site. This is usually nullstr
LOCALITY_NAMEName of the suburb or localitystr
POSTCODEPostcode of addressint
STATEStatestr
LATITUDELatitude of geocoded addressfloat
LONGITUDELongitude of geocoded addressfloat

These fields should be specified in a setup.yml file. Once the setup.yml is created and a reference dataset is available, the geocoding database can be created:

python -m whereabouts setup_geocoder setup.yml

Geocoding examples

Geocode a list of addresses

from whereabouts.Matcher import Matcher

matcher = Matcher(db_name='au_all_sm')
matcher.geocode(addresslist, how='standard')

For more accurate geocoding you can use trigram phrases rather than token phrases. Note you will need one of the large databases to use trigram geocoding.

matcher.geocode(addresslist, how='trigram')

How it works

The algorithm employs simple record linkage techniques, making it suitable for implementation in around 10 lines of SQL. It is based on the following papers

Documentation

Work in progress: https://whereabouts.readthedocs.io/en/latest/

License Disclaimer for Third-Party Data

Note that while the code from this package is licensed under the MIT license, the pre-built databases use data from data providers that may have restrictions for particular use cases:

Users of this software must comply with the terms and conditions of the respective data licenses, which may impose additional restrictions or requirements. By using this software, you agree to comply with the relevant licenses for any third-party data.

To do: