##About

An OpenRefine reconciliation service for GeoNames.

Tested and working on Python 2.7.10 and 3.4.3.

The service queries the GeoNames API and provides normalized scores across queries for reconciling in Refine.
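To give a feel for what a normalized score means here, below is a minimal sketch that rates each candidate name against the query on a 0-100 scale and sorts the candidates best-first. This is an illustration only, not the scoring this service actually uses: `score_match` and `rank_candidates` are hypothetical names, and the standard library's `difflib` is a stand-in for whatever string-matching approach the real code takes.

```python
from difflib import SequenceMatcher


def score_match(query, candidate):
    """Return a 0-100 similarity score between two place names.

    Hypothetical helper: case and surrounding whitespace are ignored
    so that 'knoxville ' and 'Knoxville' count as identical.
    """
    a = query.strip().lower()
    b = candidate.strip().lower()
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def rank_candidates(query, names):
    """Score every candidate name and return them sorted best-first."""
    scored = [{"name": n, "score": score_match(query, n)} for n in names]
    return sorted(scored, key=lambda c: c["score"], reverse=True)


if __name__ == "__main__":
    for row in rank_candidates("Knoxville", ["Nashville", "Knoxville"]):
        print(row["name"], row["score"])
```

Because every score lands on the same 0-100 scale, results from different queries can be compared against one cutoff when deciding which matches to accept in Refine.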

I'm just a small-town metadataist in a big code world, so please don't assume I did something 'the hard way' because I had a theory or opinion or whatnot. I probably just don't know that an easier way exists. So please share your corrections and thoughts (but please don't be a jerk about it either).

If you'd like to hear my thoughts about why I do this instead of creating a column by pulling in URLs, what I do with this data once I export it to metadata records, or whether we should even have to keep coordinates in bibliographic metadata records, see some thoughts here: http://christinaharlow.com/thoughts-on-geospatial-metadata and http://christinaharlow.com/walkthrough-of-geonames-recon-service

##Provenance

Michael Stephens wrote a demo reconciliation service and Ted Lawless wrote a FAST reconciliation service. This code basically repeats their work, but for a different API.

Please give any thanks for this work to Ted Lawless, and any complaints to Christina. Also give thanks to Trevor Muñoz for some cleanups to make this code easier to work with.

##Special Notes

This came out of frustration that the Library of Congress authorities are:

So this service takes Library of Congress authorities headings (or headings formulated to mimic the LoC authorities structure), expands U.S. abbreviations, then reconciles against GeoNames. The returned GeoNames 'name' gives both the GeoNames name for the location and its coordinates. There are, no doubt, better ways to handle getting both in an OpenRefine reconciliation service, but this was a quick hack to get both while I continue to explore how OpenRefine reconciliation services are structured.
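The two steps described above can be sketched roughly as follows. This is a sketch under assumptions, not the service's actual code: the function names, the tiny abbreviation table, and the ` | ` separator used to pack the name and coordinates into one string are all made up for illustration.

```python
import re

# Illustrative subset only; a real table would need many more entries,
# added case by case.
US_ABBREVIATIONS = {
    "Calif.": "California",
    "Ky.": "Kentucky",
    "N.Y.": "New York",
    "Tenn.": "Tennessee",
}


def expand_heading(heading):
    """Turn an LoC-style heading such as 'Knoxville (Tenn.)' into
    'Knoxville, Tennessee'. Headings without a parenthetical
    qualifier are returned unchanged (apart from trimming)."""
    match = re.match(r"^(.*?)\s*\((.+)\)\s*$", heading)
    if not match:
        return heading.strip()
    place = match.group(1).strip()
    qualifier = match.group(2).strip()
    # Unknown qualifiers (e.g. 'France') pass through as-is.
    qualifier = US_ABBREVIATIONS.get(qualifier, qualifier)
    return "{0}, {1}".format(place, qualifier) if place else qualifier


def label_with_coordinates(name, lat, lng):
    """Pack name and coordinates into a single 'name' string.
    The ' | ' separator is an assumption made for this sketch."""
    return "{0} | {1}, {2}".format(name, lat, lng)


def split_label(label):
    """Undo label_with_coordinates after exporting from OpenRefine."""
    name, coords = label.rsplit(" | ", 1)
    lat, lng = coords.split(", ")
    return name, float(lat), float(lng)


if __name__ == "__main__":
    print(expand_heading("Knoxville (Tenn.)"))
    print(label_with_coordinates("Knoxville", 35.96, -83.92))
```

Packing both values into one string keeps the reconciliation response simple, at the cost of needing a split step like `split_label` once the data leaves OpenRefine.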

##Instructions

Before getting started, you'll need Python on your computer (this was built with Python 2.7.8, updated to work with Python 3.4, and most recently tested and working with Python 2.7.10 and 3.4.3), and you'll need to be comfortable using OpenRefine/Google Refine.

This reconciliation service also requires a GeoNames API username. You can find and use the one in the original code for testing, but you'll quickly run up against the request limits on that account, so it is strongly recommended that you get your own (free, quick & easy to obtain) GeoNames account.

To do so, go to http://www.geonames.org/login and register. After your account is activated, go to http://www.geonames.org/manageaccount and enable it for free web services.
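Once the account is enabled, the username travels with every request as a query parameter. Here is a minimal sketch of a call to the GeoNames `searchJSON` endpoint. The helper names `build_search_url` and `search` are hypothetical, `my_username` is a placeholder for your own account, and the code in this service may build its requests differently.

```python
import json

try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2
    from urllib import urlencode
    from urllib2 import urlopen

GEONAMES_SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_search_url(query, username, max_rows=3):
    """Build a GeoNames search URL; urlencode escapes the query text."""
    params = [("q", query), ("maxRows", max_rows), ("username", username)]
    return GEONAMES_SEARCH_URL + "?" + urlencode(params)


def search(query, username, max_rows=3):
    """Fetch matches for a place name.

    Needs network access and an account enabled for free web services.
    Returns the list under the 'geonames' key, or [] if it is absent.
    """
    response = urlopen(build_search_url(query, username, max_rows))
    try:
        payload = json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
    return payload.get("geonames", [])


if __name__ == "__main__":
    # Prints the URL only; call search() yourself to hit the API.
    print(build_search_url("Knoxville, Tennessee", "my_username"))
```

Each item in the returned list is a dictionary, and fields such as `name`, `lat`, and `lng` are what you would use to build the reconciliation candidates.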

Although it appears that you have retrieved your reconciled data into your OpenRefine project, OpenRefine is actually still storing the original data. You need to explicitly save the reconciled data to make sure it exists when you export your data. Annoying as a mosquito in your bedroom, I know, but please learn from my own mistakes, sweat and confusion.

I'll maybe make a screencast of this work later if I get time or there is interest.

Holla if you have questions - email is charlow2(at)utk(dot)edu and Twitter handle is @cm_harlow

##Plans for Improvement

Next, I'm hoping to build in a way to search within reconciliation cells.

I'd like to expand the extremely rudimentary but gets-the-job-done LoC geographic names abbreviations parser/expander text to handle other LoC Authorities abbreviation oddities. I'm afraid that, since even the state abbreviations vary in their construction, these will need to be added on a case-by-case basis.

I'd also like to build in a way to use other columns as additional search properties.

Finally, finding a better way to handle the API username updates, as well as parsing name plus coordinates (instead of the hack I've put into this for the time being), would be great.
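One possible direction for the username handling, offered only as a sketch and not something this code does today: read the username from an environment variable so that nobody has to edit the source to change it. The variable name `GEONAMES_USERNAME` and the function name are assumptions made for this example.

```python
import os


def get_geonames_username(default=None, environ=os.environ):
    """Return the GeoNames username from the environment.

    GEONAMES_USERNAME is a hypothetical variable name. `default` is
    used when the variable is unset; `environ` can be swapped for a
    plain dict in tests.
    """
    username = environ.get("GEONAMES_USERNAME", default)
    if not username:
        raise ValueError(
            "No GeoNames username found; set GEONAMES_USERNAME "
            "or pass a default."
        )
    return username
```

With something like this in place, you would set the variable once in your shell before starting the service, and swapping accounts would not touch the code.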