Welcome to LinkedGeoData: Providing OpenStreetMap data as RDF

LinkedGeoData (LGD) is an effort to add a spatial dimension to the Web of Data / Semantic Web. LinkedGeoData uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles. It interlinks this data with other knowledge bases in the Linking Open Data initiative.

The project website can be found here. If you are running Ubuntu, this repository contains everything you need to transform OpenStreetMap data to RDF yourself. For other systems, please consider contributing adaptations of the existing scripts.

Contributions Welcome

The Docker-based architecture is aimed at making it easy to contribute new or alternative components that can sit side by side with the core of the system, which is a virtual knowledge graph view over an OSM database. Please open issues for discussion.

Examples include but are not limited to:

Dockerfiles for services such as Linked Data or SPARQL interfaces should be designed to allow configuration of the target SPARQL endpoint(s), ideally via the Docker environment.

How It Works

The architecture is shown in the image below. The Docker setup is located in the linkedgeodata-docker folder.

LGD Dockerized Architecture Overview

Debian package now available!

Technically, LinkedGeoData is a set of SQL files, database-to-RDF (RDB2RDF) mappings, and bash scripts. The actual RDF conversion is carried out by the SPARQL-to-SQL rewriter Sparqlify. You can view the Sparqlify Mappings for LinkedGeoData here. Therefore, if you want to install the LinkedGeoData Debian package, you also need the Sparqlify package.

To get the latest version of the LinkedGeoData package, perform the following steps to set up the package source:

Register the repo

echo 'deb http://cstadler.aksw.org/repos/apt precise main contrib non-free' | sudo tee /etc/apt/sources.list.d/cstadler.aksw.org.list

Import the public key

wget -qO - http://cstadler.aksw.org/repos/apt/conf/packages.precise.gpg.key  | sudo apt-key add -

Now you can install LinkedGeoData using

sudo apt-get update
sudo apt-get install linkedgeodata

You can also download and install the packages manually; however, installing their dependencies requires more work:

After installing these packages, the following essential commands will be available:

Read the section on data conversion for their documentation.

Alternative setup

In /bin you will find the following setup helper scripts, which are aimed at easing the LinkedGeoData setup directly from source, without a Debian package:

The following scripts are just helpers to build and/or install the Sparqlify Debian package; they are mainly intended for development.

Do-it-yourself data conversion

This section describes how to create and query a LinkedGeoData database. After you have installed the LinkedGeoData scripts, you need to obtain the OpenStreetMap dataset you want to load. Note: Make sure to read the section on database tuning when dealing with larger datasets!

As for obtaining datasets, a very good source for OSM datasets in bite-size chunks is Geofabrik. For full dumps, refer to the planet downloads.

In /bin you will find several scripts. They are designed to work both from a cloned LinkedGeoData Git repository and when wrapped up as a Debian package. All of them are configured via lgd.conf.dist. You can override the default settings without changing this file by creating an lgd.conf file. If you installed the Debian package, the file /etc/sparqlify/sparqlify.conf is used instead of lgd.conf.dist. If you are using the following scripts from the Git repo, invoke them with ./scriptname.sh (i.e. don't forget the ./ and .sh).
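A minimal sketch of the override behaviour described above when running from the Git repo. It assumes that both configuration files contain plain shell variable assignments (as the profile example below does) and that lgd.conf is simply sourced after lgd.conf.dist, so later assignments win. The helper name load_lgd_conf is hypothetical and not part of the LinkedGeoData scripts; the Debian package case (/etc/sparqlify/sparqlify.conf) is not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper: load defaults from lgd.conf.dist, then apply local
# overrides from lgd.conf if that file exists.
# Assumes both files contain plain shell assignments such as dbName="lgd".
load_lgd_conf() {
    local dir="$1"
    # Defaults are always read first
    source "$dir/lgd.conf.dist"
    # Later assignments win, so lgd.conf overrides only what it sets
    if [ -f "$dir/lgd.conf" ]; then
        source "$dir/lgd.conf"
    fi
}

# Usage example with a throwaway directory
confdir=$(mktemp -d)
printf '%s\n' 'dbHost="localhost"' 'dbName="lgd"' > "$confdir/lgd.conf.dist"
printf '%s\n' 'dbName="lgd_monaco"' > "$confdir/lgd.conf"
load_lgd_conf "$confdir"
echo "$dbHost $dbName"   # prints: localhost lgd_monaco
rm -r "$confdir"
```

Keeping lgd.conf.dist untouched means updates to the defaults can be pulled from the repository without merge conflicts in local settings.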

Example:

wget http://download.geofabrik.de/europe/monaco-latest.osm.pbf
lgd-createdb -h localhost -d lgd -U postgres -W mypwd -f monaco-latest.osm.pbf

We chose Monaco for the example simply because it is a small file (< 10 MB).

Here is an example of a profile file, which is assumed to be located at /etc/sparqlify/profiles.d/lgd-example.conf. This file will be deployed when installing the linkedgeodata debian package.

    dbHost="localhost"
    dbName="lgd"
    dbUser="postgres"
    dbPass="postgres"
    mappingFile="/usr/share/linkedgeodata/sml/LinkedGeoData-Triplify-IndividualViews.sml /usr/share/linkedgeodata/sml/LinkedGeoData-Nominatim.sml"
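Since the profile consists of shell-style assignments, its connection settings correspond to the -h, -d, -U and -W options used in the sparqlify-tool examples further below. The sketch assumes the profile can be sourced by bash; profile_to_args is a hypothetical helper, not something shipped with the package, and it ignores mappingFile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a profile file and print the equivalent
# connection options (-h, -d, -U, -W), one per line.
# Assumes the profile is plain shell assignments like the example above.
profile_to_args() {
    # Subshell, so the profile's variables don't leak into the caller
    (
        source "$1"
        printf '%s\n' -h "$dbHost" -d "$dbName" -U "$dbUser" -W "$dbPass"
    )
}

# Usage example with a temporary copy of the example profile
profile=$(mktemp)
printf '%s\n' 'dbHost="localhost"' 'dbName="lgd"' 'dbUser="postgres"' 'dbPass="postgres"' > "$profile"
mapfile -t args < <(profile_to_args "$profile")
echo "${args[*]}"   # prints: -h localhost -d lgd -U postgres -W postgres
# The array could then be passed on, e.g.: sparqlify-tool "${args[@]}" -Q '...'
rm "$profile"
```

Reading the options into an array with mapfile keeps values containing spaces intact when they are passed on as "${args[@]}".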

A named query is just a SPARQL query that is referenced by a name. The mapping of names to SPARQL queries is configured via /etc/sparqlify/sparqlify.conf.

Currently, the following named queries exist:

Examples:

    sparqlify-tool -P lgd-example ontology
    sparqlify-tool -P lgd-example dump
    sparqlify-tool -h localhost -d lgd -U postgres -W mypwd -Q 'Construct { ?s ?p ?o } { ?s a <http://linkedgeodata.org/ontology/Pub> . ?s ?p ?o }'
    sparqlify-tool -P lgd-example -Q 'Select * { ?s ?p ?o . Filter(?s = <http://linkedgeodata.org/triplify/node2028098486>) }'

Again, note that Sparqlify is still in development and its supported features are somewhat limited right now; still, basic graph patterns and equality constraints should work fine.
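The node URI in the last example follows the pattern http://linkedgeodata.org/triplify/node followed by the OSM node id. A hypothetical helper that builds the same Select query for an arbitrary node id could look like this; it only assembles the query string (a basic graph pattern plus an equality filter, which the note above says should work) and assumes the id is numeric.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a SPARQL query that selects all triples of
# one OSM node, using the URI pattern from the example above.
node_query() {
    local id="$1"
    # Reject anything that isn't a plain number to avoid malformed URIs
    if [[ ! "$id" =~ ^[0-9]+$ ]]; then
        echo "node_query: invalid node id: $id" >&2
        return 1
    fi
    printf 'Select * { ?s ?p ?o . Filter(?s = <http://linkedgeodata.org/triplify/node%s>) }\n' "$id"
}

# The query could then be passed via -Q, e.g.:
#   sparqlify-tool -P lgd-example -Q "$(node_query 2028098486)"
node_query 2028098486
```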

Additional tools

lgd-osm-replicate-sequences -u "http://planet.openstreetmap.org/replication/hour/" -t "2017-05-28T15:00:00Z"

# The above command from the debian package is a wrapper for:

java -cp linkedgeodata-debian/target/linkedgeodata-debian-*-jar-with-dependencies.jar \
    "org.aksw.linkedgeodata.cli.command.osm.CommandOsmReplicateSequences" \
    -u "http://planet.openstreetmap.org/replication/hour/" -t "2017-05-28T15:00:00Z"

The output is (presently a subset of) the appropriate state.txt file, namely the one whose timestamp is strictly less than the one given as the argument:

sequenceNumber=41263
timestamp=2017-05-28T14\:00\:00Z
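The output uses Java properties syntax, which is why the colons in the timestamp are escaped. A minimal sketch for reading both values back in a shell script, assuming only the two keys shown above matter and that \: is the only escape that needs undoing; read_state and the variables state_seq and state_ts are hypothetical names.

```bash
#!/usr/bin/env bash
# Read sequenceNumber and timestamp from a state.txt file into the
# global variables state_seq and state_ts.
# Assumes simple key=value lines; only the "\:" escape is undone.
read_state() {
    local file="$1" key value esc='\:'
    state_seq="" state_ts=""
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    # Quoting "$esc" in the pattern makes it match the literal characters \:
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            sequenceNumber) state_seq=$value ;;
            timestamp)      state_ts=${value//"$esc"/:} ;;
        esac
    done < "$file"
}

# Usage example with the output shown above
state=$(mktemp)
printf '%s\n' 'sequenceNumber=41263' 'timestamp=2017-05-28T14\:00\:00Z' > "$state"
read_state "$state"
echo "$state_seq $state_ts"   # prints: 41263 2017-05-28T14:00:00Z
rm "$state"
```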

Note that the timestamp format is compatible with osmconvert, which can check for the most recent data item in an OSM data file. Hence, these tools can be combined to find the state.txt file from which to proceed with replication.

timestamp=$(osmconvert --out-timestamp "data.osm.pbf")
lgd-osm-replicate-sequences -u "url-to-repo" -t "$timestamp"
# Use the -d option to obtain the (d)uration between the most recently published files
lgd-osm-replicate-sequences -u "http://planet.openstreetmap.org/replication/day/" -d
# This simply yields the output (possibly off by a few seconds):
# 86400
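Once the sequence number is known, the corresponding files can be located on the replication server. The sketch below assumes the standard OSM replication directory layout, in which the sequence number is zero-padded to nine digits and split into three directory levels (41263 becomes 000/041/263, giving 000/041/263.state.txt and 000/041/263.osc.gz). The function sequence_path is hypothetical and not part of the LinkedGeoData tools.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a replication sequence number to the relative
# path used on OSM replication servers (AAA/BBB/CCC, zero-padded to 9 digits).
sequence_path() {
    local padded
    # 10# forces base 10, so inputs with leading zeros are not read as octal
    printf -v padded '%09d' "$((10#$1))"
    printf '%s/%s/%s\n' "${padded:0:3}" "${padded:3:3}" "${padded:6:3}"
}

# Usage example with the sequence number from the output above
base="http://planet.openstreetmap.org/replication/hour/"
path=$(sequence_path 41263)
echo "${base}${path}.state.txt"   # prints: http://planet.openstreetmap.org/replication/hour/000/041/263.state.txt
echo "${base}${path}.osc.gz"
```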

PostgreSQL Database Tuning

It is recommended to tune the database according to these recommendations. In brief: edit /etc/postgresql/9.1/main/postgresql.conf (adjust the version number to your installation) and set the following properties:

    shared_buffers       = 2GB #recommended values between 25% - 40% of available RAM, setting assumes 8GB RAM
    effective_cache_size = 4GB #recommended values between 50% - 75% of available RAM, setting assumes 8GB RAM
    checkpoint_segments  = 256 # removed in PostgreSQL 9.5, which uses max_wal_size instead
    checkpoint_completion_target = 0.9
    autovacuum = off # This can be re-enabled once loading has completed

    work_mem             = 256MB # used for sorting; each sort operation may use this much, so use a significantly lower value if many connections run sorts
    maintenance_work_mem = 256MB
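The shared_buffers and effective_cache_size values above follow from the stated RAM percentages (25% and 50% of 8GB). A small sketch that derives the same two lines for other machines, assuming total RAM is given in whole gigabytes and using the lower end of each recommended range; pg_memory_settings is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive shared_buffers (25% of RAM) and
# effective_cache_size (50% of RAM) from the total RAM in GB.
# Values are printed in MB so that fractional gigabytes are not rounded away.
pg_memory_settings() {
    local ram_mb=$(( $1 * 1024 ))
    echo "shared_buffers = $(( ram_mb * 25 / 100 ))MB"
    echo "effective_cache_size = $(( ram_mb * 50 / 100 ))MB"
}

pg_memory_settings 8
# shared_buffers = 2048MB
# effective_cache_size = 4096MB
```

For 8GB this reproduces the 2GB and 4GB settings shown above; machines dedicated to PostgreSQL may move towards the upper ends of the ranges.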

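Furthermore, allow more shared memory, otherwise PostgreSQL won't start (this applies to PostgreSQL versions before 9.3; later versions allocate most shared memory via mmap). Append the following lines to /etc/sysctl.conf: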

    #Use more shared memory max
    kernel.shmmax=4294967296

    # Note: The amount (specified in bytes) for kernel.shmmax must be greater than the shared_buffers setting above
    #4GB = 4294967296
    #8GB = 8589934592

Make the changes take effect:

    sudo sysctl -p
    sudo service postgresql restart
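The kernel.shmmax byte values listed in the sysctl.conf comments are gigabytes multiplied by 1024^3. The sketch below computes the value for a given size and checks the rule stated there, that shmmax must exceed shared_buffers; both helper names are hypothetical and sizes are assumed to be whole gigabytes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for choosing the kernel.shmmax setting.
gb_to_bytes() {
    # 1 GB = 1024^3 bytes
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Succeeds only if shmmax (in bytes) is greater than shared_buffers (in GB)
shmmax_ok() {
    local shmmax="$1" shared_buffers_gb="$2"
    (( shmmax > $(gb_to_bytes "$shared_buffers_gb") ))
}

echo "kernel.shmmax=$(gb_to_bytes 4)"   # prints: kernel.shmmax=4294967296
shmmax_ok "$(gb_to_bytes 4)" 2 && echo "OK: shmmax exceeds shared_buffers"
```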

License

The contents of this project are licensed under the GPL v3 license.