Home

This repository contains instructions for setting up a SPARQL endpoint to Freebase via Virtuoso as well as some useful materials about Freebase.

Requirements

Setup

Freebase data dump

The latest official data dump of Freebase can be downloaded here. However, some literals in the official dump are not fully compatible with the N-Triples RDF standard: they lack datatype decoration such as ^^<http://www.w3.org/2001/XMLSchema#integer>, which may cause the dump to fail to load into triplestores like Virtuoso. We fixed this issue. Our processed Virtuoso DB file can be downloaded from this OneDrive link, this Dropbox link, or via wget (WARNING: 53G+ of disk space is needed):

wget https://www.dropbox.com/s/q38g0fwx1a3lz8q/virtuoso_db.zip
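Because the download is large, it can be worth checking free disk space first. The following is a minimal sketch, not part of this repository: the helper name `has_enough_space` is hypothetical, and it assumes the "53G+" figure above means roughly 53 GiB for the archive alone (unpacking will need additional space).

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_enough_space(path=".", required_gb=53):
    """Return True if the filesystem holding `path` has at least
    `required_gb` GiB free. The 53 GiB default is an assumption based
    on the size warning in this README."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GIB


if __name__ == "__main__":
    if has_enough_space("."):
        print("Enough free space for the download.")
    else:
        print("Not enough free space in the current directory.")
```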

In case you'd like to load your own RDF into Virtuoso, see here for instructions. If you prefer another triplestore, you can fix the literal format issue of Freebase with the script fix_freebase_literal_format.py to obtain valid N-Triples data.
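To illustrate what "missing type decoration" means, here is a small sketch that appends an xsd:integer datatype to an undecorated literal. It is not the logic of fix_freebase_literal_format.py, which should be used for real data. The function name and the `integer_predicates` parameter are hypothetical, and the sketch assumes you already know which predicates are integer-valued.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# One N-Triples statement whose object is a plain quoted literal with
# no language tag and no datatype: <s> <p> "value" .
_PLAIN_LITERAL = re.compile(
    r'^(?P<s>\S+)\s+(?P<p>\S+)\s+"(?P<v>(?:[^"\\]|\\.)*)"\s*\.\s*$'
)
_INTEGER = re.compile(r"^[+-]?\d+$")


def add_integer_datatype(line, integer_predicates):
    """Append ^^<xsd:integer> to the literal in `line` when its predicate
    is in `integer_predicates` and the value looks like an integer.
    Any other line is returned unchanged."""
    m = _PLAIN_LITERAL.match(line)
    if m is None:
        return line
    if m.group("p") not in integer_predicates:
        return line
    if not _INTEGER.match(m.group("v")):
        return line
    pos = m.end("v") + 1  # index just after the closing quote
    return line[:pos] + "^^<" + XSD_INTEGER + ">" + line[pos:]
```

Restricting the rewrite to a known set of predicates avoids turning numeric-looking strings, such as a title that happens to be "1984", into integers.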

Managing the Virtuoso service

We provide a wrapper script (virtuoso.py, adapted from Sempre) for managing the Virtuoso service. To use it, first change virtuosoPath in the script to your local Virtuoso directory. The commands below assume that the Virtuoso DB file is located in a directory named virtuoso_db next to the script virtuoso.py, and that 3001 is the intended HTTP port for the service. To start the Virtuoso service:

python3 virtuoso.py start 3001 -d virtuoso_db

To stop a service currently running on the same port:

python3 virtuoso.py stop 3001

A server with at least 100 GB of RAM is recommended. You can adjust the maximum amount of RAM the service may use, along with other configuration options, via the provided script.
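Once the service is running, the endpoint can be queried over HTTP. The sketch below assumes the service was started on port 3001 as above and that Virtuoso serves SPARQL at its default /sparql path on that port. The helper names `build_sparql_url` and `run_query` are hypothetical and not part of this repository.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_url(query, host="localhost", port=3001):
    """Build a GET URL for the SPARQL endpoint (assumed to be /sparql)."""
    params = urllib.parse.urlencode({"query": query})
    return f"http://{host}:{port}/sparql?{params}"


def run_query(query, host="localhost", port=3001, timeout=30):
    """Send `query` to the endpoint and return the list of result bindings.
    Requires a running Virtuoso service."""
    request = urllib.request.Request(
        build_sparql_url(query, host, port),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # A generic smoke-test query that fetches a few arbitrary triples.
    for row in run_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"):
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

Asking for JSON results through the Accept header follows the standard SPARQL protocol, so the same sketch should work with other triplestores if the host, port, and path are changed.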

Useful materials about Freebase