dbpediakit

Python utilities for analyzing and transforming the DBpedia dumps.

DBpedia is an extraction of the structured content of Wikipedia articles (links, redirects, infobox properties), augmented with a generic ontology: a hierarchy of classes for the entities described by Wikipedia articles, and schemas for their properties.

Quick example

>>> import dbpediakit as dbk

>>> archive_file = dbk.archive.fetch("long_abstracts")
>>> archive_file
'/home/ogrisel/data/dbpedia/long_abstracts_en.nt.bz2'

>>> tuples = dbk.archive.extract_text(archive_file)
>>> tuples
<generator object extract_text at 0x41e8aa0>

>>> first = tuples.next()
>>> first.id
'Autism'

>>> first.text[:60] + u"..."
u'Autism is a disorder of neural development characterized by ...'
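
For readers curious about what lazily extracting `(id, text)` records from a dump like this can look like, here is a minimal standard-library sketch. It is not dbpediakit's implementation. The names `Article`, `unescape`, and `iter_abstracts` are hypothetical, the code is written for Python 3 (unlike the Python 2 session above), and it assumes that each line of the archive is an N-Triples statement whose subject starts with `http://dbpedia.org/resource/` and whose object is a quoted literal.

```python
import bz2
import re
from collections import namedtuple

# Hypothetical record type mirroring the .id / .text attributes used above.
Article = namedtuple("Article", ["id", "text"])

# Assumption: subjects in the dump are DBpedia resource URIs with this prefix.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"

# One N-Triples statement of the form: <subject> <predicate> "literal"@lang .
TRIPLE_RE = re.compile(
    r'<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"(?:@[\w-]+)?\s*\.\s*$'
)

# Backslash escapes: \uXXXX, \UXXXXXXXX, or a single escaped character.
ESCAPE_RE = re.compile(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)")
SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}


def unescape(literal):
    """Decode backslash escapes inside an N-Triples string literal."""
    def replace(match):
        escape = match.group(1)
        if escape[0] in "uU" and len(escape) > 1:
            return chr(int(escape[1:], 16))  # numeric code point escape
        return SIMPLE_ESCAPES.get(escape, escape)
    return ESCAPE_RE.sub(replace, literal)


def iter_abstracts(path):
    """Lazily yield Article(id, text) from a bz2-compressed N-Triples file."""
    with bz2.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            match = TRIPLE_RE.match(line)
            if match is None:
                continue  # comments, blank lines, non-literal statements
            subject, _predicate, literal = match.groups()
            if not subject.startswith(RESOURCE_PREFIX):
                continue
            yield Article(subject[len(RESOURCE_PREFIX):], unescape(literal))
```

Because `iter_abstracts` is a generator, the archive is decompressed and parsed one line at a time, so something like `first = next(iter_abstracts(archive_file))` reads only as far as the first matching statement.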

Overview

Complete examples