Home

Awesome

wikilinks

Authors

Marc Glettig - Matthias Minder - Yves Rychener - Charles Trotin

Description

This code is for the project in the course Network Tour of Data Science at EPFL. Using text similarity, we construct networks of Wikipedia articles and compare them to the Wikipedia Hyperlink Network. After assessing the different netrwork creation methods, we seek to recommend new possible hyperlinks. For more detailed description refer to the annotated jupyter notebook and the project report.

Requirements

Data

The root folder should contain a folder "data", which itself contains the SNAP Wikipedia dataset. <br> (https://snap.stanford.edu/data/wikispeedia.html) Before running the notebook, a folder "articles" should be created in the "data" folder. Then, the scripts should be run (first wikipedia_text_extract, then wikipedia_missed_extract) to fetch the article texts.

Packages

The following (non standard python) Packages are required:

Instructions