Home

Awesome

Discovering communities in global flight route graph

Abstract

Global flight route information, which naturally reveals the connections and communication among countries and regions, has long been of great importance to political and economic area. Instead of possessing regularity, the derived flight route graph has been discovered to have various parts within which flights are denser than the others. This inconsistency motivates the attempt to figure out communities in the aforementioned graph. In this project, we exploit the network analysis and machine learning techniques to discover communities in the flight routes graph in weakly supervised and unsupervised fashion.

Prerequisites

The following python modules are required to re-produce the results obtained.

Datasets

The dataset used in the project consists of an airline database and a flight routes database from https://openflights.org/data.html#route.

The airline database with 3188 entries provides the following essential information about airline:

AttributeAirport_idNameCityCountryIATAICAOLatitudeLongitudeTimezoneTzdatabase
Example1665Geneva Cointrin International AirportGenevaSwitzerlandGVALSGG46.2380986.108951Europe/Paris

The routes database with 66771 entries provides the following essential information about routes:

AttributeAirlineAirline_idSource_airportSource_airport_idDestination_airportDestination_airport_idStopsEquipment
ExampleAB214GVA1665MAD12290320 319

Project Sturcture

preprocessing.ipynb contains the code for preprocessing. We remove the flights which contain unrecorded airports or do not specify source/destination airports. Based on the preprocessed data, we build the weighted and non-weighted adjacency matrices and the graph.

explore_data.ipynb contains the code for data exploration. We explore the graph by computing degree distribution and centrality metrics. We also visualize all airports on the global map with continent borders.

exploit_data.ipynb contains the code for data exploitation. We implement spectral clustering algorithm and K-means algorithm on both weighted and non-weighted adjacency matrices. In order to discover communities without initialization, we also implement greedy modularity maximization and label propagation methods to detect communities.

data folder contains all raw data, preprocessed data and some other additional data used for visualization.

map folder contains all html files of the visualization results.

doc folder contains our report for this project.

Visualizations

The visualizations of our results are availbale here.

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.