
NTDS - TEAM 4 - EVOLUTION OF THE MOVIE INDUSTRY

The idea of our project is to use a subset of the TMDB movie dataset, taken from Kaggle (https://www.kaggle.com/tmdb/tmdb-movie-metadata), to analyze the evolution of the movie industry over the years. More specifically, we take an economy-oriented approach: we look at properties such as the budget or the return on investment, and check whether trends can be identified from them.
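As an illustration of the economic quantities mentioned above, here is a minimal sketch of one common definition of return on investment, (revenue - budget) / budget. The function name, the `budget` and `revenue` keys and the sample values are assumptions made for this example; the exact definition used in the notebook may differ.

```python
def return_on_investment(budget, revenue):
    """Return (revenue - budget) / budget, or None when the budget is not positive."""
    if budget <= 0:
        # A zero or negative budget usually means the value is missing,
        # so no meaningful ratio can be computed.
        return None
    return (revenue - budget) / budget


# Made-up records, for illustration only.
movies = [
    {"title": "A", "budget": 100.0, "revenue": 250.0},
    {"title": "B", "budget": 0.0, "revenue": 50.0},
]

roi = {m["title"]: return_on_investment(m["budget"], m["revenue"]) for m in movies}
```

Returning None for a non-positive budget keeps such movies out of the statistics.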

Structure of the repository

/: Contains the final notebook, this README and the different folders.
data/: Contains the data necessary for the project, such as .csv files and numpy arrays.
milestones/: Contains the 4 milestone notebooks that were written during the semester.
pictures/: Contains figures that were exported from our different notebooks.
src/: Contains the Python code (scripts and functions) we wrote during the milestones to manipulate our data.

Notebook

The main code is in the Jupyter notebook final_project_ntds_2018.ipynb.

Python functions

A few functions were developed in their own files, which can be found in the src folder. The most important ones are the following:

load_data contains multiple functions used to clean the initial dataset and to create feature dataframes and adjacency matrices.
genre_graph contains functions used to create graphs based on the genres of the movies.
test_success contains functions that reorder adjacency matrices based on k-means results.
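To give an idea of what these helpers do, here is a minimal sketch of building a genre-based adjacency matrix and reordering it by cluster labels. It is not the code in src: the function names `genre_adjacency` and `reorder_by_labels`, the choice of edge weight (number of shared genres) and the hard-coded labels are all assumptions made for illustration. In practice the labels would come from a k-means run.

```python
import numpy as np


def genre_adjacency(genres_per_movie):
    """Weight each pair of movies by the number of genres they share (assumed weighting)."""
    n = len(genres_per_movie)
    adjacency = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(set(genres_per_movie[i]) & set(genres_per_movie[j]))
            adjacency[i, j] = adjacency[j, i] = shared
    return adjacency


def reorder_by_labels(adjacency, labels):
    """Permute rows and columns so that nodes with the same label are contiguous."""
    order = np.argsort(labels, kind="stable")
    return adjacency[np.ix_(order, order)], order


# Made-up data, for illustration only.
genres = [["Action", "Drama"], ["Comedy"], ["Action"], ["Comedy", "Romance"]]
adjacency = genre_adjacency(genres)

# Hypothetical cluster labels standing in for k-means output.
labels = np.array([0, 1, 0, 1])
reordered, order = reorder_by_labels(adjacency, labels)
```

After reordering, movies from the same cluster sit next to each other, so a good clustering shows up as dense blocks along the diagonal of the matrix.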