A Network Tour of Millennial Movies
Project for the course A Network Tour of Data Science
GitHub repository of the project done by a team of four students for the course A Network Tour of Data Science (EE-558), given at École polytechnique fédérale de Lausanne. This README contains the project abstract, the list of libraries required to run the code, the datasets used for the implementation, and the research questions and products that were analyzed. The code can be found in the Jupyter Notebooks of this repository, and the report is given in Project Report.pdf.
Libraries used
We used the following libraries for this project, with Python 3.6.6:
Computational:
numpy (as np)
pandas (as pd)
networkx (as nx)
scipy
sklearn
surprise
operator
collections
pandas_profiling
Graphical:
seaborn (as sns)
matplotlib (as plt)
IPython
Textual:
json
base64
codecs
re
io
We also utilized these libraries for Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
Abstract
As Walt Disney once said: "Movies can and do have tremendous influence in shaping young lives in the realm of entertainment towards the ideals and objectives of normal adulthood." But what do viewers really know about movies and what makes them successful? This project, based on the TMDb dataset, offers some interesting insights into movies from the past several decades. It shows how some of the movie features are correlated, explores how movies can be classified into genres using spectral graph analysis and CNNs, and gives a simple demo of a recommender system.
Datasets
The data folder contains the subsampled data that was used for the implementation.
Research Questions
- Can spectral clustering classify movie genres using the k-means algorithm? Can this technique be used in different graph settings, such as a cast and crew co-occurrence graph or a movie keyword co-occurrence graph?
- Can the movie genre classifier be improved by using Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering? If so, how much does the result improve?
- Can we suggest movies to users by creating a movie recommendation engine?
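To illustrate the first research question, here is a minimal sketch of spectral clustering on a keyword co-occurrence graph. It is not the code from the notebooks: the toy movie-by-keyword matrix and the helper names `cooccurrence_adjacency`, `spectral_embedding` and `kmeans` are hypothetical, and the k-means step is a bare-bones version with a deterministic farthest-point initialization, written with numpy only.

```python
import numpy as np


def cooccurrence_adjacency(features):
    """Weighted adjacency: number of keywords shared by each pair of movies."""
    adjacency = (features @ features.T).astype(float)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    return adjacency


def spectral_embedding(adjacency, k):
    """Row-normalized eigenvectors of the normalized Laplacian (k smallest eigenvalues)."""
    degrees = adjacency.sum(axis=1)
    inv_sqrt = np.zeros_like(degrees)
    inv_sqrt[degrees > 0] = 1.0 / np.sqrt(degrees[degrees > 0])
    # L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(degrees)) - inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues come in ascending order
    embedding = eigenvectors[:, :k]
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    return embedding / np.where(norms > 0, norms, 1.0)


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with farthest-point initialization (assumed, for determinism)."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy data (made up): 6 movies x 6 keywords, two groups with disjoint keywords.
features = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
])

adjacency = cooccurrence_adjacency(features)
labels = kmeans(spectral_embedding(adjacency, k=2), k=2)
print(labels)
```

On real data the cluster labels would then be compared against the known genres. The same pipeline applies to a cast and crew co-occurrence graph by swapping the feature matrix.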
Structure of repo
The notebooks of the repository should be read in the following order:
- Data Cleaning and Subsampling notebook
- Data Exploration notebook
- Data Exploitation - Spectral Graph Theory (Cast and Crew) notebook
- Data Exploitation - Keyword co-occurrence graph notebook
- Data Exploitation - CNNs notebook
- Data Exploitation - Recommender Systems notebook
Additionally, there is a Gephi graph visualization notebook that was only used for visualization.
Authors
- Milena Filipović
- Kristijan Lopatichki
- Jelena Malić
- Davor Todorovski
License
Copyright 2019 Milena Filipović, Kristijan Lopatichki, Jelena Malić and Davor Todorovski
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.