The Internet Movie Database, imdb.com, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon. More information about imdb.com can be found online, including information about the data collection process.

IMDB makes its raw data available. Unfortunately, the data is divided into many text files, and the format of each file differs slightly. To create one data file containing all the desired information, these Ruby scripts extract the relevant information from each file and store it in a database. Finally, this data is exported to CSV to make it easier to import into data analysis packages.
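
As a rough illustration of the final export step, here is a minimal sketch using Ruby's standard `csv` library. It is not the project's actual script: the `MOVIES` rows, the field names, and the helpers `selected_movies` and `movies_to_csv` are hypothetical stand-ins for records that would really come from the database. The filter mirrors the selection rule used for this data set (a known length and at least one user rating).

```ruby
require "csv"

# Hypothetical rows standing in for records already extracted from the
# IMDB text files; the field names are illustrative, not the real schema.
FIELDS = %i[title year length votes rating].freeze

MOVIES = [
  { title: "Example Movie A", year: 1999, length: 120, votes: 45, rating: 7.1 },
  { title: "Example Movie B", year: 2003, length: nil, votes: 10, rating: 6.0 },
  { title: "Example Movie C", year: 2001, length: 95,  votes: 0,  rating: nil }
].freeze

# Keep only movies with a known length and at least one user rating.
def selected_movies(movies)
  movies.select { |m| !m[:length].nil? && m[:votes].to_i >= 1 }
end

# Serialize the given rows to CSV text, starting with a header line.
def movies_to_csv(movies)
  CSV.generate do |csv|
    csv << FIELDS.map(&:to_s)
    movies.each { |m| csv << m.values_at(*FIELDS) }
  end
end

puts movies_to_csv(selected_movies(MOVIES))
```

Writing the result with `File.write` would produce a file that most data analysis packages can read directly; missing values are emitted as empty fields.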

The following text files were downloaded and used:

Movies were selected for inclusion if they had a known length and had been rated by at least one IMDB user. The final output contains the following fields: