Source code for Gated Multimodal Units for Information Fusion.
Dependencies
Make dataset
You can download the ready-to-use Fuel format version: `multimodal_imdb.hdf5` (archive.org mirror) and `metadata.npy` (archive.org mirror).
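The HDF5 file can then be opened directly with Fuel. The following is a minimal sketch, not part of the original instructions; the `'train'` split name is an assumption about how the file is labelled, and the sources are printed dynamically rather than assumed:

```python
import numpy as np
from fuel.datasets import H5PYDataset

# Open the downloaded dataset; using 'train' as a split name is an assumption.
train = H5PYDataset('multimodal_imdb.hdf5', which_sets=('train',))
print('sources:', train.sources)
print('examples:', train.num_examples)

# metadata.npy holds auxiliary information saved with numpy; inspect it as-is.
metadata = np.load('metadata.npy', allow_pickle=True)
print(type(metadata))
```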
Alternatively, you can build it manually:
- Get the following files and uncompress them in the root folder of this project:
  - MM-IMDb dataset (archive.org mirror)
  - word2vec pretrained model
  - VGG pretrained model
  - class names: `synsets_words.txt`
- Create the `list.txt` file:

  ```bash
  ls dataset/*.json > list.txt
  ```

- Run the make script:

  ```bash
  python3 make_dataset.py gmu.json
  ```
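If you built the file yourself, a quick sanity check with plain `h5py` confirms the conversion worked. This is a minimal sketch; it assumes the script writes `multimodal_imdb.hdf5` like the pre-built download, and it prints whatever arrays are actually in the file rather than assuming their names:

```python
import h5py

# List every top-level entry in the generated HDF5 file with its shape (if any).
with h5py.File('multimodal_imdb.hdf5', 'r') as f:
    for name, node in f.items():
        print(name, getattr(node, 'shape', None))
```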
Getting more movies
You can extend the dataset by adding more IMDb IDs to the `links.csv` file and running the `get_data.py` script to crawl additional movies.
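For example, the extra IDs can be appended programmatically. This is only a sketch: the IMDb IDs below are placeholders, and the single-column layout of `links.csv` is an assumption, so adapt it to the actual file:

```python
import csv

# Hypothetical IMDb IDs; replace with the titles you want to crawl.
new_ids = ['0111161', '0068646']

# Append one ID per row; the exact column layout of links.csv is an assumption.
with open('links.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    for imdb_id in new_ids:
        writer.writerow([imdb_id])
```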
Train and eval the model
Generate random configurations:

```bash
python3 generators/gmu.py gmu.json
```
Train the model and report its performance on the test set (e.g., the best configuration for GMU model #23):

```bash
python3 run.py json/gmu_23.json
```
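To sweep over all generated configurations rather than a single one, you can loop over the JSON files. A minimal sketch, assuming the generator wrote its configurations under `json/` as in the example above:

```python
import glob
import subprocess

# Train and evaluate every generated GMU configuration one after another.
for conf in sorted(glob.glob('json/gmu_*.json')):
    print('Running', conf)
    subprocess.run(['python3', 'run.py', conf], check=True)
```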