vg_cleansing
dataset cleansing for Visual Genome
Introduction
Because the relationship annotations in Visual Genome are extracted from sentences, they are somewhat messy. To make full use of the relationship dataset, I cleanse it beforehand.
Some intermediate analysis results are stored in a Google Sheet. Feel free to check them.
All the preprocessing is implemented in Python. For easy interaction, I also use IPython notebooks.
For more information, don't hesitate to contact me.
File organization
For convenience, please organize the dataset as follows:
- ROOT_PATH (the root dir for the dataset)
- VG_100K_images (images)
- vg_cleansing (the Python scripts for Visual Genome cleansing)
- temp_data (to store some temporary data for dataset cleansing, e.g. object categories of Pascal VOC)
- models ([optional] to store the word2vec models)
- Annotations (annotations for visual relationship)
- cleansed_data (the output of our cleansing)
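
The layout above can be checked or created with a few lines of Python. This is only a minimal sketch and not part of the repository's scripts: the helper names `missing_dirs` and `create_layout` are hypothetical, and it assumes that every directory listed sits directly under ROOT_PATH, which the list does not state explicitly.

```python
from pathlib import Path

# Directory names are taken from the list above. Placing all of them
# directly under ROOT_PATH is an assumption of this sketch.
EXPECTED_DIRS = [
    "VG_100K_images",
    "vg_cleansing",
    "temp_data",
    "models",
    "Annotations",
    "cleansed_data",
]
# "models" is marked optional in the list, so its absence is not reported.
OPTIONAL_DIRS = {"models"}


def missing_dirs(root_path):
    """Return the required directory names that do not exist under root_path."""
    root = Path(root_path)
    return [
        name
        for name in EXPECTED_DIRS
        if name not in OPTIONAL_DIRS and not (root / name).is_dir()
    ]


def create_layout(root_path):
    """Create every listed directory (including optional ones) if absent."""
    root = Path(root_path)
    for name in EXPECTED_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
```

Calling `missing_dirs("/path/to/ROOT_PATH")` before running the cleansing scripts gives a quick list of what still needs to be set up; `create_layout` only creates empty folders and does not download or move any data.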