


Code for the Human-related Object Detection based on Natural Language Parsing of Image Query Expressions article




To run this code, you need Python 3.6.*, PyTorch, OpenCV, NumPy and Matplotlib. We recommend installing the Anaconda Python distribution and using conda to install the dependencies, as follows:

conda install pytorch torchvision cuda80 -c soumith
conda install opencv -c conda-forge
conda install matplotlib numpy
conda install aria2 -c bioconda
pip install visual-genome
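
After installation, a quick check can confirm that the environment matches the requirements above. The following is a minimal sketch, not part of the repository: the helper names find_missing and python_is_36 are hypothetical, and the list uses the usual import names for the packages (torch, torchvision, cv2, numpy, matplotlib), which differ from some of the conda package names.

```python
import importlib.util
import sys

# Import names (assumed) for the dependencies listed above; these are the
# module names used in `import` statements, not the conda package names.
REQUIRED_MODULES = ["torch", "torchvision", "cv2", "numpy", "matplotlib"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_is_36(version_info=sys.version_info):
    """Return True when the interpreter is a Python 3.6.x release."""
    return tuple(version_info[:2]) == (3, 6)


if __name__ == "__main__":
    if not python_is_36():
        print("Warning: Python 3.6.* is expected, found", sys.version.split()[0])
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it runs quickly and does not require a GPU.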

Dataset download

You must download the Visual Genome dataset, as well as the train/val/test split used in our experiments. The download_dataset.sh bash script takes care of all the required downloads.

Pretrained models

Pretrained SSD + LSTM weights are provided as proof of our experiments. They are available at:

After downloading the models, extract them into the weights folder.
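
The extraction step can also be done from Python. This is a minimal sketch under stated assumptions: the function extract_weights and the archive name in the usage note are hypothetical, and the archive is assumed to be in a format that shutil.unpack_archive recognizes from its extension (for example .zip or .tar.gz).

```python
import shutil
from pathlib import Path


def extract_weights(archive_path, weights_dir="weights"):
    """Unpack `archive_path` into `weights_dir`.

    Returns a sorted list of every file found under `weights_dir` afterwards,
    including files that were already there before extraction.
    """
    target = Path(weights_dir)
    # Create the weights folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    # The archive format is inferred from the file extension.
    shutil.unpack_archive(str(archive_path), str(target))
    return sorted(p for p in target.rglob("*") if p.is_file())
```

For example, extract_weights("pretrained_models.zip") would unpack a hypothetical archive of that name into the weights folder relative to the current directory.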


A simple demo is provided as a Jupyter Notebook. There you can load images and predict bounding boxes given an object query phrase.



The SSD multibox detector is based on amdegroot's PyTorch implementation: https://github.com/amdegroot/ssd.pytorch


Any Pull Request will be reviewed as part of our Open Source initiative. We follow the PEP 8 and PEP 257 guidelines.