textobjdetection
Code for the Human-related Object Detection based on Natural Language Parsing of Image Query Expressions article
Dependencies
To run this code, you need Python 3.6.*, PyTorch, OpenCV, NumPy and Matplotlib. We recommend installing the Anaconda Python distribution and using conda to install the dependencies, as follows:
conda install pytorch torchvision cuda80 -c soumith
conda install opencv -c conda-forge
conda install matplotlib numpy
conda install aria2 -c bioconda
pip install visual-genome
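After running the commands above, a quick way to confirm that the environment is complete is to check that each package can be located by Python. The following is a minimal sketch, not part of the repository; the helper name `missing_modules` is hypothetical, and the import names (`cv2` for OpenCV, `visual_genome` for the visual-genome package) are assumptions about how each dependency is exposed.

```python
import importlib.util

# Import names assumed for the dependencies listed above
# (cv2 for OpenCV, visual_genome for the visual-genome package).
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "cv2",
    "numpy",
    "matplotlib",
    "visual_genome",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be found; it does not verify versions or CUDA support.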
Dataset download
You must download the Visual Genome dataset, as well as the train/val/test split used in our experiments. The download_dataset.sh bash script that we provide takes care of the required downloads.
Pretrained models
Pretrained SSD + LSTM weights are provided as proof of our experiments. They are available at:
- LSTM Model: https://s3-sa-east-1.amazonaws.com/textobjdetection/lstm_model.pt
- SSD Model: https://s3-sa-east-1.amazonaws.com/textobjdetection/ssd_lang.pt
After downloading the models, uncompress them into the weights folder.
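The download step can also be scripted. The sketch below is an illustration rather than code from the repository: it fetches the two URLs listed above into a `weights` folder using only the standard library. The helper names `destination_for` and `download_weights` are hypothetical, and the sketch assumes the files can be used as downloaded; if they turn out to be packaged in an archive, an extra extraction step would be needed.

```python
import os
import urllib.request

# URLs taken from the list above.
WEIGHT_URLS = [
    "https://s3-sa-east-1.amazonaws.com/textobjdetection/lstm_model.pt",
    "https://s3-sa-east-1.amazonaws.com/textobjdetection/ssd_lang.pt",
]


def destination_for(url, folder="weights"):
    """Build the local path for a URL, keeping its original file name."""
    return os.path.join(folder, os.path.basename(url))


def download_weights(urls=WEIGHT_URLS, folder="weights"):
    """Download each file into `folder`, skipping files that already exist."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for url in urls:
        path = destination_for(url, folder)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```

Calling `download_weights()` from the repository root returns the local paths of both files.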
Demo
A simple demo is provided as a Jupyter Notebook. There you can load images and predict bounding boxes for a given object query phrase.
Acknowledgements
The SSD multibox detector is based on amdegroot's PyTorch implementation: https://github.com/amdegroot/ssd.pytorch
Contributions
Any Pull Request will be reviewed as part of our Open Source initiative. We follow the PEP8 and PEP257 guidelines.