ArtEmis: Affective Language for Visual Art

A codebase created and maintained by <a href="https://ai.stanford.edu/~optas" target="_blank">Panos Achlioptas</a>.

Introduction

This work is based on the arXiv tech report, which has been provisionally accepted to CVPR-2021 for an <b>Oral</b> presentation.

Citation

If you find this work useful in your research, please consider citing:

@article{achlioptas2021artemis,
    title={ArtEmis: Affective Language for Visual Art},
    author={Achlioptas, Panos and Ovsjanikov, Maks and Haydarov, Kilichbek and
            Elhoseiny, Mohamed and Guibas, Leonidas},
    journal = {CoRR},
    volume = {abs/2101.07396},
    year={2021}
}

Dataset

To get the most out of this repo, please download the data associated with ArtEmis by filling out this form.

Installation

This code has been tested with Python 3.6.9, Pytorch 1.3.1, CUDA 10.0 on Ubuntu 16.04.

The steps below assume you are working in a (possibly virtual) environment with Python 3.x:

git clone https://github.com/optas/artemis.git
cd artemis
pip install -e .

This will install the repo with all its dependencies (listed in setup.py) and will enable you to do things like:

from artemis.models import xx

(provided you add this artemis repo to your PYTHONPATH)
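As a quick sanity check after installation, the following minimal sketch tests whether a package can be found on the current import path. It uses only the standard library; the helper name `is_importable` is illustrative and not part of this repo.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if `package_name` can be located on the current import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "artemis" is the package name used in the import example above.
    print("artemis importable:", is_importable("artemis"))
```

If this prints `False`, the environment in which you ran `pip install -e .` is probably not the one your interpreter is using.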

Playing with ArtEmis

Step-1 (important :pushpin:)

Preprocess the provided annotations (spell-check, patch, tokenize, make train/val/test splits, etc.).

   artemis/scripts/preprocess_artemis_data.py

This script allows you to preprocess ArtEmis according to your needs. The default arguments do minimal preprocessing, so the resulting output can be used to compare ArtEmis fairly with other datasets and to derive the most faithful statistics about ArtEmis's nature. This is what we used in our analysis and what you should use in "Step-2" below. With this in mind, run:

  python artemis/scripts/preprocess_artemis_data.py -save-out-dir <ADD_YOURS> -raw-artemis-data-csv <ADD_YOURS>

If you wish to train deep nets (speakers, emotion classifiers, etc.) exactly as we did in our paper, rerun this script with a single extra optional argument ("--preprocess-for-deep-nets True"). This does more aggressive filtering, and you should use its output for "Step-3" and "Step-4" below. Use a different save-out-dir to avoid overwriting the output of previous runs.

  python artemis/scripts/preprocess_artemis_data.py -save-out-dir <ADD_YOURS> -raw-artemis-data-csv <ADD_YOURS> --preprocess-for-deep-nets True

To understand and customize the different hyper-parameters, please read the help messages provided by the script's argparse interface.
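If you prefer to drive both preprocessing runs from Python, a minimal sketch could look like the following. The script path and flags are the ones shown above; the helper name `build_preprocess_cmd` and the example directories are hypothetical.

```python
import sys
from typing import List


def build_preprocess_cmd(save_out_dir: str, raw_csv: str,
                         for_deep_nets: bool = False) -> List[str]:
    """Assemble the argument list for preprocess_artemis_data.py."""
    cmd = [
        sys.executable,
        "artemis/scripts/preprocess_artemis_data.py",
        "-save-out-dir", save_out_dir,
        "-raw-artemis-data-csv", raw_csv,
    ]
    if for_deep_nets:
        # Extra optional argument that enables the more aggressive filtering.
        cmd += ["--preprocess-for-deep-nets", "True"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; note the separate output directory for each run.
    print(build_preprocess_cmd("out/minimal", "artemis_raw.csv"))
    print(build_preprocess_cmd("out/deep_nets", "artemis_raw.csv", for_deep_nets=True))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repo root. Keeping one output directory per run avoids overwriting earlier results.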

Step-2

Analyze & explore the dataset. :microscope:

Using the minimally preprocessed version of ArtEmis, which includes all (454,684) collected annotations.

This is a great place to start :checkered_flag:. Run this notebook to do basic linguistic, emotion & art-oriented analysis of the ArtEmis dataset.
Run this notebook to analyze ArtEmis in terms of its concreteness, subjectivity, sentiment and Parts-of-Speech. Optionally, contrast these values with other common datasets like COCO.
Run this notebook to extract the emotion histograms (empirical distributions) of each artwork. This is necessary for Step-3 (1).
Run this notebook to analyze the extracted emotion histograms (previous step) per art genre and style.
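To make the notion of an emotion histogram concrete, here is a minimal sketch that turns per-annotation emotion labels into an empirical distribution per artwork. It is not the notebook's code: the input format (pairs of artwork id and emotion label) and the toy labels are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def emotion_histograms(
    annotations: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Map each artwork to the normalized frequency of its annotated emotions."""
    counts = defaultdict(Counter)
    for artwork, emotion in annotations:
        counts[artwork][emotion] += 1

    histograms = {}
    for artwork, counter in counts.items():
        total = sum(counter.values())
        histograms[artwork] = {emo: n / total for emo, n in counter.items()}
    return histograms


if __name__ == "__main__":
    # Toy data with hypothetical artwork ids and emotion labels.
    toy = [
        ("painting_a", "awe"),
        ("painting_a", "awe"),
        ("painting_a", "fear"),
        ("painting_b", "sadness"),
    ]
    print(emotion_histograms(toy))
```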

Step-3

Train and evaluate emotion-centric image & text classifiers. :hearts:

Using the preprocessed version of ArtEmis for deep-nets which includes 429,431 annotations. (Training on a single GPU from scratch is a matter of minutes for these classifiers!)

Run this notebook to train an image-to-emotion classifier.
Run this notebook to train an LSTM-based utterance-to-emotion classifier. Or, this notebook to train a BERT-based one.

Step-4

Train & evaluate neural-speakers. :bomb:

To train our customized SAT model on ArtEmis (~2 hours to train on a single GPU!) do:

    python artemis/scripts/train_speaker.py -log-dir <ADD_YOURS> -data-dir <ADD_YOURS> -img-dir <ADD_YOURS>

    log-dir: where to save the output of the training process, models etc.
    data-dir: directory that contains the _input_ data, i.e., the output of
              preprocess_artemis_data.py: e.g., the artemis_preprocessed.csv,
              the vocabulary.pkl
    img-dir: the top folder containing the WikiArt image dataset in its "standard" format:
                img-dir/art_style/painting_xx.jpg

Note. The default optional arguments will create the same vanilla-speaker variant we used in the CVPR21 paper.
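Before launching a training run, it can help to check that the directories described above have the expected contents. The following is a minimal sketch under those stated assumptions (the two file names and the `img-dir/art_style/painting_xx.jpg` layout); the helper name `missing_training_inputs` is hypothetical and not part of this repo.

```python
from pathlib import Path
from typing import List


def missing_training_inputs(data_dir: str, img_dir: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means all looks fine."""
    problems = []

    data = Path(data_dir)
    for name in ("artemis_preprocessed.csv", "vocabulary.pkl"):
        if not (data / name).is_file():
            problems.append(f"missing {name} in {data}")

    # Expect at least one image laid out as img-dir/<art_style>/<painting>.jpg
    img_root = Path(img_dir)
    if not (img_root.is_dir() and any(img_root.glob("*/*.jpg"))):
        problems.append(f"no images found under {img_root}/<art_style>/*.jpg")

    return problems


if __name__ == "__main__":
    # Hypothetical locations; replace with your own directories.
    for problem in missing_training_inputs("out/deep_nets", "wikiart"):
        print("WARNING:", problem)
```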

To train the emotionally-grounded variant of SAT add an extra parameter in the above call:

    python artemis/scripts/train_speaker.py -log-dir <ADD_YOURS> -data-dir <ADD_YOURS> -img-dir <ADD_YOURS>
                                            --use-emo-grounding True

To sample utterances from a trained speaker:

 python artemis/scripts/sample_speaker.py -arguments

For an explanation of the arguments, see the argparse help messages. Note that to sample from an emotionally-grounded variant you need to provide a pretrained image2emotion classifier. This classifier is used to deduce the most likely emotion of an image, and that emotion is then given as input to the speaker. See Step-3 (1) for how to train such a net.
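The grounding step described above can be sketched as follows: take the classifier's predicted distribution over emotions, pick the most likely one, and encode it for the speaker. The one-hot encoding and the helper name `grounding_vector` are assumptions made for illustration; check sample_speaker.py for how the repo actually passes the emotion.

```python
from typing import List, Sequence


def grounding_vector(emotion_probs: Sequence[float]) -> List[float]:
    """One-hot encode the most likely emotion of a predicted distribution."""
    # Index of the highest-probability emotion (first one wins on ties).
    best = max(range(len(emotion_probs)), key=lambda i: emotion_probs[i])
    return [1.0 if i == best else 0.0 for i in range(len(emotion_probs))]


if __name__ == "__main__":
    # Hypothetical classifier output over three emotion classes.
    print(grounding_vector([0.1, 0.7, 0.2]))
```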

To evaluate the quality of the sampled captions (e.g., per BLEU, emotional alignment, metaphors, etc.) use this notebook. As a bonus, you can use it to inspect the neural attention placed on the different tokens/images.

MISC

You can make a pseudo "neural speaker" by copying training sentences to the test split according to nearest neighbors in a pretrained network's feature space, by running this 5 min. notebook.
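The nearest-neighbor idea can be sketched in a few lines: for every test feature vector, find the closest training feature vector and copy its caption. This is a minimal illustration and not the notebook's code; the Euclidean metric, the plain-list feature format and the function names are assumptions.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_captions(
    train_feats: Sequence[Sequence[float]],
    train_captions: Sequence[str],
    test_feats: Sequence[Sequence[float]],
) -> List[str]:
    """For each test vector, copy the caption of its closest training vector."""
    captions = []
    for query in test_feats:
        best = min(
            range(len(train_feats)),
            key=lambda i: squared_distance(query, train_feats[i]),
        )
        captions.append(train_captions[best])
    return captions


if __name__ == "__main__":
    # Toy 2-D features and captions, purely illustrative.
    train = [[0.0, 0.0], [10.0, 10.0]]
    caps = ["a calm sea", "a stormy sky"]
    print(nearest_neighbor_captions(train, caps, [[1.0, 1.0], [9.0, 8.0]]))
```

With real image features, a vectorized search (for example with NumPy) would be much faster than this pure-Python loop.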

Pretrained Models (used in CVPR21-paper)

Image-To-Emotion classifier (81MB) - use it within the notebook of Step.3.1 or to sample from an emotionally grounded speaker (Step.4.sample).
LSTM-based Text-To-Emotion classifier (8MB) - use it within the notebook of Step.3.2 or to evaluate the samples of a speaker (Step.4.evaluate) | e.g., needed for emotional-alignment.
SAT-Speaker (434MB)
SAT-Speaker-with-emotion-grounding (431MB)

The above two links also include our sampled captions for the test-split. You can use them to evaluate the speakers without resampling them. Please read the included README.txt.
Caveats: ArtEmis is a real-world dataset containing the opinion and sentiment of thousands of people. It is expected thus to contain text with biases, factual inaccuracies, and perhaps foul language. Please use responsibly. The provided models are likely to be biased and/or inaccurate in ways reflected in the training data.

News

:champagne: ArtEmis has already attracted some noticeable media coverage, e.g., @ New-Scientist, HAI, MarkTechPost, KCBS-Radio, Communications of ACM, Synced Review, École Polytechnique, Forbes Science.
:telephone_receiver: Important: more code will be added in April, namely for the ANP-baseline and the comparisons of ArtEmis with other datasets. Please do a git-pull at that time. The update will be seamless! During these first months, if you have ANY questions, feel free to send me an email at optas@stanford.edu.
:trophy: If you are developing more models with ArtEmis and you want to incorporate them here please talk to me or simply do a pull-request.

License

This code is released under the MIT License (see LICENSE file for details). In simple words, if you copy/use parts of this code, please keep the copyright notice in place.