Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis (VLP-MABSA)

Code and datasets for our ACL 2022 paper: Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

Author

Yan Ling

ylin@njust.edu.cn

--------------update--------------

2023.04.04 Added the full text data of MVSA (data_new.rar) to Google Drive and fixed some errors in our code.

2023.03.29 Added the full image features and pre-training labels to Google Drive.

2023.03.15 Added the label files of the pre-training tasks and provided 3 samples of the MVSA dataset, processed by the steps below, to Google Drive.

2022.12.2 Added the training files of the subtasks.

Data Processing

The pre-training dataset we use is MVSA-Multi, which you can get from this git. First, you need to apply the judgment rules provided by that git to remove the samples with inconsistent labels.

Text Processing

For the texts in the MVSA-Multi dataset, we first use NLTK to perform tokenization.
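
A minimal tokenization sketch using NLTK's TweetTokenizer (other NLTK tokenizers such as word_tokenize can be substituted; the exact tokenizer settings may differ):

# Minimal sketch: tokenize a tweet with NLTK's TweetTokenizer
# (other NLTK tokenizers can be substituted).
from nltk.tokenize import TweetTokenizer

tokenizer = TweetTokenizer()
tokens = tokenizer.tokenize("It is unbelievable ! Stephen Curry won the game !")
print(tokens)
# ['It', 'is', 'unbelievable', '!', 'Stephen', 'Curry', 'won', 'the', 'game', '!']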

We use twitter_nlp to perform Named Entity Recognition (NER) in order to find the aspects. For example, given the text "It is unbelievable ! Stephen Curry won the game !", the NER result from twitter_nlp is

It/O is/O unbelievable/O !/O Stephen/B-ENTITY Curry/I-ENTITY won/O the/O game/O !/O

We save the result in a dict with the following format:

{"text_id":{"aspect_spans":[list of aspect spans],"aspect_texts":[list of aspect texts]},...}
{"100":{"aspect_spans":[[4,5]],"aspect_texts":[["Stephen","Curry"]]},...}

We use the sentiment lexicon SentiWordNet to match the opinion words. The lexicon is used as a dictionary: words in the text that belong to the lexicon are treated as opinion terms. Using the text above as an example, the word "unbelievable" belongs to the lexicon. We save this information in the same format as the aspect spans.

{"text_id":{"opinion_spans":[list of opinion spans],"opinion_texts":[list of opinion texts]},...}
{"100":{"opinion_spans":[[2,2]],"opinion_texts":[["unbelievable"]},...}

The dicts of aspect spans and opinion spans are used for the AOE pre-training task.

Image Processing

For the images in the MVSA-Multi dataset, we run Faster R-CNN to extract region features (retaining only the 36 regions with the highest confidence) as the input features; the dimension of each region feature is 2048. For details on how to run Faster R-CNN, you can refer to the Faster-RCNN repository.
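
As a hedged illustration (the feature file layout below is hypothetical, not the extractor's actual output format), each image is finally represented as a 36 x 2048 matrix, padded with zeros when fewer regions are detected:

# Hedged illustration only: the feature file name/layout is hypothetical.
# Each image is represented by up to 36 region features of dimension 2048.
import numpy as np

NUM_REGIONS, FEAT_DIM = 36, 2048

def load_region_features(path):
    feats = np.load(path)                      # e.g. an (N, 2048) array saved by the extractor
    feats = feats[:NUM_REGIONS]                # keep the 36 highest-confidence regions
    if feats.shape[0] < NUM_REGIONS:           # pad with zeros if fewer than 36 regions
        pad = np.zeros((NUM_REGIONS - feats.shape[0], FEAT_DIM), dtype=feats.dtype)
        feats = np.concatenate([feats, pad], axis=0)
    return feats                               # shape: (36, 2048)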

We employ an ANP extractor to predict the ANP distribution of each image. To run the code, you need to provide a list of image paths, for example:

/home/MVSA/data/2499.jpg
/home/MVSA/data/2500.jpg
...
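
Such a list file can be generated with a few lines of Python (the directory and output file name below are placeholders for your own setup):

# Write the image-path list expected by the ANP extractor.
# The directory and file name below are placeholders.
import glob, os

image_dir = "/home/MVSA/data"
paths = sorted(glob.glob(os.path.join(image_dir, "*.jpg")))
with open("image_list.txt", "w") as f:
    f.write("\n".join(paths))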

The result of the ANP extractor is a dict:

{
    "numbers": 100,                    # the number of images
    "images": [
        {
            "bi-concepts": {
                "handsome_guy": 0.13,  # probability of each ANP, in descending order
                "cute_boy": 0.08,
                ...
            },
            "features": [
                ...
            ]
        },
        ...
    ]
}

The ANP with the highest probability is chosen as the output text of the AOG pre-training task.
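
A small sketch of selecting the top ANP from the dict above (assuming the underscore-separated ANP format shown in the example):

# Pick the highest-probability ANP for each image as the AOG target text.
# `anp_result` is the dict shown above; "bi-concepts" is assumed to be ordered
# by descending probability, but we use max() to be safe.
def top_anp(image_entry):
    concepts = image_entry["bi-concepts"]       # e.g. {"handsome_guy": 0.13, ...}
    anp = max(concepts, key=concepts.get)       # ANP with the highest probability
    return anp.replace("_", " ")                # "handsome_guy" -> "handsome guy"

# aog_targets = [top_anp(img) for img in anp_result["images"]]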

Sentiment Processing

As introduced in the git, there are many tweets in which the labels of the text and the image are inconsistent. First, you need to apply the judgment rule defined by the author to remove the inconsistent data. Then we save the sentiment labels in the following format

{"data_id": sentiment label}
{"13357": 2, "13356": 0,...} # 0,1,2 denote negtive, neutral and positive, respectively

For more details, we provide a description of our pre-training data files in src/data/jsons/MVSA_descriptions.txt, which explains the files defined in src/data/jsons/MVSA.json.

Data Download

Because the pre-training dataset after processing is very large, we only provide the downstream datasets. You can download the downstream datasets and our pre-trained model via Baidu Netdisk (code: d0tn) or Google Drive.

Pre-Training

If you have done all the processing above, you can perform the pre-training by running the following command.

sh MVSA_pretrain.sh

Downstream Task Training

To train the downstream JMASA task on the two Twitter datasets, you can simply run the following commands. Note that you need to change all the file paths in src/data/jsons/twitter15_info.json and src/data/jsons/twitter17_info.json to your own paths.

sh 15_pretrain_full.sh
sh 17_pretrain_full.sh

The following describes some parameters of the shell scripts above:

--dataset           the dataset name and the path of the info JSON file
--checkpoint_dir    path to save your trained model
--log_dir           path to save the training log
--checkpoint        path of the pre-trained model

We also provide our training logs on the two datasets in the folder ./log.

Acknowledgements