MMSoc: Multimodal Social Media Analysis

Project Overview

MMSoc is a large-scale benchmark for evaluating the performance of multimodal large language models (MLLMs) on social media analysis tasks.

@inproceedings{jin2024mm,
  title={MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms},
  author={Jin, Yiqiao and Choi, Minje and Verma, Gaurav and Wang, Jindong and Kumar, Srijan},
  booktitle={ACL},
  year={2024}
}

The 🤗 MMSoc benchmark contains the following datasets:

Memotion [🤗Link]

12,143 memes annotated by Amazon Mechanical Turk (AMT) workers with labels for:
- sentiment (positive, negative, neutral)
- type of emotion conveyed (sarcastic, funny, offensive, motivational)
- intensity of the expressed emotion

Modality: images, embedded text
Tasks: OCR, humor detection, sarcasm detection, offensive content detection, motivation analysis, sentiment analysis

Hateful Memes [🤗 Link]

12,840 memes consisting of meme-like visuals with text overlaid on them.
Modality: images, embedded text
Tasks: hate speech detection

YouTube2M [🤗 Link]

2 million YouTube videos shared on Reddit
62 unique tags
1,389,219 videos bearing the top 5 tags (70.7% of the dataset)
Modalities: text, image
Tasks:
- tagging: predicting appropriate “topic categories” for YouTube videos
- text generation: generating the titles / descriptions of the videos

For ease of testing, we have also released a smaller sample of the dataset, 🤗 YouTube2000, with 2,000 samples (1,600 train, 200 validation, 200 test).
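
The top-5 tag coverage reported above can be recomputed from a list of per-video tags. The snippet below is a minimal sketch, assuming each video carries exactly one tag; `top_k_coverage` and the toy tag names are hypothetical and not part of the MMSoc codebase.

```python
from collections import Counter


def top_k_coverage(tags, k=5):
    """Return the k most frequent tags and the fraction of items they cover.

    Assumes one tag per video (an assumption for this sketch).
    """
    if not tags:
        raise ValueError("tags must be non-empty")
    counts = Counter(tags)
    top = counts.most_common(k)           # [(tag, count), ...] sorted by count
    covered = sum(n for _, n in top)      # number of items with a top-k tag
    return [tag for tag, _ in top], covered / len(tags)


# Toy data: 10 videos, 4 distinct tags (names are made up).
toy_tags = ["Music"] * 5 + ["Gaming"] * 3 + ["News"] + ["Sports"]
top_tags, fraction = top_k_coverage(toy_tags, k=2)
print(top_tags, fraction)  # ['Music', 'Gaming'] 0.8
```

Running the same function over the full tag column with `k=5` should give a share comparable to the figure listed above, provided the one-tag-per-video assumption holds.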

FakeNewsNet

We consider two datasets under the misinformation detection theme:
- 🤗 PolitiFact
- 🤗 GossipCop
Modalities: news content (text), online posts (text), images, user metadata

The datasets were originally curated by Shu et al. (GitHub).

Tasks: Misinformation detection
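
Misinformation detection is treated here as a binary decision per news item. As a rough sketch of how predictions could be scored, the snippet below computes macro-averaged F1 in plain Python. The metric choice, the label strings `real` / `fake`, and the function names are assumptions for illustration, not the benchmark's official evaluation protocol.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for a single class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, classes=("real", "fake")):
    """Unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels (made up for illustration).
gold = ["fake", "fake", "real", "real"]
pred = ["fake", "real", "real", "real"]
print(round(macro_f1(gold, pred), 3))
```

Macro averaging weights both classes equally, which can be useful when real and fake items are imbalanced.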

Project Structure

mmsoc/: Main package directory for the MMSoc project.
models/: Sample code for using the datasets.
blip.py: Includes the implementation of the BLIP-2 (Bootstrapping Language-Image Pre-training) and InstructBLIP models, which are used for tasks that require joint image and text understanding.
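
For orientation, the sketch below shows one way to query a BLIP-2 model about an image through the Hugging Face `transformers` library. It is not the contents of `blip.py`: the checkpoint name `Salesforce/blip2-opt-2.7b`, the helper names, the question-style prompt, and the file `meme.png` are all assumptions chosen for illustration. It requires `transformers`, `torch`, and `Pillow`, and downloads several gigabytes of weights on first use.

```python
def build_prompt(question: str) -> str:
    """Format a question in the 'Question: ... Answer:' style often used with BLIP-2."""
    return f"Question: {question.strip()} Answer:"


def answer_about_image(image_path, question,
                       model_name="Salesforce/blip2-opt-2.7b",  # assumed checkpoint
                       max_new_tokens=20):
    """Ask a BLIP-2 model a question about one image and return its text answer."""
    # Heavy imports stay inside the function so the module loads without them.
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(model_name)
    model = Blip2ForConditionalGeneration.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=build_prompt(question),
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


# Example call (hypothetical file name; triggers a model download):
# print(answer_about_image("meme.png", "Is this meme sarcastic?"))
```

Loading the model once and reusing it across many images would be far more efficient than calling this helper in a loop; the single-function form is kept only for brevity.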

Installation

To get started with MMSoc, follow these steps:

Clone the repository:
```
git clone <repository-url>
cd MMSoc
```

Create a virtual environment (optional but recommended):

```
conda create -n mmsoc python=3.11
conda activate mmsoc
```

Install the required packages:
```
pip install -r requirements.txt
```

License

This repository is licensed under the Apache-2.0 License.