MMSoc: Multimodal Social Media Analysis
Project Overview
MMSoc is a large-scale benchmark for evaluating the performance of multimodal LLMs (MLLMs) on social media analysis tasks.
@inproceedings{jin2024mm,
title={MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms},
author={Jin, Yiqiao and Choi, Minje and Verma, Gaurav and Wang, Jindong and Kumar, Srijan},
booktitle={ACL},
year={2024}
}
The 🤗 MMSoc benchmark contains the following datasets:
Memotion [🤗 Link]
- 12,143 memes, annotated by Amazon Mechanical Turk (AMT) workers with labels for:
  - sentiment (positive, negative, neutral)
  - type of emotion conveyed (sarcastic, funny, offensive, motivational)
  - intensity of the expressed emotion
- Modality: images, embedded text
- Tasks: OCR, humor detection, sarcasm detection, offensive detection, motivation analysis, sentiment analysis
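The annotation scheme above can be pictured as a small record type. The sketch below is only an illustration: the class name, field names, and the use of integer intensity levels are assumptions, not the dataset's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# Emotion types listed in the dataset description.
EMOTION_TYPES = ("sarcastic", "funny", "offensive", "motivational")


@dataclass
class MemeAnnotation:
    """Hypothetical record for one annotated meme (field names are assumptions)."""

    image_path: str
    ocr_text: str
    sentiment: Sentiment
    # Maps an emotion type to an intensity level; 0 (or a missing key) means "not present".
    # Integer levels are an assumption made for this sketch.
    emotion_intensity: dict[str, int]

    def __post_init__(self) -> None:
        # Reject emotion names that are not part of the described label set.
        unknown = set(self.emotion_intensity) - set(EMOTION_TYPES)
        if unknown:
            raise ValueError(f"unknown emotion types: {sorted(unknown)}")

    def emotions_present(self) -> list[str]:
        """Return the emotion types with a non-zero intensity, in canonical order."""
        return [e for e in EMOTION_TYPES if self.emotion_intensity.get(e, 0) > 0]


example = MemeAnnotation(
    image_path="meme.jpg",
    ocr_text="when the build finally passes",
    sentiment=Sentiment.POSITIVE,
    emotion_intensity={"funny": 2, "sarcastic": 0},
)
print(example.emotions_present())
```

Keeping sentiment as a single label and emotions as a per-type intensity map mirrors the fact that one meme can be, for example, both funny and sarcastic to different degrees.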
Hateful Memes [🤗 Link]
- 12,840 memes consisting of images with text laid over them.
- Modality: images, embedded text
- Tasks: hate speech detection
YouTube2M [🤗 Link]
- 2 million YouTube videos shared on Reddit
- 62 unique tags
- 1,389,219 videos bearing the top 5 tags (70.7% of the dataset)
- Modalities: text, image
- Tasks:
  - Tagging: predicting appropriate “topic categories” for YouTube videos
  - Text generation: generating the titles / descriptions of the videos
- For ease of testing, we have also released a smaller sample of the dataset, 🤗 YouTube2000, with 2,000 samples (1,600 train, 200 validation, 200 test).
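To make the YouTube2000 split sizes concrete, here is a minimal sketch of a seeded 1,600 / 200 / 200 partition over 2,000 sample indices. How the released subset was actually sampled and split is not described here, so the function name `split_indices` and the shuffle-then-slice approach are assumptions for illustration only.

```python
import random


def split_indices(n, n_train, n_val, n_test, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into three disjoint parts."""
    if n_train + n_val + n_test != n:
        raise ValueError("split sizes must sum to n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG keeps the split reproducible
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(2000, 1600, 200, 200)
print(len(train), len(val), len(test))  # 1600 200 200
```

When the released train / validation / test files are available, prefer them over re-splitting so that results stay comparable.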
FakeNewsNet
- We consider two datasets under the misinformation detection theme.
- Modalities: news content (text), online posts (text), images, user metadata
  The datasets were originally curated by Shu et al. (GitHub).
- Tasks: misinformation detection
Project Structure
- mmsoc/: main package directory for the MMSoc project.
  - models/: sample code for using the dataset.
    - blip.py: implementation of the BLIP-2 (Bootstrapping Language-Image Pre-training) and InstructBLIP models, which are used for tasks that require joint image and text understanding.
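As a rough illustration of the kind of joint image and text query such models support, the sketch below asks a BLIP-2 checkpoint a question about a meme through the Hugging Face `transformers` library. It is not the code in `blip.py`: the helper names `build_prompt` and `ask_blip2`, the `Salesforce/blip2-opt-2.7b` checkpoint, and the "Question: ... Answer:" prompt format are assumptions chosen for this example.

```python
def build_prompt(question: str) -> str:
    """Format a question in the 'Question: ... Answer:' style often used with BLIP-2 OPT checkpoints."""
    return f"Question: {question.strip()} Answer:"


def ask_blip2(
    image_path: str,
    question: str,
    checkpoint: str = "Salesforce/blip2-opt-2.7b",  # assumed checkpoint, swap as needed
    max_new_tokens: int = 20,
) -> str:
    """Run one image + question through BLIP-2 and return the decoded output text."""
    # Heavy dependencies are imported lazily so build_prompt works without them installed.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(checkpoint)
    model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=build_prompt(question), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


print(build_prompt("Is this meme sarcastic? Answer yes or no."))
# Downloads several GB of weights on first use, so it is left commented out:
# print(ask_blip2("meme.png", "Is this meme sarcastic? Answer yes or no."))
```

Loading the processor and model inside the function keeps the sketch short; for a full evaluation run you would load them once and reuse them across samples.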
Installation
To get started with MMSoc, follow these steps:
- Clone the repository:
    git clone <repository-url>
    cd MMSoc
- Create a virtual environment (optional but recommended):
    conda create -n mmsoc python=3.11
    conda activate mmsoc
- Install the required packages:
    pip install -r requirements.txt
License
This repository is licensed under the Apache-2.0 License.