
LEAF: A Benchmark for Federated Settings

Resources

Homepage: leaf.cmu.edu
Paper: "LEAF: A Benchmark for Federated Settings"

Datasets

FEMNIST

Overview: Image Dataset
Details: 62 different classes (10 digits, 26 lowercase, 26 uppercase), images are 28 by 28 pixels (with an option to resize them all to 128 by 128 pixels), 3500 users
Task: Image Classification
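
The class count above is 10 + 26 + 26 = 62. A minimal sketch of one possible label set follows. The ordering (digits, then lowercase, then uppercase) simply mirrors the Details line and is an assumption, not necessarily the index mapping used in the generated data; FEMNIST_CLASSES and index_to_char are hypothetical names.

```python
import string

# Hypothetical label set: the ordering follows the Details line above and
# may differ from the class indices in the generated data files.
FEMNIST_CLASSES = string.digits + string.ascii_lowercase + string.ascii_uppercase

# 10 digits + 26 lowercase + 26 uppercase letters
NUM_CLASSES = len(FEMNIST_CLASSES)


def index_to_char(index: int) -> str:
    """Return the character for a class index under the assumed ordering."""
    return FEMNIST_CLASSES[index]
```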

Sentiment140

Overview: Text Dataset of Tweets
Details: 660120 users
Task: Sentiment Analysis

Shakespeare

Overview: Text Dataset of Shakespeare Dialogues
Details: 1129 users (reduced to 660 with our choice of sequence length. See bug.)
Task: Next-Character Prediction
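
As a rough illustration of how sequence length interacts with next-character prediction, the sketch below builds (input sequence, next character) pairs with a sliding window. The function name and the windowing scheme are assumptions for illustration, not the repository's preprocessing code. Under this scheme, a user whose text is no longer than the sequence length yields no samples, which is one plausible way a sequence-length choice could reduce the user count.

```python
from typing import List, Tuple


def next_char_samples(text: str, seq_len: int) -> List[Tuple[str, str]]:
    """Build (input sequence, next character) pairs with a sliding window.

    Hypothetical helper: each input is seq_len characters long and the
    target is the single character that follows it.
    """
    # range() is empty when len(text) <= seq_len, so short texts give no samples.
    return [
        (text[i:i + seq_len], text[i + seq_len])
        for i in range(len(text) - seq_len)
    ]
```

For example, next_char_samples("hello", 3) gives two pairs, while a two-character text with the same sequence length gives none.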

CelebA

Overview: Image Dataset based on the Large-scale CelebFaces Attributes Dataset
Details: 9343 users (we exclude celebrities with fewer than 5 images)
Task: Image Classification (Smiling vs. Not smiling)

Synthetic Dataset

Overview: We propose a process to generate synthetic, challenging federated datasets. The high-level goal is to create devices whose true models are device-dependent. For a description of the whole generative process, please refer to the paper.
Details: The user can customize the number of devices, the number of classes, and the number of dimensions, among other parameters.
Task: Classification

Reddit

Overview: We preprocess the Reddit data released by pushshift.io corresponding to December 2017.
Details: 1,660,820 users with a total of 56,587,343 comments.
Task: Next-word Prediction.

Notes

Install the libraries listed in requirements.txt.
- For example, with pip: run pip3 install -r requirements.txt
Go to the directory of the respective dataset for instructions on generating data.
- On macOS, check that wget is installed and working.
The models directory contains instructions on running the baseline reference implementations.
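
The wget check mentioned in the notes can be done from Python before generating data. This is a minimal sketch using only the standard library; find_tool is a hypothetical helper name, and it only confirms that the executable is on the PATH, not that downloads will succeed.

```python
import shutil
from typing import Optional


def find_tool(name: str = "wget") -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("wget was not found on PATH; install it before generating data.")
    else:
        print(f"wget found at {path}")
```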