Awesome

<center>COVID-DA: Deep Domain Adaptation from <br> Typical Pneumonia to COVID-19</center>

We provide the COVID-DA dataset for domain adaptation from typical pneumonia to COVID-19. The paper is available here.

Dataset

The descriptions for the COVID-DA dataset are presented below. We first provide a link to download the dataset. Next, the data statistics and usage of the dataset will be introduced.

Download

The dataset in this paper is available here.

Data Composition

To make up this dataset, we collected and integrated the following open-source datasets: https://github.com/ieee8023/covid-chestxray-dataset
https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data
https://www.kaggle.com/darshan1504/covid19-detection-xray-dataset
https://www.kaggle.com/andrewmvd/convid19-X-rays
https://www.kaggle.com/nabeelsajid917/covid-19-x-ray-10000-images
https://www.kaggle.com/usmantahirkiani/covid19-vs-healthy-xray
https://www.kaggle.com/tarandeep97/covid19-normal-posteroanteriorpaxrays

Data Structure and Statistics

The data structure:

all_data
└── all_data_pneumonia
|   |
|   ├── train
|   └── val 
|
└── all_data_covid
    |
    |── train
    |── val
    └── test

Statistics of the dataset are shown as follow:

Pneumonia ("all_data_pneumonia" sub-directory) serves as the source domain and COVID-19 ("all_data_covid" sub-directory) serves as the target domain.
You can refer to the paper for more details about the dataset.

Usage

In the directory ./data, there are two .pkl files which record the image lists and its corresponding labels. Specifically, an image and its label is stored in a tuple (image_name, label). "1" denotes class "pneumonia" and class "COVID-19" in source and target domain, respectively, while "0" denotes class "normal". You can read the data list following the below manner:

for the source domain (Pneumonia):

      with open('./data/pneumonia_task.pkl', 'rb') as f:
          train_dict = pickle.load(f)
      train_list = train_dict['train_list'] # train sub-directory
      val_list = train_dict['val_list'] # test sub-directory

for the target domain (COVID-19):

      with open('./data/COVID-19_task.pkl', 'rb') as f:
          train_dict = pickle.load(f)
      train_list_labeled = train_dict['train_list_labeled'] # labeled data (train sub-directory)
      train_list_unlabeled = train_dict['train_list_unlabeled'] # unlabeled data (train sub-directory)
      val_list = train_dict['val_list'] # val sub-directory
      test_list = train_dict['test_list'] # test sub-directory

For convenience, we provide .pkl files for both python 2 and 3, respectively.

According to the image lists, you can load images using Pillow:

      # e.g., for the source domain (Pneumonia)
      for img_tup in train_list:
          img = PIL.Image.open(os.path.join('all_data/all_data_pneumonia', 'train', img_tup[0])
          label = img_tup[1]

Citation

If you find the COVID-DA dataset useful, please cite the following paper:

@article{zhang2020covidda,
    title={COVID-DA: Deep Domain Adaptation from Typical Pneumonia to COVID-19},
    author={Yifan Zhang and Shuaicheng Niu and Zhen Qiu and Ying Wei and Peilin Zhao and Jianhua Yao and Junzhou Huang and Qingyao Wu and Mingkui Tan},
    journal={arXiv},
    year={2020},
}