
This repository contains the code for our paper "Fair and accurate age prediction using distribution aware data curation and augmentation".

Basic Overview

<p align="center"><img src="./results/Paper_Workflow.png"></p> <br>

Data

Our pipeline uses six datasets in total. For pre-training, we used the IMDB-WIKI dataset, which is split into two subsets: WIKI and IMDB. For analysis and for curating our Balanced Dataset, we used the UTK-Face, MORPH-2, Megaage-Asian, and APPA-REAL datasets. For the generalization test, we used the FG-NET dataset, which comes from an entirely different distribution. These datasets can be downloaded or purchased via the following links:

After downloading, move the datasets to the ./data folder and extract each one into its corresponding subfolder.
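
As a quick sanity check before pre-processing, the sketch below verifies that each dataset has been extracted into its own non-empty subfolder under ./data. The subfolder names are assumptions chosen for illustration; the names the scripts actually expect may differ, so adjust the list to your local layout.

```python
from pathlib import Path

# Hypothetical subfolder names; the repository's scripts may expect different ones.
EXPECTED_SUBFOLDERS = [
    "IMDB", "WIKI", "UTKFace", "MORPH2", "MegaAge_Asian", "APPA_REAL", "FGNET",
]


def missing_datasets(data_root, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolders that are absent or empty under data_root."""
    root = Path(data_root)
    missing = []
    for name in expected:
        folder = root / name
        # A dataset counts as present only if its folder exists and is not empty.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_datasets("./data")
    if absent:
        print("Missing or empty dataset folders:", ", ".join(absent))
    else:
        print("All expected dataset folders are present.")
```

Running this from the repository root lists any dataset folder that still needs to be downloaded or extracted.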

<br>

Data pre-processing

After downloading and extracting the data into the ./data folder, go to the pre-processing folder and run the following command to construct the Balanced Dataset.

python data_preprocess.py -dir <PATH_TO_DATA> -train_save_path <PATH_TO_TRAIN_DATA> -test_save_path <PATH_TO_TEST_DATA>
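
To give an intuition for what "balancing" an age dataset can mean, here is a minimal sketch that caps the number of samples per age label by random down-sampling. This is not the procedure implemented in data_preprocess.py; the function name `balance_by_age`, the `max_per_age` cap, and the `(image_path, age)` sample format are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_age(samples, max_per_age, seed=0):
    """Cap the number of samples per age label by random down-sampling.

    samples: list of (image_path, age) pairs (hypothetical format).
    max_per_age: hypothetical upper bound on samples kept for each age.
    """
    rng = random.Random(seed)

    # Group the samples by their age label.
    by_age = defaultdict(list)
    for path, age in samples:
        by_age[age].append((path, age))

    balanced = []
    for age in sorted(by_age):
        group = by_age[age]
        # Over-represented ages are down-sampled; rare ages are kept whole.
        if len(group) > max_per_age:
            group = rng.sample(group, max_per_age)
        balanced.extend(group)
    return balanced
```

For example, with ten images labeled 25 and three labeled 70, `balance_by_age(samples, max_per_age=4)` keeps four of the former and all three of the latter, which flattens the age histogram without discarding rare ages.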

Results

After balancing, the dataset has the following distribution: <img src="results/Balanced_distribution.png">

Training and Testing

Once the data is ready, run train.py to train the model and test.py to test it.

python train.py -datafolder <PATH_TO_DATA_FOLDER> -opt <OPT_METHOD> -train_path <PATH_TO_TRAIN_DATA> -test_path <PATH_TO_TEST_DATA> -model_name <MODEL_NAME> -dataset <DATASET_NAME> -num_epoches <NUM_EPOCHS> -lr <LEARNING_RATE> -pretrained_model <PATH_TO_PRETRAINED_MODEL>
python test.py -test_path <PATH_TO_TEST_DATA> -result_folder <PATH_TO_SAVE_RESULTS> -trained_model <PATH_TO_TRAINED_MODEL>

Data Augmentation and OOD Retrieval

After training, run data_augmentation.py to perform the augmentation and OOD selection, which produces the augmented data.

python data_augmentation.py -train_path <PATH_TO_TRAINING_DATA> -model_path <PATH_TO_TRAINED_MODEL> -in_path <PATH_TO_IN_DISTRIBUTION_DATA> -out_path <PATH_TO_OUT_OF_DISTRIBUTION_DATA> -batch_size <BATCH_SIZE> -quantile <QUANTILE_TO_SPLIT_DATA> -save_path <PATH_TO_SAVE_BALANCED_AUG_DATA> -aug_save_path <PATH_TO_SAVE_AUG_DATA>
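
The `-quantile` option suggests that samples are split by thresholding a score at a chosen quantile. The sketch below shows one common way to do such a split, assuming that each sample has a scalar OOD score, that a higher score means "more out-of-distribution", and that the threshold is taken from the scores of known in-distribution samples. How data_augmentation.py actually computes and uses the scores is not documented here, so the helper names `quantile` and `split_by_ood_score` are purely illustrative.

```python
def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a non-empty sequence."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def split_by_ood_score(in_dist_scores, candidates, q):
    """Split candidates at the q-quantile of the in-distribution scores.

    in_dist_scores: OOD scores of known in-distribution samples.
    candidates: list of (sample_id, score) pairs, e.g. augmented samples.
    Assumes a higher score means "more out-of-distribution".
    Returns (in_like, ood_like, threshold).
    """
    threshold = quantile(in_dist_scores, q)
    in_like = [c for c in candidates if c[1] <= threshold]
    ood_like = [c for c in candidates if c[1] > threshold]
    return in_like, ood_like, threshold
```

A larger `q` raises the threshold, so fewer candidates are labeled out-of-distribution; which side of the split is kept for training depends on the selection strategy.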

Results

Augmentation OOD-Scores

<center><img src="./results/distribution_augmentation.png" width='70%'></center>

Augmented Data Training and Testing

Similarly, run train.py and test.py to train and test the model on the augmented data.

python train.py -datafolder <PATH_TO_DATA_FOLDER> -opt <OPT_METHOD> -train_path <PATH_TO_TRAIN_DATA> -test_path <PATH_TO_TEST_DATA> -model_name <MODEL_NAME> -dataset <DATASET_NAME> -num_epoches <NUM_EPOCHS> -lr <LEARNING_RATE> -trained_model <PATH_TO_PRETRAINED_MODEL>