
Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation


A PyTorch suite to systematically evaluate different domain adaptation methods.

Requirements:

Python 3
PyTorch==1.7
NumPy==1.20.1
scikit-learn==0.24.1
pandas==1.2.4
skorch==0.10.0 (for DEV risk calculations)
openpyxl==3.0.7 (for classification reports)
wandb==0.12.7
Hydra==1.2.0
OmegaConf==2.2.3
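To check an existing environment against these pins, a small script based on the standard-library `importlib.metadata` (Python 3.8+) can help. This is a sketch, not part of the repository: it assumes PyTorch is installed under the distribution name `torch` and Hydra under `hydra-core`, and it treats a pin such as `1.7` as matching any `1.7.x` release.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Pins from the requirements list, keyed by PyPI distribution name.
# "torch" (PyTorch) and "hydra-core" (Hydra) are assumed distribution names.
PINNED: Dict[str, str] = {
    "torch": "1.7",
    "numpy": "1.20.1",
    "scikit-learn": "0.24.1",
    "pandas": "1.2.4",
    "skorch": "0.10.0",
    "openpyxl": "3.0.7",
    "wandb": "0.12.7",
    "hydra-core": "1.2.0",
    "omegaconf": "2.2.3",
}


def check_versions(pins: Dict[str, str] = PINNED) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for unsatisfied pins."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, pinned in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            mismatches[package] = None
            continue
        # Accept an exact match or a more specific release, e.g. "1.7.1+cu110" for "1.7".
        if not (installed == pinned or installed.startswith(pinned + ".")):
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    for package, installed in check_versions().items():
        print(f"{package}: expected {PINNED[package]}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at a compatible version.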

Installing

Clone the repository

git clone git@github.com:<repo>
cd bpda

Create a Python 3 conda environment

conda env create -f environment.yml

Ensure that the required temporary directories exist:

data
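If the directory is missing, it can be created from Python as well as from the shell. A minimal sketch, assuming `data` (relative to the repository root) is the only required directory, as listed above:

```python
from pathlib import Path
from typing import Iterable, List

# Directories listed as required in this README; extend if your setup needs more.
REQUIRED_DIRS = ("data",)


def ensure_dirs(root: str = ".", names: Iterable[str] = REQUIRED_DIRS) -> List[Path]:
    """Create each required directory under `root` if needed and return their paths."""
    ensured = []
    for name in names:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        ensured.append(path)
    return ensured


if __name__ == "__main__":
    ensure_dirs()
```

Running it repeatedly is safe because `exist_ok=True` leaves existing directories untouched.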

Datasets

Available Datasets

We used four public datasets in this study. We also provide the preprocessed versions as follows:

Adding a New Dataset

Structure of data

To add a new dataset (e.g., NewData), place it in a folder named NewData inside the datasets directory.

Because NewData has several domains, each domain should be split into train and test files named "train_x.pt" and "test_x.pt", where x identifies the domain.

Each data file should contain a dictionary of the form train_x.pt = {"samples": data, "labels": labels}, and similarly for test_x.pt.
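The layout above can be sketched as two small helpers: one that builds the expected file paths for a domain and one that builds the `{"samples": ..., "labels": ...}` dictionary. The helper names `split_paths` and `make_split` are hypothetical, and the domain id `"0"` in the example is only an illustration.

```python
from pathlib import Path
from typing import Any, Dict, Sequence


def split_paths(datasets_dir: str, dataset_name: str, domain_id: str) -> Dict[str, Path]:
    """Paths of the train/test files for one domain, e.g. NewData/train_0.pt."""
    root = Path(datasets_dir) / dataset_name
    return {
        "train": root / f"train_{domain_id}.pt",
        "test": root / f"test_{domain_id}.pt",
    }


def make_split(samples: Sequence[Any], labels: Sequence[Any]) -> Dict[str, Any]:
    """Build the dictionary stored in each .pt file; one label per sample."""
    if len(samples) != len(labels):
        raise ValueError(f"{len(samples)} samples but {len(labels)} labels")
    return {"samples": samples, "labels": labels}
```

With PyTorch installed, the dictionary is written with `torch.save(make_split(x_train, y_train), split_paths("datasets", "NewData", "0")["train"])`, where `x_train` and `y_train` are tensors (or arrays) whose first dimension indexes the samples.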

Configurations

Next, add a class named NewData to the configs/data_model_configs.py file; the classes for the existing datasets can serve as templates. Also specify the cross-domain scenarios in the self.scenarios variable.
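A minimal sketch of such a class is shown below. Only `self.scenarios` is named in this README; representing each scenario as a `(source, target)` pair of domain ids is an assumption, and the remaining attributes are hypothetical placeholders. Copy the real field names from an existing dataset class.

```python
from typing import List, Tuple


class NewData:
    """Hypothetical data/model configuration for the NewData dataset."""

    def __init__(self) -> None:
        # Cross-domain scenarios as (source, target) domain ids; the ids are
        # assumed to match the x in train_x.pt / test_x.pt.
        self.scenarios: List[Tuple[str, str]] = [("0", "1"), ("1", "0"), ("0", "2")]

        # Placeholder dataset/model fields (illustrative names and values).
        self.num_classes = 5
        self.input_channels = 1
        self.sequence_len = 128
```

A trainer would then loop over the pairs, e.g. `for src_id, trg_id in NewData().scenarios:`, loading `train_{src_id}.pt` as the labelled source and `train_{trg_id}.pt` as the unlabelled target.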

Finally, add another class named NewData to the configs/hparams.py file to specify its training hyperparameters.
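A sketch of what this hyperparameter class might look like follows. The attribute names `train_params` and `alg_hparams`, the helper `for_algorithm`, and all values are hypothetical; mirror an existing class in configs/hparams.py for the actual structure.

```python
from typing import Any, Dict


class NewData:
    """Hypothetical training hyperparameters for the NewData dataset."""

    def __init__(self) -> None:
        # Settings shared by every algorithm (illustrative, untuned values).
        self.train_params: Dict[str, Any] = {
            "num_epochs": 40,
            "batch_size": 32,
            "weight_decay": 1e-4,
        }
        # Per-algorithm settings keyed by algorithm class name (assumed layout).
        self.alg_hparams: Dict[str, Dict[str, Any]] = {
            "NewAlgorithm": {"learning_rate": 1e-3, "align_weight": 1.0},
        }

    def for_algorithm(self, name: str) -> Dict[str, Any]:
        """Merge shared settings with one algorithm's settings (algorithm values win)."""
        return {**self.train_params, **self.alg_hparams[name]}
```

Keeping shared and per-algorithm settings separate makes it easy to sweep an algorithm's parameters without touching the common training setup.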

Domain Adaptation Algorithms

Existing Algorithms

Adding a New Algorithm

To add a new algorithm, place it in the algorithms/algorithms.py file.
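The sketch below illustrates one common way such a file is organized: a shared base interface, one class per algorithm, and a lookup by class name so the trainer can select an algorithm from its configuration. The names `Algorithm`, `update`, and `get_algorithm_class` are assumptions, not confirmed repository APIs. The toy "alignment loss" (the gap between source and target means) stands in for the PyTorch losses and optimizer step a real method would use.

```python
from statistics import mean
from typing import Dict, Sequence, Type


class Algorithm:
    """Hypothetical common interface: one adaptation step per source/target batch."""

    def __init__(self, hparams: Dict[str, float]) -> None:
        self.hparams = hparams

    def update(self, src_batch: Sequence[float], trg_batch: Sequence[float]) -> Dict[str, float]:
        raise NotImplementedError


class NewAlgorithm(Algorithm):
    """Toy example that penalizes the gap between source and target feature means."""

    def update(self, src_batch: Sequence[float], trg_batch: Sequence[float]) -> Dict[str, float]:
        # First-moment (mean) discrepancy as a stand-in for a real alignment loss.
        align_loss = abs(mean(src_batch) - mean(trg_batch))
        return {"align_loss": self.hparams["align_weight"] * align_loss}


def get_algorithm_class(name: str) -> Type[Algorithm]:
    """Look up an algorithm class defined in this module by its name."""
    cls = globals().get(name)
    if not (isinstance(cls, type) and issubclass(cls, Algorithm)):
        raise NotImplementedError(f"Algorithm not found: {name}")
    return cls
```

For example, `get_algorithm_class("NewAlgorithm")({"align_weight": 2.0}).update([1.0, 3.0], [0.0, 0.0])` returns `{"align_loss": 4.0}`. Looking classes up by name means a new algorithm only has to be defined in this file to become selectable.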

Training procedure

To train the models, run:

./run.sh

To collect the results, run:

./collect_results.sh

Upper and Lower bounds

The main training script is trainer.py; when executed, it also produces the source-only results.

Results

Each run stores the results of all cross-domain scenarios in directories named runx_src_to_trg, where x is the run_id.
Each of these directories contains the classification report, a log file, a checkpoint, and the different risk scores.
After all runs finish, the overall average and standard deviation of the results are available in the run directory.
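The repository already writes these summaries; the sketch below only illustrates how the runx_src_to_trg naming maps to scenarios and how per-scenario averages across runs can be computed. It assumes one score per directory has already been read (for example, from the classification report) and uses the population standard deviation (ddof=0, NumPy's default). Whether the repository uses that convention is an assumption.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple

# Directory-name pattern from this README: run<run_id>_<src>_to_<trg>, e.g. "run1_2_to_11".
RUN_DIR = re.compile(r"^run(\d+)_(.+?)_to_(.+)$")


def parse_run_dir(name: str) -> Tuple[int, str, str]:
    """Split a directory name into (run_id, source domain, target domain)."""
    match = RUN_DIR.match(name)
    if match is None:
        raise ValueError(f"not a runx_src_to_trg directory: {name}")
    return int(match.group(1)), match.group(2), match.group(3)


def summarize(scores: Iterable[Tuple[str, float]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Mean and population std of one metric per (src, trg) scenario across runs."""
    per_scenario: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for dir_name, score in scores:
        _, src, trg = parse_run_dir(dir_name)
        per_scenario[(src, trg)].append(score)
    return {key: (mean(vals), pstdev(vals)) for key, vals in per_scenario.items()}
```

For instance, scores of 0.8 and 0.9 for `run1_2_to_11` and `run2_2_to_11` summarize to a mean of about 0.85 and a standard deviation of about 0.05 for scenario `("2", "11")`.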

References

Citation

@inproceedings{IWA23,
  title={Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation},
  author={Dinu, Marius-Constantin and Beck, Maximilian and Nguyen, Duc Hoan and Huber, Andrea and Eghbal-zadeh, Hamid and Moser, Bernhard A. and Pereverzyev, Sergei V. and Hochreiter, Sepp and Zellinger, Werner},
  booktitle={Submitted to The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=M95oDwJXayG},
  note={under review}
}