

A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm (UDIL)

This repo (built upon the amazing codebase of mammoth) contains the code for our NeurIPS 2023 paper:<br> A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm<br> Haizhou Shi, Hao Wang<br> Thirty-seventh Conference on Neural Information Processing Systems, 2023<br> [Paper] [OpenReview] [Slides] [Talk (Youtube)] [Talk (Bilibili)]

<p align="center"> <img src="fig/udil-overview.png" alt="" data-canonical-src="fig/udil-overview.png" width="100%"/> </p>


How does UDIL Unify Existing Methods?

Long story short: in the paper, we start by reiterating the learning objective of domain-incremental learning (which also holds for other types of continual learning). We then propose to combine three ways of upper-bounding the past-domain error (ERM, the intra-domain bound, and the cross-domain bound; see Chapter 3 in the paper) and assign an adaptive coefficient to each of the resulting training terms.
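To make the idea of combining bounds concrete, here is a minimal sketch (not code from this repo). It relies on one assumption that is not stated above: the coefficients for a given past domain are non-negative and sum to 1, so a weighted mix of three valid upper bounds is itself a valid upper bound. The function name `combined_bound` and all numbers are hypothetical.

```python
from typing import Dict

def combined_bound(bounds: Dict[str, float], coeffs: Dict[str, float]) -> float:
    """Convex combination of several upper bounds on the same error."""
    if bounds.keys() != coeffs.keys():
        raise ValueError("need exactly one coefficient per bound")
    if any(c < 0 for c in coeffs.values()) or abs(sum(coeffs.values()) - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1 (assumed)")
    return sum(coeffs[name] * bounds[name] for name in bounds)

# Made-up values of the three bounds for a single past domain.
bounds = {"erm": 0.30, "intra_domain": 0.22, "cross_domain": 0.41}
true_error = 0.18  # made-up; every bound above is >= this value

# One hypothetical choice of coefficients for that domain.
coeffs = {"erm": 0.2, "intra_domain": 0.5, "cross_domain": 0.3}
mixed = combined_bound(bounds, coeffs)

# Each bound is >= true_error, so their convex combination is too.
assert mixed >= true_error
print(f"combined bound: {mixed:.3f}")
```

Because the combination stays a valid bound for any coefficients on the simplex, the choice of coefficients only affects how tight it is.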

Here is the main theorem of our paper, which not only unifies current domain-incremental learning methods but also opens the possibility of minimizing a tighter bound in the next chapter.

<p align="center"> <img src="fig/thm.png" alt="" data-canonical-src="fig/thm.png" width="80%"/> </p>

The first main argument of our work is that, by fixing the values of the coefficients $\Omega=\{\alpha_i, \beta_i, \gamma_i\}$, the UDIL framework corresponds exactly to some existing methods, provided certain conditions are satisfied. Here we show the final unification result (refer to Appendix B in the paper for the derivation).

<p align="center"> <img src="fig/unification.png" alt="" data-canonical-src="fig/unification.png" width="80%"/> </p>

How does UDIL Lead to a Tighter Bound?

A natural question following the unification is: can we do better than using a single set of fixed coefficients to train a domain-incremental learning model? The answer is a firm YES. In this work, we parameterize the coefficients and optimize a tighter bound by adjusting them during model training. We know you are in a hurry, so here we give only an extremely brief review of how we form the final training objective.
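As a toy illustration of adjusting coefficients to tighten a bound (not this repo's implementation, which trains a model jointly), the sketch below keeps three bound values fixed and runs gradient descent on the coefficients only. The softmax parameterization, the function names, and the numbers are all assumptions made for illustration.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Map unconstrained logits to non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tighten(bounds: List[float], steps: int = 200, lr: float = 0.5) -> List[float]:
    """Gradient descent on logits to shrink sum_k softmax(logits)_k * bounds_k."""
    logits = [0.0] * len(bounds)  # start from uniform coefficients
    for _ in range(steps):
        w = softmax(logits)
        value = sum(wk * bk for wk, bk in zip(w, bounds))
        # For a softmax parameterization: d value / d logit_k = w_k * (b_k - value)
        grads = [wk * (bk - value) for wk, bk in zip(w, bounds)]
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return softmax(logits)

bounds = [0.30, 0.22, 0.41]            # made-up values of three bound terms
uniform = sum(bounds) / len(bounds)    # bound value with fixed, equal coefficients
coeffs = tighten(bounds)
tuned = sum(c * b for c, b in zip(coeffs, bounds))
print(f"fixed uniform coefficients: {uniform:.3f}, adjusted coefficients: {tuned:.3f}")
```

In this toy setting the weight simply drifts toward the smallest fixed term. In actual training the loss terms also depend on the model being trained, so the trade-off is less trivial than it appears here.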

<p align="center"> <img src="fig/udil-objective.png" alt="" data-canonical-src="fig/udil-objective.png" width="80%"/> </p> As you can see, our proposed algorithm has four kinds of differentiable loss terms in total:

Installing the Required Packages

conda create -n udil python=3.9
conda activate udil
conda install pytorch==1.12 torchvision cudatoolkit=11.3 -c pytorch
conda install wandb ipdb -c conda-forge
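After installation, an optional sanity check can confirm that the packages named in the commands above are visible to the active Python environment. This is a small sketch using only the standard library; the helper name `check_packages` is hypothetical and not part of this repo.

```python
import importlib.util
from typing import Dict, Iterable

def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Report which packages can be located, without actually importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Package names taken from the conda commands above.
status = check_packages(["torch", "torchvision", "wandb", "ipdb"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `udil` environment; any package reported as missing points to an install step that did not complete.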

Code for Running UDIL

Before you run the code, there are a couple of settings you might want to modify:

We have provided the commands to run UDIL on different datasets in the /scripts folder. Once everything is set up, a quick example of running UDIL on Permutated-MNIST is as follows:

chmod +x scripts/*.sh

This script will start a UDIL training process and log everything to your wandb project.

If you are in a hurry and just want a quick look at the training process and final results of UDIL on three different realistic datasets (Permutated-MNIST, Rotated-MNIST, and Seq-CORe50), you can check out the following public UDIL wandb project, where we have visualized everything you might care about!

Quantitative Results

Here we provide some quantitative results of UDIL.

<p align="center"> <img src="fig/table-pmnist.png" alt="" data-canonical-src="fig/table-pmnist.png" width="90%"/> </p> <p align="center"> <img src="fig/table-rmnist.png" alt="" data-canonical-src="fig/table-rmnist.png" width="90%"/> </p> <p align="center"> <img src="fig/table-core50.png" alt="" data-canonical-src="fig/table-core50.png" width="90%"/> </p>

Qualitative Results

Here we provide some qualitative results of UDIL, taken from the public UDIL wandb project. We show results on Rotated-MNIST only.

<p align="center"> <span>Accuracy Matrix after 20-Domain Training</span> <img src="fig/acc_matrix.png" alt="" data-canonical-src="fig/acc_matrix.png" width="80%"/> </p>
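For readers who want to summarize such a matrix numerically, here is a small sketch of two commonly used continual-learning summaries, average accuracy and average forgetting. It assumes a layout in which `acc[t][j]` is the accuracy on domain `j` after training on domain `t` (the figure's layout may differ), and the 3-domain matrix below is made up for illustration; these are not results from the paper.

```python
from typing import List

def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all domains after training on the last domain."""
    final = acc[-1]
    return sum(final) / len(final)

def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from each past domain's best earlier accuracy to its final accuracy."""
    num_domains = len(acc)
    drops = []
    for j in range(num_domains - 1):  # the last domain cannot be forgotten yet
        best_earlier = max(acc[t][j] for t in range(j, num_domains - 1))
        drops.append(best_earlier - acc[-1][j])
    return sum(drops) / len(drops)

# Made-up 3-domain accuracy matrix (rows: training stage, columns: evaluated domain).
toy_acc = [
    [0.95, 0.40, 0.30],
    [0.90, 0.94, 0.45],
    [0.88, 0.91, 0.93],
]
print(f"average accuracy:   {average_accuracy(toy_acc):.4f}")
print(f"average forgetting: {average_forgetting(toy_acc):.4f}")
```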

Below are visualizations of the embedding distributions of different classes and domains, where:

<p align="center"> <span>Embedding Space Visualization after 1-Domain Training</span> <img src="fig/embedding1.png" alt="" data-canonical-src="fig/embedding1.png" width="90%"/> </p> <p align="center"> <span>Embedding Space Visualization after 20-Domain Training</span> <img src="fig/embedding2.png" alt="" data-canonical-src="fig/embedding2.png" width="90%"/> </p>

Also Check Our Relevant Work on Domain Adaptation

<span id="paper_1">[1] Domain-Indexing Variational Bayes: Interpretable Domain Index for Domain Adaptation<br></span> Zihao Xu*, Guang-Yuan Hao*, Hao He, Hao Wang<br> Eleventh International Conference on Learning Representations (ICLR), 2023<br> [Paper] [OpenReview] [PPT] [Talk (Youtube)] [Talk (Bilibili)]

<span id="paper_2">[2] Graph-Relational Domain Adaptation<br></span> Zihao Xu, Hao He, Guang-He Lee, Yuyang Wang, Hao Wang<br> Tenth International Conference on Learning Representations (ICLR), 2022<br> [Paper] [Code] [Talk] [Slides]

<span id="paper_3">[3] Continuously Indexed Domain Adaptation<br></span> Hao Wang*, Hao He*, Dina Katabi<br> Thirty-Seventh International Conference on Machine Learning (ICML), 2020<br> [Paper] [Code] [Talk] [Blog] [Slides] [Website]


A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm

  title={A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm},
  author={Shi, Haizhou and Wang, Hao},
  booktitle={Advances in Neural Information Processing Systems},