Awesome-Forgetting-in-Deep-Learning
<img src="https://img.shields.io/badge/Contributions-Welcome-278ea5" alt=""/>
A comprehensive list of papers about 'A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning'.
Abstract
Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While existing surveys on forgetting have focused primarily on continual learning, forgetting is a prevalent phenomenon in various other research domains within deep learning. For example, it manifests in generative models due to generator shifts and in federated learning due to heterogeneous data distributions across clients. Addressing forgetting involves several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference under conflicting goals, and preventing privacy leakage. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and to highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications.
Citation
If you find our paper or this resource helpful, please consider citing:
@article{Forgetting_Survey_2024,
title={A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning},
author={Wang, Zhenyi and Yang, Enneng and Shen, Li and Huang, Heng},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2024},
publisher={IEEE}
}
Thanks!
Framework
- Harmful Forgetting
- Beneficial Forgetting
Harmful Forgetting
Harmful forgetting occurs when we desire the machine learning model to retain previously learned knowledge while adapting to new tasks, domains, or environments. In such cases, it is important to prevent and mitigate knowledge forgetting.
Problem Setting | Goal | Source of forgetting |
---|---|---|
Continual Learning | learn from a non-stationary data distribution without forgetting previous knowledge | data-distribution shift during training |
Foundation Model | unsupervised learning on large-scale unlabeled data | data-distribution shift in pre-training and fine-tuning |
Domain Adaptation | adapt to the target domain while maintaining performance on the source domain | target domain shifts sequentially over time |
Test-time Adaptation | mitigate the distribution gap between training and testing | adaptation to the test data distribution during testing |
Meta-Learning | learn knowledge adaptable to new tasks | incrementally meta-learning new classes / task-distribution shift |
Generative Model | learn a generator to approximate the real data distribution | generator shift / data-distribution shift |
Reinforcement Learning | maximize cumulative reward | shifts in states, actions, rewards, and state-transition dynamics |
Federated Learning | decentralized training without sharing data | model averaging; non-i.i.d. data; data-distribution shift |
Links:
<u> Forgetting in Continual Learning </u> |
<u> Forgetting in Foundation Models </u> |
<u> Forgetting in Domain Adaptation</u> |
<u> Forgetting in Test-Time Adaptation</u> |
<u> Forgetting in Meta-Learning </u>|
<u> Forgetting in Generative Models </u>|
<u> Forgetting in Reinforcement Learning</u> |
<u> Forgetting in Federated Learning</u>
Forgetting in Continual Learning
<a href="#top">[Back to top]</a>
The goal of continual learning (CL) is to learn from a sequence of tasks without forgetting the knowledge acquired on previous tasks.
Links: <u> Task-aware CL </u>| <u> Task-free CL </u>| <u> Online CL </u>| <u> Semi-supervised CL </u>| <u> Few-shot CL </u>| <u> Unsupervised CL </u>| <u> Theoretical Analysis </u>
Survey and Book
Task-aware CL
<a href="#top">[Back to top]</a>
Task-aware CL focuses on addressing scenarios where explicit task definitions, such as task IDs or labels, are available during the CL process. Existing methods on task-aware CL have explored five main branches: Memory-based Methods | Architecture-based Methods | Regularization-based Methods | Subspace-based Methods | Bayesian Methods.
Memory-based Methods
<a href="#top">[Back to top]</a>
Memory-based (or rehearsal-based) methods maintain a memory buffer that stores examples or knowledge from previous tasks and replay them while learning new tasks.
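As a concrete illustration, here is a minimal sketch of the rehearsal idea (class and function names are illustrative, not from any specific paper): a reservoir-sampled buffer stores past examples and mixes them into each new-task batch.

```python
import random
import torch

class ReplayBuffer:
    """Reservoir-sampled buffer of (x, y) pairs from the data stream."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.num_seen = 0

    def add(self, x, y):
        # Reservoir sampling keeps the buffer an approximately uniform
        # sample over everything seen so far.
        for xi, yi in zip(x, y):
            self.num_seen += 1
            if len(self.data) < self.capacity:
                self.data.append((xi, yi))
            else:
                j = random.randrange(self.num_seen)
                if j < self.capacity:
                    self.data[j] = (xi, yi)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def train_step(model, optimizer, criterion, buffer, x_new, y_new, replay_bs=32):
    x, y = x_new, y_new
    if buffer.data:  # rehearse stored examples alongside the new batch
        x_old, y_old = buffer.sample(replay_bs)
        x, y = torch.cat([x_new, x_old]), torch.cat([y_new, y_old])
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    buffer.add(x_new, y_new)
    return loss.item()
```

Reservoir sampling is just one common design choice here; many methods instead select buffer contents by herding, gradient diversity, or other criteria.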
Architecture-based Methods
<a href="#top">[Back to top]</a>
Architecture-based approaches avoid forgetting by reducing parameter sharing between tasks or by allocating new parameters to new tasks.
Regularization-based Methods
<a href="#top">[Back to top]</a>
Regularization-based approaches avoid forgetting by penalizing updates to parameters that are important for previous tasks, or by distilling knowledge using the previous model as a teacher.
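For instance, a simplified sketch of an EWC-style penalty (the diagonal Fisher approximation below is a simplifying assumption of this sketch): parameters important for the old task, as measured by squared gradients, are anchored to their old values.

```python
import torch

def estimate_fisher(model, criterion, old_task_loader):
    """Diagonal Fisher approximation: mean squared gradient per parameter."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in old_task_loader:
        model.zero_grad()
        criterion(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(old_task_loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """Quadratic penalty anchoring important parameters to their old values."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# After finishing the old task:
#   old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
#   fisher = estimate_fisher(model, criterion, old_task_loader)
# While training the new task:
#   loss = criterion(model(x), y) + ewc_penalty(model, fisher, old_params)
```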
Subspace-based Methods
<a href="#top">[Back to top]</a>
Subspace-based methods perform CL in multiple disjoint subspaces to avoid interference between multiple tasks.
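A minimal sketch of the core operation, gradient projection (simplified from the spirit of OGD/GPM-style methods): the new task's gradient is projected onto the orthogonal complement of a subspace deemed important for previous tasks.

```python
import torch

def project_orthogonal(grad, basis):
    """Project `grad` onto the orthogonal complement of span(basis).

    grad:  flattened gradient, shape (d,)
    basis: orthonormal columns spanning the protected subspace, shape (d, k)
    """
    if basis is None or basis.numel() == 0:
        return grad
    return grad - basis @ (basis.T @ grad)

# Sketch of use inside a training step (the basis could come from, e.g., an
# SVD of stored old-task gradients or layer activations):
#   flat = torch.cat([p.grad.view(-1) for p in model.parameters()])
#   flat = project_orthogonal(flat, basis)
#   # ...unflatten `flat` back into each p.grad, then call optimizer.step()
```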
Bayesian Methods
<a href="#top">[Back to top]</a>
Bayesian methods provide a principled probabilistic framework for addressing forgetting.
Task-free CL
<a href="#top">[Back to top]</a>
Task-free CL refers to the scenario in which the learning system does not have access to any explicit task information.
Online CL
<a href="#top">[Back to top]</a>
In online CL, the learner is only allowed to process the data for each task once.
The presence of imbalanced data streams in CL (especially online CL) has drawn significant attention, primarily due to its prevalence in real-world application scenarios.
Semi-supervised CL
<a href="#top">[Back to top]</a>
Semi-supervised CL is an extension of traditional CL that allows each task to incorporate unlabeled data as well.
Few-shot CL
<a href="#top">[Back to top]</a>
Few-shot CL refers to the scenario where a model needs to learn new tasks with only a limited number of labeled examples per task while retaining knowledge from previously encountered tasks.
Unsupervised CL
<a href="#top">[Back to top]</a>
Unsupervised CL (UCL) assumes that only unlabeled data is provided to the CL learner.
Theoretical Analysis
<a href="#top">[Back to top]</a>
Theoretical analyses of continual learning.
Forgetting in Foundation Models
<a href="#top">[Back to top]</a>
Foundation models are large machine learning models trained on a vast quantity of data at scale, such that they can be adapted to a wide range of downstream tasks.
Links: Forgetting in Fine-Tuning Foundation Models | Forgetting in One-Epoch Pre-training | CL in Foundation Model
Forgetting in Fine-Tuning Foundation Models
<a href="#top">[Back to top]</a>
When fine-tuning a foundation model, there is a tendency to forget the pre-trained knowledge, resulting in sub-optimal performance on downstream tasks.
Forgetting in One-Epoch Pre-training
<a href="#top">[Back to top]</a>
Foundation models are often trained on their datasets for only a single pass. As a result, earlier examples encountered during pre-training may be overwritten or forgotten more quickly than later ones.
CL in Foundation Model
<a href="#top">[Back to top]</a>
By leveraging the powerful feature extraction capabilities of foundation models, researchers have been able to explore new avenues for advancing continual learning techniques.
Forgetting in Domain Adaptation
<a href="#top">[Back to top]</a>
The goal of domain adaptation is to transfer the knowledge from a source domain to a target domain.
Paper Title | Year | Conference/Journal |
---|---|---|
Towards Cross-Domain Continual Learning | 2024 | ICDE |
Continual Source-Free Unsupervised Domain Adaptation | 2023 | International Conference on Image Analysis and Processing |
CoSDA: Continual Source-Free Domain Adaptation | 2023 | arXiv |
Lifelong Domain Adaptation via Consolidated Internal Distribution | 2022 | NeurIPS |
Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions | 2022 | ECCV |
FRIDA -- Generative Feature Replay for Incremental Domain Adaptation | 2022 | CVIU |
Unsupervised Continual Learning for Gradually Varying Domains | 2022 | CVPRW |
Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning | 2021 | CVPR |
Gradient Regularized Contrastive Learning for Continual Domain Adaptation | 2021 | AAAI |
Learning to Adapt to Evolving Domains | 2020 | NeurIPS |
AdaGraph: Unifying Predictive and Continuous Domain Adaptation through Graphs | 2019 | CVPR |
ACE: Adapting to Changing Environments for Semantic Segmentation | 2019 | ICCV |
Adapting to Continuously Shifting Domains | 2018 | ICLRW |
Forgetting in Test-Time Adaptation
<a href="#top">[Back to top]</a>
Test-time adaptation (TTA) refers to the process of adapting a pre-trained model on-the-fly to unlabeled test data during inference or testing.
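As one concrete example of this setting, here is a minimal TENT-style sketch (simplified; `collect_norm_params` and `adapt_step` are illustrative names): the model adapts by minimizing prediction entropy on unlabeled test batches, updating only normalization-layer affine parameters.

```python
import torch
import torch.nn as nn

def collect_norm_params(model):
    """Gather only normalization-layer affine parameters for adaptation."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            params.extend(p for p in m.parameters() if p.requires_grad)
    return params

def prediction_entropy(logits):
    probs = logits.softmax(dim=-1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()

def adapt_step(model, optimizer, x_test):
    optimizer.zero_grad()
    loss = prediction_entropy(model(x_test))  # no labels needed at test time
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.SGD(collect_norm_params(model), lr=1e-3)
```

Restricting updates to a small parameter subset is one common way to limit drift; unconstrained test-time updates can themselves cause forgetting of the source knowledge, which is exactly the failure mode this branch studies.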
Forgetting in Meta-Learning
<a href="#top">[Back to top]</a>
Meta-learning, also known as learning to learn, focuses on developing algorithms and models that can learn from previous learning experiences to improve their ability to learn new tasks or adapt to new domains more efficiently and effectively.
Links: Incremental Few-Shot Learning | Continual Meta-Learning
Incremental Few-Shot Learning
<a href="#top">[Back to top]</a>
Incremental few-shot learning (IFSL) focuses on the challenge of learning new categories with limited labeled data while retaining knowledge about previously learned categories.
Continual Meta-Learning
<a href="#top">[Back to top]</a>
The goal of continual meta-learning (CML) is to address the challenge of forgetting in non-stationary task distributions.
Forgetting in Generative Models
<a href="#top">[Back to top]</a>
The goal of a generative model is to learn a generator that can generate samples from a target distribution.
Links: GAN Training is a Continual Learning Problem | Lifelong Learning of Generative Models
GAN Training is a Continual Learning Problem
<a href="#top">[Back to top]</a>
Treating GAN training as a continual learning problem.
Lifelong Learning of Generative Models
<a href="#top">[Back to top]</a>
The goal is to develop generative models that can continually generate high-quality samples for both new and previously encountered tasks.
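One widely used idea in this space is generative replay; the sketch below (illustrative names; `latent_dim` is an assumed generator attribute) mixes pseudo-samples from a frozen copy of the previous generator into the current task's training data.

```python
import copy
import torch

def make_training_batch(generator_old, real_x, replay_frac=0.5):
    """Mix current-task real data with pseudo-samples replayed by a frozen
    copy of the previous generator."""
    if generator_old is None:  # first task: nothing to replay yet
        return real_x
    n_replay = int(len(real_x) * replay_frac)
    with torch.no_grad():
        z = torch.randn(n_replay, generator_old.latent_dim)  # assumed attribute
        replay_x = generator_old(z)
    return torch.cat([real_x, replay_x])

# After finishing task t, freeze a copy for replay on task t+1:
#   generator_old = copy.deepcopy(generator).eval()
```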
Forgetting in Reinforcement Learning
<a href="#top">[Back to top]</a>
Reinforcement learning is a machine learning technique that allows an agent to learn how to behave in an environment by trial and error, through rewards and punishments.
Forgetting in Federated Learning
<a href="#top">[Back to top]</a>
Federated learning (FL) is a decentralized machine learning approach where the training process takes place on local devices or edge servers instead of a centralized server.
Links: Forgetting Due to Non-IID Data in FL | Federated Continual Learning
Forgetting Due to Non-IID Data in FL
<a href="#top">[Back to top]</a>
This branch pertains to the forgetting problem caused by the inherent non-IID (not identically and independently distributed) data among different clients participating in FL.
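To make the source of this forgetting concrete, here is a minimal FedAvg sketch (helper names are illustrative): each client drifts toward its own skewed distribution during local training, and the server-side weighted average can then degrade on classes that are rare on most clients.

```python
import copy
import torch

def local_update(global_model, loader, criterion, epochs=1, lr=0.01):
    """Train a copy of the global model on one client's (skewed) local data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
    return model.state_dict(), len(loader.dataset)

def fedavg(client_states, client_sizes):
    """Server-side weighted average of client models by local dataset size."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key].float() * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg  # load with global_model.load_state_dict(avg)
```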
Federated Continual Learning
<a href="#top">[Back to top]</a>
This branch addresses the issue of continual learning within each individual client in the federated learning process, which results in forgetting at the overall FL level.
Beneficial Forgetting
<a href="#top">[Back to top]</a> Beneficial forgetting arises when the model contains private information that could lead to privacy breaches or when irrelevant information hinders the learning of new tasks. In these situations, forgetting becomes desirable as it helps protect privacy and facilitate efficient learning by discarding unnecessary information.
Problem Setting | Goal |
---|---|
Mitigate Overfitting | mitigate memorization of training data through selective forgetting |
Debias and Forget Irrelevant Information | forget biased information to achieve better performance or remove irrelevant information to learn new tasks |
Machine Unlearning | forget some specified training data to protect user privacy |
Links: <u>Combat Overfitting Through Forgetting</u> | <u>Learning New Knowledge Through Forgetting Previous Knowledge</u> | <u>Machine Unlearning</u>
Forgetting Irrelevant Information to Achieve Better Performance
<a href="#top">[Back to top]</a>
Combat Overfitting Through Forgetting
<a href="#top">[Back to top]</a>
Overfitting in neural networks occurs when the model excessively memorizes the training data, leading to poor generalization. To address overfitting, it is necessary to selectively forget irrelevant or noisy information.
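One illustrative mechanism, in the spirit of forget-and-relearn schemes (the layer split below is a hypothetical choice of this sketch, not a prescription from the survey): periodically re-initialize the later layers, which tend to memorize, and continue training.

```python
import torch.nn as nn

def reset_later_layers(model, from_index):
    """Re-initialize all top-level submodules at or after `from_index`."""
    for i, child in enumerate(model.children()):
        if i >= from_index:
            for m in child.modules():
                if hasattr(m, "reset_parameters"):
                    m.reset_parameters()

# Training-loop sketch: every K epochs, forget the later layers, then relearn.
#   for epoch in range(num_epochs):
#       train_one_epoch(model, loader)
#       if (epoch + 1) % K == 0:
#           reset_later_layers(model, from_index=2)
```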
Learning New Knowledge Through Forgetting Previous Knowledge
<a href="#top">[Back to top]</a>
"Learning to forget" suggests that not all previously acquired prior knowledge is helpful for learning new tasks.
Machine Unlearning
<a href="#top">[Back to top]</a>
Machine unlearning, a recent area of research, addresses the need to forget previously learned training data in order to protect user data privacy.
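At its simplest, exact unlearning can be achieved by retraining from scratch on the retained data, as in the sketch below (function names are illustrative); shard-based schemes such as SISA amortize this cost by retraining only the affected shards.

```python
from torch.utils.data import DataLoader, Subset

def unlearn_by_retraining(make_model, dataset, forget_indices, train_fn):
    """Exact unlearning: retrain a fresh model on the retained data only.

    make_model: () -> freshly initialized model
    train_fn:   (model, loader) -> trained model
    """
    forget = set(forget_indices)
    keep = [i for i in range(len(dataset)) if i not in forget]
    retain_loader = DataLoader(Subset(dataset, keep), batch_size=64, shuffle=True)
    model = make_model()  # discard all old weights, so nothing can leak
    return train_fn(model, retain_loader)
```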
Star History
Contact
We welcome all researchers to contribute to this repository on forgetting in deep learning.
Email: wangzhenyineu@gmail.com | ennengyang@stumail.neu.edu.cn