# Class-incremental Learning for Time Series: Benchmark and Evaluation
A unified experimental framework for Time Series Class-Incremental Learning (TSCIL) based on PyTorch. The paper has been accepted by SIGKDD 2024. Our CIL benchmarks are established with open-source real-world time series datasets. Based on these, our toolkit provides a simple way to customize the continual learning settings. Hyperparameter selection is based on Ray Tune.
## What's new

- Jun 2024: Included FastICARL in the toolkit.
- May 2024: Our TSCIL paper has been accepted by SIGKDD 2024 (ADS track).
- Feb 2024: Release of the TSCIL toolkit.
## Requirements

### Create Conda Environment

- Create the environment from the file:

  ```sh
  conda env create -f environment.yml
  ```

- Activate the `tscl` environment:

  ```sh
  conda activate tscl
  ```
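Optionally, verify that the environment resolved correctly. The snippet below only assumes that PyTorch is installed by `environment.yml` (the toolkit is PyTorch-based):

```python
# Quick sanity check inside the activated `tscl` environment.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible
```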
## Dataset

### Available Datasets

- UCI-HAR
- Uwave
- Dailysports
- GrabMyo
- WISDM
### Data Preparation

We process each dataset individually by executing the corresponding `.py` files located in the `data` directory. This produces training and test `np.array` data, which are saved as `.pkl` files in `data/saved`. The samples are processed into the shape of `(L, C)`.
For datasets comprising discrete sequences (UCI-HAR, Uwave and Dailysports), we directly use their original raw sequences as samples. For datasets comprising long-term, continuous signals (GrabMyo and WISDM), we apply sliding-window techniques to segment these signals into appropriately shaped samples (downsampling may be applied before window sliding). If the original dataset is not pre-divided into training and test sets, a manual train-test split is conducted. Information about the processed data can be found in `utils/setup_elements.py`. The saved data are not preprocessed with normalization due to the continual learning setup; instead, we add a non-trainable input normalization layer before the encoder to perform sample-wise normalization.
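As an illustration of the windowing step, a minimal sliding-window segmentation could look like the sketch below; the window length, stride and downsampling factor are placeholders, not the values used in our preprocessing scripts.

```python
import numpy as np

def segment_signal(signal: np.ndarray, window: int, stride: int, downsample: int = 1) -> np.ndarray:
    """Cut one long recording of shape (T, C) into samples of shape (window, C)."""
    signal = signal[::downsample]  # optional downsampling before window sliding
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])  # shape (N, L, C), with L = window

# Example: a 6-channel recording segmented into 128-step windows with 50% overlap (illustrative values).
samples = segment_signal(np.random.randn(10_000, 6), window=128, stride=64)
print(samples.shape)  # (155, 128, 6)
```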
For convenience, we provide the processed data files for direct download. Please check the "Setup" part of the "Getting Started" section.
### Adding New Dataset

- Create a new python file in the `data` directory for the new dataset.
- Format the data into discrete samples as numpy arrays, ensuring each sample maintains the shape of `(L, C)`. Use downsampling or sliding windows if needed.
- If the dataset is not pre-divided into training and test subsets, perform the train-test split manually.
- Save the numpy arrays of training data, training labels, test data, and test labels as `x_train.pkl`, `state_train.pkl`, `x_test.pkl` and `state_test.pkl` in a new folder in `data/saved` (see the sketch after this list).
- Finally, add the necessary information of the dataset in `utils/setup_elements.py`.
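For reference, the saving step might look like the sketch below; the folder name `my_dataset` and the array shapes are placeholders.

```python
import os
import pickle
import numpy as np

# Dummy arrays standing in for the processed dataset (shapes are illustrative).
x_train, y_train = np.random.randn(100, 128, 6), np.random.randint(0, 5, size=100)
x_test, y_test = np.random.randn(20, 128, 6), np.random.randint(0, 5, size=20)

out_dir = os.path.join('data', 'saved', 'my_dataset')  # placeholder dataset folder
os.makedirs(out_dir, exist_ok=True)

# File names follow the layout described above.
for name, array in [('x_train.pkl', x_train), ('state_train.pkl', y_train),
                    ('x_test.pkl', x_test), ('state_test.pkl', y_test)]:
    with open(os.path.join(out_dir, name), 'wb') as f:
        pickle.dump(array, f)
```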
## Continual Learning Algorithms

### Existing Algorithms
Regularization-based:
Replay-based:
### Adding New Algorithm

- Create a new python file in the `agent` directory for the new algorithm.
- Create a subclass that inherits from the `BaseLearner` class in `agent/base.py`.
- Customize methods including `train_epoch()`, `after_task()`, `learn_task()` and so on, based on your needs (a skeleton is sketched after this list).
- Add the new algorithm to `agents` in `agents/utils/name_match.py`. If a memory buffer is used, add it to `agents_replay` as well.
- Add the hyperparameters and their ranges for the new algorithm to `config_cl` within `experiment/tune_config.py`.
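A bare-bones agent skeleton is sketched below. Only the class name `BaseLearner` and the method names `learn_task()`, `train_epoch()` and `after_task()` come from the steps above; the file name, constructor signature, method arguments and attributes are assumptions about the interface, so check `agent/base.py` for the actual one.

```python
# agent/my_agent.py (hypothetical file name)
from agent.base import BaseLearner  # import path assumed from the README


class MyAgent(BaseLearner):
    """Skeleton of a new continual learning agent."""

    def __init__(self, model, args):
        # The real BaseLearner constructor may expect different arguments.
        super().__init__(model, args)

    def learn_task(self, task):
        # Typically: build the dataloader for the new task, call train_epoch()
        # for several epochs, then finish with after_task().
        raise NotImplementedError

    def train_epoch(self, dataloader, epoch):
        # One pass over the current task's data; add the algorithm-specific loss here.
        raise NotImplementedError

    def after_task(self, x_train, y_train):
        # E.g. update the memory buffer or consolidate regularization statistics.
        raise NotImplementedError
```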
## Getting Started

### Setup

- Download the processed data from Google Drive, put it into `data/saved` and unzip it:

  ```sh
  cd data/saved
  unzip <dataset>.zip
  ```

  Alternatively, you can download the raw datasets and process the data with the corresponding python files.

- Revise the following configurations to suit your device:
  - `resources` in `tune_hyper_params` in `experiment/tune_and_exp.py` (see here for details; a typical resource spec is sketched after this list)
  - GPU numbers in the `.sh` files in `shell`.
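For reference, per-trial resources in Ray Tune are typically declared as a small dict of CPU/GPU shares; the exact structure of `tune_hyper_params` in `experiment/tune_and_exp.py` may differ, so treat the line below as an illustration only.

```python
# Illustrative Ray Tune per-trial resource spec, not the toolkit's actual value.
resources = {"cpu": 4, "gpu": 0.5}  # fractional GPUs let two trials share one device
```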
### Run Experiment

There are two entry points for running experiments. Set the arguments in the corresponding files or in the command line.

- Run CIL experiments with custom configurations in `main_config.py`. Note that this entry point cannot tune or change the hyperparameters across multiple runs, so it is recommended for sanity checks or debugging:

  ```sh
  python main_config.py
  ```
- Tune the hyperparameters on the `Val Tasks` first, and then use the best hyperparameters to run the experiment on the `Exp Tasks`:

  ```sh
  python main_tune.py --data DATA_NAME --agent AGENT_NAME --norm BN/LN
  ```

  To run multiple experiments, you can revise the script `shell/tune_and_exp.sh` and call it:

  ```sh
  nohup sh shell/tune_and_exp.sh &
  ```

  To reproduce the results in the paper, use the corresponding `.sh` files:

  ```sh
  nohup sh shell/{data}_all_exp.sh &
  ```
We run each experiment multiple times to compute the average performance. In each run, we randomize the class order and tune the best hyperparameters, so the hyperparameters differ across runs. The search grid of hyperparameters is set in `experiment/tune_config.py` (an illustrative grid is sketched below). Experiment results are saved as logs in `result/tune_and_exp`.
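For illustration, a search grid in Ray Tune style might be declared as below; the keys and value ranges are placeholders, not the actual entries of `config_cl`.

```python
from ray import tune

# Hypothetical excerpt of an agent's hyperparameter grid using Ray Tune's search space API.
config_cl_example = {
    'lr': tune.grid_search([1e-3, 1e-4]),               # learning-rate candidates (illustrative)
    'lambda_reg': tune.grid_search([0.1, 1.0, 10.0]),   # regularization strength (illustrative)
}
```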
### Custom Experiment Setup

Change the configurations in:

- `utils/setup_elements.py`: parameters for the data and task stream, including the number of tasks, number of classes per task, and task split.
- `experiment/tune_config.py`: parameters for `main_tune.py` experiments, such as memory budget, classifier type, number of runs, agent-specific parameters, etc.
For ablation studies, revise the corresponding parameters in `experiment/tune_config.py` and rerun the experiments.

For online continual learning, set `epochs` to 1 and `er_mode` to `online`. (beta)
## Acknowledgements
Our implementation uses the source code from the following repositories:
- Framework & Buffer & LwF & ER & ASER: Online Continual Learning in Image Classification: An Empirical Survey
- EWC & SI & MAS: Avalanche: an End-to-End Library for Continual Learning
- DER: Mammoth - An Extendible (General) Continual Learning Framework for Pytorch
- DeepInversion: Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion
- Herding & Mnemonics: Mnemonics Training: Multi-Class Incremental Learning without Forgetting
- Soft-DTW: Soft DTW for PyTorch in CUDA
- CNN: AdaTime: A Benchmarking Suite for Domain Adaptation on Time Series Data
- TST & lr scheduler: PatchTST: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
- Generator: TimeVAE for Synthetic Timeseries Data Generation
## Contact
For any issues/questions regarding the repo, please contact the following.
Zhongzheng Qiao - qiao0020@e.ntu.edu.sg
School of Electrical and Electronic Engineering (EEE), Nanyang Technological University (NTU), Singapore.