Awesome
(NeurIPS 2023) OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling
This codebase is the official implementation of OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling
(NeurIPS 2023) and Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt
🔥 Update
- [2023-09-22]: ⭐️ Paper online. Check out Detect-then-Adapt for details.
- [2023-09-22]: ⭐️ Paper online. Check out OneNet for details.
- [2023-09-20]: 🚀🚀 Codes released.
Introduction for OneNet
Online updating of time series forecasting models aims to address the concept drifting problem by efficiently updating forecasting models based on streaming data. Many algorithms are designed for online time series forecasting, with some exploiting cross-variable dependency while others assume independence among variables. Given every data assumption has its own pros and cons in online time series modeling, we propose Online ensembling Network (OneNet). It dynamically updates and combines two models, with one focusing on modeling the dependency across the time dimension and the other on cross-variate dependency. Our method incorporates a reinforcement learning-based approach into the traditional online convex programming framework, allowing for the linear combination of the two models with dynamically adjusted weights. OneNet addresses the main shortcomings of classical online learning methods that tend to be slow in adapting to the concept drift. Empirical results show that OneNet reduces online forecasting error by more than $50$% compared to the State-Of-The-Art (SOTA) method.
- The proposed OneNet-TCN (online ensembling of TCN and Time-TCN) surpasses most of the competing baselines across various forecasting horizons;
- If the combined branches are stronger, for example, OneNet combined FSNet and Time-FSNet, achieving much better performance than OneNet-TCN. Namely, OneNet can integrate any advanced online forecasting methods or representation learning structures to enhance the robustness of the model.
- The average MSE and MAE of OneNet are significantly better than using either branch (FSNet or Time-TCN) alone, which underscores the significance of incorporating online ensembling.
- OneNet achieves faster and better convergence than other methods;
Introduction for Detect-then-Adapt
While numerous algorithms have been developed, most of them focus on model design and updating. In practice, many of these methods struggle with continuous performance regression in the face of accumulated concept drifts over time. We first detects drifting conception and then aggressively adapts the current model to the drifted concepts after the detection for rapid adaption. Our empirical studies across six datasets demonstrate the effectiveness of in improving model adaptation capability. Notably, compared to a simple Temporal Convolutional Network (TCN) baseline, $D^3A$ reduces the average Mean Squared Error (MSE) by $43.9$%. For the state-of-the-art (SOTA) model, the MSE is reduced by $33.3$%.
-
Introduce a Concept Detection Framework: Our framework monitors loss distribution drift, aiming to predict the occurrence of concept drift. This detector provides instructions for our model updating, enhancing model robustness and AI safety, particularly in high-risk tasks.
-
More realistic Evaluation setting: We observe that previous benchmarks often presume a substantial overlap in the forecasting target during testing. In this paper, we advocate for the evaluation of online time series forecasting models with delayed feedback, demonstrating a more realistic and challenging assessment.
Requirements
- python == 3.7.3
- pytorch == 1.8.0
- matplotlib == 3.1.1
- numpy == 1.19.4
- pandas == 0.25.1
- scikit_learn == 0.21.3
- tqdm == 4.62.3
- einops == 0.4.0
Benchmarking
1. Data preparation
We follow the same data formatting as the Informer repo (https://github.com/zhouhaoyi/Informer2020), which also hosts the raw data.
Please put all raw data (csv) files in the ./data
folder.
2. Run experiments
To replicate our results on the ETT, ECL, Traffic, and WTH datasets, run
sh run.sh
To replicate our results of $D^3A$, run
sh run_d3a.sh
3. Arguments
You can specify one of the above method via the --method
argument.
Dataset: Our implementation currently supports the following datasets: Electricity Transformer - ETT (including ETTh1, ETTh2, ETTm1, and ETTm2), ECL, Traffic, and WTH. You can specify the dataset via the --data
argument.
Other arguments: Other useful arguments for experiments are:
--test_bsz
: batch size used for testing: must be set to 1 for online learning,--seq_len
: look-back windows' length, set to 60 by default,--pred_len
: forecast windows' length, set to 1 for online learning.
D3A Arguments: Here are additional arguments useful for experiments:
--sleep_interval
: Corresponds to ( l_w ) in our paper, representing the window size for the drift detector.--sleep_epochs
: Determines the number of epochs the model should be fully fine-tuned when a drift is detected. It is set to 20 by default.--online_adjust
: After detecting a drift, the regularization weight ( \lambda ) in our paper is set to 0.5 by default.--offline_adjust
: During each step, the algorithm samples previous data and augments it for regularization. The regularization weight is set to 0.5 by default.--alpha_d
: Represents a predefined confidence level for triggering concept drift, set to 0.003 by default.
4. Baselines
Backbones: Our implementation supports the following backbones in Table.1:
- patch: PatchTST for online time series forecasting
- fedformer: FedFormer for online time series forecasting
- dlinear: DLinear for online time series forecasting
- cross_former: Crossformer for online time series forecasting
- naive_time: The proposed Time-TCN for online time series forecasting
- naive_time: The proposed Time-TCN for online time series forecasting
Ablations: Our online learning and ensembling ablation baselines in Table.4:
- fsnet_plus_time: Simple averaging
- onenet_gate: Gating mechanism
- onenet_linear_regression: Linear Regression (LR)
- onenet_egd: Exponentiated Gradient Descent (EGD)
- onenet_weight: Reinforcement learning to learn the weight directly (RL-W)
Algorithms: Our implementation supports the following training strategies in Table.2,3:
- ogd: OGD training
- large: OGD training with a large backbone
- er: experience replay
- derpp: dark experience replay
- nomem: FSNET without the associative memory
- naive: FSNET without both the memory and adapter, directly trains the adaptation coefficients.
- fsnet: FSNet framework
- fsnet_d3a: FSNet with Detect-then-Adapt framework
- fsnet_time: Cross-Time FSNet
- onenet_minus: the proposed OneNet- in section 4
- onenet_tcn: the proposed OneNet with tcn backbone
- onenet_fsnet: the proposed OneNet
- onenet_d3a: the proposed OneNet with Detect-then-Adapt framework
5. Baselines
License
This source code is released under the MIT license, included here.
Citation
If you find this repo useful, please consider citing:
@inproceedings{
zhang2023onenet,
title={OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling},
author={YiFan Zhang and Qingsong Wen and Xue Wang and Weiqi Chen and Liang Sun and Zhang Zhang and Liang Wang and Rong Jin and Tieniu Tan},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}
@misc{zhang2024addressing,
title={Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt},
author={YiFan Zhang and Weiqi Chen and Zhaoyang Zhu and Dalin Qin and Liang Sun and Xue Wang and Qingsong Wen and Zhang Zhang and Liang Wang and Rong Jin},
year={2024},
eprint={2403.14949},
archivePrefix={arXiv},
primaryClass={cs.LG}
}