FITS: Modeling Time Series with 10k parameters (ICLR 2024 Spotlight)

This is the official implementation of FITS. Please run the scripts in scripts/FITS to reproduce the results. Scripts without _best are for the ablation study and parameter grid search; scripts with _best run multiple times with the optimal parameters.

See updates here: Update

Also see our exciting new work!

Wanna see something beyond FITS? Check:

"Beyond Trend and Periodicity: Guiding Time Series Forecasting with Textual Cues" Paper Code Dataset

Update

🚨 Important Update: 2023-12-25 🎄

We've identified a significant bug in our code, inherited from Informer (AAAI 2021 Best Paper); thanks to Luke Nicholas Darlow from the University of Edinburgh for reporting it. This issue has implications for a broad spectrum of research on time series forecasting.

Efforts are underway to correct this bug, and we will update our Arxiv submission and this repository with the revised results. A bug fix method will also be released to assist the community in addressing this issue in their work.

Description of the Bug:

The bug stems from an incorrect implementation in the data loader. Specifically, the test dataloader uses drop_last=True, which may exclude a significant portion of test data, particularly with large batch sizes, leading to unfair model comparisons.
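To get a feel for the scale of the issue, here is a back-of-the-envelope sketch (the sizes are hypothetical; the actual test-set length depends on the dataset and look-back window):

```python
# Hypothetical sizes: n_test samples in the test set, evaluated with a large batch.
n_test, batch_size = 2785, 256

# With drop_last=True, only full batches are evaluated.
n_kept = (n_test // batch_size) * batch_size
n_dropped = n_test - n_kept

print(n_kept, n_dropped)  # 2560 kept, 225 dropped (about 8% of the test set)
```

The larger the batch size, the more test samples silently disappear, which is why the comparison across models can become unfair.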

Solution:

To fix this issue in codebases using LTSF-Linear's architecture:

  1. In data_factory.py within the data_provider folder (usually on line 19), change:

    if flag == 'test':
        shuffle_flag = False
        drop_last = True
        batch_size = args.batch_size
        freq = args.freq
    

    To:

    if flag == 'test':
        shuffle_flag = False
        drop_last = False #True
        batch_size = args.batch_size
        freq = args.freq
    
  2. In your experiment script (e.g., ./exp/exp_main.py), modify the following (around line 290):

    From:

    preds = np.array(preds)
    trues = np.array(trues)
    inputx = np.array(inputx) # sometimes this line is absent; that is fine
    

    To:

    preds = np.concatenate(preds, axis=0)
    trues = np.concatenate(trues, axis=0)
    inputx = np.concatenate(inputx, axis=0) # skip this if the line above is absent
    

    If you skip this change, testing fails with a shape error, because dimension 0 (the batch size) is no longer the same for every batch once the last batch is kept; this is probably why the last batch was being dropped in the first place. Concatenating along axis 0 (the sample axis) resolves the problem.

  3. Run the officially provided scripts!
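Why step 2 is needed can be seen with a toy example (the shapes are illustrative, not the actual dataset dimensions): once the last, smaller batch is kept, the per-batch arrays are ragged along dimension 0, so np.array can no longer stack them, while np.concatenate merges them cleanly.

```python
import numpy as np

# Four full batches of 32 samples plus a kept last batch of 10 samples,
# each with a 96-step horizon and 7 channels (illustrative shapes).
preds = [np.zeros((32, 96, 7)) for _ in range(4)] + [np.zeros((10, 96, 7))]

# np.array(preds) would fail here because dimension 0 differs across
# batches; concatenating along axis 0 merges them into one array.
preds = np.concatenate(preds, axis=0)
print(preds.shape)  # (138, 96, 7)
```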

Result Update

The best result is in bold and the second best is in italics; all results are reported in MSE. These are our final results, obtained after rerunning the parameter search, the ablation study, and the multi-runs, and they are reported in the ICLR camera-ready version of the paper.

| Model | ETTh1-96 | ETTh1-192 | ETTh1-336 | ETTh1-720 | ETTh2-96 | ETTh2-192 | ETTh2-336 | ETTh2-720 | ETTm1-96 | ETTm1-192 | ETTm1-336 | ETTm1-720 | ETTm2-96 | ETTm2-192 | ETTm2-336 | ETTm2-720 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PatchTST | 0.385 | *0.413* | *0.440* | *0.456* | *0.274* | *0.338* | *0.367* | *0.391* | **0.292** | **0.330** | **0.365** | *0.419* | *0.163* | *0.219* | *0.276* | *0.368* |
| DLinear | 0.384 | 0.443 | 0.446 | 0.504 | 0.282 | 0.350 | 0.414 | 0.588 | *0.301* | *0.335* | 0.371 | 0.426 | 0.171 | 0.237 | 0.294 | 0.426 |
| FEDformer | *0.375* | 0.427 | 0.459 | 0.484 | 0.340 | 0.433 | 0.508 | 0.480 | 0.362 | 0.393 | 0.442 | 0.483 | 0.189 | 0.256 | 0.326 | 0.437 |
| TimesNet | 0.384 | 0.436 | 0.491 | 0.521 | 0.340 | 0.402 | 0.452 | 0.462 | 0.338 | 0.374 | 0.410 | 0.478 | 0.187 | 0.249 | 0.321 | 0.408 |
| FITS | **0.372** | **0.404** | **0.427** | **0.424** | **0.271** | **0.331** | **0.354** | **0.377** | 0.303 | 0.337 | *0.366* | **0.415** | **0.162** | **0.216** | **0.268** | **0.348** |
| IMP | 0.003 | 0.009 | 0.013 | 0.032 | 0.003 | 0.007 | 0.013 | 0.014 | -0.011 | -0.007 | -0.001 | 0.004 | 0.001 | 0.003 | 0.008 | 0.020 |
| Model | Weather-96 | Weather-192 | Weather-336 | Weather-720 | Electricity-96 | Electricity-192 | Electricity-336 | Electricity-720 | Traffic-96 | Traffic-192 | Traffic-336 | Traffic-720 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PatchTST | *0.151* | *0.195* | *0.249* | *0.321* | **0.129** | **0.149** | *0.166* | 0.210 | **0.366** | **0.388** | **0.398** | *0.457* |
| DLinear | 0.174 | 0.217 | 0.262 | 0.332 | 0.140 | 0.153 | 0.169 | *0.204* | 0.413 | 0.423 | 0.437 | 0.466 |
| FEDformer | 0.246 | 0.292 | 0.378 | 0.447 | 0.188 | 0.197 | 0.212 | 0.244 | 0.573 | 0.611 | 0.621 | 0.630 |
| TimesNet | 0.172 | 0.219 | 0.280 | 0.365 | 0.168 | 0.184 | 0.198 | 0.220 | 0.593 | 0.617 | 0.629 | 0.640 |
| FITS | **0.143** | **0.186** | **0.236** | **0.307** | *0.134* | **0.149** | **0.165** | **0.203** | *0.385* | *0.397* | *0.410* | **0.448** |
| IMP | 0.008 | 0.009 | 0.013 | 0.014 | -0.005 | 0.000 | 0.001 | 0.001 | -0.019 | -0.009 | -0.012 | 0.009 |

Analysis

The discovered bug predominantly impacts results on smaller datasets such as ETTh1 and ETTh2. Interestingly, on other datasets some models gain from the fix, e.g. PatchTST on ETTm1. FITS still maintains performance comparable to the state of the art.

Replication

(A minor note: the only hyperparameter change we made was reducing the learning rate for DLinear on ETTh2 from 0.05 to 0.005, which improved its results.)

(A word of caution: Training PatchTST, particularly on datasets like traffic and electricity, can be extremely time-consuming.)

(We failed to reproduce the FiLM results, since training takes over 40 GB of GPU memory and over 2 hours per epoch on an A800. Furthermore, the provided scripts seem to have flaws: the 'modes1' parameter is set to 1032 on ETTh1 instead of the 32 used on the other datasets, and train_epochs is 1 on ETTh2, which may degrade performance. We therefore exclude FiLM from the following analysis, since we cannot ensure a fair comparison.)

🚨 Another potential information leakage in previous AD works

In previous anomaly detection works, the anomaly threshold is calculated on the test set; see the affected code in Anomaly Transformer. This setting violates the assumption that the test set is unavailable before the model is deployed, and may cause information leakage and cherry-picked results on the test set.

As claimed in the paper, FITS selects its anomaly threshold directly on the validation set, as indicated in the code.
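As a rough sketch of validation-based thresholding (the names and scores below are illustrative stand-ins, not the repository's actual API), the threshold is taken as a high percentile of anomaly scores computed on the validation set, and the test set is only ever scored against that fixed threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in anomaly scores (e.g. reconstruction errors); in practice these
# come from running the trained model over each split.
val_scores = rng.exponential(scale=1.0, size=1000)
test_scores = rng.exponential(scale=1.0, size=500)

# Pick the threshold from the validation set only, e.g. flag the top 1%.
threshold = np.percentile(val_scores, 99)

# The test set is scored against the fixed threshold; it never influences it.
is_anomaly = test_scores > threshold
print(threshold, is_anomaly.sum())
```

Computing the percentile on test_scores instead would be exactly the leakage described above: the threshold would then be tuned on data that is supposed to be unseen.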

However, we still compare FITS against the results reported in the original papers, which may carry this potential information leakage. We encourage the community to re-evaluate the affected methods for further reference. XD

Acknowledgement

We thank Luke Darlow from the University of Edinburgh, who found the bug.