TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series
Requirements
- Install Python >= 3.8 and PyTorch 1.8.1.
- Numpy (numpy) v1.15.2;
- Matplotlib (matplotlib) v3.0.0;
- Orange (Orange) v3.18.0;
- Pandas (pandas) v1.4.2;
- Weka (python-weka-wrapper3) v0.1.6 for multivariate time series (requires Oracle JDK 8 or OpenJDK 8);
- PyTorch (torch) v1.8.1 with CUDA 11.0;
- Scikit-learn (sklearn) v1.0.2;
- Scipy (scipy) v1.7.3;
- Huggingface (transformers) v4.30.1;
- Absl-py (absl-py) v1.2.0;
- Einops (einops) v0.4.1;
- H5py (h5py) v3.7.0;
- KeOps (keopscore) v2.1;
- Opt-einsum (opt-einsum) v3.3.0;
- PyTorch wavelets (pytorch-wavelet);
- PyWavelets (PyWavelets) v1.4.1;
- Scikit-image (scikit-image) v0.19.3;
- Statsmodels (statsmodels) v0.13.2;
- Sympy (sympy) v1.11.1.
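Assuming a standard pip environment, most of the pinned packages above can be installed in one go (the PyPI package names below are best guesses; the CUDA 11.0 build of torch and pytorch-wavelet may need their own installation instructions):

pip install numpy==1.15.2 matplotlib==3.0.0 Orange3==3.18.0 pandas==1.4.2 python-weka-wrapper3==0.1.6 torch==1.8.1 scikit-learn==1.0.2 scipy==1.7.3 transformers==4.30.1 absl-py==1.2.0 einops==0.4.1 h5py==3.7.0 keopscore==2.1 opt-einsum==3.3.0 PyWavelets==1.4.1 scikit-image==0.19.3 statsmodels==0.13.2 sympy==1.11.1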
Datasets
The datasets used in this code can be downloaded from the following locations:
- the UCR archive: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/;
- the UEA archive: http://www.timeseriesclassification.com/;
- the long-term forecasting archive: https://github.com/thuml/Time-Series-Library.
Files
Core
- datasets folder: data and related methods;
- encoders folder: implements the encoder and its building blocks (dilated convolutions, causal CNN); a sketch of a dilated causal convolution follows this list;
- losses folder: implements the triplet loss, both for training sets in which all time series have the same length and for training sets with time series of unequal lengths;
- models folder: implements LLM4TS and its building blocks (encoder + GPT attention + output head);
- utils folder: utility methods;
- main_encoder file: handles training of the encoders (see usage below);
- main_LLM4TS file: handles training of LLM4TS; it requires a trained encoder (see usage below);
- optimizers file: optimizer methods for training models;
- options file: input arguments;
- running file: methods to train and test models.
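As a rough illustration of the encoder building blocks named above (dilated convolutions, causal CNN), the following PyTorch sketch shows a single dilated causal convolution; it is a minimal stand-in, not the repository's implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    # One dilated causal convolution: pad on the left so the output at
    # time t depends only on inputs up to time t.
    def __init__(self, in_channels, out_channels, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))       # pad only the past side
        return torch.relu(self.conv(x))

x = torch.randn(4, 3, 128)                     # 4 series, 3 variables, 128 time steps
y = CausalConvBlock(3, 16, kernel_size=3, dilation=2)(x)
print(y.shape)                                 # torch.Size([4, 16, 128])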
Usage
Selecting text prototype
Download the LLM from Hugging Face.
To select text prototypes from GPT-2:
python losses/text_prototype.py --llm_model_dir path/to/llm/folder/ --prototype_dir path/to/save/prototype/file/ --provide False --number_of_prototype 10
The --provide option takes False (random selection) or a list of texts.
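Conceptually, random prototype selection draws rows from GPT-2's token embedding matrix. A minimal sketch with the transformers library follows; the paths, variable names, and save format are illustrative, not the script's actual behaviour:

import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("path/to/llm/folder/")  # locally downloaded GPT-2
embeddings = model.wte.weight.detach()                     # (vocab_size, 768) token embeddings
idx = torch.randperm(embeddings.size(0))[:10]              # sample 10 random token rows
prototypes = embeddings[idx]                               # (10, 768) text prototypes
torch.save(prototypes, "prototypes.pt")                    # store for later alignment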
Training encoder on the UEA archives
To train a model on the EthanolConcentration dataset from the UEA archive on a specific GPU:
python main_encoder.py --data_dir path/to/EthanolConcentration/folder/ --gpu 0
Adding the --load_encoder option loads a model from the specified save path.
Setting the --gpu -1 option uses the CPU.
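The encoder is trained with the triplet loss from the losses folder described above. As a rough, generic illustration only (the repository's exact formulation may differ), a time-series triplet loss contrasts the representation of a reference sub-series with a positive sub-series from the same series and negative sub-series from other series:

import torch
import torch.nn.functional as F

def triplet_loss(ref, pos, neg):
    # ref, pos: (batch, dim) encoder outputs; neg: (batch, n_neg, dim).
    # Logistic form: pull ref towards pos, push it away from every negative.
    pos_term = -F.logsigmoid((ref * pos).sum(-1))
    neg_term = -F.logsigmoid(-(neg * ref.unsqueeze(1)).sum(-1)).sum(-1)
    return (pos_term + neg_term).mean()

loss = triplet_loss(torch.randn(8, 320), torch.randn(8, 320), torch.randn(8, 5, 320))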
Training LLM4TS on the UEA archives (Classification)
To train a model on the EthanolConcentration dataset from the UEA archive on a specific GPU:
python main_LLM4TS.py --output_dir experiments --comment "classification from Scratch" --name EthanolConcentration --records_file Classification_records.xls --data_dir path/to/EthanolConcentration/folder/ --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 50 --lr 0.001 --patch_size 8 --stride 8 --optimizer RAdam --d_model 768 --pos_encoding learnable --task classification --key_metric accuracy --gpu 0
Setting the --gpu -1 option uses the CPU.
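The --patch_size and --stride arguments control how each series is cut into patches before being projected into the d_model=768 space consumed by the GPT attention blocks. Roughly (an illustrative sketch with toy shapes, not the repository's code):

import torch

x = torch.randn(32, 1, 144)                 # (batch, variables, time), toy shape
patch_size, stride = 8, 8
patches = x.unfold(-1, patch_size, stride)  # (32, 1, 18, 8): non-overlapping patches of length 8
tokens = torch.nn.Linear(patch_size, 768)(patches)  # (32, 1, 18, 768): one token per patch
print(tokens.shape)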
Training encoder on the traffic dataset
To train a model on the traffic dataset on a specific GPU:
python main_encoder.py --root_path path/to/traffic/folder/ --data_path traffic.csv --model_id traffic --name traffic --data custom --seq_len 512 --output_dir ./experiments_encoder --gpu 0
Training LLM4TS on the traffic dataset (Forecasting)
To train a model on the traffic dataset on a specific GPU:
python main_LLM4TS.py --root_path path/to/traffic/folder/ --data_path traffic.csv --model_id traffic --name traffic --data custom --seq_len 512 --label_len 48 --pred_len 96 --output_dir ./experiments --gpu 0
Setting the --gpu -1 option uses the CPU.
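For forecasting, --seq_len, --label_len, and --pred_len follow the usual windowing convention of the Time-Series-Library datasets: the model reads seq_len past steps and forecasts the next pred_len steps, with label_len overlapping steps available as decoder warm-up. A rough sketch of slicing one such window from traffic.csv (column handling and variable names are illustrative):

import pandas as pd

seq_len, label_len, pred_len = 512, 48, 96
values = pd.read_csv("path/to/traffic/folder/traffic.csv").select_dtypes("number").to_numpy()

s = 0                                                   # start index of one training window
enc_in = values[s : s + seq_len]                        # 512 past steps fed to the model
dec_in = values[s + seq_len - label_len : s + seq_len + pred_len]  # 48-step overlap + horizon
target = values[s + seq_len : s + seq_len + pred_len]   # the 96 steps to forecast
print(enc_in.shape, dec_in.shape, target.shape)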