
Bi-Mamba4TS

A Python implementation of the Bi-Mamba4TS paper (arXiv:2404.15772).

First draft, untested.

Based on the paper, there are four components; the first two are described below:

SRA Decider

The SRA Decider in the Bi-Mamba4TS model chooses between channel-independent and channel-mixing tokenization strategies based on the Pearson correlation coefficients among the series. The decision uses a threshold λ that you set (defaulting to 0.6 in the skeleton).

SRA Decider Logic:

The SRA_Decider module calculates the Pearson correlation coefficients between each pair of series and uses the threshold λ to determine whether the degree of correlation justifies switching from a channel-independent strategy to a channel-mixing strategy; a sketch follows the explanation below.

Explanation:

$$\text{Correlation}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

When each series is standardized to zero mean and unit variance, this reduces to a matrix multiplication of the normalized series.

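A minimal sketch of the decider, assuming inputs shaped (batch, sequence_length, num_series) and a majority-vote decision rule over series pairs; the paper may define the criterion differently:

```python
import torch
import torch.nn as nn


class SRA_Decider(nn.Module):
    """Chooses channel-independent vs. channel-mixing tokenization.

    Sketch only: the input layout (batch, sequence_length, num_series) and
    the majority-vote decision rule are assumptions, not the paper's exact
    criterion.
    """

    def __init__(self, threshold: float = 0.6):
        super().__init__()
        self.threshold = threshold  # the correlation threshold lambda

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> bool:
        b, t, c = x.shape
        # Standardize each series; Pearson correlation then reduces to a
        # matrix product of the normalized series (see the formula above).
        x = x - x.mean(dim=1, keepdim=True)
        x = x / (x.std(dim=1, keepdim=True) + 1e-8)
        corr = torch.einsum("btc,btd->bcd", x, x) / (t - 1)  # (b, c, c)
        # Fraction of off-diagonal pairs with |correlation| above lambda.
        off_diag = ~torch.eye(c, dtype=torch.bool, device=x.device)
        strong = (corr.abs()[:, off_diag] > self.threshold).float().mean()
        # Assumed rule: mix channels when most pairs are strongly correlated.
        return strong.item() > 0.5
```
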
Integration:

This function integrates into the training loop: pass each batch of multivariate time series data through the decider to choose the appropriate tokenization strategy dynamically, based on the inter-series relationships in the data. Adjustments may be needed depending on the exact shape and nature of your inputs.
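
For example, the decider might be called once per batch like this; the `channel_mixing` flag and the `model` signature are illustrative assumptions, not a fixed API:

```python
decider = SRA_Decider(threshold=0.6)

for batch in train_loader:  # batch: (batch, sequence_length, num_series)
    # Pick the tokenization strategy dynamically for this batch;
    # `channel_mixing` is a hypothetical flag the model would consume.
    channel_mixing = decider(batch)
    output = model(batch, channel_mixing=channel_mixing)
```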

PatchTokenizer

The main task of this module is to convert a sequence of multivariate time series data into patches. This transformation lets the model focus on local sub-sequences, or "patches", of the data, which can be critical for capturing local temporal patterns more effectively.

Explanation:
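
A minimal sketch of the tokenizer, assuming inputs shaped (batch, sequence_length, num_series) and non-overlapping patches; the paper may use overlapping strides instead:

```python
import torch
import torch.nn as nn


class PatchTokenizer(nn.Module):
    """Splits each series into non-overlapping patches.

    Sketch only: assumes inputs shaped (batch, sequence_length, num_series)
    and a sequence_length divisible by patch_size.
    """

    def __init__(self, patch_size: int):
        super().__init__()
        self.patch_size = patch_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c = x.shape
        assert t % self.patch_size == 0, "sequence_length must be divisible by patch_size"
        x = x.transpose(1, 2)  # (batch, num_series, sequence_length)
        # unfold carves the time axis into non-overlapping windows:
        # (batch, num_series, num_patches, patch_size)
        return x.unfold(dimension=2, size=self.patch_size, step=self.patch_size)
```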

Integration and Usage:

This component is integrated into the model's forward function, where it preprocesses the multivariate time series data before passing it to the encoder or other components. Make sure your data dimensions are managed correctly and that sequence_length is divisible by patch_size for every batch of data.

This implementation provides a foundational structure for the PatchTokenizer component. Depending on your requirements and data characteristics, further customization may be necessary, especially for edge cases where the sequence length is not an exact multiple of the patch size; see the padding sketch below.
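
One common way to handle a non-divisible sequence length is to pad the end of the series before tokenizing. Replication padding here is an assumption, not necessarily what the paper does:

```python
import torch


def pad_to_multiple(x: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Pad (batch, sequence_length, num_series) so the length divides evenly."""
    remainder = x.shape[1] % patch_size
    if remainder:
        pad_len = patch_size - remainder
        # Replicate the last time step pad_len times (replication padding).
        x = torch.cat([x, x[:, -1:, :].expand(-1, pad_len, -1)], dim=1)
    return x
```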

TODO

- Test against the ETT dataset (https://github.com/zhouhaoyi/ETDataset/tree/main) for equivalence with the paper.
- Initialize the model and prepare the dataset.
- Implement the training loop using the MSE loss function and an optimizer such as Adam (see the sketch below).
- Evaluate the model on your validation/test dataset.
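
A sketch of the planned training loop; the model and data loaders are placeholders, and the (inputs, targets) batch layout is an assumption:

```python
import torch
from torch import nn


def train(model, train_loader, val_loader, epochs=10, lr=1e-3, device="cpu"):
    """Minimal MSE/Adam loop; the (inputs, targets) batch layout is assumed."""
    model.to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        # Simple validation pass with gradients disabled.
        model.eval()
        with torch.no_grad():
            val_loss = sum(
                criterion(model(x.to(device)), y.to(device)).item()
                for x, y in val_loader
            ) / len(val_loader)
        print(f"epoch {epoch}: val MSE {val_loss:.4f}")
```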