Distributing hctsa calculations on a computing cluster
Code for distributing highly comparative time-series analysis (hctsa) computations across a computing cluster, using PBS or Slurm with Matlab (without linking to a mySQL database).
A basic pipeline:
- Set up a large `HCTSA.mat` file for your computation on your local machine using `TS_Init`.
- Ensure that the hctsa version on your computing cluster is identical to the local version used to run `TS_Init` (otherwise results could be inconsistent).
- Transfer the (uncomputed) `HCTSA.mat` file onto the cluster.
- Set the parameters `tsMin`, `tsMax`, and `numPerJob` in `HCTSA_run.sh`. These parameters determine how `HCTSA.mat` will be distributed into segments, each of which will be submitted as a cluster job.
- Run `HCTSA_run.sh` in the parent directory, which should contain the `HCTSA.mat` file. This will generate a set of directories containing subsets of time series. (NB: you may need to grant yourself permission to execute: `chmod u+x HCTSA_run.sh`)
- When all computations are complete, stitch all the subsections of the main `HCTSA` file back together again using `combineBatchFiles`. This yields a fully computed `HCTSA.mat` file. :smile:
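
To get a feel for how `tsMin`, `tsMax`, and `numPerJob` might divide the work, here is a minimal sketch that prints the range each job would cover. It is not the code in `HCTSA_run.sh`: the helper `list_segments` is hypothetical, the values are made up, and it assumes that `tsMin` and `tsMax` are inclusive bounds on the time series to compute and that the last segment may be shorter than the rest.

```shell
#!/usr/bin/env bash
# Illustrative only: prints the range each job would cover.
# This is NOT the code in HCTSA_run.sh; list_segments is a hypothetical helper.

tsMin=1        # first time series to compute (assumed inclusive)
tsMax=1000     # last time series to compute (assumed inclusive)
numPerJob=300  # number of time series handled by each job

# Print "first last" for each segment of [min, max], one segment per line.
list_segments() {
    local min=$1 max=$2 per=$3
    local first last
    for (( first = min; first <= max; first += per )); do
        last=$(( first + per - 1 ))
        # The final segment may hold fewer than numPerJob time series.
        if (( last > max )); then
            last=$max
        fi
        echo "$first $last"
    done
}

list_segments "$tsMin" "$tsMax" "$numPerJob"
```

With the values above, this prints four ranges (1 to 300, 301 to 600, 601 to 900, and 901 to 1000), so four jobs would be submitted under these assumptions. A smaller `numPerJob` gives more, shorter jobs; a larger one gives fewer, longer jobs.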