k-Shape: Efficient and Accurate Clustering of Time Series

<div align="center"> <p> <img alt="PyPI - Downloads" src="https://pepy.tech/badge/kshape"> <img alt="GitHub" src="https://img.shields.io/github/license/TheDatumOrg/kshape-python"> <img alt="PyPI" src="https://img.shields.io/pypi/v/kshape"> <img alt="GitHub issues" src="https://img.shields.io/github/issues/TheDatumOrg/kshape-python"> <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/kshape"> </p> </div>

k-Shape is a highly accurate and efficient unsupervised method for univariate and multivariate time-series clustering. k-Shape appeared at the ACM SIGMOD 2015 conference, where it was selected as one of the two best papers and received the inaugural 2015 ACM SIGMOD Research Highlight Award. An extended version appeared in the ACM TODS journal in 2017. Since then, k-Shape has achieved state-of-the-art performance on both univariate and multivariate time-series datasets: it is among the fastest and most accurate time-series clustering methods and ranks in the top positions of established benchmarks with 100+ datasets.

k-Shape has been widely adopted across scientific areas (e.g., computer science, social science, space science, engineering, econometrics, biology, neuroscience, and medicine), Fortune 100-500 enterprises (e.g., Exelon, Nokia, and many financial firms), and organizations such as the European Space Agency.

If you use k-Shape in your project or research, cite the following two papers:

References

"k-Shape: Efficient and Accurate Clustering of Time Series"<br/> John Paparrizos and Luis Gravano<br/> 2015 ACM SIGMOD International Conference on Management of Data (ACM SIGMOD 2015)<br/>

@inproceedings{paparrizos2015k,
  title={{k-Shape: Efficient and Accurate Clustering of Time Series}},
  author={Paparrizos, John and Gravano, Luis},
  booktitle={Proceedings of the 2015 ACM SIGMOD international conference on management of data},
  pages={1855--1870},
  year={2015}
}

"Fast and Accurate Time-Series Clustering"<br/> John Paparrizos and Luis Gravano<br/> ACM Transactions on Database Systems (ACM TODS 2017), volume 42(2), pages 1-49<br/>

@article{paparrizos2017fast,
  title={{Fast and Accurate Time-Series Clustering}},
  author={Paparrizos, John and Gravano, Luis},
  journal={ACM Transactions on Database Systems (ACM TODS)},
  volume={42},
  number={2},
  pages={1--49},
  year={2017}
}

Acknowledgements

We thank Teja Bogireddy for his valuable help on this repository.

We also thank the initial contributors Jörg Thalheim and Gregory Rehm. The initial code was used in Sieve.

k-Shape's Python Repository

This repository contains the Python implementation for k-Shape. For the Matlab version, check here.

Data

To ease reproducibility, we share our results over two established benchmarks:

For the preprocessing steps check here.

Installation

Our code depends on the following Python packages:

Install from pip

$ pip install kshape

Install from source

$ git clone https://github.com/thedatumorg/kshape-python
$ cd kshape-python
$ python setup.py install
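After either installation route, a quick check can confirm that the package is visible to the interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of kshape.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "kshape" is the package name used by `pip install kshape` above.
    for name in ("kshape", "numpy"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If `kshape` is reported as missing, the install most likely went into a different Python environment than the one running the check.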

Benchmarking

We present the runtime performance of k-Shape when varying the number of time series, number of clusters, and the lengths of time series. (All results are the average of 5 runs.)

<p align="center"> <img src="https://github.com/TheDatumOrg/kshape-python/blob/main/docs/benchmarkings.png"> </p>
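Averaging over repeated runs, as done for the figure above, can be sketched with a small timing helper. This is an illustrative harness, not the script used to produce the benchmark; `average_runtime` is a hypothetical name, and the workload shown is a stand-in for a call such as `model.fit(data)`.

```python
import time
from statistics import mean


def average_runtime(fn, runs=5):
    """Call `fn` `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return mean(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with the clustering call to be measured.
    seconds = average_runtime(lambda: sum(i * i for i in range(100_000)))
    print(f"mean runtime over 5 runs: {seconds:.5f} s")
```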

Usage

Univariate Example:

import numpy as np
from kshape.core import KShapeClusteringCPU
from kshape.core_gpu import KShapeClusteringGPU

# 200 random series of length 60, with a trailing channel axis: shape (200, 60, 1)
univariate_ts_datasets = np.expand_dims(np.random.rand(200, 60), axis=2)
num_clusters = 3

# CPU Model
ksc = KShapeClusteringCPU(num_clusters, centroid_init='zero', max_iter=100, n_jobs=-1)
ksc.fit(univariate_ts_datasets)

labels = ksc.labels_ # or ksc.predict(univariate_ts_datasets)
cluster_centroids = ksc.centroids_

# GPU Model
ksg = KShapeClusteringGPU(num_clusters, centroid_init='zero', max_iter=100)
ksg.fit(univariate_ts_datasets)

labels = ksg.labels_
# Move the centroids from the GPU back to host memory
cluster_centroids = ksg.centroids_.detach().cpu()
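The example above passes a 3-D array, with a trailing axis of size 1 for univariate data. Assuming the expected layout is (number of series, series length, number of channels), which is inferred from the shapes used in these examples rather than stated by the library, a small helper can normalize 2-D input. The name `to_kshape_input` is hypothetical.

```python
import numpy as np


def to_kshape_input(series):
    """Return a float array shaped (n_series, length, n_channels).

    2-D input (n_series, length) gets a trailing channel axis of size 1;
    3-D input is returned with its shape unchanged.
    """
    arr = np.asarray(series, dtype=float)
    if arr.ndim == 2:
        arr = np.expand_dims(arr, axis=2)
    if arr.ndim != 3:
        raise ValueError(f"expected a 2-D or 3-D array, got {arr.ndim}-D")
    return arr


if __name__ == "__main__":
    print(to_kshape_input(np.random.rand(200, 60)).shape)     # univariate
    print(to_kshape_input(np.random.rand(200, 60, 6)).shape)  # multivariate
```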

Multivariate Example:

import numpy as np
from kshape.core import KShapeClusteringCPU
from kshape.core_gpu import KShapeClusteringGPU

# 200 random series of length 60 with 6 channels each: shape (200, 60, 6)
multivariate_ts_datasets = np.random.rand(200, 60, 6)
num_clusters = 3

# CPU Model
ksc = KShapeClusteringCPU(num_clusters, centroid_init='zero', max_iter=100, n_jobs=-1)
ksc.fit(multivariate_ts_datasets)

labels = ksc.labels_
cluster_centroids = ksc.centroids_

# GPU Model
ksg = KShapeClusteringGPU(num_clusters, centroid_init='zero', max_iter=100)
ksg.fit(multivariate_ts_datasets)

labels = ksg.labels_
# Move the centroids from the GPU back to host memory
cluster_centroids = ksg.centroids_.detach().cpu()
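Once `labels` is available, a quick sanity check is to count how many series landed in each cluster, which also reveals empty clusters. The sketch below assumes the labels are integers in the range 0 to `num_clusters - 1`; that convention is an assumption and is not confirmed by this README. The helper name `cluster_sizes` is hypothetical.

```python
from collections import Counter


def cluster_sizes(labels, num_clusters):
    """Return a list whose k-th entry is the number of series assigned to cluster k."""
    counts = Counter(int(label) for label in labels)
    return [counts.get(k, 0) for k in range(num_clusters)]


if __name__ == "__main__":
    example_labels = [0, 2, 2, 0, 0]  # stand-in for ksc.labels_
    print(cluster_sizes(example_labels, num_clusters=3))
```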

Also see Examples for UCR/UEA dataset clustering.

Results

The following tables contain the average Rand Index (RI), Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI) accuracy values over 10 runs for k-Shape on the univariate and multivariate datasets.

Note: We collected the results using a single-core implementation.

Server Specifications: AMD Ryzen 9 5900HX 8 Cores 3.30 GHz, 16GB RAM.

GPU Specifications: NVIDIA GeForce RTX 3070, 8GB memory.
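For readers who want to score their own clusterings against ground-truth classes, the Rand Index and Adjusted Rand Index can be computed from pair counts. The code below is a standard-library sketch of the textbook definitions, not the evaluation script used for the tables. scikit-learn provides `rand_score`, `adjusted_rand_score`, and `normalized_mutual_info_score` in `sklearn.metrics` for the same purpose; whether the authors used them is not stated here.

```python
from collections import Counter
from math import comb


def _pair_counts(labels_true, labels_pred):
    """Return pair counts: together in both, together per class, together per cluster, all."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return sum_joint, sum_true, sum_pred, comb(len(labels_true), 2)


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two labelings agree."""
    sum_joint, sum_true, sum_pred, total = _pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0
    # Agreements = pairs grouped together in both + pairs separated in both.
    return (total + 2 * sum_joint - sum_true - sum_pred) / total


def adjusted_rand_index(labels_true, labels_pred):
    """Rand Index corrected for the agreement expected by chance."""
    sum_joint, sum_true, sum_pred, total = _pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0
    expected = sum_true * sum_pred / total
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    predicted = [1, 1, 0, 0]  # same partition, different label names
    print(rand_index(truth, predicted), adjusted_rand_index(truth, predicted))
```

Both scores depend only on which series are grouped together, so permuting the cluster label names does not change them.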

Univariate Results:

Datasets | RI | ARI | NMI | Runtime (secs)
ACSF10.7288894470.1391271780.385362576181.97282
Adiac0.9481992190.2374560720.585026777150.23389
AllGestureWiimoteX0.8309889890.0918331050.19967124132.64325
AllGestureWiimoteY0.833560360.13060810.26532011668.32064
AllGestureWiimoteZ0.8317961960.081846440.184288361117.54415
ArrowHead0.6236966820.1764088280.2517164431.42841
Beef0.6665536720.1022916220.2749834962.04646
BeetleFly0.5184615380.0372432620.0491706340.62138
BirdChicken0.5229487180.0468634440.0558057130.46606
BME0.6236623220.2091892150.3375624470.75734
Car0.6680952380.1427859260.2225746134.87239
CBF0.8755773930.7245637170.7703340577.47873
Chinatown0.5260755680.0411171660.0156938190.548231
ChlorineConcentration0.526233814-0.0010190870.00077235468.01957
CinCECGTorso0.6253071490.0518036060.093350668271.74131
Coffee0.7264935060.4538378340.4218209480.41349
Computers0.5291879760.0584817150.04856093.01130
CricketX0.8697017870.1746559470.35791691555.23645
CricketY0.8731539450.2063813170.37365636848.83094
CricketZ0.8699098120.1726696050.35560441144.52660
Crop0.9241083490.2419743350.43881235420.01129
DiatomSizeReduction0.9191791950.8077108450.8271172981.59904
DistalPhalanxOutlineAgeGroup0.7221848250.4359435680.3299056082.12145
DistalPhalanxOutlineCorrect0.499455708-0.0010303512.97E-052.26317
DistalPhalanxTW0.8396079760.592727260.53106025510.96752
DodgerLoopDay0.7819882290.2109169250.4028973751.69891
DodgerLoopGame0.5700717570.1406204990.1171619690.86779
DodgerLoopWeekend0.8308070630.6579669090.6281312210.495587
Earthquakes0.5416599080.0242671930.0062622689.69413
ECG2000.6137537690.2157942220.128705740.74401
ECG50000.7713079980.5307033530.523220504163.82402
ECGFiveDays0.8114467340.6231225650.5864925734.52766
ElectricDevices0.6935519630.0711614490.177107461591.80007
EOGHorizontalSignal0.868648510.2270348040.408923026357.01975
EOGVerticalSignal0.87082521,0.2007632310.37416983236.19376
EthanolLevel0.6222736170.0034802050.007896876188.62335
FaceAll0.9102950250.4332660260.610598916317.37956
FaceFour0.7573359070.3742398960.4667465431.38740
FacesUCR0.9102950250.4332660260.610598916136.62772
FiftyWords0.9515582070.3589258640.651569015198.84656
Fish0.7853458860.1898856150.32795136117.13432
FordA0.5646192440.1292376860.096210429344.81591
FordB0.5161093830.0322182110.023938345254.47971
FreezerRegularTrain0.6387441370.2774886820.21154738718.45565
FreezerSmallTrain0.6390496820.2780997830.21204566326.71921
Fungi0.8291268230.3575436720.7311732676.11174
GestureMidAirD10.9448194120.29376620.63550344430.88751
GestureMidAirD20.9476972240.3485824750.67731090543.38524
GestureMidAirD30.9312661320.1267591990.45878250918.98568
GesturePebbleZ10.8830814660.5859314820.67529312711.72848
GesturePebbleZ20.8813531350.5805545380.663927927.60654
GunPoint0.497487437-0.00505050500.431333
GunPointAgeSpan0.5319911310.0641411450.0531468841.59410
GunPointMaleVersusFemale0.7901276180.5802420810.5717765351.08047
GunPointOldVersusYoung0.5187346640.0374731340.0282076143.55970
Ham0.5288315560.0576731040.0446126732.13764
HandOutlines0.6828566860.3600519470.251176285247.46488
Haptics0.6890755750.0637099390.0904219297.01234
Herring0.5014640750.0031606420.0076504631.22652
HouseTwenty0.5201974370.0400147740.0324878849.73466
InlineSkate0.7340651890.0398461630.104643365372.13227
InsectEPGRegularTrain0.7065117730.3639418160.3795565227.86684
InsectEPGSmallTrain0.704091360.3613709640.3795049885.37182
InsectWingbeatSound0.7926405390.1962258310.402373638220.85374
ItalyPowerDemand0.609728860.2196084060.1881524033.01081
LargeKitchenAppliances0.5700706720.1255766690.13042237612.03511
Lightning20.5312947660.0570176170.0897831451.93780
Lightning70.8061755150.3229630650.5064944314.51913
Mallat0.9247564610.7216560550.86989108884.35894
Meat0.7619187680.4944034010.5804227510.86227
MedicalImages0.6720050130.0734902310.228736632.23141
MelbournePedestrian0.8694416560.3491047770.470402239275.40925
MiddlePhalanxOutlineAgeGroup0.7295852620.4231152260.4017224981.57184
MiddlePhalanxOutlineCorrect0.49977175-0.003736340.0008948492.28809
MiddlePhalanxTW0.8093475640.4496361180.4313643618.09901
MixedShapesRegularTrain0.8009910790.4204144180.488448041285.77452
MixedShapesSmallTrain0.8007950290.4190363740.4766379115.97755
MoteStrain0.8048091430.6095890150.5018650614.56190
NonInvasiveFetalECGThorax10.9509819740.333739220.6764209092995.88974
NonInvasiveFetalECGThorax20.9671743350.4657611560.7656147761748.11823
OliveOil0.8068926550.5700123610.6074183331.97315
OSULeaf0.7851058370.2635509730.36158070818.38517
PhalangesOutlinesCorrect0.5053624130.010703690.0102215766.79001
Phoneme0.927697860.0347057320.2101089841747.00270
PickupGestureWiimoteZ0.8545454550.2882101520.5402343583.61598
PigAirwayPressure0.9032298620.033382520.4275796311632.92364
PigArtPressure0.9598215020.2734421780.717389411914.99103
PigCVP0.9613467720.1945169740.6583637361304.41961
PLAID0.8594448810.2816342590.40487855555.89190
Plane0.9117657780.7083442090.8515926041.14514
PowerCons0.576378830.1530699820.1379296891.74243
ProximalPhalanxOutlineAgeGroup0.7526741830.4771543950.4685376551.72700
ProximalPhalanxOutlineCorrect0.533905850.0664532880.085352631.15338
ProximalPhalanxTW0.8312227030.5694546920.5506943745.31783
RefrigerationDevices0.5562082780.0075952780.00943760928.19549
Rock0.6969358180.2180814930.322230745179.14048
ScreenType0.5596037380.0105282490.01174259726.81045
SemgHandGenderCh20.5463154120.0915594280.05847128139.87313
SemgHandMovementCh20.7394435790.1164295220.209097135195.28737
SemgHandSubjectCh20.7247870470.196609490.263889093211.94098
ShakeGestureWiimoteZ0.9031717170.4715331020.6849596043.51105
ShapeletSim0.6999396980.4000504250.3773316863.14061
ShapesAll0.9787354740.425898720.742885495201.26739
SmallKitchenAppliances0.3988539390.0049074050.0251415925.50886
SmoothSubspace0.6424347830.1982529440.199542722.06081
SonyAIBORobotSurface10.7280577630.4555182030.4640216062.53491
SonyAIBORobotSurface20.5891405220.1724968020.117502944.86348
StarLightCurves0.7691940650.5206889620.61022134164.50148
Strawberry0.504165518-0.0193987830.1233965076.72441
SwedishLeaf0.8902540130.3123067790.55617961158.87581
Symbols0.8803144180.6192229410.75759431723.11830
SyntheticControl0.8819849750.6006818960.7125331756.90626
ToeSegmentation10.502006820.0040593690.0050571911.78287
ToeSegmentation20.6356188390.2602427380.1915057171.96561
Trace0.7110653270.4559009940.5989519992.30357
TwoLeadECG0.5380249680.0761559160.0590006938.53791
TwoPatterns0.6779791720.2078307720.318418523185.70084
UMD0.5970577280.1309926370.1891841370.93842
UWaveGestureLibraryAll0.903649520.5760240480.662693972288.38747
UWaveGestureLibraryX0.854355870.3539635250.457132359348.93967
UWaveGestureLibraryY0.8304762880.248454140.342123959471.75583
UWaveGestureLibraryZ0.8490912060.3500806370.46397562448.39118
Wafer0.5419956090.0264596780.01036778441.34034
Wine0.496478296-0.0051879190.0010564790.57659
WordSynonyms0.8925370360.2215783060.45175472274.17649
Worms0.6475281270.0284585750.06259139324.33412
WormsTwoClass0.5036165660.006954460.0098279698.10779
Yoga0.499909412-0.0003406637.76E-05146.22124

Multivariate Results:

Datasets | RI | ARI | NMI | Runtime (secs)
ArticularyWordRecognition0.972846530.6829360.8642092532.5272
AtrialFibrillation0.5609195400.016338120.12810625976.43405
BasicMotions0.7250.30906100.445923938.9816293
CharacterTrajectories0.93659070.4594230.70255146976.2988
Cricket0.933829910.6245380.825730241370.6116316
DuckDuckGeese0.6256560.011008730.0813033213447.5819
ERing0.878684500.57420140.647674291.04038
Epilepsy0.810000.503520.5480585183.565232
EthanolConcentration0.59969-0.003948740.0010586471.00570
FaceDetection0.500100.0002123470.000230054983.670330
FingerMovements0.50254860.00509350.005415024977.18741
HandMovementDirection0.6006740.048467410.05801942.49626
Handwriting0.9166500.1204140.407972304.06015
Heartbeat0.5020370.00403790.00326084857.40035
InsectWingbeat0.655130.002220.01020705605.323
JapaneseVowels0.8596390.3147330.45915411286.313125
LSST0.7604420.06084860.1244025608.39906
Libras0.906850.306829970.560319437.0877
MotorImagery0.499571940.000493110.003326118263.257795
NATOPS0.821757960.37390070.45782146265.831900
PenDigits0.91469770.5735920.698418605172.9306
PhonemeSpectra0.8071460.01431220.0894769628615.90575
RacketSports0.76668190.383860.442636255289.75656
SelfRegulationSCP10.5159910.0323660.035956543.48927
SelfRegulationSCP20.498805-0.0023690.000182381194.50309
SpokenArabicDigits0.954150.74550010.8026964512275.5243
StandWalkJump0.49572640.0408503540.16682412.409304
UWaveGestureLibrary0.865960.4741136160.629729184.98871