CUSIM

Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)

Introduction

This project speeds up various ML models (e.g. topic modeling, word embedding) with CUDA. It can be thought of as a GPU counterpart to gensim. As a first step, I implemented the most widely used word embedding model, word2vec, and the most representative topic model, LDA (Latent Dirichlet Allocation).

Requirements

How to install

# install from PyPI
pip install cusim

# or install from source: clone repo and submodules
git clone git@github.com:js1010/cusim.git && cd cusim && git submodule update --init

# install requirements
pip install -r requirements.txt

# generate proto
python -m grpc_tools.protoc --python_out cusim/ --proto_path cusim/proto/ config.proto

# install
python setup.py install
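
After either install route, a quick sanity check is to confirm that Python can locate the package. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of cusim. It checks `cusim` itself and `grpc_tools`, the module invoked in the proto generation step above.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no importable module with this name exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "cusim" is the installed package; "grpc_tools" is needed to generate the proto.
    for name in ("cusim", "grpc_tools"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

This only confirms that the package is discoverable; it does not verify that a CUDA-capable GPU or driver is available.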

How to use

Performance

| metric | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 892.596 | 544.212 | 310.727 | 226.472 | 16.162 |
| pearson | 0.487832 | 0.487696 | 0.482821 | 0.487136 | 0.492101 |
| spearman | 0.500846 | 0.506214 | 0.501048 | 0.506718 | 0.479468 |

| metric | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 586.545 | 340.489 | 220.804 | 146.23 | 33.9173 |
| pearson | 0.354448 | 0.353952 | 0.352398 | 0.352925 | 0.360436 |
| spearman | 0.369146 | 0.369365 | 0.370565 | 0.365822 | 0.355204 |

| metric | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 250.135 | 155.121 | 103.57 | 73.8073 | 6.20787 |
| pearson | 0.309651 | 0.321803 | 0.324854 | 0.314255 | 0.480298 |
| spearman | 0.294047 | 0.308723 | 0.318293 | 0.300591 | 0.480971 |

| metric | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 176.923 | 100.369 | 69.7829 | 49.9274 | 9.90391 |
| pearson | 0.18772 | 0.193152 | 0.204509 | 0.187924 | 0.368202 |
| spearman | 0.243975 | 0.24587 | 0.260531 | 0.237441 | 0.358042 |

| metric | gensim (8 vCPUs) | cusim (NVIDIA T4) |
|---|---|---|
| training time (sec) | 447.376 | 76.6972 |
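
To put the training times in perspective, the speedup of cusim over gensim's fastest reported configuration can be computed as the ratio of the two times. The sketch below copies the training times from the five tables above, in order of appearance; the names `TRAINING_TIMES` and `speedup` are illustrative only. It compares training time alone and says nothing about the pearson and spearman rows.

```python
# Training times in seconds, copied from the tables above in order of appearance.
# Each pair is (gensim with 8 workers or 8 vCPUs, cusim on an NVIDIA T4).
TRAINING_TIMES = [
    (226.472, 16.162),
    (146.23, 33.9173),
    (73.8073, 6.20787),
    (49.9274, 9.90391),
    (447.376, 76.6972),  # last table: gensim on 8 vCPUs
]


def speedup(baseline_sec: float, accelerated_sec: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_sec / accelerated_sec


# One speedup factor per table, rounded to two decimals for readability.
SPEEDUPS = [round(speedup(cpu, gpu), 2) for cpu, gpu in TRAINING_TIMES]

if __name__ == "__main__":
    for index, factor in enumerate(SPEEDUPS, start=1):
        print(f"table {index}: cusim is about {factor}x faster")
```

By this measure the gain ranges from roughly 4x to roughly 14x depending on the table.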

Future tasks