chakin

chakin is a downloader for pre-trained word vectors. It supports many vector sets.

This library lets you download pre-trained word vectors without having to find and fetch them by hand.

<div align="center"> <img src="https://github.com/chakki-works/chakin/blob/master/docs/top.jpg?raw=true"><br> </div>
<!-- Word vectors are very important for many natural language processing tasks such as document classification, named entity recognition, question answering and so on. In such tasks, you can use the pre-trained word vectors many people have published. But it is troublesome that you find and download them by yourself. -->

Installation

To install chakin, simply:

$ pip install chakin

Usage

You can download pre-trained word vectors as follows:

$ python
>>> import chakin
>>> chakin.search(lang='English')
                   Name  Dimension                     Corpus VocabularySize  
2          fastText(en)        300                  Wikipedia           2.5M   
11         GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)           400K   
12        GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)           400K   
13        GloVe.6B.200d        200  Wikipedia+Gigaword 5 (6B)           400K   
14        GloVe.6B.300d        300  Wikipedia+Gigaword 5 (6B)           400K   
15       GloVe.42B.300d        300          Common Crawl(42B)           1.9M   
16      GloVe.840B.300d        300         Common Crawl(840B)           2.2M   
17    GloVe.Twitter.25d         25               Twitter(27B)           1.2M   
18    GloVe.Twitter.50d         50               Twitter(27B)           1.2M   
19   GloVe.Twitter.100d        100               Twitter(27B)           1.2M   
20   GloVe.Twitter.200d        200               Twitter(27B)           1.2M   
21  word2vec.GoogleNews        300          Google News(100B)           3.0M 

>>> chakin.download(number=2, save_dir='./') # select fastText(en)
Test: 100% ||               | Time: 0:00:02  60.7 MiB/s
'./wiki.en.vec'
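
Once the file is on disk, you will usually want to read it into memory. The following is a minimal sketch, not part of chakin: `load_vec` is a hypothetical helper, and it assumes the downloaded file is in the plain-text `.vec` layout used by fastText (a header line with the vocabulary size and dimension, then one token per line followed by its vector components). Other vector sets may use a different layout.

```python
from typing import Dict, List, Optional


def load_vec(path: str, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read a text-format .vec file into a {word: vector} dict.

    Hypothetical helper, not part of chakin. Assumes the first line is
    "<vocabulary size> <dimension>" and every other line is a token
    followed by <dimension> space-separated floats.
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        # Header line gives the vector dimension.
        _, dim = map(int, f.readline().split())
        for line in f:
            if limit is not None and len(vectors) >= limit:
                break  # stop early to keep memory use small
            parts = line.rstrip().split(" ")
            if len(parts) <= dim:
                continue  # skip blank or malformed lines
            # The last `dim` fields are the vector; the rest is the token.
            word = " ".join(parts[:-dim])
            vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors
```

For example, `load_vec('./wiki.en.vec', limit=10000)` would read only the first 10,000 entries, which is useful for a quick check because the full files are large.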

Supported vectors

So far, chakin supports the following word vectors:

| Name | Dimension | Corpus | VocabularySize | Method | Language |
|---|---|---|---|---|---|
| fastText(ar) | 300 | Wikipedia | 610K | fastText | Arabic |
| fastText(de) | 300 | Wikipedia | 2.3M | fastText | German |
| fastText(en) | 300 | Wikipedia | 2.5M | fastText | English |
| fastText(es) | 300 | Wikipedia | 985K | fastText | Spanish |
| fastText(fr) | 300 | Wikipedia | 1.2M | fastText | French |
| fastText(it) | 300 | Wikipedia | 871K | fastText | Italian |
| fastText(ja) | 300 | Wikipedia | 580K | fastText | Japanese |
| fastText(ko) | 300 | Wikipedia | 880K | fastText | Korean |
| fastText(pt) | 300 | Wikipedia | 592K | fastText | Portuguese |
| fastText(ru) | 300 | Wikipedia | 1.9M | fastText | Russian |
| fastText(zh) | 300 | Wikipedia | 330K | fastText | Chinese |
| GloVe.6B.50d | 50 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.6B.100d | 100 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.6B.200d | 200 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.6B.300d | 300 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.42B.300d | 300 | Common Crawl(42B) | 1.9M | GloVe | English |
| GloVe.840B.300d | 300 | Common Crawl(840B) | 2.2M | GloVe | English |
| GloVe.Twitter.25d | 25 | Twitter(27B) | 1.2M | GloVe | English |
| GloVe.Twitter.50d | 50 | Twitter(27B) | 1.2M | GloVe | English |
| GloVe.Twitter.100d | 100 | Twitter(27B) | 1.2M | GloVe | English |
| GloVe.Twitter.200d | 200 | Twitter(27B) | 1.2M | GloVe | English |
| word2vec.GoogleNews | 300 | Google News(100B) | 3.0M | word2vec | English |
| word2vec.Wiki-NEologd.50d | 50 | Wikipedia | 335K | word2vec + NEologd | Japanese |
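
`chakin.download` selects a vector set by number rather than by name. The sketch below shows one way to look the number up from a name. It is not part of chakin: `SUPPORTED` and `number_for` are hypothetical, and the sketch assumes the numbering is zero-based and follows the order of the table above, which matches the numbers shown in the `chakin.search(lang='English')` output (2 for fastText(en), 11 to 21 for the GloVe and GoogleNews entries). Treat the output of `chakin.search` as authoritative if it differs.

```python
# Names copied from the table above, in table order.
# Assumption: chakin numbers the entries from 0 in this same order.
SUPPORTED = [
    "fastText(ar)", "fastText(de)", "fastText(en)", "fastText(es)",
    "fastText(fr)", "fastText(it)", "fastText(ja)", "fastText(ko)",
    "fastText(pt)", "fastText(ru)", "fastText(zh)",
    "GloVe.6B.50d", "GloVe.6B.100d", "GloVe.6B.200d", "GloVe.6B.300d",
    "GloVe.42B.300d", "GloVe.840B.300d",
    "GloVe.Twitter.25d", "GloVe.Twitter.50d",
    "GloVe.Twitter.100d", "GloVe.Twitter.200d",
    "word2vec.GoogleNews", "word2vec.Wiki-NEologd.50d",
]


def number_for(name: str) -> int:
    """Return the assumed download number for a vector set name.

    Raises ValueError if the name is not in the list.
    """
    return SUPPORTED.index(name)
```

With this helper, a call such as `chakin.download(number=number_for('fastText(en)'), save_dir='./')` would be equivalent to the `number=2` call in the Usage section, under the numbering assumption stated above.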