# word2vec-api

Simple web service providing a word embedding API. The methods are based on the Gensim Word2Vec implementation. Models are passed as parameters and must be in word2vec text or binary format. Updated to run on Python 3.
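For orientation, here is a minimal sketch of the Gensim calls the endpoints wrap (assuming a recent Gensim; the model path is a placeholder, and `binary=True` assumes a binary-format model such as the Google News vectors):

```python
from gensim.models import KeyedVectors

# Load a model in word2vec binary format (use binary=False for the text format).
model = KeyedVectors.load_word2vec_format(
    '/path/to/GoogleNews-vectors-negative300.bin', binary=True)

# Gensim methods behind the endpoints shown below:
model.similarity('Sushi', 'Japanese')                              # /similarity
model.n_similarity(['Sushi', 'Shop'], ['Japanese', 'Restaurant'])  # /n_similarity
model.most_similar(positive=['indian', 'food'], topn=10)           # /most_similar
```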

To install and launch the service:

```bash
pip install -r requirements.txt
python word2vec-api.py --model path/to/the/model [--host host --port 1234]
```

or

```bash
python word2vec-api.py --model /path/to/GoogleNews-vectors-negative300.bin --binary true --path /word2vec --host 0.0.0.0 --port 5000
```

Example calls (the URLs are quoted so the shell does not interpret the `&` separators):

```bash
curl "http://127.0.0.1:5000/word2vec/n_similarity?ws1=Sushi&ws1=Shop&ws2=Japanese&ws2=Restaurant"
curl "http://127.0.0.1:5000/word2vec/similarity?w1=Sushi&w2=Japanese"
curl "http://127.0.0.1:5000/word2vec/most_similar?positive=indian&positive=food[&negative=][&topn=]"
curl "http://127.0.0.1:5000/word2vec/model?word=restaurant"
curl "http://127.0.0.1:5000/word2vec/model_word_set"
```

Note: the "model" method returns the word's vector as a base64-encoded string; "model_word_set" returns a base64-encoded pickle of the model's vocabulary.
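As a sketch of how a client might decode those responses (assuming the decoded vector bytes are a float32 numpy array, which matches the Google News model; adjust the dtype if your model differs):

```python
import base64
import pickle

import numpy as np
import requests

BASE = 'http://127.0.0.1:5000/word2vec'

# /model returns the raw vector bytes, base64-encoded.
resp = requests.get(BASE + '/model', params={'word': 'restaurant'})
vector = np.frombuffer(base64.b64decode(resp.text), dtype=np.float32)
print(vector.shape)  # e.g. (300,) for the Google News model

# /model_word_set returns a base64-encoded pickle of the vocabulary.
resp = requests.get(BASE + '/model_word_set')
vocab = pickle.loads(base64.b64decode(resp.text))
print(len(vocab))
```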

## Where to get a pretrained model

If you do not have domain-specific data to train on, it can be convenient to use a pretrained model. Please feel free to submit additions to this list through a pull request.

| Model file | Number of dimensions | Corpus (size) | Vocabulary size | Author | Architecture | Training algorithm | Context window | Web page |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Google News | 300 | Google News (100B) | 3M | Google | word2vec | negative sampling | BoW, ~5 | link |
| Freebase IDs | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW, ~10 | link |
| Freebase names | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW, ~10 | link |
| Wikipedia+Gigaword 5 | 50 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 100 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 200 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 300 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Common Crawl 42B | 300 | Common Crawl (42B) | 1.9M | GloVe | GloVe | AdaGrad | ? | link |
| Common Crawl 840B | 300 | Common Crawl (840B) | 2.2M | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B tweets) | 25 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B tweets) | 50 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B tweets) | 100 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B tweets) | 200 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Wikipedia dependency | 300 | Wikipedia (?) | 174,015 | Levy & Goldberg | word2vec modified | word2vec | syntactic dependencies | link |
| DBPedia vectors (wiki2vec) | 1000 | Wikipedia (?) | ? | Idio | word2vec | word2vec, skip-gram | BoW, 10 | link |
| 60 Wikipedia embeddings with 4 kinds of context | 25, 50, 100, 250, 500 | Wikipedia | varies | Li, Liu et al. | Skip-Gram, CBOW, GloVe | original and modified | 2 | link |
| German Wikipedia+News | 300 | Wikipedia + Statmt News 2013 (1.1B) | 608,130 | Andreas Müller | word2vec | Skip-Gram | 5 | link |
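Note that the GloVe downloads above are plain text files without the header line that the word2vec text format expects, so they need a small conversion before this service can load them. A minimal sketch using Gensim's bundled conversion script (file names are placeholders):

```python
from gensim.scripts.glove2word2vec import glove2word2vec

# Prepend the "<vocab_size> <dimensions>" header required by the
# word2vec text format; the converted file can then be served with
# word2vec-api.py (without the --binary flag).
glove2word2vec('glove.6B.300d.txt', 'glove.6B.300d.word2vec.txt')
```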