awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models

Table of Contents

About This Repo

General Framework

Word Embeddings

| date | paper | citation count | training code | pretrained models |
|---|---|---|---|---|
| - | WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models | N/A | - | RusVectōrēs |
| 2013/01 | Efficient Estimation of Word Representations in Vector Space | 999+ | C | Word2Vec |
| 2014/12 | Word Representations via Gaussian Embedding | 221 | Cython | - |
| 2014/?? | A Probabilistic Model for Learning Multi-Prototype Word Embeddings | 127 | DMTK | - |
| 2014/?? | Dependency-Based Word Embeddings | 719 | C++ | word2vecf |
| 2014/?? | GloVe: Global Vectors for Word Representation | 999+ | C | GloVe |
| 2015/06 | Sparse Overcomplete Word Vector Representations | 129 | C++ | - |
| 2015/06 | From Paraphrase Database to Compositional Paraphrase Model and Back | 3 | Theano | PARAGRAM |
| 2015/06 | Non-distributional Word Vector Representations | 68 | Python | WordFeat |
| 2015/?? | Joint Learning of Character and Word Embeddings | 195 | C | - |
| 2015/?? | SensEmbed: Learning Sense Embeddings for Word and Relational Similarity | 249 | - | SensEmbed |
| 2015/?? | Topical Word Embeddings | 292 | Cython | |
| 2016/02 | Swivel: Improving Embeddings by Noticing What's Missing | 61 | TF | - |
| 2016/03 | Counter-fitting Word Vectors to Linguistic Constraints | 232 | Python | counter-fitting(broken) |
| 2016/05 | Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec | 91 | Chainer | - |
| 2016/06 | Siamese CBOW: Optimizing Word Embeddings for Sentence Representations | 166 | Theano | Siamese CBOW |
| 2016/06 | Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations | 58 | Go | lexvec |
| 2016/07 | Enriching Word Vectors with Subword Information | 999+ | C++ | fastText |
| 2016/08 | Morphological Priors for Probabilistic Neural Word Embeddings | 34 | Theano | - |
| 2016/11 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks | 359 | C++ | charNgram2vec |
| 2016/12 | ConceptNet 5.5: An Open Multilingual Graph of General Knowledge | 604 | Python | Numberbatch |
| 2016/?? | Learning Word Meta-Embeddings | 58 | - | Meta-Emb(broken) |
| 2017/02 | Offline bilingual word vectors, orthogonal transformations and the inverted softmax | 336 | Python | - |
| 2017/04 | Multimodal Word Distributions | 57 | TF | word2gm |
| 2017/05 | Poincaré Embeddings for Learning Hierarchical Representations | 413 | Pytorch | - |
| 2017/06 | Context encoders as a simple but powerful extension of word2vec | 13 | Python | - |
| 2017/06 | Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints | 99 | TF | Attract-Repel |
| 2017/08 | Learning Chinese Word Representations From Glyphs Of Characters | 44 | C | - |
| 2017/08 | Making Sense of Word Embeddings | 92 | Python | sensegram |
| 2017/09 | Hash Embeddings for Efficient Word Representations | 25 | Keras | - |
| 2017/10 | BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages | 91 | Gensim | BPEmb |
| 2017/11 | SPINE: SParse Interpretable Neural Embeddings | 48 | Pytorch | SPINE |
| 2017/?? | AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP | 161 | Gensim | AraVec |
| 2017/?? | Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics | 25 | C | - |
| 2017/?? | Dict2vec : Learning Word Embeddings using Lexical Dictionaries | 49 | C++ | Dict2vec |
| 2017/?? | Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components | 63 | C | - |
| 2018/04 | Representation Tradeoffs for Hyperbolic Embeddings | 120 | Pytorch | h-MDS |
| 2018/04 | Dynamic Meta-Embeddings for Improved Sentence Representations | 60 | Pytorch | DME/CDME |
| 2018/05 | Analogical Reasoning on Chinese Morphological and Semantic Relations | 128 | - | ChineseWordVectors |
| 2018/06 | Probabilistic FastText for Multi-Sense Word Embeddings | 39 | C++ | Probabilistic FastText |
| 2018/09 | Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks | 3 | TF | SynGCN |
| 2018/09 | FRAGE: Frequency-Agnostic Word Representation | 64 | Pytorch | - |
| 2018/12 | Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia | 17 | Cython | Wikipedia2Vec |
| 2018/?? | Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings | 106 | - | ChineseEmbedding |
| 2018/?? | cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information | 45 | C++ | - |
| 2019/02 | VCWE: Visual Character-Enhanced Word Embeddings | 5 | Pytorch | VCWE |
| 2019/05 | Learning Cross-lingual Embeddings from Twitter via Distant Supervision | 2 | Text | - |
| 2019/08 | An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning | 5 | TF | - |
| 2019/08 | ViCo: Word Embeddings from Visual Co-occurrences | 7 | Pytorch | ViCo |
| 2019/11 | Spherical Text Embedding | 25 | C | - |
| 2019/?? | Unsupervised word embeddings capture latent knowledge from materials science literature | 150 | Gensim | - |

OOV Handling

Contextualized Word Embeddings

| date | paper | citation count | code | pretrained models |
|---|---|---|---|---|
| - | Language Models are Unsupervised Multitask Learners | N/A | TF <br>Pytorch, TF2.0 <br>Keras | GPT-2(117M, 124M, 345M, 355M, 774M, 1558M) |
| 2017/08 | Learned in Translation: Contextualized Word Vectors | 524 | Pytorch <br>Keras | CoVe |
| 2018/01 | Universal Language Model Fine-tuning for Text Classification | 167 | Pytorch | ULMFit(English, Zoo) |
| 2018/02 | Deep contextualized word representations | 999+ | Pytorch <br>TF | ELMO(AllenNLP, TF-Hub) |
| 2018/04 | Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling | 26 | Pytorch | LD-Net |
| 2018/07 | Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation | 120 | Pytorch | ELMo |
| 2018/08 | Direct Output Connection for a High-Rank Language Model | 24 | Pytorch | DOC |
| 2018/10 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 999+ | TF <br>Keras <br>Pytorch, TF2.0 <br>MXNet <br>PaddlePaddle <br>TF <br>Keras | BERT(BERT, ERNIE, KoBERT) |
| 2018/?? | Contextual String Embeddings for Sequence Labeling | 486 | Pytorch | Flair |
| 2018/?? | Improving Language Understanding by Generative Pre-Training | 999+ | TF <br>Keras <br>Pytorch, TF2.0 | GPT |
| 2019/01 | Multi-Task Deep Neural Networks for Natural Language Understanding | 364 | Pytorch | MT-DNN |
| 2019/01 | BioBERT: pre-trained biomedical language representation model for biomedical text mining | 634 | TF | BioBERT |
| 2019/01 | Cross-lingual Language Model Pretraining | 639 | Pytorch <br>Pytorch, TF2.0 | XLM |
| 2019/01 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 754 | TF <br>Pytorch <br>Pytorch, TF2.0 | Transformer-XL |
| 2019/02 | Efficient Contextual Representation Learning Without Softmax Layer | 2 | Pytorch | - |
| 2019/03 | SciBERT: Pretrained Contextualized Embeddings for Scientific Text | 124 | Pytorch, TF | SciBERT |
| 2019/04 | Publicly Available Clinical BERT Embeddings | 229 | Text | clinicalBERT |
| 2019/04 | ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission | 84 | Pytorch | ClinicalBERT |
| 2019/05 | ERNIE: Enhanced Language Representation with Informative Entities | 210 | Pytorch | ERNIE |
| 2019/05 | Unified Language Model Pre-training for Natural Language Understanding and Generation | 278 | Pytorch | UniLMv1(unilm1-large-cased, unilm1-base-cased) |
| 2019/05 | HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization | 81 | - | |
| 2019/06 | Pre-Training with Whole Word Masking for Chinese BERT | 98 | Pytorch, TF | BERT-wwm |
| 2019/06 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 999+ | TF <br>Pytorch, TF2.0 | XLNet |
| 2019/07 | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 107 | PaddlePaddle | ERNIE 2.0 |
| 2019/07 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 282 | Pytorch | SpanBERT |
| 2019/07 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 999+ | Pytorch <br>Pytorch, TF2.0 | RoBERTa |
| 2019/09 | Subword ELMo | 1 | Pytorch | - |
| 2019/09 | Knowledge Enhanced Contextual Word Representations | 115 | - | |
| 2019/09 | TinyBERT: Distilling BERT for Natural Language Understanding | 129 | - | |
| 2019/09 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | 136 | Pytorch | Megatron-LM(BERT-345M, GPT-2-345M) |
| 2019/09 | MultiFiT: Efficient Multi-lingual Language Model Fine-tuning | 29 | Pytorch | - |
| 2019/09 | Extreme Language Model Compression with Optimal Subwords and Shared Projections | 32 | - | |
| 2019/09 | MULE: Multimodal Universal Language Embedding | 5 | - | |
| 2019/09 | Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks | 51 | - | |
| 2019/09 | K-BERT: Enabling Language Representation with Knowledge Graph | 59 | - | |
| 2019/09 | UNITER: Learning UNiversal Image-TExt Representations | 60 | - | |
| 2019/09 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 803 | TF | - |
| 2019/10 | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | 349 | Pytorch | BART(bart.base, bart.large, bart.large.mnli, bart.large.cnn, bart.large.xsum) |
| 2019/10 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | 481 | Pytorch, TF2.0 | DistilBERT |
| 2019/10 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 696 | TF | T5 |
| 2019/11 | CamemBERT: a Tasty French Language Model | 102 | - | CamemBERT |
| 2019/11 | ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations | 15 | Pytorch | - |
| 2019/11 | Unsupervised Cross-lingual Representation Learning at Scale | 319 | Pytorch | XLM-R (XLM-RoBERTa)(xlmr.large, xlmr.base) |
| 2020/01 | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | 35 | Pytorch | ProphetNet(ProphetNet-large-16GB, ProphetNet-large-160GB) |
| 2020/02 | CodeBERT: A Pre-Trained Model for Programming and Natural Languages | 25 | Pytorch | CodeBERT |
| 2020/02 | UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training | 33 | Pytorch | - |
| 2020/03 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | 203 | TF | ELECTRA(ELECTRA-Small, ELECTRA-Base, ELECTRA-Large) |
| 2020/04 | MPNet: Masked and Permuted Pre-training for Language Understanding | 5 | Pytorch | MPNet |
| 2020/05 | ParsBERT: Transformer-based Model for Persian Language Understanding | 1 | Pytorch | ParsBERT |
| 2020/05 | Language Models are Few-Shot Learners | 382 | - | - |
| 2020/07 | InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training | 12 | Pytorch | - |

Pooling Methods

Encoders

| date | paper | citation count | code | model_name |
|---|---|---|---|---|
| - | Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings | N/A | Python | AraSIF |
| 2014/05 | Distributed Representations of Sentences and Documents | 999+ | Pytorch <br>Python | Doc2Vec |
| 2014/11 | Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models | 849 | Theano <br>Pytorch | VSE |
| 2015/06 | Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books | 795 | Theano <br>TF <br>Pytorch, Torch | SkipThought |
| 2015/11 | Order-Embeddings of Images and Language | 354 | Theano | order-embedding |
| 2015/11 | Towards Universal Paraphrastic Sentence Embeddings | 411 | Theano | ParagramPhrase |
| 2015/?? | From Word Embeddings to Document Distances | 999+ | C, Python | Word Mover's Distance |
| 2016/02 | Learning Distributed Representations of Sentences from Unlabelled Data | 363 | Python | FastSent |
| 2016/07 | Charagram: Embedding Words and Sentences via Character n-grams | 144 | Theano | Charagram |
| 2016/11 | Learning Generic Sentence Representations Using Convolutional Neural Networks | 76 | Theano | ConvSent |
| 2017/03 | Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features | 319 | C++ | Sent2Vec |
| 2017/04 | Learning to Generate Reviews and Discovering Sentiment | 293 | TF <br>Pytorch <br>Pytorch | Sentiment Neuron |
| 2017/05 | Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings | 60 | Theano | GRAN |
| 2017/05 | Supervised Learning of Universal Sentence Representations from Natural Language Inference Data | 999+ | Pytorch | InferSent |
| 2017/07 | VSE++: Improving Visual-Semantic Embeddings with Hard Negatives | 132 | Pytorch | VSE++ |
| 2017/08 | Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm | 357 | Keras <br>Pytorch | DeepMoji |
| 2017/09 | StarSpace: Embed All The Things! | 129 | C++ | StarSpace |
| 2017/10 | DisSent: Learning Sentence Representations from Explicit Discourse Relations | 47 | Pytorch | DisSent |
| 2017/11 | Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations | 128 | Theano | para-nmt |
| 2017/11 | Dual-Path Convolutional Image-Text Embedding with Instance Loss | 44 | Matlab | Image-Text-Embedding |
| 2018/03 | An efficient framework for learning sentence representations | 183 | TF | Quick-Thought |
| 2018/03 | Universal Sentence Encoder | 564 | TF-Hub | USE |
| 2018/04 | End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions | 14 | Theano | DEISTE |
| 2018/04 | Learning general purpose distributed sentence representations via large scale multi-task learning | 198 | Pytorch | GenSen |
| 2018/06 | Embedding Text in Hyperbolic Spaces | 50 | TF | HyperText |
| 2018/07 | Representation Learning with Contrastive Predictive Coding | 736 | Keras | CPC |
| 2018/08 | Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations | 8 | Python | CMD |
| 2018/09 | Learning Universal Sentence Representations with Mean-Max Attention Autoencoder | 14 | TF | Mean-MaxAAE |
| 2018/10 | Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model | 35 | TF-Hub | USE-xling |
| 2018/10 | Improving Sentence Representations with Consensus Maximisation | 4 | - | Multi-view |
| 2018/10 | BioSentVec: creating sentence embeddings for biomedical texts | 70 | Python | BioSentVec |
| 2018/11 | Word Mover's Embedding: From Word2Vec to Document Embedding | 47 | C, Python | WordMoversEmbeddings |
| 2018/11 | A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks | 76 | Pytorch | HMTL |
| 2018/12 | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond | 238 | Pytorch | LASER |
| 2018/?? | Convolutional Neural Network for Universal Sentence Embeddings | 6 | Theano | CSE |
| 2019/01 | No Training Required: Exploring Random Encoders for Sentence Classification | 54 | Pytorch | randsent |
| 2019/02 | CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model | 4 | Pytorch | CMOW |
| 2019/07 | GLOSS: Generative Latent Optimization of Sentence Representations | 1 | - | GLOSS |
| 2019/07 | Multilingual Universal Sentence Encoder | 52 | TF-Hub | MultilingualUSE |
| 2019/08 | Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | 261 | Pytorch | Sentence-BERT |
| 2020/02 | SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models | 11 | Pytorch | SBERT-WK |
| 2020/06 | DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations | 4 | Pytorch | DeCLUTR |
| 2020/07 | Language-agnostic BERT Sentence Embedding | 5 | TF-Hub | LaBSE |
| 2020/11 | On the Sentence Embeddings from Pre-trained Language Models | 0 | TF | BERT-flow |

Evaluation

Misc

Vector Mapping

Articles