BioReddit Embeddings

This repository contains word embeddings trained on medical subreddits. We provide embeddings for GloVe (Pennington et al., 2014), ELMo (Peters et al., 2018), and Flair (Akbik et al., 2018).

The embeddings are trained on ~800,000 Reddit posts from over 60 medical-themed communities. The training and evaluation process is described in Basaldella and Collier, "BioReddit: Word Embeddings for User-Generated Biomedical NLP", presented at the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), co-located with the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019).

Embeddings

You can download the embeddings from the release section of this repository or via the links in the table below:

Embedding    Download Link
ELMo         options, weights
Flair        forward, backward
GloVe 50     txt, bin
GloVe 100    txt, bin
GloVe 200    txt, bin
FastText     See COMETA
BERT         See COMETA
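
As a rough sketch of how the GloVe txt files might be read once downloaded: the snippet below assumes they follow the standard GloVe text layout (one token per line, followed by its space-separated vector components). The helper names `load_glove_text` and `cosine_similarity`, the file name in the comment, and the toy vectors are illustrative assumptions, not part of this repository.

```python
import io
import math
from typing import Dict, Iterable, List


def load_glove_text(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines into a {token: vector} dictionary.

    Assumes each line is: token value_1 value_2 ... value_n
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy in-memory stand-in for a downloaded file (made-up values, not real embeddings).
sample = io.StringIO("pain 0.1 0.2 0.3\nache 0.1 0.2 0.25\nfever -0.3 0.1 0.0\n")
embeddings = load_glove_text(sample)

# With a real download, the call would look something like this
# (hypothetical file name):
#     with open("bioreddit_glove_50.txt", encoding="utf-8") as handle:
#         embeddings = load_glove_text(handle)
```

Loading everything into plain Python lists keeps the sketch dependency-free; for the full vocabulary, a NumPy array or a library such as gensim is likely a better fit.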

Code

You can find the code used to download the subreddits here.