Joint-Multiclass-Debiasing-of-Word-Embeddings
This repository contains the code for the paper Joint Multiclass Debiasing of Word Embeddings, accepted at the 25th International Symposium on Methodologies for Intelligent Systems (ISMIS 2020), Graz, Austria, September 2020.
The paper (arXiv version) can be found at https://arxiv.org/abs/2003.11520.
Description
Word embeddings, an important tool for numerous downstream NLP tasks, can contain different kinds of biases, e.g. based on gender, religion, or race. Extending the work of Bolukbasi et al. and Caliskan et al., HardWEAT and SoftWEAT were created with the aim of reducing this phenomenon jointly across multiple bias classes. The former completely eliminates bias as measured by the Word Embedding Association Test (WEAT), while the latter lets the user choose to what extent the debiasing procedure is applied. Experiments show that both methods decrease bias levels while minimally modifying the structure of the vector representations. In addition, debiasing word embeddings translates into a decline in the variance of polarity scores in Sentiment Analysis.
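As background for the WEAT-based bias measurement mentioned above, the following is a minimal sketch of the standard WEAT effect-size computation (this is an illustrative implementation with toy vectors, not the exact code from this repository):

```python
import numpy as np

def cos(u, v):
    # Cosine similarity between two vectors
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity of w to attribute set A minus to attribute set B
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Effect size d: difference of mean associations of the two target sets,
    # normalized by the sample standard deviation over all target words
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Toy example: targets X lean toward attribute A, targets Y toward B,
# so the effect size comes out positive (biased); 0 would mean no bias.
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]
X = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
Y = [np.array([0.1, 0.9]), np.array([0.2, 0.8])]
d = weat_effect_size(X, Y, A, B)
```

HardWEAT drives this effect size to zero for every tested bias category at once, while SoftWEAT lowers it by a user-controlled amount.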
Here, samples of Word2Vec, GloVe, and FastText embeddings are used, restricted to words with a frequency of at least 200 in the English Wikipedia, along with the highly polarizing IMDB Movie Dataset. On these datasets, HardWEAT and SoftWEAT are evaluated via WEAT bias experiments, the Mikolov analogy task, Rank Similarity, and Sentiment Analysis (see Main.ipynb). Furthermore, a corresponding online appendix for the paper is provided.
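For readers unfamiliar with the Mikolov analogy task used in the evaluation, the idea is to solve "a is to b as c is to ?" by nearest-neighbor search around the vector b - a + c. A toy sketch over a hypothetical hand-made vocabulary (not the embeddings used in this repository):

```python
import numpy as np

# Hypothetical toy embedding for illustration only
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
}

def analogy(a, b, c, vocab):
    # Solve a : b :: c : ? via cosine similarity to (b - a + c),
    # excluding the query words themselves as candidates
    target = vocab[b] - vocab[a] + vocab[c]
    best, best_sim = None, -np.inf
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue
        sim = (target @ vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```

A good debiasing method should leave accuracy on such analogies largely intact, which is what the Rank Similarity and analogy experiments check.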
Installation
To run Main.ipynb, a Python environment can be created from the libraries listed in the requirements.txt file. The project was developed in Python 3.6.
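A typical setup might look like the following (a sketch assuming a standard virtualenv workflow and that Jupyter is available; adjust the interpreter name to your system):

```shell
# Create and activate an isolated environment (Python 3.6 recommended)
python3 -m venv venv
source venv/bin/activate

# Install the pinned dependencies from the repository root
pip install -r requirements.txt

# Launch the notebook with the experiments
jupyter notebook Main.ipynb
```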