Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Introduction

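This repository contains the code and data for replicating the results of the paper Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer (Zhao et al., ACL 2020).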

Intrinsic Bias

- Prerequisite

- Multilingual Intrinsic Bias Dataset:

We include all the occupations as well as the gender seed words for each language in the intrinsic folder.

- Code:

To evaluate intrinsic bias in each language, refer to inBias.ipynb, which contains the bias analysis and results.
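
As a rough illustration of how occupation words and gender seed words can be combined into a bias score, the sketch below measures how much closer male occupation forms sit to male seed words than female occupation forms sit to female seed words. This is only a minimal sketch under stated assumptions: the function names, the pairing of male and female occupation forms, and the toy vectors are all hypothetical, and the metric actually used is the one defined in inBias.ipynb.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance(word_vec, seed_vecs):
    """Average cosine distance from one word vector to a set of seed vectors."""
    return sum(cosine_distance(word_vec, s) for s in seed_vecs) / len(seed_vecs)


def intrinsic_bias(male_occ, female_occ, male_seeds, female_seeds):
    """Hypothetical score: mean absolute gap between the male-form-to-male-seed
    distance and the female-form-to-female-seed distance over occupation pairs.

    male_occ[i] and female_occ[i] are assumed to be the two gendered forms
    of the same occupation.
    """
    gaps = [
        abs(mean_distance(m, male_seeds) - mean_distance(f, female_seeds))
        for m, f in zip(male_occ, female_occ)
    ]
    return sum(gaps) / len(gaps)


# Toy 2-d embeddings, made up purely for illustration.
male_seeds = [[1.0, 0.0]]
female_seeds = [[0.0, 1.0]]

# Each occupation form aligns with its own gender direction.
balanced = intrinsic_bias([[1.0, 0.0]], [[0.0, 1.0]], male_seeds, female_seeds)

# Both occupation forms align with the male direction.
skewed = intrinsic_bias([[1.0, 0.0]], [[1.0, 0.0]], male_seeds, female_seeds)

print(balanced, skewed)
```

In this toy setup, a score near zero means the two gendered forms are equally close to their respective seed words, and a larger score means one form is noticeably farther from its seeds than the other.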

Extrinsic Bias

- Multilingual BiosBias (MLBs) Dataset:

To replicate the MLBs dataset, please refer to the replicateMLBs folder. For the EN dataset, please refer to biosbias.

- Code:

The code for the downstream task is in the bios_codes folder.

If you use this code or the EN MLBs dataset, please also cite Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting:

@inproceedings{de2019bias,
  title={Bias in bios: A case study of semantic representation bias in a high-stakes setting},
  author={De-Arteaga, Maria and Romanov, Alexey and Wallach, Hanna and Chayes, Jennifer and Borgs, Christian and Chouldechova, Alexandra and Geyik, Sahin and Kenthapadi, Krishnaram and Kalai, Adam Tauman},
  booktitle={Proceedings of the Conference on Fairness, Accountability, and Transparency},
  pages={120--128},
  year={2019}
}

Citation

@inproceedings{zhao-etal-2020-gender,
    title = "Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer",
    author = "Zhao, Jieyu  and
      Mukherjee, Subhabrata  and
      Hosseini, Saghar  and
      Chang, Kai-Wei  and
      Hassan Awadallah, Ahmed",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    pages = "2896--2907",
}