Lacuna Funded Project: MasakhaNER

Datasets developed by the projects are:

Team & Partners

Language Coordinators


| Language | Coordinator |
| --- | --- |
| Bambara | Allahsera Auguste Tapo |
| Chichewa | Amelia Taylor |
| Ewe | Godson Kalipe |
| Fon | Bonaventure Dossou |
| Ghomala | Koagne Victoire Memdjokam |
| Hausa | Tajuddeen Gwadabe |
| Igbo | Chris Emezue |
| Kinyarwanda | Happy Buzaaba |
| Luganda | Jonathan Mukiibi |
| Luo | Perez Ogayo |
| Moore | Fatoumata Kabore |
| Nigerian-Pidgin | Aremu Anuoluwapo |
| Setswana | Valencia Wagner |
| Shona | Blessing Sibanda |
| Swahili | Catherine Gitau |
| Twi | Edwin Buabeng-Munkoh |
| Wolof | Derguene Mbaye |
| isiXhosa | Andiswa Bukula |
| Yorùbá | Jesujoba Alabi |
| isiZulu | Rooweither Mabuya |

Adding a corpus to the project

Ideally, each language has its own folder (named with the language's ISO 639-3 code) containing two files:

  1. The corpus, named with the ISO 639-3 language code, e.g. xho.txt
  2. A readme file describing the number of articles, sentences, and tokens in the corpus. If possible, please specify the news categories of the articles, since we prefer a dataset that is balanced across categories.
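
The layout and the readme counts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `language_layout` and `corpus_stats` are hypothetical, the readme file name `README.md` is not prescribed by the project, and the counts assume one sentence per non-empty line with whitespace-separated tokens.

```python
from pathlib import Path
from typing import Dict, Tuple


def language_layout(root: str, iso_code: str) -> Tuple[Path, Path]:
    """Return (corpus_path, readme_path) for one language folder.

    The folder and the corpus file are both named with the ISO 639-3 code.
    The readme file name is an assumption; the project does not specify it.
    """
    folder = Path(root) / iso_code
    return folder / f"{iso_code}.txt", folder / "README.md"


def corpus_stats(text: str) -> Dict[str, int]:
    """Count sentences and tokens in a corpus.

    Assumes one sentence per non-empty line and whitespace tokenization;
    adapt both if the corpus is formatted differently.
    """
    sentences = [line for line in text.splitlines() if line.strip()]
    tokens = sum(len(line.split()) for line in sentences)
    return {"sentences": len(sentences), "tokens": tokens}


if __name__ == "__main__":
    corpus_path, readme_path = language_layout("data", "xho")
    print(corpus_path.as_posix(), readme_path.as_posix())
    if corpus_path.exists():
        print(corpus_stats(corpus_path.read_text(encoding="utf-8")))
```

The number of articles and their news categories cannot be derived from plain text alone, so those values would still need to be recorded by hand unless the corpus carries explicit article boundaries.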