Masakhane - A living collection of NLP projects for Africans, by Africans

<div align="center"> <img src="https://pbs.twimg.com/profile_images/1255858628986384384/d7Lk9I-w_400x400.jpg" > </div>

MASAKHANE is a research effort for NLP for African languages that is OPEN SOURCE, CONTINENT-WIDE, DISTRIBUTED and ONLINE. This GitHub repository houses the data, code, results and research for building open baseline NLP results for African languages.

Website: masakhane.io

Our Mission

Masakhane is a grassroots organisation whose mission is to strengthen and spur NLP research in African languages, for Africans, by Africans. Although 2000 of the world’s languages are African, African languages are barely represented in technology. The tragic past of colonialism has been devastating for African languages in terms of their support, preservation and integration. This has resulted in a technological space that does not understand our names, our cultures, our places, our history.

Masakhane roughly translates to “We build together” in isiZulu. Our goal is for Africans to shape and own these technological advances towards human dignity, well-being and equity, through inclusive community building, open participatory research and multidisciplinarity.

Our Values

Goals

Progress

How can I contribute?

There are many ways to contribute to MASAKHANE.

  1. TRAIN A MODEL - Contribute a trained model and related code for your language
  2. ANALYSIS - Contribute analysis of data/models for any African language. You do not need any technical experience for this! If you're a linguist, we can pair you up with an NLP practitioner and you can help contribute analysis
  3. DATA - Help build or find datasets for your language
  4. DOCUMENTATION - Help document our discussions and progress. This is VERY much needed. Or contribute to the documentation of the base "notebook" to improve the experience of others
  5. MENTORSHIP - Provide advice or help tune models for their languages and datasets, or help people get started
  6. ADMIN - Working with so many researchers can be quite a challenge! Help out with administrative tasks
  7. COMPUTE - Help with infrastructure and compute! Do you have spare compute to donate? Let us know! We're always looking for more!
  8. BRAINSTORM - Join our weekly meetings, provide advice or ideas
  9. STORY-TELLING - Tell our stories to the world by doing talks about the community, contributing to our Medium publication, or engaging with media outlets
  10. MLOps & ML Engineering - Do you enjoy delving into the MLOps side of machine learning? Are you a software developer looking to hone your ML engineering skills? Join us to help build tools to support our reproducibility, data gathering, and model sharing!

Want more details? Check out our current initiatives

How do I join?

  1. Join our Slack

  2. Request to join our Google Group - this will add you to our weekly meetings

  3. So we can feature you on our webpage masakhane.io, please fill in our membership form HERE:

Please be patient when waiting for a response from our email address. We're very behind on our administration in the time of COVID-19.

Where do I start?

Initiatives

Every week we have more ideas, and more impromptu projects emerge. Keen on any initiatives? Join our Slack and find the respective group.

Working on a Masakhane initiative that is not listed here? Please add it with a PR :heart:

Keen to help on any of these initiatives? Please see our message board

| Initiative | Description | Slack Channel | Repository |
| --- | --- | --- | --- |
| Machine Translation Benchmarks | Continued expansion and iterations on our language benchmarks as documented on the main GitHub README | #benchmarks | HERE |
| NER Datasets and Benchmarks | We're busy releasing datasets and research around NER | #ner | HERE |
| Dataset Creation | We never have enough data. More is always needed. We have a number of members finding creative ways to build datasets. | #datasetcreation | |
| Reproducibility | The goal is to ensure reproducibility and comparability of models and results. | #reproducibility | |
| Takalani NLP | Development of Language Models for South African languages | #takalani-nlp | |
| Wazobia | Yoruba, Igbo, Hausa and Nigerian languages NMT | #wazobia | |
| Multilingual Chatbot | Developing multilingual chatbots | #multilingual-dialogue | |
| Transfer Learning | Transfer Learning & Multilingual Expansion of Benchmarks | #transfer-learning | |
| Evaluation of Masakhane Models | How good are the Masakhane models? How can we measure it, besides looking at BLEU scores? | #evaluation | |
| Text-to-speech | Corpora and models for text to speech synthesis (TTS) from audio bibles in Ewe, Hausa, Lingala, Asante Twi, Akuapem Twi and Yoruba | #bible-speech | HERE |

Code of Conduct

See Code of Conduct