KKLTK: Kinyarwanda and Kirundi Languages ToolKit
KKLTK is a Python package for processing the Kinyarwanda and Kirundi languages. It currently provides stopword sets for both languages; other preprocessing tools, such as Kinyarwanda and Kirundi tokenizers, will be added soon. KKLTK requires Python 3.0, 3.5, 3.6, 3.7, or 3.8.
For more detailed information on how these stopwords were obtained, please refer to the paper "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi" by Rubungo Andre Niyongabo, Hong Qu, Julia Kreutzer, and Li Huang, to appear in COLING 2020.
Installation
pip install kkltk==1.0
Usage
Stopwords
from kkltk.kin_kir_stopwords import stopwords
# Kinyarwanda
stopset_kin = stopwords.words('kinyarwanda')
# Kirundi
stopset_kir = stopwords.words('kirundi')
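A typical use of a stopword set is filtering tokens before further processing. The snippet below is a minimal sketch, not part of KKLTK: `remove_stopwords` is a hypothetical helper, tokenization is plain whitespace splitting (an assumption, since KKLTK does not yet ship a tokenizer), and `demo_stopset` is a placeholder rather than the actual KKLTK list. It also assumes that `stopwords.words(...)` returns an iterable of strings.

```python
def remove_stopwords(text, stopset):
    """Return the whitespace-separated tokens of `text` that are not stopwords.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    lowered = {word.lower() for word in stopset}
    return [tok for tok in text.split() if tok.lower() not in lowered]


# Placeholder entries for illustration only. In practice, pass
# stopwords.words('kinyarwanda') or stopwords.words('kirundi') instead.
demo_stopset = ["na", "ku", "mu"]

filtered = remove_stopwords("Abana bari mu ishuri na mwarimu", demo_stopset)
print(filtered)  # ['Abana', 'bari', 'ishuri', 'mwarimu']
```

Converting the stopwords to a set once keeps each membership check constant-time, which matters when filtering large corpora.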
Contributing
KKLTK is a first step toward putting under-represented languages on the NLP map. The stopword lists for both languages are still growing. Please reach out to me if you would like to contribute.