

Awesome New Languages in Machine Translation

This is a list of initiatives for adding new languages to open-source machine translation models (such as NLLB).

Notable projects that improve translation quality for already supported low-resource languages are also highlighted.

The first part of the document lists individual languages in alphabetical order of their English names.

The second part of the document lists multilingual initiatives.

Any new additions are welcome (in the form of pull requests or issues)!

Single-language projects

Ainu

Amis

Aromanian

Awajun

Bambara

Buryat

Circassian (Kabardian)

Erzya

Additionally, see TartuNLP.

Fula

Hill Mari

See TartuNLP

Interslavic

Karakalpak

Komi

See TartuNLP

Lezgian

ISO 639-3: lez; Glottocode: lezg1247

Livonian

See TartuNLP

Livvi Karelian

See TartuNLP

Mansi

Mari (Meadow)

See TartuNLP

Moksha

See TartuNLP

Ngambay

Qarachay Malqar

Tyvan

Udmurt

See TartuNLP

Zarma

Multilingual projects

Finno-Ugric languages (TartuNLP)

Multiple Finno-Ugric languages (including Komi, Udmurt, Hill and Meadow Mari, Erzya, Livonian, Mansi, Moksha and Livvi Karelian)

Indigenous languages of the Americas (AmericasNLP Shared Tasks)

Indigenous languages of the Americas (including Ashaninka, Aymara, Bribri, Chatino, Guarani, Hñähñu, Nahuatl, Quechua, Raramuri, Shipibo-Konibo, and Wixarika from the AmericasNLP MT shared task, as well as Wayuunaiki, Arhuaco, Inga, and Nasa)

Hundreds of diverse languages (Apertium)

Apertium is a rule-based machine translation platform.

Currently, it has linguistic tools (such as dictionaries and morphological parsers) for hundreds of languages, but only 51 language pairs have been developed to a state considered stable enough for a public translation service.