Home

Awesome

<p align="center"> <img width="357" src="docs/images/EENLP-logo.png"> </p>

About

This repo contains a curated meta-index of NLP datasets and models for Eastern European languages. It originally started as a summer school project at EEML 2021 (Eastern European Machine Learning Summer School) (hence the scope), self-organized by a group of participants. You can read more details about this initial summer school project here.

We hope this broad index of NLP resources for Eastern European languages could help:

Initially, EENLP was biased towards datasets for semantic NLP tasks such as sentiment analysis, NLI, word sense disambiguation, etc. However, we are expanding and improving this index further, so feel free to contribute new relevant resources. We are also happy to hear your feedback and suggestions via issues or at altsoph@gmail.com.

Resources

The datasets

Browse the datasets index or select your language of interest:

<a title='Albanian' href='docs/datasets.md#albania-albanian'>:albania:</a> <a title='Armenian' href='docs/datasets.md#armenia-armenian'>:armenia:</a> <a title='Belarusian' href='docs/datasets.md#belarus-belarusian'>:belarus:</a> <a title='Bosnian' href='docs/datasets.md#bosnia_herzegovina-bosnian'>:bosnia_herzegovina:</a> <a title='Bulgarian' href='docs/datasets.md#bulgaria-bulgarian'>:bulgaria:</a> <a title='Croatian' href='docs/datasets.md#croatia-croatian'>:croatia:</a> <a title='Czech' href='docs/datasets.md#czech_republic-czech'>:czech_republic:</a> <a title='Estonian' href='docs/datasets.md#estonia-estonian'>:estonia:</a> <a title='Georgian' href='docs/datasets.md#georgia-georgian'>:georgia:</a> <a title='Hungarian' href='docs/datasets.md#hungary-hungarian'>:hungary:</a> <a title='Kazakh' href='docs/datasets.md#kazakhstan-kazakh'>:kazakhstan:</a> <a title='Latvian' href='docs/datasets.md#latvia-latvian'>:latvia:</a> <a title='Lithuanian' href='docs/datasets.md#lithuania-lithuanian'>:lithuania:</a> <a title='Macedonian' href='docs/datasets.md#macedonia-macedonian'>:macedonia:</a> <a title='Moldovan' href='docs/datasets.md#moldova-moldovan'>:moldova:</a> <a title='Montenegrin' href='docs/datasets.md#montenegro-montenegrin'>:montenegro:</a> <a title='Polish' href='docs/datasets.md#poland-polish'>:poland:</a> <a title='Romanian' href='docs/datasets.md#romania-romanian'>:romania:</a> <a title='Russian' href='docs/datasets.md#ru-russian'>:ru:</a> <a title='Serbian' href='docs/datasets.md#serbia-serbian'>:serbia:</a> <a title='Slovakian' href='docs/datasets.md#slovakia-slovakian'>:slovakia:</a> <a title='Slovenian' href='docs/datasets.md#slovenia-slovenian'>:slovenia:</a> <a title='Ukrainian' href='docs/datasets.md#ukraine-ukrainian'>:ukraine:</a>

The models

Browse the models index or select your language of interest:

<a title='Albanian' href='docs/models.md#albania-albanian'>:albania:</a> <a title='Armenian' href='docs/models.md#armenia-armenian'>:armenia:</a> <a title='Belarusian' href='docs/models.md#belarus-belarusian'>:belarus:</a> <a title='Bosnian' href='docs/models.md#bosnia_herzegovina-bosnian'>:bosnia_herzegovina:</a> <a title='Bulgarian' href='docs/models.md#bulgaria-bulgarian'>:bulgaria:</a> <a title='Croatian' href='docs/models.md#croatia-croatian'>:croatia:</a> <a title='Czech' href='docs/models.md#czech_republic-czech'>:czech_republic:</a> <a title='Estonian' href='docs/models.md#estonia-estonian'>:estonia:</a> <a title='Georgian' href='docs/models.md#georgia-georgian'>:georgia:</a> <a title='Hungarian' href='docs/models.md#hungary-hungarian'>:hungary:</a> <a title='Kazakh' href='docs/models.md#kazakhstan-kazakh'>:kazakhstan:</a> <a title='Latvian' href='docs/models.md#latvia-latvian'>:latvia:</a> <a title='Lithuanian' href='docs/models.md#lithuania-lithuanian'>:lithuania:</a> <a title='Macedonian' href='docs/models.md#macedonia-macedonian'>:macedonia:</a> <a title='Moldovan' href='docs/models.md#moldova-moldovan'>:moldova:</a> <a title='Montenegrin' href='docs/models.md#montenegro-montenegrin'>:montenegro:</a> <a title='Polish' href='docs/models.md#poland-polish'>:poland:</a> <a title='Romanian' href='docs/models.md#romania-romanian'>:romania:</a> <a title='Russian' href='docs/models.md#ru-russian'>:ru:</a> <a title='Serbian' href='docs/models.md#serbia-serbian'>:serbia:</a> <a title='Slovakian' href='docs/models.md#slovakia-slovakian'>:slovakia:</a> <a title='Slovenian' href='docs/models.md#slovenia-slovenian'>:slovenia:</a> <a title='Ukrainian' href='docs/models.md#ukraine-ukrainian'>:ukraine:</a>

Contribution

Feel free to contribute. The details are in our contributing guidelines.

Citation

@misc{tikhonov2021eenlp,
      title={EENLP: Cross-lingual Eastern European NLP Index}, 
      author={Alexey Tikhonov and Alex Malkhasov and Andrey Manoshin and George Dima and Réka Cserháti and Md. Sadek Hossain Asif and Matt Sárdi},
      year={2021},
      eprint={2108.02605},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Licensing

This index is licensed under Apache-2.0 License. However, please, note that each resource has individual licensing properties.

Development

This is mostly internal documentation for us.

See developing this repository.