NLP for Oriya

This repository contains state-of-the-art language models and a text classifier for Oriya, a language spoken in the Indian state of Odisha.

Architecture/Dataset	Oriya Wikipedia Articles
ULMFiT	26.57
TransformerXL	26.81
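
The table above does not name its metric. Assuming the scores are perplexities (the usual measure for language models, where lower is better), the sketch below shows how such a number could be derived from per-token log-probabilities. The function name and the sample values are illustrative only and are not taken from this repository's notebooks.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood (natural log) per token.

    Assumption: `token_log_probs` holds the natural-log probability the
    model assigned to each correct next token in the validation text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that gives every correct token probability 0.25
# is, on average, as uncertain as a uniform choice among 4 tokens.
sample_log_probs = [math.log(0.25)] * 8
sample_ppl = perplexity(sample_log_probs)
print(f"perplexity: {sample_ppl:.2f}")
```

Under the same assumption, a perplexity of 26.57 would correspond to a mean cross-entropy loss of roughly 3.28 nats per token, since perplexity is the exponential of that loss.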

Dataset	Accuracy	MCC	Notebook to Reproduce results
IndicNLP News Article Classification Dataset - Oriya	98.83	98.44	Link
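
For readers unfamiliar with MCC (Matthews correlation coefficient), here is a minimal pure-Python sketch of its multiclass form. It is not the code from the linked notebook, and the labels in the usage example are hypothetical. MCC normally ranges from -1 to 1, so the 98.44 in the table is presumably the coefficient multiplied by 100.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from two label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                      # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                           # times each class truly occurs
    p_counts = Counter(y_pred)                           # times each class is predicted
    cross = sum(t_counts[k] * p_counts[k] for k in t_counts)
    numerator = c * s - cross
    denominator = math.sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # By convention, return 0 when the coefficient is undefined
    # (for example, when every prediction is the same class).
    return numerator / denominator if denominator else 0.0


# Hypothetical labels, for illustration only.
y_true = ["sports", "sports", "business", "business"]
y_pred = ["sports", "sports", "business", "sports"]
score = multiclass_mcc(y_true, y_pred)
print(f"MCC: {score:.4f} (x100: {100 * score:.2f})")
```

Unlike plain accuracy, MCC accounts for how predictions are distributed across classes, which makes it a useful companion metric when the classes in a dataset are imbalanced.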

Architecture	Visualization
ULMFiT	Embeddings projection
TransformerXL	Embeddings projection

Download the pretrained language models from here

The tokenizer was trained using Google's SentencePiece.

Download the trained model and vocabulary from here