NLP for Telugu
This repository contains state-of-the-art language models and a classifier for the Telugu language (spoken in the Indian subcontinent).
The models trained here have been used in the Natural Language Toolkit for Indic Languages (iNLTK).
Dataset
The dataset was created as part of this project.
Results
Language Model Perplexity
Architecture/Dataset | Telugu Wikipedia Articles |
---|---|
ULMFiT | 27.47 |
TransformerXL | 29.44 |
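For readers unfamiliar with the metric, here is a minimal sketch of how perplexity relates to the language model's loss. It assumes the usual convention that perplexity is the exponential of the mean per-token cross-entropy measured in nats; this README does not state how its numbers were computed, and the `perplexity` helper below is illustrative, not code from this repository.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) over a sequence of tokens.

    token_nlls: per-token negative log-likelihoods in nats (natural log).
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Sanity check: a model that gives every token probability 1/4
# is as uncertain as a uniform choice among 4 tokens.
uniform_nlls = [-math.log(0.25)] * 10
print(perplexity(uniform_nlls))

# Going the other way: under the same assumption, a perplexity of 27.47
# corresponds to a mean cross-entropy of ln(27.47) nats per token.
print(math.log(27.47))
```

Lower is better: under this convention a perplexity of 27.47 means the model is, on average, about as uncertain as a uniform choice among roughly 27 tokens.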
Classification Metrics
ULMFiT
Dataset | Accuracy | Kappa Score |
---|---|---|
Telugu News Articles | 95.4 | 93.8 |
Telugu News Articles - Andhra Jyoti | 92.09 | |
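The Kappa Score column presumably refers to Cohen's kappa, which corrects raw accuracy for the agreement expected by chance. The sketch below shows one way to compute it from true and predicted labels. The label names are hypothetical examples, not the categories of the datasets above, and the assumption that the table reports kappa scaled by 100 is mine.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    # Fraction of items where prediction and truth agree (plain accuracy).
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected if predictions were drawn independently of the truth,
    # with each side keeping its own class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when only one class ever occurs
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, for illustration only.
y_true = ["sports", "sports", "politics", "politics", "business", "business"]
y_pred = ["sports", "sports", "politics", "business", "business", "business"]

kappa = cohen_kappa(y_true, y_pred)
print(round(100 * kappa, 2))  # scaled by 100, as the table appears to be
```

In this toy example accuracy is 5/6 while kappa is lower, because part of the agreement would be expected by chance alone.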
Visualizations
Embedding Space
Architecture | Visualization |
---|---|
ULMFiT | Embeddings projection |
TransformerXL | Embeddings projection |
Pretrained Language Model
Download the pretrained language model from here
Classifier
Download the classifier from here
Tokenizer
The tokenizer was trained using Google's SentencePiece
Download the trained model and vocabulary from here