Awadhi

This repository holds all code, data and resources on the Awadhi language that are being developed at the Institute. It currently contains all the data produced for the M.Phil. dissertation of Mr. Abdul Basit: a raw corpus of approximately 70,000 tokens and a POS-annotated corpus of approximately 20,000 tokens.

The raw corpus is in the directory 'source', and the annotated corpus is in the directory 'annotation'. The annotations are in the CoNLL-2000 format.
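To load the annotated files in Python, something like the sketch below may help. It assumes the usual CoNLL-2000 layout: one token per line, whitespace-separated columns with the token first and the POS tag second, and blank lines between sentences. Any extra columns (such as CoNLL-2000 chunk tags) are ignored. The function names `read_conll`, `tag_counts` and `load_annotation_file`, along with the UTF-8 encoding, are assumptions for illustration and are not part of this repository. Check them against the actual files in 'annotation' and the tagset in 'publications'.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences as lists of (token, pos_tag) pairs.

    Assumption: column 1 is the token and column 2 is the POS tag.
    Any further columns are ignored, and blank lines end a sentence.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split()
        if len(cols) < 2:
            raise ValueError(f"expected at least 2 columns, got: {raw!r}")
        sentence.append((cols[0], cols[1]))
    # The file may end without a trailing blank line.
    if sentence:
        yield sentence


def tag_counts(sentences: Iterable[Sentence]) -> Counter:
    """Count how often each POS tag occurs across all sentences."""
    counts: Counter = Counter()
    for sent in sentences:
        counts.update(tag for _, tag in sent)
    return counts


def load_annotation_file(path: str) -> List[Sentence]:
    """Hypothetical helper: read one file from 'annotation' (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return list(read_conll(f))


if __name__ == "__main__":
    # Placeholder tokens and tags, not real corpus data.
    sample = "word1 TAG1\nword2 TAG2\n\nword3 TAG1\n"
    sents = list(read_conll(sample.splitlines()))
    print(len(sents), "sentences")
    print(tag_counts(sents).most_common())
```

Ignoring columns beyond the second keeps the reader usable whether or not the files carry a chunk column. The per-tag counts are a quick way to compare the data against the published tagset.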

The original dissertation and the complete tagset are in the directory 'publications'.

Please use the following to cite the corpus:

Basit, Abdul. 2017. A POS Annotated Corpus of Awadhi Language. Unpublished M.Phil. Dissertation, Dr. Bhim Rao Ambedkar University, Agra.

For queries, offers of collaboration or anything else, please email linguistics[dot]kmi[at]gmail[dot]com