

Collection of 150k+ unique Urdu words

Text files containing 150k+ Urdu words for dictionary and word-based projects such as auto-completion, auto-suggestion, embedding networks, and tagging.

I extracted the words into a simple newline-delimited text file, which is easier to work with when building apps or importing into databases.
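
As a rough illustration of how a newline-delimited word list can back an auto-completion feature, here is a minimal Python sketch. The function names `load_words` and `suggest` are made up for this example and are not part of the repo, and it assumes the word file is UTF-8 encoded with one word per line.

```python
import bisect
from pathlib import Path


def load_words(path):
    """Read a newline-delimited UTF-8 word list into a sorted list of unique words."""
    text = Path(path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and skip blank lines; the set drops duplicates.
    words = {line.strip() for line in text.splitlines() if line.strip()}
    return sorted(words)


def suggest(sorted_words, prefix, limit=5):
    """Return up to `limit` words starting with `prefix`, found via binary search."""
    # In a sorted list, all words sharing a prefix sit next to each other,
    # beginning at the insertion point of the prefix itself.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if len(matches) >= limit or not word.startswith(prefix):
            break
        matches.append(word)
    return matches
```

To try it, call `load_words` with the path to one of the word files in this repo, then pass the result to `suggest` together with the first few letters of a word. The same `load_words` output can be bulk-inserted into a database table, one row per word.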

NER Labels

I have added word lists for labelling Named Entity Recognition (NER) data. They cover categories such as persons, locations, organizations, and dates, and give a good starting point for labelling NER data. Below are the files containing the words for each label.
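
One possible way to use such label word lists is as gazetteers for pre-labelling text before manual review. The sketch below is only an illustration: `tag_tokens` is a hypothetical helper, the two sample entries are made up for the example rather than taken from the label files, and splitting on whitespace is a simplifying assumption that real Urdu text may not satisfy.

```python
def tag_tokens(tokens, gazetteers, default="O"):
    """Give each token the first label whose word set contains it, else `default`."""
    tagged = []
    for token in tokens:
        label = default
        for name, words in gazetteers.items():
            if token in words:
                label = name
                break
        tagged.append((token, label))
    return tagged


# Illustrative data only; in practice each set would be loaded from a label file.
gazetteers = {
    "PERSON": {"علی"},
    "LOCATION": {"لاہور"},
}
sentence = "علی لاہور گیا"
print(tag_tokens(sentence.split(), gazetteers))
```

Exact-match lookup like this misses multi-word names and ambiguous words, so the output is best treated as a first pass for a human annotator to correct.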

All contributions are more than welcome. A contribution may close an issue, fix a bug (reported or not), improve the existing code, and so on. If you would like to add a word or a new set of words, send a PR.

Have a bug or a feature request? Please open a new issue. If you wish to remove or update some of the words, file an issue before sending a PR on the repo.


Special thanks to everyone who contributed to getting the Urdu hack to its current state. Thanks to the <a href="http://cle.org.pk/software/ling_resources/wordlist.htm">Center for Language Engineering</a> for providing the word list.

