idn-tagged-corpus
Manually Tagged Indonesian Corpus
Data Format
This corpus uses a tab-separated file format (.tsv). Each line contains one token and its part-of-speech tag, separated by a single tab character (\t). Sentences are separated by a single empty line.
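The format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the corpus distribution: the function name `read_tagged_sentences`, the sample tokens, and the sample tags are hypothetical and only illustrate the layout. It assumes the input contains nothing but token/tag lines and empty separator lines, so any other kind of line raises an error.

```python
from typing import Iterable, Iterator, List, Tuple

# One sentence is a list of (token, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def read_tagged_sentences(lines: Iterable[str]) -> Iterator[TaggedSentence]:
    """Yield sentences from lines in token<TAB>tag format.

    Hypothetical helper: an empty line ends the current sentence.
    A non-empty line without a tab raises ValueError.
    """
    sentence: TaggedSentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Empty line: sentence boundary. Skip repeated empty lines.
            if sentence:
                yield sentence
                sentence = []
            continue
        if "\t" not in line:
            raise ValueError(f"expected token<TAB>tag, got: {line!r}")
        # Split on the first tab only; the rest of the line is the tag.
        token, tag = line.split("\t", 1)
        sentence.append((token, tag))
    # The last sentence may not be followed by an empty line.
    if sentence:
        yield sentence


# Made-up sample in the described layout (tokens and tags are illustrative).
sample = "Saya\tPRP\nmakan\tVB\nnasi\tNN\n.\tZ\n\nDia\tPRP\ntidur\tVB\n.\tZ\n"
sentences = list(read_tagged_sentences(sample.splitlines()))
print(len(sentences))
print(sentences[0])
```

To read an actual file, pass an open file object instead of the sample, for example `open(path, encoding="utf-8")`, where `path` points to your local copy of the .tsv file.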
Authors
- Ruli Manurung
- Arawinda Dinakaramani
- Fam Rashel
- Andry Luthfi
Page
For the related publication and more details about this work, visit http://bahasa.cs.ui.ac.id/postag/corpus
License
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
UPDATE
This work was carried out as part of a research project at the IR-NLP Lab. Because there is an initiative to bring together and document all work done at the lab, please refer to the IR-NLP Lab's repository for official updates and future versions of this work. This repository will remain available as a personal repository.