idn-tagged-corpus

Manually Tagged Indonesian Corpus

Data Format

The corpus is stored as a tab-separated file (.tsv). Each line contains one token and its part-of-speech tag, separated by a single tab character (\t). Sentences are separated by one empty line.
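The format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the original repository: the function name `read_tagged_sentences`, the sample tokens, and the tags shown are illustrative assumptions, and the code assumes every non-empty line contains a tab.

```python
from typing import Iterable, Iterator, List, Tuple

# One sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_tagged_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from lines in the token<TAB>tag format.

    Hypothetical helper: assumes one token and one tag per non-empty
    line, and one empty line between sentences.
    """
    sentence: Sentence = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # An empty line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        token, sep, tag = line.partition("\t")
        if not sep:
            raise ValueError(f"line {number}: expected a tab separator")
        sentence.append((token, tag))
    # The last sentence may not be followed by an empty line.
    if sentence:
        yield sentence


# Illustrative sample only; the tokens and tags are not copied from the corpus.
sample = "Saya\tPRP\nmakan\tVB\nnasi\tNN\n.\tZ\n\nDia\tPRP\ntidur\tVB\n.\tZ\n"
sentences = list(read_tagged_sentences(sample.splitlines()))

for sentence in sentences:
    print(" ".join(f"{token}/{tag}" for token, tag in sentence))
```

To read the corpus itself, pass an open file object instead of the sample, for example `open(path, encoding="utf-8")`, where `path` points to the .tsv file. The UTF-8 encoding is also an assumption.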

Authors

Page

For publication and more details about this work, please visit http://bahasa.cs.ui.ac.id/postag/corpus

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

UPDATE

This work was carried out as part of a research project at the IR-NLP Lab. Because there is an initiative to collect and document all work done at the IR-NLP Lab, please refer to the IR-NLP Lab's repository for official updates and future versions of this work. This repository will remain available as a personal repository.