idn-treebank
=================
Manually Tagged Indonesian Corpus
Data Format
Each line contains the parse tree of one Indonesian sentence. The corpus comes in two types of files: files that include sentence IDs and RAW files. In the files with IDs, each line begins with the sentence ID, followed by a single tab character (\t) and then the parse tree.
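The line format described above can be read with a short script. The following is a minimal Python sketch. It assumes the parse trees use Penn-Treebank-style bracketed notation, e.g. `(S (NP ...) (VP ...))`, and that words contain no literal parentheses. The function names and the sample line (its ID `s001` and its tags) are hypothetical illustrations, not part of the corpus or of any official tool.

```python
from typing import List, Optional, Tuple, Union

# A tree node is (label, children); a leaf (word) is a plain string.
# ASSUMPTION: trees are written in Penn-Treebank-style bracketed notation.
Tree = Union[str, Tuple[str, list]]


def tokenize(s: str) -> List[str]:
    # Split bracketed notation into "(", ")" and atom tokens.
    return s.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(s: str) -> Tree:
    tokens = tokenize(s)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        if tok == "(":
            pos += 1
            if tokens[pos] == "(":
                label = ""  # unlabeled root, as in "( (S ...) )"
            else:
                label = tokens[pos]
                pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(parse())
            pos += 1  # consume the closing ")"
            return (label, children)
        pos += 1
        return tok  # a leaf word

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def parse_line(line: str, with_id: bool) -> Tuple[Optional[str], Tree]:
    # Files with IDs: "<sentence ID>\t<parse tree>"; RAW files: "<parse tree>".
    line = line.rstrip("\n")
    sent_id = None
    if with_id:
        sent_id, line = line.split("\t", 1)
    return sent_id, parse_tree(line)


def leaves(tree: Tree) -> List[str]:
    # Collect the words of the sentence from left to right.
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


if __name__ == "__main__":
    # Hypothetical sample line; not taken from the corpus.
    sample = "s001\t(S (NP (NNP Budi)) (VP (VB makan) (NP (NN nasi))))"
    sent_id, tree = parse_line(sample, with_id=True)
    print(sent_id, leaves(tree))
```

Splitting on the first tab only keeps any tab-free tree text intact. For RAW files, call `parse_line(line, with_id=False)` on each line instead.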
Authors
- Ruli Manurung
- Arawinda Dinakaramani
- Fam Rashel
- Andry Luthfi
Page
For more details about this work, please visit http://bahasa.cs.ui.ac.id/treebank/corpus
License
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
UPDATE
This work was carried out as part of a research project at the IR-NLP Lab. Because the lab is bringing together and documenting all of its work, please refer to the IR-NLP Lab's repository for official updates and future versions of this work. This repository will remain available as a personal repository.