Summary

The Galician-TreeGal is a treebank for Galician developed at LyS Group (Universidade da Coruña) and at CiTIUS (Universidade de Santiago de Compostela).

Introduction

The resource derives from a subset (called xeral) of the XIADA corpus (v2.6), created at the Centro Ramón Piñeiro para a Investigación en Humanidades (http://corpus.cirp.es/xiada/).

All annotation layers except syntax were semi-automatically converted to UD from the original resource. The dependency labels were assigned using cross-lingual parsing techniques and then manually corrected by a linguist (see the references for more information). At the end of this process, several corrections were made to bring the annotation in line with the UD guidelines.

Galician-TreeGal v0.42 contains 1000 sentences from the xeral corpus (~25k tokens), divided into a 60-40 train-test split.
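As a rough illustration of the figures above, a 60-40 split of 1000 sentences gives 600 training and 400 test sentences. The sketch below shows one way such a split could be computed over CoNLL-U data. It assumes a simple sequential cut; the procedure actually used to build the treebank's train and test files is not described here. The function names and the toy sentences are invented for illustration and are not taken from the treebank.

```python
def read_conllu_sentences(text):
    """Split CoNLL-U text into sentence blocks (blocks are separated by blank lines)."""
    blocks = [block.strip() for block in text.strip().split("\n\n")]
    return [block for block in blocks if block]


def count_tokens(block):
    """Count syntactic words in one sentence block.

    Comment lines (starting with '#'), multiword-token ranges (IDs like '1-2')
    and empty nodes (IDs like '1.1') are skipped.
    """
    count = 0
    for line in block.splitlines():
        if line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            count += 1
    return count


def split_train_test(sentences, train_percent=60):
    """Sequential split (an assumption): the first train_percent% goes to train."""
    cut = len(sentences) * train_percent // 100
    return sentences[:cut], sentences[cut:]


# Toy input, invented for this example only.
SAMPLE = (
    "# text = Ola mundo\n"
    "1\tOla\tola\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tmundo\tmundo\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
    "# text = Adeus\n"
    "1\tAdeus\tadeus\tINTJ\t_\t_\t0\troot\t_\t_\n"
)

sentences = read_conllu_sentences(SAMPLE)
train, test = split_train_test(list(range(1000)))
print(len(sentences), sum(count_tokens(s) for s in sentences))
print(len(train), len(test))
```

Integer arithmetic is used for the cut point so that the sizes do not depend on floating-point rounding.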

Differences from the generic Galician guidelines

Morphology

Features

For more information, see Garcia, Marcos (2016), Universal Dependencies Guidelines for the Galician-TreeGal Treebank (note that this document follows old UD guidelines).

Syntax

Acknowledgments

Stats

Issues

Changelog

<pre>
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Documentation status: partial
Data source: manual
Data available since: UD v1.4
License: LGPL-LR
Genre: news
Includes text: yes
Lemmas: manual native
UPOS: manual native
XPOS: manual native
Features: converted with corrections
Relations: manual native
Contributors: Garcia, Marcos; Sánchez-Rodríguez, Xulia
Contributing: elsewhere
Contact: marcos.garcia.gonzalez@usc.gal
===============================================================================
</pre>
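
The metadata block is meant to be read by scripts. A minimal sketch of such a reader follows, assuming the block is stored with one `Key: value` pair per line between the two `===` delimiter lines. The function name `parse_metadata` is hypothetical and is not part of any UD tool; the example input reuses three fields from the block above.

```python
def parse_metadata(block):
    """Parse 'Key: value' lines into a dict, skipping delimiter and tag lines."""
    meta = {}
    for line in block.splitlines():
        line = line.strip()
        # Skip blank lines, '===' delimiter lines and <pre> tags.
        if not line or line.startswith("=") or line.startswith("<"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


EXAMPLE = """\
=== Machine-readable metadata (DO NOT REMOVE!) ================================
License: LGPL-LR
Genre: news
Contributors: Garcia, Marcos; Sánchez-Rodríguez, Xulia
===============================================================================
"""

meta = parse_metadata(EXAMPLE)
# Contributors are separated by semicolons; each name is 'Surname, Given name'.
contributors = [name.strip() for name in meta["Contributors"].split(";")]
print(meta["License"], contributors)
```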