Summary

CINTIL-UDep is a dependency bank of Portuguese that is treebanked with Universal Dependencies. It contains over 38K annotated sentences (476K tokens), mostly of newspaper text.

Introduction

CINTIL-UDep is a dependency bank of Portuguese, with 38,400 sentences (nearly 476,000 tokens), that is treebanked with Universal Dependencies (UD).
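UD treebanks are distributed in the CoNLL-U format, so size figures like these can be checked with a few lines of code. The following is a minimal sketch, not the script used by the authors: the function name `count_conllu` and the sample sentence are illustrative assumptions, and it counts syntactic words (integer IDs), which may differ slightly from how "tokens" are counted above.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, words) for an iterable of CoNLL-U lines.

    Sentences are blocks separated by blank lines. Only rows whose ID is a
    plain integer are counted, so multiword token ranges (e.g. "2-3") and
    empty nodes (e.g. "1.1") are skipped.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ..."
            continue
        token_id = line.split("\t", 1)[0]
        in_sentence = True
        if token_id.isdigit():
            words += 1
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return sentences, words


def _row(token_id: str, form: str) -> str:
    # Build a 10-column CoNLL-U row with only ID and FORM filled in.
    return "\t".join([token_id, form] + ["_"] * 8)


# Illustrative sentence (not taken from the treebank): "ao" is a
# multiword token that splits into the words "a" + "o".
SAMPLE = "\n".join([
    "# text = Vamos ao mercado.",
    _row("1", "Vamos"),
    _row("2-3", "ao"),
    _row("2", "a"),
    _row("3", "o"),
    _row("4", "mercado"),
    _row("5", "."),
    "",
])

if __name__ == "__main__":
    print(count_conllu(SAMPLE.splitlines()))
```

To run it on real data, pass an open file handle (opened with `encoding="utf-8"`) for one of the treebank's `.conllu` files instead of the sample lines.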

CINTIL-UDep was obtained through the merger and automatic conversion to UD of two non-UD dependency banks, CINTIL-DependencyBank and CINTIL DependencyBank PREMIUM.

For more details, refer to (Branco et al., 2022), the canonical reference given below.

Acknowledgments

This work was partly supported by PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020, and FCT-Fundação para a Ciência e Tecnologia under the grant PINFRA/22117/2016; by FCT-Fundação para a Ciência e Tecnologia through the Portuguese project DP4LT (PTDC/EEI-SII/1940/2012); and by the European Commission through the European project QTLeap (EC/FP7/610516).

Several people have contributed to the creation of this treebank, from devising annotation guidelines and developing supporting tools to manually annotating the text and curating the data: Andreia Querido, António Branco, Catarina Carvalheiro, Clara Pinto, Cláudia Martins, Francisco Costa, Joana Ramos, João Silva, Mariana Avelãs, Marisa Campos, Rita Carvalho, Rita Pereira, Sara Silveira, Sérgio Castro, and Sílvia Pereira.

References

CINTIL-UDep is described in (Branco et al., 2022), which should be used as its canonical citation and to which interested users are referred for detailed information.

The source treebanks for the CINTIL-UDep dependency bank were CINTIL-DependencyBank and CINTIL DependencyBank PREMIUM, which were initially annotated manually under different guidelines, as described in:

Other relevant references are:

Changelog

<pre>
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.11
License: CC BY-NC-ND 4.0
Includes text: yes
Genre: news fiction nonfiction grammar-examples
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Avelãs, Mariana; Branco, António; Campos, Marisa; Carvalheiro, Catarina; Carvalho, Rita; Castro, Sérgio; Costa, Francisco; Martins, Cláudia; Pereira, Rita; Pereira, Sílvia; Pinto, Clara; Querido, Andreia; Ramos, Joana; Silva, João; Silveira, Sara
Contributing: elsewhere
Contact: jrsilva@fc.ul.pt
===============================================================================
</pre>