Summary

UD_Portuguese-PetroGold is a fully revised treebank which consists of academic texts from the oil & gas domain in Brazilian Portuguese.

Introduction

UD_Portuguese-PetroGold is a fully revised treebank of academic texts from the oil & gas domain in Brazilian Portuguese. The texts were processed in full; only elements such as the summary, abstract, appendices, and bibliographic references were excluded, along with figures, graphs, formulas, and tables. The annotation was produced automatically and then manually revised by a team of linguists from PUC-Rio (Brazil).

The corpus was created as part of the Petrolês Project (http://petroles.puc-rio.ai), a partnership between the Petrobras Research and Development Center (CENPES) and the Applied Computational Intelligence Lab (PUC-Rio/ICA). Petrolês aims to promote research initiatives related to Natural Language Processing and Computational Linguistics for the Portuguese language.

Acknowledgments

We thank everyone from ICA/PUC-Rio who helped extract the text from the original PDF files. We also thank the Petrobras researchers and geoscientists for making the Petrolês corpus publicly available, and for their technical assistance and funding.

How to contribute

Changes should be made via pull request directly to not-to-release/petrogold.conllu in the dev branch.
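
Before opening a pull request, it can help to run a quick structural check on the edited file. The sketch below is only an illustration, not part of the project's tooling: the function name `check_conllu` and the sample sentence are hypothetical, and the only thing it assumes is the standard CoNLL-U layout (comment lines starting with `#`, ten tab-separated columns per row, a blank line between sentences).

```python
def check_conllu(text):
    """Return (sentences, well_formed_rows, problems) for CoNLL-U text.

    problems is a list of (line_number, column_count) pairs for rows
    that do not have exactly 10 tab-separated columns.
    """
    sentences = 0
    rows = 0
    problems = []
    in_sentence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        in_sentence = True
        columns = line.split("\t")
        if len(columns) == 10:
            rows += 1
        else:
            problems.append((lineno, len(columns)))
    # Count a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, rows, problems


# Hypothetical sample, NOT taken from the treebank; features are left as "_".
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "# text = O poço produz óleo.",
    "\t".join(["1", "O", "o", "DET", "_", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "poço", "poço", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "produz", "produzir", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", "óleo", "óleo", "NOUN", "_", "_", "3", "obj", "_", "SpaceAfter=No"]),
    "\t".join(["5", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    print(check_conllu(SAMPLE))
```

To run it on the treebank, read not-to-release/petrogold.conllu as UTF-8 and pass its contents to the function. This only catches broken column counts; the official Universal Dependencies validator remains the authoritative check.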

How to cite

@inproceedings{souza2022polishing,
  title={Polishing the gold--how much revision do we need in treebanks?},
  author={de{ }Souza, Elvis and Freitas, Cl{\'a}udia},
  booktitle={Proceedings of the Universal Dependencies Brazilian Festival},
  pages={1--11},
  year={2022}
}

References

Changelog

<pre>
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.11
License: CC BY-SA 4.0
Includes text: yes
Genre: academic
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: de Souza, Elvis; Freitas, Cláudia; Silveira, Aline; Cavalcanti, Tatiana; Castro, Maria Clara; Evelyn, Wograine
Contributing: here source
Contact: elvis.desouza99@gmail.com
===============================================================================
</pre>