Summary

A treebank of Scottish Gaelic based on the Annotated Reference Corpus of Scottish Gaelic (ARCOSG).

Introduction

The Scottish Gaelic treebank takes its data from ARCOSG, the Annotated Reference Corpus of Scottish Gaelic (Lamb et al. 2016), with an annotation scheme based on that of the Irish UD treebank. Full bibliographic details are to be had there.

It contains eight subcorpora, each made up of a varying number of original files of approximately 1000 tokens each. All files listed below are in the training set unless they are explicitly marked as being in test or dev. The ARCOSG documentation largely gives the names of contributors in Gaelic; I have kept these and glossed them with the English names where those will be familiar to non-Gaelic speakers.

See https://universaldependencies.org/gd/index.html for detailed linguistic documentation.

Acknowledgments

We wish to thank all of the contributors to ARCOSG and fellow Celtic language UD developers Teresa Lynn, Kevin Scannell, Johannes Heinecke and Fran Tyers.

References

Changelog

<pre>
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction fiction news spoken
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Batchelor, Colin
Contributing: here
Contact: colin.r.batchelor@googlemail.com
===============================================================================
</pre>
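The metadata block is meant to be read by tools, so a short parsing sketch may be useful. The Python below is a minimal illustration, not part of the treebank or of any official UD tooling. It assumes the block is stored with one `Key: value` pair per line between two delimiter lines that begin with `===`; the function name `parse_ud_metadata` and the `SAMPLE` string are hypothetical and exist only for this example.

```python
def parse_ud_metadata(text: str) -> dict[str, str]:
    """Collect 'Key: value' pairs found between two '===' delimiter lines.

    Assumption: one pair per line, and both delimiter lines start with '==='.
    """
    metadata: dict[str, str] = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==="):
            if inside:
                break  # second delimiter: end of the block
            inside = True  # first delimiter: start of the block
            continue
        if inside and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


# Hypothetical sample: the field values are copied from the block above.
SAMPLE = """\
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction fiction news spoken
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Batchelor, Colin
Contributing: here
Contact: colin.r.batchelor@googlemail.com
===============================================================================
"""

meta = parse_ud_metadata(SAMPLE)
genres = meta["Genre"].split()  # the genre field is a space-separated list
```

Here `meta` maps each field name to its string value, and `genres` splits the space-separated genre field into a list. Splitting on the first colon only keeps a value intact if it happens to contain a colon itself.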