Summary

UD_Italian-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin. It covers a variety of text genres, including talks, legal texts and Wikipedia articles, among others.

Introduction

UD_Italian-ParTUT data is derived from the already-existing parallel treebank Par(allel)TUT.

ParTUT is a morpho-syntactically annotated collection of Italian/French/English parallel sentences. It includes texts from different sources, representing different genres and domains, and is released in several formats.

ParTUT comprises approximately 167,000 tokens, with an average of 2,100 sentences per language. The texts of the collection currently available were gathered from a large number of sources and domains:

ParTUT data can be downloaded here and here.

NOTE: While the Italian section of ParTUT is already included in UD_Italian, UD_Italian-ParTUT comprises just those sentences having a 1:1 correspondence with their English and French counterparts.

Acknowledgements

We are deeply grateful to Project Syndicate© for letting us download and exploit their articles as text material, under the terms of educational use.

Corpus splitting

Since version 2.1, the corpus has been re-partitioned so as to avoid overlapping sentences with UD_Italian. The treebank has thus been randomly split as follows:

To preserve the 1:1 correspondence among the three language sections, all of them were partitioned in the same way; the same sentences, in the same order, are therefore found in the training, development and test sets of UD_French-ParTUT and UD_English-ParTUT as well.
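
Because the three sections share the same partition, a quick sanity check is to confirm that a given split contains the same number of sentences in each language. The following is a minimal sketch, not part of the treebank tooling: the function names and the toy data are hypothetical, and it relies only on the CoNLL-U convention that sentences are separated by blank lines.

```python
def count_sentences(conllu_text):
    """Count sentences in CoNLL-U text (blocks of non-blank lines)."""
    count = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if line.strip():
            # First non-blank line after a gap starts a new sentence.
            if not in_sentence:
                count += 1
                in_sentence = True
        else:
            in_sentence = False
    return count


def same_sentence_count(sections):
    """Return (aligned, counts) for a mapping of language -> CoNLL-U text."""
    counts = {lang: count_sentences(text) for lang, text in sections.items()}
    return len(set(counts.values())) == 1, counts


# Toy two-sentence samples (illustrative only, not taken from the treebank).
TOY_IT = (
    "# text = Ciao.\n"
    "1\tCiao\tciao\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
    "# text = Grazie.\n"
    "1\tGrazie\tgrazie\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)
TOY_EN = (
    "# text = Hello.\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
    "# text = Thanks.\n"
    "1\tThanks\tthanks\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)

aligned, counts = same_sentence_count({"it": TOY_IT, "en": TOY_EN})
print(aligned, counts)
```

With the real data, each string would instead be read from the corresponding split file of each treebank, for example with `pathlib.Path(...).read_text(encoding="utf-8")`. Equal counts are only a necessary condition for the 1:1 correspondence; they do not prove that sentences are aligned in content.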

Basic statistics

References

Changelog

2019-11-15 v2.5

2019-05-15 v2.4

2018-11-15 v2.3

2018-04-15 v2.2

2017-11-15 v2.1

2017-03-01 v2

=== Machine-readable metadata ================================================

Data available since: UD v2.0
License: CC BY-NC-SA 4.0
Includes text: yes
Genre: legal news wiki
Lemmas: converted with corrections
UPOS: converted with corrections
XPOS: converted with corrections
Features: converted with corrections
Relations: converted with corrections
Contributors: Bosco, Cristina; Sanguinetti, Manuela
Contributing: elsewhere
Contact: msanguin@di.unito.it

===============================================================================