


Spanish data from the AnCora corpus.


The original annotation was done in a constituency framework as a part of the AnCora project at the University of Barcelona. It was converted to dependencies and used in the CoNLL 2009 shared task. The CoNLL 2009 version was later converted to HamleDT and to Universal Dependencies.

The GNU license is inherited from the original dataset, downloaded from the AnCora website. Any license-related questions have to be directed to the original data providers at the University of Barcelona (that is, not to the UD contact address listed at the end of this README file).

Coreference and Entities

The MISC column contains annotation of named entities and coreference, converted from the original XML files of AnCora-CO and merged with the UD-style morphosyntactic annotation. The format of these annotations is described in Nedoluzhko et al. (2021): Coreference meets Universal Dependencies – a pilot experiment on harmonizing coreference datasets for 11 languages.
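
As a rough illustration of how the MISC column can be read, the following Python sketch splits the tenth CoNLL-U column into attribute/value pairs. It relies only on the general CoNLL-U convention (tab-separated columns, MISC items separated by `|`, `_` for an empty value). The function names `parse_misc` and `iter_misc` and the sample sentence, including its `Entity=(e1)` value, are hypothetical and do not come from the corpus; the actual attribute syntax is the one defined in Nedoluzhko et al. (2021).

```python
def parse_misc(misc_field):
    """Split a CoNLL-U MISC value into a dict; '_' means the column is empty."""
    if misc_field == "_":
        return {}
    attrs = {}
    for item in misc_field.split("|"):
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = item.partition("=")
        attrs[key] = value if sep else None
    return attrs


def iter_misc(conllu_text):
    """Yield (ID, FORM, MISC dict) for every token line in a CoNLL-U string."""
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        yield cols[0], cols[1], parse_misc(cols[9])


# Hypothetical two-token sentence, invented for this example only.
sample = (
    "# sent_id = hypothetical-1\n"
    "1\tBarcelona\tBarcelona\tPROPN\t_\t_\t0\troot\t_\tEntity=(e1)|SpaceAfter=No\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

for token_id, form, misc in iter_misc(sample):
    print(token_id, form, misc)
```

The sketch deliberately leaves the attribute values as uninterpreted strings, since decoding entity brackets and coreference links depends on the format specification cited above.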


The following paper must be cited when using this corpus:

In addition, the following paper must be cited if coreference information (attributes entity, coreftype, corefsubtype, homophoricDD or entityref) is used:

Additionally, the following paper must be cited when argumental attributes in "sn" or "grup.nom" (attributes func, arg, tem or lexicalid) are used:


=== Machine-readable metadata =================================================
Data available since: UD v1.3
License: CC BY 4.0
Includes text: yes
Genre: news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: not available
Features: converted from manual
Relations: converted from manual
Contributors: Martínez Alonso, Héctor; Zeman, Daniel
Contributing: here
Contact: zeman@ufal.mff.cuni.cz