SemDaX corpora

Sense-annotated corpora from the Semantic Processing Across Domains project. This repository pools the data from several articles on sense annotation of Danish corpora.

This repository contains three main folders:

  1. supersenses contains the all-words supersense-annotated corpus, which is made up of six domains from the ClarinDK corpus plus the test section of the Danish Dependency Treebank (DDT). It contains two folders:
     - official_distribution, with the files used for training and testing in the articles listed below;
     - all_annotations, with all the annotations produced by each annotator prior to adjudication.
  2. lexicalsample contains the lexical-sample annotations for a regular, dictionary-based sense inventory and for a supersense-clustered inventory.
  3. active_learning contains the annotations resulting from "Active Learning for Sense Annotation".
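The folder layout above can be checked against a local copy of the data. The following is a minimal sketch, assuming the repository has been cloned to a local directory; `inventory` and `EXPECTED_DIRS` are hypothetical helpers, not part of the repository. It only lists files and does not parse them, since the annotation file formats are not described here.

```python
"""Minimal sketch: list the files in each SemDaX folder of a local clone.

Assumptions (not stated in the README): the repository is cloned locally,
and files are only listed, not parsed, because their formats are not
documented here.
"""
from pathlib import Path

# Folder names taken from the README; official_distribution and
# all_annotations are the two subfolders of supersenses.
EXPECTED_DIRS = [
    "supersenses/official_distribution",
    "supersenses/all_annotations",
    "lexicalsample",
    "active_learning",
]


def inventory(repo_root: Path) -> dict[str, list[str]]:
    """Map each expected folder to the sorted relative paths of its files.

    A missing folder maps to an empty list instead of raising, so the
    function also works on a partial checkout.
    """
    result: dict[str, list[str]] = {}
    for rel in EXPECTED_DIRS:
        folder = repo_root / rel
        if folder.is_dir():
            # rglob("*") walks subdirectories recursively; keep files only.
            result[rel] = sorted(
                str(p.relative_to(folder)) for p in folder.rglob("*") if p.is_file()
            )
        else:
            result[rel] = []
    return result


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for folder, files in inventory(root).items():
        print(f"{folder}: {len(files)} file(s)")
```

Run it as `python inventory.py path/to/clone` to get a per-folder file count; comparing the two supersenses subfolders this way is a quick check that both the adjudicated distribution and the raw annotations are present.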

The following publications use or document the construction of this resource.

@inproceedings{olsenetal2015,
  title={Coarse-Grained Sense Annotation of Danish across Textual Domains},
  author={Olsen, Sussi and Pedersen, Bolette Sandford and Mart{\'\i}nez Alonso, H{\'e}ctor and Johannsen, Anders},
  booktitle={Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA},
  pages={37},
  year={2015}
}

@inproceedings{martinezalonsoetal2015supersenses,
  title={Supersense tagging for Danish},
  author={Mart{\'\i}nez Alonso, H{\'e}ctor and Johannsen, Anders and Olsen, Sussi and Nimb, Sanni and S{\o}rensen, Nicolai Hartvig and Braasch, Anna and S{\o}gaard, Anders and Pedersen, Bolette Sandford},
  booktitle={Nordic Conference of Computational Linguistics NODALIDA 2015},
  pages={21},
  year={2015}
}

@inproceedings{martinezalonsoetal2016,
  title={An empirically grounded expansion of the supersense inventory},
  author={Mart{\'\i}nez Alonso, H{\'e}ctor and Johannsen, Anders and Olsen, Sussi and Nimb, Sanni and Pedersen, Bolette Sandford},
  booktitle={Global Wordnet Conference 2016 (to appear)},
  year={2016}
}


@inproceedings{martinezalonsoetal2015active,
  title={Active learning for sense annotation},
  author={Mart{\'\i}nez Alonso, H{\'e}ctor and Plank, Barbara and Johannsen, Anders and S{\o}gaard, Anders},
  booktitle={Nordic Conference of Computational Linguistics NODALIDA 2015},
  pages={245},
  year={2015}
}