word_sense_disambigation_corpora

##SemCor and MASC documents annotated with NOAD word senses.

This package includes SemCor and MASC documents annotated with NOAD (New Oxford American Dictionary) word senses.

The files are in XML format with DTD simple-wsd-doc.dtd.
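Since the element names defined by simple-wsd-doc.dtd are not listed here, a reasonable first step is to inspect a file's structure without assuming any schema. The following is a minimal sketch using Python's standard library; the function name `count_elements` and the sample tags (`doc`, `item`) are hypothetical and chosen only for illustration, not taken from the DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_text: str) -> Counter:
    """Count how often each element tag occurs in an XML document.

    No element names are assumed; the function simply reports
    whatever tags are present in the parsed tree.
    """
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return Counter(elem.tag for elem in root.iter())


# Hypothetical sample; the real tag names come from simple-wsd-doc.dtd.
sample = "<doc><item/><item/></doc>"
tag_counts = count_elements(sample)
for tag, count in sorted(tag_counts.items()):
    print(tag, count)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.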

We use the part-of-speech tags specified as follows:

The break level of a word indicates how it was separated from the previous token in the text:

##Paper

These annotated documents were used to evaluate our word sense disambiguation systems in "Semi-supervised Word Sense Disambiguation with Neural Models" by Dayu Yuan, Julian Richardson, Ryan Doherty, Colin Evans, and Eric Altendorf, Proceedings of COLING 2016.

##Word sense mappings

This package also includes maps from NOAD word senses to WordNet senses.

  1. manual_map.txt is a manually created map for a few word senses.
  2. algorithmic_map.txt is built algorithmically.

Both files have the same format. Each line contains a NOAD word sense, a tab character, and then one or more WordNet word senses separated by commas: NOAD_word_sense\tWordNet_word_senses
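The mapping files can be read with a few lines of Python. This is a minimal sketch that assumes one mapping per line, a single tab between the NOAD sense and the WordNet senses, and no header row; the function name `parse_sense_map` and the sense identifiers in the sample are hypothetical placeholders, not values from the actual files.

```python
from typing import Dict, Iterable, List


def parse_sense_map(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines of the form NOAD_word_sense<TAB>wn_sense_1,wn_sense_2,..."""
    mapping: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab: left is the NOAD sense, right the WordNet senses.
        noad_sense, _, wordnet_part = line.partition("\t")
        senses = [s.strip() for s in wordnet_part.split(",") if s.strip()]
        # Assumption: if a NOAD sense appears on several lines, merge its targets.
        mapping.setdefault(noad_sense.strip(), []).extend(senses)
    return mapping


# Hypothetical sample lines; real identifiers come from the map files.
sample_lines = [
    "noad_sense_a\twn_sense_1,wn_sense_2\n",
    "noad_sense_b\twn_sense_3\n",
]
sense_map = parse_sense_map(sample_lines)
print(sense_map["noad_sense_a"])
```

To load a real file, pass an open file handle, for example `parse_sense_map(open("manual_map.txt", encoding="utf-8"))`. The UTF-8 encoding is an assumption.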

The data above was annotated using a crowdsourcing platform and is not guaranteed to be 100% accurate.

Please contact Dayu Yuan (dayuyuan@google.com) for clarifications or further information.