Danish-Similarity-Dataset

The Danish similarity dataset is a gold-standard resource for evaluating Danish word embedding models. It consists of 99 word pairs rated by 38 human judges for semantic similarity, i.e. the extent to which the two words are similar in meaning, with ratings normalized to a 0-1 range.

Note that this dataset provides a way of measuring similarity rather than relatedness/association.
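
As an illustration of how a similarity gold standard of this kind is typically used, the sketch below scores a set of word embeddings by comparing the cosine similarity of each word pair with its human rating, using Spearman rank correlation. This is a common evaluation recipe for similarity benchmarks, not necessarily the exact procedure used by the authors. The word vectors, the ratings, and the function names (`cosine`, `average_ranks`, `spearman`, `evaluate`) are all hypothetical toy values chosen for the example; they are not taken from the dataset, and the dataset's actual file layout is not assumed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings, rated_pairs):
    """Correlate model similarities with human ratings.

    Pairs with a word missing from the embeddings are skipped.
    Returns (spearman_rho, number_of_pairs_used).
    """
    gold, predicted = [], []
    for word1, word2, rating in rated_pairs:
        if word1 in embeddings and word2 in embeddings:
            gold.append(rating)
            predicted.append(cosine(embeddings[word1], embeddings[word2]))
    return spearman(gold, predicted), len(gold)


# Hypothetical toy data: 2-dimensional vectors and made-up ratings in [0, 1].
toy_embeddings = {
    "bil": [0.0, 1.0],
    "vogn": [0.1, 1.0],
    "hund": [1.0, 0.0],
    "kat": [0.8, 0.6],
}
toy_pairs = [
    ("bil", "vogn", 0.9),
    ("hund", "kat", 0.5),
    ("hund", "bil", 0.1),
    ("ko", "hest", 0.4),  # out of vocabulary, so it is skipped
]

rho, n_used = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} on {n_used} pairs")
```

A rank correlation is used here because human ratings and cosine similarities live on different scales; only the ordering of the pairs is compared. Reporting the number of pairs actually used matters, since out-of-vocabulary words can silently shrink the evaluation set.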

Description of files included in this material:

(Note: In both of the included files, rows correspond to items (word pairs) and columns to properties of each item.)

Author: Nina Schneidermann

Cite: Schneidermann, N., Hvingelby, R. & Pedersen, B. S. (2020). Towards a Gold Standard for Evaluating Danish Word Embeddings. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), Marseille, pp. 4756-4765.
http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.585.pdf

Contact: Bolette Sandford Pedersen (bspedersen@hum.ku.dk)