Datasets

This page shares Ziqi Zhang's research datasets. Please follow the links below to find the datasets you need. All data are distributed under the Creative Commons CC-BY Licence, unless otherwise stated. Please also read the 'readme' file downloaded with each dataset. I would be grateful if you cited our work (see below) when using data shared on this site. Thanks.

NOTE: we recommend that you do NOT check out the entire repository. Instead, navigate to the specific dataset you need and download it from there, because some datasets are very large and may be irrelevant to your research.

<a name="hate">+ Hate Speech</a>

If you use the RM dataset within this collection, please cite: Zhang, Z., Robinson, D., Tepper, J. (2018). Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network. Proceedings of the 2018 Extended Semantic Web Conference. For other datasets included in the collection, please credit their original distributors.

NOTE: Due to a recent change in our University's research data sharing policy, we can no longer share the 'RM' dataset (refugees and Muslims) described in this paper.

Description: dataset used for evaluating hate speech detection on Twitter. <br/> Keywords: hate speech, Twitter, social media, abusive language, classification <br/> Related code/project: chase <br/> Data folder: /hate speech

<a name="ontomap">+ Ontology Mapping</a>

If you use this dataset, please cite: Z. Zhang, A. Gentile, E. Blomqvist, I. Augenstein, F. Ciravegna. 2016. An unsupervised data driven method to discover equivalent relations in large Linked Datasets. Semantic Web 8 (2), 197-223

Description: dataset used for evaluating mapping relations collected from DBpedia <br/> Keywords: ontology mapping, ontology alignment, DBpedia <br/> Related Wikipedia page: Ontology alignment <br/> Related code/project: LODIE <br/> Data folder: /ontology mapping

<a name="procknow">+ Procedural knowledge</a>

If you use this dataset, please cite: Z. Zhang, P. Webster, V. Uren, A. Varga, F. Ciravegna. 2012. Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing. LREC 2012, 520-527

Description: dataset containing annotated instructions that describe procedures (e.g., how to cook a recipe, how to mount snow chains on wheels, etc.). <br/> Keywords: procedure, instruction, annotation, classification <br/> Related Wikipedia page: Procedural knowledge <br/> Data folder: /procedural knowledge

<a name="scholarlydata">+ Scholarly Data Linking</a>

If you use this dataset, please cite: Z. Zhang, A. N. Nuzzolese, and A. L. Gentile. Entity Deduplication on ScholarlyData. In Proceedings of ESWC 2017, pp 85-100, Lecture Notes in Computer Science. Springer, 2017.

Description: dataset used for evaluating author name and organisation linking in scholarly data <br/> Keywords: author name disambiguation, link discovery, entity linking, entity disambiguation <br/> Related Wikipedia page: Author name disambiguation <br/> Related code/project: scholarlydata <br/> Data folder: /scholarly data linking

<a name="ate">+ Terminology Extraction</a>

If you use this dataset, please cite: Z. Zhang, J. Gao, F. Ciravegna. 2018. SemRe-Rank: Improving Automatic Term Extraction By Incorporating Semantic Relatedness With Personalised PageRank. Accepted at ACM Transactions on Knowledge Discovery from Data

Description: dataset used for evaluating automatic term extraction/recognition. <br/> Keywords: automatic term extraction or recognition, ATE, ATR, text mining, terminology, thesaurus, glossary, ontology engineering <br/> Related Wikipedia page: Terminology extraction <br/> Related code/project: SemRe-Rank <br/> Data folder: /terminology extraction

<a name="webtable">+ Webtable Entity Linking</a>

If you use this dataset, please cite: Zhang, Z. 2017. Effective and efficient semantic table interpretation using TableMiner+. Semantic Web 8 (6), 921-957

Description: dataset used for evaluating entity linking in webtables, as well as table header classification and relation annotation; contains 16,000+ annotated relational tables that can be used for many studies related to webtables. <br/> Keywords: webtable, web table, entity linking, classification, relation extraction <br/> Related Wikipedia page: Entity linking <br/> Related code/project: sti <br/> Data folder: /webtable entity linking