The scripts in this directory can be used to manipulate the CQADupStack data, downloadable from http://nlp.cis.unimelb.edu.au/resources/cqadupstack/.

CQADupStack contains 12 StackExchange (http://stackexchange.com/) subforums, which have been preprocessed as described in the paper cited below. The StackExchange data dump that forms the basis of this set is the version released on September 26, 2014.

query_cqadupstack.py provides easy access to all the fields in CQADupStack. It can split the data into predefined retrieval or classification splits, and it can evaluate the output of your system using one of several available evaluation metrics.

Please cite the following paper when making use of CQADupStack:

    @inproceedings{hoogeveen2015,
      author = {Hoogeveen, Doris and Verspoor, Karin M. and Baldwin, Timothy},
      title = {CQADupStack: A Benchmark Data Set for Community Question-Answering Research},
      booktitle = {Proceedings of the 20th Australasian Document Computing Symposium (ADCS)},
      series = {ADCS '15},
      year = {2015},
      isbn = {978-1-4503-4040-3},
      location = {Parramatta, NSW, Australia},
      pages = {3:1--3:8},
      articleno = {3},
      numpages = {8},
      url = {http://doi.acm.org/10.1145/2838931.2838934},
      doi = {10.1145/2838931.2838934},
      acmid = {2838934},
      publisher = {ACM},
      address = {New York, NY, USA},
    }

For licensing information please see the LICENCE file.

For more information on the structure of the files in the data set, please see the README file that comes with the data. The README file you are reading now contains information on the query script (query_cqadupstack.py) only.

query_cqadupstack.py contains a main function called load_subforum(). It takes one argument: a zipped StackExchange subforum file from CQADupStack. load_subforum() uses this file to create a 'Subforum' object and returns it. Alternatively, you can create a Subforum object directly by instantiating the class 'Subforum' yourself. Like load_subforum(), it needs a zipped subforum file as its only argument.
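
As a minimal sketch of the loading step described above, the helper below checks that the given path is an existing zip archive before handing it to load_subforum(). The helper name open_subforum() and the example path in the comments are hypothetical and not part of query_cqadupstack.py; the sketch also assumes that query_cqadupstack.py is importable from your working directory.

```python
import os
import zipfile


def open_subforum(zip_path):
    """Check that zip_path is a zip archive, then load it with load_subforum().

    open_subforum() is a hypothetical convenience wrapper, not part of the script.
    """
    if not os.path.isfile(zip_path):
        raise IOError("No such subforum file: %s" % zip_path)
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("Not a zip archive: %s" % zip_path)
    # Imported here so the checks above still run when the script is not
    # on the import path. Assumes query_cqadupstack.py is importable.
    import query_cqadupstack
    return query_cqadupstack.load_subforum(zip_path)


# Example usage (hypothetical path; point it at a downloaded subforum zip):
# subforum = open_subforum("/path/to/subforum.zip")
```

The returned object is the same Subforum object that load_subforum() produces; the wrapper only adds early, readable errors for a wrong or missing path.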

Subforum objects can be queried using the following methods (examples of how to use them can be found at the end of this file):

SPLIT METHODS

GENERAL POST/QUESTION METHODS

PARTICULAR POST/QUESTION METHODS

ANSWER METHODS

COMMENT METHODS

USER METHODS

CLEANING/PREPROCESSING METHODS

EVALUATION METHODS FOR RETRIEVAL

EVALUATION METHODS FOR CLASSIFICATION

For questions please contact Doris Hoogeveen at doris dot hoogeveen at gmail.