PyStack: Powerful Python Toolkit for Analyzing Stack Exchange Sites

This toolkit provides several useful Python scripts for processing Stack Exchange data dumps.

Requirements

PyStack

Usage

Parameters of pystack.py

Outputs of pystack.py

Process Posts.xml

Usage

python process_posts.py --input ../dataset/ai/Posts.xml

OR

python pystack.py --input ../dataset/ai/ --task Posts
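For readers who want to inspect Posts.xml directly, the following is a minimal sketch of streaming the file with the standard library. It is not the code inside process_posts.py; the function name `split_posts` and the sample rows are hypothetical, and it assumes the usual data dump layout in which every post is a `<row>` element with `Id`, `PostTypeId`, `ParentId`, and `Title` attributes.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_posts(source):
    """Stream <row> elements and separate questions from answers.

    `source` is a path or a binary file-like object holding Posts.xml.
    Hypothetical helper for illustration, not part of PyStack.
    """
    questions = {}                # question Id -> title
    answers = defaultdict(list)   # question Id -> list of answer Ids
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        post_type = elem.get("PostTypeId")
        if post_type == "1":      # PostTypeId 1 marks a question
            questions[elem.get("Id")] = elem.get("Title", "")
        elif post_type == "2":    # PostTypeId 2 marks an answer
            answers[elem.get("ParentId")].append(elem.get("Id"))
        elem.clear()              # release the row; real dumps are large
    return questions, dict(answers)


# Made-up rows in the same shape as the dump, so the sketch runs offline.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="What is backprop?" Score="5" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
  <row Id="3" PostTypeId="2" ParentId="1" Score="1" />
</posts>"""

if __name__ == "__main__":
    qs, ans = split_posts(io.BytesIO(SAMPLE))
    print(qs)
    print(ans)
```

To try it on real data, pass a path such as ../dataset/ai/Posts.xml instead of the in-memory sample.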

Output

Process PostLinks.xml

Usage

python process_postlinks.py --input ../dataset/ai/PostLinks.xml

OR

python pystack.py --input ../dataset/ai/ --task PostLinks

Output

Process Votes.xml

Usage

python process_votes.py --input ../dataset/ai/Votes.xml

OR

python pystack.py --input ../dataset/ai/ --task Votes
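As a rough illustration of what can be derived from Votes.xml, the sketch below tallies upvotes minus downvotes per post. It is an assumption-laden example rather than the logic of process_votes.py: the helper name `net_votes` and the sample rows are made up, and it relies on the data dump convention that `VoteTypeId` 2 is an upvote and 3 is a downvote.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# VoteTypeId values from the Stack Exchange data dump schema.
UPVOTE, DOWNVOTE = "2", "3"


def net_votes(source):
    """Return {PostId: upvotes - downvotes} for a Votes.xml source.

    Hypothetical helper for illustration, not part of PyStack.
    Other vote types (accepts, favorites, ...) are ignored.
    """
    score = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        vote_type = elem.get("VoteTypeId")
        if vote_type == UPVOTE:
            score[elem.get("PostId")] += 1
        elif vote_type == DOWNVOTE:
            score[elem.get("PostId")] -= 1
        elem.clear()  # free each row once it has been counted
    return dict(score)


# Made-up rows in the same shape as the dump.
VOTES_SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<votes>
  <row Id="1" PostId="1" VoteTypeId="2" />
  <row Id="2" PostId="1" VoteTypeId="2" />
  <row Id="3" PostId="1" VoteTypeId="3" />
  <row Id="4" PostId="2" VoteTypeId="1" />
  <row Id="5" PostId="2" VoteTypeId="2" />
</votes>"""

if __name__ == "__main__":
    print(net_votes(io.BytesIO(VOTES_SAMPLE)))
```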

Output

Process Badges.xml

Usage

python process_badges.py --input ../dataset/ai/Badges.xml

OR

python pystack.py --input ../dataset/ai/ --task Badges

Output

Process Comments.xml

Usage

python process_comments.py --input ../dataset/ai/Comments.xml

OR

python pystack.py --input ../dataset/ai/ --task Comments

Output

Questions

How to unzip a *.7z file
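One possible approach, sketched here under the assumption that the p7zip or 7-Zip command-line tool `7z` is installed and on the PATH, is to call it from Python. The archive name and the helper names `build_7z_command` and `extract` are hypothetical; `x` extracts with full paths, `-o` sets the output directory (no space before the path), and `-y` answers yes to prompts.

```python
import shutil
import subprocess
from pathlib import Path


def build_7z_command(archive, out_dir):
    """Build the argument list for `7z x ARCHIVE -oOUT_DIR -y`."""
    return ["7z", "x", str(archive), f"-o{out_dir}", "-y"]


def extract(archive, out_dir):
    """Extract `archive` into `out_dir` using the external 7z tool.

    Hypothetical helper for illustration, not part of PyStack.
    """
    if shutil.which("7z") is None:
        raise RuntimeError("7z executable not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_7z_command(archive, out_dir), check=True)


if __name__ == "__main__":
    # Example archive name is an assumption; only the command is printed here.
    print(" ".join(build_7z_command("ai.stackexchange.com.7z", "../dataset/ai")))
```

Calling `extract("ai.stackexchange.com.7z", "../dataset/ai")` would then place the XML files where the commands above expect them.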

Discuss

This code was written for research. It aims to help you start your analysis of Stack Exchange sites without the dirty preprocessing work.

Feel free to post any questions or comments.

Citation

If you use this code, please consider citing "QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites" and "ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites".