Urban Dictionary Data Collector
A script for downloading the entire Urban Dictionary dataset. The full dataset is pretty large, so I've split it into four Google Fusion Tables.
Downloading the Data Yourself
If you want to collect your own sample from Urban Dictionary, this repo includes a few scripts that can help you do just that.
download.js
The main entry downloader. It requires a word list file naming the words to download entries for. Try grabbing the one from here.
$ npm install
# Pass in a word list file
$ node download.js data/a.txt
This will attempt to download the first 10 definitions for each word in the list into a file data/a.txt. Data is stored in NeDB databases, but you should be able to easily update download.js to output whatever format you need.
gen_csv.py
A simple Python script that converts the NeDB dataset produced by download.js into CSV:
$ python3 gen_csv.py data.db out.csv
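If you want to adapt the conversion, here is a minimal sketch of the general idea. It is not the actual contents of gen_csv.py. It relies on NeDB's on-disk format, which is one JSON document per line in an append-only file, so a later line with the same _id replaces an earlier one and a `$$deleted` marker removes it. The function names `read_nedb` and `write_csv` are made up for this example, and the CSV columns are simply whatever keys appear in the data.

```python
import csv
import json


def read_nedb(path):
    """Return the live documents from a NeDB datafile (one JSON object per line)."""
    docs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                doc = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a corrupt or partially written line
            if not isinstance(doc, dict):
                continue
            if "$$indexCreated" in doc or "$$indexRemoved" in doc:
                continue  # index bookkeeping, not an entry
            if doc.get("$$deleted"):
                docs.pop(doc.get("_id"), None)  # deletion marker
            else:
                docs[doc.get("_id")] = doc  # later lines override earlier ones
    return list(docs.values())


def write_csv(docs, out_path):
    """Write documents to CSV, using the union of all keys as columns."""
    fields = sorted({key for doc in docs for key in doc})
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(docs)  # missing keys are written as empty cells
```

Calling `write_csv(read_nedb("data.db"), "out.csv")` would then produce one row per entry.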
gen_md.js
A simple JavaScript script that generates Markdown for entries. The output was used for character-level machine learning on Urban Dictionary entries.
$ node gen_md.js data.db urban.md
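For a rough idea of what a Markdown rendering of an entry could look like, here is a small Python sketch. It is not a port of gen_md.js. The field names `word`, `definition`, and `example` are assumptions about what each stored entry contains, and the layout (heading, body, quoted example) is just one possible choice.

```python
def entry_to_markdown(entry):
    """Render one entry dict as Markdown (field names are assumed, see above)."""
    word = entry.get("word", "").strip()
    definition = entry.get("definition", "").replace("\r\n", "\n").strip()
    example = entry.get("example", "").replace("\r\n", "\n").strip()

    parts = ["# " + word, "", definition]
    if example:
        # Quote every line of the example so multi-line examples stay together.
        parts += ["", "> " + example.replace("\n", "\n> ")]
    return "\n".join(parts) + "\n"


def entries_to_markdown(entries):
    """Join rendered entries with a blank line between them."""
    return "\n".join(entry_to_markdown(e) for e in entries)
```

Keeping every entry in the same fixed shape gives a character-level model a consistent structure to learn.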
Notes
This is for research purposes. I'm not affiliated with Urban Dictionary.