Urban Dictionary Data Collector

Script used to download the entire Urban Dictionary dataset. The actual dataset is pretty large, so I've split it into four Google Fusion Tables:

Downloading the Data Yourself

If you want to collect your own sample from Urban Dictionary, this repo includes a few scripts that can help you do just that.

download.js

The main entry downloader. It requires a word list file naming the words to download entries for. Try grabbing the one from here.

$ npm install

# Pass in a word list file
$ node download.js data/a.txt

This will attempt to download the first 10 definitions for each word in the list into a file data/a.txt. Data is stored in NeDB databases, but you should be able to easily update download.js to output whatever format you need.
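If you would rather post-process the stored data than change download.js, note that NeDB persists a datastore as a plain text file with one JSON document per line, so it can be read without Node. The following is a minimal Python sketch under that assumption; `load_nedb` is a hypothetical helper name, and which fields each entry carries depends on what download.js actually stores.

```python
import json


def load_nedb(path):
    """Read an NeDB datafile (one JSON document per line) into a list of dicts.

    NeDB files are append-only: a later line with the same _id supersedes an
    earlier one, and deletions are recorded as {"_id": ..., "$$deleted": true}.
    """
    docs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            doc = json.loads(line)
            # Skip index bookkeeping lines, which are not entries.
            if "$$indexCreated" in doc or "$$indexRemoved" in doc:
                continue
            if doc.get("$$deleted"):
                docs.pop(doc.get("_id"), None)
                continue
            docs[doc.get("_id")] = doc
    return list(docs.values())
```

From here the returned dicts can be dumped to JSON, loaded into a dataframe, or written out in whatever format you need.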

gen_csv.py

Simple Python script used to turn the NeDB dataset from download.js into a CSV file:

$ python3 gen_csv.py data.db out.csv
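For orientation, the CSV-writing half of such a conversion might look roughly like the sketch below. It is not the contents of gen_csv.py: the `write_csv` name and the `word`, `definition`, and `example` column names are assumptions for illustration. The point it shows is that definitions often contain commas and line breaks, so it is safer to let Python's `csv` module handle quoting than to join fields by hand.

```python
import csv

# Assumed column names; adjust to whatever fields your entries actually carry.
FIELDS = ["word", "definition", "example"]


def write_csv(records, out, fields=FIELDS):
    """Write an iterable of entry dicts to an open text file as CSV."""
    writer = csv.DictWriter(
        out,
        fieldnames=fields,
        extrasaction="ignore",  # drop keys such as NeDB's _id
        restval="",             # leave missing fields empty
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```

When writing to disk, open the file with `newline=""` (for example `open("out.csv", "w", newline="", encoding="utf-8")`) so the `csv` module controls line endings itself.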

gen_md.js

Simple JavaScript script used to generate Markdown for entries. Used for character-level machine learning on Urban Dictionary entries.

$ node gen_md.js data.db urban.md
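To give a rough idea of what rendering an entry as Markdown can involve, here is a small Python sketch. The layout (a heading for the word, the definition as a paragraph, the example as a blockquote) and the field names are assumptions chosen for illustration; the format gen_md.js actually emits may differ.

```python
def entry_to_markdown(entry):
    """Render one entry dict as a Markdown fragment (hypothetical layout)."""
    word = entry.get("word", "").strip()
    definition = entry.get("definition", "").strip()
    example = entry.get("example", "").strip()

    parts = [f"## {word}", "", definition]
    if example:
        # Quote every line of the example so multi-line examples stay grouped.
        parts += ["", *("> " + line for line in example.splitlines())]
    return "\n".join(parts) + "\n"
```

Concatenating the fragments for every entry, separated by blank lines, gives a single text file suitable as a character-level training corpus.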

Notes

This is for research purposes. I'm not affiliated with Urban Dictionary.