Home

Awesome

make-ipinyou-data

This project is forked from make-ipinyou-data, slightly changing the data format and feature alignment for future use.

Step 0

Go to data.computational-advertising.org to download ipinyou.contest.dataset.zip.

The raw data of iPinYou (ipinyou.contest.dataset.zip) can be downloaded from here.

Go to here to download the data.

Unzip it and get the folder ipinyou.contest.dataset.

Step 1

Update the soft link for the folder ipinyou.contest.dataset in original-data.

XXX/make-ipinyou-data/original-data$ ln -sfn ~/Data/ipinyou.contest.dataset ipinyou.contest.dataset

Under make-ipinyou-data/original-data/ipinyou.contest.dataset there should be the original dataset files like this:

weinan@ZHANG:~/Project/make-ipinyou-data/original-data/ipinyou.contest.dataset$ ls
algo.submission.demo.tar.bz2  README         testing2nd   training3rd
city.cn.txt                   region.cn.txt  testing3rd   user.profile.tags.cn.txt
city.en.txt                   region.en.txt  training1st  user.profile.tags.en.txt
files.md5                     testing1st     training2nd

You do not need to further unzip the packages in the subfolders.

Step 2

Under make-ipinyou-data folder, just run make all.

After the program finished, the total size of the folder will be 14G. The files under make-ipinyou-data should be like this:

XXX/make-ipinyou-data$ ls
1458  2261  2997  3386  3476  LICENSE   mkyxdata.sh   python     schema.txt
2259  2821  3358  3427  all   Makefile  original-data  README.md

Normally, we only do experiment for each campaign (e.g. 1458). all is just the merge of all the campaigns. You can delete all if you think it is unuseful in your experiment.

Use of the data

We use campaign 1458 as example here.

XXX/make-ipinyou-data/1458$ ls
featindex.txt  test.log.txt  test.txt  train.log.txt  train.txt

For any questions, please report the issues or contact Yanru Qu.