Datasets @ APEX Lab

ANNOUNCEMENT

These are the standard indexed versions of the iPinyou and criteo datasets (with the thresholds listed below), for use ONLY IN APEX LAB.

Usage

Before first use, run the command echo NASPATH > .naspath, where NASPATH is the path at which the NAS is mounted.
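As a rough illustration of what the .naspath file provides, the sketch below reads the stored path back in Python. The helper name read_nas_path and the path /mnt/nas are hypothetical and are not part of apexdsets; how the package itself consumes .naspath is not documented here.

```python
import tempfile
from pathlib import Path


def read_nas_path(repo_dir="."):
    """Return the NAS mount path stored in <repo_dir>/.naspath.

    Hypothetical helper for illustration; not part of apexdsets.
    """
    marker = Path(repo_dir) / ".naspath"
    if not marker.is_file():
        raise FileNotFoundError(
            f"{marker} not found; run `echo NASPATH > .naspath` first"
        )
    # `echo` appends a trailing newline, so strip surrounding whitespace.
    return marker.read_text().strip()


# Demo in a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as repo_dir:
    # Equivalent of `echo /mnt/nas > .naspath`; /mnt/nas is a made-up path.
    (Path(repo_dir) / ".naspath").write_text("/mnt/nas\n")
    nas_path = read_nas_path(repo_dir)
```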

Python package apexdsets

Use

import sys
sys.path.append("REPOSITORY")
import apexdsets

to import the apexdsets package, where "REPOSITORY" is the path to this repository.

Method: datapath(dataname)

Returns the absolute file path of the dataset named dataname.

Class: apexdsets.CTRLoader

Method: __init__(datapath)

Constructor. Example:

loader = apexdsets.CTRLoader(apexdsets.datapath("ipinyou"))

Method: meta(key)

Args:

Returns: A Python object (usually a list) holding the meta information for the given key.

Method: data_generator(dsetname, batch_size, unified_index=True)

Creates a generator that yields data as (inputs, labels) pairs.

Args:

Returns: A generator that yields the data in batches.
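The consumption pattern for such a generator can be sketched without access to the NAS. The function toy_data_generator below is a stand-in written for this example, not the CTRLoader implementation, and the toy samples are made up; it only illustrates iterating over (inputs, labels) batches.

```python
def toy_data_generator(inputs, labels, batch_size):
    """Stand-in generator (not CTRLoader): yields (inputs, labels) batches."""
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        yield inputs[start:end], labels[start:end]


# Made-up data: 5 samples with 3 categorical fields each, binary click labels.
X = [[0, 4, 7], [1, 5, 8], [2, 4, 9], [0, 6, 7], [3, 5, 8]]
y = [0, 1, 0, 0, 1]

batch_sizes = []
for batch_inputs, batch_labels in toy_data_generator(X, y, batch_size=2):
    # Inputs and labels in a batch always have the same length.
    assert len(batch_inputs) == len(batch_labels)
    batch_sizes.append(len(batch_inputs))
```

With the real loader, the same loop would presumably be written over loader.data_generator(dsetname, batch_size); whether the final batch may be smaller than batch_size, as it is in this stand-in, is not stated above.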

Property: unified_size

The total number of categories across all categorical fields.
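One common way to arrive at such a total is to place every field's categories in a single shared index space by offsetting each field. The sketch below assumes that scheme; the field sizes are invented, and whether apexdsets builds its unified index exactly this way is an assumption.

```python
from itertools import accumulate

# Hypothetical per-field category counts (not the real datasets' values).
field_sizes = [3, 5, 2]

# Starting offset of each field inside the shared index space.
offsets = [0] + list(accumulate(field_sizes))[:-1]

# Total number of categories across all fields.
unified_size = sum(field_sizes)


def to_unified(field_indices):
    """Map one sample's per-field indices into the shared index space."""
    return [offset + idx for offset, idx in zip(offsets, field_indices)]


sample = to_unified([2, 0, 1])
```

Under this scheme a single embedding table with unified_size rows can serve all fields, which is one plausible reason for offering a unified_index option.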

iPinyou

Threshold chosen: 5

Fields:

Original dataset includes:

Positive samples over total samples: 0.075%.

criteo

Threshold chosen: 20

Use day_6.gz to day_12.gz as training set, day_13.gz as test set.

Negative down-sampling is used; positive samples over total samples: 50%.
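Negative down-sampling keeps the positive samples and discards part of the negatives until the desired ratio is reached. The sketch below is a minimal illustration under that assumption; the function name, the seed, and the toy data are hypothetical, and the exact procedure used to build this criteo version is not specified here.

```python
import random


def downsample_negatives(samples, target_pos_ratio=0.5, seed=0):
    """Keep all positives and randomly drop negatives to reach the ratio.

    samples: list of (features, label) pairs with label in {0, 1}.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    # Negatives needed so that positives / total equals target_pos_ratio.
    n_keep = round(len(positives) * (1 - target_pos_ratio) / target_pos_ratio)
    n_keep = min(n_keep, len(negatives))
    result = positives + rng.sample(negatives, n_keep)
    rng.shuffle(result)
    return result


# Toy data: 10 positives and 90 negatives.
toy = [(i, 1) for i in range(10)] + [(i, 0) for i in range(10, 100)]
balanced = downsample_negatives(toy)
pos_ratio = sum(label for _, label in balanced) / len(balanced)
```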