Data Repository for PyGOD

Statistics of the available datasets are listed below. #Con. is the number of contextual outliers and #Strct. is the number of structural outliers. #Outliers can be slightly smaller than the sum of the two, because a node that is both a contextual and a structural outlier is counted only once:

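| Dataset | Type | #Nodes | #Edges | #Feat | Avg. Degree | #Con. | #Strct. | #Outliers | Outlier Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 'weibo' | organic | 8,405 | 407,963 | 400 | 48.5 | - | - | 868 | 10.3% |
| 'reddit' | organic | 10,984 | 168,016 | 64 | 15.3 | - | - | 366 | 3.3% |
| 'disney' | organic | 124 | 335 | 28 | 2.7 | - | - | 6 | 4.8% |
| 'books' | organic | 1,418 | 3,695 | 21 | 2.6 | - | - | 28 | 2.0% |
| 'enron' | organic | 13,533 | 176,987 | 18 | 13.1 | - | - | 5 | 0.04% |
| 'inj_cora' | injected | 2,708 | 11,060 | 1,433 | 4.1 | 70 | 70 | 138 | 5.1% |
| 'inj_amazon' | injected | 13,752 | 515,042 | 767 | 37.2 | 350 | 350 | 694 | 5.0% |
| 'inj_flickr' | injected | 89,250 | 933,804 | 500 | 10.5 | 2,240 | 2,240 | 4,414 | 4.9% |
| 'gen_time' | generated | 1,000 | 5,746 | 64 | 5.7 | 100 | 100 | 189 | 18.9% |
| 'gen_100' | generated | 100 | 618 | 64 | 6.2 | 10 | 10 | 18 | 18.0% |
| 'gen_500' | generated | 500 | 2,662 | 64 | 5.3 | 10 | 10 | 20 | 4.0% |
| 'gen_1000' | generated | 1,000 | 4,936 | 64 | 4.9 | 10 | 10 | 20 | 2.0% |
| 'gen_5000' | generated | 5,000 | 24,938 | 64 | 5.0 | 10 | 10 | 20 | 0.4% |
| 'gen_10000' | generated | 10,000 | 49,614 | 64 | 5.0 | 10 | 10 | 20 | 0.2% |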
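The two derived columns appear to follow directly from the raw counts. The sketch below assumes Avg. Degree = #Edges / #Nodes and Outlier Ratio = #Outliers / #Nodes. Both relations are inferred from the numbers in the table rather than stated by the source, and the helper name `derived_columns` is made up for this illustration.

```python
# (nodes, edges, outliers) copied from three rows of the statistics table.
stats = {
    "weibo": (8405, 407963, 868),
    "inj_cora": (2708, 11060, 138),
    "gen_100": (100, 618, 18),
}


def derived_columns(nodes, edges, outliers):
    """Return (average degree, outlier ratio in percent), rounded to one decimal.

    Assumption: the table divides #Edges and #Outliers by #Nodes.
    """
    avg_degree = round(edges / nodes, 1)
    outlier_ratio = round(100 * outliers / nodes, 1)
    return avg_degree, outlier_ratio


for name, row in stats.items():
    print(name, derived_columns(*row))
```

For these three rows the results match the Avg. Degree and Outlier Ratio columns to one decimal place.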

To use the datasets:

from pygod.utils import load_data
data = load_data('weibo') # in PyG format

Alternative download source on Baidu Disk (in Chinese): https://pan.baidu.com/s/1afEZaygCRUYWJPtVbzuRYw (access code: bond)

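For injected/generated datasets, each label encodes the outlier type in its two lowest bits: bit 0 marks a contextual outlier and bit 1 marks a structural outlier. A label of 0 is therefore an inlier, 1 a contextual outlier only, 2 a structural outlier only, and 3 both.

The labels can be converted as follows: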

y = data.y.bool()        # binary labels (inlier/outlier)
yc = (data.y >> 0) & 1   # contextual outliers (bit 0)
ys = (data.y >> 1) & 1   # structural outliers (bit 1)
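To see why #Outliers can be smaller than #Con. + #Strct., the same bit operations can be tried on a handful of plain integers. This is a minimal sketch in pure Python, without PyGOD or PyTorch. The `labels` list is made-up illustration data, and the 2-bit encoding is the one implied by the conversion snippet above.

```python
# Hypothetical label values for six nodes; real values would come from data.y.
labels = [0, 1, 2, 3, 0, 1]

is_outlier = [y != 0 for y in labels]        # same idea as data.y.bool()
contextual = [(y >> 0) & 1 for y in labels]  # bit 0: contextual outlier
structural = [(y >> 1) & 1 for y in labels]  # bit 1: structural outlier

n_con = sum(contextual)
n_strct = sum(structural)
n_out = sum(is_outlier)
# Nodes flagged as both types (label 3) are counted once in n_out.
n_both = sum(c & s for c, s in zip(contextual, structural))

# Inclusion-exclusion: the overlap explains the gap between the columns.
assert n_out == n_con + n_strct - n_both
print(n_con, n_strct, n_both, n_out)
```

A node with label 3 contributes to both the contextual and the structural count but only once to the total, which is the intersection the statistics table refers to.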