Datasets For Recommender Systems

This is a repository of public data sources for Recommender Systems (RS).

All of these recommendation datasets can be converted to the atomic files defined in RecBole, a unified, comprehensive and efficient recommendation library.

Once converted, the datasets can be used with RecBole to evaluate different recommender models. For more information, please refer to RecBole.

Usage

To use RecBole, you need to convert the original datasets to atomic files, the data format defined by RecBole.

There are two ways to obtain the atomic files:

  1. Download the raw dataset and process it with the conversion tools provided in this repository. Please refer to conversion tools.

  2. Download the processed atomic files directly from Baidu Wangpan (Password: e272) or Google Drive.
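
As a rough illustration of what the conversion produces, the sketch below renders a few made-up ratings as a RecBole-style `.inter` interaction file: tab-separated text whose header names each field as `name:type`. The field names, the sample rows, and the `to_inter_text` helper are assumptions for illustration; they are not part of the conversion tools in this repository.

```python
import io

# Hypothetical raw records: (user, item, rating, unix timestamp).
RAW_ROWS = [
    ("u1", "i10", 4.0, 881250949),
    ("u1", "i11", 3.0, 881250950),
    ("u2", "i10", 5.0, 891717742),
]

# Header fields in the "name:type" form used by atomic files.
INTER_FIELDS = ["user_id:token", "item_id:token", "rating:float", "timestamp:float"]


def to_inter_text(rows, fields=INTER_FIELDS):
    """Render rows as tab-separated text with a typed header line."""
    buf = io.StringIO()
    buf.write("\t".join(fields) + "\n")
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(row)}")
        buf.write("\t".join(str(v) for v in row) + "\n")
    return buf.getvalue()


if __name__ == "__main__":
    # Saving this text as <dataset>/<dataset>.inter is the assumed layout.
    print(to_inter_text(RAW_ROWS), end="")
```

For real datasets, the conversion tools or the pre-processed downloads listed above remain the intended route; this only shows the shape of the output.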

Dataset links and brief introductions

Shopping

Advertising

Check-in

Movies

Music

Books

Games

Anime

Pictures

Jokes

Exercises

Websites

Adult

News

Food

Beverages

Clothes

Dataset statistics

General Datasets

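| SN | Dataset | #User | #Item | #Interaction | Sparsity | Interaction Type | TimeStamp | User Context | Item Context | Interaction Context |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MovieLens | - | - | - | - | Rating | | | | |
| 2 | Anime | 73,515 | 11,200 | 7,813,737 | 99.05% | Rating [-1, 1-10] | | | | |
| 3 | Epinions | 116,260 | 41,269 | 188,478 | 99.99% | Rating [1-5] | | | | |
| 4 | Yelp (5 versions) | - | - | - | - | Rating [1-5] | | | | |
| 5 | Netflix | 480,189 | 17,770 | 100,480,507 | 98.82% | Rating [1-5] | | | | |
| 6 | Book-Crossing | 105,284 | 340,557 | 1,149,780 | 99.99% | Rating [0-10] | | | | |
| 7 | Jester | 73,421 | 101 | 4,136,360 | 44.22% | Rating [-10, 10] | | | | |
| 8 | Douban | 738,701 | 28 | 2,125,056 | 89.73% | Rating [0,5] | | | | |
| 9 | Yahoo Music | 1,948,882 | 98,211 | 11,557,943 | 99.99% | Rating [0, 100] | | | | |
| 10 | KDD2010 | - | - | - | - | Rating | | | | |
| 11 | Amazon (2014 & 2018) | - | - | - | - | Rating [0,5] | | | | |
| 12 | Pinterest | 55,187 | 9,911 | 1,445,622 | 99.74% | - | | | | |
| 13 | Gowalla | 107,092 | 1,280,969 | 6,442,892 | 99.99% | Check-in | | | | |
| 14 | Last.FM | 1,892 | 17,632 | 92,834 | 99.72% | Click | | | | |
| 15 | DIGINETICA | 204,789 | 184,047 | 993,483 | 99.99% | Click | | | | |
| 16 | Steam | 2,567,538 | 32,135 | 7,793,069 | 99.99% | Buy | | | | |
| 17 | Ta Feng | 32,266 | 23,812 | 817,741 | 99.89% | Click | | | | |
| 18 | Foursquare | - | - | - | - | Check-in | | | | |
| 19 | Tmall | 963,923 | 2,353,207 | 44,528,127 | 99.99% | Click/Buy | | | | |
| 20 | YOOCHOOSE | 9,249,729 | 52,739 | 34,154,697 | 99.99% | Click/Buy | | | | |
| 21 | Retailrocket | 1,407,580 | 247,085 | 2,756,101 | 99.99% | View/Addtocart/Transaction | | | | |
| 22 | LFM-1b | 120,322 | 3,123,496 | 1,088,161,692 | 99.71% | Click | | | | |
| 23 | MIND | - | - | - | - | Click | | | | |
| 24 | BeerAdvocate | 33,388 | 66,055 | 1,586,614 | 99.9281% | Rating [0,5] | | | | |
| 25 | Behance | 63,497 | 178,788 | 1,000,000 | 99.9912% | Likes | | | | |
| 26 | DianPing | 542,706 | 243,247 | 4,422,473 | 99.9967% | Rating [0,5] | | | | |
| 27 | EndoMondo | 1,104 | 253,020 | 253,020 | 99.9094% | Workout Logs | | | | |
| 28 | Food | 226,570 | 231,637 | 1,132,367 | 99.9978% | Rating [0,5] | | | | |
| 29 | GoodReads | 876,145 | 2,360,650 | 228,648,342 | 99.9889% | Rating [0,5] | | | | |
| 30 | KGRec | - | - | - | - | Click | | | | |
| 31 | ModCloth | 47,958 | 1,378 | 82,790 | 99.8747% | Rating [0,5] | | | | |
| 32 | RateBeer | 29,265 | 110,369 | 2,924,163 | 99.9095% | Overall Rating [0,20] | | | | |
| 33 | RentTheRunway | 105,571 | 5,850 | 192,544 | 99.9688% | Rating [0,10] | | | | |
| 34 | Twitch | 15,524,309 | 6,161,666 | 474,676,929 | 99.9995% | Click | | | | |
| 35 | Amazon_M2 | 3,606,349 | 1,410,675 | 15,306,183 | - | Click | | | | |
| 36 | Music4All-Onion | 119,140 | 109,269 | 252,984,396 | - | Click | | | | |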
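
The Sparsity column appears consistent with the usual definition: one minus the share of user-item pairs that have an interaction. The sketch below assumes that definition (the formula is not stated in this README) and applies it to the Anime and Jester counts from the table above; `sparsity` is a hypothetical helper name.

```python
def sparsity(n_users: int, n_items: int, n_interactions: int) -> float:
    """Fraction of the user-item matrix with no recorded interaction.

    Assumes each interaction fills a distinct (user, item) cell.
    """
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return 1.0 - n_interactions / (n_users * n_items)


if __name__ == "__main__":
    # Counts copied from the General Datasets table.
    print(f"Anime:  {sparsity(73_515, 11_200, 7_813_737):.2%}")
    print(f"Jester: {sparsity(73_421, 101, 4_136_360):.2%}")
```

Under this definition, a dataset with few items and many ratings per user, such as Jester, comes out far denser than the rest.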

CTR Datasets

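| SN | Dataset | #User | #Item | #Interaction | Sparsity | Interaction Type | TimeStamp | User Context | Item Context | Interaction Context |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Criteo | - | - | 45,850,617 | - | Click | | | | |
| 2 | Avazu | - | - | 40,428,967 | - | Click [0, 1] | | | | |
| 3 | iPinYou | 19,731,660 | 163 | 24,637,657 | 99.23% | View/Click | | | | |
| 4 | Phishing websites | - | - | 11,055 | - | | | | | |
| 5 | Adult | - | - | 32,561 | - | income>=50k [0, 1] | | | | |
| 6 | Alibaba-iFashion | 3,569,112 | 4,463,302 | 191,394,393 | 99.9988% | Click | | | | |
| 7 | AliEC | 491,647 | 240,130 | 1,366,056 | 99.9988% | Click | | | | |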

Knowledge-aware Datasets

These knowledge-aware recommender datasets are based on KB4Rec, which associates items from recommender systems with entities from Freebase. Note that the Amazon-book dataset is the version released in 2014.

Raw dataset information

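| SN | Dataset | #Items | #Linked-Items | #Users | #Interactions |
|---|---|---|---|---|---|
| 1 | MovieLens | 27,278 | 25,503 | 138,493 | 20,000,263 |
| 2 | Amazon-book | 2,370,605 | 108,515 | 8,026,324 | 22,507,155 |
| 3 | LFM-1b (tracks) | 31,634,450 | 1,254,923 | 120,322 | 319,951,294 |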

After 5-core filtering (for LFM-1b, tracks listened to fewer than 10 times are also filtered out)

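| SN | Dataset | #Items | #Linked-Items | #Users | #Interactions |
|---|---|---|---|---|---|
| 1 | MovieLens | 18,345 | 18,057 | 138,493 | 19,984,024 |
| 2 | Amazon-book | 367,982 | 34,476 | 603,668 | 8,898,041 |
| 3 | LFM-1b (tracks) | 615,823 | 337,349 | 79,133 | 15,765,756 |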
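
5-core filtering is commonly understood as repeatedly dropping users and items with fewer than five interactions until none remain. The sketch below assumes that iterative reading; the exact procedure used to produce the filtered statistics is not spelled out here, and `k_core_filter` and the toy data are illustrative only.

```python
from collections import Counter


def k_core_filter(interactions, k=5):
    """Keep only (user, item) pairs whose user and item each occur at least k times.

    Removal is repeated until stable, since dropping one side can push
    the other side below the threshold.
    """
    kept = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in kept)
        item_counts = Counter(i for _, i in kept)
        filtered = [
            (u, i) for u, i in kept
            if user_counts[u] >= k and item_counts[i] >= k
        ]
        if len(filtered) == len(kept):
            return filtered
        kept = filtered


if __name__ == "__main__":
    # Toy data: users a, b, c each rate items x and y; user d rates only x.
    toy = [(u, i) for u in "abc" for i in "xy"] + [("d", "x")]
    print(k_core_filter(toy, k=2))
```

Looping until the list stops shrinking matters because a single pass can leave users or items that fell below the threshold only after their counterparts were removed.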