<div> <img src='https://cdn.jsdelivr.net/gh/BlackSpaceGZY/cdn/img/logo.jpg' width='36%'/> </div>

Recommender System with TF2.0---v0.0.3

<p align="left"> <img src='https://img.shields.io/badge/python-3.7+-blue'> <img src='https://img.shields.io/badge/Tensorflow-2.0+-blue'> <img src='https://img.shields.io/badge/NumPy-1.17-brightgreen'> <img src='https://img.shields.io/badge/pandas-1.0.5-brightgreen'> <img src='https://img.shields.io/badge/sklearn-0.23.2-brightgreen'> </p>

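The open-source project Recommender System with TF2.0 reproduces classic recommendation-algorithm papers, covering Matching (recall) models (MF, BPR, SASRec, etc.) and Ranking models (DeepFM, DCN, etc.).

Motivation:

  1. There seems to be a large gap between theory and practice, and an even larger one between academia and industry;
  2. To better understand the core ideas of the papers and strengthen my own engineering skills;
  3. Much of the open-source code released with the papers is written in TF1.x, so I wanted to reproduce the models in the simpler TF2.x;

Project features: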

 

Important updates

 

List of reproduced papers

1. Matching models (Top-K recommendation)

|Paper|Model|Published|Author|
|---|---|---|---|
|Matrix Factorization Techniques for Recommender Systems|MF|IEEE Computer Society, 2009|Koren / Yahoo Research|
|BPR: Bayesian Personalized Ranking from Implicit Feedback|MF-BPR|UAI, 2009|Steffen Rendle|
|Neural network-based Collaborative Filtering|NCF|WWW, 2017|Xiangnan He|
|Self-Attentive Sequential Recommendation|SASRec|ICDM, 2018|UCSD|
|STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation|STAMP|KDD, 2018|Qiao Liu|
|Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding|Caser|WSDM, 2018|Jiaxi Tang|
|Next Item Recommendation with Self-Attentive Metric Learning|AttRec|AAAI, 2019|Shuai Zhang|

 

2. Ranking models (CTR prediction)

|Paper|Model|Published|Author|
|---|---|---|---|
|Factorization Machines|FM|ICDM, 2010|Steffen Rendle|
|Field-aware Factorization Machines for CTR Prediction|FFM|RecSys, 2016|Criteo Research|
|Wide & Deep Learning for Recommender Systems|WDL|DLRS, 2016|Google Inc.|
|Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features|Deep Crossing|KDD, 2016|Microsoft Research|
|Product-based Neural Networks for User Response Prediction|PNN|ICDM, 2016|Shanghai Jiao Tong University|
|Deep & Cross Network for Ad Click Predictions|DCN|ADKDD, 2017|Stanford University / Google Inc.|
|Neural Factorization Machines for Sparse Predictive Analytics|NFM|SIGIR, 2017|Xiangnan He|
|Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks|AFM|IJCAI, 2017|Zhejiang University / National University of Singapore|
|DeepFM: A Factorization-Machine based Neural Network for CTR Prediction|DeepFM|IJCAI, 2017|Harbin Institute of Technology / Noah’s Ark Research Lab, Huawei|
|xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems|xDeepFM|KDD, 2018|University of Science and Technology of China|
|Deep Interest Network for Click-Through Rate Prediction|DIN|KDD, 2018|Alibaba Group|

 

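Datasets

Some public dataset links have expired and people often ask me for the data, but the datasets are too large to upload. I therefore provide the following links for download:

  1. Criteo: vufh;
  2. Amazon_Electronic: 96f2;
  3. Diginetica: p2hn;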

 

Acknowledgements

The project inevitably contains some bugs. Thanks to the following people for pointing out problems:

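  1. wangzhe258369: pointed out that in the DIN model, tf.keras.layers.BatchNormalization defaults to training=False, in which case the moving_mean and moving_variance variables of the BN layer are not updated. However, while rewriting the DIN model code I checked the documentation more carefully and found that:

    If the model is trained by calling fit(), the training argument does not need to be passed (the official recommendation is not to pass it), because during fit() the model sets the training value itself according to the current phase (training or inference). This is implemented by the learning_phase mechanism.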

  2. boluochuile: found that training the SASRec model failed because the validation set must be passed as a tuple; this has been fixed;

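  3. dominic-z: pointed out a problem with the Attention mask in DIN. The mask is now derived from seq_inputs, since zero padding is used. (This differs from the code before the rewrite, which used the longest sequence in each mini-batch as the sequence length, so no sequence was ever truncated for being too long; the current code uses the most common padding approach for convenience.)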

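  4. dominic-z: pointed out that the shape of seq_inputs in DIN training did not match the model. This has been fixed: the shape should be (batch_size, maxlen, behavior_num), and the model was changed accordingly. In addition, the previous name for the number of behaviors, seq_len, was ambiguous and has been renamed to behavior_num. The code from before the rewrite has been added under the DIN/old directory.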

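  5. zhangfangkaiR7788380: pointed out that in the utils.py file used for MovieLens, trans_score cannot designate positive and negative samples. The following lines should be replaced: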

    data_df.loc[data_df.label < trans_score, 'label'] = 0
    data_df.loc[data_df.label >= trans_score, 'label'] = 1
    

    with:

    data_df = data_df[data_df.label >= trans_score]
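
To make the difference between the two variants in item 5 concrete, here is a minimal sketch on a toy DataFrame. The data and the `user_id` / `item_id` columns are hypothetical; only `label` and `trans_score` come from the snippets above. The first variant keeps every row and turns the rating into a 0/1 label, while the second keeps only the rows at or above the threshold and leaves the rating untouched.

```python
import pandas as pd

# Hypothetical toy ratings; "label" holds the raw rating, as in the snippets above.
data_df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": [10, 11, 10, 12, 11],
    "label":   [5, 1, 3, 2, 4],
})
trans_score = 3

# Original variant: every row is kept and the rating is binarized.
binarized = data_df.copy()
binarized.loc[binarized.label < trans_score, "label"] = 0
binarized.loc[binarized.label >= trans_score, "label"] = 1

# Corrected variant: only rows with rating >= trans_score are kept.
filtered = data_df[data_df.label >= trans_score]

print(binarized)
print(filtered)
```

With the filtered variant, low-rated interactions are dropped entirely instead of being labelled 0, so any negative samples would have to come from somewhere else (for example, from sampling items the user has not interacted with, which is an assumption here and not stated in the note above).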
    

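Items 3 and 4 describe zero-padded behavior sequences of shape (batch_size, maxlen, behavior_num) and a mask derived from seq_inputs. The NumPy sketch below illustrates that idea only; it is not the project's DIN code. The toy histories, post-padding, keeping the most recent maxlen steps, and the softmax normalization are all assumptions made for illustration, and IDs are assumed to start at 1 so that 0 can be reserved for padding.

```python
import numpy as np

maxlen, behavior_num = 4, 2

# Hypothetical histories: each step is (item_id, category_id), IDs start at 1.
histories = [
    [(3, 1), (7, 2)],
    [(4, 1), (9, 3), (2, 2), (8, 1), (6, 3)],  # longer than maxlen
]

# Zero-pad (or truncate) every history to exactly maxlen steps.
seq_inputs = np.zeros((len(histories), maxlen, behavior_num), dtype=np.int64)
for row, hist in enumerate(histories):
    kept = hist[-maxlen:]              # assumption: keep the most recent steps
    seq_inputs[row, :len(kept)] = kept

# Real steps have a non-zero ID in the first behavior; padded steps are 0.
mask = seq_inputs[:, :, 0] != 0        # shape: (batch_size, maxlen)

# Dummy attention scores; padded positions get a large negative value
# so that they receive (numerically) zero weight after the softmax.
scores = np.ones((len(histories), maxlen))
masked = np.where(mask, scores, -1e9)
shifted = masked - masked.max(axis=-1, keepdims=True)
weights = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

print(seq_inputs.shape)
print(mask)
print(weights)
```

Because the mask is computed from the padded input itself, no separate sequence-length input is needed, at the cost of truncating histories longer than maxlen.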
 
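The behavior described in items 1 and 2 can be checked with a small experiment. The sketch below is an illustration on random data, not the project's DIN or SASRec code; the toy model, data shapes, and hyperparameters are assumptions. It calls a BatchNormalization layer with an explicit training flag, then trains a tiny model with fit() without passing the flag, supplying the validation set as an (inputs, targets) tuple.

```python
import numpy as np
import tensorflow as tf

rng = np.random.RandomState(0)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 4)).astype("float32")
y = rng.randint(0, 2, size=(64, 1)).astype("float32")

bn = tf.keras.layers.BatchNormalization()

# Explicit inference mode: the moving statistics keep their initial values.
bn(tf.constant(x), training=False)
mean_inference = bn.moving_mean.numpy().copy()

# Explicit training mode: the moving statistics move toward the batch statistics.
bn(tf.constant(x), training=True)
mean_training = bn.moving_mean.numpy().copy()

# With fit(), the training flag is not passed by hand.
model = tf.keras.Sequential([
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(
    x[:48], y[:48],
    validation_data=(x[48:], y[48:]),  # validation set passed as a tuple
    epochs=1, batch_size=16, verbose=0,
)
mean_after_fit = model.layers[0].moving_mean.numpy()

print(mean_inference, mean_training, mean_after_fit)
```

If the moving mean is still all zeros after fit(), the layer was probably run in inference mode throughout, which is the symptom item 1 is concerned with.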

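Contact

1. For any suggestions or questions about the project, leave a message in an Issue or send an email to zggzy1996@163.com

2. The author also runs a WeChat official account, 潜心学习的潜心. If you like the content there, feel free to follow it.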

<div align=center><img src="https://cdn.jsdelivr.net/gh/BlackSpaceGZY/cdn/img/weixin.jpg" width="30%"/></div>