Home


A recommender system library based on PyTorch

Introduction

A Python package that integrates the recommender system pipeline, enabling simple model design and fast idea verification.

Installation

bash install.sh

Requirements

  1. Python >= 3.8
  2. PyTorch >= 1.2.0
  3. visdom == 0.1.8.9 (for visualization)
  4. nvidia_ml_py3 == 7.352.0 (for auto-selection of GPU)
  5. The model must be a recommender system trained with negative sampling

Advantages

  1. Separates model design from boilerplate such as CLI parsing, logging, and GPU selection.
  2. Provides two general mechanisms, TrainHook and Metric, to save and visualize key values in the pipeline.
  3. Provides loss functions and model design patterns widely used in recommendation, all of which are easy to customize.

Submodules

analysis

This submodule automatically analyses the dataset or the model results for paper presentation. Currently it only supports:

argument

data

utils

logger.py

This submodule records the hyperparameter settings, the environment settings (such as the optimizer), and all related values of each model to a specified path.

loss.py

This submodule contains loss functions commonly used in recommendation, including L2 loss, BPR, masked MSE, and BCE.

metric.py

This submodule provides a general way to save metrics during model evaluation and visualize them with Visdom. Precision, Recall, NDCG, and MRR are provided for both fully-ranking and leave-one-out modes (in leave-one-out mode, Precision and Recall reduce to HR).
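
In leave-one-out mode there is a single held-out item per user, so these metrics reduce to simple formulas. A plain-Python sketch (illustrative only, not the library's implementation):

```python
import math

def hit_ratio_at_k(ranked_items, target, k):
    """HR@k (leave-one-out): 1 if the held-out item appears in the top-k list."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k (leave-one-out): with one relevant item, IDCG = 1, so
    NDCG = 1 / log2(rank + 2) at the target's zero-based rank."""
    for rank, item in enumerate(ranked_items[:k]):
        if item == target:
            return 1.0 / math.log2(rank + 2)
    return 0.0
```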

model.py

This submodule provides base classes for models.

pipeline.py

test.py

This submodule evaluates models in fully-ranking or leave-one-out mode via fully_ranking_test or leave_one_out_test, respectively.

train.py

This submodule trains models.

trainhook.py

This submodule provides a general way to save key values during a model's forward phase and visualize them like metrics. The trainhooks are available in model.trainhooks as a dict whose keys are the trainhook titles (__loss__ is reserved for recording the loss of each epoch).
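
The mechanism can be pictured with a minimal accumulator (an illustrative sketch only; the library's actual TrainHook class and its methods may differ):

```python
class TrainHook:
    """Minimal sketch of a trainhook: collect a named value during the
    forward phase, then report its per-epoch mean for visualization."""

    def __init__(self, title):
        self.title = title
        self.values = []

    def __call__(self, value):
        self.values.append(float(value))

    def mean(self):
        return sum(self.values) / len(self.values) if self.values else 0.0

# A model would expose its hooks as a dict keyed by title, with "__loss__"
# reserved for the per-epoch loss.
trainhooks = {"__loss__": TrainHook("__loss__")}
for batch_loss in (0.9, 0.7, 0.5):
    trainhooks["__loss__"](batch_loss)
```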

Example

Configuration

config.json

{
    "user": "name_for_setproctitle",
    "visdom": {
        "server": "visdom_ip",      # use empty string to disable visdom visualization
        "port": {
            "dataset_name1": 10001,
            "dataset_name2": 10002
        }
    },
    "training": {
        "test_interval": 5,
        "early_stop": 50,
        "overfit": {
            "protected_epoch": 10,
            "threshold": 0.1
        }
    },
    "dataset": {
        "path": "path_to_dataset",
        "seed": 123,
        "use_backup": true
    },
    "logger": {
        "path": "path_to_log",
        "policy": "best"
    },
    "metric": {
        "target": {
            "type": "NDCG",
            "topk": 10
        },
        "metrics": [
            {
                "type": "Recall",
                "topk": 5
            },
            {
                "type": "Recall",
                "topk": 10
            },
            {
                "type": "NDCG",
                "topk": 5
            },
            {
                "type": "NDCG",
                "topk": 10
            }
        ]
    }
}
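
The configuration is plain JSON and can be loaded with the standard json module. A minimal sketch, with an in-line string standing in for config.json (field names taken from the example above):

```python
import json

# Stand-in for the contents of config.json, trimmed to the visdom section.
CONFIG_TEXT = '{"visdom": {"server": "", "port": {"dataset_name1": 10001}}}'
config = json.loads(CONFIG_TEXT)  # in practice: json.load(open("config.json"))

# An empty "server" string disables visdom visualization.
use_visdom = bool(config["visdom"]["server"])
port = config["visdom"]["port"]["dataset_name1"] if use_visdom else None
```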

Visdom

visdom part in config.json

Training

training part in config.json

Dataset

dataset part in config.json

Logger

logger part in config.json

Metric

metric part in config.json

Each metric is described by

{
    "type": "MetricName",
    "topk": X
}

which denotes MetricName@X (e.g. NDCG@10, Recall@5, MRR@40).
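
This naming convention can be expressed as a one-line helper (illustrative only, not part of the library):

```python
def metric_name(spec):
    """Render a metric spec such as {"type": "NDCG", "topk": 10} as "NDCG@10"."""
    return f'{spec["type"]}@{spec["topk"]}'
```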

Pretrain Configuration

pretrain.json

{
    "BeiBei": {
        "GBMF": "path-to-GBMF-pretrain-model"
    }
}

The first-level key is the dataset name.

The second-level key is the name of a pretrained model, and its value is the path to that model.

Depending on the dataset used for training or testing, the corresponding second-level dict is passed to model.load_pretrain as the Python dict pretrain_info.
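
Selecting the pretrain_info dict for the active dataset can be sketched as follows (an in-line string stands in for pretrain.json; the load_pretrain call shape is an assumption):

```python
import json

# Stand-in for the contents of pretrain.json shown above.
PRETRAIN_TEXT = '{"BeiBei": {"GBMF": "path-to-GBMF-pretrain-model"}}'
pretrain = json.loads(PRETRAIN_TEXT)  # in practice: json.load(open("pretrain.json"))

# Select the second-level dict for the active dataset; this is the
# pretrain_info dict that would be handed to model.load_pretrain.
dataset_name = "BeiBei"
pretrain_info = pretrain.get(dataset_name, {})
# model.load_pretrain(pretrain_info)  # hypothetical call shape
```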

Roadmap