# UniRec

## Introduction
UniRec is an easy-to-use, lightweight, and scalable implementation of recommender systems. Its primary objective is to enable users to swiftly construct a comprehensive ecosystem of recommenders using a minimal set of robust and practical recommendation models. These models are designed to deliver scalable and competitive performance, encompassing a majority of real-world recommendation scenarios.
Note that this goal differs from that of other well-known public libraries, such as Recommenders and RecBole, whose missions include providing an extensive range of recommendation algorithms or offering a wide variety of datasets.
The term "Uni-" carries several implications:
- **Unit**: Our aim is to employ a minimal set of models to facilitate the recommendation service onboarding process across most real-world scenarios. By maintaining a lightweight and extensible architecture, users can effortlessly modify and incorporate customized models into UniRec, catering to their specific future requirements.
- **United**: In contrast to the Natural Language Processing (NLP) domain, it is challenging to rely on a single model to serve end-to-end business applications in recommender systems. It is desirable that the various modules or stages (such as retrieval and ranking) within a recommender system are not isolated and trained independently but are closely interconnected.
- **Unified**: While we acknowledge that model parameters cannot be unified, we believe there is potential to unify model structures. Consequently, we are exploring the possibility of utilizing a unified Transformer structure to serve different modules within recommender systems.
- **Universal**: We aspire for UniRec to support a wide range of recommendation scenarios, including gaming, music, movies, ads, and e-commerce, using a universal data model.
## Installation

### Installation from PyPI
- Ensure that PyTorch with CUDA support (version 1.10.0-1.13.1) is installed:

  ```shell
  pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  python -c "import torch; print(torch.__version__)"
  ```

- Install `unirec` with pip:

  ```shell
  pip install unirec
  ```
### Installation from Wheel Locally
- Ensure that PyTorch with CUDA support (version 1.10.0-1.13.1) is installed:

  ```shell
  pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  python -c "import torch; print(torch.__version__)"
  ```

- Clone the Git repository:

  ```shell
  git clone https://github.com/microsoft/UniRec.git
  ```

- Build the wheel:

  ```shell
  cd UniRec
  pip install --user --upgrade setuptools wheel twine
  python setup.py sdist bdist_wheel
  ```

  After building, the wheel package can be found in `UniRec/dist`.

- Install the wheel:

  ```shell
  pip install dist/unirec-*.whl
  ```

  The specific package name can be found in `UniRec/dist`. Check that `unirec` is installed successfully:

  ```shell
  python -c "from unirec.utils import general; print(general.get_local_time_str())"
  ```
## Algorithms

## Examples
To work through the examples listed below, we provide a script that downloads and splits the ml-100k dataset. Run:

```shell
python download_split_ml100k.py
```

The raw dataset files will be saved in your home directory: `~/.unirec/dataset/ml-100k`.

Next, convert the raw dataset into a format compatible with UniRec. Use the preprocessing script, which saves the processed files in `UniRec/data/ml-100k`:

```shell
cd examples/preprocess
bash preprocess_ml100k.sh
```
### General Training

To train an existing model in UniRec, for instance SASRec on the ml-100k dataset, refer to the script provided in `examples/training/train_ml100k.sh`.
### Multi-GPU Training

UniRec supports multi-GPU training through its integration with Accelerate. An example script is available at `examples/training/multi_gpu_train_ml100k.sh`. The key arguments can be found in lines 3-12 of the script:

```shell
GPU_INDICES="0,1" # e.g. "0,1"
# Specify the number of nodes to use (one node may have multiple GPUs)
NUM_NODES=1
# Specify the number of processes on each node (this should equal the number of GPU_INDICES)
NPROC_PER_NODE=2
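The script's comments note that `NPROC_PER_NODE` should equal the number of entries in `GPU_INDICES`. As a small sketch (not part of the shipped script), the process count can be derived from the index list so the two values never drift apart:

```shell
GPU_INDICES="0,1"
# Count the comma-separated entries: each GPU index becomes one process.
NPROC_PER_NODE=$(echo "$GPU_INDICES" | tr ',' '\n' | wc -l)
echo "$NPROC_PER_NODE"   # prints 2 for GPU_INDICES="0,1"
```

With Accelerate's CLI, this value would typically be passed to the `--num_processes` flag of `accelerate launch`.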
For more details about the launching command, please refer to Accelerate Docs.
### Hyperparameter Tuning with wandb

UniRec supports hyperparameter tuning (also known as hyperparameter optimization, or HPO) through its integration with WandB. There are three major steps to start a wandb experiment:
1. Compose a training script and enable `wandb`. An example is provided in `examples/training/train_ml100k_with_wandb.sh`. The key arguments are:
   - `--use_wandb=1`: enables wandb in the process.
   - `--wandb_file=/path/to/configuration_file`: the configuration file for wandb, including the command, metrics, method, and search space.
2. Define the sweep configuration. Write a YAML-format configuration file to set the command, monitored metrics, tuning method, and search space. An example is available at `examples/training/wandb.yaml`. For more details about the configuration file, refer to the WandB Docs.
3. Initialize sweeps and start sweep agents. To start an experiment with wandb, first initialize a sweep controller, which selects hyperparameters and issues instructions; an agent then actually performs the runs. An example for launching wandb experiments is provided in `examples/training/wandb_start.sh`. Note that the script offers a pipelined command to start the agent automatically after sweep initialization. However, we recommend the simpler manual two-step process:

   ```shell
   ## Step 1. Initialize sweeps with the CLI using the configuration file.
   ## For more details, please refer to https://docs.wandb.ai/guides/sweeps/initialize-sweeps
   wandb sweep config.yaml

   ## Step 2. After `wandb sweep`, you will get a sweep ID and a hint to use `wandb agent`, like:
   ## wandb: Creating sweep from: ./wandb.yaml
   ## wandb: Created sweep with ID: xxx
   ## wandb: View sweep at: https://wandb.ai/xxx/xxx/xxx/xxx
   ## wandb: Run sweep agent with: wandb agent xxx/xxx/xxx/xxx
   wandb agent entity/project/sweep_ID
   ```
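As a reference for the sweep configuration in step 2, a minimal YAML file following the wandb sweep schema might look like the following. The program name, metric, and parameter values here are illustrative assumptions, not the contents of `examples/training/wandb.yaml`:

```yaml
program: train.py          # hypothetical entry point
method: bayes              # search strategy: grid, random, or bayes
metric:
  name: ndcg@10            # hypothetical monitored metric
  goal: maximize
parameters:
  learning_rate:
    values: [0.01, 0.001, 0.0001]
  embedding_size:
    values: [32, 64, 128]
```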
## Serving with C# and Java

UniRec supports C# and Java inference based on the ONNX format. We provide inference for user embeddings, item embeddings, and user-item scores.

For more details, please refer to `examples/serving/README`.
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
## Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.