
<p align="center"> <img src="https://user-images.githubusercontent.com/26851363/172485577-be6993ef-47c3-4b3c-9187-4988f6c44d94.svg" alt="ClayRS logo" style="width:75%;"/> </p>

ClayRS


ClayRS is a Python framework for (mainly) content-based recommender systems. It lets you perform several operations, starting from a raw representation of users and items up to building and evaluating a recommender system. It also supports graph-based recommendation, with feature selection algorithms and graph manipulation methods.

The framework has three main modules, which you can also use individually:

<p align="center"> <img src="https://user-images.githubusercontent.com/26851363/164490523-00d60efd-7b17-4d20-872a-28eaf2323b03.png" alt="ClayRS" style="width:75%;"/> </p>

Given a raw source, the Content Analyzer processes items (and users) into complex representations and serializes them for later use by the other modules

The RecSys module lets you instantiate a recommender system on the serialized contents and compute predictions or rankings for users

The EvalModel has the task of evaluating a recommender system, using several state-of-the-art metrics

Code examples for all three modules will follow in the Usage section

Installation

ClayRS requires Python 3.7 or later. Package dependencies are listed in requirements.txt and, like ClayRS itself, are all installable via pip.

To install it, run the following command:

pip install clayrs

Usage

Content Analyzer

The first thing to do is to import the Content Analyzer module

import clayrs.content_analyzer as ca

Then, let's point to the source containing raw information to process

raw_source = ca.JSONFile('items_info.json')
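For reference, a raw source such as items_info.json is expected to contain one object per item, each with a unique id field and the free-text fields to represent. The snippet below builds a hypothetical two-item excerpt (the titles and plots are illustrative placeholders, not real data from the framework):

```python
import json

# Hypothetical raw source: one JSON object per item, with a unique id
# ('movielens_id') and the free-text fields to be represented ('plot')
items = [
    {"movielens_id": "1", "title": "Movie A",
     "plot": "A young hero embarks on a space adventure."},
    {"movielens_id": "2", "title": "Movie B",
     "plot": "Two strangers meet and fall in love in Rome."},
]

with open("items_info.json", "w") as f:
    json.dump(items, f, indent=2)
```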

We can now start building the configuration for the items

# Configuration of item representation
movies_ca_config = ca.ItemAnalyzerConfig(
    source=raw_source,
    id='movielens_id',  # id which uniquely identifies each item
    output_directory='movies_codified/'  # where the complex representations of items will be stored
)

Let's represent the plot field of each content with a TfIdf representation

movies_ca_config.add_single_config(
    'plot',
    ca.FieldConfig(ca.SkLearnTfIdf(),
                   preprocessing=ca.NLTK(stopwords_removal=True,
                                         lemmatization=True),
                   id='tfidf')  # Custom id
)
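To give an intuition of what a TF-IDF representation computes, here is a minimal pure-Python sketch of the idea: each term is weighted by its frequency in the field, discounted by how common it is across the whole collection. This is only an illustration of the technique, not the actual SkLearnTfIdf implementation (which follows scikit-learn's formula):

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain tf-idf weight for each term of each tokenized document."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({term: (count / len(doc)) * math.log(n / df[term])
                         for term, count in tf.items()})
    return weighted

plots = [["space", "adventure", "hero"],
         ["space", "romance"],
         ["romance", "drama", "hero"]]
weights = tfidf(plots)
# "space" appears in 2 of 3 plots, so it is weighted less than the
# rarer "adventure" within the first plot
```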

To finalize the Content Analyzer part, let's instantiate the ContentAnalyzer class by passing the built configuration and by calling its fit() method

ca.ContentAnalyzer(movies_ca_config).fit()

RecSys

Similarly to the above, we must first import the RecSys module

import clayrs.recsys as rs

Then we load the rating frame from a TSV file

ratings = ca.Ratings(ca.CSVFile('ratings.tsv', separator='\t'))

Let's split the loaded rating frame into train and test sets with the KFold technique

train_list, test_list = rs.KFoldPartitioning(n_splits=2).split_all(ratings)
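Conceptually, a per-user k-fold partitioning splits each user's interactions into folds, so every user appears in both the train and the test set of each split. The following is a hedged pure-Python sketch of that idea (the actual ClayRS KFoldPartitioning implementation may differ in details):

```python
from collections import defaultdict

def kfold_per_user(interactions, n_splits=2):
    """Yield (train, test) pairs where each user's (user, item, score)
    rows are partitioned into n_splits folds; one fold is held out
    per split."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    for k in range(n_splits):
        train, test = [], []
        for rows in by_user.values():
            for i, row in enumerate(rows):
                # every i-th rating (mod n_splits) goes to the k-th test fold
                (test if i % n_splits == k else train).append(row)
        yield train, test

ratings_toy = [("u1", "i1", 4.0), ("u1", "i2", 3.0),
               ("u2", "i1", 5.0), ("u2", "i3", 2.0)]
splits = list(kfold_per_user(ratings_toy, n_splits=2))
```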

In order to recommend items to users, we must choose an algorithm to use

centroid_vec = rs.CentroidVector(
    {'plot': 'tfidf'},
    similarity=rs.CosineSimilarity()
)
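The centroid vector approach builds, for each user, the average ("centroid") of the vectors of the items they liked, then scores unseen items by cosine similarity to that centroid. A minimal sketch of the two ingredients, assuming simple dense vectors (the framework operates on the serialized representations instead):

```python
import math

def centroid(vectors):
    """Average the item vectors a user liked into one profile vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

liked = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]   # tf-idf-like item vectors
profile = centroid(liked)
candidates = {"i3": [1.0, 0.4, 0.6], "i4": [0.0, 0.0, 1.0]}
# rank unseen items by similarity to the user's profile
ranking = sorted(candidates,
                 key=lambda i: cosine(profile, candidates[i]),
                 reverse=True)
```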

Let's now compute the top-10 ranking for each user of the train set

Since we used the KFold technique, we iterate over the train and test sets

result_list = []

for train_set, test_set in zip(train_list, test_list):

    cbrs = rs.ContentBasedRS(centroid_vec, train_set, 'movies_codified/')
    rank = cbrs.fit_rank(test_set, n_recs=10)

    result_list.append(rank)

EvalModel

Similarly to the Content Analyzer and RecSys module, we must first import the evaluation module

import clayrs.evaluation as eva

The Evaluation module needs the following parameters:

* A list of computed ranks/predictions (one for each split)
* A list of truths (one for each split)
* The list of metrics to compute

Obviously, the list of computed ranks/predictions and the list of truths must have the same length: the rank/prediction in position *i* will be compared with the truth in position *i*

em = eva.EvalModel(
    pred_list=result_list,
    truth_list=test_list,
    metric_list=[
        eva.NDCG(),
        eva.Precision(),
        eva.RecallAtK(k=5)
    ]
)

Then simply call the fit() method of the instantiated object

sys_result, users_result = em.fit()
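To give an intuition of what such metrics compute, here is a hedged sketch of Precision@K on a single user's ranking. The framework's own Precision implementation additionally handles relevance thresholds and averaging across users; the items below are hypothetical:

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k

ranked = ["i7", "i2", "i9", "i4", "i1"]   # hypothetical top-5 for one user
relevant = {"i2", "i4", "i8"}             # items the user actually liked
score = precision_at_k(ranked, relevant, k=5)   # 2 hits out of 5
```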

Note that the EvalModel can also evaluate recommendations generated by other tools/frameworks; check the documentation for more details