Home

Awesome

HybridRecommendation

Implementation for paper: Recommendation Based on Review Texts and Social Communities: A Hybrid Model

Overview

In this project, we implement a community regression model to predict user ratings towards bussinesse. The project is based on Spark Scala API. It is a local version of our proposed model, you can run it in a single machine in the Spark Standalone Mode. After downloading the spark dependencies and our processed Yelp data, presiction resuls will be printed by executing the Scala2.jar file. Have a good time!

Requirement

Softeware requirement: Java 1.8

Preparation

You can download our processed dataset from: https://drive.google.com/open?id=1uFmDlS73DRSzjqX7yL2_N3EO05N6iA7L The executable jar and dependencies from: https://drive.google.com/open?id=1M566erL8LHjpDLmL7KkeTfeapRMO9_eQ

Model Training and Testing

To training our hybrid recommendation model, use java -jar command:

eg. $ java -jar -Xmx10g Scala2.jar --root_path DataDirectory/ --coda_result socialUR20CaGroup200.txt

Here is the params list and introduction:

--root_path

This is the root dir where the processed data are stored. You must set this param at first to init our model.

--task

If you want to random split the processed data to traing and testing set, set "--task DataSplit", else program will find data in the
root/output/Access/ floder by default.

--model_type

The regression model you want to apply. Default is "LR"(Linear Regression).

--word2vec_num

This is a word2vec param which used to set the dimensionality of the word embedding vector. Default is 10.

--review_num

The review number of users. Default is 20.

--min_count

A word2vec param. The minimal occurance number of words. Default is 5.

--window_num

A word2vec param. Default is 5.

--social_type

If you want to choose the community detection algorithms, please set this param to "--social_type coda" or "--social_type cnm". The de fault algorithm is coda.

--cnm_result

The file name of the cnm community detection results. Default is "Yelp2016UserBusinessStarReview"+reviewNum+"cnm2.txt"

--coda_result

The file name of the coda community detection results. Default is "Review"+reviewNum+"mc50xc200ClusterSkipcmtyvv.in.txt"