# Counterfactual-Feature-aware-Collaborative-Filtering
## Requirements
- Python 3.6.9
- PyTorch 1.4.0 + CUDA 10
- numpy 1.16
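
To confirm your environment matches these pins, a quick check along the following lines can help (illustrative only, not part of the repository):

```python
# Quick environment check; the expected versions come from the list above.
import sys
import torch
import numpy

print(sys.version_info[:3])       # expect (3, 6, 9)
print(torch.__version__)          # expect 1.4.0
print(torch.version.cuda)         # expect a CUDA 10.x build
print(torch.cuda.is_available())  # True if a GPU is visible
print(numpy.__version__)          # expect 1.16.x
```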
## Project Structure
```
.
|-- __init__.py
|-- data                        # dataset directory: the preprocessed Amazon dataset; see the "Data Structure" section below
|-- main.py                     # training logic of our CFCF method; set --anchor_model to 1|2|3|4|5 to switch between training modes
|-- config.py                   # configuration: set anchor_type to "ele_add | ele_mul | hybrid | attention" and the number of GPUs
|-- data_loader.py              # data loader for the Amazon dataset
|-- model                       # model directory: contains the anchor models and the intervener model
|   |-- __init__.py
|   |-- anchor_model.py         # the f model in our paper: the element-wise multiply anchor model
|   |-- intervention_model.py   # the counterfactual sample generation method
|   |-- anchor_hybrid.py        # the f model in our paper: the hybrid anchor model
|   |-- anchor_attention.py     # the f model in our paper: the attention anchor model
|   `-- anchor_ele_add.py       # the f model in our paper: the element-wise add anchor model
|
`-- utils
    |-- QPC.py
    `-- eval.py                 # inference code: computes the recommendation metrics NDCG, F1, and HitRate
```
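
For orientation, the sketch below shows how HitRate@K and a binary-relevance NDCG@K are commonly computed from a ranked recommendation list. The function names and signatures are illustrative only and are not taken from utils/eval.py.

```python
import math

def hit_rate_at_k(ranked_items, ground_truth, k=10):
    """1.0 if any ground-truth item appears in the top-k ranking, else 0.0."""
    return 1.0 if set(ranked_items[:k]) & set(ground_truth) else 0.0

def ndcg_at_k(ranked_items, ground_truth, k=10):
    """Binary-relevance NDCG: DCG over the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in ground_truth
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(ground_truth), k)))
    return dcg / ideal if ideal > 0 else 0.0
```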
## Data Structure
Download our dataset and unzip it in the ./data/ directory.

Here we provide the small dataset Amazon_Instant_Video.tar for testing (extraction code: 9w4m). The complete dataset, Amazon_dataset_complete, is about 3 GB (extraction code: q4vt).

The data directory will then look like the following:
```
data
`-- Amazon_Instant_Video
    |-- Amazon_Instant_Video.formated
    |-- anchor.ptr
    |-- anchor_best.ptr
    |-- feature_id_dict
    |-- id_feature_dict
    |-- id_item_dict
    |-- id_user_dict
    |-- item_feature_quality_matrix
    |-- item_id_dict
    |-- predicted_item_feature_quality
    |-- predicted_user_feature_attention
    |-- sorted_ided_dataset
    |-- statistics
    |-- test_compute_user_items_dict
    |-- test_data
    |-- test_ground_truth_user_items_dict
    |-- train_data
    |-- train_user_negative_items_dict
    |-- train_user_positive_items_dict
    |-- user_feature_attention_matrix
    `-- user_id_dict
```
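
If the *_dict files are standard Python pickles (an assumption; check the preprocessing and loader code if this fails), they can be inspected like this:

```python
import pickle

# Assumption: user_id_dict is a pickled Python dict mapping raw user identifiers
# to the integer ids used during training. See data_loader.py if loading fails.
with open("./data/Amazon_Instant_Video/user_id_dict", "rb") as f:
    user_id_dict = pickle.load(f)

print(len(user_id_dict))               # number of users after preprocessing
print(list(user_id_dict.items())[:3])  # peek at a few entries
```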
## Usage

- Install all the required packages.

- Unzip the dataset in the ./data directory and check that every file exists.

- Modify config.py and set the anchor model you want; the default is the "ele_mul" anchor model (CF-mul). A hypothetical sketch of this setting follows these steps.

- Run:

  ```bash
  python main.py --data_path=./data/Amazon_Instant_Video/ --anchor_model=1 # train the anchor model and save it
  mv ./data/Amazon_Instant_Video/anchor.ptr ./data/Amazon_Instant_Video/anchor_best.ptr # rename the saved model to 'anchor_best.ptr'
  python main.py --data_path=./data/Amazon_Instant_Video/ --anchor_model=2 --confidence=0.55 --intervener_learning_rate=0.001 --intervener_reg=0.01 --learning_rate=0.001 --intervener_feature_number=60 --intervener_l1_reg=0.0025 # generate the counterfactual samples and fine-tune the anchor model (CF-Base)
  ```
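
The third step above refers to the anchor-model switch in config.py. The attribute names below are a hypothetical illustration of the settings described in the Project Structure section; check config.py for the actual names:

```python
# Hypothetical illustration only; the actual attribute names in config.py may differ.
anchor_type = "ele_mul"   # one of "ele_add" | "ele_mul" | "hybrid" | "attention"; "ele_mul" is CF-mul
gpu_num = 1               # number of GPUs to use
```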
## Result

- CF-Base on Amazon_Instant_Video: 0.088
- CF-Hard on Amazon_Instant_Video: 0.097
## Cases

Please see the paper.