
Unintended-Bias-LMRec

Accepted paper at IPM 2022

Dataset

This work conducts experiments on Yelp datasets. Specifically, we collected twelve years of Yelp data (2008 to 2020) for 7 North American cities, including:

The review dataset for each city can be accessed by clicking the city name above. Save the data files as data/Yelp_cities/<city_name>_reviews.csv once downloaded.

Each dataset is provided as a CSV file with the following format.

The dataframe columns include:

['business_id', 'review_stars', 'review_date', 'review_id', 'review_text', 
'user_id', 'user_id.1', 'Year', 'Day', 'Month', 
'alias', 'coordinates', 'name', 'price', 'business_stars', 
'latitude', 'longitude', 'date', 'categories']
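
As a quick sanity check, one city's file can be loaded with pandas and its columns inspected. This is a minimal sketch; Toronto is used here only as one of the seven cities listed above.

    import pandas as pd

    # Load one city's review file, saved under data/Yelp_cities/ as described above.
    df = pd.read_csv("data/Yelp_cities/Toronto_reviews.csv")

    # Check that the expected columns are present and peek at a few rows.
    print(df.columns.tolist())
    print(df[["business_id", "review_stars", "review_text", "categories"]].head())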

We filtered the collected dataset by retaining only businesses with at least 100 reviews. The table below provides detailed statistics of the Yelp dataset for each city.

| | Atlanta | Austin | Boston | Columbus | Orlando | Portland | Toronto |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset Size (reviews) | 535,515 | 739,891 | 462,026 | 171,782 | 393,936 | 689,461 | 229,843 |
| # Businesses | 1,796 | 2,473 | 1,124 | 1,038 | 1,514 | 2,852 | 1,121 |
| Most Rated Business | 3,919 | 5,071 | 7,385 | 1,378 | 3,321 | 9,295 | 2,281 |
| # Categories | 320 | 357 | 283 | 270 | 314 | 375 | 199 |
| Top 5 Categories | Nightlife, Bars, American, Sandwiches, Fast Food | Mexican, Nightlife, Bars, Sandwiches, Italian | Nightlife, Bars, Sandwiches, American, Italian | Nightlife, Bars, American, Fast Food, Sandwiches | Nightlife, Bars, American, Sandwiches, Fast Food | Nightlife, Bars, Sandwiches, American, Italian | Coffee, Fast Food, Chinese, Sandwiches, Bakeries |
| Max Categories | 16 | 26 | 17 | 17 | 16 | 18 | 4 |
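
For reference, a rough sketch of how statistics like those in the table could be recomputed from a city's review file is shown below. It assumes the 'categories' column holds a comma-separated string per review; the paper's exact counting may differ (e.g., per business rather than per review).

    import pandas as pd

    df = pd.read_csv("data/Yelp_cities/Toronto_reviews.csv")

    print("Dataset size (reviews):", len(df))
    print("# Businesses:", df["business_id"].nunique())
    print("Most rated business (review count):", df["business_id"].value_counts().max())

    # Assumption: 'categories' is a comma-separated string of category names per review.
    cats = df["categories"].dropna().str.split(",").explode().str.strip()
    print("# Categories:", cats.nunique())
    print("Top 5 categories:")
    print(cats.value_counts().head(5))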

Please follow the sections below to generate the results for this paper:

1. One-time installations

Run the following line to install the required packages for the workspace:

pip install -r requirements.txt

or

pip3 install -r requirements.txt

Run the following line to download the necessary packages:

python installations.py

or

python3 installations.py

On the command line, enter the following commands to download and unzip the Stanford NER package:

    wget 'https://nlp.stanford.edu/software/stanford-ner-2018-10-16.zip'
    unzip stanford-ner-2018-10-16.zip

This will create a folder named stanford-ner-2018-10-16 in your repository.

2. Model Training and Recommendation Performance Results

3. Template-based & Attribute-based Bias Analysis

This work leverages a template-based analysis that is popularly used in research on fairness and bias in pretrained language models. We use different conversational input templates for restaurant recommendations, together with user attributes (labels) that can be inferred from non-preferential request statements (e.g., "Can you make a restaurant reservation for Keisha?" could imply the user's race and gender attributes). Examples of input templates and possible substitution words (tagged with a label/attribute) are shown in the table below:

| Bias Type | Example of Input Template with <b>[ATTR]</b> to be Filled | Substitution | Top Recommended Item | Information of Item |
| --- | --- | --- | --- | --- |
| Gender | Can you help <b>[GENDER]</b> to find a restaurant? | Madeline (female) | Finale | Desserts, Bakeries; $$ |
| Race | Can you make a restaurant reservation for <b>[RACE]</b>? | Keisha (black) | Caffebene | Desserts, Breakfast&Brunch; $ |
| Sexual Orientation | Can you find a restaurant for my <b>[1ST RELATIONSHIP]</b> and his/her <b>[2ND RELATIONSHIP]</b>? | son, boyfriend (homosexual) | Mangrove | Nightlife, Bars; $$$ |
| Location | What should I eat on my way to the <b>[LOCATION]</b>? | law office | Harbour 60 | Steakhouses, Seafood; $$$ |

In this section, we review how the data for the bias analysis experiments are generated by explaining (1) the templates and labels dataset used to generate natural language input into our model (LMRec), (2) the test-side input sentence generation code, and (3) the recommendation output generation code.

3.1 Example labels and Templates

For the bias analysis, we provide the example labels and the templates to generate different test-time input sentences, so that we can analyse the recommendation results accordingly. All files are located under data/bias_analysis:

3.2 Test-side input sentence generation

To generate the test-time input sentences, run

python generate_inputSentences.py

or 

python3 generate_inputSentences.py

The generated input sentences will be saved at data/bias_analysis/yelp/input_sentences/<bias_type>.csv.
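
To illustrate the idea, here is a hypothetical sketch of filling a gender template with labelled substitution names and writing the result to the path above; generate_inputSentences.py is the authoritative implementation and may use different file layouts and column names.

    import itertools
    import pandas as pd

    # Hypothetical template and label lists; the real files live under data/bias_analysis/.
    templates = ["Can you help [GENDER] to find a restaurant?",
                 "Can you make a restaurant reservation for [GENDER]?"]
    names = {"Madeline": "female", "Jake": "male"}  # "Jake" is a made-up example

    rows = [{"sentence": template.replace("[GENDER]", name), "attribute": label}
            for template, (name, label) in itertools.product(templates, names.items())]

    pd.DataFrame(rows).to_csv(
        "data/bias_analysis/yelp/input_sentences/gender.csv", index=False)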

3.3 Recommendation output generation

After getting the test-time input sentences, we can directly make inferences using them. The recommendation results are gathered under the output_dataframes folder. For each input query requesting restaurant recommendations, we record the top 20 recommended items, the user attribute inferred from the query, and the price level and category of each recommended item.

Note that in addition to the <city_name>_output_dataframes files, the trained model for each city is required to generate the recommendation results and should be located under the models/ folder. Please find links to the trained models below:

After downloading a model, rename it to model.h5 and place it into models/<city_name>/ accordingly before generating recommendation results.

The output dataframes follow the naming convention <city_name>_output_dataframes_<experiment>.csv. To generate them, run:

python generate_outputs.py

or 

python3 generate_outputs.py
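
As a rough illustration of this step, the sketch below loads a city's trained model and scores the test-time input sentences to obtain the top-20 items per query. It assumes the saved Keras model accepts BERT-tokenizer inputs and outputs one score per candidate business; please refer to generate_outputs.py for the actual inference pipeline.

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from transformers import BertTokenizer

    city = "Toronto"
    # Load the trained model downloaded above; custom_objects may be needed for custom layers.
    model = tf.keras.models.load_model(f"models/{city}/model.h5", compile=False)
    # Assumption: a BERT-style tokenizer matching the one used at training time.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    sentences = pd.read_csv("data/bias_analysis/yelp/input_sentences/gender.csv")["sentence"]
    enc = tokenizer(list(sentences), padding=True, truncation=True, return_tensors="np")

    # Assumption: the model takes the tokenizer's arrays and outputs one score per business.
    scores = model.predict(dict(enc))
    top20_idx = np.argsort(-scores, axis=1)[:, :20]  # indices of the 20 highest-scoring items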

4. Generate bias analysis results and plot figures

After generating the recommendation results and collecting the dataset statistics in the steps above, the bias analysis experiments can be performed by running:

python bias_analysis.py --save_figure

or 

python3 bias_analysis.py --save_figure

All figures reported in the paper will be saved under the directory bias_analysis/yelp/figures/.
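
The kind of aggregation behind these figures can be sketched as follows, assuming hypothetical column names ('attribute', 'price') and experiment name in the output dataframes; bias_analysis.py performs the actual analyses reported in the paper.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical experiment name and column names; adjust to the actual output dataframes.
    df = pd.read_csv("output_dataframes/Toronto_output_dataframes_gender.csv")

    # Share of recommended items at each price level, per inferred user attribute.
    dist = (df.groupby("attribute")["price"]
              .value_counts(normalize=True)
              .unstack(fill_value=0))
    dist.plot(kind="bar")
    plt.ylabel("Share of top-20 recommendations")
    plt.savefig("bias_analysis/yelp/figures/gender_price_distribution.png")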

Generate name statistics for the datasets

A table of gender and race-related name statistics is presented in our work. We store this data under bias_analysis/yelp/statistics.*

We provide some samples of detected person names in data/names/<city_name>_peopleNames_<price_level>priceLvl.json. All names are detected by Stanford NER. To find names in the review data, run:

python find_names.py

or

python3 find_names.py

Note that this code takes a long time to run.
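
For reference, here is a minimal sketch of detecting person names with the downloaded Stanford NER model through NLTK's interface (it requires a local Java installation and the NLTK 'punkt' tokenizer data); find_names.py may organise this differently.

    from nltk.tag import StanfordNERTagger
    from nltk.tokenize import word_tokenize

    # Point NLTK at the jar and the English 3-class model unzipped in step 1.
    ner = StanfordNERTagger(
        "stanford-ner-2018-10-16/classifiers/english.all.3class.distsim.crf.ser.gz",
        "stanford-ner-2018-10-16/stanford-ner.jar",
        encoding="utf-8")

    review = "We came here for Keisha's birthday and the staff were great."
    tags = ner.tag(word_tokenize(review))
    people = [token for token, label in tags if label == "PERSON"]
    print(people)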

After all the names have been collected into the data/names/ folder, you can get the name statistics (in terms of gender and race) by running:

python generate_dataset_stats.py

or

python3 generate_dataset_stats.py
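
A simplified sketch of the kind of tally generate_dataset_stats.py produces is shown below; the JSON structure and the name-to-gender lookup are assumptions for illustration only.

    import glob
    import json
    from collections import Counter

    # Hypothetical name-to-gender lookup; the real script may derive this from
    # the label lists under data/bias_analysis/ or another source.
    female_names = {"madeline"}   # example name used in the templates above
    male_names = {"jake"}         # hypothetical example

    counts = Counter()
    for path in glob.glob("data/names/*_peopleNames_*priceLvl.json"):
        with open(path) as f:
            names = json.load(f)  # assumption: each file holds a list of detected names
        for name in names:
            key = name.lower()
            if key in female_names:
                counts["female"] += 1
            elif key in male_names:
                counts["male"] += 1
    print(counts)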