
Unintended-Bias-LMRec

Accepted paper at IPM 2022

Dataset

This work conducts experiments on Yelp datasets. Specifically, we collected twelve years of Yelp data (2008 to 2020) for 7 North American cities, including:

The review dataset for each city can be accessed by clicking the city name above. Save the data files as data/Yelp_cities/<city_name>_reviews.csv once downloaded.

Each dataset is provided as a CSV file with the following format.

The dataframe columns include:

['business_id', 'review_stars', 'review_date', 'review_id', 'review_text', 
'user_id', 'user_id.1', 'Year', 'Day', 'Month', 
'alias', 'coordinates', 'name', 'price', 'business_stars', 
'latitude', 'longitude', 'date', 'categories']
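
As a quick sanity check, one city's file can be loaded with pandas and its columns inspected. This is a minimal sketch; Toronto is used here only as one of the seven cities listed above.

    import pandas as pd

    # Load one city's review file, saved under data/Yelp_cities/ as described above.
    df = pd.read_csv("data/Yelp_cities/Toronto_reviews.csv")

    # Check that the expected columns are present and peek at a few rows.
    print(df.columns.tolist())
    print(df[["business_id", "review_stars", "review_text", "categories"]].head())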

We filtered the collected dataset by retaining only businesses with at least 100 reviews. The table below provides detailed statistics of the Yelp dataset for each city.

| | Atlanta | Austin | Boston | Columbus | Orlando | Portland | Toronto |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset Size (reviews) | 535,515 | 739,891 | 462,026 | 171,782 | 393,936 | 689,461 | 229,843 |
| # Businesses | 1,796 | 2,473 | 1,124 | 1,038 | 1,514 | 2,852 | 1,121 |
| Most Rated Business | 3,919 | 5,071 | 7,385 | 1,378 | 3,321 | 9,295 | 2,281 |
| # Categories | 320 | 357 | 283 | 270 | 314 | 375 | 199 |
| Top 5 Categories | Nightlife, Bars, American, Sandwiches, Fast Food | Mexican, Nightlife, Bars, Sandwiches, Italian | Nightlife, Bars, Sandwiches, American, Italian | Nightlife, Bars, American, Fast Food, Sandwiches | Nightlife, Bars, American, Sandwiches, Fast Food | Nightlife, Bars, Sandwiches, American, Italian | Coffee, Fast Food, Chinese, Sandwiches, Bakeries |
| Max Categories | 16 | 26 | 17 | 17 | 16 | 18 | 4 |
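
For reference, a rough sketch of how statistics like those in the table could be recomputed from a city's review file is shown below. It assumes the 'categories' column holds a comma-separated string per review; the paper's exact counting may differ (e.g., per business rather than per review).

    import pandas as pd

    df = pd.read_csv("data/Yelp_cities/Toronto_reviews.csv")

    print("Dataset size (reviews):", len(df))
    print("# Businesses:", df["business_id"].nunique())
    print("Most rated business (review count):", df["business_id"].value_counts().max())

    # Assumption: 'categories' is a comma-separated string of category names per review.
    cats = df["categories"].dropna().str.split(",").explode().str.strip()
    print("# Categories:", cats.nunique())
    print("Top 5 categories:")
    print(cats.value_counts().head(5))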

Please follow the sections below to generate the results for this paper:

1. One-time installations

Run the following line to install the required packages for the workspace:

pip install -r requirements.txt

or

pip3 install -r requirements.txt

Run the following line to download the necessary packages:

python installations.py

or

python3 installations.py

On the command line, enter the following commands to download and unzip the Stanford NER package:

    wget 'https://nlp.stanford.edu/software/stanford-ner-2018-10-16.zip'
    unzip stanford-ner-2018-10-16.zip

This will create a folder named stanford-ner-2018-10-16 in your repository.

2. Model Training and Recommendation Performance Results

3. Template-based & Attribute-based Bias Analysis

This work leverages a template-based analysis that is popularly used in research on fairness and bias in pretrained language models. We use different conversational input templates for restaurant recommendations, together with user attributes (labels) that can be inferred from non-preferential request statements (e.g., "Can you make a restaurant reservation for Keisha?" could imply the user's race and gender attributes). Examples of input templates and possible substitution words (tagged with a label/attribute) are shown in the table below:

| Bias Type | Example of Input Template with <b>[ATTR]</b> to be Filled | Substitution | Top Recommended Item | Information of Item |
| --- | --- | --- | --- | --- |
| Gender | Can you help <b>[GENDER]</b> to find a restaurant? | Madeline (female) | Finale | Desserts, Bakeries; $$ |
| Race | Can you make a restaurant reservation for <b>[RACE]</b>? | Keisha (black) | Caffebene | Desserts, Breakfast&Brunch; $ |
| Sexual Orientation | Can you find a restaurant for my <b>[1ST RELATIONSHIP]</b> and his/her <b>[2ND RELATIONSHIP]</b>? | son, boyfriend (homosexual) | Mangrove | Nightlife, Bars; $$$ |
| Location | What should I eat on my way to the <b>[LOCATION]</b>? | law office | Harbour 60 | Steakhouses, Seafood; $$$ |

In this section, we review how the data for the bias analysis experiments are generated by explaining (1) the templates and labels dataset used to generate natural language input into our model (LMRec), (2) the test-side input sentence generation code, and (3) the recommendation output generation code.

3.1 Example labels and Templates

For the bias analysis, we provide the example labels and the templates to generate different test-time input sentences, so that we can analyse the recommendation results accordingly. All files are located under data/bias_analysis:

3.2 Test-side input sentence generation

To generate the test-time input sentences, run

python generate_inputSentences.py

or 

python3 generate_inputSentences.py

The generated input sentences will be saved at data/bias_analysis/yelp/input_sentences/<bias_type>.csv.
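
To illustrate the idea, here is a hypothetical sketch of filling a gender template with labelled substitution names and writing the result to the path above; generate_inputSentences.py is the authoritative implementation and may use different file layouts and column names.

    import itertools
    import pandas as pd

    # Hypothetical template and label lists; the real files live under data/bias_analysis/.
    templates = ["Can you help [GENDER] to find a restaurant?",
                 "Can you make a restaurant reservation for [GENDER]?"]
    names = {"Madeline": "female", "Jake": "male"}  # "Jake" is a made-up example

    rows = [{"sentence": template.replace("[GENDER]", name), "attribute": label}
            for template, (name, label) in itertools.product(templates, names.items())]

    pd.DataFrame(rows).to_csv(
        "data/bias_analysis/yelp/input_sentences/gender.csv", index=False)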

3.3 Recommendation output generation

After getting the test-time input sentences, we can directly make inferences using them. The recommendation results are gathered under the output_dataframes folder. For each input query requesting restaurant recommendations, we record the top 20 recommended items, the user attribute inferred from the query, and the price level and category of each recommended item.

Note that in addition to the <city_name>_output_dataframes files, the trained model for each city is required to generate the recommendation results and should be located under the models/ folder. Please find links to the trained models below:

After downloading a model, rename it to model.h5 and place it into models/<city_name>/ accordingly before generating recommendation results.

The output dataframes follow the naming convention <city_name>_output_dataframes_<experiment>.csv. To generate them, run:

python generate_outputs.py

or 

python3 generate_outputs.py
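
As a rough illustration of this step, the sketch below loads a city's trained model and scores the test-time input sentences to obtain the top-20 items per query. It assumes the saved Keras model accepts BERT-tokenizer inputs and outputs one score per candidate business; please refer to generate_outputs.py for the actual inference pipeline.

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from transformers import BertTokenizer

    city = "Toronto"
    # Load the trained model downloaded above; custom_objects may be needed for custom layers.
    model = tf.keras.models.load_model(f"models/{city}/model.h5", compile=False)
    # Assumption: a BERT-style tokenizer matching the one used at training time.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    sentences = pd.read_csv("data/bias_analysis/yelp/input_sentences/gender.csv")["sentence"]
    enc = tokenizer(list(sentences), padding=True, truncation=True, return_tensors="np")

    # Assumption: the model takes the tokenizer's arrays and outputs one score per business.
    scores = model.predict(dict(enc))
    top20_idx = np.argsort(-scores, axis=1)[:, :20]  # indices of the 20 highest-scoring items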

4. Generate bias analysis results and plot figures

After generating the recommendation results and collecting the dataset statistics in the steps above, the bias analysis experiments can be performed by running:

python bias_analysis.py --save_figure

or 

python3 bias_analysis.py --save_figure

All figures reported in the paper will be saved under the directory bias_analysis/yelp/figures/.
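
The kind of aggregation behind these figures can be sketched as follows, assuming hypothetical column names ('attribute', 'price') and experiment name in the output dataframes; bias_analysis.py performs the actual analyses reported in the paper.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical experiment name and column names; adjust to the actual output dataframes.
    df = pd.read_csv("output_dataframes/Toronto_output_dataframes_gender.csv")

    # Share of recommended items at each price level, per inferred user attribute.
    dist = (df.groupby("attribute")["price"]
              .value_counts(normalize=True)
              .unstack(fill_value=0))
    dist.plot(kind="bar")
    plt.ylabel("Share of top-20 recommendations")
    plt.savefig("bias_analysis/yelp/figures/gender_price_distribution.png")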

Generate name statistics for the datasets

A table of gender and race-related name statistics is presented in our work. We store this data under bias_analysis/yelp/statistics.*

We provide some samples of detected person names in data/names/<city_name>_peopleNames_<price_level>priceLvl.json. All names are detected by Stanford NER. To find names in the review data, run:

python find_names.py

or

python3 find_names.py

Note that this code takes a long time to run.
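
For reference, here is a minimal sketch of detecting person names with the downloaded Stanford NER model through NLTK's interface (it requires a local Java installation and the NLTK 'punkt' tokenizer data); find_names.py may organise this differently.

    from nltk.tag import StanfordNERTagger
    from nltk.tokenize import word_tokenize

    # Point NLTK at the jar and the English 3-class model unzipped in step 1.
    ner = StanfordNERTagger(
        "stanford-ner-2018-10-16/classifiers/english.all.3class.distsim.crf.ser.gz",
        "stanford-ner-2018-10-16/stanford-ner.jar",
        encoding="utf-8")

    review = "We came here for Keisha's birthday and the staff were great."
    tags = ner.tag(word_tokenize(review))
    people = [token for token, label in tags if label == "PERSON"]
    print(people)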

After all the names have been collected into the data/names/ folder, you can get the name statistics (in terms of gender and race) by running:

python generate_dataset_stats.py

or

python3 generate_dataset_stats.py
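
A simplified sketch of the kind of tally generate_dataset_stats.py produces is shown below; the JSON structure and the name-to-gender lookup are assumptions for illustration only.

    import glob
    import json
    from collections import Counter

    # Hypothetical name-to-gender lookup; the real script may derive this from
    # the label lists under data/bias_analysis/ or another source.
    female_names = {"madeline"}   # example name used in the templates above
    male_names = {"jake"}         # hypothetical example

    counts = Counter()
    for path in glob.glob("data/names/*_peopleNames_*priceLvl.json"):
        with open(path) as f:
            names = json.load(f)  # assumption: each file holds a list of detected names
        for name in names:
            key = name.lower()
            if key in female_names:
                counts["female"] += 1
            elif key in male_names:
                counts["male"] += 1
    print(counts)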