Show me a "Male Nurse"! How Gender Bias is Reflected in the Query Formulation of Search Engine Users

This repository contains the resources and results of the user studies reported in this paper. The work investigates the biases of search engine users when they formulate queries on gender-sensitive topics. A pilot study and a main study were conducted with substantially different setups, as described in detail in the paper. The repository contains the data used in the online user studies and the datasets that resulted from them, as explained below.

@inproceedings{Kopeinik2023MaleNurse,
  title={Show me a "Male Nurse"! How Gender Bias is Reflected in the Query Formulation of Search Engine Users},
  author={Kopeinik, Simone and Mara, Martina and Ratz, Linda and Krieg, Klara and Schedl, Markus and Rekabsaz, Navid},
  booktitle={Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI)},
  year={2023}
}

Paper, preprint, supplementary material

Pilot study

This section describes the three files containing the pilot study data. The formatting and contents of each file are described below.

DocumentsPilot.csv

The file DocumentsPilot.csv contains a set of 22 documents that were used in the pilot study. Seven of these documents were excluded from the analysis, which is indicated by Yes in the column Exclude. The documents were retrieved from the Grep-BiasIR dataset (dataset, paper) and are labelled according to gender indication and gender stereotype.
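As a minimal sketch of how the Exclude column might be used, the snippet below keeps only the documents that were part of the analysis. It assumes a comma-delimited file with a header row. The function name `load_included_documents` and the `Text` column in the sample are hypothetical and only serve the illustration; only the `Exclude` column and its `Yes` value come from the description above.

```python
import csv
import io


def load_included_documents(csv_file):
    """Return the rows whose Exclude column is not 'Yes'."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if (row.get("Exclude") or "").strip().lower() != "yes"
    ]


# Tiny in-memory stand-in for DocumentsPilot.csv.
# The "Text" column name is an assumption made for this example.
sample = io.StringIO("Text,Exclude\ndoc a,Yes\ndoc b,No\ndoc c,\n")
included = load_included_documents(sample)
print([row["Text"] for row in included])
```

With the real file, the same function could be called as `load_included_documents(open("DocumentsPilot.csv", newline="", encoding="utf-8"))`, adjusting the delimiter or encoding if the file differs from these assumptions.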

DatasetPilot.csv

The file DatasetPilot.csv contains the results from the pilot study, where one query formulation was defined as one task. The column names and their content descriptions follow:

DatasetPilotLabeled.csv

The file DatasetPilotLabeled.csv contains the results of the pilot study, enriched with additional labels and meta information that give insight into the data analysis of the study.

Main study

This section describes the three files containing the main study data. The formatting and contents of each file are described below.

DocumentsMain.csv

The file DocumentsMain.csv contains the documents used for the main study. The documents were taken from the Grep-BiasIR dataset (https://github.com/KlaraKrieg/GrepBiasIR). We used a set of eight documents originating from four domains. Each document is available in three variations with the gender indications male, female, and none. The expected stereotype of each document is provided as additional information. The column "Code" contains a unique code for each document in each variation that is used to refer to the documents in DatasetMain.csv as explained below.

DatasetMain.csv

The file DatasetMain.csv contains the results from the main study. The columns whose names contain document codes (e.g. 0-5-M) hold the queries that participants entered in response to the corresponding document. The codes and the documents can be found in DocumentsMain.csv. The descriptions of the content of each column follow. Empty fields are marked with -99 if left empty by the participant and with -66 if a question was not assigned to the participant. For the questions using the Likert scale, the encoding is: 1=Strongly disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly agree.
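The two missing-value markers carry different meanings, so it can help to keep them apart when loading the data. The following is a sketch under stated assumptions, not the authors' processing code: it assumes a comma-delimited file with a header row, and the names `clean_value` and `load_rows` as well as the status strings are hypothetical.

```python
import csv
import io

NOT_ANSWERED = "-99"  # field left empty by the participant
NOT_ASSIGNED = "-66"  # question was not assigned to the participant


def clean_value(raw):
    """Map the two sentinel codes to None plus a reason; pass other values through."""
    value = (raw or "").strip()
    if value == NOT_ANSWERED:
        return None, "not_answered"
    if value == NOT_ASSIGNED:
        return None, "not_assigned"
    return value, "answered"


def load_rows(csv_file):
    """Read all rows and replace each cell with a (value, status) pair."""
    reader = csv.DictReader(csv_file)
    return [{col: clean_value(val) for col, val in row.items()} for row in reader]


# In-memory stand-in for DatasetMain.csv with one document-code column
# and one question column; the cell contents are invented for illustration.
sample = io.StringIO("0-5-M,Q1\nmale nurse salary,4\n-66,-99\n")
rows = load_rows(sample)
print(rows[1])
```

Keeping the status next to the value makes it possible to distinguish "the participant skipped this" from "the participant never saw this" in later analysis, which a single generic missing value would hide.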

In addition to these columns, the file contains 12 columns named Q1 – Q12, which contain answers to the questions listed below. For each question, the answer format is given in brackets. For the questions using the Likert scale, the encoding is: 1=Strongly disagree, 2=Disagree, 3=Neither agree nor disagree, 4=Agree, 5=Strongly agree.
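The Likert encoding can be turned back into labels with a small lookup. This is only an illustrative sketch: `decode_likert` is a hypothetical helper, and since not every question necessarily uses the Likert scale, it returns None for anything outside 1 to 5, including the -99 and -66 markers.

```python
LIKERT_LABELS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neither agree nor disagree",
    4: "Agree",
    5: "Strongly agree",
}


def decode_likert(raw):
    """Return the Likert label for a raw cell value, or None if it is not a code from 1 to 5."""
    try:
        code = int(str(raw).strip())
    except ValueError:
        # Free-text or otherwise non-numeric answers are not Likert codes.
        return None
    return LIKERT_LABELS.get(code)


print(decode_likert("4"))
print(decode_likert("-99"))
```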

DatasetMainLabeled.csv

DatasetMainLabeled.csv contains the labelled results from the main study.