GJRC
German Job Reference Corpus
This repository provides the German Job Reference Corpus (GJRC) used in our publication "Analyzing Sentiments of German Job References". Please note that, due to copyright restrictions, we cannot currently make the full corpus available.
Authors: Finn Folkerts, Vanessa Schreck, Shirin Riazy and Katharina Simbeck
Published at: more information will follow
For more information on our research group, please go to https://iug.htw-berlin.de/.
This repository contains the data that was used in support of the HCC 2019 paper Analyzing Sentiments of German Job References.
Please visit our second GitHub repository to learn more about the context in which we used this data.
Content of the Repository
We compiled a test corpus of typical German job reference letter sentences taken from German books on how to write job reference letters. We combined those template sentences with subjects of varying gender, origin and nobility.

To find suitable surnames for our experiment, we looked up the lists of members of the German state parliaments. We then mapped the names to their origins and randomly picked ten German surnames, ten German surnames with a nobiliary particle and ten Turkish surnames. All surnames used to compile the GJRC are listed in data/surnames.csv.

Following a literature review, we collected 843 different sentences that are commonly used in German job references. To generate multiple versions of the same sentence, we turned each one into a template: all words that are gender-specific or require gender-specific declension were replaced with a suitable placeholder. You can find these template sentences in data/template_sentences.csv. Note that by law, German reference letters must be phrased favorably to the employee, even if the employee did not perform well.

To compile the German Job Reference Corpus, we combined each template sentence with each of the 30 surnames and both gender-specific titles. This yields 60 distinct sentences per template. Additionally, we created two more sentences per template by replacing the title and surname with the corresponding male or female pronoun, for 62 sentences per template. In total, the corpus consists of 52,266 sentences (843 templates × 62 variants), of which 1,686 are formed with a pronoun instead of a name.
You will also find the Python script we used to generate the corpus from the template sentences and the names as well as the resulting corpus.
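The combination step described above can be illustrated with a minimal sketch. This is not the script from this repository: the `{SUBJECT}` placeholder, the `expand_template` function and the example surnames and sentence are assumptions made for illustration, and the sketch ignores the additional placeholders for gender-specific declension that the real templates contain.

```python
from itertools import product

# Hypothetical mappings; the actual placeholder markers used in
# data/template_sentences.csv may differ.
TITLES = {"m": "Herr", "f": "Frau"}
PRONOUNS = {"m": "Er", "f": "Sie"}


def expand_template(template, surnames):
    """Return all sentence variants generated from one template."""
    sentences = []
    # One sentence per (gender, surname) pair: 2 * len(surnames) variants.
    for gender, surname in product(TITLES, surnames):
        subject = f"{TITLES[gender]} {surname}"
        sentences.append(template.replace("{SUBJECT}", subject))
    # Two additional variants that use a pronoun instead of title and surname.
    for gender in PRONOUNS:
        sentences.append(template.replace("{SUBJECT}", PRONOUNS[gender]))
    return sentences


# Illustrative inputs only; these are not taken from the corpus files.
surnames = ["Müller", "von Beispiel", "Yılmaz"]
template = "{SUBJECT} arbeitete stets zuverlässig."
variants = expand_template(template, surnames)
print(len(variants))  # 2 * 3 + 2 = 8

# Size check for the full corpus described above.
n_templates, n_surnames = 843, 30
per_template = 2 * n_surnames + 2
print(per_template, n_templates * per_template)  # 62 52266
```

With 30 surnames instead of three, the same loop produces the 60 name-based and 2 pronoun-based variants per template mentioned above.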
Literature
We took the sentences that we transformed into template sentences from the following literature:
- H.-G. Dachrodt and V. Engelbert, Zeugnisse richtig formulieren: Mit vielen Mustern und Analysen. Wiesbaden: Springer Gabler, 2013.
- G. Huber and W. Müller, Das Arbeitszeugnis in Recht und Praxis: Rechtliche Grundlagen, Textbausteine, Musterzeugnisse, Zeugnisanalysen, 16th ed. Haufe-Lexware GmbH & Co. KG, 2016.
- S. Schustereit and J. Welscher, Arbeitszeugnisse für den öffentlichen Dienst, 2nd ed. München: Haufe-Lexware GmbH & Co. KG, 2013.
- T. Knobbe, M. Leis, and K. Umnuß, Arbeitszeugnisse für Führungskräfte, 5th ed. Freiburg, Br.: Haufe, 2010.
- T. Knobbe, M. Leis, and K. Umnuß, Arbeitszeugnisse: Textbausteine und Tätigkeitsbeschreibungen, 6th ed. München: Haufe-Lexware GmbH & Co. KG, 2011.
Project
The present research was done as part of the project Diskriminiert durch Künstliche Intelligenz (Discriminated by Artificial Intelligence) at Hochschule für Technik und Wirtschaft (University of Applied Sciences) Berlin under the direction of Katharina Simbeck. This research project was funded by Hans-Böckler-Stiftung.
Hochschule für Technik und Wirtschaft
HTW Berlin, 10313 Berlin (Postfach)
Hans-Böckler-Stiftung
Hans-Böckler-Straße 39, 40476 Düsseldorf
Authors
- Finn Folkerts - HTW Berlin - Email
- Vanessa Schreck - HTW Berlin - Email
- Shirin Riazy - HTW Berlin - Email
- Katharina Simbeck - HTW Berlin - Email
License
Please refer to our LICENSE file for this information.
Citing
If you found this repository or our paper helpful, please consider citing us with the following BibTeX entry.
@inproceedings{folkerts2019,
author = {Folkerts, Finn and Schreck, Vanessa and Riazy, Shirin and Simbeck, Katharina},
title = {Analyzing Sentiments of German Job References},
crossref = {hcc2019},
pages = {??--??},
doi = {???},
}
@proceedings{hcc2019,
editor = {???},
title = "???",
booktitle = "???(same as title)",
publisher = {???},
venue = {Laguna Hills, California, USA},
month = sep,
year = {2019},
isbn = {???},
}