ProSub

Introduction

This repository contains resources developed by the JSPS KAKENHI project "A cross-linguistic study of pronoun substitutes and address terms" (JP20H01255).

How to cite

@InProceedings{OkanoEtAl22,
    author = {Okano, Kenji and Nomoto, Hiroki and Wittayapanyanon, Sunisa and Thuzar Hlaing and Kasuga, Atsushi},
    year = {2022},
    title = {Ajia sangengo niokeru daimeishidaiyou, yobikakego no kyoutsuukoumoku chousa},
    booktitle = {Proceedings of the Twenty-Eighth Annual Meeting of the {A}ssociation for {N}atural {L}anguage {P}rocessing},
    pages = {69-73},
    url = {https://www.anlp.jp/proceedings/annual_meeting/2022/pdf_dir/D1-2.pdf},
    note = {An investigation of pronoun substitutes and address terms in three Asian languages based on a common questionnaire}
}

@InProceedings{TaniguchiEtAl22,
    author = {Taniguchi, Ryuko and Okubo, Wataru and Nomoto, Hiroki and Nam, Yunjin},
    year = {2022},
    title = {Daimeishidaiyou, yobikake hyougen no tagengo deetasetto},
    booktitle = {The Proceedings of the 164th Meeting of the {L}inguistic {S}ociety of {J}apan},
    pages = {307-313},
    url = {https://www.ls-japan.org/modules/documents/LSJpapers/meeting/164/handouts/p/P-6_164.pdf},
    note = {A multilingual dataset of pronoun substitutes and address terms}
}

@InProceedings{NomotoEtAl23,
    author = {Nomoto, Hiroki and Taniguchi, Ryuko and Nakamura, Shiori and Nam, Yunjin and Lestari, Sri Budi and Wittayapanyanon (Saito), Sunisa and Sornlertlamvanich, Virach and Kasuga, Atsushi and Okano, Kenji and Thuzar Hlaing},
    year = {2023},
    title = {Pronoun substitute annotation in seven {A}sian languages},
    booktitle = {Proceedings of the Twenty-Ninth Annual Meeting of the {A}ssociation for {N}atural {L}anguage {P}rocessing},
    pages = {2242-2247},
    url = {https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/P9-4.pdf}
}

@misc{NakamuraEtAl24,
    author = {Nakamura, Shiori and Umeda, Rina and Taniguchi, Ryuko},
    year = {2024},
    title = {Hatsuwakinoo no shurui to jishooshi, taishooshi no meijika},
    howpublished = {Paper presented at the 40th research seminar on "Contrastive Studies of Japanese and Other Languages", Contrastive Japanese Division, International Center for Japanese Studies, Tokyo University of Foreign Studies},
    note = {Speech function types and the explicit realization of speaker- and addressee-referring expressions}
}

Contents

Format

common_questionnaire.tsv

ID [TAB] concept (Japanese) [TAB] concept (English) [TAB] WordNet synset ID [TAB] semantic category
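A minimal Python sketch for loading common_questionnaire.tsv with the standard csv module. It assumes UTF-8 text with exactly the five tab-separated columns listed above. The field names (`id`, `concept_jpn`, and so on) are hypothetical labels chosen for illustration. Because this README does not say whether the file begins with a header row, that choice is left to the `has_header` flag.

```python
import csv
from typing import Iterable

# Hypothetical names for the five columns of common_questionnaire.tsv.
QUESTIONNAIRE_FIELDS = ["id", "concept_jpn", "concept_eng", "wordnet_id", "category"]


def read_questionnaire(lines: Iterable[str], has_header: bool = False) -> list[dict[str, str]]:
    """Parse tab-separated questionnaire rows into dictionaries."""
    records = []
    # QUOTE_NONE keeps quotation marks inside concepts as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if (i == 0 and has_header) or not row:
            continue  # skip the optional header row and blank lines
        if len(row) != len(QUESTIONNAIRE_FIELDS):
            raise ValueError(f"line {reader.line_num}: expected 5 columns, got {len(row)}")
        records.append(dict(zip(QUESTIONNAIRE_FIELDS, row)))
    return records
```

Any iterable of lines works, for example a file opened with `open("common_questionnaire.tsv", encoding="utf-8", newline="")`. Malformed rows raise an error instead of being silently dropped, so column-count problems surface early.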

The semantic categories are based on the feature system proposed by:
Nomoto, Hiroki, Kenji Okano, Sunisa Wittayapanyanon and Junta Nomura. 2019. Interpersonal meaning annotation for Asian language corpora: The case of TUFS Asian Language Parallel Corpus (TALPCo). Proceedings of the Twenty-Fifth Annual Meeting of the Association for Natural Language Processing, 846-849.

data_all_v1.0.tsv, data_all_v2.0.tsv

language code [TAB] concept ID [TAB] concept (Japanese) [TAB] concept (English) [TAB] semantic category [TAB] expression [TAB] Japanese translation of expression [TAB] English translation of expression [TAB] function [TAB] judgement [TAB] example [TAB] Japanese translation of example [TAB] English translation of example [TAB] source [TAB] notes
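The flat TSV versions can be read the same way. The sketch below assumes that the 15 columns appear in the order documented above. It also assumes that rows may omit trailing tabs when the last columns (for example, notes) are empty, and pads such rows. The field names and the helper `expressions_by_language` are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable

# Hypothetical names for the 15 columns of data_all_v*.tsv, in documented order.
DATA_FIELDS = [
    "lang", "concept_id", "concept_jpn", "concept_eng", "category",
    "expression", "expr_trans_jpn", "expr_trans_eng", "function", "judgement",
    "example", "ex_trans_jpn", "ex_trans_eng", "source", "notes",
]


def read_data_all(lines: Iterable[str], has_header: bool = False) -> list[dict[str, str]]:
    """Parse data_all TSV rows into dictionaries keyed by DATA_FIELDS."""
    records = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if (i == 0 and has_header) or not row:
            continue
        if len(row) > len(DATA_FIELDS):
            raise ValueError(f"line {reader.line_num}: too many columns ({len(row)})")
        # Pad rows whose trailing empty columns were omitted (an assumption).
        row += [""] * (len(DATA_FIELDS) - len(row))
        records.append(dict(zip(DATA_FIELDS, row)))
    return records


def expressions_by_language(records: list[dict[str, str]]) -> dict[str, set[str]]:
    """Map each language code to its set of distinct expressions."""
    result: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        result[rec["lang"]].add(rec["expression"])
    return dict(result)
```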

data_all_v1.0.json, data_all_v2.0.json

{
    "language code": {
        "concept ID": {
            "concept_jpn": "concept (Japanese)",
            "concept_eng": "concept (English)",
            "category": "semantic category",
            "expression": [
                {
                    "form": "expression",
                    "trans_jpn": "Japanese translation of expression",
                    "trans_eng": "English translation of expression",
                    "function": {
                        "1st": {
                            "judgement": "judgement",
                            "example": [
                                {
                                    "form": "example",
                                    "trans_jpn": "Japanese translation of example",
                                    "trans_eng": "English translation of example",
                                    "source": "source",
                                    "notes": "notes"
                                }
                            ]
                        },
                        "2nd": {
                            "judgement": "judgement",
                            "example": [
                                {
                                    "form": "example",
                                    "trans_jpn": "Japanese translation of example",
                                    "trans_eng": "English translation of example",
                                    "source": "source",
                                    "notes": "notes"
                                }
                            ]
                        },
                        "address": {
                            "judgement": "judgement",
                            "example": [
                                {
                                    "form": "example",
                                    "trans_jpn": "Japanese translation of example",
                                    "trans_eng": "English translation of example",
                                    "source": "source",
                                    "notes": "notes"
                                }
                            ]
                        },
                        "title": {
                            "judgement": "judgement",
                            "example": [
                                {
                                    "form": "example",
                                    "trans_jpn": "Japanese translation of example",
                                    "trans_eng": "English translation of example",
                                    "source": "source",
                                    "notes": "notes"
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
}
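The JSON versions nest each example several levels deep, so flattening them is often convenient. This sketch assumes the structure shown above: language code, then concept ID, then an `expression` list whose items carry a `function` map keyed by `1st`, `2nd`, `address` and `title`. The function names and the tuple layout are hypothetical.

```python
import json
from typing import Any, Iterator

# (language, concept ID, expression, function, judgement, example)
Row = tuple[str, str, str, str, str, str]


def load(path: str) -> dict[str, Any]:
    """Read a data_all_v*.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_examples(data: dict[str, Any]) -> Iterator[Row]:
    """Flatten the nested structure into one tuple per example.

    A function entry with no examples still yields one tuple, with an empty
    example string, so that its judgement is not lost.
    """
    for lang, concepts in data.items():
        for concept_id, concept in concepts.items():
            for expr in concept.get("expression", []):
                for function, info in expr.get("function", {}).items():
                    for ex in info.get("example") or [{}]:
                        yield (lang, concept_id, expr.get("form", ""), function,
                               info.get("judgement", ""), ex.get("form", ""))
```

Each tuple corresponds roughly to one row of the TSV version, which makes the two formats easy to compare.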

full_data.json

{
    "id": "ID",
    "language": "language code",
    "form": "form",
    "romanization": "romanization of the form",
    "trans_jpn": "Japanese translation of the form",
    "trans_eng": "English translation of the form",
    "wordnet_id": "WordNet synset ID",
    "wordnet_note": "notes on the WordNet information",
    "component_elements": "component elements of the form",
    "related_word": "word related to the form",
    "related_word_relation": "relation of the related word to the form",
    "1st": [
        {
            "example": "example of the 1st person function use",
            "trans_jpn": "Japanese translation of the example",
            "trans_eng": "English translation of the example",
            "source": "source of the example",
            "notes": "notes on the example"
        }
    ],
    "2nd": [
        {
            "example": "example of the 2nd person function use",
            "trans_jpn": "Japanese translation of the example",
            "trans_eng": "English translation of the example",
            "source": "source of the example",
            "notes": "notes on the example"
        }
    ],
    "address": [
        {
            "example": "example of the address term function use",
            "trans_jpn": "Japanese translation of the example",
            "trans_eng": "English translation of the example",
            "source": "source of the example",
            "notes": "notes on the example"
        }
    ],
    "title": [
        {
            "example": "example of the honorific title function use",
            "trans_jpn": "Japanese translation of the example",
            "trans_eng": "English translation of the example",
            "source": "source of the example",
            "notes": "notes on the example"
        }
    ],
    "loan_word": "whether or not the form is a loan word (true/false/null)",
    "formal_features": {
        "common_noun": "whether or not the form is a common noun (true/false/null)",
        "proper_noun_real_name": "whether or not the form is a proper name that is a real name (true/false/null)",
        "proper_noun_alias": "whether or not the form is a proper name that is an alias (true/false/null)",
        "demontrative": "whether or not the form is a demonstrative (true/false/null)",
        "locative_pronoun": "whether or not the form is a locative pronoun (true/false/null)",
        "numeral": "whether or not the form is a numeral (true/false/null)",
        "classifier": "whether or not the form is a numeral classifier (true/false/null)",
        "quantifier": "whether or not the form is a quantifier (true/false/null)",
        "anaphor": "whether or not the form is an anaphor, i.e. a reflexive or reciprocal (true/false/null)",
        "personal_pronoun": "whether or not the form is a personal pronoun (true/false/null)",
        "bound_morpheme": "whether or not the form is a bound morpheme (true/false/null)",
        "endearment": "whether or not the form is an expression conveying endearment (true/false/null)",
        "other": "whether or not the form belongs to none of the categories above (true/false/null)",
        "notes": "notes on the formal features of the form"
    },
    "meaning": {
        "gender": "gender",
        "marital_status": "marital status",
        "honour": "presence and kind of honour",
        "age": "age",
        "social_status": "social status",
        "role": "role",
        "group": "group",
        "formality": "formality",
        "intimacy": "intimacy",
        "number": "number"
    },
    "meaning_note": "notes on the meaning features",
    "speaker": "restriction on who uses the form",
    "memo": "notes for the item as a whole",
    "creator": "creator of the item",
    "createdAt": "date and time of item creation",
    "editor": "editor of the item",
    "updatedAt": "date and time of the last update"
}
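A sketch of querying full_data.json. It assumes the file holds a JSON array of objects shaped like the one above. It reads the three-valued formal features so that `null` (unspecified) stays distinct from `false`. It also assumes that a non-empty `1st`, `2nd`, `address` or `title` list means the form is attested in that function. All function names are hypothetical.

```python
import json
from typing import Any, Optional


def load_items(path: str) -> list[dict[str, Any]]:
    """Load full_data.json (assumed to be a JSON array of item objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def formal_feature(item: dict[str, Any], feature: str) -> Optional[bool]:
    """Return True or False, or None when the feature is null or missing."""
    features = item.get("formal_features") or {}
    return features.get(feature)


def has_function(item: dict[str, Any], function: str) -> bool:
    """True if the item lists at least one example for `function`
    ("1st", "2nd", "address" or "title")."""
    return bool(item.get(function))


def forms_where(items: list[dict[str, Any]], language: str, feature: str) -> list[str]:
    """Forms in `language` whose formal feature `feature` is explicitly true."""
    return [
        item["form"]
        for item in items
        if item.get("language") == language and formal_feature(item, feature) is True
    ]
```

For example, `forms_where(load_items("full_data.json"), code, "common_noun")`, with `code` an ISO 639-3 language code, lists the common-noun forms for one language.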

The values of the meaning features are listed in the table below. The absence of a feature means that the feature is unspecified or that the creator did not provide the information.

Feature | Values
Gender | male, female
Marital_status | married, unmarried
Honour | hon, anti-hon
Age | elder, elder.grandparent, elder.parents_elder_sibling, elder.parent, elder.parents_younger_sibling, elder.sibling, younger, younger.sibling, younger.child, mature, mature.old, mature.middle, youth
Social_status | higher, equal_or_higher, equal, equal_or_lower, lower
Role | teacher, teacher.school, teacher.university, teacher.nonK12, teacher.other, student, student.school, student.university, student.nonK12, student.other, grandparent, grandparent.paternal, grandparent.maternal, parent, child, sibling, parents_sibling, parents_sibling.paternal, parents_sibling.maternal, spouse, titled, titled.head, titled.head.territory, titled.head.territory.central, titled.head.territory.local, titled.head.organization, titled.head.organization.ministry, titled.head.organization.company, titled.head.organization.education, titled.head.organization.other, titled.conferred, titled.other, non_titled, friend, partner, partner.married, partner.unmarried, mate, mate.senior, mate.junior, boss, subordinate, server, server.clerk, server.doctor, server.nurse, server.police, server.driver, server.other, customer, god, leader, clergy, clergy.Buddhism, clergy.Christianity, clergy.other, follower, follower.quasi_clergy, follower.other, royal, royal.king, royal.queen, royal.other, commoner
Group | foreigner, local, local.indigenous, local.immigrant, local.immigrant.Chinese, local.immigrant.Indian
Formality | formal, informal
Intimacy | close, remote
Number | sg, pl, pl.incl, pl.excl
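Several values are dotted, e.g. `elder.grandparent` alongside `elder` and `pl.incl` alongside `pl`, which suggests a hierarchy. The sketch below assumes that a dotted value is a subtype of each of its prefixes. It matches on whole dot-separated segments, so `elder.parent` does not match `elder.parents_elder_sibling`, and `equal` does not match `equal_or_higher`. An absent feature is treated as unspecified, as described above. Because the stored value type is not documented here, the sketch accepts either a single string or a list of strings. It uses the lowercase JSON key names (`age`, `role`, ...), and the function names are hypothetical.

```python
from typing import Any, Optional


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` itself or a dotted subtype of it."""
    return specific == general or specific.startswith(general + ".")


def matches(meaning: Optional[dict[str, Any]], feature: str, value: str) -> Optional[bool]:
    """Test one meaning feature against `value`, respecting the dotted hierarchy.

    Returns None when the feature is absent, i.e. unspecified.
    """
    actual = (meaning or {}).get(feature)
    if actual is None:
        return None
    # Accept a single value or a list of values (storage type assumed).
    values = actual if isinstance(actual, list) else [actual]
    return any(subsumes(value, v) for v in values)
```

Returning `None` rather than `False` for an absent feature keeps "unspecified" separate from "does not match" when filtering.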

Codes and tags

Language codes

Languages are identified by their ISO 639-3 codes.

Functions

Judgements