Character Mining

The Character Mining project challenges machine comprehension on multiparty dialogue. Its objective is to infer explicit and implicit contexts about individual characters through their conversations. This is an open-source project led by the Emory NLP research group that provides resources for several tasks, each with its own task page.

We welcome feedback and contributions from the community. Most of our annotations are crowdsourced, so errors are to be expected. Please open a pull request if you wish to fix errors in our datasets.

Dataset

Our dataset is based on the popular TV show Friends. We provide transcripts for all 10 seasons of the show, along with manual and crowdsourced annotations for subparts of the show. All text data is available in JSON files; please visit the individual task pages to retrieve datasets specifically designed for those tasks.

Statistics

The data is organized hierarchically: each season consists of episodes, each episode is divided into scenes, each scene comprises utterances, and each utterance is a list of sentences that are split into tokens.

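| Season ID | Episodes | Scenes | Utterances | Sentences | Tokens | Speakers |
|-----------|----------|--------|------------|-----------|--------|----------|
| s01 | 24 | 326 | 5,968 | 10,790 | 81,453 | 107 |
| s02 | 24 | 293 | 5,747 | 9,337 | 81,910 | 107 |
| s03 | 25 | 348 | 6,495 | 10,858 | 90,753 | 108 |
| s04 | 24 | 338 | 6,318 | 10,889 | 87,289 | 100 |
| s05 | 24 | 311 | 6,220 | 11,133 | 83,907 | 107 |
| s06 | 25 | 350 | 6,458 | 11,496 | 90,384 | 112 |
| s07 | 24 | 332 | 6,314 | 11,340 | 84,974 | 94 |
| s08 | 24 | 288 | 6,220 | 11,714 | 86,164 | 107 |
| s09 | 24 | 302 | 6,322 | 11,831 | 93,773 | 99 |
| s10 | 18 | 219 | 5,247 | 9,345 | 69,493 | 78 |
| Total | 236 | 3,107 | 61,309 | 108,733 | 850,100 | 700 |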
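As a rough illustration of the hierarchy described above, the following sketch walks a nested season object and tallies the same kinds of counts reported in the table. The key names (`episodes`, `scenes`, `utterances`, `tokens`, `speakers`) and the toy data are assumptions made for this example only; check the actual JSON files for the real field names before relying on them.

```python
from collections import Counter

# Toy season in the nested shape described above. The key names and the
# content are hypothetical, chosen only to illustrate the traversal.
toy_season = {
    "episodes": [
        {
            "scenes": [
                {
                    "utterances": [
                        {
                            "speakers": ["A"],
                            # One inner list per sentence, one string per token.
                            "tokens": [["Hello", "."], ["How", "are", "you", "?"]],
                        },
                        {"speakers": ["B"], "tokens": [["Fine", "."]]},
                    ]
                },
                {
                    "utterances": [
                        {"speakers": ["A"], "tokens": [["Bye", "."]]},
                    ]
                },
            ]
        }
    ]
}


def count_season(season: dict) -> Counter:
    """Count episodes, scenes, utterances, sentences, tokens, and distinct speakers."""
    counts = Counter()
    speakers = set()
    for episode in season["episodes"]:
        counts["episodes"] += 1
        for scene in episode["scenes"]:
            counts["scenes"] += 1
            for utterance in scene["utterances"]:
                counts["utterances"] += 1
                speakers.update(utterance["speakers"])
                for sentence in utterance["tokens"]:
                    counts["sentences"] += 1
                    counts["tokens"] += len(sentence)
    # Speakers are counted as distinct names, not as occurrences.
    counts["speakers"] = len(speakers)
    return counts


toy_counts = count_season(toy_season)
print(dict(toy_counts))
```

Counting speakers as a set of distinct names is a design choice of this sketch. It would be consistent with the Total row above, where 700 speakers is fewer than the sum of the per-season counts, presumably because many characters recur across seasons.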

Some utterances include action notes. In the following example, extracted from s01_e01_c01_u028, the speaker is talking to Ross, which is indicated by the action note:

"transcript": "Let me get you some coffee.",
"transcript_with_note": "(to Ross) Let me get you some coffee.",
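The sketch below shows one way to work with such an utterance: splitting its ID into components and choosing between the two transcript fields. The ID layout (season, episode, scene, utterance) is inferred from the single example `s01_e01_c01_u028`, and the `utterance_id` key and both helper functions are hypothetical names introduced for illustration.

```python
import re

# Assumed ID layout, inferred from the example "s01_e01_c01_u028":
# s = season, e = episode, c = scene, u = utterance.
ID_PATTERN = re.compile(r"^s(\d+)_e(\d+)_c(\d+)_u(\d+)$")


def parse_utterance_id(utterance_id: str) -> dict:
    """Split an utterance ID into its numeric components."""
    match = ID_PATTERN.match(utterance_id)
    if match is None:
        raise ValueError(f"unexpected utterance ID format: {utterance_id!r}")
    season, episode, scene, utterance = (int(group) for group in match.groups())
    return {"season": season, "episode": episode, "scene": scene, "utterance": utterance}


def get_text(utterance: dict, with_notes: bool = False) -> str:
    """Return the transcript, optionally including action notes."""
    if with_notes:
        # Assumption: fall back to the plain transcript if no note variant is present.
        return utterance.get("transcript_with_note") or utterance["transcript"]
    return utterance["transcript"]


example = {
    "utterance_id": "s01_e01_c01_u028",  # hypothetical key name
    "transcript": "Let me get you some coffee.",
    "transcript_with_note": "(to Ross) Let me get you some coffee.",
}

print(parse_utterance_id(example["utterance_id"]))
print(get_text(example))
print(get_text(example, with_notes=True))
```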

The following table shows the statistics when action notes are included:

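| Season ID | Utterances | Sentences | Tokens |
|-----------|------------|-----------|--------|
| s01 | 6,626 | 12,088 | 100,773 |
| s02 | 6,048 | 10,565 | 97,763 |
| s03 | 7,267 | 12,288 | 117,912 |
| s04 | 7,119 | 12,811 | 116,703 |
| s05 | 7,082 | 13,540 | 118,509 |
| s06 | 7,235 | 13,506 | 120,471 |
| s07 | 7,019 | 13,363 | 116,341 |
| s08 | 6,845 | 13,321 | 109,984 |
| s09 | 6,653 | 13,548 | 119,090 |
| s10 | 5,479 | 11,029 | 93,390 |
| Total | 67,373 | 126,059 | 1,110,936 |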
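To get a feel for how much the action notes contribute, the two Total rows can be compared directly. The short sketch below only does arithmetic on the totals copied from the two tables; reading the difference as "material added by action notes" is an interpretation, not something the tables state explicitly.

```python
# Totals copied from the two statistics tables: (utterances, sentences, tokens).
LABELS = ("utterances", "sentences", "tokens")
TOTAL_WITHOUT_NOTES = (61_309, 108_733, 850_100)
TOTAL_WITH_NOTES = (67_373, 126_059, 1_110_936)

# Difference per column between the two tables.
added_by_notes = {
    label: with_notes - without_notes
    for label, without_notes, with_notes in zip(LABELS, TOTAL_WITHOUT_NOTES, TOTAL_WITH_NOTES)
}

# Fraction of all tokens (notes included) that the difference accounts for.
note_token_share = added_by_notes["tokens"] / TOTAL_WITH_NOTES[2]

for label, amount in added_by_notes.items():
    print(f"{label}: +{amount:,}")
print(f"share of tokens attributable to the difference: {note_token_share:.1%}")
```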

Documentation

References

Contact