AGRR-2019

Results

| Team | binary precision | binary recall | binary f-measure | gap resolution f-measure | full f-measure |
|---|---|---|---|---|---|
| fit_predict | 0.9685157421 | 0.95 | 0.9591685226 | 0.9005941623 | 0.8920508622 |
| EXO | 0.8990318119 | 0.9643916914 | 0.9305654975 | 0.8148691034 | 0.7860819509 |
| Koziev Ilya | 0.7742749054 | 0.9029411765 | 0.8336727766 | 0.6772192685 | 0.6465203883 |
| Derise | 0.8010403121 | 0.9058823529 | 0.8502415459 | 0.6648673209 | 0.6217855431 |
| Meanotek | 0.8909395973 | 0.7808823529 | 0.8322884013 | 0.6353879323 | 0.5144021953 |
| МГУ-DeepPavlov | 0.933901919 | 0.6441176471 | 0.7624020888 | 0.6006856553 | 0.5867789431 |
| Vlad | 0.7780580076 | 0.9154302671 | 0.8411724608 | 0.5739121103 | |
| MorphoBabushka | 0.7626811594 | 0.6191176471 | 0.6834415584 | 0.4658028036 | 0.4404665955 |
| nsu-ai | 0.485380117 | 0.1231454006 | 0.1964497041 | 0.03731610585 | 0.03649377219 |
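The binary f-measure column appears to be the usual harmonic mean of precision and recall, F = 2PR / (P + R). A minimal check against two rows of the table above (the function name `f_measure` is just an illustrative helper):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Binary-task precision, recall and reported f-measure copied from the table.
rows = {
    "fit_predict": (0.9685157421, 0.95, 0.9591685226),
    "EXO": (0.8990318119, 0.9643916914, 0.9305654975),
}
for team, (p, r, reported) in rows.items():
    print(team, round(f_measure(p, r), 10), reported)
```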

Results we obtained after the test data was published:

| Submission | Team | binary precision | binary recall | binary f-measure | gap resolution f-measure | full f-measure |
|---|---|---|---|---|---|---|
| | EXO | 0.9455882353 | 0.9455882353 | 0.9455882353 | 0.8590594659 | 0.8364099229 |
| 1 | МГУ-DeepPavlov | 0.8981612447 | 0.9338235294 | 0.9156452776 | | |
| 2 | МГУ-DeepPavlov | 0.9733924612 | 0.6455882353 | 0.7763041556 | 0.6167931129 | 0.599489203 |
| 3 | МГУ-DeepPavlov | 0.9699398798 | 0.7117647059 | 0.8210347752 | 0.658111512 | 0.6529509963 |
| | Meanotek | 0.8148148148 | 0.9382352941 | 0.8721804511 | 0.7272246172 | 0.6878476803 |

SynTagRus gapping test set

The link to the gapping test set obtained from SynTagRus.

Test data released

We are happy to announce that the test data (test.csv) has been released and uploaded to this repo.

Test data format description

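The test data comprises sentences from different genres: news, fiction, social media, technical texts and other sources. The file is tab-separated: each row holds one sentence followed by seven empty columns, one for each output field (class, cV, cR1, cR2, V, R1, R2).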

Please keep this data format in your submissions, filling the empty columns with class labels and span character offsets (the span columns only if your system participates in the tasks that predict annotations).

One row as it appears in the test data uploaded to GitHub:

Аналогичным образом, среднегодовой прирост ВВП на душу населения, который в странах, расположенных к югу от Сахары, составлял в период с 1965 по 1973 год 3 процента, упал с 1980 до 1986 года на 2,8 процента, в 1987 году - на 4,4 процента и в 1989 году - на 0,5 процента.\t\t\t\t\t\t\t\n

One row as it should look in your submission:

Аналогичным образом, среднегодовой прирост ВВП на душу населения, который в странах, расположенных к югу от Сахары, составлял в период с 1965 по 1973 год 3 процента, упал с 1980 до 1986 года на 2,8 процента, в 1987 году - на 4,4 процента и в 1989 году - на 0,5 процента.\t1\t166:170\t171:190\t191:206\t222:222 254:254\t208:219 240:251\t222:237 254:269\n

Columns containing spans can be skipped (left empty) if your system does not participate in the tasks that predict annotations.
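A minimal Python sketch of turning a test row into a submission row, assuming rows look like the examples above (the sentence, then seven tab characters) and that spans are (start, end) character offsets as described under "Data formats and metrics". The helpers `format_spans` and `make_submission_row` are hypothetical, not official tooling.

```python
# Output columns after the sentence (order from "Data formats and metrics").
COLUMNS = ["class", "cV", "cR1", "cR2", "V", "R1", "R2"]


def format_spans(spans):
    """Render [(start, end), ...] as 'start:end start:end'; [] gives ''."""
    return " ".join(f"{start}:{end}" for start, end in spans)


def make_submission_row(test_line, label, annotations=None):
    """Fill the seven empty columns of one test row (hypothetical helper).

    label is 0 or 1; annotations maps element names such as "cV" to lists of
    (start, end) offsets. Elements that are not given stay as empty cells.
    """
    sentence = test_line.rstrip("\n").split("\t")[0]
    annotations = annotations or {}
    cells = [str(label)] + [format_spans(annotations.get(name, [])) for name in COLUMNS[1:]]
    return "\t".join([sentence] + cells) + "\n"


SENTENCE = (
    "Аналогичным образом, среднегодовой прирост ВВП на душу населения, "
    "который в странах, расположенных к югу от Сахары, составлял в период "
    "с 1965 по 1973 год 3 процента, упал с 1980 до 1986 года на 2,8 процента, "
    "в 1987 году - на 4,4 процента и в 1989 году - на 0,5 процента."
)
test_line = SENTENCE + "\t" * 7 + "\n"  # the row as it appears in test.csv

# Binary task only: all span columns stay empty.
binary_row = make_submission_row(test_line, 1)

# Full annotation, using the offsets of the example submission row.
full_row = make_submission_row(test_line, 1, {
    "cV": [(166, 170)],
    "cR1": [(171, 190)],
    "cR2": [(191, 206)],
    "V": [(222, 222), (254, 254)],
    "R1": [(208, 219), (240, 251)],
    "R2": [(222, 237), (254, 269)],
})
```

Processing the whole file would then be a loop over the lines of test.csv (opened, for instance, with `encoding="utf-8"`, which is an assumption about the file), writing one returned row per input line.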

Submission process description

All contest rules announced previously remain unchanged.

The test data submission deadline is 18:00 February 23rd (GMT+3) (this Saturday).

Please send your team’s submission to dialogueeval2019@gmail.com. Please ensure that your email contains your team’s name and information concerning the tasks (binary presence-absence classification, gap resolution and/or full annotation) and tracks (open track or closed track) you wish to participate in.

Dates and links

| Event | Date |
|---|---|
| Registration due | Jan 25th 2019 |
| Release of the training data | Jan 26th 2019 |
| Release of the test data | Feb 20th 2019 |
| System submissions due | 18:00 February 23rd 2019 (GMT+3) |
| Final results from organizers | Mar 5th 2019 |

AGRR: Automatic Gapping Resolution for Russian

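Gapping is the most common type of ellipsis. It covers sentences such as the example in "Data formats and metrics" below, where the verb упал ("fell") appears once and is omitted from the following clauses, as in "в 1987 году - на 4,4 процента" ("in 1987, by 4.4 percent").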

Motivation

The aim of this task is to tackle a non-trivial linguistic phenomenon, gapping, which occurs in coordinated structures and elides a repeated predicate, typically from the second clause. Besides the difficulty of the construction itself, the phenomenon is naturally rare, which results in a lack of training data. Over the last two years, gapping has received considerable attention (Schuster, Lamm and Manning 2017; Droganova and Zeman 2017; Droganova et al. 2018; Schuster, Nivre and Manning 2018; Nivre et al. 2018). Unfortunately, research so far has mainly relied on small datasets of no more than several hundred sentences. This campaign is a pilot event: the first gapping resolution task for Russian.

Examples (data)

Participants will be provided with a corpus of several thousand examples from texts of different genres, such as news, fiction and science. Each sentence will be annotated as follows: the two remnants R1 and R2, their correlates cR1 and cR2 in the antecedent clause, the position of the elided predicate V, and the head of the corresponding predicate cV.

Task Description

1. Binary presence-absence classification
For every sentence, decide whether it contains a gapping construction.

2. Gap resolution
Predict the position of the elided predicate and the corresponding predicate in the antecedent clause.

3. Full annotation
In the clause with the gap, predict the linear position of the elided predicate and annotate its remnants. In the antecedent clause, find the constituents that correspond to the remnants and the predicate that corresponds to the gap.

Data formats and metrics

Input data consists of sentences without any additional markup (raw texts). For each sentence, the output should contain 7 tab-separated columns. The first column holds 0 or 1, depending on whether the sentence contains a gapping construction. The other cells correspond to the gapping elements (cV, cR1, cR2, V, R1, R2) and hold the character offsets of each element's annotation borders: two numbers separated by a colon (:), where the first character of each sentence has offset 0 and the end offset is exclusive (166:170 covers the four characters of "упал"). If an element occurs more than once, its spans are separated by a space. If the sentence lacks a certain gapping element, the corresponding cell should be left empty. Here is an example:

Input

Аналогичным образом, среднегодовой прирост ВВП на душу населения, который в странах, расположенных к югу от Сахары, составлял в период с 1965 по 1973 год 3 процента, упал с 1980 до 1986 года на 2,8 процента, в 1987 году - на 4,4 процента и в 1989 году - на 0,5 процента.

Output

| class | cV | cR1 | cR2 | V | R1 | R2 |
|---|---|---|---|---|---|---|
| 1 | 166:170 | 171:190 | 191:206 | 222:222 254:254 | 208:219 240:251 | 222:237 254:269 |

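Such output corresponds to the following markup:

cV: упал ("fell")
cR1: с 1980 до 1986 года ("from 1980 to 1986")
cR2: на 2,8 процента ("by 2.8 percent")
V: the gap positions at offsets 222 and 254, right before each R2 remnant
R1: в 1987 году ("in 1987"), в 1989 году ("in 1989")
R2: на 4,4 процента ("by 4.4 percent"), на 0,5 процента ("by 0.5 percent")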
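The offset convention can be checked by parsing the example row back into spans and slicing the sentence with them. A minimal sketch; `parse_spans` and `parse_row` are hypothetical helper names, and the code assumes well-formed rows (a filled class cell and `start:end` pairs separated by single spaces).

```python
# Output columns after the sentence, in the order given above.
COLUMNS = ["class", "cV", "cR1", "cR2", "V", "R1", "R2"]


def parse_spans(cell):
    """Turn '222:237 254:269' into [(222, 237), (254, 269)]; '' gives []."""
    spans = []
    for part in cell.split():
        start, end = part.split(":")
        spans.append((int(start), int(end)))
    return spans


def parse_row(row):
    """Split a submission row into (sentence, label, {element: spans})."""
    sentence, label, *cells = row.rstrip("\n").split("\t")
    spans = {name: parse_spans(cell) for name, cell in zip(COLUMNS[1:], cells)}
    return sentence, int(label), spans


SENTENCE = (
    "Аналогичным образом, среднегодовой прирост ВВП на душу населения, "
    "который в странах, расположенных к югу от Сахары, составлял в период "
    "с 1965 по 1973 год 3 процента, упал с 1980 до 1986 года на 2,8 процента, "
    "в 1987 году - на 4,4 процента и в 1989 году - на 0,5 процента."
)
ROW = SENTENCE + "\t1\t166:170\t171:190\t191:206\t222:222 254:254\t208:219 240:251\t222:237 254:269\n"

sentence, label, spans = parse_row(ROW)
for name, element_spans in spans.items():
    # V spans are zero-width gap positions, so their slices are empty strings.
    print(name, element_spans, [sentence[start:end] for start, end in element_spans])
```

Slicing with Python's end-exclusive `sentence[start:end]` recovers "упал" for cV and "в 1987 году", "в 1989 году" for R1, which matches the markup listed above.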

For the binary presence-absence classification, all output cells except the first one are ignored. For the gap resolution task, the cells in columns cR1, cR2, R1 and R2 are ignored. For the full annotation task, all output cells are evaluated.
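The rules above can be written as a lookup from task to evaluated cells. A minimal sketch; the task keys `"binary"`, `"gap_resolution"` and `"full"` and the function `scored_cells` are hypothetical names used only here.

```python
# Column order of the output cells, as listed in "Data formats and metrics".
COLUMNS = ["class", "cV", "cR1", "cR2", "V", "R1", "R2"]

# Cells evaluated in each task (hypothetical task keys).
EVALUATED = {
    "binary": {"class"},
    "gap_resolution": {"class", "cV", "V"},  # cR1, cR2, R1, R2 are ignored
    "full": set(COLUMNS),
}


def scored_cells(cells: list[str], task: str) -> dict[str, str]:
    """Keep only the output cells that the given task evaluates."""
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return {name: cell for name, cell in zip(COLUMNS, cells) if name in EVALUATED[task]}


cells = "1\t166:170\t171:190\t191:206\t222:222 254:254\t208:219 240:251\t222:237 254:269".split("\t")
print(scored_cells(cells, "gap_resolution"))
# {'class': '1', 'cV': '166:170', 'V': '222:222 254:254'}
```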

The main metric for the binary classification task is the standard f-measure. Gapping element annotations are measured by a symbol-wise (character-level) f-measure. For example, if the gold standard offsets for a certain gapping element are 10:15 and the prediction is 8:14, there are 4 true-positive characters (offsets 10 to 13), 1 false-negative character (offset 14) and 2 false-positive characters (offsets 8 and 9), so the f-measure equals 2·4 / (2·4 + 1 + 2) = 8/11 ≈ 0.727.
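A minimal sketch of the symbol-wise f-measure for a single gapping element, treating spans as end-exclusive character ranges. It reproduces the worked example above; how counts are aggregated across elements and sentences, and how the official scorer handles empty cells and zero-width V positions, are not described here, so the empty-case behavior below is an assumption.

```python
def covered_chars(spans):
    """Character offsets covered by end-exclusive (start, end) spans."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def char_counts(gold_spans, pred_spans):
    """Return (true positives, false positives, false negatives) in characters."""
    gold, pred = covered_chars(gold_spans), covered_chars(pred_spans)
    return len(gold & pred), len(pred - gold), len(gold - pred)


def symbol_f_measure(gold_spans, pred_spans):
    """F = 2TP / (2TP + FP + FN), the harmonic mean of character precision and recall."""
    tp, fp, fn = char_counts(gold_spans, pred_spans)
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # Assumption: neither side covers any character, counted as agreement.
        return 1.0
    return 2 * tp / denominator


# The worked example from the text: gold 10:15, prediction 8:14.
print(char_counts([(10, 15)], [(8, 14)]))  # (4, 2, 1)
print(round(symbol_f_measure([(10, 15)], [(8, 14)]), 3))  # 0.727
```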

AGRR tracks

The following tracks are offered to participants:

1. Closed track – open source track
convenient for research groups and student teams
Participants may train their models only on open-access data (open-source dictionaries, word embeddings, open parsing systems, etc.). To make the results verifiable, participants should publish their code and models on GitHub, so that they are publicly available to both the organizers and the other teams.

2. Open track - no restriction on data and systems used
recommended for industrial participants, representing their products
Participants in this track may use any training data beyond the data provided, as well as their own commercial software. Publishing code on GitHub is not required.

Participants are welcome to submit their models to both tracks, subject to each track's constraints.