CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading

Eye movement recordings of 69 native English speakers and 296 English learners reading Wall Street Journal (WSJ) newswire sentences. Each participant reads 156 sentences: 78 sentences shared across participants and 78 unique to each participant.

Table of Contents

  1. Obtaining the Eye-tracking Data
  2. Statistics
  3. Directory Structure
  4. Additional Documentation
  5. Citation
<a name="obtaining">

Obtaining the Eye-tracking Data

</a>

The eye-tracking data is not directly available due to licensing restrictions on the Penn Treebank (PTB) and BLLIP datasets, from which the reading materials are drawn. To obtain the data together with the underlying texts, follow these instructions (Python 3 is required).

  1. Obtain the PTB-WSJ and BLLIP corpora through LDC.
    • Copy the README file of the PTB-WSJ (starts with "This is the Penn Treebank Project: Release 2 ...") to the folder ptb_bllip_readmes/.
    • Copy the README.1st file of BLLIP (starts with "File: README.1st ...") to the folder ptb_bllip_readmes/.
  2. Run python obtain_data.py. This downloads a zipped data_v2.0/ folder. Extract it to the top level of this directory.
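
Before running the download script, it can help to confirm that both README files are in place. The following is a minimal sketch and not part of the repository: the function name `check_readmes` is hypothetical, the file names `README` and `README.1st` inside `ptb_bllip_readmes/` are assumed to be unchanged from the LDC releases, and the prefix check only mirrors the opening text quoted in the steps above. How `obtain_data.py` itself validates these files is not described here.

```python
from pathlib import Path

# Opening text of each README as quoted in the steps above.
# Assumption: the files keep their original names when copied.
EXPECTED_PREFIXES = {
    "README": "This is the Penn Treebank Project: Release 2",
    "README.1st": "File: README.1st",
}


def check_readmes(folder="ptb_bllip_readmes"):
    """Return a list of problems; an empty list means both files look right."""
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        path = Path(folder) / name
        if not path.is_file():
            problems.append(f"missing: {path}")
            continue
        # Only the opening text matters, so skip leading whitespace
        # and tolerate bytes that do not decode cleanly.
        text = path.read_text(encoding="utf-8", errors="replace").lstrip()
        if not text.startswith(prefix):
            problems.append(f"unexpected content: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_readmes():
        print(problem)
```

If the script prints nothing, both files were found with the expected opening text.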
<a name="statistics">

Statistics (v2.0)

</a>

| | Participants | Sentences | Words |
|---|---|---|---|
| Native | 69 | 5,460 | 61,272 |
| ESL | 296 | 23,166 | 260,888 |
| Total | 365 | 28,548 | 321,260 |

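
The sentence counts are consistent with counting the 78 shared sentences once per row and adding 78 unique sentences per participant. This is an inference from the numbers rather than a statement from the dataset documentation, and the function name below is hypothetical. A short check of that arithmetic:

```python
SHARED = 78             # sentences read by every participant
UNIQUE_PER_PERSON = 78  # sentences read by exactly one participant


def distinct_sentences(participants):
    """Distinct sentences for a group: shared set once, plus each unique set."""
    return SHARED + UNIQUE_PER_PERSON * participants


for group, n in [("Native", 69), ("ESL", 296), ("Total", 365)]:
    print(group, distinct_sentences(n))
# Native 5460
# ESL 23166
# Total 28548
```

Under this reading, the Total row is smaller than the sum of the Native and ESL rows because the shared set is counted only once.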
<a name="files">

Directory Structure

</a>

data_[version]/

SR DataViewer Interest Area and Fixation Reports, and syntactic annotations.

participant_metadata/

splits/

Trial and participant splits.

<a name="docs">

dataset_analyses.Rmd

Analyses for the paper "CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading". Note that this script requires:

Documentation

</a> <a name="cite">

Citation

Paper: CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading

@article{celer2022,
    author = {Berzak, Yevgeni and Nakamura, Chie and Smith, Amelia and Weng, Emily and Katz, Boris and Flynn, Suzanne and Levy, Roger},
    title = "{CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading}",
    journal = {Open Mind},
    pages = {1-10},
    year = {2022},
    month = {04},
    issn = {2470-2986},
    doi = {10.1162/opmi_a_00054},
    url = {https://doi.org/10.1162/opmi\_a\_00054},
    eprint = {https://direct.mit.edu/opmi/article-pdf/doi/10.1162/opmi\_a\_00054/2012324/opmi\_a\_00054.pdf},
}

License

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work, with the exception of the underlying PTB-WSJ and BLLIP texts, is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

</a>