Data Generation

This project includes utilities and scripts for automatic dataset generation. It is used in the following papers:

Usage

To run a sample data generation script, navigate to the data_generation directory and run the following command:

python -m generation_projects.blimp.adjunct_island

If all dependencies are present in your workspace, this generates the adjunct_island dataset from BLiMP. Generation takes about a minute to begin; after that, you can watch its progress in outputs/benchmark/adjunct_island.jsonl.
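As a rough way to check progress, the output file can be read as JSON Lines, with one JSON record per line. The sketch below is not part of the project: the helper name `summarize_jsonl` is hypothetical, and it assumes nothing about the record fields beyond each line being a JSON value. It counts the records written so far and lists the field names it sees.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (number of parsed records, sorted field names seen).

    Hypothetical helper for illustration; not part of this repository.
    """
    count = 0
    fields = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # The final line may be only partially written
                # while generation is still running.
                continue
            count += 1
            if isinstance(record, dict):
                fields.update(record)  # collect the record's keys
    return count, sorted(fields)


# Example call (path taken from the Usage section above):
# n, keys = summarize_jsonl("outputs/benchmark/adjunct_island.jsonl")
# print(f"{n} examples so far; fields: {keys}")
```

Lines that fail to parse are skipped rather than raised, so the helper can be run repeatedly while the generation script is still writing to the file.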

Branches

Project Structure

Vocabulary

Utils

Citation

If you use the data generation project in your work, please cite the BLiMP paper:

@article{warstadt2019blimp,
  title={BLiMP: A Benchmark of Linguistic Minimal Pairs for English},
  author={Warstadt, Alex and Parrish, Alicia and Liu, Haokun and Mohananey, Anhad and Peng, Wei and Wang, Sheng-Fu and Bowman, Samuel R},
  journal={arXiv preprint arXiv:1912.00582},
  year={2019}
}