BECEL: BEnchmark for Consistency Evaluation of Language Models

This is the repository for the paper "BECEL: Benchmark for Consistency Evaluation of Language Models" (COLING 2022).

Directory Description

data: BECEL datasets for 7 downstream tasks. Please refer to the README.md file in each data directory for more information.
- ag_news: includes additive and semantic consistency datasets.
- boolq: includes semantic and negational consistency datasets.
- mrpc: includes semantic, negational, and symmetric consistency datasets.
- rte: includes semantic, negational, and symmetric consistency datasets.
- snli: includes semantic, negational, symmetric, and transitive consistency datasets.
- sst2: includes additive and semantic consistency datasets.
- wic: includes semantic, negational, symmetric, and transitive consistency datasets.
src: Scripts for evaluation metrics and examples.
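Several of the consistency types listed above reduce to comparing a model's prediction on an original input with its prediction on a transformed input. As one illustration, on a symmetric task such as paraphrase detection (mrpc), swapping the two sentences should not change the label. The following is a minimal sketch under that assumption. `agreement_rate` is a hypothetical helper and is not the metric implemented in src/. Refer to the scripts there and to the paper for the actual metric definitions.

```python
from typing import Hashable, Sequence


def agreement_rate(preds_a: Sequence[Hashable], preds_b: Sequence[Hashable]) -> float:
    """Fraction of paired examples whose two predictions are identical.

    Hypothetical helper for illustration only; not part of this repository's src/.
    preds_a[i] and preds_b[i] must be predictions for the i-th original and
    transformed input, respectively.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must be paired one-to-one")
    if not preds_a:
        raise ValueError("need at least one prediction pair")
    # Count pairs where the prediction did not change under the transformation.
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


# Illustrative symmetric check for a paraphrase task (1 = paraphrase, 0 = not):
# predictions on (sentence1, sentence2) vs. the swapped (sentence2, sentence1).
forward = [1, 0, 1, 1]
swapped = [1, 0, 0, 1]
print(agreement_rate(forward, swapped))  # 0.75
```

The same pairing pattern applies to other transformations, for example original vs. paraphrased inputs. The exact scoring rule for each consistency type may differ from simple agreement.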

Citation

@inproceedings{jang-etal-2022-becel,
    title = "{BECEL}: Benchmark for Consistency Evaluation of Language Models",
    author = "Jang, Myeongjun  and
      Kwon, Deuk Sin  and
      Lukasiewicz, Thomas",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.324",
    pages = "3680--3696",
}