FLD Corpus

This repository includes the released FLD corpora.

See the entry-point repository for an overview of the whole FLD project.

Available Corpora

Note that these corpora are version 2.0, which is described in Appendix H of our paper.

How to use the corpora

First, install the datasets library:

pip install datasets

Then, you can load the FLD corpora as follows:

from datasets import load_dataset
FLD = load_dataset('hitachi-nlp/FLD.v2', name='default')
FLD_star = load_dataset('hitachi-nlp/FLD.v2', name='star')
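
Once a corpus is loaded, it can help to check which splits it contains and how large they are. The helper below is a minimal sketch and not part of the FLD tooling: `summarize_splits` is a hypothetical name, and it assumes only that the loaded object behaves like a mapping from split names to sized datasets, as the `DatasetDict` returned by `load_dataset` does.

```python
from typing import Dict, Mapping, Sized


def summarize_splits(corpus: Mapping[str, Sized]) -> Dict[str, int]:
    """Return the number of examples in each split of a loaded corpus.

    Hypothetical helper: it relies only on the corpus being a mapping from
    split names to objects that support len().
    """
    return {split: len(examples) for split, examples in corpus.items()}
```

For instance, `summarize_splits(FLD)` would report the split sizes of the default corpus loaded above. The split names depend on the dataset and are not assumed here.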

What does a dataset example look like?

Concept

A deduction example from our dataset is conceptually illustrated in the figure below:

[Figure: deduction_example]

That is, given a set of facts and a hypothesis, a model must generate a proof sequence and determine an answer marker (proved, disproved, or unknown).
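
The task described above can be pictured as a small record type. This is only an illustrative sketch: the class name `DeductionExample` and its field names are hypothetical and do not reflect the actual corpus schema. Only the three answer markers are taken from the description above.

```python
from dataclasses import dataclass
from typing import List

# The three answer markers named in the task description.
ANSWER_MARKERS = ("proved", "disproved", "unknown")


@dataclass
class DeductionExample:
    """Illustrative container; field names are hypothetical, not the corpus schema."""

    facts: List[str]        # the given set of facts (model input)
    hypothesis: str         # the statement to prove or disprove (model input)
    proof_steps: List[str]  # the proof sequence the model must generate
    answer: str             # the answer marker the model must determine

    def __post_init__(self) -> None:
        # Reject any marker other than the three listed above.
        if self.answer not in ANSWER_MARKERS:
            raise ValueError(
                f"answer must be one of {ANSWER_MARKERS}, got {self.answer!r}"
            )


# Made-up toy content for illustration, not taken from the corpus.
toy = DeductionExample(
    facts=["fact1: A holds.", "fact2: If A holds, then B holds."],
    hypothesis="B holds.",
    proof_steps=["fact1 & fact2 -> B holds."],
    answer="proved",
)
```

Here the facts and the hypothesis are the model's input, while the proof steps and the answer marker are what the model is expected to produce.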

Schema

The actual schema can be viewed on the Hugging Face Hub. The most important fields are:

Additionally, we have preprocessed fields as follows:

To train or evaluate a Language Model (LM), one can take one of two approaches:

Further, we have "logical formula" versions of the fields, such as prompt_serial_formula, which can be used to evaluate LLMs' pure logical reasoning capabilities within the domain of logical formulas, rather than natural language.
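
Switching between the natural-language and logical-formula versions of a field could be done with a small accessor like the one below. This is a hedged sketch: `select_field` is a hypothetical helper, and the base name `prompt_serial` is an assumption inferred by dropping the `_formula` suffix from `prompt_serial_formula`. Check the schema on the hub for the exact field names.

```python
from typing import Any, Mapping

# Suffix seen in the field name prompt_serial_formula mentioned above.
FORMULA_SUFFIX = "_formula"


def select_field(
    example: Mapping[str, Any], base_name: str, use_formula: bool = False
) -> Any:
    """Return a field from one example, optionally in its logical-formula version.

    Hypothetical helper: assumes formula fields are named by appending
    "_formula" to the base field name.
    """
    name = base_name + FORMULA_SUFFIX if use_formula else base_name
    if name not in example:
        raise KeyError(f"field {name!r} not found in example")
    return example[name]
```

Under these assumptions, `select_field(example, "prompt_serial", use_formula=True)` would return the value stored under `prompt_serial_formula`.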