# Complex Claim Verification with Evidence Retrieved in the Wild
## Getting started

Clone the repository and install the requirements:

```bash
pip install -r requirements.txt
```
Download the data to a local directory from Google Drive: https://drive.google.com/file/d/1YMpr5hnJqzrXcp3kBwpVsAK0uv9q_Xf3/view?usp=sharing
## Data format

The data files are in jsonlines format. Each field is described below:
| Field | Type | Description |
|---|---|---|
| example_id | string | Example ID |
| claim | string | Claim |
| label | string | Label: pants-fire, false, barely-true, half-true, mostly-true, true |
| person | string | Person who made the claim |
| venue | string | Date and venue of the claim |
| url | string | PolitiFact URL of the claim |
| justification | List[string] | Justification paragraph written by the fact-checkers |
| qg-output | List[string] | Sub-questions generated by claim decomposition |
| search_results | List[dict] | Bing search results without timestamp |
| search_results_timestamp | List[dict] | Bing search results with timestamp |
| summary | string | Summary generated by synthesizing the results from second-stage retrieval |
| summarization_prompt | string | Prompt used to generate the claim-focused summary |
Each entry in `search_results` is a dictionary with the following fields:
```python
search_results = {
    "entities_info": [
        {
            "name": ...,         # name of the entity included in the search results
            "description": ...,  # description of the entity
        },
        # ...
    ],
    "pages_info": [
        {
            "page_name": ...,       # name of the page included in the search results
            "page_url": ...,        # URL of the page
            "page_timestamp": ...,  # timestamp of the page
            "page_snippet": ...,    # snippet of the page
        },
        # ...
    ],
}
```
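As a quick sanity check, the jsonlines files can be loaded and the retrieved pages inspected with a few lines of Python (a minimal sketch; the helper names are ours, and the file path is just an example following the download step above):

```python
import json

def load_jsonl(path):
    """Read a jsonlines file into a list of dicts, one dict per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def iter_pages(example):
    """Yield (page_name, page_url, page_snippet) for every retrieved page
    in an example's search_results, following the structure shown above."""
    for result in example.get("search_results", []):
        for page in result.get("pages_info", []):
            yield page.get("page_name"), page.get("page_url"), page.get("page_snippet")

# Example usage (path is illustrative):
# examples = load_jsonl("./data/dev-site-restricted.jsonl")
# print(examples[0]["claim"], examples[0]["label"])
# for name, url, snippet in iter_pages(examples[0]):
#     print(name, url)
```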
## Generate sub-questions from the claim

To decompose the claim into a set of sub-questions, run:

```bash
python generate_subquestions.py --input_file ./data/dev-site-restricted.jsonl --output_file OUTPUT_FILE_PATH
```
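Claim decomposition of this kind is typically done by prompting a language model. The sketch below shows one hypothetical way to build such a prompt; the template and helper name are illustrative assumptions, not the repository's actual code (which lives in `generate_subquestions.py`):

```python
def build_decomposition_prompt(claim, num_questions=3):
    """Build a simple prompt asking a model to split a claim into
    sub-questions. The template is illustrative only."""
    return (
        f"Decompose the following claim into {num_questions} "
        f"sub-questions that, taken together, would verify the claim.\n"
        f"Claim: {claim}\n"
        f"Sub-questions:"
    )

# Example usage: send the prompt to your LLM of choice and parse
# one sub-question per output line.
# prompt = build_decomposition_prompt("The U.S. unemployment rate fell in 2021.")
```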
## Use the Bing API to retrieve evidence

To use the Bing API, you first need to register for a Bing API key. Then run the following command to retrieve evidence with Bing:
```bash
python evidence_retrieval.py \
    --input_url data/train.jsonl \
    --output_file OUTPUT_FILE_PATH \
    --use_time_stamp 1 \
    --sites_constrain 1 \
    --use_annotation 0 \
    --use_claim 0 \
    --question_num 10 \
    --answer_count 10 \
    --chunk_size 50 \
    --time_offset 1
```
Check the argument descriptions in `evidence_retrieval.py` for more details.
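The script presumably wraps the Bing Web Search v7 API (endpoint `https://api.bing.microsoft.com/v7.0/search`, authenticated with the `Ocp-Apim-Subscription-Key` header). As a rough sketch of how a raw v7 response maps onto the `pages_info` format shown earlier — the field mapping is an assumption based on the public v7 response schema, not the repository's code:

```python
def parse_bing_response(resp):
    """Convert a Bing Web Search v7 JSON response (a dict) into the
    pages_info shape used in this dataset. Mapping assumed from the
    public schema: webPages.value -> name / url / snippet / dateLastCrawled."""
    pages = resp.get("webPages", {}).get("value", [])
    return [
        {
            "page_name": p.get("name"),
            "page_url": p.get("url"),
            "page_timestamp": p.get("dateLastCrawled"),
            "page_snippet": p.get("snippet"),
        }
        for p in pages
    ]
```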
## Second-stage retrieval and summarization

To generate the claim-focused summary, run:
```bash
python generate_summarization.py \
    --input_path ./data/train-site-restricted.jsonl \
    --corpus_path ./data/corpus/train.json \
    --output_path OUTPUT_FILE_PATH \
    --num 5 \
    --window_size 30 \
    --topk_units 10 \
    --topk_docs 5 \
    --stride 15 \
    --use_claim 0 \
    --use_annotation 0 \
    --use_justification 0 \
    --text_unit span \
    --time_restricted 1 \
    --time_window 1 \
    --summ_type multi_summs \
    --use_ChatGPT 0 \
    --use_fewshot 0 \
    --filter_politifact 0
```
Check the argument descriptions in `generate_summarization.py` for more details.
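The `--window_size 30 --stride 15` arguments suggest that documents are split into overlapping windows of text units before second-stage retrieval ranks the top-k units. A hypothetical sketch of that windowing step (the helper itself is illustrative, not the repository's implementation):

```python
def sliding_windows(tokens, window_size=30, stride=15):
    """Split a token list into overlapping windows (candidate text units).
    With stride < window_size, consecutive windows overlap, so no span
    is cut off at a window boundary."""
    windows = []
    for start in range(0, max(len(tokens) - window_size, 0) + 1, stride):
        windows.append(tokens[start:start + window_size])
    return windows

# Example usage:
# units = sliding_windows("some long document".split(), window_size=30, stride=15)
```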
## Veracity prediction

To predict the veracity of a claim, run:

```bash
bash run_veracity_classification.sh
```
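For reference, the six-way label set from the data format table can be encoded as ordered integer classes for a classifier; the ordering and helper below are a minimal sketch, not necessarily what `run_veracity_classification.sh` uses internally:

```python
# The six PolitiFact veracity labels, ordered from least to most truthful.
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

def encode_label(label):
    """Map a gold label string to an integer class id for classification."""
    return LABEL2ID[label]
```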