The full pipeline for creating the UHGEval hallucination dataset

1. Collect the raw news

2. Preprocess the raw news

3. Generate candidates

4. Automatic labelling

5. Use Label Studio to enable human rechecking

Label Studio is a data labeling and annotation tool that supports multiple data types and produces a standardized output format.

Relevant files can be found in ./label_studio_annotations/.

5.1 Prepare Label Studio Pre-annotations
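
As a rough illustration of this step, the sketch below wraps one automatically labelled item in the task structure Label Studio accepts for pre-annotations (a `data` object plus a `predictions` list). The data field `text`, the control names `label` and `text`, the label value `hallucinated`, and the helper `build_task` are all assumptions made for this example; the real names must match the labeling configuration and files in ./label_studio_annotations/.

```python
import json


def build_task(text, predicted_label, model_version="auto-labeller-v0"):
    """Wrap one item and its automatic label as a Label Studio task.

    Hypothetical helper: "label" and "text" are assumed control names and
    must match the names used in the project's labeling configuration.
    """
    return {
        "data": {"text": text},
        "predictions": [
            {
                "model_version": model_version,
                "result": [
                    {
                        "from_name": "label",  # assumed name of the <Choices> control
                        "to_name": "text",     # assumed name of the <Text> object
                        "type": "choices",
                        "value": {"choices": [predicted_label]},
                    }
                ],
            }
        ],
    }


# Illustrative input only; not taken from the dataset.
tasks = [build_task("Example continuation.", "hallucinated")]

# Keep non-ASCII characters readable, since the source news may not be in English.
serialized = json.dumps(tasks, ensure_ascii=False, indent=2)
```

The serialized list can then be written to a JSON file and imported into a Label Studio project, where the predictions appear as pre-filled labels for annotators to confirm or correct.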

5.2 Set up the labeling configuration and begin human rechecking

5.3 Export Label Studio JSON annotations
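
A minimal sketch of reading such an export, assuming a choice-based labeling setup: Label Studio's JSON export is a list of tasks, each carrying its `data` and a list of `annotations` whose `result` entries hold the human labels. The helper `extract_choices`, the data field `text`, and the sample record are hypothetical and only illustrate the shape of the file.

```python
import json


def extract_choices(exported_tasks):
    """Collect human-checked choice labels from a Label Studio JSON export.

    Hypothetical helper: assumes a choices-type control and a "text" data field.
    """
    rows = []
    for task in exported_tasks:
        for annotation in task.get("annotations", []):
            # Skip annotations that the annotator cancelled (skipped tasks).
            if annotation.get("was_cancelled"):
                continue
            for item in annotation.get("result", []):
                if item.get("type") == "choices":
                    rows.append(
                        {
                            "task_id": task.get("id"),
                            "text": task.get("data", {}).get("text"),
                            "choices": item["value"]["choices"],
                        }
                    )
    return rows


# Illustrative record only; a real export contains more fields per task.
sample_export = json.loads(
    """
    [
      {
        "id": 1,
        "data": {"text": "Example continuation."},
        "annotations": [
          {
            "was_cancelled": false,
            "result": [
              {
                "from_name": "label",
                "to_name": "text",
                "type": "choices",
                "value": {"choices": ["hallucinated"]}
              }
            ]
          }
        ]
      }
    ]
    """
)

rows = extract_choices(sample_export)
```

Flattening the export this way gives one row per human decision, which is a convenient intermediate form for merging with the automatic labels in the final step.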

6. Get the final hallucination dataset