RAG4RE

Python 3.10.9

This repository contains the source code for the journal paper "Retrieval-Augmented Generation-based Relation Extraction", which has been submitted to the Semantic Web Journal (SWJ).

Note: This project's paper is still under review at the SWJ!

To cite its preprint:

@misc{efeoglu2024retrievalaugmented,
      title={Retrieval-Augmented Generation-based Relation Extraction}, 
      author={Sefika Efeoglu and Adrian Paschke},
      year={2024},
      eprint={2404.13397},
      archivePrefix={arXiv}
}

Please use the settings in this branch. No sampling is applied to the T5 predictions. Please use the original TACRED datasets from the LDC.

Hardware: NVIDIA GeForce GTX 1080 Ti (4 GPUs x 12 GB, cpu = 300 GB).

Note that TACRED is licensed by the Linguistic Data Consortium (LDC), so we cannot directly publish the prompts or the raw results of the experiments conducted with Llama and Mistral, because the responses of these models include the prompts in their instruction parts. However, we have published the returned results from the runs in which Llama and Mistral were integrated. The data can be obtained from the LDC upon official request, and the experiments can then be replicated by following the instructions provided.

Project Folder Hierarchy

.
├── LICENSE
├── README.md
├── data                            ---> datasets, such as TACRED, TACREV, Re-TACRED and SemEval
├── results                         ---> results will be saved here.
└── src
    ├── config.ini                  ---> configuration for the dataset, approach, LLM and results.
    ├── data_preparation
    ├── main.py                     ---> entry point; the pipeline starts here
    ├── retrieval                   ---> retrieval module
    │   ├── refinement.py
    │   └── retriever.py
    ├── data_augmentation           ---> regenerates the user query
    │   ├── embeddings
    │   └── prompt_generation
    ├── generation_module           ---> llm prompting.
    │   └── generation.py
    ├── evaluation                  ---> evaluate and visualize results. 
    │   ├── results_analysis.py
    │   └── vizualization.py
    └── utils.py                    
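
The module layout above (retrieval, data augmentation, generation) suggests a retrieve-then-prompt flow. The following is a minimal sketch of that idea, not the repository's actual code: it assumes the retrieval step picks the training sentence whose embedding is most similar to the query, and that this sentence is added to the prompt. The function names, the toy sentences, the 3-dimensional vectors and the prompt template are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def retrieve_most_similar(query_vec, train_vecs):
    """Return the index of the training vector closest to the query."""
    scores = [cosine_similarity(query_vec, v) for v in train_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def build_prompt(query_sentence, example_sentence, relation_types):
    """Combine the query with the retrieved example (hypothetical template)."""
    return (
        f"Example sentence: {example_sentence}\n"
        f"Query sentence: {query_sentence}\n"
        f"Choose one relation type from: {', '.join(relation_types)}"
    )


# Toy data: hypothetical sentences and embeddings, not taken from any dataset.
train_sentences = ["Alice works for Acme.", "Paris is in France."]
train_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
query_sentence = "Bob is employed by Initech."
query_vec = [0.8, 0.2, 0.1]

best = retrieve_most_similar(query_vec, train_vecs)
prompt = build_prompt(
    query_sentence, train_sentences[best], ["per:employee_of", "no_relation"]
)
print(prompt)
```

In a real run the vectors would come from a sentence-embedding model and the prompt would be passed to the configured LLM; both steps are left out here to keep the sketch self-contained.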

How to run

Update the paths and settings in config.ini for your experiment.

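    # install dependencies
    pip install -r requirements.txt
    # compute sentence embeddings and similarities
    cd src/data_augmentation/embeddings
    python sentence_embeddings.py
    python sentence_sim.py
    # return to the repository root and start the pipeline
    cd ../../..
    python src/main.py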
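
Since the run depends on the values in config.ini, it can help to check how such a file is read before starting the pipeline. The sketch below uses Python's standard configparser module. The section names (`SETTINGS`, `PATH`), the keys and the values are assumptions made for illustration; the actual config.ini in src may be organized differently.

```python
import configparser

# Hypothetical contents; the real src/config.ini may use other sections and keys.
EXAMPLE_CONFIG = """
[SETTINGS]
dataset = tacred
llm = t5

[PATH]
data_dir = ../data
results_dir = ../results
"""


def load_settings(text):
    """Parse INI-formatted text and return the values as a flat dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "dataset": parser.get("SETTINGS", "dataset"),
        "llm": parser.get("SETTINGS", "llm"),
        "data_dir": parser.get("PATH", "data_dir"),
        "results_dir": parser.get("PATH", "results_dir"),
    }


settings = load_settings(EXAMPLE_CONFIG)
print(settings)
```

To read the file from disk instead of a string, `parser.read("src/config.ini")` can replace the `read_string` call, with the path adjusted to the working directory.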