Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models

This is the GitHub repository for the paper to appear at the NAACL 2024 main conference: Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models.

Introduction

This work pushes the performance boundary of zero-shot named entity recognition (NER) with large language models (LLMs) by proposing a training-free self-improving framework, which uses an unlabeled corpus to stimulate the self-learning ability of LLMs.

The pipeline of the proposed framework has three steps, matching the scripts in the Run section: (1) self-annotating the unlabeled corpus with self-consistency (SC) and two-stage majority voting, (2) reliable annotation selection by entity-level or sample-level threshold filtering, and (3) inference on the test set using diverse nearest retrieval with SC ranking.

Please find more details of this work in our paper.

Directory

Usage

Requirements

We run our code on Windows with the following dependencies: Python 3.8, openai 0.27.4, PyTorch 2.0.1, pandas, and hanlp.

Datasets

We provide the processed WikiGold dataset in this repository for a quick start, along with its generated prompts, self-annotation results, and inference results.

In data folder of wikigold:

We also provide the other processed datasets used in our paper on Google Drive, except ACE05 and OntoNotes 4, which are excluded for copyright reasons. Download and unzip the dataset files, then put them in the data folder.

Generate embeddings

Run the following command to generate embeddings with the OpenAI API.

python code/standard/GenerateEmbsGPT.py --dataname wikigold --datamode test --emb_model text-embedding-ada-002 --emb_encoding cl100k_base
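The embeddings are used later to retrieve demonstrations for the inference step ("diverse nearest with SC ranking"). The snippet below is a minimal sketch of that idea, assuming the procedure is: take the K nearest self-annotated samples by cosine similarity, then keep the k samples with the highest SC score. The names `cosine`, `select_demos`, and the `(sample_id, embedding, sc_score)` tuple layout are hypothetical and are not taken from the repository code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_demos(query_emb, pool, K, k):
    """Pick k demonstrations for one test sentence.

    pool: list of (sample_id, embedding, sc_score) tuples (hypothetical layout).
    Step 1: keep the K pool samples most similar to the query embedding.
    Step 2: among those, keep the k samples with the highest SC score.
    """
    nearest = sorted(pool, key=lambda p: cosine(query_emb, p[1]), reverse=True)[:K]
    ranked = sorted(nearest, key=lambda p: p[2], reverse=True)[:k]
    return [p[0] for p in ranked]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real ada-002 embeddings are much longer.
    pool = [
        ("a", [1.0, 0.0], 0.9),
        ("b", [0.9, 0.1], 0.5),
        ("c", [0.0, 1.0], 1.0),
        ("d", [0.8, 0.2], 0.7),
    ]
    print(select_demos([1.0, 0.0], pool, K=3, k=2))
```

Retrieving a larger neighborhood first and then ranking by SC score trades a little similarity for more trustworthy demonstrations; the actual values of K and k are set through the script arguments.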

Run

We provide shell scripts in the scripts folder for a quick start. The generated prompts will be saved to the prompts folder. The responses from the LLM and the evaluation results will be saved to the result folder.

Run the following commands in the provided order to use our methods. Before you run on ChatGPT, please set your OpenAI API Keys in my_openai_api_keys in const.py.

# --- No-demos ---
sh scripts/wikigold_no_demos.sh

# --- Self-improving ---
# Self-annotating and two-stage majority voting
sh scripts/wikigold_1_self_annotate_TSMV.sh
# Entity-level threshold filtering
sh scripts/wikigold_2_entity_level_sel.sh
# Sample-level threshold filtering
sh scripts/wikigold_2_sample_level_sel.sh
# Inference with Diverse nearest with SC ranking
sh scripts/wikigold_3_test_inference.sh
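The self-annotation and selection steps above can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the code the scripts run: we assume each sampled LLM response is parsed into a `{mention: type}` dict, that a mention is kept when more than half of the responses predict it, that its type is the most-voted type, and that the entity-level SC score is the winning vote count divided by the number of responses. All function names are hypothetical.

```python
from collections import Counter


def two_stage_majority_vote(sampled_answers):
    """sampled_answers: list of {mention: type} dicts, one per sampled response.

    Returns {mention: (type, entity_level_sc_score)}.
    """
    n = len(sampled_answers)
    # Stage 1: vote on mentions; keep those predicted by more than half of the responses.
    mention_votes = Counter(m for ans in sampled_answers for m in ans)
    kept = [m for m, votes in mention_votes.items() if votes > n / 2]
    # Stage 2: for each kept mention, vote on its type.
    voted = {}
    for m in kept:
        type_votes = Counter(ans[m] for ans in sampled_answers if m in ans)
        label, votes = type_votes.most_common(1)[0]
        voted[m] = (label, votes / n)  # assumed definition of the SC score
    return voted


def entity_level_filter(voted, threshold):
    """Keep only entities whose SC score reaches the threshold."""
    return {m: ts for m, ts in voted.items() if ts[1] >= threshold}


def sample_level_score(voted):
    """Sample-level score, assumed here to be the mean entity-level SC score."""
    if not voted:
        return 0.0
    return sum(score for _, score in voted.values()) / len(voted)


if __name__ == "__main__":
    answers = [
        {"Paris": "LOC", "Hilton": "PER"},
        {"Paris": "LOC", "Hilton": "ORG"},
        {"Paris": "LOC"},
        {"Paris": "PER", "Hilton": "ORG"},
        {"Paris": "LOC"},
    ]
    voted = two_stage_majority_vote(answers)
    print(voted)
    print(entity_level_filter(voted, threshold=0.5))
    print(sample_level_score(voted))
```

Sample-level filtering would then keep a whole sentence only when `sample_level_score` reaches its threshold, while entity-level filtering drops individual low-confidence entities and keeps the rest of the sentence.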

Arguments

Self-consistency (SC)

Reliable annotation selection

Inference

Citation

@misc{xie2023selfimproving,
      title={Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models}, 
      author={Tingyu Xie and Qi Li and Yan Zhang and Zuozhu Liu and Hongwei Wang},
      year={2023},
      eprint={2311.08921},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}