Impact of Sample Selection on In-Context Learning for Entity Extraction from Scientific Writing

This repository provides the system used in our work on in-context learning (ICL) sample selection methods for the scientific entity extraction task.

Installation

Install the required libraries:

pip install easyinstruct -i https://pypi.org/simple
pip install --upgrade openai
pip install transformers
pip install datasets

Datasets

We use five scientific entity extraction datasets: ADE, MeasEval, SciERC, STEM-ECR, and WLPC.

Results

Main Results

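| Method | ADE | MeasEval | SciERC | STEM-ECR | WLPC |
|---|---|---|---|---|---|
| Baseline Models | | | | | |
| RoBERTa | 90.42 | 56.68 | 68.52 | 69.70 | 28.36 |
| Zero-shot | 71.29 | 19.65 | 17.86 | 28.89 | 31.64 |
| Random | 74.56 | 22.49 | 29.27 | 26.85 | 32.20 |
| In-context sample selecting methods | | | | | |
| KATE | 83.11 | 22.75 | 29.97 | 30.78 | 45.02 |
| Perplexity | 79.13 | 21.43 | 31.31 | 26.57 | 30.46 |
| BM25 | 77.28 | 24.72 | 35.96 | 25.61 | 44.14 |
| Influence | 86.35 | 27.13 | 36.47 | 27.81 | 45.41 |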

Low-Resource Scenario

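| Method | ADE | MeasEval | SciERC | STEM-ECR | WLPC |
|---|---|---|---|---|---|
| RoBERTa full | 90.42 | 56.68 | 68.52 | 69.70 | 28.36 |
| Baseline Models | | | | | |
| RoBERTa 1% | 14.32 | 19.20 | 10.16 | 15.42 | 10.37 |
| Zero-shot | 71.29 | 19.65 | 17.86 | 28.89 | 31.64 |
| Random 1% | 66.53 | 21.32 | 25.31 | 21.38 | 28.46 |
| In-context sample selecting methods | | | | | |
| KATE 1% | 69.06 | 24.48 | 26.78 | 26.49 | 28.97 |
| Perplexity | 68.83 | 22.23 | 26.42 | 25.48 | 26.05 |
| BM25 1% | 72.66 | 23.39 | 31.33 | 24.24 | 36.73 |
| Influence 1% | 73.68 | 24.21 | 32.49 | 25.01 | 34.24 |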

Running Experiments

Sample Selection

python icl_sample.py \
    --data \
    --metric \
    --embed \
    --model \
    --trained \
    --reversed \
    --train_file \
    --test_file
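
For intuition about what a lexical selection method such as BM25 does, here is a minimal, self-contained sketch that ranks training sentences against a test sentence and keeps the top k as in-context demonstrations. It is not the code in icl_sample.py: the function names, the whitespace tokenizer, and the parameter defaults (k1=1.5, b=0.75) are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def bm25_select(query, train_sentences, k=2, k1=1.5, b=0.75):
    """Return indices of the k training sentences with the highest BM25 score."""
    docs = [tokenize(s) for s in train_sentences]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for d in docs for term in set(d))

    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)

    # Highest score first; ties keep the original training order.
    ranked = sorted(range(n_docs), key=lambda i: -scores[i])
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy training pool, not taken from any of the datasets.
    train = [
        "The patient developed a rash after taking penicillin .",
        "We measured a temperature of 300 K in the chamber .",
        "Aspirin caused severe stomach bleeding in the patient .",
    ]
    print(bm25_select("The patient reported bleeding after aspirin .", train, k=2))
```

The returned indices point into the training pool, so the corresponding sentences and their annotations can be placed in the prompt ahead of the test sentence.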

Evaluation

python icl_evaluate.py \
    --data \
    --metric \
    --icl_file_name \
    --model \
    --train_file \
    --test_file
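
This README does not state which metric the reported scores use. Assuming an entity-level micro F1, a common choice for entity extraction, scoring could be sketched as below. The function name and the representation of entities as (text, type) pairs are hypothetical, and icl_evaluate.py may compute its scores differently.

```python
def micro_f1(gold, pred):
    """Micro-averaged precision, recall, and F1 over per-sentence entity sets.

    gold and pred are parallel lists; each item is an iterable of
    (entity_text, entity_type) pairs for one sentence (an assumed format).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # predicted and correct
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy annotations for two sentences.
    gold = [[("aspirin", "Drug"), ("bleeding", "Effect")], [("rash", "Effect")]]
    pred = [[("aspirin", "Drug")], [("rash", "Effect"), ("penicillin", "Drug")]]
    print(micro_f1(gold, pred))
```

Exact matching on both the entity text and its type is the strictest variant; partial-match scoring would need a different comparison.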