BERT for Keyphrase Extraction (PyTorch)

This repository provides the code for the paper Capturing Global Informativeness in Open Domain Keyphrase Extraction.

In this paper, we conduct an empirical study of <u>5 keyphrase extraction models</u> with <u>3 BERT variants</u>, and then propose a multi-task model, BERT-JointKPE. Experiments on two KPE benchmarks, OpenKP (Bing web pages) and KP20k (scientific papers), demonstrate JointKPE's state-of-the-art and robust effectiveness. Our further analyses also show that JointKPE has advantages in predicting <u>long keyphrases</u> and <u>non-entity keyphrases</u>, which were challenging for previous KPE techniques.
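
For intuition, here is a minimal sketch of how the two training objectives can be combined. It is written against the paper's high-level description, not this repository's API: `chunk_logits`, `rank_scores`, `pos_idx`, and `neg_idx` are illustrative names, and the pairwise hinge loss is one common choice for the ranking term; see the paper for the exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(chunk_logits, chunk_labels, rank_scores, pos_idx, neg_idx, margin=1.0):
    """Illustrative combination of the two JointKPE objectives.

    chunk_logits: (num_candidates, 2) binary quality logits per candidate phrase
    chunk_labels: (num_candidates,)   1 if the candidate is a keyphrase, else 0
    rank_scores:  (num_candidates,)   salience score per candidate phrase
    pos_idx / neg_idx: index tensors pairing keyphrase and non-keyphrase candidates
    """
    # Chunking task: binary classification of candidate phrase quality.
    loss_chunk = F.cross_entropy(chunk_logits, chunk_labels)

    # Ranking task: pairwise hinge loss pushing keyphrases above non-keyphrases.
    loss_rank = F.margin_ranking_loss(
        rank_scores[pos_idx], rank_scores[neg_idx],
        target=torch.ones(len(pos_idx)), margin=margin)

    # Both tasks share one BERT encoder and are optimized jointly.
    return loss_chunk + loss_rank
```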

Please cite our paper if the experimental results, analysis conclusions, or code are helpful to you ~ 😊

@article{sun2020joint,
    title={Joint Keyphrase Chunking and Salience Ranking with BERT},
    author={Sun, Si and Liu, Zhenghao and Xiong, Chenyan and Liu, Zhiyuan and Bao, Jie},
    year={2020},
    eprint={2004.13639},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

CONTACT

For any questions, feel free to open an issue, and we will do our best to help.
If the problem is urgent, you can also email me directly (I check email almost every day 😉).

NAME: Si Sun
EMAIL: s-sun17@mails.tsinghua.edu.cn

🤠 What's New?

Supported Model Classes

| Index | Model | Description |
| ----- | ----- | ----------- |
| 1 | BERT-JointKPE (Bert2Joint) | A <u>multi-task</u> model trained jointly on the chunking task and the ranking task, balancing the estimation of keyphrase quality and salience (see the sketch below this table). |
| 2 | BERT-RankKPE (Bert2Rank) | Learns salient phrases in the document using a <u>ranking</u> network. |
| 3 | BERT-ChunkKPE (Bert2Chunk) | Classifies high-quality keyphrases using a <u>chunking</u> network. |
| 4 | BERT-TagKPE (Bert2Tag) | A modified <u>sequence tagging</u> model that generates enough candidate keyphrases for a document. |
| 5 | BERT-SpanKPE (Bert2Span) | A modified <u>span extraction</u> model that extracts multiple keyphrases from a document. |
| 6 | DistilBERT-JointKPE (DistilBert2Joint) | The same <u>multi-task</u> model as BERT-JointKPE, trained with a DistilBERT encoder. |
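
To make the shared-encoder, two-head design of Bert2Joint concrete, here is a minimal sketch. The class name, the mean-pooling of span tokens into phrase vectors, and the single-document batch are all simplifying assumptions for illustration, not the repository's implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class ToyJointKPE(nn.Module):
    """Illustrative two-head model: one shared encoder, two task heads."""

    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.chunk_head = nn.Linear(hidden, 2)  # keyphrase quality (binary)
        self.rank_head = nn.Linear(hidden, 1)   # keyphrase salience (scalar)

    def forward(self, input_ids, attention_mask, spans):
        # Token embeddings from the shared BERT encoder (batch of one document).
        token_emb = self.encoder(
            input_ids, attention_mask=attention_mask
        ).last_hidden_state.squeeze(0)
        # One vector per candidate phrase: mean-pool its tokens (end exclusive).
        phrase_emb = torch.stack(
            [token_emb[s:e].mean(dim=0) for s, e in spans])
        chunk_logits = self.chunk_head(phrase_emb)             # (num_spans, 2)
        rank_scores = self.rank_head(phrase_emb).squeeze(-1)   # (num_spans,)
        return chunk_logits, rank_scores
```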

BERT Variants Tested

* BERT (Base)
* SpanBERT (Base)
* RoBERTa (Base)

Requirements

* python 3.8
* pytorch 1.9.0
* other dependencies: pip install -r pip-requirements.txt
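
Optionally, a quick sanity check that the environment matches these pins:

```python
import sys
import torch

# Expect Python 3.8 and PyTorch 1.9.x, ideally with a visible CUDA GPU for training.
print(sys.version.split()[0])      # e.g. 3.8.x
print(torch.__version__)           # e.g. 1.9.0
print(torch.cuda.is_available())   # True if a CUDA GPU is available
```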

QUICKSTART

1/ Download

2/ Preprocess

3/ Train Models

4/ Inference

5/ Reproduce evaluation results using our checkpoints

* RESULTS

The following results are ranked by F1@3 on the OpenKP Dev set; the official eval results can be seen on the OpenKP Leaderboard.
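
For reference, metrics like these are typically computed with an exact-match procedure at a cutoff k, as in the sketch below. It assumes predictions and gold keyphrases share the same normalization; the official OpenKP evaluation script remains authoritative and may apply additional normalization (e.g. stemming).

```python
def prf_at_k(predicted, gold, k):
    """Precision/recall/F1 at cutoff k with exact-match phrases.

    predicted: list of phrases ranked by predicted salience
    gold:      set of reference keyphrases (same normalization applied)
    """
    topk = predicted[:k]
    hits = sum(1 for p in topk if p in gold)
    precision = hits / k
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```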

* BERT (Base)

| Rank | Method | F1 @1,@3,@5 | Precision @1,@3,@5 | Recall @1,@3,@5 |
| ---- | ------ | ----------- | ------------------ | --------------- |
| 1 | Bert2Joint | 0.371, 0.384, 0.326 | 0.504, 0.313, 0.227 | 0.315, 0.555, 0.657 |
| 2 | Bert2Rank | 0.369, 0.381, 0.325 | 0.502, 0.311, 0.227 | 0.315, 0.551, 0.655 |
| 3 | Bert2Tag | 0.370, 0.374, 0.318 | 0.502, 0.305, 0.222 | 0.315, 0.541, 0.642 |
| 4 | Bert2Chunk | 0.370, 0.370, 0.311 | 0.504, 0.302, 0.217 | 0.314, 0.533, 0.627 |
| 5 | Bert2Span | 0.341, 0.340, 0.293 | 0.466, 0.277, 0.203 | 0.289, 0.492, 0.593 |

* SpanBERT (Base)

| Rank | Method | F1 @1,@3,@5 | Precision @1,@3,@5 | Recall @1,@3,@5 |
| ---- | ------ | ----------- | ------------------ | --------------- |
| 1 | Bert2Joint | 0.388, 0.393, 0.333 | 0.527, 0.321, 0.232 | 0.331, 0.567, 0.671 |
| 2 | Bert2Rank | 0.385, 0.390, 0.332 | 0.521, 0.319, 0.232 | 0.328, 0.564, 0.666 |
| 3 | Bert2Tag | 0.384, 0.385, 0.327 | 0.520, 0.315, 0.228 | 0.327, 0.555, 0.657 |
| 4 | Bert2Chunk | 0.378, 0.385, 0.326 | 0.514, 0.314, 0.228 | 0.322, 0.555, 0.656 |
| 5 | Bert2Span | 0.347, 0.359, 0.304 | 0.477, 0.294, 0.212 | 0.293, 0.518, 0.613 |

* RoBERTa (Base)

| Rank | Method | F1 @1,@3,@5 | Precision @1,@3,@5 | Recall @1,@3,@5 |
| ---- | ------ | ----------- | ------------------ | --------------- |
| 1 | Bert2Joint | 0.391, 0.398, 0.338 | 0.532, 0.325, 0.235 | 0.334, 0.577, 0.681 |
| 2 | Bert2Rank | 0.388, 0.395, 0.335 | 0.526, 0.322, 0.233 | 0.330, 0.570, 0.677 |
| 3 | Bert2Tag | 0.387, 0.389, 0.330 | 0.525, 0.318, 0.230 | 0.329, 0.562, 0.666 |
| 4 | Bert2Chunk | 0.380, 0.382, 0.327 | 0.518, 0.312, 0.228 | 0.324, 0.551, 0.660 |
| 5 | Bert2Span | 0.358, 0.355, 0.306 | 0.487, 0.289, 0.213 | 0.304, 0.513, 0.619 |

MODEL OVERVIEW

* BERT-JointKPE, RankKPE, ChunkKPE (See Paper)

* BERT-TagKPE (See Code)

* BERT-SpanKPE (See Code)
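
As a rough illustration of the span-extraction idea behind Bert2Span (the authoritative decoding logic lives in the code referenced above), the sketch below turns per-token start/end probabilities into several non-overlapping keyphrase spans. The scoring rule, `max_len`, `top_k`, and the greedy selection are illustrative assumptions, not the repository's implementation.

```python
import torch

def decode_spans(start_probs, end_probs, max_len=5, top_k=5):
    """Greedy sketch: score every span of up to max_len tokens by
    start_prob * end_prob, then keep the top_k non-overlapping spans."""
    n = start_probs.size(0)
    scored = []
    for s in range(n):
        for e in range(s, min(s + max_len, n)):
            scored.append((float(start_probs[s] * end_probs[e]), s, e))
    scored.sort(reverse=True)  # best-scoring spans first

    spans, used = [], set()
    for score, s, e in scored:
        if used.isdisjoint(range(s, e + 1)):  # skip spans sharing tokens
            spans.append((s, e, score))
            used.update(range(s, e + 1))
        if len(spans) == top_k:
            break
    return spans
```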