BERT-related Papers

This is a list of BERT-related papers. Any feedback is welcome.

(ChatGPT-related papers are listed at https://github.com/tomohideshibata/ChatGPT-related-papers.)

Table of Contents

Survey paper
Downstream task
Generation
Quality evaluator
Modification (multi-task, masking strategy, etc.)
Sentence embedding
Transformer variants
Probe
Inside BERT
Multi-lingual
Other than English models
Domain specific
Multi-modal
Model compression
Large language model
Reinforcement learning from human feedback
Misc.

Survey paper

Downstream task

QA, MC, Dialogue

Slot filling and Intent Detection

A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding (EMNLP2019)
BERT for Joint Intent Classification and Slot Filling
A Co-Interactive Transformer for Joint Slot Filling and Intent Detection (ICASSP2021)
Few-shot Intent Classification and Slot Filling with Retrieved Examples (NAACL2021)
Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)
Data Augmentation for Spoken Language Understanding via Pretrained Models
Few-Shot Intent Detection via Contrastive Pre-Training and Fine-Tuning (EMNLP2021)
STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++ (AACL-IJCNLP2020) [github]

Analysis

Word segmentation, parsing, NER

Pronoun/coreference resolution

Word sense disambiguation

Sentiment analysis

Relation extraction

Knowledge base

Text classification

WSC, WNLI, NLI

Commonsense

Extractive summarization

Grammatical error correction

Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)
Towards Minimal Supervision BERT-based Grammar Error Correction
Learning to combine Grammatical Error Corrections (EMNLP2019 WS)
LM-Critic: Language Models for Unsupervised Grammatical Error Correction (EMNLP2021) [github]
Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction (ACL2020)
Chinese Grammatical Correction Using BERT-based Pre-trained Model (AACL-IJCNLP2020)
Spelling Error Correction with Soft-Masked BERT (ACL2020)

IR

Generation

Quality evaluator

Modification (multi-task, masking strategy, etc.)

Tokenization

Training Multilingual Pre-trained Language Model with Byte-level Subwords
Byte Pair Encoding is Suboptimal for Language Model Pretraining (EMNLP2020 Findings)
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation (TACL2022) [github]
ByT5: Towards a token-free future with pre-trained byte-to-byte models (TACL2022) [github]
Multi-view Subword Regularization (NAACL2021)
Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation (ACL2021)
An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks (AACL-IJCNLP2020)
AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization (ACL2021 Findings)
Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models (NAACL2021)
CharBERT: Character-aware Pre-trained Language Model (COLING2020) [github]
CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters (COLING2020)
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization [github]
Fast WordPiece Tokenization (EMNLP2021)
MaxMatch-Dropout: Subword Regularization for WordPiece (COLING2022)

Prompt

Sentence embedding

Transformer variants

Probe

Inside BERT

Multi-lingual

Other than English models

Domain specific

Multi-modal

Model compression

Large language model

Reinforcement learning from human feedback

Misc.