Awesome-Weak-Supervision
<p align="center"> <img width="250" src="https://camo.githubusercontent.com/1131548cf666e1150ebd2a52f44776d539f06324/68747470733a2f2f63646e2e7261776769742e636f6d2f73696e647265736f726875732f617765736f6d652f6d61737465722f6d656469612f6c6f676f2e737667" alt="Awesome!"> </p>

- A curated list of programmatic/rule-based weak supervision papers and resources.
- A bib file for most of the collected papers
Blogs
An Overview of Weak Supervision
Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision
Videos
Theory & Systems for Weak Supervision | Chinese Version
Lecture Notes
Lecture Notes on Weak Supervision
Workshops
Survey
A Survey on Programmatic Weak Supervision. Jieyu Zhang
Dataset and Benchmark
WRENCH: A Comprehensive Benchmark for Weak Supervision. Jieyu Zhang NeurIPS 2021
- codebase (for both classification and sequence tagging tasks)
WALNUT: A Benchmark on Semi-weakly Supervised Learning for Natural Language Understanding. Guoqing Zheng NAACL 2022
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels. Nicholas Roberts NeurIPS 2022
SPEAR : Semi-supervised Data Programming in Python. Ayush Maheshwari EMNLP 2022
Algorithm
Data Programming: Creating Large Training Sets, Quickly. Alex Ratner NeurIPS 2016
Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data. Paroma Varma FILM-NeurIPS 2016
Training Complex Models with Multi-Task Weak Supervision. Alex Ratner AAAI 2019
Data Programming using Continuous and Quality-Guided Labeling Functions. Oishik Chatterjee AAAI 2020
Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods. Dan Fu ICML 2020
KnowMAN: Weakly Supervised Multinomial Adversarial Network. Luisa März EMNLP 2021
End-to-End Weak Supervision. Salva Rühling Cachay NeurIPS 2021
Creating Training Sets via Weak Indirect Supervision. Jieyu Zhang ICLR 2022
Universalizing Weak Supervision. Changho Shin ICLR 2022
Learning from Multiple Noisy Partial Labelers. Peilin Yu AISTATS 2022
Firebolt: Weak Supervision Under Weaker Assumptions. Zhaobin Kuang AISTATS 2022
Learning the Structure of Generative Models without Labeled Data. Stephen H. Bach ICML 2017
Inferring Generative Model Structure with Static Analysis. Paroma Varma NeurIPS 2017
Learning Dependency Structures for Weak Supervision Models. Paroma Varma ICML 2019
Dependency Structure Misspecification in Multi-Source Weak Supervision Models. Salva Rühling Cachay ICLR WeaSuL 2019
Pairwise Feedback for Data Programming. Benedikt Boecking NeurIPS 2019 workshop on Learning with Rich Experience: Integration of Learning Paradigms
Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision. Mayee F. Chen UAI 2022
Binary Classification with Positive Labeling Sources. Jieyu Zhang CIKM 2022
Understanding Programmatic Weak Supervision via Source-aware Influence Function. Jieyu Zhang NeurIPS 2022
Training Subset Selection for Weak Supervision. Hunter Lang NeurIPS 2022
Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision. Jieyu Zhang and Linxin Song AISTATS 2023
Learning Hyper Label Model for Programmatic Weak Supervision. Renzhi Wu ICLR 2023
System
Snorkel: Rapid Training Data Creation with Weak Supervision. Alex Ratner VLDB 2018
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale. Stephen H. Bach SIGMOD (Industrial) 2019
Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design. Ying Sheng CIDR 2020
Overton: A Data System for Monitoring and Improving Machine-Learned Products. Christopher Ré CIDR 2020
Ruler: Data Programming by Demonstration for Document Labeling. Sara Evensen EMNLP 2020 Findings
skweak: Weak Supervision Made Easy for NLP. Pierre Lison 2021
TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration. Dongjin Choi WWW 2021
Demonstration of Panda: A Weakly Supervised Entity Matching System. Renzhi Wu VLDB Demo 2021
Asterisk: Generating Large Training Datasets with Automatic Active Supervision. Mona Nashaat ACM/IMS Transactions on Data Science 2020
Inspector Gadget: A Data Programming-based Labeling System for Industrial Images. Geon Heo VLDB 2021
Weak Supervision with Labeled Data
Learning from Rules Generalizing Labeled Exemplars. Abhijeet Awasthi ICLR 2020
Self-Training with Weak Supervision. Giannis Karamanolakis NAACL 2021
Semi-Supervised Aggregation of Dependent Weak Supervision Sources with Performance Guarantees. Alessio Mazzetto AISTATS 2021
Adversarial Multiclass Learning under Weak Supervision with Performance Guarantees. Alessio Mazzetto ICML 2021
Semi-Supervised Data Programming with Subset Selection. Ayush Maheshwari ACL 2021
Active WeaSuL: Improving Weak Supervision with Active Learning. Samantha Biegel ICLR WeaSuL 2021
DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples. Yi Xu NeurIPS 2021
Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming. Ayush Maheshwari ACL 2022 Findings
Weak Supervision Sources Generation
Snuba: Automating Weak Supervision to Label Training Data. Paroma Varma VLDB 2019
Interactive Programmatic Labeling for Weak Supervision. Benjamin Cohen-Wang KDD Workshop 2019
Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling. Benedikt Boecking ICLR 2021
Adaptive Rule Discovery for Labeling Text Data. Sainyam Galhotra VLDB 2019
Weakly Supervised Named Entity Tagging with Learnable Logical Rules. Jiacheng Li ACL 2021
GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition. Xinyan Zhao EACL 2021
Classifying Unstructured Clinical Notes via Automatic Weak Supervision. Chufan Gao and Mononito Goswami MLHC 2022
Witan: Unsupervised Labelling Function Generation for Assisted Data Programming. Benjamin Denham VLDB 2022
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming. Cheng-Yu Hsieh VLDB 2023
Weak Supervision for Active Learning
Iterative Data Programming for Expanding Text Classification Corpora. Neil Mallinar AAAI/IAAI 20 Technical Tracks
Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets. Mona Nashaat Big Data 2018
Application
CV
Scene Graph Prediction with Limited Labels. Vincent Chen ICCV 2019
Multi-Resolution Weak Supervision for Sequential Data. Paroma Varma NeurIPS 2019
Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels. Daniel Y. Fu SOSP 2019
GOGGLES: Automatic Image Labeling with Affinity Coding. Nilaksh Das SIGMOD 2020
Cut out the annotator, keep the cutout: better segmentation with weak supervision. Sarah Hooper ICLR 2021
Task Programming: Learning Data Efficient Behavior Representations. Jennifer J. Sun CVPR 2021
NLP
Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach. Liyuan Liu EMNLP 2017
Training Classifiers with Natural Language Explanations. Braden Hancock ACL 2018
Deep Text Mining of Instagram Data without Strong Supervision. Kim Hammar ICWI 2018
Bootstrapping Conversational Agents With Weak Supervision. Neil Mallinar AAAI 2019
Weakly Supervised Sequence Tagging from Noisy Rules. Esteban Safranchik AAAI 2020
NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction. Wenxuan Zhou WWW 2020
Named Entity Recognition without Labelled Data: A Weak Supervision Approach. Pierre Lison ACL 2020
Leveraging Multi-Source Weak Social Supervision for Early Detection of Fake News. Kai Shu ECML-PKDD 2020
Learning with Weak Supervision for Email Intent Detection. Kai Shu SIGIR 2020
Understanding the Dynamics between Vaping and Cannabis Legalization Using Twitter Opinions. Shishir Adhikari AAAI-ICWSM 2021
Denoising Multi-Source Weak Supervision for Neural Text Classification. Wendi Ren EMNLP 2020 Findings
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach. Yue Yu NAACL 2021
Heterogeneous Graph Neural Networks for Concept Prerequisite Relation Learning in Educational Data. Chenghao Jia NAACL 2021
Goodwill Hunting: Analyzing and Repurposing Off-the-Shelf Named Entity Linking Systems. Karan Goel NAACL 2021
Bootstrapping a Music Voice Assistant with Weak Supervision. Sergio Oramas NAACL 2021 Industry
BERTifying Hidden Markov Models for Multi-Source Weakly Supervised Named Entity Recognition. Yinghao Li ACL 2021
HERALD: An Annotation Efficient Method to Train User Engagement Predictors in Dialogs. Weixin Liang ACL 2021
Controllable Abstractive Dialogue Summarization with Sketch Supervision. Chien-Sheng Wu ACL 2021 Findings
Named Entity Recognition through Deep Representation Learning and Weak Supervision. Jerrod Parker ACL 2021 Findings
Weakly supervised discourse segmentation for multiparty oral conversations. Lila Gravellier EMNLP 2021
Adaptive Ranking-based Data Selection for Weakly supervised Class-imbalanced Text Classification. Linxin Song EMNLP 2022 Findings
RL
Generating Multi-Agent Trajectories using Programmatic Weak Supervision. Eric Zhan ICLR 2019
Software Engineering
Search4Code: Code Search Intent Classification Using Weak Supervision. Nikitha Rao MSR 2021
Others
Generating Training Labels for Cardiac Phase-Contrast MRI Images. Vincent Chen MED-NeurIPS 2017
Osprey: Weak Supervision of Imbalanced Extraction Problems without Code. Eran Bringer SIGMOD DEEM Workshop 2019
Weakly Supervised Classification of Rare Aortic Valve Malformations Using Unlabeled Cardiac MRI Sequences. Jason Fries Nature Communications 2019
Doubly Weak Supervision of Deep Learning Models for Head CT. Khaled Saab MICCAI 2019
A clinical text classification paradigm using weak supervision and deep representation. Yanshan Wang BMC MIDM 2019
A machine-compiled database of genome-wide association studies. Volodymyr Kuleshov Nature Communications 2019
Weak Supervision as an Efficient Approach for Automated Seizure Detection in Electroencephalography. Khaled Saab NPJ Digital Medicine 2020
Extracting Chemical Reactions From Text Using Snorkel. Emily Mallory BMC Bioinformatics 2020
Cross-Modal Data Programming Enables Rapid Medical Machine Learning. Jared A. Dunnmon Patterns 2020
SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data. Jason Fries
Ontology-driven weak supervision for clinical entity classification in electronic health records. Jason Fries Nature Communications 2021
Utilizing Weak Supervision to Infer Complex Objects and Situations in Autonomous Driving Data. Zhenzhen Weng IV 2019
Multi-frame Weak Supervision to Label Wearable Sensor Data. Saelig Khattar ICML Time Series Workshop 2019
Applying Weak Supervision to Mobile Sensor Data: Experiences with Transport Mode Detection. Jonathan Furst AAAI Workshop on Artificial Intelligence of Things 2020
Exploring Inspiration Sets in a Data Programming Pipeline for Product Moderation. Justine Winkler ACL 2021 ECNLP 4
Detecting Hashtag Hijacking for Hashtag Activism. Pooneh Mousavi ACL 2021 NLP for Positive Impact
CHECKER: Detecting Clickbait Thumbnails with Weak Supervision and Co-Teaching. Tianyi Xie ECML-PKDD 2021
DeFraudNet: An End-to-End Weak Supervision Framework to Detect Fraud in Online Food Delivery. Jose Mathew ECML-PKDD 2021
Weak Supervision for Affordable Modeling of Electrocardiogram Data. Mononito Goswami AMIA 2021 Annual Symposium
Fraud Detection under Multi-Sourced Extremely Noisy Annotations. Chuang Zhang CIKM 2021
Multi-Source Domain Adaptation with Weak Supervision for Early Fake News Detection. Yichuan Li BigData 2021
Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact. Arnab Dey AMIA 2022 Annual Symposium
Thesis
Accelerating Machine Learning with Training Data Management. Alex Ratner
Weak Supervision From High-Level Abstractions. Braden Jay Hancock
Other Weak Supervision Paradigm
Label-name Only Supervision
Weakly-Supervised Neural Text Classification. Yu Meng CIKM 2018
Weakly-Supervised Hierarchical Text Classification. Yu Meng AAAI 2019
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding. Jiaxin Huang EMNLP 2020
Text Classification Using Label Names Only: A Language Model Self-Training Approach. Yu Meng EMNLP 2020
Hierarchical Metadata-Aware Document Categorization under Weak Supervision. Yu Zhang WSDM 2021
Contextualized Weak Supervision for Text Classification. Dheeraj Mekala ACL 2020
META: Metadata-Empowered Weak Supervision for Text Classification. Dheeraj Mekala EMNLP 2020
X-Class: Text Classification with Extremely Weak Supervision. Zihan Wang NAACL 2021
Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data. Dheeraj Mekala EMNLP 2021
Improving Weak Supervision
LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification. Dheeraj Mekala EMNLP 2022 Findings