Open Knowledge Discovery Reading List
This is an open knowledge discovery reading list maintained by the THUIAR team. Because real-world scenarios are usually open settings, it is crucial to discover this open knowledge (e.g., new user intents in dialogue systems, open-set images, and so on) to improve the quality of machine learning systems.
Our list is still incomplete and the taxonomy may be imperfect. We will keep adding papers to improve the list, and pull requests are welcome! If you have any suggestions, please contact zhang-hl20@mails.tsinghua.edu.cn.
Contents
<!-- * [Multimodality](#Multimodality) * [Dialogue System](#Dialogue_System) -->
<h2 id="Natural_Language_Processing">Natural Language Processing</h2>
<h3 id="Toolkit">Toolkit</h3>

- Hanlei Zhang, Xiaoteng Li, Hua Xu, Panpan Zhang, Kang Zhao, Kai Gao. 2021. TEXTOIR: An Integrated and Visualized Platform for Text Open Intent Recognition. In Proceedings of ACL 2021. [paper] [toolkit] [demo]
- Hanlei Zhang, Hua Xu, Shaojie Zhao, Qianrui Zhou. 2023. Learning Discriminative Representations and Decision Boundaries for Open Intent Detection. IEEE Transactions on Audio, Speech and Language Processing. [paper] [code]
- Yunhua Zhou, Peiju Liu, Xipeng Qiu. 2022. KNN-Contrastive Learning for Out-of-Domain Intent Classification. In Proceedings of ACL 2022. [paper] [code]
- Zifeng Cheng, Zhiwei Jiang, Yafeng Yin, Cong Wang, and Qing Gu. 2022. Learning to Classify Open Intent via Soft Labeling and Manifold Mixup. IEEE Transactions on Audio, Speech and Language Processing. [paper] [code]
- Liming Zhan, Haowen Liang, Bo Liu, Lu Fan, Xiaoming Wu, Albert Y.S. Lam. 2021. Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training. In Proceedings of ACL-IJCNLP 2021. [paper] [code]
- Hanlei Zhang, Hua Xu and Ting-En Lin. 2021. Deep Open Intent Classification with Adaptive Decision Boundary. In Proceedings of AAAI 2021. [paper] [code]
- Guangfeng Yan, Lu Fan, Qimai Li, Han Liu, Xiaotong Zhang, Xiao-Ming Wu and Albert Y.S. Lam. 2020. Unknown Intent Detection Using Gaussian Mixture Model with an Application to Zero-shot Intent Classification. In Proceedings of ACL 2020. [paper] [code]
- Iñigo Casanueva, Tadas Temčinas, Daniela Gerz, Matthew Henderson, Ivan Vulić. 2020. Efficient Intent Detection with Dual Sentence Encoders. In Proceedings of ACL 2020. [paper] [dataset]
- Ting-En Lin and Hua Xu. 2019. A post-processing framework for detecting unknown intent of dialogue system via pre-trained deep neural network classifier. In Knowledge-based Systems. [paper] [code]
- Ting-En Lin and Hua Xu. 2019. Deep Unknown Intent Detection with Margin Loss. In Proceedings of ACL 2019. [paper] [code]
- Congying Xia, Chenwei Zhang, Xiaohui Yan, Yi Chang and Philip S. Yu. 2018. Zero-shot User Intent Detection via Capsule Neural Networks. In Proceedings of EMNLP 2018. [paper] [code]
- Di Jin, Shuyang Gao, Seokhwan Kim, Yang Liu, and Dilek Hakkani-Tür. 2022. Towards Textual Out-of-Domain Detection Without In-Domain Labels. IEEE Transactions on Audio, Speech and Language Processing. [paper]
- Derek Chen, Zhou Yu. 2021. GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation. In Proceedings of EMNLP 2021. [paper] [code]
- Udit Arora, William Huang, He He. 2021. Types of Out-of-Distribution Texts and How to Detect Them. In Proceedings of EMNLP 2021. [paper] [code]
- Wenxuan Zhou, Fangyu Liu, Muhao Chen. 2021. Contrastive Out-of-Distribution Detection for Pretrained Transformers. In Proceedings of EMNLP 2021. [paper] [code]
- Xiaoya Li, Jiwei Li, Xiaofei Sun, Chun Fan, Tianwei Zhang, Fei Wu, Yuxian Meng, Jun Zhang. 2021. kFolden: k-Fold Ensemble for Out-of-Distribution Detection. In Proceedings of EMNLP 2021. [paper] [code]
- Yawen Ouyang, Jiasheng Ye, Yu Chen, Xinyu Dai, Shujian Huang, Jiajun Chen. 2021. Energy-based Unknown Intent Detection with Data Manipulation. In Proceedings of ACL-IJCNLP 2021 Findings. [paper]
- DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, Dong Ryeol Shin. 2021. OutFlip: Generating Out-of-Domain Samples for Unknown Intent Detection with Natural Language Attack. In Proceedings of ACL-IJCNLP 2021 Findings. [paper]
- Zhiyuan Zeng, Keqing He, Yuanmeng Yan, Zijun Liu, Yanan Wu, Hong Xu, Huixing Jiang, Weiran Xu. 2021. Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning. In Proceedings of ACL-IJCNLP 2021. [paper] [code]
- Yilin Shen, Yen-Chang Hsu, Avik Ray, Hongxia Jin. 2021. Enhancing the generalization for Intent Classification and Out-of-Domain Detection in SLU. In Proceedings of ACL-IJCNLP 2021. [paper]
- Yinhe Zheng, Guanyi Chen, and Minlie Huang. 2020. Out-of-domain Detection for Natural Language Understanding in Dialog Systems. IEEE Transactions on Audio, Speech and Language Processing. [paper]
- Varun Gangal, Abhinav Arora, Arash Einolghozati and Sonal Gupta. 2020. Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection in Task Oriented Dialog. In Proceedings of AAAI 2020. [paper]
- Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang and Jason Mars. 2019. An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction. In Proceedings of EMNLP-IJCNLP 2019. [paper] [dataset]
- Joo-Kyung Kim and Young-Bum Kim. 2018. Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates. In Proceedings of INTERSPEECH 2018. [paper]
- Seonghan Ryu, Sangjun Koo, Hwanjo Yu, and Gary Geunbae Lee. 2018. Out-of-domain Detection based on Generative Adversarial Network. In Proceedings of EMNLP 2018. [paper]
- Ian Lane, Tatsuya Kawahara, Tomoko Matsui and Satoshi Nakamura. 2006. Out-of-Domain Utterance Detection Using Classification Confidences of Multiple Topics. IEEE Transactions on Audio, Speech, and Language Processing. [paper]
- Lei Shu, Yassine Benajiba, Saab Mansour, Yi Zhang. 2021. ODIST: Open World Classification via Distributionally Shifted Instances. In Proceedings of EMNLP 2021 Findings. [paper]
- Hu Xu, Bing Liu, Lei Shu and P. Yu. 2019. Open-world Learning and Application to Product Classification. In Proceedings of WWW 2019. [paper] [code]
- Lei Shu, Hu Xu and Bing Liu. 2017. DOC: Deep Open Classification of Text Documents. In Proceedings of EMNLP 2017. [paper] [code]
- Geli Fei and Bing Liu. 2016. Breaking the Closed World Assumption in Text Classification. In Proceedings of HLT-NAACL 2016. [paper]
- Hanlei Zhang, Hua Xu, Xin Wang, Fei Long, Kai Gao. 2023. A Clustering Framework for Unsupervised and Semi-supervised New Intent Discovery. IEEE Transactions on Knowledge and Data Engineering. [paper]
- Yuwei Zhang, Haode Zhang, Li-Ming Zhan, Xiao-Ming Wu, Albert Y.S. Lam. 2022. New intent discovery with pre-training and contrastive learning. In Proceedings of ACL 2022. [paper] [code]
- Hanlei Zhang, Hua Xu, Ting-En Lin and Rui Lyu. 2021. Discovering New Intents with Deep Aligned Clustering. In Proceedings of AAAI 2021. [paper] [code]
- Nikhita Vedula, Nedim Lipka, Pranav Maneriker and Srinivasan Parthasarathy. 2020. Open Intent Extraction from Natural Language Interactions. In Proceedings of WWW 2020. [paper]
- Ting-En Lin, Hua Xu and Hanlei Zhang. 2020. Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement. In Proceedings of AAAI 2020. [paper] [code]
- Hugh Perkins, Yi Yang. 2019. Dialog Intent Induction with Deep Multi-View Clustering. In Proceedings of EMNLP 2019. [paper] [code]
- Iryna Haponchyk*, Antonio Uva*, Seunghak Yu, Olga Uryupina and Alessandro Moschitti. 2018. Supervised Clustering of Questions into Intents for Dialog System Applications. In Proceedings of EMNLP 2018. [paper] [dataset]
- Padmasundari and Srinivas Bangalore. 2018. Intent Discovery Through Unsupervised Semantic Text Clustering. In Proceedings of INTERSPEECH 2018. [paper]
- Tomáš Brychcín and Pavel Král. 2017. Unsupervised Dialogue Act Induction using Gaussian Mixtures. In Proceedings of EACL 2017. [paper]
- Dilek Hakkani-Tür, Yun-Cheng Ju, Geoff Zweig and Gokhan Tur. 2015. Clustering Novel Intents in a Conversational Interaction System with Semantic Parsing. In Proceedings of INTERSPEECH 2015. [paper]
- George Forman, Hila Nachlieli, and Renato Keshet. 2015. Clustering by intent: A semi-supervised method to discover relevant clusters incrementally. In Proceedings of ECML-PKDD 2015. [paper]
- Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck and Gokhan Tur. 2013. A Weakly-Supervised Approach for Discovering New User Intents from Search Query Logs. In Proceedings of INTERSPEECH 2013. [paper]
- Vikash Sehwag, Mung Chiang, Prateek Mittal. 2021. SSD: A Unified Framework for Self-Supervised Outlier Detection. In Proceedings of ICLR 2021. [paper] [code]
- Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Zsolt Kira. 2020. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data. In Proceedings of CVPR 2020. [paper]
- Julian Bitterwolf, Alexander Meinke, Matthias Hein. 2020. Certifiably Adversarially Robust Detection of Out-of-Distribution Data. In Proceedings of NeurIPS 2020. [paper] [code]
- Taewon Jeong, Heeyoung Kim. 2020. OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification. In Proceedings of NeurIPS 2020. [paper]
- Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, Jordi Luque. 2020. Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models. In Proceedings of ICLR 2020. [paper]
- Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur and Balaji Lakshminarayanan. 2019. Do Deep Generative Models Know What They Don't Know? In Proceedings of ICLR 2019. [paper] [code]
- Qing Yu and Kiyoharu Aizawa. 2019. Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy. In Proceedings of ICCV 2019. [paper] [code]
- Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon and Balaji Lakshminarayanan. 2019. Likelihood Ratios for Out-of-Distribution Detection. In Proceedings of NeurIPS 2019. [paper] [code]
- Alireza Shafaei, Mark Schmidt and James J. Little. 2019. A Less Biased Evaluation of Out-of-distribution Sample Detectors. In Proceedings of BMVC 2019. [paper] [code]
- Shiyu Liang, Yixuan Li and R. Srikant. 2018. Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks. In Proceedings of ICLR 2018. [paper] [code]
- Dan Hendrycks and Kevin Gimpel. 2017. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. In Proceedings of ICLR 2017. [paper] [code]
- Anh Nguyen, Jason Yosinski and Jeff Clune. 2015. Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In Proceedings of CVPR 2015. [paper]
- Guangyao Chen, Peixi Peng, Xiangqian Wang and Yonghong Tian. 2021. Adversarial Reciprocal Points Learning for Open Set Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. [paper]
- Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan. 2021. Learning Placeholders for Open-Set Recognition. In Proceedings of CVPR 2021. [paper]
- Hong-Ming Yang, Xu-Yao Zhang, Fei Yin, Qing Yang and Cheng-Lin Liu. 2020. Convolutional Prototype Network for Open Set Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. [paper]
- Chuanxing Geng and Songcan Chen. 2021. Collective decision for open set recognition. IEEE Transactions on Knowledge and Data Engineering. [paper]
- Pramuditha Perera, Vlad I. Morariu, Rajiv Jain, Varun Manjunatha, Curtis Wigington, Vicente Ordonez and Vishal M. Patel. 2020. Generative-discriminative Feature Representations for Open-set Recognition. In Proceedings of CVPR 2020. [paper]
- Xin Sun, Zhenning Yang, Chi Zhang, Guohao Peng and Keck-Voon Ling. 2020. Conditional Gaussian Distribution Learning for Open Set Recognition. In Proceedings of CVPR 2020. [paper] [code]
- Bo Liu, Hao Kang, Haoxiang Li, Gang Hua and Nuno Vasconcelos. 2020. Few-Shot Open-Set Recognition Using Meta-Learning. In Proceedings of CVPR 2020. [paper]
- Guangyao Chen, Limeng Qiao, Yemin Shi, Peixi Peng, Jia Li, Tiejun Huang, Shiliang Pu and Yonghong Tian. 2020. Learning Open Set Network with Discriminative Reciprocal Points. In Proceedings of ECCV 2020. [paper]
- Qing Yu, Daiki Ikami, Go Irie and Kiyoharu Aizawa. 2020. Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning. In Proceedings of ECCV 2020. [paper]
- Chuanxing Geng, Sheng-Jun Huang and Songcan Chen. 2020. Recent Advances in Open Set Recognition: A Survey. In IEEE Transactions on Pattern Analysis and Machine Intelligence. [paper]
- T. E. Boult, S. Cruz, A.R. Dhamija, M. Gunther, J. Henrydoss and W.J. Scheirer. 2019. Learning and the Unknown: Surveying Steps toward Open World Recognition. In Proceedings of AAAI 2019. [paper]
- Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong and Stella X. Yu. 2019. Large-Scale Long-Tailed Recognition in an Open World. In Proceedings of CVPR 2019. [paper] [code]
- Ryota Yoshihashi, Wen Shao, Rei Kawakami, Shaodi You, Makoto Iida and Takeshi Naemura. 2019. Classification-Reconstruction Learning for Open-Set Recognition. In Proceedings of CVPR 2019. [paper] [code]
- Pramuditha Perera and Vishal M. Patel. 2019. Deep Transfer Learning for Multiple Class Novelty Detection. In Proceedings of CVPR 2019. [paper] [code]
- Poojan Oza and Vishal M. Patel. 2019. C2AE: Class Conditioned Auto-Encoder for Open-set Recognition. In Proceedings of CVPR 2019. [paper]
- Lei Shu, Hu Xu and Bing Liu. 2018. Unseen Class Discovery in Open-world Classification. arXiv. [paper]
- Yang Yu, Wei-Yang Qu, Nan Li, Zimin Guo. 2017. Open-Category Classification by Adversarial Sample Generation. In Proceedings of IJCAI 2017. [paper] [code]
- Manuel Gunther, Steve Cruz, Ethan M. Rudd, Terrance E. Boult. 2017. Toward Open-Set Face Recognition. In Proceedings of CVPR 2017 Workshop. [paper] [code]
- Abhijit Bendale and Terrance E. Boult. 2016. Towards Open Set Deep Networks. In Proceedings of CVPR 2016. [paper] [code]
- Abhijit Bendale and Terrance E. Boult. 2015. Towards Open World Recognition. In Proceedings of CVPR 2015. [paper] [code]
- Walter J. Scheirer, Lalit P. Jain and Terrance E. Boult. 2014. Probability Models for Open Set Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. [paper]
- Lalit P. Jain, Walter J. Scheirer and Terrance E. Boult. 2014. Multi-class open set recognition using probability of inclusion. In Proceedings of ECCV 2014. [paper]
- Walter J. Scheirer, Anderson de Rezende, Archana Sapkota and Terrance E. Boult. 2013. Toward Open Set Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. [paper]
- Kai Han*, Sylvestre-Alvise Rebuffi*, Sebastien Ehrhardt*, Andrea Vedaldi and Andrew Zisserman. 2020. Automatically Discovering and Learning New Visual Categories with Ranking Statistics. In Proceedings of ICLR 2020. [paper] [code]
- Kai Han, Andrea Vedaldi and Andrew Zisserman. 2019. Learning to Discover Novel Visual Categories via Deep Transfer Clustering. In Proceedings of ICCV 2019. [paper] [code]
- Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip Odom and Zsolt Kira. 2019. Multi-class classification without multi-class labels. In Proceedings of ICLR 2019. [paper] [code]
- Yen-Chang Hsu, Zhaoyang Lv and Zsolt Kira. 2018. Learning to Cluster in Order to Transfer Across Domains and Tasks. In Proceedings of ICLR 2018. [paper] [code]
- Yen-Chang Hsu, Zhaoyang Lv and Zsolt Kira. 2016. Deep Image Category Discovery using a Transferred Similarity Function. arXiv. [paper]
- Xiaohang Zhan, Jiahao Xie, Ziwei Liu, Yew-Soon Ong and Chen Change Loy. 2020. Online Deep Clustering for Unsupervised Representation Learning. In Proceedings of CVPR 2020. [paper] [code]
- Yuki M. Asano, Christian Rupprecht and Andrea Vedaldi. 2020. Self-labelling via simultaneous clustering and representation learning. In Proceedings of ICLR 2020. [paper] [code]
- Makarand Tapaswi, Marc T. Law and Sanja Fidler. 2019. Video Face Clustering with Unknown Number of Clusters. In Proceedings of ICCV 2019. [paper] [code]
- Mathilde Caron, Piotr Bojanowski, Armand Joulin and Matthijs Douze. 2018. Deep Clustering for Unsupervised Learning of Visual Features. In Proceedings of ECCV 2018. [paper] [code]
- Chen Shi, Qi Chen, Lei Sha, Sujian Li, Xu Sun, Houfeng Wang and Lintao Zhang. 2018. Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning. In Proceedings of EMNLP 2018. [paper]
- Jianlong Chang, Lingfeng Wang, Gaofeng Meng, Shiming Xiang and Chunhong Pan. 2017. Deep adaptive image clustering. In Proceedings of ICCV 2017. [paper] [code]
- Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos and Mingyi Hong. 2017. Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In Proceedings of ICML 2017. [paper] [code]
- Yen-Chang Hsu and Zsolt Kira. 2016. Neural network-based clustering using pairwise constraints. In Proceedings of ICLR 2016 Workshop. [paper] [code]
- Jianwei Yang, Devi Parikh and Dhruv Batra. 2016. Joint Unsupervised Learning of Deep Representations and Image Clusters. In Proceedings of CVPR 2016. [paper] [code]
- Junyuan Xie, Ross Girshick and Ali Farhadi. 2016. Unsupervised Deep Embedding for Clustering Analysis. In Proceedings of ICML 2016. [paper] [code]
- Zhiguo Wang, Haitao Mi and Abraham Ittycheriah. 2016. Semi-supervised Clustering for Short Text via Deep Representation Learning. In Proceedings of CoNLL 2016. [paper]
- Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang and Hongwei Hao. 2015. Short Text Clustering via Convolutional Neural Networks. In Proceedings of ACL Workshop 2015. [paper] [dataset]
Acknowledgements
Contributors: Hanlei Zhang, Shaojie Zhao, Ting-En Lin, Kang Zhao.
We thank TaoCesc for paper recommendations.
<!-- <h2 id="Multimodality">Multimodality</h2> <h3 id="Dialogue_System">Dialogue System</h3> * Amrita Saha, Mitesh M. Khapra and Karthik Sankaranarayanan. 2018. **Towards Building Large Scale Multimodal Domain-Aware Conversation Systems**. In *Proceedings of AAAI 2018*. [[paper](https://arxiv.org/abs/1704.00200)] [[code](https://github.com/lipiji/dialogue-hred-vhred)] -->