Natural Language Processing Tasks and Selected References
I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks NLP covers.
I did my best to cover as many NLP tasks as possible, but admittedly the list is far from exhaustive, purely due to my lack of knowledge. The selected references are also biased towards recent deep learning accomplishments. I expect them to serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!
Oct. 13, 2017.<br/> by Kyubyong
Reviewed and updated by YJ Choe on Oct. 18, 2017.
Anaphora Resolution
Automated Essay Scoring
PAPER: Automatic Text Scoring Using Neural Networks
PAPER: A Neural Approach to Automated Essay Scoring
CHALLENGE: Kaggle: The Hewlett Foundation: Automated Essay Scoring
PROJECT: EASE (Enhanced AI Scoring Engine)
Automatic Speech Recognition
WIKI: Speech recognition
PAPER: Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
PAPER: WaveNet: A Generative Model for Raw Audio
PROJECT: A TensorFlow implementation of Baidu's DeepSpeech architecture
PROJECT: Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet
CHALLENGE: The 5th CHiME Speech Separation and Recognition Challenge
DATA: The 5th CHiME Speech Separation and Recognition Challenge
DATA: CSTR VCTK Corpus
DATA: LibriSpeech ASR corpus
DATA: Switchboard-1 Telephone Speech Corpus
DATA: TED-LIUM Corpus
DATA: Open Speech and Language Resources
DATA: Common Voice
Automatic Summarisation
WIKI: Automatic summarization
BOOK: Automatic Text Summarization
PAPER: Text Summarization Using Neural Networks
PAPER: Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
DATA: Text Analytics Conferences (TAC)
DATA: Document Understanding Conferences (DUC)
Coreference Resolution
INFO: Coreference Resolution
PAPER: Deep Reinforcement Learning for Mention-Ranking Coreference Models
PAPER: Improving Coreference Resolution by Learning Entity-Level Distributed Representations
CHALLENGE: CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
CHALLENGE: CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
CHALLENGE: SemEval 2018 Task 4: Character Identification on Multiparty Dialogues
Entity Linking
Grammatical Error Correction
PAPER: A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
PAPER: Neural Network Translation Models for Grammatical Error Correction
PAPER: Adapting Sequence Models for Sentence Correction
CHALLENGE: CoNLL-2013 Shared Task: Grammatical Error Correction
CHALLENGE: CoNLL-2014 Shared Task: Grammatical Error Correction
DATA: NUS Non-commercial research/trial corpus license
DATA: Lang-8 Learner Corpora
DATA: Cornell Movie--Dialogs Corpus
PROJECT: Deep Text Corrector
PRODUCT: deep grammar
Grapheme To Phoneme Conversion
PAPER: Grapheme-to-Phoneme Models for (Almost) Any Language
PAPER: Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
PAPER: Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
PROJECT: Sequence-to-Sequence G2P toolkit
PROJECT: g2p_en: A Simple Python Module for English Grapheme To Phoneme Conversion
DATA: Multilingual Pronunciation Data
Humor and Sarcasm Detection
PAPER: Automatic Sarcasm Detection: A Survey
PAPER: Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal
PAPER: Sarcasm Detection on Twitter: A Behavioral Modeling Approach
CHALLENGE: SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor
CHALLENGE: SemEval-2017 Task 7: Detection and Interpretation of English Puns
DATA: Sarcastic comments from Reddit
DATA: Sarcasm Corpus V2
DATA: Sarcasm Amazon Reviews Corpus
Language Grounding
WIKI: Symbol grounding problem
PAPER: The Symbol Grounding Problem
PAPER: From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning
PAPER: Encoding of phonology in a recurrent neural model of grounded speech
PAPER: Gated-Attention Architectures for Task-Oriented Language Grounding
PAPER: Sound-Word2Vec: Learning Word Representations Grounded in Sounds
COURSE: Language Grounding to Vision and Control
WORKSHOP: Language Grounding for Robotics
Language Guessing
Language Identification
WIKI: Language identification
PAPER: Automatic Language Identification Using Deep Neural Networks
PAPER: Natural Language Processing with Small Feed-Forward Networks
CHALLENGE: 2015 Language Recognition Evaluation
Language Modeling
WIKI: Language model
TOOLKIT: KenLM Language Model Toolkit
PAPER: Distributed Representations of Words and Phrases and their Compositionality
PAPER: Generating Sequences with Recurrent Neural Networks
PAPER: Character-Aware Neural Language Models
THESIS: Statistical Language Models Based on Neural Networks
DATA: Penn Treebank
TUTORIAL: TensorFlow Tutorial on Language Modeling with Recurrent Neural Networks
Language Recognition
Lemmatisation
WIKI: Lemmatisation
PAPER: Joint Lemmatization and Morphological Tagging with LEMMING
TOOLKIT: WordNet Lemmatizer
DATA: Treebank-3
Lip-reading
WIKI: Lip reading
PAPER: LipNet: End-to-End Sentence-level Lipreading
PAPER: Lip Reading Sentences in the Wild
PAPER: Large-Scale Visual Speech Recognition
PROJECT: Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
PRODUCT: Liopa
DATA: The GRID audiovisual sentence corpus
DATA: The BBC-Oxford 'Multi-View Lip Reading Sentences' (MV-LRS) Dataset
Machine Translation
PAPER: Neural Machine Translation by Jointly Learning to Align and Translate
PAPER: Neural Machine Translation in Linear Time
PAPER: Attention Is All You Need
PAPER: Six Challenges for Neural Machine Translation
PAPER: Phrase-Based & Neural Unsupervised Machine Translation
CHALLENGE: ACL 2014 Ninth Workshop on Statistical Machine Translation
CHALLENGE: EMNLP 2017 Second Conference on Machine Translation (WMT17)
DATA: OpenSubtitles2016
DATA: WIT3: Web Inventory of Transcribed and Translated Talks
DATA: The QCRI Educational Domain (QED) Corpus
PAPER: Multi-task Sequence to Sequence Learning
PAPER: Unsupervised Pretraining for Sequence to Sequence Learning
PAPER: Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
TOOLKIT: Subword Neural Machine Translation with Byte Pair Encoding (BPE)
TOOLKIT: Multi-Way Neural Machine Translation
TOOLKIT: OpenNMT: Open-Source Toolkit for Neural Machine Translation
Morphological Inflection Generation
WIKI: Inflection
PAPER: Morphological Inflection Generation Using Character Sequence to Sequence Learning
CHALLENGE: SIGMORPHON 2016 Shared Task: Morphological Reinflection
DATA: sigmorphon2016
Named Entity Disambiguation
Named Entity Recognition
WIKI: Named-entity recognition
PAPER: Neural Architectures for Named Entity Recognition
PROJECT: OSU Twitter NLP Tools
CHALLENGE: Named Entity Recognition in Twitter
CHALLENGE: CoNLL 2002 Language-Independent Named Entity Recognition
CHALLENGE: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
DATA: CoNLL-2002 NER corpus
DATA: CoNLL-2003 NER corpus
DATA: NUT Named Entity Recognition in Twitter Shared task
TOOLKIT: Stanford Named Entity Recognizer
Paraphrase Detection
PAPER: Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
PROJECT: Paralex: Paraphrase-Driven Learning for Open Question Answering
CHALLENGE: SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter
DATA: Microsoft Research Paraphrase Corpus
DATA: Microsoft Research Video Description Corpus
DATA: Pascal Dataset
DATA: Flickr Dataset
DATA: The SICK data set
DATA: PPDB: The Paraphrase Database
DATA: WikiAnswers Paraphrase Corpus
Paraphrase Generation
PAPER: Neural Paraphrase Generation with Stacked Residual LSTM Networks
DATA: Neural Paraphrase Generation with Stacked Residual LSTM Networks
CODE: Neural Paraphrase Generation with Stacked Residual LSTM Networks
PAPER: A Deep Generative Framework for Paraphrase Generation
PAPER: Paraphrasing Revisited with Neural Machine Translation
Parsing
WIKI: Parsing
TOOLKIT: The Stanford Parser: A statistical parser
TOOLKIT: spaCy parser
PAPER: Grammar as a Foreign Language
PAPER: A fast and accurate dependency parser using neural networks
PAPER: Universal Semantic Parsing
CHALLENGE: CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
CHALLENGE: CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
CHALLENGE: CoNLL 2015 Shared Task: Shallow Discourse Parsing
CHALLENGE: SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!
Part-of-speech Tagging
WIKI: Part-of-speech tagging
PAPER: Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
PAPER: Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
DATA: Treebank-3
TOOLKIT: nltk.tag package
Pinyin-To-Chinese Conversion
WIKI: Pinyin input method
PAPER: Neural Network Language Model for Chinese Pinyin Input Method Engine
PROJECT: Neural Chinese Transliterator
Question Answering
WIKI: Question answering
PAPER: Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
PAPER: Dynamic Memory Networks for Visual and Textual Question Answering
CHALLENGE: TREC Question Answering Task
CHALLENGE: NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
CHALLENGE: CLEF Question Answering Track
CHALLENGE: SemEval-2017 Task 3: Community Question Answering
CHALLENGE: SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge
DATA: MS MARCO: Microsoft MAchine Reading COmprehension Dataset
DATA: Maluuba NewsQA
DATA: SQuAD: 100,000+ Questions for Machine Comprehension of Text
DATA: GraphQuestions: A Characteristic-rich Question Answering Dataset
DATA: Story Cloze Test and ROCStories Corpora
DATA: Microsoft Research WikiQA Corpus
DATA: DeepMind Q&A Dataset
DATA: QASent
DATA: Textbook Question Answering
Relationship Extraction
WIKI: Relationship extraction
PAPER: A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm
CHALLENGE: SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers
Semantic Role Labeling
WIKI: Semantic role labeling
BOOK: Semantic Role Labeling
PAPER: End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
PAPER: Neural Semantic Role Labeling with Dependency Path Embeddings
PAPER: Deep Semantic Role Labeling: What Works and What's Next
CHALLENGE: CoNLL-2005 Shared Task: Semantic Role Labeling
CHALLENGE: CoNLL-2004 Shared Task: Semantic Role Labeling
TOOLKIT: Illinois Semantic Role Labeler (SRL)
DATA: CoNLL-2005 Shared Task: Semantic Role Labeling
Sentence Boundary Disambiguation
WIKI: Sentence boundary disambiguation
PAPER: A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
TOOLKIT: NLTK Tokenizers
DATA: The British National Corpus
DATA: Switchboard-1 Telephone Speech Corpus
Sentiment Analysis
WIKI: Sentiment analysis
INFO: Awesome Sentiment Analysis
CHALLENGE: Kaggle: UMICH SI650 - Sentiment Classification
CHALLENGE: SemEval-2017 Task 4: Sentiment Analysis in Twitter
CHALLENGE: SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
PROJECT: SenticNet
PROJECT: Stanford NLP Group Sentiment Analysis
DATA: Multi-Domain Sentiment Dataset (version 2.0)
DATA: Stanford Sentiment Treebank
DATA: Twitter Sentiment Corpus
DATA: Twitter Sentiment Analysis Training Corpus
DATA: AFINN: List of English words rated for valence
Sign Language Recognition/Translation
PAPER: Video-based Sign Language Recognition without Temporal Segmentation
PAPER: SubUNets: End-to-end Hand Shape and Continuous Sign Language Recognition
DATA: RWTH-PHOENIX-Weather
DATA: ASLLRP
PROJECT: SignAll
Singing Voice Synthesis
PAPER: Singing voice synthesis based on deep neural networks
PAPER: A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
PRODUCT: VOCALOID: voice synthesis technology and software developed by Yamaha
CHALLENGE: Special Session Interspeech 2016 Singing synthesis challenge "Fill-in the Gap"
Social Science Applications
WORKSHOP: NLP+CSS: Workshops on Natural Language Processing and Computational Social Science
TOOLKIT: Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
TOOLKIT: Online Variational Bayes for Latent Dirichlet Allocation (LDA)
GROUP: The University of Chicago Knowledge Lab
Source Separation
WIKI: Source separation
PAPER: From Blind to Guided Audio Source Separation
PAPER: Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
CHALLENGE: Signal Separation Evaluation Campaign (SiSEC)
CHALLENGE: CHiME Speech Separation and Recognition Challenge
Speaker Authentication
Speaker Diarisation
WIKI: Speaker diarisation
PAPER: DNN-based speaker clustering for speaker diarisation
PAPER: Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
PAPER: Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
CHALLENGE: Rich Transcription Evaluation
Speaker Recognition
WIKI: Speaker recognition
PAPER: A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network
PAPER: Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification
PAPER: Deep Speaker: an End-to-End Neural Speaker Embedding System
PROJECT: Voice Vector: which of the Hollywood stars is most similar to my voice?
CHALLENGE: NIST Speaker Recognition Evaluation (SRE)
INFO: Are there any suggestions for free databases for speaker recognition?
DATA: VoxCeleb2: Deep Speaker Recognition
Speech Reading
- See Lip-reading
Speech Recognition
Speech Segmentation
WIKI: Speech segmentation
PAPER: Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
PAPER: Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
PAPER: Unsupervised Lexicon Discovery from Acoustic Input
PAPER: Weakly supervised spoken term discovery using cross-lingual side information
DATA: CALLHOME Spanish Speech
Speech Synthesis
WIKI: Speech synthesis
PAPER: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
PAPER: WaveNet: A Generative Model for Raw Audio
PAPER: Tacotron: Towards End-to-End Speech Synthesis
PAPER: Deep Voice 3: 2000-Speaker Neural Text-to-Speech
PAPER: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
DATA: The World English Bible
DATA: LJ Speech Dataset
DATA: Lessac Data
CHALLENGE: Blizzard Challenge 2017
PRODUCT: Lyrebird
PROJECT: The Festvox project
TOOLKIT: Merlin: The Neural Network (NN) based Speech Synthesis System
Speech Enhancement
WIKI: Speech enhancement
BOOK: Speech enhancement: theory and practice
PAPER: An Experimental Study on Speech Enhancement Based on Deep Neural Network
PAPER: A Regression Approach to Speech Enhancement Based on Deep Neural Networks
PAPER: Speech Enhancement Based on Deep Denoising Autoencoder
Speech-To-Text
Spoken Term Detection
Stemming
WIKI: Stemming
PAPER: A Backpropagation Neural Network to Improve Arabic Stemming
TOOLKIT: NLTK Stemmers
Term Extraction
WIKI: Terminology extraction
PAPER: Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection
Text Similarity
WIKI: Semantic similarity
PAPER: A Survey of Text Similarity Approaches
PAPER: Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
PAPER: Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
CHALLENGE: SemEval-2014 Task 3: Cross-Level Semantic Similarity
CHALLENGE: SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
CHALLENGE: SemEval-2017 Task 1: Semantic Textual Similarity
WIKI: Semantic Textual Similarity Wiki
Text Simplification
WIKI: Text simplification
PAPER: Aligning Sentences from Standard Wikipedia to Simple Wikipedia
PAPER: Problems in Current Text Simplification Research: New Data Can Help
DATA: Newsela Data
Text-To-Speech
- See Speech Synthesis
Textual Entailment
WIKI: Textual entailment
PROJECT: Textual Entailment with TensorFlow
PAPER: Textual Entailment with Structured Attentions and Composition
CHALLENGE: SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
CHALLENGE: SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge
Transliteration
WIKI: Transliteration
INFO: Transliteration of Non-Latin scripts
PAPER: A Deep Learning Approach to Machine Transliteration
CHALLENGE: NEWS 2016 Shared Task on Transliteration of Named Entities
PROJECT: Neural Japanese Transliteration: can you do better than SwiftKey™ Keyboard?
Voice Conversion
PAPER: Phonetic Posteriorgrams for Many-to-One Voice Conversion Without Parallel Data Training
PROJECT: Deep neural networks for voice conversion (voice style transfer) in Tensorflow
PROJECT: An implementation of voice conversion system utilizing phonetic posteriorgrams
CHALLENGE: Voice Conversion Challenge 2016
CHALLENGE: Voice Conversion Challenge 2018
DATA: CMU_ARCTIC speech synthesis databases
DATA: TIMIT Acoustic-Phonetic Continuous Speech Corpus
Voice Recognition
Word Embeddings
WIKI: Word embedding
TOOLKIT: Gensim: word2vec
TOOLKIT: fastText
TOOLKIT: GloVe: Global Vectors for Word Representation
INFO: Where to get a pretrained model
PROJECT: Pre-trained word vectors
PROJECT: Pre-trained word vectors of 30+ languages
PROJECT: Polyglot: Distributed word representations for multilingual NLP
PROJECT: BPEmb: a collection of pre-trained subword embeddings in 275 languages
CHALLENGE: SemEval 2018 Task 10: Capturing Discriminative Attributes
PAPER: Bilingual Word Embeddings for Phrase-Based Machine Translation
PAPER: A Survey of Cross-Lingual Embedding Models
Word Prediction
INFO: What is Word Prediction?
PAPER: The prediction of character based on recurrent neural network language model
PAPER: An Embedded Deep Learning based Word Prediction
PAPER: Evaluating Word Prediction: Framing Keystroke Savings
DATA: An Embedded Deep Learning based Word Prediction
PROJECT: Word Prediction using Convolutional Neural Networks: can you do better than iPhone™ Keyboard?
CHALLENGE: SemEval-2018 Task 2: Multilingual Emoji Prediction
Word Segmentation
WIKI: Word segmentation
PAPER: Neural Word Segmentation Learning for Chinese
PROJECT: Convolutional neural network for Chinese word segmentation
TOOLKIT: Stanford Word Segmenter
TOOLKIT: NLTK Tokenizers