The-NLP-Pandect

<p align="center"> This pandect (πανδέκτης is Ancient Greek for encyclopedia) was created to help you find almost anything related to Natural Language Processing that is available online. </p>

Note Quick legend on available resource types:

⭐ - open source project, usually a GitHub repository with its number of stars

📙 - resource you can read, usually a blog post or a paper

🗂️ - a collection of additional resources

🔱 - closed-source tool, framework, or paid service

🎥️ - a resource you can watch

🎙️ - a resource you can listen to

<p align="center"><b>Table of Contents</b></p>

| 📇 Main Section | 🗃️ Sub-sections Sample |
| --- | --- |
| NLP Resources | Paper Summaries, Conference Summaries, NLP Datasets |
| NLP Podcasts | NLP-only Podcasts, Podcasts with many NLP Episodes |
| NLP Newsletters | - |
| NLP Meetups | - |
| NLP YouTube Channels | - |
| NLP Benchmarks | General NLU, Question Answering, Multilingual |
| Research Resources | Resources on Transformer Models, Distillation and Pruning, Automated Summarization |
| Industry Resources | Best Practices for NLP Systems, MLOps for NLP |
| Speech Recognition | General Resources, Text to Speech, Speech to Text, Datasets |
| Topic Modeling | Blogs, Frameworks, Repositories and Projects |
| Keyword Extraction | TextRank, RAKE, Other Approaches |
| Responsible NLP | NLP and ML Interpretability, Ethics, Bias, and Equality in NLP, Adversarial Attacks for NLP |
| NLP Frameworks | General Purpose, Data Augmentation, Machine Translation, Adversarial Attacks, Dialog Systems & Speech, Entity and String Matching, Non-English Frameworks, Text Annotation |
| Learning NLP | Courses, Books, Tutorials |
| NLP Communities | - |
| Other NLP Topics | Tokenization, Data Augmentation, Named Entity Recognition, Error Correction, AutoML/AutoNLP, Text Generation |

The-NLP-Resources

Note Section keywords: paper summaries, compendium, awesome list

Compendiums and awesome lists on the topic of NLP:

NLP Conferences, Paper Summaries and Paper Compendiums:

Papers and Paper Summaries
Conference Summaries

NLP Progress and NLP Tasks:

NLP Datasets:

Word and Sentence embeddings:

Notebooks, Scripts and Repositories

Non-English resources and Compendiums

Pre-trained NLP models

NLP History

General
2020 Year in Review

The-NLP-Podcasts

🔙 Back to the Table of Contents

NLP-only podcasts

Many NLP episodes

Some NLP episodes

The-NLP-Newsletter

The-NLP-Meetups

The-NLP-Youtube

The-NLP-Benchmarks

🔙 Back to the Table of Contents

General NLU

Summarization

Question Answering

Multilingual and Non-English Benchmarks

Bio, Law, and other scientific domains

Transformer Efficiency

Speech Processing

Other

The-NLP-Research

🔙 Back to the Table of Contents

General

Embeddings

Repositories

Blogs

Cross-lingual Word and Sentence Embeddings

Byte Pair Encoding
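
As a quick illustration of what Byte Pair Encoding does, here is a minimal sketch of the classic merge-learning loop: words start as character sequences plus an end-of-word marker, and the most frequent adjacent symbol pair is merged repeatedly. The function names, the `</w>` marker, and the toy word counts are illustrative assumptions, not taken from any specific library; real tokenizers add pre-tokenization, byte-level fallbacks, and much faster data structures.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Learn an ordered list of merges from a {word: count} dictionary."""
    # Start from single characters plus a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    vocab = {tuple(word) + ("</w>",): 1}
    for pair in merges:
        vocab = merge_pair(pair, vocab)
    return list(next(iter(vocab)))


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)
print(apply_bpe("lowest", merges))
```

Because merges are replayed in the order they were learned, an unseen word such as "lowest" is still split into known subword units instead of becoming an out-of-vocabulary token.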

Transformer-based Architectures

General

Transformer

BERT

Other Transformer Variants

T5
BigBird
Reformer / Linformer / Longformer / Performers
Switch Transformer

GPT-family

General
GPT-3
Learning Resources
Applications
Open-source Efforts

Other

Distillation, Pruning and Quantization

Reading Material
Tools

Automated Summarization

Knowledge Graphs and NLP

The-NLP-Industry

Note Section keywords: best practices, MLOps

🔙 Back to the Table of Contents

Best Practices for building NLP Projects

MLOps for NLP

MLOps, especially when applied to NLP, is a set of best practices for automating the parts of the workflow involved in building and deploying NLP pipelines.

In general, MLOps for NLP includes having the following processes in place:

Additionally, there are two more components that are less prevalent in NLP and are mostly used in Computer Vision and other sub-fields of AI:

MLOps Compilations & Awesome Lists

Reading Material

Learning Material

MLOps Communities

Data Versioning

Experiment Tracking

Model Registry

Automated Testing and Behavioral Testing
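
To make "behavioral testing" concrete, the sketch below shows an invariance test in the spirit of CheckList: a template is filled with different person names, and the model's label is expected to stay the same. `toy_sentiment` is a hypothetical keyword-counting stand-in for a real model, and `invariance_failures` is an illustrative helper, not part of any testing library; in a real pipeline such checks would typically run in CI against the trained model.

```python
def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for a trained classifier: counts a few cue words."""
    positive = {"great", "love", "excellent"}
    negative = {"bad", "hate", "terrible"}
    tokens = text.lower().split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"


def invariance_failures(model, template: str, fillers: list) -> list:
    """Return the fillers whose prediction differs from the one for fillers[0]."""
    predictions = [(f, model(template.format(name=f))) for f in fillers]
    baseline = predictions[0][1]
    return [f for f, label in predictions if label != baseline]


# Swapping the person's name should never change the sentiment label.
failures = invariance_failures(
    toy_sentiment,
    "{name} said the support team was great",
    ["Anna", "Mohammed", "Wei", "Olena"],
)
assert failures == [], f"Invariance violated for: {failures}"
```

The same helper can be pointed at any callable that maps text to a label, which keeps the test independent of how the model is served.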

Model Deployability and Serving

Model Debugging

Model Accuracy Prediction

Data and Model Observability

General
Model Centric
Data Centric

Feature Stores

Metadata Management

MLOps Frameworks

Transformer-based Architectures

🔙 Back to the Table of Contents

General

Multi-GPU Transformers
Training Transformers Effectively

Embeddings as a Service

NLP Recipes and Industrial Applications:

NLP Applications in Bio, Finance, Legal and other industries

The-NLP-Speech

Note Section keywords: speech recognition

🔙 Back to the Table of Contents

General Speech Recognition

Text to Speech / Speech Generation

Speech to Text

Datasets

The-NLP-Topics

Note Section keywords: topic modeling

🔙 Back to the Table of Contents

Blogs

Frameworks for Topic Modeling

Repositories

Keyword-Extraction

Note Section keywords: keyword extraction

🔙 Back to the Table of Contents

TextRank
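
The core idea of TextRank keyword extraction is to build a word co-occurrence graph and rank its nodes with a PageRank-style iteration. The sketch below is a simplified version under stated assumptions: it filters words with a tiny hypothetical stopword list instead of the part-of-speech filter (nouns and adjectives) used in the original method, runs a fixed number of iterations instead of checking convergence, and does not merge adjacent top-ranked words into multi-word phrases.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (original TextRank uses a POS filter).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Undirected co-occurrence graph: link words that appear within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)
    # PageRank-style power iteration; each round reads the previous round's scores.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(textrank_keywords("nlp models, nlp tools, nlp data, nlp papers", top_k=1))
```

Here "nlp" co-occurs with every other word, so it becomes the hub of the graph and receives the highest score.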

RAKE - Rapid Automatic Keyword Extraction
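
RAKE splits text into candidate phrases at stopwords and punctuation, scores each word as degree divided by frequency (degree counts the words it co-occurs with inside candidate phrases, including itself), and scores each phrase as the sum of its word scores. The following is a minimal sketch of that scoring, assuming a small hypothetical stopword list; production implementations use much larger lists and extra filtering.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; real RAKE setups use a much larger one.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "over", "that", "the", "this", "to", "with"}


def rake(text, stopwords=STOPWORDS, top_k=3):
    # 1. Candidate phrases: runs of non-stopwords between punctuation and stopwords.
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    # 2. Word score = degree / frequency.
    freq, degree = Counter(), Counter()
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # 3. Phrase score = sum of its word scores.
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


text = "Compatibility of systems of linear constraints over the set of natural numbers."
print(rake(text, top_k=2))
```

Multi-word candidates such as "linear constraints" and "natural numbers" score higher than single words, which is why RAKE tends to favor longer key phrases.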

Other Approaches

Further Reading

Responsible-NLP

Note Section keywords: ethics, responsible NLP

🔙 Back to the Table of Contents

NLP and ML Interpretability

NLP-centric

General

Ethics, Bias, and Equality in NLP

Adversarial Attacks for NLP

Hate Speech Analysis

The-NLP-Frameworks

Note Section keywords: frameworks

🔙 Back to the Table of Contents

General Purpose

Data Augmentation

Adversarial NLP Attacks & Behavioral Testing

Transformer-oriented

Dialogue Systems and Speech

Word/Sentence-embeddings oriented

Social Media Oriented

Phonetics

Morphology

Multi-lingual tools

Distributed NLP / Multi-GPU NLP

Machine Translation

Entity and String Matching

Discourse Analysis

PII scrubbing

Hashtag Segmentation

Books Analysis / Literary Analysis / Semantic Search

Non-English oriented

Japanese

Thai

Chinese

Ukrainian

Other

Text Data Labelling & Classification

The-NLP-Learning

Note Section keywords: learn NLP

🔙 Back to the Table of Contents

General

Courses

Books

Tutorials

The-NLP-Communities

Other-NLP-Topics

🔙 Back to the Table of Contents

Tokenization

Data Augmentation and Weak Supervision

Libraries and Frameworks
Reading Material and Tutorials

Named Entity Recognition (NER)

Relation Extraction

Coreference Resolution

Sentiment Analysis

Domain Adaptation

Low Resource NLP

Spell Correction / Error Correction

Style Transfer for NLP

Automata Theory for NLP

Obscene words detection

Reddit Analysis

Skill Detection

Reinforcement Learning for NLP

AutoML / AutoNLP

OCR - Optical Character Recognition

Document AI

Text Generation

Title / Headlines Generation

NLP research reproducibility

License CC0

Attributions

Resources

Icons

Fonts


<h3 align="center">The Pandect Series also includes</h3> <p align="middle"> <a href="https://github.com/ivan-bilan/The-Microservices-Pandect"> <img src="https://raw.githubusercontent.com/ivan-bilan/The-Engineering-Manager-Pandect/main/Resources/Images/microservices_pandect_promo.png" width="390" /> </a> &nbsp; &nbsp; &nbsp; <a href="https://github.com/ivan-bilan/The-Engineering-Manager-Pandect"> <img src="https://raw.githubusercontent.com/ivan-bilan/The-Engineering-Manager-Pandect/main/Resources/Images/em_pandect_promo.png" width="370" /> </a> </p>