NLP-IN-PRACTICE

Use these NLP, text mining and machine learning code samples and tools to solve real-world text data problems.

Notebooks / Source

Links in the first column take you to the subfolder/repository with the source code.

| Task | Related Article | Source Type | Description |
|---|---|---|---|
| Large Scale Phrase Extraction | phrase2vec article | python script | Extract phrases from large amounts of data using PySpark. Annotate text with these phrases or use them for other downstream tasks. |
| Word Cloud for Jupyter Notebook and Python Web Apps | word_cloud article | python script + notebook | Visualize top keywords using word counts or TF-IDF. |
| Gensim Word2Vec (with dataset) | word2vec article | notebook | How to work correctly with Word2Vec to get the desired results. |
| Reading files and word count with Spark | spark article | python script | How to read files of different formats using PySpark, with a word count example. |
| Extracting Keywords with TF-IDF and SKLearn (with dataset) | tfidf article | notebook | How to extract interesting keywords from text using TF-IDF and Python's scikit-learn. |
| Text Preprocessing | text preprocessing article | notebook | A few code snippets showing how to perform text preprocessing, including stemming, noise removal, lemmatization and stop word removal. |
| TfidfTransformer vs. TfidfVectorizer | TfidfTransformer and TfidfVectorizer usage article | notebook | How to use TfidfTransformer and TfidfVectorizer correctly, how the two differ, and when to use which. |
| Accessing Pre-trained Word Embeddings with Gensim | Pre-trained word embeddings article | notebook | How to access pre-trained GloVe and Word2Vec embeddings using Gensim, with an example of how these embeddings can be used for text similarity. |
| Text Classification in Python (with news dataset) | Text classification with Logistic Regression article | notebook | Get started with text classification. Learn how to build and evaluate a news classifier using Logistic Regression. |
| CountVectorizer Usage Examples | How to Correctly Use CountVectorizer? An In-Depth Look article | notebook | Learn how to get more out of CountVectorizer: not just counting words, but also preprocessing your text appropriately and extracting additional features from your text dataset. |
| HashingVectorizer Examples | HashingVectorizer Vs. CountVectorizer article | notebook | Learn the differences between HashingVectorizer and CountVectorizer and when to use which. |
| CBOW vs. SkipGram | Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI article | notebook | A quick comparison of the three embedding architectures. |

Notes

Contact

This repository is maintained by Kavita Ganesan. Connect with me on LinkedIn or Twitter.