Awesome Feature Engineering for Machine Learning
A curated list of resources dedicated to feature engineering techniques for machine learning.
Maintainers - Andrei Khobnia
This page is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Please feel free to create pull requests.
Contents
Numeric Data
Scaling
Ranking
Quantization and Binning
Box-Cox Transformation
- scipy.stats.boxcox
np.log(x + const)
Yeo-Johnson Transformation
Feature Interactions
- Featuretools
- sklearn.preprocessing.PolynomialFeatures
- Divisions
- Other interactions
Clustering Features
t-SNE Features
PCA Features
Textual Data
Bag of Words
- Bag-of-words model
- A Gentle Introduction to the Bag-of-Words Model
- sklearn.feature_extraction.text.CountVectorizer
- sklearn.feature_extraction.DictVectorizer
- sklearn.feature_extraction.FeatureHasher
Phrase Detection Features
TF-IDF
Word Embeddings
- Word embedding
- GloVe: Global Vectors for Word Representation
- Gensim: models.word2vec – Word2vec embeddings
- fastText
- Word2Vec and FastText Word Embedding with Gensim
- Do Pretrained Embeddings Give You The Extra Edge?
Subword Embeddings
Pattern Features
- ClearTK - Feature Extraction Tutorial
- Regular Expressions
Lexicon Features
PoS Features
- Part-of-speech tagging
- NLTK Categorizing and Tagging Words
- How to use PoS features in scikit-learn classifiers
Image Data
Computer Vision Algorithm Features
- Feature extraction and similar image search with OpenCV for newbies
- OpenCV -- Feature Detection and Description
- SimpleCV.Features package
- Scikit-image feature module
Image Statistics Features
OCR Features
Deep Learning Features
- Keras pre-trained models feature extraction
- Using Keras’ Pre-trained Models for Feature Extraction in Image Clustering
Categorical Data
One Hot Encoding
- Why One-Hot Encode Data in Machine Learning?
- How to One Hot Encode Sequence Data in Python
- sklearn.preprocessing.OneHotEncoder
- Keras - to_categorical
Count Encoding
Label Encoding
Dummy Encoding
Mean Encoding
- Likelihood encoding of categorical features
- Python target encoding for categorical features
- Adding variance column when mean encoding
Hashing
- Feature Hashing on Wikipedia
- Feature hashing and Extraction in VowpalWabbit
- Feature hashing in scikit-learn
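As a rough illustration of the hashing trick covered by the links above, here is a minimal sketch that maps string features into a fixed-size vector using only the Python standard library. The helper name `hash_features`, the default of 8 buckets, and the choice of MD5 are assumptions made for this example. scikit-learn's FeatureHasher is built on MurmurHash3, so its output will differ.

```python
import hashlib


def hash_features(tokens, n_buckets=8):
    """Map an iterable of string tokens to a fixed-length list of signed counts.

    Hypothetical helper for illustration only; not a library API.
    """
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # The first 4 bytes of the digest pick the bucket.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        # A further byte picks a sign, so colliding tokens tend to cancel
        # out instead of always adding up.
        sign = 1 if digest[4] % 2 == 0 else -1
        vector[index] += sign
    return vector


features = hash_features(["color=red", "size=large", "color=red"])
```

The vector length stays at `n_buckets` no matter how many distinct categories appear, which is the main appeal of hashing for high-cardinality features. The cost is that unrelated categories can collide in the same bucket.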
Time Series Data
- Automatic extraction of relevant features from time series
- Basic Feature Engineering With Time Series Data in Python
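Lag and rolling-window statistics are among the basic time series features that resources like those above discuss. The following is a minimal standard-library sketch, not taken from any of the linked articles. The helper name `lag_and_rolling_features` and its `lag` and `window` parameters are assumptions made for this example.

```python
def lag_and_rolling_features(series, lag=1, window=3):
    """Return one feature dict per time step.

    Hypothetical helper for illustration only. Positions without enough
    history get None instead of a value.
    """
    rows = []
    for t, value in enumerate(series):
        # Value observed `lag` steps earlier.
        lagged = series[t - lag] if t >= lag else None
        # Mean of the `window` values strictly before t, so the feature
        # never includes the value at t itself.
        if t >= window:
            rolling_mean = sum(series[t - window:t]) / window
        else:
            rolling_mean = None
        rows.append({"value": value, "lag": lagged, "rolling_mean": rolling_mean})
    return rows


rows = lag_and_rolling_features([10, 12, 14, 16, 18])
```

Excluding the current value from the rolling window is a deliberate choice here: when `value` is the prediction target, a window that included it would leak the answer into the features.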