<div align="center"> <h1>Awesome Online Machine Learning</h1> <a href="https://github.com/sindresorhus/awesome"><img src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"/></a> </div>

Online machine learning is a subset of machine learning where data arrives sequentially. In contrast to the more traditional batch learning, online learning methods update themselves incrementally with one data point at a time.
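To make the contrast with batch learning concrete, here is a minimal sketch in plain Python. It shows a linear regression model that takes one stochastic gradient step per incoming (x, y) pair and never revisits past data. The function names, the learning rate and the synthetic stream are illustrative assumptions and are not taken from any library listed here.

```python
import random


def sgd_online_regression(stream, n_features, lr=0.05):
    """Fit y ~ w.x + b with one SGD step per (x, y) pair; past samples are never stored."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = y_hat - y  # derivative of 0.5 * (y_hat - y) ** 2 with respect to y_hat
        for i in range(n_features):
            w[i] -= lr * err * x[i]
        b -= lr * err
    return w, b


def synthetic_stream(n, seed=0):
    """Hypothetical noise-free stream generated from y = 3*x0 - 2*x1 + 0.5."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 3.0 * x[0] - 2.0 * x[1] + 0.5


w, b = sgd_online_regression(synthetic_stream(20_000), n_features=2)
print([round(v, 2) for v in w], round(b, 2))
```

Each update touches only the current sample, so memory stays constant however long the stream runs. The price is sensitivity to the learning rate.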
Courses and books
- Machine Learning for Streaming Data with Python
- IE 498: Online Learning and Decision Making
- Introduction to Online Learning
- Machine Learning the Feature — Gives some insights into the inner workings of Vowpal Wabbit, especially the slides on online linear learning.
- Machine learning for data streams with practical examples in MOA
- Online Methods in Machine Learning (MIT)
- Streaming 101: The world beyond batch
- Prediction, Learning, and Games
- Introduction to Online Convex Optimization
- Reinforcement Learning and Stochastic Optimization: A unified framework for sequential decisions — The entire book builds on the online learning paradigm for applied learning and optimization problems; Chapter 3, "Online learning", is the key reference.
- Big Data course at the CILVR lab at NYU — Focuses on linear models and bandits. Some lectures are given by John Langford, the creator of Vowpal Wabbit.
- Machine Learning for Personalization — Course from Columbia by Tony Jebara that covers bandits.
- An Introduction to Online Learning
- Streaming Data Analytics - Course from Politecnico di Milano.
Blog posts
- Fennel AI blog posts about online recommender systems
- Anomaly Detection with Bytewax & Redpanda (Bytewax, 2022)
- The online machine learning predict/fit switcheroo (Max Halford, 2022)
- Real-time machine learning: challenges and solutions (Chip Huyen, 2022)
- Anomalies detection using River (Matias Aravena Gamboa, 2021)
- Introdução (não-extensiva) a Online Machine Learning, "A (non-exhaustive) introduction to online machine learning", in Portuguese (Saulo Mastelini, 2021)
- Machine learning is going real-time (Chip Huyen, 2020)
- The correct way to evaluate online machine learning models (Max Halford, 2020)
- What is online machine learning? (Max Pagels, 2018)
- What Is It and Who Needs It (Data Science Central, 2015)
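The evaluation-related posts above deal with the order in which an online model predicts and learns. Below is a minimal sketch of progressive validation, also called test-then-train. It assumes a toy running-mean model whose learn_one/predict_one method names simply mirror River's convention. The model has to predict on a sample before it learns from it. Otherwise it has already seen the label, and the reported error is optimistically biased.

```python
class RunningMean:
    """Toy online regressor: predicts the mean of the targets seen so far (0.0 at first)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def progressive_mae(stream, model, learn_first=False):
    """Mean absolute error over a stream; learn_first=True shows the leaky ordering."""
    total, n = 0.0, 0
    for x, y in stream:
        if learn_first:
            model.learn_one(x, y)  # leaky: the model has already seen y
        y_pred = model.predict_one(x)
        if not learn_first:
            model.learn_one(x, y)  # correct: learn only after predicting
        total += abs(y - y_pred)
        n += 1
    return total / n if n else float("nan")


data = [(None, 2.0), (None, 4.0), (None, 6.0), (None, 8.0)]
print(progressive_mae(data, RunningMean()))                    # 2.75
print(progressive_mae(data, RunningMean(), learn_first=True))  # 1.5
```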
Software
See more here.
Modelling
- River — A Python library for general purpose online machine learning.
- dask
- Jubatus
- Flink ML - Apache Flink machine learning library
- LIBFFM — A Library for Field-aware Factorization Machines
- LIBLINEAR — A Library for Large Linear Classification
- LIBOL — A collection of online linear models trained with first- and second-order gradient descent methods. Not maintained.
- MOA
- scikit-learn — Some of scikit-learn's estimators can handle incremental updates through their partial_fit method, although this is usually intended for mini-batch learning. See also the "Computing with scikit-learn" page.
- Spark Streaming — Doesn't do online learning per se, but instead groups the data into mini-batches over fixed intervals of time.
- SofiaML
- StreamDM — A machine learning library on top of Spark Streaming.
- Tornado
- VFML
- Vowpal Wabbit
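The Spark Streaming entry contrasts per-instance online learning with micro-batching. Below is a minimal sketch of micro-batching. It assumes time-ordered (timestamp, value) events with integer timestamps. Events are buffered into consecutive fixed-length (tumbling) windows, and each window would then be handed to the model as one mini-batch. This is a plain-Python illustration of the idea, not Spark's API.

```python
def micro_batches(events, interval):
    """Group time-ordered (timestamp, value) events into consecutive fixed-length windows.

    Windows are aligned to multiples of `interval`; a window with no events
    yields an empty list.
    """
    batch, window_end = [], None
    for ts, value in events:
        if window_end is None:
            window_end = (ts // interval + 1) * interval
        while ts >= window_end:  # close every window the new event has moved past
            yield batch
            batch = []
            window_end += interval
        batch.append(value)
    if batch:
        yield batch


events = [(1, "a"), (4, "b"), (12, "c"), (35, "d")]
print(list(micro_batches(events, interval=10)))  # [['a', 'b'], ['c'], [], ['d']]
```

An instance-incremental learner would instead update after every single event. Under micro-batching, the model sees nothing until its window closes.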
Deployment
- KappaML
- django-river-ml — a Django plugin for deploying River models
- chantilly — a prototype meant to be compatible with River (previously Creme)
Papers
Linear models
- Field-aware Factorization Machines for CTR Prediction (2016)
- Practical Lessons from Predicting Clicks on Ads at Facebook (2014)
- Ad Click Prediction: a View from the Trenches (2013)
- Normalized online learning (2013)
- Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent (2011)
- Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization (2010)
- Adaptive Regularization of Weight Vectors (2009)
- Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty (2009)
- Confidence-Weighted Linear Classification (2008)
- Exact Convex Confidence-Weighted Learning (2008)
- Logarithmic Regret Algorithms for Online Convex Optimization (2007)
- Online Passive-Aggressive Algorithms (2006)
- A Second-Order Perceptron Algorithm (2005)
- Online Learning with Kernels (2004)
- Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms (2004)
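As an example of the update rules in this family, here is a minimal sketch of the PA-I variant from Online Passive-Aggressive Algorithms. It handles binary classification with labels in {-1, +1}. The weights stay unchanged when the hinge loss is zero (passive). Otherwise they take the smallest step that would restore a unit margin, capped by the aggressiveness parameter C. Vectors are plain lists, and the function name is hypothetical.

```python
def pa1_fit(stream, n_features, C=1.0):
    """PA-I for binary labels y in {-1, +1}; returns the weight vector."""
    w = [0.0] * n_features
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        loss = max(0.0, 1.0 - margin)  # hinge loss
        sq_norm = sum(xi * xi for xi in x)
        if loss > 0.0 and sq_norm > 0.0:  # passive when the loss is zero
            tau = min(C, loss / sq_norm)  # aggressive step, capped by C
            for i in range(n_features):
                w[i] += tau * y * x[i]
    return w


data = [([2.0, 1.0], 1), ([-1.0, -2.0], -1), ([1.5, 2.0], 1), ([-2.0, -0.5], -1)]
w = pa1_fit(data, n_features=2)
print(w)
```

On this toy data, a single pass already classifies every point correctly. The last two points trigger no update at all because their margin already exceeds 1.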
Support vector machines
- Pegasos: Primal Estimated sub-GrAdient SOlver for SVM (2007)
- A New Approximate Maximal Margin Classification Algorithm (2001)
- The Relaxed Online Maximum Margin Algorithm (2000)
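Below is a minimal sketch of the Pegasos update for a linear SVM without a bias term. At step t the learning rate is 1/(λt) and the weights are shrunk by the regularizer. A hinge-loss subgradient step is added only when the sampled point violates the unit margin. The paper also describes an optional projection onto the ball of radius 1/√λ, which is omitted here. The sampling scheme and names are illustrative assumptions.

```python
import random


def pegasos(samples, lam=0.1, n_iters=1000, seed=0):
    """Pegasos for a linear SVM without bias; labels y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    for t in range(1, n_iters + 1):
        x, y = rng.choice(samples)  # one randomly drawn example per step
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))  # uses w before the update
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient of (lam/2)*||w||^2
        if margin < 1.0:  # hinge loss active: add the subgradient step
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


data = [([2.0, 1.0], 1), ([1.0, 2.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
w = pegasos(data)
print(all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in data))  # True
```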
Neural networks
Decision trees
- AMF: Aggregated Mondrian Forests for Online Learning (2019)
- Mondrian Forests: Efficient Online Random Forests (2014)
- Mining High-Speed Data Streams (2000)
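Mining High-Speed Data Streams introduces the Hoeffding tree (VFDT). A leaf decides whether it has seen enough samples to split by using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)). Here R is the range of the split criterion (log2 of the number of classes for information gain), δ is the allowed error probability, and n is the number of samples at the leaf. The leaf splits when the best attribute's gain beats the runner-up by more than ε, or when ε falls below a tie-breaking threshold τ. A minimal sketch of that test follows; the default τ below is an illustrative choice.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon of the sample mean w.p. 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """VFDT-style split test at a leaf that has seen n samples."""
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is clearly better, or if the two are so close
    # that waiting longer is not worth it (eps already below the tie threshold).
    return best_gain - second_gain > eps or eps < tie_threshold


# Two classes, so information gain ranges over [0, log2(2)] = [0, 1].
print(round(hoeffding_bound(1.0, 1e-7, 200), 3))  # 0.201
print(should_split(0.30, 0.05, 1.0, 1e-7, 200))    # True
print(should_split(0.30, 0.25, 1.0, 1e-7, 200))    # False
```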
Unsupervised learning
- Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking (2022)
- Online hierarchical clustering approximations (2019)
- DeepWalk: Online Learning of Social Representations (2014)
- Online Learning with Random Representations (2014)
- Online Latent Dirichlet Allocation with Infinite Vocabulary (2013)
- Web-Scale K-Means Clustering (2010)
- Online Dictionary Learning For Sparse Coding (2009)
- Density-Based Clustering over an Evolving Data Stream with Noise (2006)
- Online and Batch Learning of Pseudo-Metrics (2004)
- BIRCH: an efficient data clustering method for very large databases (1996)
- Knowledge Acquisition Via Incremental Conceptual Clustering (1987)
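Web-Scale K-Means Clustering describes mini-batch k-means. Each center keeps a count of the points assigned to it and moves toward each new point with a per-center learning rate of 1/count, so it tracks a running mean of its points. The minimal sketch below passes the initial centers in explicitly for reproducibility, whereas the paper picks them at random. It also assumes batches are drawn with replacement. Names and defaults are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(centers, x):
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], x))


def mini_batch_kmeans(data, init_centers, batch_size=10, n_iters=100, seed=0):
    """Mini-batch k-means with per-center learning rates of 1 / (points assigned so far)."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(n_iters):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign the whole batch against the centers as they were before this batch.
        assigned = [nearest(centers, x) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1.0 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers


points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
centers = mini_batch_kmeans(points, init_centers=[(0, 0), (10, 10)])
print([[round(v, 1) for v in c] for c in centers])
```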
Time series
Drift detection
Anomaly detection
- Leveraging the Christoffel-Darboux Kernel for Online Outlier Detection (2022)
- Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams (2020)
- Fast Anomaly Detection for Streaming Data (2011)
Metric learning
- Online Metric Learning and Fast Similarity Search (2009)
- Information-Theoretic Metric Learning (2007)
- Online and Batch Learning of Pseudo-Metrics (2004)
Graph theory
Ensemble models
- Optimal and Adaptive Algorithms for Online Boosting (2015) — An implementation is available here
- Online Bagging and Boosting (2001)
- A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting (1997)
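Online Bagging and Boosting replaces bootstrap resampling, which needs the whole dataset, with a per-example weight. Each base model trains on every incoming example k times, with k drawn from a Poisson(1) distribution. This is the limiting distribution of how often an item appears in a bootstrap sample of a large dataset. The minimal sketch below assumes a toy running-mean regressor as the base model and averages the models' outputs for prediction. The Poisson sampler uses Knuth's method because Python's standard random module has no Poisson generator.

```python
import math
import random


def poisson(rng, lam=1.0):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class RunningMean:
    """Toy base learner: predicts the mean of the targets it has been trained on."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def predict_one(self, x):
        return self.mean


class OnlineBagging:
    def __init__(self, n_models=10, seed=0):
        self.rng = random.Random(seed)
        self.models = [RunningMean() for _ in range(n_models)]

    def learn_one(self, x, y):
        for model in self.models:
            for _ in range(poisson(self.rng)):  # k ~ Poisson(1) copies of this example
                model.learn_one(x, y)

    def predict_one(self, x):
        return sum(m.predict_one(x) for m in self.models) / len(self.models)


bag = OnlineBagging(n_models=5)
for i in range(200):
    bag.learn_one(None, float(i % 10))
print([m.n for m in bag.models])  # per-model number of training copies
print(round(bag.predict_one(None), 2))
```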
Expert learning
Active learning
Miscellaneous
- StreamAI: Dealing with Challenges of Continual Learning Systems for Serving AI in Production (2023)
- Multi-Output Chain Models and their Application in Data Streams (2019)
- A Complete Recipe for Stochastic Gradient MCMC (2015)
- Online EM Algorithm for Latent Data Models (2007) — Source code is available here
Surveys
- Machine learning for streaming data: state of the art, challenges, and opportunities (2019)
- Online Learning: A Comprehensive Survey (2018)
- Online Machine Learning in Big Data Streams (2018)
- Incremental learning algorithms and applications (2016)
- Batch-Incremental versus Instance-Incremental Learning in Dynamic and Evolving Data
- Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey (2011)
- Online Learning and Stochastic Approximations (1998)
General-purpose algorithms
- Maintaining Sliding Window Skylines on Data Streams (2006)
- The Sliding DFT (2003) — An online variant of the Fourier transform; a concise explanation is available here
- Sketching Algorithms for Big Data
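The sliding DFT keeps all N bins of a length-N window up to date with O(N) work per new sample, instead of recomputing an O(N log N) FFT each time. When x_new enters the window and x_old leaves it, each bin follows X_k ← (X_k + x_new − x_old) · e^{j2πk/N}. The minimal sketch below assumes the window starts zero-filled, so every bin starts at zero. Class and method names are hypothetical.

```python
import cmath
from collections import deque


class SlidingDFT:
    """Maintain the N-point DFT of the most recent N samples, O(N) work per sample."""

    def __init__(self, n):
        self.n = n
        self.window = deque([0.0] * n, maxlen=n)  # zero-filled, so every bin starts at 0
        self.bins = [0j] * n
        # e^{j*2*pi*k/N}: shifts each bin's phase reference by one sample
        self.twiddles = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

    def update(self, x_new):
        x_old = self.window[0]     # sample about to leave the window
        self.window.append(x_new)  # maxlen drops x_old automatically
        delta = x_new - x_old
        self.bins = [(b + delta) * t for b, t in zip(self.bins, self.twiddles)]
        return self.bins


def direct_dft(xs):
    """Reference O(N^2) DFT, oldest sample at index 0."""
    n = len(xs)
    return [sum(xs[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)) for k in range(n)]


sdft = SlidingDFT(8)
signal = [((i * 7) % 5) - 2.0 + 0.5 * i for i in range(30)]
for s in signal:
    bins = sdft.update(s)
print(max(abs(a - b) for a, b in zip(bins, direct_dft(signal[-8:]))) < 1e-9)  # True
```

Rounding errors accumulate in the recursion. For that reason, long-running implementations commonly add a damping factor slightly below 1 to keep it stable.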