Statistical Rethinking: A Bayesian Course Using python and pymc3
Intro
Hello everybody!
This repo contains the python/pymc3 version of the Statistical Rethinking course that Professor Richard McElreath taught at the Max Planck Institute for Evolutionary Anthropology in Leipzig during the Winter of 2019/2020. The original repo for the course, from which this repo is forked, can be found here.
The course contains 20 lectures structured in 10 weeks, with a series of assignments for each week. The original homework was done using the rethinking package and ulam, a wrapper of rstan for R. The course is an excellent introduction to bayesian modelling in general and to Professor McElreath's wonderful book, Statistical Rethinking. It is really great: entertaining, eye-opening and very instructive.
I started watching the lectures and doing the homework, but since I tend to prefer python to R, I also started re-doing all the homework using pymc3, a popular python library for bayesian modelling that uses theano as its backend. After I finished the course, I thought I should make the jupyter notebooks public, just in case somebody finds them useful. This repo is a love letter to the course, which I have enjoyed so very much, and to the work of Professor McElreath. Thank you, Richard, for inspiring a generation of scientists.
How to use this repo
There are ten jupyter notebooks, one for each week of the course. At the beginning of each notebook there are links to the YouTube videos of the lectures, the slides used and the original homework questions and answers in R. I have put all the material together in the notebooks, so you only have to follow one document at a time. Each notebook therefore contains four things:
1. Original exercises proposed
2. Original answers given by Professor McElreath. By this I mean only the text, not the code
3. python code that provides solutions to the exercises
4. Brief comments made by me on differences of implementation between R and python, or tips/tricks of pymc3 that I learned along the way
Points 1. and 2. are written in normal text and contain minimal editing on my part to match my code. These sections were written by Professor McElreath and I kept them as they were in the original course. Points 3. and 4. are my humble contribution. The code is easy to identify, and my comments (point 4.) are always written in italics so they can be clearly distinguished from Professor McElreath's words. I kept them to a minimum, but sometimes there are things to clarify, useful tips to share or common mistakes to point out.
This is how I would use this repo:
- Go to the notebook of the week (from 1 to 10).
- Watch the two videos for the lectures of that week (at the very top of each notebook).
- Read the original problems presented to the students and try to solve them on your own (for real! try it!).
- Follow the exercise solutions in the notebook, with my code and Professor McElreath's explanations.
Technical considerations
I ran the jupyter notebooks on a fairly humble machine running python 3.6. All the libraries needed are imported at the top of each notebook, as usual. There are not that many: the usual suspects such as pandas, numpy or matplotlib. For the actual modelling I used theano and pymc3, and for plotting mostly altair. I used pymc3 3.7, which was the latest version at the time. I chose pymc3 3.7 specifically because of the new Data class, which is only available from this version onward. I explain in detail the advantages of being able to use this new class in one of the notebooks.
Other useful resources
There are a lot of very useful resources for bayesian statistical modelling out there. Among those centered specifically on Professor McElreath's work, I would mention:
- Original repo for the course.
- Original rethinking package repo.
- The pymc3 repo contains a resources section where you can find the exercises for the first edition of the Statistical Rethinking book (the book, not the course) done in pymc3. It's a bit outdated but still a very good resource.
- A. Solomon Kurz rewrote the exercises of the whole book using a great R package called brms. You can find this extensive and amazing work here and here.
Notebooks
Finally, since GitHub sometimes has issues rendering Jupyter notebooks, you can view them via nbviewer at the following links. In the repo, they are in the /notebooks/pymc3 folder.
Week 1 notebook: The Golem of Prague and Garden of Forking Data
Week 2 notebook: Geocentric Models and Wiggly Orbits
Week 3 notebook: Spurious Waffles and Haunted DAG
Week 4 notebook: Ulysses' Compass and Model Comparison
Week 5 notebook: Conditional Manatees and Markov Chain Monte Carlo
Week 6 notebook: Maximum entropy & GLMs and God Spiked the Integers (binomial & Poisson GLMs)
Week 7 notebook: Monsters & Mixtures (Poisson GLMs, survival, zero-inflation) and Ordered Categories, Left & Right
Week 8 notebook: Multilevel Models and Multilevel Models 2
Week 9 notebook: Adventures in Covariance and Slopes, Instruments and Social Relations
Week 10 notebook: Gaussian Processes and Missing Values and Measurement Error