Foreword
Important Notices
- Since release 0.24, nbhosting no longer relies on cookies to route traffic to the containers.
- Since release 0.30, nbhosting relies on podman and no longer on docker to host containers.
Jupyter notebook hosting architecture
This git repo contains a collection of utilities that together make up the architecture behind nbhosting.inria.fr, a notebook-serving infrastructure.
Use case: MOOCs
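The first use case is hosting notebooks in the context of MOOCs; see e.g. these courses on fun-mooc.fr:
- Python : des fondamentaux à l'utilisation du langage (Python: from fundamentals to using the language)
- Bioinformatique : algorithmes et génomes (Bioinformatics: algorithms and genomes)
- Physique : préparation à l'entrée dans l'enseignement supérieur (Physics: preparing for higher education)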
The m@agistere service also uses this same infrastructure to add notebooks to its moodle-based LMS.
In the classroom
In addition to this "silent" mode, nbhosting can also be used in standalone mode in the classroom; to that end, it offers a few features that provide a thin navigation/structuring layer on top of notebook-oriented contents.
Open-edX teacher side
In fun-mooc/edx mode, on the edx side, the teacher creates a block of type ipython notebook. Note that the present repo does not contain the code for the edx extension that supports this type of block (ref?); as of January 2017, that extension is readily available on fun-mooc.fr; see below for enabling it on a new course.
Open-edX student side
With these settings in place, here is what a student would see:
How does it work?
In a nutshell:
- the first time a student opens a notebook, nbhosting transparently creates an account for them, together with a container;
- the first time a student opens a given notebook, this notebook is copied from the master course contents into their container; note that there are 2 different strategies at work in terms of copying, as explained below; in any case, from that point on, their work on that notebook is independent of the master course;
- containers are automatically stopped (i.e. frozen) when the student has been idle for some tunable amount of time, so as to preserve computing resources; as a consequence, a student may have to wait up to 10 seconds when they show up for the first time or after being idle (i.e. each time a container needs to be respawned).
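The idle policy in the last bullet boils down to a simple rule: stop every running container whose student has not been active for longer than a timeout. The sketch below assumes a hypothetical `Container` record with a last-activity timestamp and a 30-minute default; the README only says the timeout is tunable, and it does not show how nbhosting measures activity or actually stops containers.

```python
from dataclasses import dataclass

# Hypothetical default idle timeout, in seconds (the real value is tunable).
IDLE_TIMEOUT = 30 * 60


@dataclass
class Container:
    name: str             # hypothetical container name
    running: bool
    last_activity: float  # epoch seconds of the student's last activity


def containers_to_stop(containers, now, idle_timeout=IDLE_TIMEOUT):
    """Names of running containers idle for longer than idle_timeout seconds."""
    return [
        c.name
        for c in containers
        if c.running and now - c.last_activity > idle_timeout
    ]


if __name__ == "__main__":
    now = 10_000.0
    fleet = [
        Container("alice", True, now - 60),     # active a minute ago
        Container("bob", True, now - 7200),     # idle for 2 hours
        Container("carol", False, now - 7200),  # already stopped
    ]
    print(containers_to_stop(fleet, now))  # ['bob']
```

A stopped container keeps the student's files, which is why showing up again only costs the respawn delay mentioned above.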
Two additional features are available to students:
- Reset to Original: copy the master version of the notebook into their container again; beware that they will of course lose their work on that notebook.
- Share Static Version: create a read-only snapshot of their notebook, which can then be used to share their work in the course's forum or on their favorite chat system.
Miscellaneous
Enabling New ipython notebook
Before you can add your first notebook-backed content to your edx course as a teacher, you need to enable that extension: in Studio, go to your course's Settings → Advanced, and add ipython to the Advanced Module List setting, as illustrated below:
Workflow / how to publish
The workflow is entirely based on git: a course is defined from a git repo, typically remote (github, gitlab, ...) and public. To publish a new version of your notebooks, push them to that reference repo, and then instruct nbhosting to pull the new contents:
If you put a course in autopull mode, nbhosting performs this pull operation on its own every 5 minutes.
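The autopull rule can be sketched as a periodic check that selects the courses due for a pull. The `Course` record, `courses_due_for_pull` and `pull_command` are hypothetical names; the README does not say how nbhosting schedules this job, only that it runs every 5 minutes. The git invocation itself (`git -C <dir> pull`) is standard.

```python
from dataclasses import dataclass
from typing import Optional

# The README states that autopull runs every 5 minutes.
AUTOPULL_PERIOD = 5 * 60


@dataclass
class Course:
    name: str
    autopull: bool
    last_pull: Optional[float]  # epoch seconds, None if never pulled


def courses_due_for_pull(courses, now, period=AUTOPULL_PERIOD):
    """Names of autopull courses whose last pull is at least `period` seconds old."""
    return [
        c.name
        for c in courses
        if c.autopull and (c.last_pull is None or now - c.last_pull >= period)
    ]


def pull_command(course_dir):
    """git command that updates a course's local clone from its reference repo."""
    return ["git", "-C", course_dir, "pull"]
```

Each due course would then be updated with something like `subprocess.run(pull_command(path), check=True)`.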
Container image
Each course is deployed from a specific image; to customize it, create a file named nbhosting/Dockerfile in your course repo.
Note that some specific recipes need to be applied in your image for proper deployment, so you should start from either the nbhosting/minimal-notebook or the nbhosting/scipy-notebook image; see the beginning of the code for our Python MOOC for an example.
That image can then be rebuilt from the website. The new image is deployed incrementally, as running containers get phased out when detected as inactive; this means it can take a day or two before all students see the upgrade.
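The incremental rollout can be pictured as follows, assuming that a container found idle and still built from an older image is simply stopped, and that the next visit recreates it from the current image. The `RunningContainer` record and `containers_to_recycle` are hypothetical; this is a sketch of the idea, not nbhosting's code.

```python
from dataclasses import dataclass


@dataclass
class RunningContainer:
    name: str
    image_id: str  # id of the image the container was created from
    idle: bool     # as decided by the idle-detection logic


def containers_to_recycle(containers, current_image_id):
    """Idle containers that still run an older image.

    Active containers are left alone so that nobody loses a running session;
    this is what makes the rollout gradual.
    """
    return [
        c.name
        for c in containers
        if c.idle and c.image_id != current_image_id
    ]
```

Since students who keep working are never interrupted, the last containers only switch to the new image once their owners go idle, hence the day or two mentioned above.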
Notebook metadata
Each notebook is displayed with a label and a version number, as in the example above. To tweak these, set the following two items in your notebook's metadata:
Statistics
Some usage statistics are available to visually inspect data such as:
- how many different students have shown up, and when,
- which notebooks were opened, and when,
- computing resources like created/active containers, disk space, CPU load...
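The first two statistics are simple aggregations over access events. The sketch assumes a hypothetical event layout of `(day, student_hash, notebook)` tuples; nbhosting's own log format may well differ.

```python
from collections import Counter, defaultdict


def students_per_day(events):
    """Number of distinct students seen on each day.

    `events` is an iterable of (day, student_hash, notebook) tuples
    (a hypothetical layout).
    """
    seen = defaultdict(set)
    for day, student, _notebook in events:
        seen[day].add(student)
    return {day: len(students) for day, students in sorted(seen.items())}


def notebook_opens(events):
    """How many times each notebook was opened, most opened first."""
    return Counter(notebook for _day, _student, notebook in events).most_common()
```

With made-up events such as `("mon", "h1", "w1-s1")`, `("mon", "h2", "w1-s1")` and `("tue", "h1", "w1-s2")`, `students_per_day` yields `{"mon": 2, "tue": 1}`.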
Staff
You can declare some people as staff; nbhosting uses this only to discard their accesses when computing statistics. A convenience button also lets you trash all the working files of staff members, which comes in handy to make sure that staff always see the latest pushed version.
To declare somebody as staff, you need to find out that person's hash, as exposed by edx.
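Both staff-related behaviours can be sketched in a few lines, assuming the same hypothetical `(day, student_hash, notebook)` events and a hypothetical layout with one working directory per student hash under a course root. Neither function is nbhosting's actual code.

```python
import shutil
from pathlib import Path


def without_staff(events, staff_hashes):
    """Drop accesses by staff before computing statistics."""
    staff = set(staff_hashes)
    return [e for e in events if e[1] not in staff]


def trash_staff_work(course_root, staff_hashes):
    """Delete staff working directories, assuming one sub-directory per hash.

    Returns the hashes whose directory was removed; on their next visit,
    staff then get a fresh copy of the latest pushed notebooks.
    """
    removed = []
    for student_hash in staff_hashes:
        workdir = Path(course_root) / student_hash
        if workdir.is_dir():
            shutil.rmtree(workdir)
            removed.append(student_hash)
    return removed
```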
Jupytext
Text formats are much easier to manage under git than the historical ipynb format; for that reason, nbhosting provides full and transparent support for notebooks saved in a text format, at least for the formats known under jupytext as py:percent, py:light, markdown and md:myst.
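To see why such files diff well under git, here is a toy reader for the py:percent convention: a line starting with `# %%` opens a cell, and `# %% [markdown]` opens a markdown cell whose lines are commented out. This covers only the basic convention and is not jupytext; actual conversions should go through jupytext itself.

```python
def split_percent_cells(text):
    """Split a py:percent script into (cell_type, source) pairs (toy version)."""
    cells = []
    cell_type, lines = None, []

    def flush():
        # Reads the current cell_type/lines of the enclosing function.
        if cell_type is not None:
            cells.append((cell_type, "\n".join(lines).strip("\n")))

    for line in text.splitlines():
        if line.startswith("# %%"):
            flush()
            cell_type = "markdown" if "[markdown]" in line else "code"
            lines = []
        elif cell_type == "markdown":
            # Markdown lines are stored commented out: strip the "# " prefix.
            lines.append(line[2:] if line.startswith("# ") else line.lstrip("#"))
        elif cell_type is not None:
            lines.append(line)
    flush()
    return cells
```

Each cell is just a run of plain lines, so editing one cell changes only those lines in a diff, with no JSON noise such as outputs or execution counts.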
Dataflow - nbhosting side
Here's the general principle of how things work
silent mode (in an iframe, behind a MOOC system)
- Open-edX forges a URL, like the one shown above, with student replaced by the hash of some student id.
- This is caught by nginx, which runs in front; the notebookLazyCopy/ prefix is routed to a django application, which primarily does the following:
  - create a linux user if needed;
  - create a copy of that notebook for the student if needed;
  - spawn a jupyter container for the (course, student) pair;
  - redirect to a (plain https, on port 443) URL that contains the port number at which the container can be reached (on localhost, via http).
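The "copy if needed" step is what makes the copy lazy: the notebook is copied only on the student's first visit, and never again, so later visits cannot overwrite their work. The sketch assumes a hypothetical layout with one master tree and one tree per student; the other steps (creating the linux user, spawning the container, redirecting) depend on system tools and are left out.

```python
import shutil
from pathlib import Path


def ensure_student_copy(master_dir, student_dir, notebook):
    """Copy `notebook` from the course master into the student's space, once.

    Returns True if a copy was made, False if the student already had one.
    """
    source = Path(master_dir) / notebook
    target = Path(student_dir) / notebook
    if target.exists():
        return False  # keep the student's own version untouched
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return True
```

Under the same assumptions, Reset to Original amounts to the same copy without the `exists()` check.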
Note that notebookLazyCopy used to be named ipythonExercice, which is still supported for backward compatibility.
classroom mode
The classroom mode uses a similar approach, but with a URL that mentions notebookGitRepo/ instead of notebookLazyCopy/; the behaviour is mostly the same, except for the policy used to create notebooks in the student space: when the visited notebook is missing there, notebookGitRepo triggers a git clone operation instead of copying notebooks individually.
The advantage of this mode is that students can later use the jupyterlab git extension to manage their local repo accurately, i.e. drop or commit local changes, pull any updates from the master repo, and so on.
An experimental feature called 'pull-students' helps deal with changes made in the master course, by automatically pulling these changes into the students' repos.
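The difference with the lazy-copy mode can be summarized as a choice between two git commands: clone the whole master repo on the first visit, pull afterwards. `student_repo_command` is a hypothetical helper, and treating a plain `git pull` as what 'pull-students' does is an assumption; in particular, merge conflicts with the student's local edits are not handled here.

```python
from pathlib import Path


def student_repo_command(master_url, student_repo):
    """git command bringing a student's clone of the course repo up to date."""
    if not (Path(student_repo) / ".git").is_dir():
        # First visit: clone the whole master repo, not a single notebook.
        return ["git", "clone", master_url, student_repo]
    # Later on: fetch and merge the teacher's changes.
    return ["git", "-C", student_repo, "pull"]
```

The resulting command would be run with something like `subprocess.run(cmd, check=True)`.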
summary
In summary:
TODO
See the issues on github for an up-to-date status.