# Interactive PyTorch DDP Training in FastAI Jupyter Notebooks
`Ddip` ("Dee dip"), short for Distributed Data "interactive" Parallel, is a little iPython extension of line and cell magics that brings together fastai lesson notebooks [1] and PyTorch's Distributed Data Parallel [2]. It uses ipyparallel [3] to manage the DDP process group.
Platform tested: single host with multiple Nvidia CUDA GPUs; Ubuntu Linux + PyTorch + Python 3; fastai v1 and fastai course-v3.
## Features
"Distributed training doesn’t work in a notebook..."
-- FastAI's tutorial on How to launch a distributed training
Ddip
was conceived to address the above, with the following features:
- Switch execution easily between PyTorch's multiprocess DDP group and the local notebook namespace.
- Takes only 3 to 5 lines of iPython magics to port a fastai course v3 [1] notebook to train in DDP (see the sketch after this list).
- Reduces the chance of GPU out-of-memory errors by automatically emptying the GPU cache after executing a cell in a GPU process.
- Extensible, to support future versions of `fastai`.
*Summary of speedup observed in FastAI notebooks when trained with 3 GPUs.*
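As a rough sketch of what those few lines look like (the `%makedip` flags below are illustrative assumptions, not the confirmed options; check `%makedip?` in a notebook or the tutorial for the actual arguments), porting a notebook might start with:

```python
# Cell 1: load the Ddip extension into the notebook.
%load_ext Ddip

# Cell 2: start a DDP process group on the GPUs and initialize the
# fastai_v1 app. The "-g all -a fastai_v1" flags are assumptions
# for illustration only.
%makedip -g all -a fastai_v1

# Cell 3: from here on, route subsequent cells to the DDP group
# automatically, so the rest of the notebook runs unmodified.
%autodip on
```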
## Installation

Current version: 0.1.1

```bash
pip install git+https://github.com/philtrade/Ddip.git@v0.1.1#egg=Ddip
```
## Overview

Control DDP and cell execution destinations using `%` and `%%` magics:

- `%load_ext Ddip`, to load the extension.
- `%makedip ...`, to start/stop/restart a DDP group, and an app, e.g. `fastai_v1`.
- `%%dip {remote, local, everywhere} ...`, to choose where to execute the cell.
- `%autodip {on,off}`, to automatically prepend `%%dip` to subsequent cells.
- `%dipush` and `%dipull`, to pass objects between the notebook and the DDP namespaces.
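Putting these together, a hypothetical session might look like the cells below. Each `%%dip` line must be the first line of its own notebook cell; the fastai calls are ordinary fastai v1 code, and the exact `%dipush`/`%dipull` argument syntax is an assumption based on the magic descriptions above:

```python
%%dip everywhere
# Run this cell in the local notebook AND in every DDP process,
# e.g. for imports that all namespaces need.
from fastai.vision import *
```

```python
%%dip remote
# Run this cell only in the DDP processes: build the data and learner
# there, so fit_one_cycle() trains across all GPUs in the group.
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path, bs=64)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)
```

```python
# Pass objects between the notebook and the DDP namespaces.
lr = 1e-2      # defined locally in the notebook
%dipush lr     # send it out to the DDP processes (syntax assumed)
%dipull learn  # fetch `learn` from the DDP processes into the notebook
```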
## How to run DDP in FastAI notebooks with `Ddip`

- Distributed Training in `fastai` Notebooks using `Ddip`: a tutorial
- Example notebooks of `Ddip` iPython magics
- More notebooks
## Known Issues and Room for Improvements
## References

1. <a name="course_v3"></a> FastAI Course v3
2. <a name="pytorchddp"></a> On distributed training:
   - Tutorial from PyTorch on Distributed Data Parallel
   - Launching fastai to use DDP, FastAI
   - Further reading: PyTorch Lightning, "Tips for faster training"; "On the performance of different training parallelism"
3. <a name="ipp"></a> On `ipyparallel`:
   - The official ipyparallel documentation
   - "An Intro to ipyparallel", Activision Game Science
   - "Using ipyparallel", Duke University, "Computational Statistics in Python"
   - "Interactive Distributed Deep Learning with Jupyter Notebooks", Lawrence Berkeley National Laboratory / Cray Inc.