mxnet-speculative-synchronization
A new parallel scheme implemented on MXNet.
Prerequisites
- Python 3.5+ (for the instrumentation script)
- Python 2.7+ (for starting MXNet)
- make
- gcc
Installation
Run the following commands to get the source code.
git clone --recursive https://github.com/All-less/mxnet-speculative-synchronization.git
cd mxnet-speculative-synchronization
Roll back MXNet to commit 7fcaf15a.
cd mxnet
git checkout 7fcaf15a3a597cc72a342d1bdb00273dec00e78c
git submodule update --recursive
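Because the instrumentation targets this specific commit, it can help to confirm the checkout before going further. The following is a minimal sketch; the helper name is_expected_commit is hypothetical and not part of this repository, and the hash is copied from the checkout command above.

```shell
# Full hash taken from the `git checkout` command above.
EXPECTED_COMMIT=7fcaf15a3a597cc72a342d1bdb00273dec00e78c

# Hypothetical helper: succeeds only if the given hash equals the expected commit.
is_expected_commit() {
    [ "$1" = "$EXPECTED_COMMIT" ]
}

# Example use from inside the mxnet directory (git rev-parse HEAD prints the full hash):
#   is_expected_commit "$(git rev-parse HEAD)" || echo "unexpected MXNet commit" >&2
```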
Our implementation is based on MXNet, so we need to insert some instrumentation into the MXNet sources. We elaborate on the extra-dir option in the next section.
python instrument_source.py --extra-dir <fixed_waiting|freshness_tuning>
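The usage line above lists exactly two accepted values for extra-dir. A small wrapper can guard against typos before the script touches the sources. This is only a sketch: is_valid_mode and instrument are hypothetical helpers, and the wrapper assumes it is run from the directory that contains instrument_source.py.

```shell
# Hypothetical helper: accept only the two values listed in the usage line.
is_valid_mode() {
    case "$1" in
        fixed_waiting|freshness_tuning) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: refuse to run the script with an unknown value.
instrument() {
    if is_valid_mode "$1"; then
        python instrument_source.py --extra-dir "$1"
    else
        echo "unknown mode: $1" >&2
        return 1
    fi
}

# Example: instrument fixed_waiting
```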
After instrumenting, follow the official MXNet instructions for building from source.
Get Started
Training works the same as in the original MXNet, except that you need to set some environment variables to activate speculative synchronization. We provide two synchronization modes.
Fixed Waiting
In Fixed Waiting mode, you need to specify how long each worker waits and how many fresh updates trigger synchronization.
export MXNET_ENABLE_CANCEL=1 # enable speculative synchronization
export MXNET_WAIT_RATIO=0.10 # wait 10% of batch time
export MXNET_CANCEL_THRESHOLD=5 # synchronize when getting more than 5 fresh updates.
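To get a feel for the wait ratio, the arithmetic can be worked out by hand. The sketch below assumes a batch time of 2 seconds purely for illustration; how MXNet measures batch time is not described here, and wait_seconds is a hypothetical helper.

```shell
# Hypothetical helper: seconds a worker would wait, given batch time (s) and wait ratio.
wait_seconds() {
    awk -v batch="$1" -v ratio="$2" 'BEGIN { printf "%.3f\n", batch * ratio }'
}

# Assumed example: a 2-second batch with MXNET_WAIT_RATIO=0.10.
wait_seconds 2.0 0.10   # prints 0.200
```

With these assumed numbers, each worker would wait about 0.2 seconds, and synchronization is triggered once more than 5 fresh updates have arrived.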
Freshness Tuning
In Freshness Tuning mode, you only need to turn on the switch.
export MXNET_ENABLE_CANCEL=1
Caveats
- Only CPU training is supported.