Awesome
AdaPS is a fully adaptive parameter server (PS). AdaPS is efficient for many machine learning tasks out of the box because it automatically adapts to the underlying task. It adapts based on intent signals. I.e., the application signals which parameters it intends to access in the near future. Based on these signals, AdaPS decides automatically (i.e., without specific user input) and adaptively (i.e., depending on the current situation) what to do and when to do it. This makes AdaPS efficient and easy to use. We describe details in our paper on AdaPS (arXiv).
The main
branch of this repository contains the latest version of AdaPS.
Details on the experiments in the AdaPS paper
(arXiv) can be found in
docs/experiments.md.
The source code used in the paper is in branch
review
.
AdaPS is the successor of Lapse and NuPS. Lapse is the first PS
that supports dynamic parameter allocation, i.e., the ability to relocate
parameters among nodes during run time. Our paper on Lapse provides more
information (PVLDB 13(12),
2020). Details on the
experiments for this paper can be found in
docs/experiments-vldb20.md.
The source code used in this paper is in branch
vldb20
. NuPS is a novel
multi-technique PS that combines relocation and replication management
techniques, and supports sampling directly in the PS. Our paper on NuPS provides
more detail (arXiv, to appear in SIGMOD
'22). Details on the experiments of this paper can be found in
docs/experiments-sigmod22.md.
The source code used in this paper is in branch
sigmod22
.
AdaPS provides bindings to PyTorch, see bindings/.
The implementation of AdaPS is based on NuPS, Lapse, and PS-Lite.
Usage
AdaPS provides the following primitives to access parameters:
Pull(keys)
: retrieve the values of a set of parameters (identified by keys)Push(keys, updates)
: send (additive) updates for parameters
AdaPS provides the following primitives to signal intent:
Intent(keys, start, end)
: signal that the issuing worker intends to accesskeys
between clockstart
(incl.) andend
(excl.)advanceClock()
: raise the clock of the issuing worker by 1
Additionally, AdaPS supports sampling access (as NuPS does) via the following primitives:
handle = PrepareSample(N)
: prepare a group ofN
samplesPullSample(handle)
: retrieveN
samples from a prepared group
By default, the Pull()
, Push()
, and PullSample()
primitives execute asynchronously. Wait()
can be used to execute these primitives synchronously. For example: Wait(Pull(keys))
.
A simple example:
std::vector<uint64_t> keys = {1, 3, 5};
std::vector<float> updates = {1, 1, 1};
std::vector<float> vals;
ps::KVWorker<float> kv;
kv.Wait(kv.Pull(keys, &vals)); // access without intent
kv.Wait(kv.Push(keys, updates));
kv.Intent(keys, 1, 2);
// ...
kv.advanceClock(); // clock started at 0, so is at 1 now
kv.Wait(kv.Pull(keys, &vals)); // access with intent
kv.Wait(kv.Push(keys, updates)); // access with intent
// sampling access
auto h = kv.PrepareSample(3); // prepare a group of 20 samples
kv.Wait(kv.PullSample(h, keys, vals)); // pull the 3 samples (keys.size() determines how many samples are pulled)
Build
AdaPS requires a C++11 compiler such as g++ >= 4.8
and boost for some the application examples. On Ubuntu >= 13.10, you
can install it by
sudo apt-get update && sudo apt-get install -y build-essential git libboost-all-dev
Then clone and build
git clone https://github.com/alexrenz/AdaPS
cd AdaPS && make
See bindings/README.md for how to build the bindings.
Getting started
A very simple example can be found in simple.cc. To run it, compile it:
make apps/simple
and run
python tracker/dmlc_local.py -s 1 build/apps/simple
to run with one node and default parameters or
python tracker/dmlc_local.py -s 3 build/apps/simple -v 5 -i 10 -k 14 -t 4
to run with 3 nodes and specific parameters. Run build/apps/simple --help
to see available parameters.
Starting an application
There are multiple start scripts. We commonly use the following ones:
- tracker/dmlc_local.py to run on a local machine
- tracker/dmlc_ssh.py to run on a cluster
To see more information, run
python tracker/dmlc_local.py --help
, for example.
The -s
flag specifies how many processes/nodes to use. For example, -s 4
uses 4 nodes. In each process, AdaPS starts one server thread and multiple worker threads.
Example Applications
You find example applications in the apps/ directory and launch commands to locally run toy examples below. The toy datasets are in apps/data/.
Knowledge Graph Embeddings
make apps/knowledge_graph_embeddings
python3 tracker/dmlc_local.py -s 2 build/apps/knowledge_graph_embeddings --dataset apps/data/kge/ --num_entities 280 --num_relations 112 --num_epochs 4 --embed_dim 100
Word vectors
make apps/word2vec
python3 tracker/dmlc_local.py -s 2 build/apps/word2vec --num_threads 2 --negative 2 --binary 1 --num_keys 4970 --embed_dim 10 --input_file apps/data/lm/small.txt --num_iterations 4 --window 2 --data_words 10000
Matrix Factorization
make apps/matrix_factorization
python3 tracker/dmlc_local.py -s 2 build/apps/matrix_factorization --dataset apps/data/mf/ -r 4 --num_rows 6 --num_cols 4 --epochs 10 --signal_intent_cols 1
Architecture
AdaPS starts one process per node. Within this process, worker threads access the parameter store directly. A parameter server thread handles requests by other nodes, and a synchronization manager thread triggers replica synchronization and intent communication.
How to cite
The citation for AdaPS is as follows:
@misc{adaps,
author = {Renz-Wieland, Alexander and Kieslinger, Andreas and Gericke, Robert and Gemulla, Rainer and Kaoudi, Zoi and Markl, Volker},
title = {Good Intentions: Adaptive Parameter Servers via Intent Signaling},
publisher = {arXiv},
year = {2022},
doi = {10.48550/ARXIV.2206.00470},
url = {https://arxiv.org/abs/2206.00470},
}
If you wish to refer NuPS specifically, cite:
@inproceedings{nups,
author = {Renz-Wieland, Alexander and Gemulla, Rainer and Kaoudi, Zoi and Markl, Volker},
title = {NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access},
year = {2022},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {To appear in the Proceedings of the 2022 ACM International Conference on Management of Data},
location = {Chicago, Illinois, USA},
series = {SIGMOD '22}
}
If you wish to refer Lapse specifically, cite:
@article{lapse,
author = {Renz-Wieland, Alexander and Gemulla, Rainer and Zeuch, Steffen and Markl, Volker},
title = {Dynamic Parameter Allocation in Parameter Servers},
year = {2020},
issue_date = {August 2020},
publisher = {VLDB Endowment},
volume = {13},
number = {12},
issn = {2150-8097},
url = {https://doi.org/10.14778/3407790.3407796},
doi = {10.14778/3407790.3407796},
journal = {Proc. VLDB Endow.},
month = jul,
pages = {1877–1890},
numpages = {14}
}