Home

Awesome

PyTorch Applications for AdaPM

These are the PyTorch implementations of a graph convolutional network (GCN) and click-through-rate prediction (CTR) applications as used in the AdaPM paper. Please also see the main repository for AdaPM.

Install

Dependencies are PyTorch, DGL (Deep Graph Library), and Open Graph Benchmark. To install with CUDA 11.6:

pip install dgl-cu116 dglgo -f https://data.dgl.ai/wheels/repo.html torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
pip install ogb

To install AdaPM and its PyTorch bindings, follow the steps described here.

Get the code

git clone https://github.com/alexrenz/adapm-pytorch-apps/
cd adapm-pytorch-apps

Modify PYTHONPATH so Python finds modules located in the current work directory.

export PYTHONPATH="${PYTHONPATH}:."
# ALTERNATIVE:
# export PYTHONPATH="${PYTHONPATH}:/path/to/code/directory/"

Usage examples

GCN: Graph convolutional networks

To run on the example data:

python gcn/run.py --dataset ogbn-arxiv --data_root example_data/gnn/ --no_cuda

See python gcn/run.py --help for more info.

CTR: Click-through-rate prediction

To run on the example data:

python ctr/run.py --embedding_dim 4 --dataset_dir example_data/ctr/criteo-subset/ --no_cuda

See python ctr/run.py --help for more info.

Options

Distributed training

Launching with tracker scrips

Distributed training can be launched with the tracker scripts of AdaPM. For that, use the --tracker option. For example:


python ../AdaPM/tracker/dmlc_ssh.py -s 2 -H [HOSTFILE] python gcn/run.py --dataset ogbn-arxiv --data_root example_data/gnn/ --no_cuda --tracker

The hostfile should contain the host name of one node per line.

Launching manually

Distributed training can also be launched manually by starting the appropriate processes on the corresponding nodes manually. To do so, pass the IP of the node that hosts the scheduler process, an open port, and the world size to the processes as seen below. We recommend launching with tracker scripts.

# start the scheduler process (run this once):
python gcn/run.py --nodes 0 --root_uri "[SCHEDULER_IP]" --root_port "9091" --world_size 2 --scheduler

# start one nodes process (run this on `world_size` nodes):
python gcn/run.py --nodes 1 --root_uri "[SCHEDULER_IP]" --root_port "9091" --world_size 2

CUDA options

The script automatically makes use of available CUDA devices. The this can be disabled by using --no_cuda. By default workers are assigned round robin to CUDA devices. Use --device_ids to provide an alternative assignment (one device ID for each worker thread). For example:

python gcn/run.py --nodes 2 --workers_per_node 1 --device_ids 2 3