optw_rl
A PyTorch implementation of the Pointer Network model in:
A Reinforcement Learning Approach to the Orienteering Problem with Time Windows
Gama R, Fernandes HL - Computers & Operations Research, 2021
@article{GAMA2021105357,
  title = {A Reinforcement Learning Approach to the Orienteering Problem with Time Windows},
  author = {Ricardo Gama and Hugo L. Fernandes},
  journal = {Computers & Operations Research},
  volume = {133},
  pages = {105357},
  year = {2021},
  issn = {0305-0548},
  url = {https://www.sciencedirect.com/science/article/pii/S0305054821001349},
}
Quick Usage:
This repo includes two pre-trained models (a minimal Python wrapper around the inference CLI is sketched after the examples below):
- Cordeau's OPTW instance-region pr01, trained with uniform sampling:
$ python inference_optw_rl.py --instance pr01 --model_name article --sample_type uni_samp
route: [0, 9, 24, 47, 12, 38, 30, 2, 32, 37, 10, 45, 11, 28, 1, 16, 36, 31, 35, 34, 22, 7, 0]
total score: 308
inference time: 173 ms
Using the CPU:
$ python inference_optw_rl.py --instance pr01 --model_name article --sample_type uni_samp --device cpu
route: [0, 9, 24, 47, 12, 38, 30, 2, 32, 37, 10, 45, 11, 28, 1, 16, 36, 31, 35, 34, 22, 7, 0]
total score: 308
inference time: 1058 ms
- Gavalas' OPTW instance-region t101, trained with correlation sampling of scores:
$ python inference_optw_rl.py --instance t101 --model_name article --sample_type corr_samp
route: [0, 68, 29, 8, 76, 3, 62, 7, 61, 16, 69, 17, 15, 33, 39, 44, 97, 46, 92, 74, 78, 0]
total score: 399
inference time: 343 ms
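If you need the route and score in Python rather than on stdout, a minimal wrapper around the CLI could look like the sketch below. It is illustrative only, not part of the repo: it assumes you run it from the repo root and that the output keeps the route: [...] and total score: ... format shown above.

import re
import subprocess

def run_inference(instance, model_name, sample_type, device="cuda"):
    """Run inference_optw_rl.py and parse the route and total score."""
    cmd = ["python", "inference_optw_rl.py",
           "--instance", instance,
           "--model_name", model_name,
           "--sample_type", sample_type,
           "--device", device]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    # The CLI prints lines like "route: [0, 9, ...]" and "total score: 308".
    route = [int(v) for v in re.search(r"route:\s*\[(.*?)\]", out).group(1).split(",")]
    score = int(re.search(r"total score:\s*(\d+)", out).group(1))
    return route, score

route, score = run_inference("pr01", "article", "uni_samp")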
Setup instructions
- Install Anaconda (Python 3 version).
- Clone this repo:
$ git clone https://github.com/mustelideos/optw_rl
- Install the environment:
$ cd optw_rl/
$ conda env create --file environment.yml
- Activate the environment:
$ conda activate optw_env
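With the environment active, a quick sanity check (generic PyTorch, not repo code) confirms the install and shows whether a GPU is visible; if it is not, pass --device cpu to the scripts below.

import torch

print(torch.__version__)          # PyTorch is the main dependency
print(torch.cuda.is_available())  # False -> use --device cpu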
Inference
On the Benchmark Instance
- Using Beam Search (the default option) for Inference ("model"); a simplified beam-search sketch follows after these examples:
$ python inference_optw_rl.py --instance t101 --model_name article --sample_type corr_samp
route: [0, 68, 29, 8, 76, 3, 62, 7, 61, 16, 69, 17, 15, 33, 39, 44, 97, 46, 92, 74, 78, 0]
total score: 399
- Using Active Search followed by Beam Search ("model+as"):
$ python inference_optw_rl.py --instance t101 --infe_type as_bs --model_name article --sample_type corr_samp
route: [0, 68, 29, 8, 76, 3, 62, 7, 61, 24, 46, 97, 47, 16, 69, 17, 15, 21, 95, 92, 74, 0]
total score: 400
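For intuition: Beam Search keeps the k most probable partial routes at each decoding step instead of committing greedily to a single one, while Active Search first fine-tunes the model's weights on the single test instance before decoding. Below is a generic beam-search sketch, not the repo's implementation; step_log_probs is a hypothetical stand-in for the trained decoder, and the masking of visited or time-window-infeasible points is omitted.

import torch

def beam_search(step_log_probs, start=0, beam_width=5, max_steps=20):
    """Generic beam search over a sequential decoder.

    step_log_probs(route) -> 1-D tensor of log-probabilities of the next
    node given the partial route (a stand-in for the trained decoder).
    """
    beams = [(0.0, [start])]  # (cumulative log-probability, partial route)
    for _ in range(max_steps):
        candidates = []
        for logp, route in beams:
            next_logp = step_log_probs(route)        # score every candidate next node
            k = min(beam_width, next_logp.numel())
            top = torch.topk(next_logp, k)           # expand only the k best moves
            for lp, node in zip(top.values, top.indices):
                candidates.append((logp + lp.item(), route + [node.item()]))
        # Keep only the beam_width best partial routes overall.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]  # route with the highest cumulative log-probability

In the repo's setting the decoder would also stop a route once it returns to the depot (node 0); this sketch simply decodes for a fixed number of steps.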
On the Generated Instances
- Make sure to generate the validation set of tourist-region-instances first
$ python generate_instances.py --instance t101 --sample_type corr_samp
- Using Beam Search for Inference ("model"):
$ python inference_optw_rl.py --instance t101 --model_name article --generated --sample_type corr_samp
average total score: 339.16
- Using Active Search followed by Beam Search ("model+as"):
$ python inference_optw_rl.py --instance t101 --infe_type as_bs --model_name article --generated --sample_type corr_samp
average total score: 340.03
Train a new Model and Infer
- Make sure to generate the validation set of tourist-region-instances first. These are used to report performance during training.
$ python generate_instances.py --instance t101 --sample_type corr_samp
- Choose a model name, the number of training epochs, and how often the model is saved (a schematic sketch of the underlying training loop follows at the end of this section):
$ python train_optw_rl.py --instance t101 --sample_type corr_samp --nepocs 1000 --nsave 1000 --model_name testing_1
- Infer (using Beam Search, for instance), specifying the model name and the epoch of the saved model:
$ python inference_optw_rl.py --instance t101 --sample_type corr_samp --model_name testing_1 --saved_model_epoch 1000
total score: 376
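For context, training follows a policy-gradient scheme: the model samples routes on generated instances and is reinforced with the total collected score, with an entropy bonus (--beta) and gradient clipping (--max_grad_norm) as exposed in the options below. The loop here is a schematic sketch only; model.sample, route_score, and baseline are hypothetical placeholders, not the repo's API.

import torch

def train_epoch(model, baseline, batches, optimizer, beta=0.01, max_grad_norm=1.0):
    for batch in batches:
        # Sample one route per instance, with the log-probabilities and
        # entropies of the sampled decisions (hypothetical API).
        routes, log_probs, entropy = model.sample(batch)
        reward = route_score(batch, routes)   # total score collected by each route
        advantage = reward - baseline(batch)  # a baseline reduces gradient variance
        # REINFORCE loss with an entropy bonus weighted by --beta.
        loss = -(advantage.detach() * log_probs).mean() - beta * entropy.mean()
        optimizer.zero_grad()
        loss.backward()
        # Clip gradients, as controlled by --max_grad_norm.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()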
For optional arguments and default values:
$ python train_optw_rl.py -h
(...)
optional arguments:
  -h, --help            show this help message and exit
  --instance INSTANCE   which instance to train on
  --device DEVICE       device to use (cpu/cuda)
  --use_checkpoint      use checkpointing (see
                        https://pytorch.org/docs/stable/checkpoint.html)
  --sample_type {uni_samp,corr_samp}
                        how to sample the scores of each point of interest:
                        uniformly sampled (uni_samp), or score proportional
                        to each point of interest's duration of visit
                        (corr_samp)
  --model_name MODEL_NAME
                        model name
  --debug               debug mode (verbose output and no saving)
  --nsave NSAVE         saves the model weights every <nsave> epochs
  --nprint NPRINT       logs and saves the training history (total score in
                        the benchmark and generated instances of the
                        validation set) every <nprint> epochs
  --nepocs NEPOCS       number of training epochs
  --batch_size BATCH_SIZE
                        training batch size
  --max_grad_norm MAX_GRAD_NORM
                        maximum norm value for gradient clipping
  --lr LR               initial learning rate
  --seed SEED           seed for the random number generators (for
                        reproducibility)
  --beta BETA           entropy term coefficient
  --rnn_hidden RNN_HIDDEN
                        hidden size of the RNN
  --n_layers N_LAYERS   number of attention layers in the encoder
  --n_heads N_HEADS     number of heads in the attention layers
  --ff_dim FF_DIM       hidden dimension of the encoder's feedforward
                        sublayer
  --nfeatures NFEATURES
                        number of non-dynamic features
  --ndfeatures NDFEATURES
                        number of dynamic features
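To make the two --sample_type options above concrete, here is a minimal illustration of the difference; the exact distributions and constants live in generate_instances.py and may differ.

import numpy as np

rng = np.random.default_rng(0)
n_pois = 100
visit_duration = rng.uniform(10, 60, size=n_pois)  # hypothetical visit durations

# uni_samp: scores drawn uniformly, independent of the visit durations.
scores_uni = rng.integers(1, 100, size=n_pois)

# corr_samp: scores proportional to each point of interest's visit duration.
scores_corr = np.rint(100 * visit_duration / visit_duration.max()).astype(int)

Under corr_samp, longer visits are worth more, so the model must trade score against the time budget differently than under uni_samp.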
Directory Structure
.
├── data
│   ├── benchmark
│   └── generated
├── images
├── results
│   ├── pr01
│   │   └── model_w
│   │       └── model_article_uni_samp
│   └── t101
│       └── model_w
│           └── model_article_corr_samp
└── src