tuna - Hyperparameter search for AllenNLP, powered by Ray TUNE
Installation
With pip
tuna can be installed by pip as follows:
TBD
From source
Clone the repository and run:
pip install [--editable] .
A series of tests is included in the tests folder.
You can run the tests with the following command (install pytest first if needed: pip install pytest):
pytest -vv
Running tuna
$ tuna --help
Run tuna
optional arguments:
-h, --help show this help message and exit
--experiment-name EXPERIMENT_NAME
a name for the experiment
--num-cpus NUM_CPUS number of CPUs available to the experiment
--num-gpus NUM_GPUS number of GPUs available to the experiment
--cpus-per-trial CPUS_PER_TRIAL
number of CPUs dedicated to a single trial
--gpus-per-trial GPUS_PER_TRIAL
number of GPUs dedicated to a single trial
--log-dir LOG_DIR directory in which to store trial logs and results
--with-server start the Ray server
--server-port SERVER_PORT
port for Ray server to listen on
--search-strategy SEARCH_STRATEGY
hyperparameter search strategy used by Ray-Tune
--hyperparameters HYPERPARAMETERS
path to file describing the hyperparameter search
space
Example
Switch to the example directory and run the following command:
tuna \
--experiment-name "example" \
--hyperparameters ./text_classifier/hyperparam.json \
--include-package my_library \
--parameter-file ./text_classifier/config.jsonnet \
--log-dir ~/experiment_results/text_classifier/ \
--num-cpus 4 \
--num-gpus 0 \
--cpus-per-trial 1 \
--gpus-per-trial 0
How does it work?
AllenNLP already offers a great way to configure model and training parameters in Jsonnet (for details, see Using Config Files).
Let's assume we have configured a simple CNN classifier and want to run a hyperparameter search.
[...]
"model": {
"type": "cnn-classifier",
"text_field_embedder": {
"token_embedders": {
"tokens": {
"type": "embedding",
"embedding_dim": 300,
},
}
},
"text_encoder": {
"type": "cnn",
"embedding_dim": 300,
"num_filters": 100,
"ngram_filter_sizes": [2, 3, 4, 5],
},
"classifier_feedforward": {
"input_dim": 400,
"num_layers": 2,
"hidden_dims": [200, 2],
"activations": ["relu", "linear"],
"dropout": [0.5, 0.0],
}
}
[...]
It's possible to override the experiment configuration at training time by passing a JSON structure to the train command via the --overrides argument. For example, to increase the number of filters of our CNN to 200, we use the following command:
$ allennlp train <parameter file> \
--overrides '{"model.text_encoder.num_filters": 200}'
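When the override is assembled programmatically, the JSON has to reach allennlp train as a single shell argument. The following is a minimal Python sketch using only the standard library; the helper name overrides_argument and the path config.jsonnet are hypothetical and not part of tuna or AllenNLP.

```python
import json
import shlex


def overrides_argument(overrides: dict) -> str:
    """Serialize the overrides to JSON and quote the result for a POSIX shell."""
    return shlex.quote(json.dumps(overrides))


# Hypothetical parameter file path, used only for illustration.
command = "allennlp train config.jsonnet --overrides " + overrides_argument(
    {"model.text_encoder.num_filters": 200}
)
print(command)
```

Quoting matters because an unquoted JSON string contains spaces and double quotes that the shell would otherwise interpret itself.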
At this point, it looks quite straightforward. The only thing missing is a search strategy (e.g. grid search or random search) that generates the parameter configurations to be evaluated, which can then be used to override the desired parameters at training time.
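To make the idea of a search strategy concrete, here is a rough Python sketch of grid search and random search over a small space of override values. It is an illustration only and not how Ray TUNE implements its strategies; the function names and the search space are assumptions made for this example.

```python
import itertools
import random


def grid_search(space: dict) -> list:
    """Return every combination of the candidate values (exhaustive grid)."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]


def random_search(space: dict, num_samples: int, seed: int = 0) -> list:
    """Draw each parameter independently and uniformly from its candidates."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(candidates) for name, candidates in space.items()}
        for _ in range(num_samples)
    ]


# Hypothetical search space keyed by AllenNLP override paths.
space = {
    "model.text_encoder.num_filters": [100, 200],
    "model.text_encoder.ngram_filter_sizes": [[2, 3], [2, 3, 4, 5]],
}

for overrides in grid_search(space):
    print(overrides)
```

Each generated dictionary could serve as the JSON structure passed to --overrides for one training run.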
Unfortunately, it's not that easy, because some configuration parameters depend on each other. For example, if we change the number of filters used in our CNN to 200, we must also change the input_dim of our classifier_feedforward, as the output dimension of our CNN is now num_filters * len(ngram_filter_sizes) = 200 * 4 = 800.
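The dependency can be written down as a small Python function. This is only a sketch of the arithmetic described above; the function name cnn_output_dim is made up for this example.

```python
def cnn_output_dim(num_filters: int, ngram_filter_sizes: list) -> int:
    """Output dimension of the CNN encoder: one block of filters per n-gram size."""
    return num_filters * len(ngram_filter_sizes)


ngram_filter_sizes = [2, 3, 4, 5]
print(cnn_output_dim(100, ngram_filter_sizes))  # 400, the input_dim in the config above
print(cnn_output_dim(200, ngram_filter_sizes))  # 800, required once num_filters is 200
```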
Fortunately, Jsonnet provides some nice features around variable substitution. We introduce a local variable classifier_input_dim, which we can use to resolve the dependency between input_dim, num_filters, and ngram_filter_sizes.
[...]
local classifier_input_dim = $["model"].text_encoder.num_filters * std.length($["model"].text_encoder.ngram_filter_sizes),
"model": {
"type": "cnn-classifier",
"text_field_embedder": {
"token_embedders": {
"tokens": {
"type": "embedding",
"embedding_dim": 300,
},
}
},
"text_encoder": {
"type": "cnn",
"embedding_dim": 300,
"num_filters": 100,
"ngram_filter_sizes": [2, 3, 4, 5],
},
"classifier_feedforward": {
"input_dim": classifier_input_dim,
"num_layers": 2,
"hidden_dims": [200, 2],
"activations": ["relu", "linear"],
"dropout": [0.5, 0.0],
}
}
[...]
Though this seems better, it still doesn't allow us to configure the parameters dynamically: the local variable classifier_input_dim is evaluated when the Jsonnet file is loaded, and the parameter override is applied only afterwards. classifier_input_dim is therefore computed from the default values of num_filters and ngram_filter_sizes.
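The ordering problem can be mimicked in plain Python. The sketch below is a simplified simulation and not AllenNLP's actual loading code: load_config and apply_override are hypothetical stand-ins for evaluating the Jsonnet file and applying --overrides.

```python
import copy


def load_config() -> dict:
    """Mimic loading the Jsonnet file: the local is computed from the defaults."""
    num_filters = 100
    ngram_filter_sizes = [2, 3, 4, 5]
    classifier_input_dim = num_filters * len(ngram_filter_sizes)  # fixed at load time
    return {
        "text_encoder": {
            "num_filters": num_filters,
            "ngram_filter_sizes": ngram_filter_sizes,
        },
        "classifier_feedforward": {"input_dim": classifier_input_dim},
    }


def apply_override(config: dict, num_filters: int) -> dict:
    """Mimic an override: replace one value in the already evaluated config."""
    overridden = copy.deepcopy(config)
    overridden["text_encoder"]["num_filters"] = num_filters
    return overridden


config = apply_override(load_config(), num_filters=200)
encoder = config["text_encoder"]
expected = encoder["num_filters"] * len(encoder["ngram_filter_sizes"])
print(config["classifier_feedforward"]["input_dim"], "!=", expected)  # 400 != 800
```

The derived value stays at 400 because it was computed before the override replaced num_filters.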
Luckily, Jsonnet lets us define the configuration as a function that takes arguments with default values and returns a JSON object. This has two advantages: the configuration can still be used as usual with the allennlp train command and, more importantly, Jsonnet supports so-called top-level arguments, which can be used to parameterize the configuration from outside. This is exactly what tuna does: it combines this Jsonnet functionality with Ray TUNE to provide scalable hyperparameter search for AllenNLP.
function (num_filters=100, ngram_filter_sizes=[2, 3, 4, 5]) {
local classifier_input_dim = num_filters * std.length(ngram_filter_sizes),
[...]
"model": {
"type": "cnn-classifier",
"text_field_embedder": {
"token_embedders": {
"tokens": {
"type": "embedding",
"embedding_dim": 300,
},
}
},
"text_encoder": {
"type": "cnn",
"embedding_dim": 300,
"num_filters": num_filters,
"ngram_filter_sizes": ngram_filter_sizes,
},
"classifier_feedforward": {
"input_dim": classifier_input_dim,
"num_layers": 2,
"hidden_dims": [200, 2],
"activations": ["relu", "linear"],
"dropout": [0.5, 0.0],
}
}
[...]
}
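To illustrate how sampled values might reach such a function, the Python sketch below encodes one trial's parameters as top-level arguments for the jsonnet command-line tool, using its --tla-code option. How tuna passes the arguments internally is not shown in this README, so the helper names and the path config.jsonnet are assumptions for illustration.

```python
import json


def tla_codes(trial_parameters: dict) -> dict:
    """Encode each value as a Jsonnet expression (any JSON value is valid Jsonnet)."""
    return {name: json.dumps(value) for name, value in trial_parameters.items()}


def jsonnet_cli_args(parameter_file: str, trial_parameters: dict) -> list:
    """Build a jsonnet command line that passes the values via --tla-code."""
    args = ["jsonnet"]
    for name, code in tla_codes(trial_parameters).items():
        args += ["--tla-code", f"{name}={code}"]
    return args + [parameter_file]


# Hypothetical trial, e.g. one configuration produced by the search strategy.
trial = {"num_filters": 200, "ngram_filter_sizes": [2, 3]}
print(jsonnet_cli_args("config.jsonnet", trial))
```

Because the arguments are bound before the function body is evaluated, classifier_input_dim is computed from the trial's values rather than from the defaults. The dictionary returned by tla_codes has the shape that the tla_codes argument of the Jsonnet Python binding's evaluate_file expects.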