# Mapping Language to Code in Programmatic Context
## Install requirements

- PyTorch 0.3 (minor changes are needed for 0.4)

```bash
pip install antlr4-python3-runtime==4.6
pip install allennlp==0.3.0
pip install ipython
```
## Download data

```bash
mkdir concode
cd concode
```

Download the data from https://drive.google.com/drive/folders/1kC6fe7JgOmEHhVFaXjzOmKeatTJy1I1W into this folder.
## Create production rules

This step restricts the data to 100,000 training and 2,000 validation/test examples. If your resources can support it, you can use more.

```bash
python build.py -train_file concode/train_shuffled_with_path_and_id_concode.json \
  -valid_file concode/valid_shuffled_with_path_and_id_concode.json \
  -test_file concode/test_shuffled_with_path_and_id_concode.json \
  -output_folder data -train_num 100000 -valid_num 2000
```
## Prepare PyTorch datasets

```bash
mkdir data/d_100k_762
python preprocess.py -train data/train.dataset -valid data/valid.dataset \
  -save_data data/d_100k_762/concode -train_max 100000 -valid_max 2000
```
## Train seq2seq

```bash
python train.py -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/s2s \
  -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 50 \
  -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 \
  -encoder_type regular -decoder_type regular -copy_attn
```
## Train seq2prod

```bash
python train.py -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/s2p \
  -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 20 \
  -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 \
  -encoder_type regular -decoder_type prod -brnn -copy_attn
```
## Train Concode

```bash
python train.py -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/concode/ \
  -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 20 \
  -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 -decoder_rnn_size 1024 \
  -encoder_type concode -decoder_type concode -brnn -copy_attn -twostep -method_names -var_names
```
## Prediction

On dev:

```bash
ipython predict.ipy -- -start 5 -end 30 -beam 3 -models_dir data/d_100k_762/concode/ -test_file data/valid.dataset -tgt_len 500
```

On test (use the best epoch from dev):

```bash
ipython predict.ipy -- -start 15 -end 15 -beam 3 -models_dir data/d_100k_762/concode/ -test_file data/test.dataset -tgt_len 500
```

For other model types, use the appropriate `-models_dir`.
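Once the dev sweep finishes, pick the checkpoint with the highest dev metric and plug its epoch number into both `-start` and `-end` for the test run. A minimal sketch of that selection step (`best_epoch` and the scores are hypothetical, not results from the paper):

```python
def best_epoch(dev_scores):
    """Return the epoch whose dev score (e.g. exact match) is highest."""
    return max(dev_scores, key=dev_scores.get)

# Placeholder per-epoch dev scores collected from the dev sweep above.
dev_scores = {5: 0.21, 10: 0.25, 15: 0.27, 20: 0.26, 25: 0.24, 30: 0.23}
print(best_epoch(dev_scores))  # -> 15
```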