id2vec
This project is based on code2vec: https://github.com/tech-srl/code2vec.git
TSExtractor
Authors: Izo Sakallah, Noa Cohen
Advisors: Uri Alon, Eran Yahav
Running id2vec on an Ubuntu machine
First, convert the scripts to Unix line endings and make them executable:
cd /path/to/id2vec
dos2unix init.sh preprocess.sh train.sh scripts/splitData.sh
chmod 744 init.sh preprocess.sh train.sh scripts/splitData.sh
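dos2unix converts Windows (CRLF) line endings to Unix (LF) ones. A script saved with CRLF endings typically fails with an error such as `/bin/bash^M: bad interpreter`. dos2unix is not always present on a fresh Ubuntu machine (it can be installed with `sudo apt-get install dos2unix`). As a fallback, the sketch below strips the trailing carriage return from every line in place. It assumes GNU sed, and `strip_crlf` is a hypothetical helper name, not part of id2vec.

```shell
#!/bin/sh
# Hypothetical fallback for dos2unix (not part of id2vec): remove the
# trailing carriage return (\r) from every line of each given file.
# Assumes GNU sed, whose -i option edits files in place.
strip_crlf() (
  for f in "$@"; do
    sed -i 's/\r$//' "$f"
  done
)

# Example, run from /path/to/id2vec:
# strip_crlf init.sh preprocess.sh train.sh scripts/splitData.sh
```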
Then run init.sh to install the necessary packages:
./init.sh
Preprocess
The data must be placed in a directory called raw_data inside the project directory, and it should contain repositories of TypeScript (.ts) files. Before preprocessing, remove all non-.ts files and split the data into training, validation, and test sets:
find raw_data ! -name '*.ts' -type f -exec rm -f {} +
scripts/splitData.sh raw_data 80
cd raw_data
mv train_dir train_dir_tmp
../scripts/splitData.sh train_dir_tmp 80
mv train_dir_tmp/test_dir train_dir_tmp/val_dir
mv train_dir_tmp/* .
cd ..
rmdir raw_data/train_dir_tmp
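Assuming that `splitData.sh DIR 80` places about 80% of DIR's contents in DIR/train_dir and the remaining 20% in DIR/test_dir (as the commands above suggest), raw_data ends up with train_dir holding about 0.8 × 0.8 = 64% of the data, val_dir about 0.8 × 0.2 = 16%, and test_dir 20%. The sketch below illustrates such a random split at the repository level. `split_dir` is a hypothetical stand-in written for illustration, not the project's scripts/splitData.sh, whose exact behavior (for example, whether it splits by repository or by file) may differ.

```shell
#!/bin/sh
# Hypothetical sketch (NOT the project's scripts/splitData.sh): randomly move
# PERCENT% of the repositories (top-level subdirectories) of DIR into
# DIR/train_dir and the rest into DIR/test_dir. Assumes GNU coreutils (shuf)
# and repository names that contain no newlines.
# Usage: split_dir DIR PERCENT
split_dir() (
  dir=$1
  percent=$2
  # List repositories in random order, skipping any existing split directories.
  repos=$(find "$dir" -mindepth 1 -maxdepth 1 -type d \
    ! -name train_dir ! -name test_dir | shuf)
  if [ -n "$repos" ]; then
    total=$(printf '%s\n' "$repos" | wc -l)
    # Integer arithmetic rounds the training share down.
    n_train=$(( total * percent / 100 ))
    mkdir -p "$dir/train_dir" "$dir/test_dir"
    i=0
    printf '%s\n' "$repos" | while IFS= read -r repo; do
      if [ "$i" -lt "$n_train" ]; then
        mv "$repo" "$dir/train_dir/"
      else
        mv "$repo" "$dir/test_dir/"
      fi
      i=$((i + 1))
    done
  fi
)

# Example: split_dir raw_data 80
```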
To preprocess, run the following commands in a shell on your Ubuntu machine so that the script runs in the background:
./preprocess.sh &
disown
Explanation: the trailing & runs preprocess.sh as a background job and immediately returns control to your shell. The disown command then removes the job from the shell's job table, so the shell does not send it a hangup signal (SIGHUP) when the shell exits. This way, even if your terminal session is terminated (as happens when your ssh connection to the machine drops), the process will continue running in the background.
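With & and disown, the script's output is still written to the terminal, which may no longer exist after you disconnect. A common alternative, sketched below, starts the job with the standard nohup utility (which makes it ignore SIGHUP, so disown is not needed) and sends both output streams to a log file that you can inspect later. `run_detached` and the log file name are hypothetical, not part of id2vec.

```shell
#!/bin/sh
# Hypothetical helper (not part of id2vec): start a command in the
# background, immune to hangups (SIGHUP), with stdin read from /dev/null and
# stdout/stderr appended to LOGFILE. Prints the PID of the started process.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() (
  log=$1
  shift
  nohup "$@" >> "$log" 2>&1 < /dev/null &
  echo "$!"
)

# Example, run from /path/to/id2vec:
# run_detached preprocess.log ./preprocess.sh
# tail -f preprocess.log
```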
Train
Finally, to train the neural network, run:
./train.sh