# AutoPruner
by Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D. Le, Quyet Thang Huynh
<p align="center">
    <a href="https://dl.acm.org/doi/abs/10.1145/3540250.3549175"><img src="https://img.shields.io/badge/Conference-ESEC%2FFSE%202022-green?style=for-the-badge"></a>
    <a href="https://arxiv.org/abs/2209.03230"><img src="https://img.shields.io/badge/arXiv-2209.03230-b31b1b.svg?style=for-the-badge"></a>
    <br>
    <a href="https://zenodo.org/records/6369874"><img src="https://img.shields.io/badge/Replication-10.5281%2Fzenodo.6369874-blue?style=for-the-badge"></a>
    <a href="https://hub.docker.com/r/thanhlecong/autopruner"><img src="https://img.shields.io/badge/docker-thanhlecong%2Fautopruner-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white"></a>
</p>

Welcome to the source code repository of AutoPruner, an LLM-based call graph pruning tool introduced in our paper "AutoPruner: Transformer-based Call Graph Pruning"!
## Overview
If you are interested in our work, please refer to our overview for more details.
## Repository Organization
The source code repository is organized as follows:
- config: contains our experimental configurations;
- script: contains scripts for running our experiments;
- src: contains our source code:
  - finetune: contains the source code for the fine-tuning phase;
  - training: contains the source code for the training phase;
  - utils: contains utility functions, e.g., logging and visualization;
  - gnn: contains the source code for the GNN benchmark;
  - Note that, in each of these sub-folders, main.py, dataset.py, and model.py contain the source code for training/testing, dataset processing, and the deep learning models, respectively;
- environment.yml: contains the configuration for AutoPruner's Conda environment.
The data repository is organized as follows:
- dl_dataset: contains our processed dataset for AutoPruner;
- gnn_dataset: contains our processed dataset for the GNN benchmark;
- gnn_model: contains our trained models for the GNN benchmark;
- info_data: contains the lists of training and testing programs;
- model: contains our trained models for AutoPruner;
- npe_result: contains the results of our manual evaluation for null-pointer analysis;
- processed_data: contains the extracted source code of methods in programs from cgPruner's dataset;
- raw_data: contains the static call graphs generated by the static analysis tools from cgPruner.
## Installation
### Requirements

#### Hardware
- More than 200GB of disk space
- 2 NVIDIA GPUs that support CUDA 11.3 and have at least 8GB of memory each

#### Software
- Ubuntu 18.04 or newer
- Docker/Conda
### Environment Configuration

#### Conda

To create AutoPruner's Conda environment, run:
```
conda env create -n autopruner --file environment.yml
```
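The command above names the environment autopruner (via -n autopruner), so a minimal follow-up, before running any experiments, is to activate it:
```
# Activate the Conda environment created from environment.yml
conda activate autopruner
```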
#### Docker

For ease of use, we also provide an installation package via a Docker image. Users can set up AutoPruner's Docker container step by step as follows:
- Pull AutoPruner's Docker image:
```
docker pull thanhlecong/autopruner:v2
```
- Run a Docker container:
```
docker run --name autopruner -it --shm-size 16G --gpus all thanhlecong/autopruner:v2
```
- Activate Conda:
```
source /opt/conda/bin/activate
```
- Activate AutoPruner's Conda environment:
```
conda activate autopruner
```

Note that the source code of AutoPruner is stored at /workspace/ in the Docker container, so please move to this folder before running experiments.
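If you keep the replication data on the host, one option is to mount it when starting the container. The sketch below is an assumption rather than part of the original setup: both the host path /path/to/autopruner-data and the mount target /workspace/data are hypothetical placeholders, so adjust them to wherever you actually store the data.
```
# Hypothetical paths: replace /path/to/autopruner-data with the folder holding
# the downloaded replication data; the mount target inside the container is
# likewise an assumption.
docker run --name autopruner -it --shm-size 16G --gpus all \
    -v /path/to/autopruner-data:/workspace/data \
    thanhlecong/autopruner:v2
```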
## Usage
To use our tool, please use the following command:
```
python3 -m src.training.main --config_path [config path] \
                             --mode [mode: test or train] \
                             --feature [type of features: 0: structure, 1: semantic, 2: combine] \
                             --model_path [path to saved model (for saving in train mode and loading in test mode)]
```
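For instance, a test run with the combined features might look like the sketch below; the file names config/wala.config and model/wala.pth are hypothetical placeholders, so substitute a real configuration from the config folder and a trained model from the replication package.
```
# Hypothetical paths: pick a real file from config/ and a checkpoint from the
# replication package's model/ folder.
python3 -m src.training.main --config_path config/wala.config \
                             --mode test \
                             --feature 2 \
                             --model_path model/wala.pth
```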
## Artifact
To replicate the results of AutoPruner, please download the data from our replication package, put it in the same folder as this repository, and then follow the instructions below. Note that our results may differ slightly when run on different devices; however, these differences do not affect the findings in our paper.
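As a rough sanity check (the archive name below is a hypothetical placeholder; download the actual file from the Zenodo record linked above), the extracted data should expose the folders described in the Repository Organization section:
```
# Hypothetical archive name: use the file actually served at
# https://zenodo.org/records/6369874
unzip autopruner-replication.zip
ls dl_dataset gnn_dataset model raw_data   # the folders described above should be present
```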
### RQ1

To replicate the results of AutoPruner for call graph pruning on Wala (RQ1), please use:
```
bash script/rq1_wala.sh
```
To replicate the results of AutoPruner for call graph pruning on Doop (RQ1), please use:
```
bash script/rq1_doop.sh
```
To replicate the results of AutoPruner for call graph pruning on Petablox (RQ1), please use:
```
bash script/rq1_peta.sh
```
### RQ2

#### Null-pointer Analysis

In this analysis, we follow the experimental settings of cgPruner, including their code for null-pointer analysis (NPA). Please refer to cgPruner's replication package for further instructions. You can also find our manual evaluation in the npe_result folder in this link.

#### Monomorphic Call-site Detection

To replicate the results of AutoPruner for monomorphic call-site detection on Wala's call graphs (RQ2), please use:
```
bash script/rq2_wala.sh
```
To replicate the results of AutoPruner for monomorphic call-site detection on Doop's call graphs (RQ2), please use:
```
bash script/rq2_doop.sh
```
To replicate the results of AutoPruner for monomorphic call-site detection on Petablox's call graphs (RQ2), please use:
```
bash script/rq2_peta.sh
```
### RQ3

To replicate the ablation study of AutoPruner with structural features, please use:
```
bash script/rq3_structure.sh
```
To replicate the ablation study of AutoPruner with semantic features, please use:
```
bash script/rq3_semantic.sh
```
To replicate the ablation study of AutoPruner with the caller function, please use:
```
bash script/rq3_caller.sh
```
To replicate the ablation study of AutoPruner with the callee function, please use:
```
bash script/rq3_callee.sh
```
## Citation
If you use our tool, please cite our paper as follows:
```
@inproceedings{le2022autopruner,
  title={AutoPruner: transformer-based call graph pruning},
  author={Le-Cong, Thanh and Kang, Hong Jin and Nguyen, Truong Giang and Haryono, Stefanus Agus and Lo, David and Le, Xuan-Bach D and Huynh, Quyet Thang},
  booktitle={Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  pages={520--532},
  year={2022}
}
```