Awesome
<h1>PurityChecker</h1>Table of Contents
- PurityChecker
- RefactoringMiner
- Current precision and recall
- How to use PurityChecker
- PurityChecker Applications
- PurityChecker and RefactoringMiner API usage guidelines
- PurityChecker main API
- RefactoringMiner - With a locally cloned git repository
- RefactoringMiner - With two directories containing Java source code
- RefactoringMiner - With file contents as strings
- RefactoringMiner - With all information fetched directly from GitHub
- RefactoringMiner - With a GitHub pull request
PurityChecker
<div align="justify">PurityChecker is a tool for detecting purity of method-level refactorings in Java. The tool is build on top of RefactoringMiner, which is a refactoring detection tool. <br/> <br/>A method-level refactoring is considered pure if it does not include any modifications that alter the behavior of the refactored code (method bodies). <br/>
There could be multiple scenarios in which a code change, beyond what is considered as specific refactoring type mechanics, does not change the behavior of the refactored code. As the main novelty of this tool, PurityChecker is the first tool that is able to capture the changes caused by the application of overlapped refactorings.
For example, in the following figure, the setFrameCentered
method, which is labeled as "1", has been inlined into the placeWindow
method. As we know, the Inline Method can be summarized as simply replacing the function body with function calls. But, within this example, two statements have been removed from the inlined operation. This statement removal is attributed to the two overlapping Inline Variable refactorings which are conducted on top of the Inline Method refactoring. As we can see, this change does not change the behavior of the refactoring, although this is not part of the inline method refactoring mechanics. So, this refactoring is considered as a pure Inline Method refactoring operation, as there is no code modification involved which changes the refactoring behavior.
PurityChecker utilizes RefactoringMiner's resources by initially executing the RefactoringMiner tool on a commit. Therefore, when applied to a commit, PurityChecker offers a list of refactorings undertaken within a code change provided by RefactoringMiner, accompanied by purity details specific to method-level refactorings.
RefactoringMiner
RefactoringMiner is a library/API written in Java that can detect refactorings applied in the history of a Java project.
At present, the tool identifies a broad spectrum of 98 distinct refactoring types across various categories, encompassing method-level, class-level, variable-related, test-related, and more.
Current precision and recall
As of November 8, 2023 the precision and recall of the tool on two training and testing oracles consisting of over 2400 method-level refactorings is:
<h2>Training Oracle:</h2>Refactoring Type | Precision | Recall | Specificity | # Validated Cases |
---|---|---|---|---|
Total (W. Average) | 97.07 | 90.01 | 97 | 1912 |
Extract Method | 95.28 | 88.68 | 95.31 | 925 |
Extract and Move Method | 92.31 | 73.47 | 93.34 | 94 |
Inline Method | 98.28 | 95 | 97.73 | 104 |
Move and Inline Method | 100 | 100 | 100 | 13 |
Move Method | 100 | 92.03 | 100 | 345 |
Move and Rename Method | 100 | 80.65 | 100 | 107 |
Pull Up Method | 100 | 94.92 | 100 | 282 |
Push Down Method | 100 | 87.18 | 100 | 42 |
Refactoring Type | Precision | Recall | Specificity | # Validated Cases |
---|---|---|---|---|
Total (W. Average) | 98.57 | 87.61 | 98.34 | 452 |
Extract Method | 97.73 | 81.13 | 97.5 | 93 |
Extract and Move Method | 100 | 93.02 | 100 | 51 |
Inline Method | 96.97 | 86.49 | 98.21 | 93 |
Move and Inline Method | 100 | 100 | 100 | 6 |
Move Method | 100 | 82.81 | 100 | 77 |
Move and Rename Method | 100 | 75 | 100 | 26 |
Pull Up Method | 100 | 86.36 | 100 | 53 |
Push Down Method | 100 | 100 | 100 | 53 |
How to use PurityChecker
PurityChecker features a crucial method called isPure
. This method takes two main arguments: umlModelDiff
, which contains information about the parent and child classes along with all the changes in a commit, and refactorings
, which includes a list of refactorings performed in the given commit. These arguments are obtained by invoking RefactoringMiner on a specific commit (commit URL).
<br>
PurityChecker Applications
Code Reviewer Perspective
From a code reviewer point of view, having the knowledge about purity of refactorings gives the reviewer a deeper insight into the code modifications. For instance, the reviewer can simply skip the changes happening within pure refactorings, as she can be sure that there is no functionality change involved within those cases.
Developer Perspective
Many developers prefer manual refactoring over automated tools. On the other hand, recent research has indicated that manual refactorings are error-prone. PurityCehcker with the ability to detect whether a refactoring is pure or not can help the developer to prevent erroneous application of manual refactoring. During our manual validation, we encountered some real-world cases in which the impureness of a refactoring can be considered as a hint to inform the developer that the refactoring is being done incorrectly. <br><br>
In this example, the developer extracts many instances of duplicated code concerning different sorting algorithms into the runTest()
. The refactoring essentially parameterizes the sorting algorithm with the Testable parameter. However, because the refactoring is done manually, the developer makes a mistake. She does not replace the algorithm InsertionSort
with parameter testable
in all places in the extracted method.
The incorrect manual refactoring is fixed in a later commit with the commit message “Fixed a small mis-type in sort timing code” by replacing InsertionSort
with parameter testable
.
PurityChecker would be able to warn the developer in these cases about non-behavior-preserving manual refactoring before committing the refactoring to the repository.
Researher Perspective
PurityChecker can be used to conduct large-scale empirical studies in the Software Refactoring field. Almost every empirical study in the field of Software Refactoring can take advantage of PurityChecker. For instance, one might study the relation of pure and impure refactorings with software metrics (https://github.com/pedramnoori/RefactoringMiner/issues/3).
Regression Testing
Numerous studies propose diverse approaches for Regression Test Selection (RTS). Having a tool capable of determining whether a refactoring is pure or not can significantly benefit this field. Predicting a refactoring as pure enables the skipping of tests related to the changes influenced by these refactorings. This practice can profoundly impact regression testing by substantially reducing the number of tests that need to be executed. Since pure refactorings preserve the program’s behavior, changes related to them do not require retesting.
<br> In a study by <a href="https://ieeexplore.ieee.org/document/8453082" target="_blank">Wang et al</a>, the authors introduced refactoring-aware regression test selection. They argue that changes associated with refactorings are behavior-preserving, and tests covering these changes can be skipped. While this holds true for refactorings deemed behavior-preserving, it may not always be the case. Impure refactorings modify the program’s behavior, necessitating testing for changes within this category during regression testing. PurityChecker can greatly assist this work by focusing their study on pure refactoring operations, aiding in the selection of tests to be excluded from regression testing.API usage guidelines
PurityChecker main API
Passing a commit URL in the code snippet provided below, PurityChecker returns a Map
object (pcr
) containing the refactorings undertaken in the commit along with the purity information regarding the method-level refactorings.
public static Map<Refactoring, PurityCheckResult> isPureAPI(String url) {
String commitUrl = URLHelper.getRepo(url);
String commitSh1 = URLHelper.getCommit(url);
Map<Refactoring, PurityCheckResult> pcr = new LinkedHashMap<>();
GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
miner.detectModelDiff(commitUrl,
commitSh1, new RefactoringHandler() {
@Override
public void processModelDiff(String commitId, UMLModelDiff umlModelDiff) throws RefactoringMinerTimedOutException {
List<Refactoring> refactorings = umlModelDiff.getRefactorings();
PurityChecker.isPure(umlModelDiff, pcr, refactorings);
}
}, 100);
return pcr;
}
RefactoringMiner With a locally cloned git repository
RefactoringMiner can automatically detect refactorings in the entire history of git repositories, between specified commits or tags, or at specified commits.
In the code snippet below we demonstrate how to print all refactorings performed in the toy project https://github.com/danilofes/refactoring-toy-example.git.
GitService gitService = new GitServiceImpl();
GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
Repository repo = gitService.cloneIfNotExists(
"tmp/refactoring-toy-example",
"https://github.com/danilofes/refactoring-toy-example.git");
miner.detectAll(repo, "master", new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
});
You can also analyze between commits using detectBetweenCommits
or between tags using detectBetweenTags
. RefactoringMiner will iterate through all non-merge commits from start commit/tag to end commit/tag.
// start commit: 819b202bfb09d4142dece04d4039f1708735019b
// end commit: d4bce13a443cf12da40a77c16c1e591f4f985b47
miner.detectBetweenCommits(repo,
"819b202bfb09d4142dece04d4039f1708735019b", "d4bce13a443cf12da40a77c16c1e591f4f985b47",
new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
});
// start tag: 1.0
// end tag: 1.1
miner.detectBetweenTags(repo, "1.0", "1.1", new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
});
It is possible to analyze a specifc commit using detectAtCommit
instead of detectAll
. The commit
is identified by its SHA key, such as in the example below:
miner.detectAtCommit(repo, "05c1e773878bbacae64112f70964f4f2f7944398", new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
});
RefactoringMiner With two directories containing Java source code
It is possible to detect refactorings between the Java files in two directories containing the code before and after some changes. This feature supports the detection of renamed and moved classes, and automatically excludes from the analysis any files with identical contents:
GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
// You must provide absolute paths to the directories. Relative paths will cause exceptions.
File dir1 = new File("/home/user/tmp/v1");
File dir2 = new File("/home/user/tmp/v2");
miner.detectAtDirectories(dir1, dir2, new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
});
GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
// You must provide absolute paths to the directories. Relative paths will cause exceptions.
Path dir1 = Paths.get("/home/user/tmp/v1");
Path dir1 = Paths.get("/home/user/tmp/v2");
miner.detectAtDirectories(dir1, dir2, new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
});
RefactoringMiner With file contents as strings
You can provide two maps (before and after the changes) where the keys are file paths, and the values are the corresponding file contents.
Each key should correspond to a file path starting from the root of the repository. For example, src/org/refactoringminer/api/GitHistoryRefactoringMiner.java
.
After populating the maps, you can use the following code snippet:
GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
// Each key should correspond to a file path starting from the root of the repository
Map<String, String> fileContentsBefore;
Map<String, String> fileContentsAfter;
// populate the maps
miner.detectAtFileContents(fileContentsBefore, fileContentsAfter, new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
});
RefactoringMiner With all information fetched directly from GitHub
To use this API, please provide a valid OAuth token in the github-oauth.properties
file.
You can generate an OAuth token in GitHub Settings
-> Developer settings
-> Personal access tokens
.
If you don't want to clone locally the repository, you can use the following code snippet:
GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
miner.detectAtCommit("https://github.com/danilofes/refactoring-toy-example.git",
"36287f7c3b09eff78395267a3ac0d7da067863fd", new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
}, 10);
RefactoringMiner With a GitHub pull request
To use this API, please provide a valid OAuth token in the github-oauth.properties
file.
You can generate an OAuth token in GitHub Settings
-> Developer settings
-> Personal access tokens
.
If you want to analyze all commits of a pull request, you can use the following code snippet:
GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
miner.detectAtPullRequest("https://github.com/apache/drill.git", 1807, new RefactoringHandler() {
@Override
public void handle(String commitId, List<Refactoring> refactorings) {
System.out.println("Refactorings at " + commitId);
for (Refactoring ref : refactorings) {
System.out.println(ref.toString());
}
}
}, 10);