IMPORTANT
- Before loading this package, you must set the Java memory option, e.g.
options(java.parameters = "-Xmx5g")
to allocate more RAM than the default of 500MB, which will get you into trouble. Only then invoke library(bartMachine). If you don't do this, YOU WILL GET OUT OF MEMORY ERRORS OR ERRORS THAT LOOK LIKE THIS:
Error in validObject(.Object) : invalid class “jobjRef” object: invalid object for slot "jobj" in class "jobjRef": got class "NULL", should be or extend class "externalptr"
bartMachine
An R-Java Bayesian Additive Regression Trees implementation (BART) Software for Supervised Statistical Learning
Copyright (C) 2023
Adam Kapelner
Department of Mathematics, Queens College, City University of New York
&
Justin Bleich
Department of Statistics, The Wharton School of the University of Pennsylvania
This is a Java implementation of the algorithm found in Chipman, George, & McCulloch. BART: Bayesian Additive Regression Trees. The Annals of Applied Statistics. 2010 4(1): 266-298, along with many additional features.
News from the Past Year
6/25/23
v1.3.4 released: better interaction investigator, convenient relabeling for classification.
2/27/23
v1.3.3.1 released: fixed a bug in the bartMachine constructor call that prevented local variables from being passed in.
12/29/22
v1.3.3 released: fastutil is no longer used and we have downgraded back to trove, losing the 2x speedup. I'm really sorry for the people who used the package between Aug 25, 2022 and Dec 29, 2022, as there were probably bugs.
The Paper
For a vignette describing the BART model and bartMachine's features, see our JSS paper.
The Manual
See the manual for detailed information about the package's functions and parameters.
Setup Instructions
To install the bartMachine package in R, you first need to install Java and rJava and configure your computer. You can then install the package from CRAN or compile it from source.
Install Java JDK (not the JRE)
Download the latest Java JDK and install it properly (Java 7 or above is required for v1.2.x, and Java 8 or above is required for v1.3 and above). bartMachine requires rJava, which requires the JDK; you cannot just have a JRE!
Install rJava
Use install.packages("rJava") within R. If you experience errors, make sure your JAVA_HOME system variable is set to the root of your Java installation (on a Windows machine that would look something like C:\Program Files\Java\jdk-13.0.2). Also try running R CMD javareconf from the command line. On Ubuntu, you should run sudo apt-get install r-cran-rjava from the command prompt to install it. If you still have errors, you are not alone! rJava is tough to install and idiosyncratic across different platforms. Google your error message! The majority of issues have been resolved on Q&A forums.
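As a rough aid for the JAVA_HOME advice above, the following sketch checks whether JAVA_HOME points at a JDK rather than a JRE by looking for the javac compiler. The function name check_jdk is hypothetical (it is not part of bartMachine or rJava), and the sketch assumes a Unix-like shell; on Windows the compiler is javac.exe and the check would need adapting.

```shell
#!/bin/sh
# Hypothetical helper (not part of bartMachine or rJava).
# Reports whether JAVA_HOME is set and contains bin/javac,
# which a JDK ships and a bare JRE does not.
check_jdk() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ -x "$JAVA_HOME/bin/javac" ]; then
        echo "JDK found at $JAVA_HOME"
    else
        echo "no javac under $JAVA_HOME/bin (JRE only, or wrong path?)"
        return 1
    fi
}
```

After sourcing the snippet, run check_jdk before installing rJava. If it reports a problem, fix JAVA_HOME and then rerun R CMD javareconf.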
Install bartMachine via CRAN
Use install.packages("bartMachine") within R.
Install bartMachine via compilation from source
Due to CRAN limitations, we cannot release bartMachine for Java >7. We therefore recommend installing bartMachine from source: you will get the benefits of Java 8-14 as well as the latest release of bartMachine if it is not on CRAN yet (see the changelog).
- Make sure you have git properly installed.
- Run git clone https://github.com/kapelner/bartMachine.git from your command line and navigate into the cloned project directory via cd bartMachine.
- Make sure you have a Java JDK installed properly. Then make sure its bin directory is an element of the PATH variable (on a Windows machine it would look something like C:\Program Files\Java\jdk-13.0.2\bin). We also recommend creating a system variable JAVA_HOME pointing to the same directory without \bin.
- Make sure you have Apache Ant installed properly and add its bin directory to your system PATH variable (on a Windows machine it would be something like C:\Program Files (x86)\apache-ant-1.10.8\bin). We also recommend creating a system variable ANT_HOME pointing to the same directory without \bin.
- Compile the Java source code into a JAR using ant. You should see a compilation record, then BUILD SUCCESSFUL and a total time.
- Now you can install the package into R using R CMD INSTALL bartMachine. On Windows systems, this may fail because it expects multiple architectures. This can be corrected by running R CMD INSTALL --no-multiarch bartMachine (I haven't seen this issue in years though). This may also fail if you don't have the required packages installed (run install.packages("bartMachineJARs") and install.packages("missForest")). Upon successful installation, the last line of the output should read DONE (bartMachine). In R, you can now run library(bartMachine) and start using the package normally.
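The steps above can be strung together roughly as follows. This is only a sketch for a Unix-like shell: the function names missing_tools and build_bartmachine are hypothetical, while the commands inside are the ones listed in the steps. It assumes the R packages bartMachineJARs and missForest are already installed.

```shell
#!/bin/sh
# Hypothetical helper: print every command in the argument list that is
# not on PATH. Returns 0 if none are missing, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}

# Hypothetical wrapper around the documented build steps.
# Runs in a subshell so the caller's working directory is unchanged,
# and stops at the first step that fails.
build_bartmachine() {
    missing_tools git javac ant R || return 1
    (
        git clone https://github.com/kapelner/bartMachine.git &&
        cd bartMachine &&
        ant &&
        R CMD INSTALL bartMachine
    )
}
```

Calling build_bartmachine in an empty working directory either lists the missing prerequisites or runs the clone, compile, and install steps in order. On Windows, the same sequence applies, with --no-multiarch added to the last command if needed.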
Limiting CPU usage
(At least under GNU/Linux) even if you set set_bart_machine_num_cores(1), CPU usage per process can be much larger than 100% (reaching at times 200% or 300%). This can lead to CPU overloading, especially if you run multiple bartMachines in parallel (for example, if you use the SuperLearner package with parallelization). This seems to be a consequence of the Java garbage collector. One way to avoid this problem is to issue Sys.setenv(JAVA_TOOL_OPTIONS = "-XX:ParallelGCThreads=1") before invoking library(bartMachine). (If you use a cluster, for example a SNOW cluster, you will want to do this on the workers too, for example clusterEvalQ(the_name_of_your_cluster, {Sys.setenv(JAVA_TOOL_OPTIONS = "-XX:ParallelGCThreads=1")}).)
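An alternative that may be convenient for batch jobs is to set the same variable in the shell before R starts, since the JVM reads JAVA_TOOL_OPTIONS from the environment of the process that launches it. The sketch below is an assumption-laden illustration, not something the package provides: the helper name add_java_tool_option is hypothetical, and it simply appends the flag to any options already present instead of overwriting them.

```shell
#!/bin/sh
# Hypothetical helper: append one JVM flag to JAVA_TOOL_OPTIONS
# unless that exact flag is already present, then export the variable.
add_java_tool_option() {
    case " ${JAVA_TOOL_OPTIONS:-} " in
        *" $1 "*) ;;  # flag already present, nothing to do
        *) JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }$1" ;;
    esac
    export JAVA_TOOL_OPTIONS
}

# Limit the garbage collector to one thread for every JVM started
# from this shell, including the one rJava creates inside R.
add_java_tool_option "-XX:ParallelGCThreads=1"
```

Any R session launched from the same shell afterwards inherits the setting, so the Sys.setenv call inside R becomes optional for that session. Workers started on other machines still need the variable set in their own environment.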
Acknowledgements
We thank Ed George, Abba Krieger, Shane Jensen and Richard Berk for helpful discussions. We thank Matt Olson for pointing out an important memory issue. We thank JProfiler for profiling the code, which allowed us to create a lean implementation.