JPMML-R
Java library and command-line application for converting R models to PMML.
Features
- Fast and memory-efficient:
  - Can produce a 5 GB Random Forest PMML file in less than 1 minute on a desktop PC
- Supported model and transformation types:
  - ada package:
    - ada - Stochastic Boosting (SB) classification
  - adabag package:
    - bagging - Bagging classification
    - boosting - Boosting classification
  - apollo package:
    - apollo (formerly maxLik) - Discrete Choice Model (DCM) classification
  - caret package:
    - preProcess - Transformation methods "range", "center", "scale" and "medianImpute"
    - train - Selected JPMML-R model types
  - caretEnsemble package:
    - caretEnsemble - Ensemble regression and classification
  - CHAID package:
    - party - CHi-squared Automated Interaction Detection (CHAID) classification
  - earth package:
    - earth - Multivariate Adaptive Regression Splines (MARS) regression
  - elmNNRcpp package:
    - elm - Extreme Learning Machine (ELM) regression
  - evtree package:
    - party - Evolutionary Learning of Trees (EvTree) regression and classification
  - e1071 package:
    - naiveBayes - Naive Bayes (NB) classification
    - svm - Support Vector Machine (SVM) regression, classification and anomaly detection
  - gbm package:
    - gbm - Gradient Boosting Machine (GBM) regression and classification
  - glmnet package:
    - glmnet (elnet, fishnet, lognet and multnet subtypes) - Generalized Linear Model with lasso or elastic net regularization (GLMNet) regression and classification
    - cv.glmnet - Cross-validated GLMNet regression and classification
  - IsolationForest package:
    - iForest - Isolation Forest (IF) anomaly detection
  - lightgbm package:
    - lgb.Booster - LightGBM regression and classification
  - MASS package:
    - negbin - Generalized Linear Model (GLM) regression
  - mlr package:
    - WrappedModel - Selected JPMML-R model types
  - neuralnet package:
    - nn - Neural Network (NN) regression
  - nnet package:
    - multinom - Multinomial log-linear classification
    - nnet.formula - Neural Network (NNet) regression and classification
  - party package:
    - ctree - Conditional Inference Tree (CIT) classification
  - partykit package:
    - party - Recursive Partytioning (Party) regression and classification
  - pls package:
    - mvr - Multivariate Regression (MVR) regression
  - pscl package:
    - hurdle - Hurdle regression
  - randomForest package:
    - randomForest - Random Forest (RF) regression and classification
  - ranger package:
    - ranger - Random Forest (RF) regression and classification
  - rms package:
    - lrm - Binary Logistic Regression (LR) classification
    - ols - Ordinary Least Squares (OLS) regression
  - rpart package:
    - rpart - Recursive Partitioning (RPart) regression and classification
  - r2pmml package:
    - scorecard - Scorecard regression
  - stats package:
    - glm - Generalized Linear Model (GLM) regression and classification:
      - binomial, gaussian, Gamma, inverse.gaussian and poisson families
      - MASS::negative.binomial family
      - statmod::tweedie family
    - kmeans - K-Means clustering
    - lm - Linear Model (LM) regression
  - xgboost package:
    - xgb.Booster - XGBoost (XGB) regression and classification
- Data pre-processing using model formulae:
  - Interaction terms
  - base::I(..) function terms:
    - Logical operators &, | and !
    - Relational operators ==, !=, <, <=, >= and >
    - Arithmetic operators +, -, *, / and %
    - Exponentiation operators ^ and **
    - The is.na function
    - Arithmetic functions abs, ceiling, exp, floor, log, log10, round and sqrt
  - base::cut() and base::ifelse() function terms
  - plyr::revalue() and plyr::mapvalues() function terms
- Production quality:
  - Complete test coverage
  - Fully compliant with the JPMML-Evaluator library
Prerequisites
- Java 1.8 or newer.
Installation
Enter the project root directory and build using Apache Maven:
mvn clean install
The build produces a library JAR file pmml-rexp/target/pmml-rexp-1.6-SNAPSHOT.jar and an executable uber-JAR file pmml-rexp-example/target/pmml-rexp-example-executable-1.6-SNAPSHOT.jar.
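Because both JAR file names embed the project version (1.6-SNAPSHOT here), scripts that hard-code them break after a version bump. The following POSIX shell sketch locates the executable uber-JAR by glob instead. The function name find_uber_jar is hypothetical, and the sketch assumes the artifact keeps the pmml-rexp-example-executable-*.jar naming pattern shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the executable uber-JAR built by
# "mvn clean install", without hard-coding the project version.
# Assumes the artifact keeps the pmml-rexp-example-executable-*.jar pattern.
find_uber_jar() {
  root=${1:-.}    # project root directory, defaults to the current directory
  for jar in "$root"/pmml-rexp-example/target/pmml-rexp-example-executable-*.jar; do
    # When nothing matches, the shell leaves the pattern unexpanded,
    # so only report paths that are existing regular files.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "executable JAR not found under $root; run 'mvn clean install' first" >&2
  return 1
}
```

For example, java -jar "$(find_uber_jar)" --help prints the converter's help from the project root. If several versions are present in target/, the first match in glob order is returned, so a fresh mvn clean install keeps the result unambiguous.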
Usage
A typical workflow can be summarized as follows:
- Use R to train a model.
- Serialize the model in RDS data format to a file on the local filesystem.
- Use the JPMML-R command-line converter application to convert the RDS file into a PMML file.
The R side of operations
The following R script trains a Random Forest (RF) model and saves it in RDS data format to the file rf.rds:
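library("randomForest")
# Train a Random Forest classifier (Species as the target, all other iris columns as features)
rf = randomForest(Species ~ ., data = iris)
# Serialize the fitted model to an RDS file in the working directory
saveRDS(rf, "rf.rds")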
The JPMML-R side of operations
Converting the RDS file rf.rds to the PMML file rf.pmml:
java -jar pmml-rexp-example/target/pmml-rexp-example-executable-1.6-SNAPSHOT.jar --rds-input rf.rds --pmml-output rf.pmml
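The same command can be applied to a whole directory of RDS files. The sketch below is a hypothetical helper: convert_dir, along with the JPMML_R_JAR and DRY_RUN environment variables, are names chosen here and are not read by JPMML-R itself. It derives each output name by replacing the .rds suffix with .pmml and uses only the --rds-input and --pmml-output options shown above.

```shell
#!/bin/sh
# Hypothetical batch helper: convert every *.rds file in a directory,
# writing model.rds -> model.pmml next to it.
# JPMML_R_JAR and DRY_RUN are names chosen for this sketch only;
# JPMML-R itself does not read them.
convert_dir() {
  dir=${1:-.}
  jar=${JPMML_R_JAR:-pmml-rexp-example/target/pmml-rexp-example-executable-1.6-SNAPSHOT.jar}
  for rds in "$dir"/*.rds; do
    # Skip the literal pattern that remains when no .rds file exists
    [ -f "$rds" ] || continue
    pmml=${rds%.rds}.pmml    # strip the .rds suffix, append .pmml
    # With DRY_RUN set to a non-empty value, the command is printed instead of run
    ${DRY_RUN:+echo} java -jar "$jar" --rds-input "$rds" --pmml-output "$pmml" || return 1
  done
}
```

Running DRY_RUN=1 convert_dir models prints the java commands without executing them, which is a cheap way to check the file mapping before starting a long batch.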
Getting help:
java -jar pmml-rexp-example/target/pmml-rexp-example-executable-1.6-SNAPSHOT.jar --help
The conversion of large files (1 GB and beyond) can be sped up by increasing the JVM heap size using the -Xms and -Xmx options:
java -Xms4G -Xmx8G -jar pmml-rexp-example/target/pmml-rexp-example-executable-1.6-SNAPSHOT.jar --rds-input rf.rds --pmml-output rf.pmml
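Since the larger heap mainly pays off for large inputs, one option is to pass -Xms4G -Xmx8G only when a file crosses a size threshold. This is a sketch under stated assumptions: the text does not say whether "1 GB" refers to the RDS input or the PMML output, so this version measures the input and reads the limit as 1 GiB; the heap values are copied from the command above; and heap_opts_for, convert_rds, JPMML_R_JAR and DRY_RUN are hypothetical names. Because saveRDS compresses its output by default, the RDS size is only a rough proxy for the size of the resulting PMML.

```shell
#!/bin/sh
# Sketch: pass the larger heap settings only for large inputs.
# Assumptions: "1 GB" is read as 1 GiB of RDS input; the -Xms4G/-Xmx8G
# values are copied from the README command and may need tuning.
heap_opts_for() {
  file=$1
  threshold=${2:-1073741824}    # bytes; 1073741824 = 1 GiB
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  size=$(wc -c < "$file" | tr -d ' ')    # some wc implementations pad with spaces
  if [ "$size" -ge "$threshold" ]; then
    printf '%s\n' "-Xms4G -Xmx8G"
  fi
}

# Hypothetical wrapper; JPMML_R_JAR and DRY_RUN are names chosen for this sketch.
convert_rds() {
  input=$1
  output=$2
  jar=${JPMML_R_JAR:-pmml-rexp-example/target/pmml-rexp-example-executable-1.6-SNAPSHOT.jar}
  opts=$(heap_opts_for "$input") || return 1
  # $opts is deliberately unquoted so "-Xms4G -Xmx8G" becomes two JVM arguments
  ${DRY_RUN:+echo} java $opts -jar "$jar" --rds-input "$input" --pmml-output "$output"
}
```

For example, DRY_RUN=1 convert_rds rf.rds rf.pmml prints the command that would run, including the heap options only if rf.rds is at least 1 GiB.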
Documentation
Up-to-date:
- Converting logistic regression models to PMML documents
- Deploying R language models on Apache Spark ML
License
JPMML-R is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.
If you would like to use JPMML-R in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-R available under the terms and conditions of the BSD 3-Clause License instead.
Additional information
JPMML-R is developed and maintained by Openscoring Ltd, Estonia.
Interested in using Java PMML API software in your company? Please contact info@openscoring.io