Sparklyr2PMML
R library for converting Apache Spark ML pipelines to PMML.
Features
This package is a thin R wrapper for the JPMML-SparkML library.
Prerequisites
- Apache Spark 3.0.X, 3.1.X, 3.2.X, 3.3.X, 3.4.X or 3.5.X.
- R 3.3 or newer.
Installation
Install from GitHub using the devtools package:
library("devtools")
install_github("jpmml/sparklyr2pmml")
Configuration and usage
Sparklyr2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:
Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
---|---|---|
3.0.X | 2.0.X | 2.0.3 |
3.1.X | 2.1.X | 2.1.3 |
3.2.X | 2.2.X | 2.2.3 |
3.3.X | 2.3.X | 2.3.2 |
3.4.X | 2.4.X | 2.4.1 |
3.5.X | master | 2.5.0 |
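The matrix above can also be expressed as a small lookup, which may be handy when the Spark version is only known at run time. This is a minimal sketch: the names `jpmml_sparkml_versions` and `jpmml_sparkml_version` are hypothetical helpers and not part of Sparklyr2PMML, and the version numbers are copied from the table, so they go stale whenever the table is updated.

```r
# Latest JPMML-SparkML version per Apache Spark minor version,
# copied from the compatibility matrix above.
jpmml_sparkml_versions = c(
  "3.0" = "2.0.3",
  "3.1" = "2.1.3",
  "3.2" = "2.2.3",
  "3.3" = "2.3.2",
  "3.4" = "2.4.1",
  "3.5" = "2.5.0"
)

# Hypothetical helper (not part of sparklyr2pmml): map a full Spark
# version string such as "3.4.2" to the matching JPMML-SparkML version.
jpmml_sparkml_version = function(spark_version){
  parts = strsplit(spark_version, ".", fixed = TRUE)[[1]]
  if(length(parts) < 2){
    stop("Expected a Spark version such as '3.5.1', got: ", spark_version)
  }
  # Only the major and minor components select the branch
  key = paste(parts[1], parts[2], sep = ".")
  if(!(key %in% names(jpmml_sparkml_versions))){
    stop("No JPMML-SparkML release listed for Apache Spark ", key, ".X")
  }
  unname(jpmml_sparkml_versions[key])
}

jpmml_sparkml_version("3.4.2")
```

Failing loudly on an unlisted Spark version is a deliberate choice here, since pairing mismatched versions is exactly what the matrix is meant to prevent.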
Launch Sparklyr; use the sparklyr.connect.packages configuration option to specify the coordinates of relevant JPMML-SparkML modules:
- org.jpmml:pmml-sparkml:${version} - Core module.
- org.jpmml:pmml-sparkml-lightgbm:${version} - LightGBM via SynapseML extension module.
- org.jpmml:pmml-sparkml-xgboost:${version} - XGBoost via XGBoost4J-Spark extension module.
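The coordinates listed above follow a regular pattern, so they can be assembled programmatically. The sketch below is illustrative only: `jpmml_sparkml_packages` is a hypothetical helper, not a Sparklyr2PMML function, and it assumes that the extension modules are requested at the same version as the core module, as the shared `${version}` placeholder suggests.

```r
# Hypothetical helper (not part of sparklyr2pmml): build the Maven
# coordinates of the core module plus any requested extension modules.
jpmml_sparkml_packages = function(version, extensions = character(0)){
  known = c("lightgbm", "xgboost")
  unknown = setdiff(extensions, known)
  if(length(unknown) > 0){
    stop("Unknown extension module(s): ", paste(unknown, collapse = ", "))
  }
  artifacts = "pmml-sparkml"
  # Guard against paste0() recycling a zero-length vector to ""
  if(length(extensions) > 0){
    artifacts = c(artifacts, paste0("pmml-sparkml-", extensions))
  }
  paste0("org.jpmml:", artifacts, ":", version)
}

# Core module plus the XGBoost extension, for a Spark 3.5.X setup
packages = jpmml_sparkml_packages("2.5.0", extensions = "xgboost")
print(packages)
```

The resulting character vector could then be assigned to the sparklyr.connect.packages option. Whether that option accepts a multi-element vector as-is is an assumption worth checking against the sparklyr configuration documentation.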
Launching Sparklyr with the core module:
library("sparklyr")
config = spark_config()
config[["sparklyr.connect.packages"]] = "org.jpmml:pmml-sparkml:${version}"
sc = spark_connect(master = "local", config = config)
Fitting a Spark ML pipeline:
library("dplyr")
library("sparklyr")
data(iris)
iris_df = copy_to(sc, iris)
iris_pipeline = ml_pipeline(sc) %>%
  ft_r_formula(Species ~ .) %>%
  ml_decision_tree_classifier()
iris_pipeline_model = ml_fit(iris_pipeline, iris_df)
Exporting the fitted Spark ML pipeline to a PMML file:
library("sparklyr2pmml")
pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)
buildFile(pmmlBuilder, "DecisionTreeIris.pmml")
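After the export it can be worth checking that the file was actually written and looks like a PMML document. The following is a rough sanity check in base R, not a schema validation. `looks_like_pmml` is a hypothetical helper, and it assumes the root `<PMML` element appears within the first few lines of the file.

```r
# Hypothetical helper (not part of sparklyr2pmml): cheap sanity check
# that a file exists, is non-empty, and opens with a PMML root element.
looks_like_pmml = function(path, n = 20){
  if(!file.exists(path) || file.size(path) == 0){
    return(FALSE)
  }
  # Inspect only the first n lines; the root element is expected near the top
  head_lines = readLines(path, n = n, warn = FALSE)
  any(grepl("<PMML", head_lines, fixed = TRUE))
}

# Returns FALSE rather than failing if the export has not been run yet
looks_like_pmml("DecisionTreeIris.pmml")
```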
License
Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.
If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.
Additional information
Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.
Interested in using Java PMML API software in your company? Please contact info@openscoring.io