Sparklyr2PMML

R library for converting Apache Spark ML pipelines to PMML.

Features

This package is a thin R wrapper for the JPMML-SparkML library.

Prerequisites

Installation

Install from GitHub using the devtools package:

library("devtools")

install_github("jpmml/sparklyr2pmml")

Configuration and usage

Sparklyr2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:

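| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|----------------------|----------------------|------------------------------|
| 3.0.X                | 2.0.X                | 2.0.3                        |
| 3.1.X                | 2.1.X                | 2.1.3                        |
| 3.2.X                | 2.2.X                | 2.2.3                        |
| 3.3.X                | 2.3.X                | 2.3.2                        |
| 3.4.X                | 2.4.X                | 2.4.1                        |
| 3.5.X                | master               | 2.5.0                        |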

Launch Sparklyr; use the sparklyr.connect.packages configuration option to specify the coordinates of relevant JPMML-SparkML modules:

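Launching with the core module (replace ${version} with the JPMML-SparkML version that matches your Apache Spark version in the compatibility matrix above):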

library("sparklyr")

config = spark_config()
config[["sparklyr.connect.packages"]] = "org.jpmml:pmml-sparkml:${version}"

sc = spark_connect(master = "local", config = config)
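The `${version}` placeholder has to be filled in by hand. As a minimal sketch, the lookup against the compatibility matrix could be automated with a small helper. Note that `jpmml_sparkml_versions` and `jpmml_sparkml_coordinates` are hypothetical names introduced here for illustration; they are not part of Sparklyr2PMML or Sparklyr, and the version numbers are simply copied from the matrix above, so they may be out of date.

```r
# Lookup table copied from the compatibility matrix above.
# Assumption: these are the versions listed in this README; newer ones may exist.
jpmml_sparkml_versions = c(
	"3.0" = "2.0.3",
	"3.1" = "2.1.3",
	"3.2" = "2.2.3",
	"3.3" = "2.3.2",
	"3.4" = "2.4.1",
	"3.5" = "2.5.0"
)

# Hypothetical helper (not part of Sparklyr2PMML): builds the Maven coordinates
# of the core module for a given Apache Spark version string such as "3.5.1".
jpmml_sparkml_coordinates = function(spark_version){
	# Keep only the "major.minor" part, e.g. "3.5.1" -> "3.5"
	parts = strsplit(as.character(spark_version), ".", fixed = TRUE)[[1]]
	if(length(parts) < 2){
		stop("Expected a version like '3.5.1', got: ", spark_version)
	}
	key = paste(parts[1:2], collapse = ".")
	if(!(key %in% names(jpmml_sparkml_versions))){
		stop("No JPMML-SparkML version listed for Apache Spark ", key)
	}
	paste0("org.jpmml:pmml-sparkml:", jpmml_sparkml_versions[[key]])
}
```

With such a helper, the configuration line could read `config[["sparklyr.connect.packages"]] = jpmml_sparkml_coordinates("3.5.1")`. Failing loudly on an unlisted Spark version is deliberate here, since a mismatched JPMML-SparkML version is unlikely to work.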

Fitting a Spark ML pipeline:

library("dplyr")
library("sparklyr")

data(iris)

iris_df = copy_to(sc, iris)

iris_pipeline = ml_pipeline(sc) %>%
	ft_r_formula(Species ~ .) %>%
	ml_decision_tree_classifier()

iris_pipeline_model = ml_fit(iris_pipeline, iris_df)

Exporting the fitted Spark ML pipeline to a PMML file:

library("sparklyr2pmml")

pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)

buildFile(pmmlBuilder, "DecisionTreeIris.pmml")
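After the export, it can be useful to confirm that a PMML document actually landed on disk. The following is a rough sanity check using base R only. The function `is_pmml_file` is a hypothetical helper, not part of Sparklyr2PMML, and it assumes that the `<PMML` root element appears within the first few lines of the file; it does not validate the document against the PMML schema.

```r
# Hypothetical helper (not part of Sparklyr2PMML): a cheap sanity check that
# a file exists and looks like a PMML document. This is not schema validation.
is_pmml_file = function(path){
	if(!file.exists(path)){
		return(FALSE)
	}
	# Assumption: the <PMML root element appears within the first 10 lines
	head_lines = readLines(path, n = 10, warn = FALSE)
	any(grepl("<PMML", head_lines, fixed = TRUE))
}
```

For the example above, the check would be called as `is_pmml_file("DecisionTreeIris.pmml")`.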

License

Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io