JPMML-SkLearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML.

Table of Contents

Features

Overview

Supported packages

<details> <summary>Scikit-Learn</summary>

Examples: main.py

</details> <details> <summary>BorutaPy</summary>

Examples: extensions/boruta.py

</details> <details> <summary>Category Encoders</summary>

Examples: extensions/category_encoders.py and extensions/category_encoders-xgboost.py

</details> <details> <summary>H2O.ai</summary>

Examples: main-h2o.py

</details> <details> <summary>Hyperopt-sklearn</summary>

Examples: extensions/hpsklearn.py

</details> <details> <summary>Imbalanced-Learn</summary>

Examples: extensions/imblearn.py

</details> <details> <summary>InterpretML</summary>

Examples: extensions/interpret.py

</details> <details> <summary>LightGBM</summary>

Examples: main-lightgbm.py

</details> <details> <summary>Mlxtend</summary>

Examples: N/A

</details> <details> <summary>OptBinning</summary>

Examples: extensions/optbinning.py

</details> <details> <summary>PyCaret</summary>

Examples: extensions/pycaret.py

</details> <details> <summary>Scikit-Lego</summary>

Examples: extensions/sklego.py

</details> <details> <summary>Scikit-Tree</summary>

Examples: extensions/sktree.py

</details> <details> <summary>SkLearn2PMML</summary>

Examples: main.py and extensions/sklearn2pmml.py

</details> <details> <summary>Sklearn-Pandas</summary>

Examples: main.py

</details> <details> <summary>StatsModels</summary>

Examples: main-statsmodels.py

</details> <details> <summary>TPOT</summary>

Examples: extensions/tpot.py

</details> <details> <summary>XGBoost</summary>

Examples: main-xgboost.py, extensions/category_encoders-xgboost.py and extensions/categorical.py

</details>

Prerequisites

The Python side of operations

Validating Python installation:

import joblib, sklearn, sklearn_pandas, sklearn2pmml

print(joblib.__version__)
print(sklearn.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)
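The check above stops at the first missing package. As a minimal sketch, the same check can be written so that it reports every module in one pass. The helper name `report_versions` and the constant `REQUIRED_MODULES` are hypothetical and are not part of any of the listed packages:

```python
import importlib

# Module names taken from the import statement above.
REQUIRED_MODULES = ["joblib", "sklearn", "sklearn_pandas", "sklearn2pmml"]

def report_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is not installed (or one of its own imports is missing).
            versions[name] = None
        else:
            # Fall back to a placeholder for modules without a __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `report_versions(REQUIRED_MODULES)` returns a dictionary in which a `None` value marks a module that still needs to be installed.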

The JPMML-SkLearn side of operations

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces a library JAR file pmml-sklearn/target/pmml-sklearn-1.8-SNAPSHOT.jar, and an executable uber-JAR file pmml-sklearn-example/target/pmml-sklearn-example-executable-1.8-SNAPSHOT.jar.
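To confirm that the build produced both files, a small check along the following lines may help. This is a sketch only: the helper name `missing_artifacts` is hypothetical, and the paths are the ones stated above, resolved relative to the project root directory.

```python
from pathlib import Path

# Artifact paths as stated above, relative to the project root directory.
EXPECTED_ARTIFACTS = [
    "pmml-sklearn/target/pmml-sklearn-1.8-SNAPSHOT.jar",
    "pmml-sklearn-example/target/pmml-sklearn-example-executable-1.8-SNAPSHOT.jar",
]

def missing_artifacts(project_root, artifacts=EXPECTED_ARTIFACTS):
    """Return the expected artifact paths that do not exist under project_root."""
    root = Path(project_root)
    return [artifact for artifact in artifacts if not (root / artifact).is_file()]
```

An empty list from `missing_artifacts(".")`, when run from the project root directory, indicates that both JAR files are in place.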

Usage

A typical workflow can be summarized as follows:

  1. Use Python to train a model.
  2. Serialize the model in pickle data format to a file on the local filesystem.
  3. Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.

The Python side of operations

Loading data to a pandas.DataFrame object:

import pandas

df = pandas.read_csv("Iris.csv")

iris_X = df[df.columns.difference(["Species"])]
iris_y = df["Species"]

First, creating a sklearn_pandas.DataFrameMapper object, which performs column-oriented feature engineering and selection work:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])

Second, creating Transformer and Selector objects, which perform table-oriented feature engineering and selection work:

from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn2pmml import SelectorProxy

table_preprocessor = Pipeline([
    ("pca", PCA(n_components = 3)),
    ("selector", SelectorProxy(SelectKBest(k = 2)))
])

Please note that stateless Scikit-Learn selector objects need to be wrapped into an sklearn2pmml.SelectorProxy object.

Third, creating an Estimator object:

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(min_samples_leaf = 5)

Combining the above objects into a sklearn2pmml.pipeline.PMMLPipeline object, and running the experiment:

from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("columns", column_preprocessor),
    ("table", table_preprocessor),
    ("classifier", classifier)
])
pipeline.fit(iris_X, iris_y)

Recording feature importance information in a pickle data format-compatible manner:

classifier.pmml_feature_importances_ = classifier.feature_importances_

Embedding model verification data:

pipeline.verify(iris_X.sample(n = 15))

Storing the fitted PMMLPipeline object in pickle data format:

import joblib

joblib.dump(pipeline, "pipeline.pkl.z", compress = 9)

Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:

java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.8-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
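When the conversion needs to run from a Python script, the same command line can be assembled and launched with the standard `subprocess` module. The following is a sketch under stated assumptions: the function names `build_converter_command` and `convert` are hypothetical, the JAR path is the one produced by the build step above, and a `java` executable is assumed to be on the system path.

```python
import subprocess

# JAR path as produced by the build step above.
CONVERTER_JAR = "pmml-sklearn-example/target/pmml-sklearn-example-executable-1.8-SNAPSHOT.jar"

def build_converter_command(pkl_input, pmml_output, jar=CONVERTER_JAR):
    """Assemble the argument list for the command-line converter application."""
    return ["java", "-jar", jar, "--pkl-input", pkl_input, "--pmml-output", pmml_output]

def convert(pkl_input, pmml_output, jar=CONVERTER_JAR):
    """Run the converter; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(build_converter_command(pkl_input, pmml_output, jar), check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces. For example, `convert("pipeline.pkl.z", "pipeline.pmml")` corresponds to the command shown above.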

Getting help:

java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.8-SNAPSHOT.jar --help

Documentation

Integrations:

Extensions:

Miscellaneous:

Archived:

License

JPMML-SkLearn is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use JPMML-SkLearn in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-SkLearn available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io