Home

Awesome

SkLearn2PMML Build Status

Python package for converting Scikit-Learn pipelines to PMML.

Features

This package is a thin Python wrapper around the JPMML-SkLearn library.

News and Updates

The current version is 0.112.1 (14 December, 2024):

pip install sklearn2pmml==0.112.1

See the NEWS.md file.

Prerequisites

Installation

Installing a release version from PyPI:

pip install sklearn2pmml

Alternatively, installing the latest snapshot version from GitHub:

pip install --upgrade git+https://github.com/jpmml/sklearn2pmml.git

Usage

Command-line application

The sklearn2pmml module is executable. The main application loads the estimator object from the Pickle file (-i or --input; supports joblib, pickle or dill variants), performs the conversion, and saves the result to a PMML file (-o or --output):

python -m sklearn2pmml --input pipeline.pkl --output pipeline.pmml

Getting help:

python -m sklearn2pmml --help

On some platforms, the Pip package installer additionally makes the main application available as a top-level command:

sklearn2pmml --input pipeline.pkl --output pipeline.pmml

Library

A typical workflow can be summarized as follows:

  1. Create a PMMLPipeline object, and populate it with pipeline steps as usual. The sklearn2pmml.pipeline.PMMLPipeline class extends the sklearn.pipeline.Pipeline class with the following functionality:
  1. Fit and validate the pipeline as usual.
  2. Optionally, compute and embed verification data into the PMMLPipeline object by invoking PMMLPipeline.verify(X) method with a small but representative subset of training data.
  3. Convert the PMMLPipeline object to a PMML file in local filesystem by invoking the sklearn2pmml.sklearn2pmml(estimator, pmml_path) utility method.

Developing a simple decision tree model for the classification of iris species:

import pandas

iris_df = pandas.read_csv("Iris.csv")

iris_X = iris_df[iris_df.columns.difference(["Species"])]
iris_y = iris_df["Species"]

from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
	("classifier", DecisionTreeClassifier())
])
pipeline.fit(iris_X, iris_y)

from sklearn2pmml import sklearn2pmml

sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr = True)

Developing a more elaborate logistic regression model for the same:

import pandas

iris_df = pandas.read_csv("Iris.csv")

iris_X = iris_df[iris_df.columns.difference(["Species"])]
iris_y = iris_df["Species"]

from sklearn_pandas import DataFrameMapper
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
	("mapper", DataFrameMapper([
		(["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), SimpleImputer()])
	])),
	("pca", PCA(n_components = 3)),
	("selector", SelectKBest(k = 2)),
	("classifier", LogisticRegression(multi_class = "ovr"))
])
pipeline.fit(iris_X, iris_y)
pipeline.verify(iris_X.sample(n = 15))

from sklearn2pmml import sklearn2pmml

sklearn2pmml(pipeline, "LogisticRegressionIris.pmml", with_repr = True)

Documentation

Integrations:

Extensions:

Miscellaneous:

Archived:

De-installation

Uninstalling:

pip uninstall sklearn2pmml

License

SkLearn2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use SkLearn2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes SkLearn2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

SkLearn2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io