JPMML-StatsModels
Java library and command-line application for converting StatsModels models to PMML.
Features
- Supported model types:
  - Linear Regression
  - Generalized Linear Regression:
    - Generalized Linear Models:
      - Families: Binomial, Gaussian, Poisson
      - Link Functions: identity, Log, Logit
  - Regression with Discrete Dependent Variable:
    - Logit
    - Multinomial Logit
    - Poisson
  - OrderedModel:
    - Distributions: logit, probit
- Production quality:
  - Complete test coverage.
  - Fully compliant with the JPMML-Evaluator library.
Installation
Enter the project root directory and build using Apache Maven:
mvn clean install
The build produces a library JAR file pmml-statsmodels/target/pmml-statsmodels-1.1-SNAPSHOT.jar, and an executable uber-JAR file pmml-statsmodels-example/target/pmml-statsmodels-example-executable-1.1-SNAPSHOT.jar.
Usage
A typical workflow can be summarized as follows:
- Use Python to fit a model.
- Save the model fitting results in pickle data format to a file in a local filesystem.
- Use the JPMML-StatsModels command-line converter application to convert the pickle file to a PMML file.
The Python side of operations
Loading data into a pandas.DataFrame object:
import pandas
auto_df = pandas.read_csv("Auto.csv")
Fitting a regression model using an R-style formula:
from statsmodels.formula.api import ols
model = ols(formula = "mpg ~ C(cylinders) + displacement + horsepower + weight + acceleration + C(model_year) + C(origin)", data = auto_df)
results = model.fit()
print(results.summary())
Storing the fitted RegressionResults(Wrapper) object in pickle data format:
results.save("model.pkl", remove_data = True)
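The remove_data = True argument asks StatsModels to strip the training data arrays from the results object before pickling, which typically yields a much smaller file. The sketch below compares the two file sizes on synthetic data. The data frame, column names and file names are hypothetical. The full object is saved first, because remove_data = True also strips the in-memory results object.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (hypothetical columns)
rng = np.random.default_rng(0)
n = 500
demo_df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
demo_df["y"] = 1.0 + 2.0 * demo_df["x1"] - 0.5 * demo_df["x2"] + rng.normal(scale=0.1, size=n)

demo_results = ols(formula="y ~ x1 + x2", data=demo_df).fit()

with tempfile.TemporaryDirectory() as tmp_dir:
    full_path = os.path.join(tmp_dir, "model_full.pkl")
    slim_path = os.path.join(tmp_dir, "model_slim.pkl")

    # Order matters: remove_data=True modifies demo_results in place
    demo_results.save(full_path)
    demo_results.save(slim_path, remove_data=True)

    full_size = os.path.getsize(full_path)
    slim_size = os.path.getsize(slim_path)

print(f"with training data:    {full_size} bytes")
print(f"without training data: {slim_size} bytes")
```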
The JPMML-StatsModels side of operations
Converting the model fitting results Pickle file model.pkl to a PMML file model.pmml:
java -jar pmml-statsmodels-example/target/pmml-statsmodels-example-executable-1.1-SNAPSHOT.jar --pkl-input model.pkl --pmml-output model.pmml
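If the conversion should happen from within a Python script, the command above can be wrapped with the standard subprocess module. This is only a sketch: the helper names build_convert_command and convert and the DEFAULT_JAR constant are hypothetical, and it assumes that java is on the PATH and that the converter signals failure with a non-zero exit status.

```python
import subprocess
from pathlib import Path

# Executable uber-JAR produced by the Maven build (path relative to the project root)
DEFAULT_JAR = Path("pmml-statsmodels-example/target/pmml-statsmodels-example-executable-1.1-SNAPSHOT.jar")


def build_convert_command(pkl_path, pmml_path, jar_path=DEFAULT_JAR):
    """Assemble the converter command line as an argument list."""
    return [
        "java", "-jar", str(jar_path),
        "--pkl-input", str(pkl_path),
        "--pmml-output", str(pmml_path),
    ]


def convert(pkl_path, pmml_path, jar_path=DEFAULT_JAR):
    """Run the converter; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_convert_command(pkl_path, pmml_path, jar_path), check=True)
```

Calling convert("model.pkl", "model.pmml") from the project root would then run the same command as shown above.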
Getting help:
java -jar pmml-statsmodels-example/target/pmml-statsmodels-example-executable-1.1-SNAPSHOT.jar --help
Documentation
- Training Scikit-Learn GridSearchCV StatsModels pipelines
- Training Scikit-Learn StatsModels pipelines
License
JPMML-StatsModels is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.
If you would like to use JPMML-StatsModels in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-StatsModels available under the terms and conditions of the BSD 3-Clause License instead.
Additional information
JPMML-StatsModels is developed and maintained by Openscoring Ltd, Estonia.
Interested in using Java PMML API software in your company? Please contact info@openscoring.io