MLLinearModels

MLLinearModels is a small library that provides functionality to train and use linear regression models, such as "Ordinary Least Squares", "Ridge", "Lasso", and "Elastic Net". This library consists of 4 main classes: RidgeModel, ElasticNetModel, RidgeCVModel, and ElasticNetCVModel.

Both "RidgeModel" and "ElasticNetModel" provide:

Installation

To use this library, the PolyMath project must be installed first: https://github.com/PolyMathOrg/PolyMath

In addition, the DataFrame library is highly recommended (though not required) for manipulating data: https://github.com/PolyMathOrg/DataFrame

Afterwards, the library can be loaded from its git repository using Iceberg.

Loading data for the tutorial

We will load the housing data and split it into train and test sets:

df := DataFrame loadHousing.
df addColumn: ((1 to: df size) collect:[:i | 100 random > 85]) named: #isTest.
 
trainX := (df  selectAllWhere: [:isTest | isTest not ]) columnsFrom: 1 to: 3.
trainY := (df  selectAllWhere: [:isTest | isTest not ]) columnAt: 4.
 
testX := (df  selectAllWhere: [:isTest | isTest  ]) columnsFrom: 1 to: 3.
testY := (df  selectAllWhere: [:isTest | isTest  ]) columnAt: 4.

To interact with the library, though, we need to convert the DataFrame data into the PMMatrix and PMVector classes from PolyMath.

trainXMatrix := PMMatrix rows: trainX asArrayOfRows .
trainYVec := trainY asPMVector .
testXMatrix := PMMatrix rows: testX asArrayOfRows .
testYVec := testY asPMVector.

Using RidgeModel

olsModel :=
    RidgeModel new alpha: 0;
    shouldCenter: true;
    shouldNormalize: true.
 
olsModel fit: trainXMatrix to: trainYVec checkInput: true.
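"R^2 score on the test set."
r2Coefficient := olsModel score: testXMatrix output: testYVec.
"Mean squared error: sum of squared residuals divided by the number of test samples."
mseError := (((olsModel predict: testXMatrix) - testYVec) inject: 0 into: [ :a :b | a + b squared ]) / testYVec size.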
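
As a sanity check on what the score means, R^2 can be computed by hand. The sketch below is only an illustration on made-up numbers stored in plain Arrays; it assumes that score:output: returns the usual coefficient of determination, which the variable name above suggests but this page does not spell out.

```smalltalk
"Illustrative only: made-up values, not the housing data."
| actual predicted mean ssRes ssTot r2 |
actual := #(3.0 5.0 7.0 9.0).
predicted := #(2.5 5.5 7.0 9.0).

"Mean of the observed values."
mean := (actual inject: 0 into: [ :sum :each | sum + each ]) / actual size.

"Residual sum of squares: squared gaps between observed and predicted."
ssRes := (actual with: predicted collect: [ :a :p | (a - p) squared ])
    inject: 0 into: [ :sum :each | sum + each ].

"Total sum of squares: squared gaps between observed values and their mean."
ssTot := (actual collect: [ :a | (a - mean) squared ])
    inject: 0 into: [ :sum :each | sum + each ].

"R^2 = 1 - ssRes / ssTot; close to 0.975 for these numbers."
r2 := 1 - (ssRes / ssTot).
```

A value of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean.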

Using ElasticNetModel

tol - parameter that specifies the accuracy of the solution.

lasso := 
   ElasticNetModel new 
   shouldCenter: true;
   shouldNormalize: true;
   l1Ratio: 1;
   alpha: 6.36;
   tol: 1e-3.
   
lasso fit: trainXMatrix to: trainYVec checkInput: true.
lasso score: testXMatrix output: testYVec.
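
The example above is named lasso because it sets l1Ratio: 1. To show why, the sketch below evaluates a commonly used form of the Elastic Net penalty, alpha * (l1Ratio * L1 + (1 - l1Ratio) / 2 * L2), where L1 is the sum of absolute weights and L2 is the sum of squared weights. This form follows the scikit-learn convention and is an assumption here; this page does not confirm that ElasticNetModel uses exactly the same scaling. The penalty block and the weights are hypothetical and not part of the library.

```smalltalk
"Hypothetical helper, not part of MLLinearModels."
| penalty weights lassoPenalty ridgePenalty |
penalty := [ :w :alpha :l1Ratio |
    | l1 l2 |
    "L1 term: sum of absolute weights."
    l1 := w inject: 0 into: [ :sum :each | sum + each abs ].
    "L2 term: sum of squared weights."
    l2 := w inject: 0 into: [ :sum :each | sum + each squared ].
    alpha * ((l1Ratio * l1) + ((1 - l1Ratio) / 2 * l2)) ].

weights := #(0.5 -2.0 0.0).

"l1Ratio = 1 keeps only the L1 (Lasso) term: 2.5 for these weights."
lassoPenalty := penalty value: weights value: 1 value: 1.

"l1Ratio = 0 keeps only the L2 (Ridge-style) term: 4.25 / 2 = 2.125."
ridgePenalty := penalty value: weights value: 1 value: 0.
```

Values of l1Ratio between 0 and 1 blend the two terms, which is what the CV model searches over.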

Using RidgeCVModel

This class requires an array of alpha values to choose from.

nFolds - the number of groups (folds) used to perform k-fold cross-validation.

If nFolds is nil or 1, efficient leave-one-out cross-validation is performed.

As a result of training this model will contain:

ridgeCV := RidgeCVModel new
    shouldCenter: true;
    shouldNormalize: true;
    alphas: {1e-3 . 5e-3 . 1e-2 . 3e-2 . 5e-2 . 7e-2 . 1e-1 . 3e-1 .  5e-1. 1 . 5 . 10 . 20}.
    
ridgeCV fit: trainXMatrix to: trainYVec checkInput: true.
ridgeCV model score: testXMatrix output: testYVec.
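
To make nFolds more concrete, the sketch below assigns sample indices to folds in round-robin order. It only illustrates the idea of k-fold partitioning using core collection messages; how RidgeCVModel actually partitions the data is not described on this page, so the scheme and the variable names are assumptions.

```smalltalk
"Illustration of k-fold partitioning; not the library's own code."
| nSamples nFolds folds |
nSamples := 10.
nFolds := 3.

"Fold k receives every index i with ((i - 1) mod nFolds) + 1 = k."
folds := (1 to: nFolds) collect: [ :k |
    (1 to: nSamples) select: [ :i | (i - 1) \\ nFolds + 1 = k ] ].

"Each fold is held out once for validation while the rest is used for training."
folds do: [ :heldOut |
    | training |
    training := (1 to: nSamples) reject: [ :i | heldOut includes: i ].
    Transcript show: heldOut printString , ' / ' , training printString; cr ].
```

With nFolds equal to the number of samples, every fold holds exactly one index, which corresponds to leave-one-out cross-validation.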

Using ElasticNetCVModel

This class requires an array of l1Ratio values to choose from.

If an array of alphas is not passed, the alphas will be generated automatically (though the generated grid does not work well when l1Ratio is small).

In that case, epsilon specifies the difference between the max and min alpha generated for each l1Ratio.

nAlphas - the number of alphas generated in the range (minAlpha, maxAlpha).

nFolds - the number of groups (folds) used to perform k-fold cross-validation.

elasticNetCV:= ElasticNetCVModel new
    shouldCenter: true;
    shouldNormalize: true;
    l1Ratios: { 0.1 . 0.2 . 0.3 . 0.4 . 0.5 . 0.6 .0.7 . 0.8. 0.9 . 0.99 .  1}  ;
    alphas: {1e-3 . 5e-3 . 1e-2 . 3e-2 . 5e-2 . 7e-2 . 1e-1 . 3e-1 .  5e-1. 1 . 5 . 10 . 20};
    nFolds: 10.
    
elasticNetCVAutoAlpha:= ElasticNetCVModel new
    shouldCenter: true;
    shouldNormalize: true;
    l1Ratios: { 0.1 . 0.2 . 0.3 . 0.4 . 0.5 . 0.6 .0.7 . 0.8. 0.9 . 0.99 .  1}  ;
    nAlphas: 100;
    epsilon: 1e-3;
    nFolds: 10.  
    
elasticNetCV fit: trainXMatrix to: trainYVec checkInput: true.
elasticNetCV model score: testXMatrix output: testYVec.