ml-ease

What is ml-ease?

ml-ease is LinkedIn's open-sourced large-scale machine learning library; it currently provides ADMM-based large-scale logistic regression.

Copyright

Copyright 2014 LinkedIn Corporation. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. The license information on third-party code is included in NOTICE.

Open Source Software included in ml-ease

What is ADMM?

ADMM stands for Alternating Direction Method of Multipliers (Boyd et al. 2011). The basic idea is as follows: ADMM treats large-scale logistic regression model fitting as a convex optimization problem with constraints. While minimizing the user-defined loss function, it enforces the extra constraint that the coefficients learned on all data partitions must be equal. To solve this optimization problem, ADMM uses an iterative process. In each iteration it partitions the big data into many small partitions and fits an independent logistic regression on each partition. It then aggregates the coefficients collected from all partitions, learns the consensus coefficients, and sends them back to every partition for retraining. After 10-20 iterations, it ends up with a converged solution that is theoretically close to what you would have obtained by training on a single machine, and the ADMM algorithm is guaranteed to converge.
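The iterative process described above can be sketched in a few lines of NumPy. This is a toy illustration of consensus ADMM, not ml-ease's Hadoop implementation: the inner gradient-descent solver, the choice of rho, the iteration counts, and the partitioning scheme are all assumptions made for the sketch.

```python
import numpy as np

def local_fit(X, y, z, u, rho, steps=200, lr=0.1):
    """Per-partition subproblem: minimize the logistic loss on this
    partition plus (rho/2)*||x - z + u||^2, which pulls the local
    coefficients x toward the consensus z. Plain gradient descent is
    used here as a stand-in for a proper inner solver."""
    x = z.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ x)))          # predicted probabilities
        grad = X.T @ (p - y) / len(y) + rho * (x - z + u)
        x -= lr * grad
    return x

def admm_logistic(parts, dim, rho=1.0, iters=20):
    """Consensus ADMM: fit each partition independently, average the
    local coefficients into a consensus z, update the dual variables,
    and repeat until the partitions agree."""
    K = len(parts)
    us = [np.zeros(dim) for _ in range(K)]          # per-partition duals
    z = np.zeros(dim)                               # consensus coefficients
    for _ in range(iters):
        xs = [local_fit(X, y, z, u, rho) for (X, y), u in zip(parts, us)]
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)  # consensus step
        us = [u + x - z for x, u in zip(xs, us)]              # dual update
    return z

# Toy data: the response depends only on the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=400) > 0).astype(float)
parts = [(X[i::4], y[i::4]) for i in range(4)]      # 4 "mapper" partitions
w = admm_logistic(parts, dim=3)                     # consensus coefficients
```

In the real system each partition lives on a different mapper, and only the coefficient vectors (not the data) move between partitions and the consensus step.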

Quick Start

Code structure

Input Data Format

Each feature is represented by a (name, term, value) triple, e.g.:

  name="age"
  term="[10,20]"
  value=1.0
Record 1:
  {
    "response" : 0,
    "features" : [ 
    {
      "name" : "7",
      "term" : "33",
      "value" : 1.0
    }, {
      "name" : "8",
      "term" : "151",
      "value" : 1.0
    }, {
      "name" : "3",
      "term" : "0",
      "value" : 1.0
    }, {
      "name" : "12",
      "term" : "132",
      "value" : 1.0
    } 
    ],
    "weight" : 1.0,
    "offset" : 0.0,
    "foo" : "whatever"
  }
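The record above can be flattened into (name, term, value) triples for downstream processing. A minimal sketch, using JSON only to illustrate the layout (the field names match the sample record; nothing here is ml-ease API):

```python
import json

# The training record shown above, reproduced as a JSON string.
record = json.loads("""
{
  "response": 0,
  "features": [
    {"name": "7",  "term": "33",  "value": 1.0},
    {"name": "8",  "term": "151", "value": 1.0},
    {"name": "3",  "term": "0",   "value": 1.0},
    {"name": "12", "term": "132", "value": 1.0}
  ],
  "weight": 1.0,
  "offset": 0.0,
  "foo": "whatever"
}
""")

# Each feature is a (name, term, value) triple; "name" + "term" together
# identify the feature string, and "value" is its numeric value.
triples = [(f["name"], f["term"], f["value"]) for f in record["features"]]
```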

Output Models

 key:string,
 model:[{name:string, term:string, value:float}]

The "key" column stores the lambda value, and "model" stores the model learned for that lambda. As in the input, "name" + "term" represents a feature string, and "value" is the learned coefficient. Note that the first element of the model is always "(INTERCEPT)", which is the intercept term. Below is a sample of the learned models for lambda = 1.0 and 2.0:

Record 1:
  {
    "key" : "1.0",
    "model" : [ {
      "name" : "(INTERCEPT)",
      "term" : "",
      "value" : -2.5
    }, {
      "name" : "7",
      "term" : "33",
      "value" : 0.98
    }, {
      "name" : "8",
      "term" : "151",
      "value" : 0.34
    }, {
      "name" : "3",
      "term" : "0",
      "value" : -0.4
    }, {
      "name" : "12",
      "term" : "132",
      "value" : -0.3
    } ]
  }
Record 2:
  {
    "key" : "2.0",
    "model" : [ {
      "name" : "(INTERCEPT)",
      "term" : "",
      "value" : -2.5
    }, {
      "name" : "7",
      "term" : "33",
      "value" : 0.83
    }, {
      "name" : "8",
      "term" : "151",
      "value" : 0.32
    }, {
      "name" : "3",
      "term" : "0",
      "value" : -0.3
    }, {
      "name" : "12",
      "term" : "132",
      "value" : -0.1
    } ]
  }
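Scoring a record with such a model is a sparse dot product: the intercept plus the sum of coefficient times feature value for every feature present, pushed through the sigmoid. A minimal sketch using the sample model for lambda = 1.0 and the input record shown earlier (the `score` helper and its offset handling are illustrative, not ml-ease code):

```python
import math

# Learned model for lambda = 1.0, keyed by the (name, term) feature string
# (coefficient values copied from the sample record above).
model = {
    ("(INTERCEPT)", ""): -2.5,
    ("7", "33"): 0.98,
    ("8", "151"): 0.34,
    ("3", "0"): -0.4,
    ("12", "132"): -0.3,
}

def score(features, model, offset=0.0):
    """Return P(response = 1): sigmoid of offset + intercept +
    sum of learned coefficients times feature values."""
    s = offset + model.get(("(INTERCEPT)", ""), 0.0)
    for name, term, value in features:
        s += model.get((name, term), 0.0) * value
    return 1.0 / (1.0 + math.exp(-s))

# The input record shown earlier (all feature values are 1.0):
feats = [("7", "33", 1.0), ("8", "151", 1.0),
         ("3", "0", 1.0), ("12", "132", 1.0)]
p = score(feats, model)   # linear score -1.88, so p is roughly 0.13
```

Features absent from the model simply contribute nothing, which is why `model.get` defaults to 0.0.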

Detailed Instructions

AdmmPrepare Job

AdmmTrain Job

AdmmTest Job

AdmmTestLoglik Job

NaiveTrain job

Supporting Team

This tool was developed by the Applied Relevance Science team at LinkedIn. People who contributed to this tool include: