Deep Learning vs LightGBM for tabular data

This repo contains the code to run over 1500 experiments that compare the performance of Deep Learning algorithms for tabular data with LightGBM.

Deep Learning models for tabular data are run via the pytorch-widedeep library.

Companion post: pytorch-widedeep, deep learning for tabular data IV: Deep Learning vs LightGBM

For the experiments in this repo I have used four datasets:

  1. Adult Census (binary classification)
  2. Bank Marketing (binary classification)
  3. NYC taxi ride duration (regression)
  4. Facebook Comment Volume (regression)

And mainly four deep learning models:

  1. TabMlp: a simple MLP very similar to the tabular API implementation in the fastai library
  2. TabResnet: similar to the MLP, but I use ResNet blocks instead of dense layers
  3. TabNet
  4. TabTransformer
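
To make the difference between the first two models concrete, here is a minimal plain-Python sketch of a dense layer next to a residual block. It is only an illustration of the idea of a skip connection: the function names are made up for this example, and the actual TabMlp and TabResnet implementations in pytorch-widedeep are more involved and are not reproduced here.

```python
# Illustrative stand-ins only, not the pytorch-widedeep implementation.

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Affine map followed by ReLU: relu(W @ x + b)."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return relu(out)


def residual_block(x, weights, bias):
    """Dense transform plus a skip connection: x + relu(W @ x + b).

    Assumes a square weight matrix so input and output sizes match.
    """
    fx = dense(x, weights, bias)
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]
W = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

print(dense(x, W, b))           # [0.5, 0.0]
print(residual_block(x, W, b))  # [1.5, -2.0]
```

The residual block passes the input through unchanged and only adds a correction on top of it, which is the property the ResNet-style blocks rely on.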

RESULTS

ADULT CENSUS

| model | acc | runtime | best_epoch_or_ntrees |
|---|---|---|---|
| lightgbm | 0.878178 | 0.908639 | 408.0 |
| tabmlp | 0.872209 | 205.357588 | 62.0 |
| tabtransformer | 0.871767 | 288.640581 | 32.0 |
| tabnet | 0.870440 | 422.296659 | 26.0 |
| tabresnet | 0.869777 | 388.932547 | 25.0 |

BANK MARKETING

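| model | f1 | auc | runtime | best_epoch_or_ntrees |
|---|---|---|---|---|
| tabresnet | 0.429799 | 0.650147 | 92.517464 | 11.0 |
| tabtransformer | 0.419971 | 0.643972 | 31.693761 | 4.0 |
| tabmlp | 0.385542 | 0.628082 | 9.572095 | 7.0 |
| lightgbm | 0.385208 | 0.626490 | 0.461398 | 57.0 |
| tabnet | 0.308703 | 0.594316 | 77.878060 | 13.0 |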
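
For reference, the acc and f1 columns of the two classification tables follow the standard definitions of accuracy and binary F1. The sketch below uses the textbook formulas with hypothetical helper names and toy labels; it is not the evaluation code from this repo, and AUC is left out because it needs predicted scores rather than hard labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): two positives out of ten samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))   # 0.8
print(f1_binary(y_true, y_pred))  # 0.5
```

The toy example shows how the two metrics can diverge: most predictions are correct, yet only half of the positive predictions and half of the actual positives are matched.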

NYC TAXI RIDE DURATION

| model | rmse | r2 | runtime | best_epoch_or_ntrees |
|---|---|---|---|---|
| lightgbm | 262.709865 | 0.804393 | 42.721136 | 504.0 |
| tabmlp | 271.342218 | 0.791327 | 568.430923 | 24.0 |
| tabresnet | 292.890792 | 0.756867 | 471.264983 | 24.0 |
| tabtransformer | 336.582554 | 0.678919 | 5779.031367 | 54.0 |
| tabnet | 376.053004 | 0.599198 | 1844.472289 | 15.0 |
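
The rmse and r2 columns in the regression tables correspond to root mean squared error and the coefficient of determination. A minimal sketch of both using the textbook formulas follows; the helper names and toy values are made up for illustration and this is not the evaluation code from this repo.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy targets (made up): only the last prediction is off, by 2.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

print(rmse(y_true, y_pred))
print(r2(y_true, y_pred))
```

RMSE is expressed in the units of the target, so its values are not comparable across datasets, while r2 is unitless and can be compared more loosely.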

FACEBOOK COMMENT VOLUME

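| model | rmse | r2 | runtime | best_epoch_or_ntrees |
|---|---|---|---|---|
| lightgbm | 5.528963 | 0.823208 | 6.525877 | 687.0 |
| tabmlp | 5.908498 | 0.798103 | 250.476762 | 43.0 |
| tabtransformer | 5.925587 | 0.796933 | 533.390816 | 27.0 |
| tabresnet | 6.213813 | 0.776698 | 70.466089 | 9.0 |
| tabnet | 6.428503 | 0.761001 | 935.020483 | 59.0 |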
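
To read the four tables side by side, the sketch below copies the metric and runtime columns and, for each dataset, picks the best deep learning model and compares it with LightGBM. The choice of f1 as the Bank Marketing metric and the names `RESULTS` and `summarise` are assumptions made for this example. The runtime unit is not stated in the tables, so only ratios are computed.

```python
# Each row is (metric, runtime), copied from the result tables.
RESULTS = {
    "adult_census": {"metric": "acc", "higher_is_better": True, "rows": {
        "lightgbm": (0.878178, 0.908639),
        "tabmlp": (0.872209, 205.357588),
        "tabtransformer": (0.871767, 288.640581),
        "tabnet": (0.870440, 422.296659),
        "tabresnet": (0.869777, 388.932547)}},
    "bank_marketing": {"metric": "f1", "higher_is_better": True, "rows": {
        "tabresnet": (0.429799, 92.517464),
        "tabtransformer": (0.419971, 31.693761),
        "tabmlp": (0.385542, 9.572095),
        "lightgbm": (0.385208, 0.461398),
        "tabnet": (0.308703, 77.878060)}},
    "nyc_taxi": {"metric": "rmse", "higher_is_better": False, "rows": {
        "lightgbm": (262.709865, 42.721136),
        "tabmlp": (271.342218, 568.430923),
        "tabresnet": (292.890792, 471.264983),
        "tabtransformer": (336.582554, 5779.031367),
        "tabnet": (376.053004, 1844.472289)}},
    "facebook_comments": {"metric": "rmse", "higher_is_better": False, "rows": {
        "lightgbm": (5.528963, 6.525877),
        "tabmlp": (5.908498, 250.476762),
        "tabtransformer": (5.925587, 533.390816),
        "tabresnet": (6.213813, 70.466089),
        "tabnet": (6.428503, 935.020483)}},
}


def summarise(name, spec):
    """Compare LightGBM with the best deep learning model on one dataset."""
    rows = spec["rows"]
    lgb_metric, lgb_runtime = rows["lightgbm"]
    dl = {m: v for m, v in rows.items() if m != "lightgbm"}
    pick = max if spec["higher_is_better"] else min
    best = pick(dl, key=lambda m: dl[m][0])
    best_metric, best_runtime = dl[best]
    if spec["higher_is_better"]:
        lgb_wins = lgb_metric > best_metric
    else:
        lgb_wins = lgb_metric < best_metric
    return {
        "dataset": name,
        "metric": spec["metric"],
        "best_dl": best,
        "lightgbm_wins": lgb_wins,
        "runtime_ratio": best_runtime / lgb_runtime,
    }


summary = [summarise(name, spec) for name, spec in RESULTS.items()]
for s in summary:
    print(f"{s['dataset']:<18} {s['metric']:<5} best DL: {s['best_dl']:<10} "
          f"LightGBM wins: {s['lightgbm_wins']!s:<6} "
          f"runtime ratio: {s['runtime_ratio']:.1f}x")
```

On these numbers LightGBM has the better metric on three of the four datasets (Bank Marketing is the exception), and the best deep learning model takes more than ten times LightGBM's runtime on every dataset.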

For more results on all the experiments that were run, see here