riskr

Introduction

The riskr package facilitates credit scoring tasks, such as measuring the performance of scores/models, and makes the scoring modelling process easier.

There are functions to:

  1. Measure model performance in a simple way via wrappers/shortcuts around ROCR functions.
  2. Visualize relationships between variables.
  3. Compute the usual credit scoring values: PSI, WOE, IV, KS, AUCROC, among others.
  4. Make the modelling and validation process easier.

Assumptions

riskr assumes the target variable is binary with numeric values 0 and 1. Usually 1 denotes the characteristic of interest. For example, 0 may be a default operation and 1 a non-default one.

Installation

You can install the latest development version from GitHub with:

devtools::install_github("jbkunst/riskr")

Functions

Performance Indicators & Plots

Usually we have a data frame with a target variable and a score (or probability) like this:

library("riskr")

data("predictions")

head(predictions)

| score | target |
|------:|-------:|
| 0.202 |      1 |
| 0.806 |      1 |
| 0.513 |      1 |
| 0.052 |      0 |
| 0.329 |      1 |
| 0.246 |      0 |

score <- predictions$score

target <- predictions$target

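The main statistics or indicators are KS and AUCROC; perf reports them together with Gini and divergence:

perf(target, score)

|    ks | aucroc |  gini | divergence |
|------:|-------:|------:|-----------:|
| 0.254 |  0.676 | 0.353 |      0.408 |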

There is also a separate function for each indicator.

aucroc(target, score)
## [1] 0.676
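
To make the indicators above concrete, here is a minimal base R sketch of how KS, AUCROC and Gini can be computed by hand. It does not show how riskr or ROCR implement them; the toy vectors are hypothetical and are not the `predictions` data, and the rank-sum formula for AUC and the identity Gini = 2 * AUC - 1 are standard definitions assumed here.

```r
# Toy data (hypothetical, not the `predictions` dataset shipped with riskr)
target <- c(0, 0, 1, 0, 1, 1)
score  <- c(0.10, 0.30, 0.35, 0.40, 0.70, 0.90)

n1 <- sum(target == 1)  # number of 1s
n0 <- sum(target == 0)  # number of 0s

# AUC via the rank-sum (Mann-Whitney) identity
auc <- (sum(rank(score)[target == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Gini as a linear rescaling of the AUC (assumed definition)
gini <- 2 * auc - 1

# KS: largest gap between the empirical score CDFs of the two classes
cuts <- sort(unique(score))
ks <- max(abs(ecdf(score[target == 1])(cuts) - ecdf(score[target == 0])(cuts)))

round(c(ks = ks, aucroc = auc, gini = gini), 3)
```

The values reported by perf above (aucroc 0.676, gini 0.353) appear consistent with the same Gini relation, up to rounding.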

There are also functions to plot the score/model performance (based on the ggplot2 package).

gg_perf(target, score)
<img src="vignettes/figures/unnamed-chunk-5-1.png" title="" alt="" style="display: block; margin: auto;" />

And:

gg_roc(target, score)
<img src="vignettes/figures/unnamed-chunk-6-1.png" title="" alt="" style="display: block; margin: auto;" />

gg_gain(target, score)
<img src="vignettes/figures/unnamed-chunk-6-2.png" title="" alt="" style="display: block; margin: auto;" />

gg_lift(target, score)
<img src="vignettes/figures/unnamed-chunk-6-3.png" title="" alt="" style="display: block; margin: auto;" />

Tables (Uni/Bivariate) & Plots

data("credit")

ft(credit$marital_status)

| class | count | percent |
|:------|------:|--------:|
| S     | 25249 |   0.508 |
| C     | 17097 |   0.344 |
| O     |  2776 |   0.056 |
| V     |  2430 |   0.049 |
| D     |  2142 |   0.043 |

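bt(credit$marital_status, credit$bad)

| class | count | percent | target_count | target_rate | target_percent | non_target_count | non_target_percent |  odds |    woe |    iv |
|:------|------:|--------:|-------------:|------------:|---------------:|-----------------:|-------------------:|------:|-------:|------:|
| C     | 17097 |   0.344 |         2483 |       0.145 |          0.253 |            14614 |              0.366 | 0.170 | -0.370 | 0.042 |
| D     |  2142 |   0.043 |          322 |       0.150 |          0.033 |             1820 |              0.046 | 0.177 | -0.330 | 0.004 |
| O     |  2776 |   0.056 |          660 |       0.238 |          0.067 |             2116 |              0.053 | 0.312 |  0.237 | 0.003 |
| S     | 25249 |   0.508 |         6059 |       0.240 |          0.617 |            19190 |              0.481 | 0.316 |  0.249 | 0.034 |
| V     |  2430 |   0.049 |          289 |       0.119 |          0.029 |             2141 |              0.054 | 0.135 | -0.600 | 0.015 |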
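
The odds, woe and iv columns of the marital status table can be reproduced from the counts alone. The sketch below is base R only and is not riskr's own code; the formulas (odds as target over non-target count, WOE as the log ratio of target share to non-target share, IV as the share difference times WOE) are assumptions inferred from the numbers in that table.

```r
# Counts copied from the bt(credit$marital_status, credit$bad) output
class            <- c("C", "D", "O", "S", "V")
target_count     <- c(2483, 322, 660, 6059, 289)
non_target_count <- c(14614, 1820, 2116, 19190, 2141)

# Share of all targets / non-targets that falls in each class
target_percent     <- target_count / sum(target_count)
non_target_percent <- non_target_count / sum(non_target_count)

odds <- target_count / non_target_count                 # targets per non-target
woe  <- log(target_percent / non_target_percent)        # weight of evidence
iv   <- (target_percent - non_target_percent) * woe     # per-class IV contribution

data.frame(class, odds = round(odds, 3), woe = round(woe, 3), iv = round(iv, 3))

# Total information value of the variable
sum(iv)
```

Summing the per-class contributions gives a single information value for the variable, which is one common way to compare candidate predictors.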

credit$age_bin <- bin_sup(credit$age, credit$bad, min.p = 0.20)$variable_new

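bt(credit$age_bin, credit$bad)

| class     | count | percent | target_count | target_rate | target_percent | non_target_count | non_target_percent |  odds |    woe |    iv |
|:----------|------:|--------:|-------------:|------------:|---------------:|-----------------:|-------------------:|------:|-------:|------:|
| (-Inf,22] | 11097 |   0.223 |         3290 |       0.296 |          0.335 |             7807 |              0.196 | 0.421 |  0.538 | 0.075 |
| (22,32]   | 13465 |   0.271 |         3061 |       0.227 |          0.312 |            10404 |              0.261 | 0.294 |  0.179 | 0.009 |
| (32,44]   | 14064 |   0.283 |         2271 |       0.161 |          0.231 |            11793 |              0.296 | 0.193 | -0.245 | 0.016 |
| (44,95]   | 11068 |   0.223 |         1191 |       0.108 |          0.121 |             9877 |              0.248 | 0.121 | -0.713 | 0.090 |

gg_ba(credit$age_bin, credit$bad)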
<img src="vignettes/figures/unnamed-chunk-8-1.png" title="" alt="" style="display: block; margin: auto;" />

A minified version of gg_ba:

gg_ba2(credit$age_bin, credit$bad) + ggtitle("Age")
<img src="vignettes/figures/unnamed-chunk-9-1.png" title="" alt="" style="display: block; margin: auto;" />

Odds Tables

Odds tables are another way to show how a score/model performs.

score <- round(predictions$score * 1000)

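odds_table(target, score, nclass = 5) # default is (nclass =) 10 groups of equal size

| class     | count | percent | target_count | target_rate | target_percent | non_target_count | non_target_percent | odds |    woe |    iv |
|:----------|------:|--------:|-------------:|------------:|---------------:|-----------------:|-------------------:|-----:|-------:|------:|
| [1,164]   |  2009 |   0.201 |         1010 |       0.503 |          0.144 |              999 |              0.332 | 1.01 | -0.832 | 0.156 |
| (164,331] |  1991 |   0.199 |         1255 |       0.630 |          0.180 |              736 |              0.245 | 1.71 | -0.309 | 0.020 |
| (331,526] |  2008 |   0.201 |         1429 |       0.712 |          0.204 |              579 |              0.192 | 2.47 |  0.061 | 0.001 |
| (526,738] |  2000 |   0.200 |         1573 |       0.786 |          0.225 |              427 |              0.142 | 3.68 |  0.461 | 0.038 |
| (738,996] |  1992 |   0.199 |         1723 |       0.865 |          0.246 |              269 |              0.089 | 6.41 |  1.015 | 0.159 |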

Ranking Predictive Variables

ranks <- pred_ranking(credit, "bad")
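head(ranks)

| variable            |    ks | aucroc |
|:--------------------|------:|-------:|
| age                 | 0.191 |  0.626 |
| age_bin             | 0.191 |  0.619 |
| marital_status      | 0.150 |  0.577 |
| months_in_the_job   | 0.129 |  0.567 |
| flag_res_phone      | 0.112 |  0.556 |
| area_code_res_phone | 0.078 |  0.547 |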

Confusion Matrix

The conf_matrix function returns a list with the following elements:

target_pred <- ifelse(score < 500, 0, 1)

cm <- conf_matrix(target_pred, target)
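cm$confusion.matrix

|    | class  | pred 0 | pred 1 |
|:---|:-------|-------:|-------:|
| 0  | true 0 |   2230 |    780 |
| 1  | true 1 |   3476 |   3514 |

cm$indicators

| term                        | term.short | value |
|:----------------------------|:-----------|------:|
| Accuracy                    | AC         | 0.574 |
| True Positive rate (Recall) | Recall     | 0.503 |
| False Positive rate         | FP         | 0.259 |
| True Negative rate          | TN         | 0.741 |
| False Negative rate         | FN         | 0.497 |
| Precision                   | P          | 0.818 |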
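
Each indicator follows directly from the four cells of the confusion matrix. The following base R sketch recomputes them from the counts shown in cm$confusion.matrix. The variable names (`tn`, `fp`, `fn`, `tp`, `indicators`) are illustrative and not part of riskr, and the formulas are the standard definitions, assumed to match what conf_matrix reports.

```r
# Cell counts copied from the cm$confusion.matrix output
tn <- 2230  # true 0, predicted 0
fp <- 780   # true 0, predicted 1
fn <- 3476  # true 1, predicted 0
tp <- 3514  # true 1, predicted 1

indicators <- c(
  accuracy  = (tp + tn) / (tp + tn + fp + fn),
  recall    = tp / (tp + fn),  # true positive rate
  fp_rate   = fp / (fp + tn),  # false positive rate
  tn_rate   = tn / (fp + tn),  # true negative rate
  fn_rate   = fn / (tp + fn),  # false negative rate
  precision = tp / (tp + fp)
)

round(indicators, 3)
```

Note that the true and false rates within each actual class sum to one (recall plus false negative rate, and true negative plus false positive rate), which is a quick sanity check on any such table.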

Related work

  1. woe package by tomasgreif
  2. smbinning package by Herman Jopia. Github repository.
  3. Guide to Credit Scoring in R
  4. Gains package
  5. plotROC package by Michael Sachs
  6. InformationValue by [selva86](https://github.com/selva86/)

Session Info

library("riskr")
library("printr") # remove this for vignette
library("ggplot2")
library("ggthemes")
options(digits = 3, knitr.table.format = "markdown")
knitr::opts_chunk$set(collapse = TRUE, warning = FALSE,
                      fig.path = "vignettes/figures/",
                      fig.width = 6, fig.height = 6,
                      fig.align = "center", dpi = 72)

theme_set(theme_fivethirtyeight(base_size = 11) +
            theme(rect = element_rect(fill = "white"),
                  axis.title = element_text(colour = "grey30"),
                  axis.title.y = element_text(angle = 90),
                  strip.background = element_rect(fill = "#434348"),
                  strip.text = element_text(color = "#F0F0F0"),
                  plot.title = element_text(face = "plain", size = structure(1.2, class = "rel")),
                  panel.margin.x =  grid::unit(1, "cm"),
                  panel.margin.y =  grid::unit(1, "cm")))
update_geom_defaults("line", list(colour = "#434348", size = 1.05))
update_geom_defaults("point", list(colour = "#434348", size = 3))
update_geom_defaults("bar", list(fill = "#7cb5ec"))
update_geom_defaults("text", list(size = 4, colour = "gray30"))
print(sessionInfo())
## R version 3.2.0 (2015-04-16)
## Platform: i386-w64-mingw32/i386 (32-bit)
## Running under: Windows 7 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=Spanish_Chile.1252  LC_CTYPE=Spanish_Chile.1252   
## [3] LC_MONETARY=Spanish_Chile.1252 LC_NUMERIC=C                  
## [5] LC_TIME=Spanish_Chile.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggthemes_2.2.1 ggplot2_1.0.1  printr_0.0.4   riskr_1.0     
## 
## loaded via a namespace (and not attached):
##  [1] splines_3.2.0      digest_0.6.8       htmltools_0.2.6   
##  [4] ROCR_1.0-7         R6_2.1.0           scales_0.3.0      
##  [7] assertthat_0.1     grid_3.2.0         bitops_1.0-6      
## [10] stringr_1.0.0      knitr_1.11         gdata_2.17.0      
## [13] survival_2.38-3    munsell_0.4.2      proto_0.3-10      
## [16] highr_0.5          partykit_1.0-2     tidyr_0.2.0       
## [19] DBI_0.3.1          labeling_0.3       KernSmooth_2.23-15
## [22] MASS_7.3-43        plyr_1.8.3         gplots_2.17.0     
## [25] stringi_0.5-5      magrittr_1.5       reshape2_1.4.1    
## [28] caTools_1.17.1     rmarkdown_0.7      evaluate_0.7.2    
## [31] gtable_0.1.2       colorspace_1.2-6   yaml_2.1.13       
## [34] tools_3.2.0        Formula_1.2-1      parallel_3.2.0    
## [37] dplyr_0.4.2.9002   lazyeval_0.1.10    gtools_3.5.0      
## [40] formatR_1.2        Rcpp_0.12.0