Rule Empirical Study

This repo contains the study materials we used for the empirical study of rule understanding.

Rule Generation & Visualization

To generate rules, we apply the Bayesian rule set learning algorithm proposed by Wang et al. [1].
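The rule sets learned by Wang et al. [1] are disjunctions of conjunctive rules: an instance is assigned the positive class if it satisfies at least one rule. The following is a minimal sketch of how such a rule set makes predictions. It assumes a hypothetical representation (a condition as a `(feature, operator, threshold)` tuple). The rules and thresholds are made up for illustration and were not learned from any dataset. The sketch does not implement the Bayesian learning procedure itself.

```python
import operator
from typing import Dict, List, Tuple

# Hypothetical representation (not the study's code):
# a condition is (feature, op, threshold), a rule is an AND of conditions,
# and a rule set is an OR of rules.
Condition = Tuple[str, str, float]
Rule = List[Condition]

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def rule_fires(rule: Rule, x: Dict[str, float]) -> bool:
    """A rule is satisfied only if every one of its conditions holds."""
    return all(OPS[op](x[feat], thr) for feat, op, thr in rule)


def predict(rule_set: List[Rule], x: Dict[str, float]) -> int:
    """Return 1 (positive class) if any rule fires, otherwise 0."""
    return int(any(rule_fires(rule, x) for rule in rule_set))


# Illustrative rules with invented thresholds.
example_rules: List[Rule] = [
    [("Glucose", ">", 150.0)],
    [("BMI", ">=", 30.0), ("Age", ">", 45.0)],
]

if __name__ == "__main__":
    print(predict(example_rules, {"Glucose": 160.0, "BMI": 25.0, "Age": 30.0}))  # 1: first rule fires
    print(predict(example_rules, {"Glucose": 100.0, "BMI": 32.0, "Age": 40.0}))  # 0: no rule fully satisfied
```

Because the rules are combined with OR, each rule can be read independently as a sufficient condition for the positive class.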

We use the Home Equity Line of Credit (HELOC) dataset [2], provided by FICO, for the training stages (tutorial, concept verification, task introduction, and task verification).

For the actual test, we generate rules from the Pima Indians Diabetes dataset [3]. To prevent participants' prior knowledge from influencing their task performance, we tell them that the data set is fictitious, and we rename the features to mineral and vitamin names as shown in the table below:

| Feature name in diabetes data | Feature name in test |
| --- | --- |
| Pregnancies | Iron |
| Glucose | Magnesium |
| BloodPressure | Sodium |
| SkinThickness | Zinc |
| Insulin | Potassium |
| BMI | Vitamin A |
| DiabetesPedigreeFunction | Calcium |
| Age | Copper |

| Target name in diabetes data | Target name in test |
| --- | --- |
| non-diabetic | Low Risk |
| diabetic | High Risk |
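The renaming in the table is a one-to-one substitution, so it can be expressed as two lookup dictionaries. The sketch below applies them to a single data record. It assumes the raw column names match the left-hand column of the table. The helper `rename_record` is hypothetical and is not part of the study materials.

```python
# Mapping taken from the table above: diabetes feature name -> name shown in the test.
FEATURE_RENAME = {
    "Pregnancies": "Iron",
    "Glucose": "Magnesium",
    "BloodPressure": "Sodium",
    "SkinThickness": "Zinc",
    "Insulin": "Potassium",
    "BMI": "Vitamin A",
    "DiabetesPedigreeFunction": "Calcium",
    "Age": "Copper",
}

# Mapping for the target labels.
TARGET_RENAME = {"non-diabetic": "Low Risk", "diabetic": "High Risk"}


def rename_record(record: dict) -> dict:
    """Hypothetical helper: rename the keys of one record, leaving unknown keys unchanged."""
    return {FEATURE_RENAME.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    print(rename_record({"Glucose": 148, "Age": 50}))  # {'Magnesium': 148, 'Copper': 50}
    print(TARGET_RENAME["diabetic"])  # High Risk
```

Keeping the mapping one-to-one means every rule learned on the original features can be shown to participants without any loss of information.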

Study Analysis

We follow the analysis steps stated in our pre-registration form.

Performance Overview (absolute effect size):


References:

[1] Wang, T., Rudin, C., Doshi-Velez, F., Liu, Y., Klampfl, E. and MacNeille, P., 2017. A Bayesian framework for learning rule sets for interpretable classification. The Journal of Machine Learning Research, 18(1), pp.2357-2393.

[2] Explainable Machine Learning Challenge - FICO Community.

[3] Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C. and Johannes, R.S., 1988, November. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Annual Symposium on Computer Application in Medical Care (p. 261). American Medical Informatics Association.