Alpha-Insurance-Fraud-Detection

You have been hired by Alpha Insurance to develop predictive models that determine which automobile claims are fraudulent. You have been given data on approximately 5,000 auto claims, including a variable indicating whether the company believes each claim is fraudulent.

Author:

Robert Shea

Bryant University ~ Fall 2018

Hypothesis

These variables appear to be the best for detecting fraudulent claims:

Claim Amount - Uncommonly high claim amounts are more likely to be fraudulent.
Claim Cause - The more severe claim causes (fire and collision) will be less likely to be fraudulent.
Claim Report Type - Fraudulent claims are more likely to be reported through channels that involve as little human interaction as possible.
Employment Status - Claimants who are not currently employed are more likely to report fraudulent claims.
Income - The higher the claimant's level of education, the less likely a claim is to be fraudulent (education may also be linked with income).

Process

Data Exploration

Univariate exploration
Bivariate exploration
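
As a minimal sketch of the bivariate step, the snippet below computes the fraud rate within each level of a categorical variable, which is one way to check hypotheses like the one about Claim Cause. The toy records, the field names `claim_cause` and `fraud`, and the helper `fraud_rate_by` are hypothetical and are not taken from the Alpha Insurance data. It uses only the Python standard library.

```python
from collections import defaultdict

# Hypothetical toy records; field names are assumptions, not the real schema.
claims = [
    {"claim_cause": "Collision", "fraud": 0},
    {"claim_cause": "Collision", "fraud": 0},
    {"claim_cause": "Fire", "fraud": 0},
    {"claim_cause": "Scratch/Dent", "fraud": 1},
    {"claim_cause": "Scratch/Dent", "fraud": 0},
    {"claim_cause": "Hail", "fraud": 1},
]


def fraud_rate_by(records, key, target="fraud"):
    """Return {category: share of records flagged as fraud} for one variable."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        frauds[row[key]] += row[target]
    return {category: frauds[category] / totals[category] for category in totals}


rates = fraud_rate_by(claims, "claim_cause")
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.2f}")
```

The same helper can be pointed at any other categorical field (for example a report-type or employment-status column, if present) to compare fraud rates across its levels.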

Transformations

Impute missing values
Handle outliers
Transform variables with functions
Transform variables with binning
Encoding
Balancing Sample
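
The transformation steps above could be sketched roughly as follows. This is an illustration under assumptions, not the code used in the project: the helper names, the cap limits, the bin edges and the sample values are all hypothetical, median imputation and clipping are just one possible choice for missing values and outliers, and balancing is shown as random undersampling of the majority class. Only the Python standard library is used.

```python
import math
import random
import statistics


def impute_median(values):
    """Replace None with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def cap_outliers(values, lower, upper):
    """Clip values to the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def log_transform(values):
    """Apply log(1 + x) to reduce right skew in non-negative amounts."""
    return [math.log1p(v) for v in values]


def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value."""
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]  # value is above every edge


def one_hot(values):
    """Encode a categorical column as one list of 0/1 indicators per row."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def undersample(rows, target, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    positives = [r for r in rows if r[target] == 1]
    negatives = [r for r in rows if r[target] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return minority + rng.sample(majority, len(minority))


# Hypothetical claim amounts with one missing value and one extreme value.
amounts = impute_median([1200, None, 800, 95000, 1500])
capped = cap_outliers(amounts, lower=0, upper=10000)
logged = log_transform(capped)
bins = [bin_value(a, edges=[1000, 5000], labels=["low", "medium", "high"])
        for a in capped]
categories, encoded = one_hot(["Phone", "Online", "Phone"])

# Hypothetical imbalanced sample: 2 fraudulent rows out of 10.
rows = [{"id": i, "fraud": int(i < 2)} for i in range(10)]
balanced = undersample(rows, "fraud")
```

Undersampling discards information from the majority class, so it is usually applied to the training data only, leaving the validation data at its original class ratio.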

Modeling

Regression
Decision Tree
Neural Network
Other
Model Selection
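
One way to approach the model selection step is to score each candidate on held-out data and pick the best one. The sketch below assumes F1 on the fraud class as the selection criterion; the README does not state which criterion was actually used, and the labels and predictions here are made up purely for illustration. Plain accuracy is avoided because fraud is typically rare, so a model that never predicts fraud can still look accurate.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall and F1 for the positive (fraud = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical validation labels and model predictions (not real results).
y_valid = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = {
    "regression":     [1, 0, 0, 0, 0, 0, 0, 0],
    "decision_tree":  [1, 1, 0, 1, 0, 0, 0, 0],
    "neural_network": [1, 1, 1, 1, 1, 0, 0, 0],
}

scores = {name: classification_metrics(y_valid, pred)
          for name, pred in predictions.items()}
best_model = max(scores, key=lambda name: scores[name]["f1"])

for name, metrics in scores.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
print("selected:", best_model)
```

If missed fraud is costlier than a false alarm, recall could be used as the criterion instead by changing the key passed to `max`.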

Sources

How to encode categorical data: https://www.datacamp.com/community/tutorials/categorical-data
Random undersampling:
Credit card fraud example: https://github.com/IBM/xgboost-smote-detect-fraud