Alpha-Insurance-Fraud-Detection
You have been hired by Alpha Insurance to develop predictive models that determine which automobile claims are fraudulent. You have been given data on approximately 5,000 auto claims, including a variable indicating whether the company believes each claim is fraudulent.
Author:
- Robert Shea
Bryant University ~ Fall 2018
Hypothesis
These variables are expected to be the strongest predictors of fraudulent claims:
- Claim Amount - Unusually high claim amounts are more likely to be fraudulent.
- Claim Cause - The more severe claim causes (fire and collision) will be less likely to be fraudulent.
- Claim Report Type - Fraudulent claims will be reported with as little human interaction as possible.
- Employment Status - Claimants who are not currently employed are more likely to file fraudulent claims.
- Education - The higher the claimant's level of education, the less likely the claim is to be fraudulent. (This may also be linked with income.)
Process
Data Exploration
- Univariate exploration
- Bivariate exploration
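As a rough illustration of the two exploration steps, the sketch below summarizes variables one at a time and then relates them to the fraud flag. The data is a tiny made-up sample, and the column names (`claim_amount`, `claim_cause`, `fraud`) are assumptions for illustration, not the actual column names in the Alpha Insurance data.

```python
import pandas as pd

# Made-up sample; column names are hypothetical stand-ins for the real claims data.
claims = pd.DataFrame({
    "claim_amount": [1200.0, 850.0, 15000.0, 430.0, 9800.0, 760.0],
    "claim_cause": ["collision", "hail", "theft", "collision", "theft", "fire"],
    "fraud": [0, 0, 1, 0, 1, 0],
})

# Univariate exploration: summarize one variable at a time.
amount_summary = claims["claim_amount"].describe()
cause_counts = claims["claim_cause"].value_counts()

# Bivariate exploration: relate each candidate predictor to the fraud flag.
fraud_rate_by_cause = claims.groupby("claim_cause")["fraud"].mean()
mean_amount_by_fraud = claims.groupby("fraud")["claim_amount"].mean()

print(amount_summary)
print(cause_counts)
print(fraud_rate_by_cause)
print(mean_amount_by_fraud)
```

Grouping by the target in this way is one simple check on each hypothesis, for example whether fraudulent claims really do have higher average amounts.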
Transformations
- Impute missing values
- Handle outliers
- Transform variables with functions
- Transform variables with binning
- Encoding
- Balancing Sample
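The transformation steps above can be sketched with pandas as follows. This is a minimal example under stated assumptions, not the project's actual pipeline: the data is made up, the column names are hypothetical, and the specific choices (median imputation, percentile capping, `log1p`, three quantile bins, one-hot encoding, random undersampling) are just one plausible option for each step.

```python
import numpy as np
import pandas as pd

# Made-up sample; column names are hypothetical.
claims = pd.DataFrame({
    "claim_amount": [1200.0, 850.0, None, 430.0, 98000.0, 760.0, 2100.0, 640.0],
    "employment_status": ["employed", "unemployed", "employed", "retired",
                          "unemployed", "employed", "employed", "retired"],
    "fraud": [0, 0, 0, 0, 1, 0, 1, 0],
})

# Impute missing values with the median, which is robust to extreme claims.
median_amount = claims["claim_amount"].median()
claims["claim_amount"] = claims["claim_amount"].fillna(median_amount)

# Handle outliers by capping at the 1st and 99th percentiles.
low, high = claims["claim_amount"].quantile([0.01, 0.99])
claims["claim_amount_capped"] = claims["claim_amount"].clip(lower=low, upper=high)

# Transform with a function: log1p compresses the long right tail.
claims["log_claim_amount"] = np.log1p(claims["claim_amount_capped"])

# Transform with binning: three equal-frequency bins.
claims["amount_bin"] = pd.qcut(
    claims["claim_amount_capped"], q=3, labels=["low", "mid", "high"]
)

# Encoding: one-hot indicator columns for the categorical variable.
encoded = pd.get_dummies(claims, columns=["employment_status"])

# Balancing: randomly undersample the majority (non-fraud) class.
fraud_rows = encoded[encoded["fraud"] == 1]
legit_rows = encoded[encoded["fraud"] == 0].sample(n=len(fraud_rows), random_state=0)
balanced = pd.concat([fraud_rows, legit_rows]).sample(frac=1, random_state=0)

print(balanced["fraud"].value_counts())
```

Undersampling discards non-fraud rows, so it is usually applied to the training data only, leaving the evaluation data at its original class ratio.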
Modeling
- Regression
- Decision Tree
- Neural Network
- Other
- Model Selection
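One way to compare the model families listed above is to score each with cross-validation and keep the best. The sketch below uses scikit-learn on synthetic data from `make_classification`, not the Alpha Insurance claims. Treating "Regression" as logistic regression, the hyperparameters, and the use of ROC AUC as the selection metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared claims data.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

# Candidate models; scaling is included where the model is sensitive to it.
candidates = {
    "regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Score each candidate with 5-fold cross-validated ROC AUC.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Model selection: keep the candidate with the highest mean AUC.
best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean AUC = {score:.3f}")
print("selected:", best_name)
```

ROC AUC is used here because it does not depend on a single classification threshold. Other metrics, such as precision and recall on the fraud class, may suit the business problem better.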
Sources
- How to encode categorical data: https://www.datacamp.com/community/tutorials/categorical-data
- Random undersampling:
- Credit card fraud example: https://github.com/IBM/xgboost-smote-detect-fraud