Awesome
Predict Absenteeism at Work
Data
This dataset can be retrieved from UCI Machine Learing Repository.
The original website is at link.
The dataset contains 740 observations and 21 features on the profiles of employees who report their absenteeism.
Features include Reasons for absence,Seasons, Transportation Expense, etc.
Model
Logistic regression, randomForest, and Gradient boosting are selected to model such phenomena.
As a result, RandomForest is the best to model absenteeism at work and the overall auc is 0.87.
Integrated with Flask on GCP
The prediction panel is hosted on GCP flask and user can input their profile and get a prediction of whether they will be absent from work.
Prediction Panel Demo
Here is a demo about how the system works:
<img src="https://img.youtube.com/vi/5y9ESJxCXfY/maxresdefault.jpg" width="50%">
To recreate this project
Step 1: Clone this repository
git clone https://github.com/shangwenyan/IDS721FinalProject.git
Step 2:Create virtual environment if none exists (optional)
virtualenv --python $(which python3) venv
source venv/bin/activate
Step 3: Install all the required packages
pip install -r requirements.txt
Step 4: Run the following command to build a tmeporary testing website
python main.py