Churn Prediction with Kedro Framework
This is a Kedro repository that tackles a data science challenge of predicting customer churn for a fictional financial institution. The goal is to build an effective pipeline for a production-ready Machine Learning model to forecast customer churn accurately.
To approach this problem, I first carried out exploratory data analysis (EDA), feature engineering, and model training and evaluation in Jupyter Notebooks. The notebooks are located in "churn-prediction-kedro/churn-prediction/notebooks/". Feel free to visit the notebooks and check my reasoning behind the solution before running the pipeline. :)
Data Understanding:
- The first dataset, named Abandono_clientes, contains 10,000 rows and 13 columns, including a target column "Exited" with binary data (1 if the customer has churned, 0 if not).
- The second dataset, named Abandono_teste, consists of 1,000 rows and 12 columns, excluding the Exited column.
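The relationship between the two datasets (same columns, minus the target) can be sanity-checked before running the pipeline. The sketch below is a minimal illustration using only the standard library; the helper names, the file paths in the commented usage, and the comma delimiter are assumptions and not part of this repository.

```python
import csv


def csv_shape_and_columns(path, delimiter=","):
    """Return (n_rows, column_names) for a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        # Count data rows, skipping completely blank lines.
        n_rows = sum(1 for row in reader if row)
    return n_rows, header


def check_test_matches_train(train_columns, test_columns, target="Exited"):
    """True when the test set has every training column except the target."""
    expected = [name for name in train_columns if name != target]
    return list(test_columns) == expected


# Hypothetical usage; adjust the paths and delimiter to your local copy:
# n_train, train_cols = csv_shape_and_columns("path/to/Abandono_clientes.csv")
# n_test, test_cols = csv_shape_and_columns("path/to/Abandono_teste.csv")
# print(n_train, len(train_cols), n_test, len(test_cols))
# print(check_test_matches_train(train_cols, test_cols))
```

If the description above holds, the first file should report 10,000 rows and 13 columns, the second 1,000 rows and 12 columns, and the column check should pass.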
Key Concepts:
Customer Churn: Churn refers to the phenomenon of customers discontinuing their relationship with a company or service. In this context, it represents customers who have abandoned the financial institution.
Features: The dataset contains various features or attributes that provide information about the customers. Features include Row Number, Customer Id, Surname, Credit Score, Geography, Gender, Age, Tenure (duration of the customer's relationship with the bank), Balance, Number of Products Held, Has a Credit Card, Is Active Member, and Estimated Salary.
Exited: The target variable Exited indicates whether a customer has churned (1) or not (0).
Performance Metrics: To assess the effectiveness of the model, various evaluation metrics are used, including accuracy, precision, recall, F1-score, and AUC-ROC curve. These metrics help gauge the model's predictive capability and its ability to correctly identify customers who are likely to churn.
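To make the metrics above concrete, here is a minimal pure-Python sketch of how they are computed for a binary churn label. It is an illustration only, not the evaluation code used in this project; the function names are hypothetical. scikit-learn provides equivalent functions (accuracy_score, precision_score, recall_score, f1_score, roc_auc_score) that you would normally use instead.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels (1 = churned)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the customers flagged as churners, how many really churned.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who really churned, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random churner gets a higher
    score than a random non-churner (ties count as half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy alone can be misleading when churners are a minority of customers, which is why precision, recall, F1 and ROC AUC are reported alongside it.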
Getting started
Please note that this project was initially developed with Python 3.10.6 on Ubuntu.
Clone the repository
To clone the repository and set up the development environment, follow the steps below:
- Clone the repository using the command:
git clone https://github.com/laizaparizotto/churn-prediction-kedro.git
- Change to the cloned repository directory:
cd churn-prediction-kedro
- Create a virtual environment using venv:
python -m venv .venv
- Activate the virtual environment:
  - For Windows:
  .venv\Scripts\activate
  - For macOS and Linux:
  source .venv/bin/activate
Now you have successfully cloned the repository and set up the virtual environment. You can proceed with the next steps as described in the project documentation.
Install Kedro
To install Kedro, run:
cd churn-prediction/
pip install kedro
For more information, please check the Kedro Installation Documentation.
Install dependencies
All necessary dependencies are listed in src/requirements.txt.
To install them, run:
pip install -r src/requirements.txt
How to run the pipeline
You can run the Kedro project with:
kedro run
This will run the pipeline, which consists of data loading, preprocessing, training and evaluating a RandomForestClassifier, and finally predicting on the test set.
Final results will be stored at '/churn-prediction/data/07_model_output/resultado_teste.csv'
Interactive Visualization
You can access the interactive visualization with:
kedro viz