
Semiconductor quality prediction and pipeline

An MLOps Zoomcamp project

About the project

Manufacturing-process feature selection and classification. The project focuses on MLOps best practices.

The project contains semiconductor sensor data and classifies the end product as Pass or Fail. There are ~580 sensor features. Can we build a classification model that uses the best features to make the Pass/Fail prediction?

About the data

Data source: https://www.kaggle.com/datasets/paresh2047/uci-semcom?resource=download
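For orientation, a minimal sketch of loading the data with pandas (the file name uci-secom.csv and the Time/Pass-Fail column names are assumptions based on the Kaggle page; adjust them to your download):

    import pandas as pd

    # Load the SECOM sensor data; file and column names may differ
    # in your copy of the Kaggle dataset
    df = pd.read_csv("uci-secom.csv")

    X = df.drop(columns=["Time", "Pass/Fail"])  # ~580 sensor features
    y = df["Pass/Fail"]                         # -1 = pass, 1 = fail

    print(X.shape, y.value_counts())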

MLOps project solution and architecture

(Screenshot of the solution architecture)

Tasks and dates

Task 1) Add experiment tracking and set up a (local) registry server with MLflow, storing artifacts in S3

Task 2) Convert notebook into a pipeline

Task 3) Add orchestration with Prefect

Task 4) Add Monitoring

Task 5) Deploy the model on AWS and connect Kinesis and a Lambda function

Task 6) Best practices --> create tests, linting and pre-commit hooks

Task 7) Final touches

Instructions

How to run the app:

Requirements:

  1. Docker installed
  2. Your AWS credentials added to the docker-compose file, and a model present in S3.

For 2) you can also use the local model.pkl: enable (uncomment) line 46 in prediction_service\app.py and comment out ("#") anything related to S3.
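Conceptually, the toggle in prediction_service\app.py looks like this sketch (the S3 URI, line layout and variable names here are placeholders, not the repo's exact code):

    import pickle
    # import mlflow

    # Option A: load the model from S3 via MLflow (needs AWS credentials)
    # model = mlflow.pyfunc.load_model("s3://<your-bucket>/<model-path>")

    # Option B: load the bundled model.pkl instead; comment out ("#")
    # the S3-related lines and use this block
    with open("model.pkl", "rb") as f_in:
        model = pickle.load(f_in)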

docker compose -f docker-compose.yml up --build

To test it, run:

bash build_test_shut.sh

Then you can run:

python .\send_data.py

Finally:

python ./prefect_monitoring/prefect_monitoring.py

This will create an HTML file with the report.
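prefect_monitoring.py presumably wraps the report generation in a Prefect flow; a minimal sketch of how such a report can be built with Evidently (the library choice, Evidently >= 0.2 API, and file names are assumptions; the repo's script may differ):

    import pandas as pd
    from prefect import flow
    from evidently.report import Report
    from evidently.metric_preset import DataDriftPreset

    @flow
    def build_drift_report():
        # reference data vs. newly collected predictions; placeholder files
        reference = pd.read_csv("reference.csv")
        current = pd.read_csv("current.csv")

        report = Report(metrics=[DataDriftPreset()])
        report.run(reference_data=reference, current_data=current)
        report.save_html("drift_report.html")

    if __name__ == "__main__":
        build_drift_report()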

For data drift, you can check the Grafana container on port 3000 :)

How you can contribute:

Currently this project focuses on MLOps; the actual ML pipeline is the weaker part.

  1. A good idea is to apply L1 regularization in a feature-selection step (see the sketch after this list).

  2. Try other classification models and grid search.

  3. Use Docker Compose or any other method to automatically push to ECS.

  4. Use Kinesis and, through a Lambda function, send streaming data to ECS.
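For idea 1, a minimal sketch with scikit-learn (synthetic data stands in for the SECOM features; the hyperparameters are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectFromModel
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in with roughly the SECOM shape (~580 features)
    X, y = make_classification(n_samples=1500, n_features=580,
                               n_informative=30, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=42)

    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # SECOM has many NaNs
        ("scale", StandardScaler()),
        # the L1 penalty drives weak coefficients to zero;
        # SelectFromModel keeps only the surviving features
        ("select", SelectFromModel(
            LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipeline.fit(X_train, y_train)
    print(pipeline.score(X_test, y_test))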

Other instructions:

1) Install the environment

use:

pipenv install

2) Create an S3 bucket for MLflow

In train.py and main_notebook.ipynb:

mlflow.create_experiment("semicon-sensor-clf", "[your S3 bucket]")

(or use a local file)
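Putting it together (the bucket name is a placeholder; the sqlite backend matches the server command under "Useful commands" below):

    import mlflow

    mlflow.set_tracking_uri("sqlite:///mlflow.db")

    # Create the experiment once, pointing artifacts at your bucket
    # (or at a local path such as "./mlruns" if you skip S3)
    mlflow.create_experiment("semicon-sensor-clf", "s3://[your S3 bucket]/")
    mlflow.set_experiment("semicon-sensor-clf")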

3) Run prediction locally using FastAPI and the uvicorn server (dev)

Run: predict.py

Go to: http://localhost:8001/docs

Press "Try it out"

Use the example from test_one_input.txt (it should give "0" as output)
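For reference, a minimal sketch of the shape such a FastAPI app takes (the payload model and endpoint name are assumptions; predict.py in the repo may differ):

    import pickle
    from typing import List

    import uvicorn
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    with open("model.pkl", "rb") as f_in:
        model = pickle.load(f_in)

    class SensorInput(BaseModel):
        features: List[float]  # one value per sensor feature

    @app.post("/predict")
    def predict(payload: SensorInput):
        prediction = model.predict([payload.features])[0]
        return {"prediction": int(prediction)}

    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=8001)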

4) Test the flow and other functions

Run: pytest

5) Run the app and the monitoring service

docker compose -f docker-compose.yml up --build

python .\send_data.py
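send_data.py replays rows against the running service; a minimal sketch of the idea (the endpoint and file name are placeholders, matching the FastAPI sketch above):

    import time

    import pandas as pd
    import requests

    df = pd.read_csv("test_data.csv")  # placeholder file of sensor rows

    for _, row in df.iterrows():
        payload = {"features": row.tolist()}
        resp = requests.post("http://localhost:8001/predict", json=payload)
        print(resp.json())
        time.sleep(1)  # simulate a slow stream of sensor readings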

6) Running the Makefile

If you are on Windows and want to run a Makefile, go to https://chocolatey.org/install and follow the instructions to install make.

Then run this in Git Bash:

make build

It will run the needed tests, then build the image and run the container.

After this you can run:

python send_data.py

python ./prefect_monitoring/prefect_monitoring.py

This will create an HTML file with the report.

Useful commands:

pipenv:

pipenv install
pipenv install --dev [library]
pipenv --venv

mlflow:

mlflow server --backend-store-uri=sqlite:///mlflow.db --default-artifact-root=s3://mlflow-semicon-clf/

use:

mlflow.set_tracking_uri("sqlite:///mlflow.db")
# mlflow.set_experiment("testing-mlflow")
mlflow.create_experiment("semicon-sensor-clf", "s3://mlflow-semicon-clf/")
mlflow.set_experiment("semicon-sensor-clf")

linting and black:

pylint --recursive=y train.py predict.py ./prefect_monitoring/prefect_monitoring.py ./prediction_service/app.py

black --skip-string-normalization --diff train.py predict.py ./prefect_monitoring/prefect_monitoring.py

black --skip-string-normalization train.py predict.py ./prefect_monitoring/prefect_monitoring.py ./prediction_service/app.py

git:

pre-commit

prefect:

prefect orion start

aws ecs:

docker compose --project-name semicontest -f docker-compose.yml up --build