Fault detection model training

This repository contains code that was used for training the models for sticky note and folded corner detection.

Fault detection is formulated as an image classification task, where a neural network model is trained to distinguish whether an image contains a specific fault or not. The model is built with the PyTorch library, and training is done by fine-tuning an existing DenseNet neural network model.

The code is split into three files: train.py, utils.py and augment.py.

Running the code in a virtual environment

These instructions use a conda virtual environment, so as a precondition you should have Miniconda or Anaconda installed on your operating system. More information on the installation is available in the conda documentation.

Create and activate a conda environment using the following commands:

conda create -n fault_detection_env python=3.7

conda activate fault_detection_env

Install dependencies listed in the requirements.txt file:

pip install -r requirements.txt

Run the training code

When using the default values for all of the model parameters, the training can be initiated from the command line by typing

python train.py

The different model parameters are explained in more detail below.

Model parameters

Parameters related to training and validation data

By default, the code expects the following folder structure:

├──fault_detection 
      ├──models
      ├──results 
      ├──data
      |   ├──faulty
      |   |   ├──train
      |   |   └──val
      |   └──ok
      |       ├──train
      |       └──val
      ├──train.py
      ├──utils.py
      ├──augment.py
      └──requirements.txt

The images containing faults (for instance sticky notes or folded corners) and the images without faults are therefore expected to be located in separate folders. In addition, the training and validation data for both types of images are expected to be located in separate folders.
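The folder layout above can be prepared and checked with a few lines of Python. This is a minimal sketch that only uses the folder names from the tree above; the helper names create_layout and missing_folders are hypothetical and are not part of the repository.

```python
from pathlib import Path

# Sub-folders taken from the tree above, relative to the project root.
DATA_SUBFOLDERS = [
    "data/faulty/train",
    "data/faulty/val",
    "data/ok/train",
    "data/ok/val",
]
OUTPUT_SUBFOLDERS = ["models", "results"]


def create_layout(root):
    """Create the expected folders under root and return their paths."""
    root = Path(root)
    created = []
    for sub in DATA_SUBFOLDERS + OUTPUT_SUBFOLDERS:
        folder = root / sub
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(folder)
    return created


def missing_folders(root):
    """Return the expected data folders that do not exist under root."""
    root = Path(root)
    return [sub for sub in DATA_SUBFOLDERS if not (root / sub).is_dir()]
```

Calling missing_folders(".") in the project root before training gives a quick check that all four data folders are in place.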

Parameters:

The parameter values can be set on the command line when starting the training:

python train.py --tr_data_folder ./data/faulty/train --val_data_folder ./data/faulty/val --tr_ok_folder ./data/ok/train --val_ok_folder ./data/ok/val

The accepted input image file types are .jpg, .png and .tiff. PDF files should be converted into one of these image formats before being used as input to the model.
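Before training, it can be useful to check which files in a data folder would be accepted. The sketch below filters file names by the three extensions listed above. The function names are hypothetical, and the case-insensitive matching is an assumption about how the training code treats file names.

```python
from pathlib import Path

# Only the extensions named in the text. Variants such as .jpeg or .tif
# are left out because the text does not mention them.
ACCEPTED_EXTENSIONS = {".jpg", ".png", ".tiff"}


def is_accepted_image(path):
    """Return True if the file extension is one of the accepted types."""
    # Assumption: upper-case extensions such as .JPG are treated the same.
    return Path(path).suffix.lower() in ACCEPTED_EXTENSIONS


def split_by_type(paths):
    """Split paths into (accepted, rejected) lists, keeping input order."""
    accepted, rejected = [], []
    for p in paths:
        (accepted if is_accepted_image(p) else rejected).append(p)
    return accepted, rejected
```

The rejected list shows which files (for example PDF files) still need to be converted.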

Parameters related to saving the model and the training and validation results

The training performance is measured using training and validation loss, accuracy and F1 score. The average of each metric is saved after each epoch, and the resulting values are plotted and saved in the folder defined by the user.
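As a reminder of how accuracy and F1 score relate to the predictions, here is a small pure-Python sketch. It assumes binary labels where 1 marks a faulty image; which class the repository treats as positive, and whether it computes the metrics itself or uses a library, is not stated here, so treat this as an illustration only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

Unlike accuracy, the F1 score ignores true negatives, which makes it more informative when faulty images are much rarer than images without faults.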

By default, the trained model is saved after each epoch in which the validation F1 score improves on the previous best score. The model can be saved either in the ONNX format, which does not depend on a specific framework like PyTorch and is optimized for inference speed, or in PyTorch's default serialized format. In the first case the model is saved as densenet_date.onnx and in the second case as densenet_date.pth. Date refers to the current date, so that a model trained on 7.6.2023 would be saved in the ONNX format as densenet_07062023.onnx.
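The naming and saving rules above can be sketched as follows. The sketch assumes that the date is written in day-month-year order (which matches the example file name) and that only a strictly higher F1 score counts as an improvement. Both function names are hypothetical, and the actual saving of the model is left out.

```python
from datetime import date


def model_filename(extension="onnx", today=None):
    """Build a name such as densenet_07062023.onnx for the given date."""
    today = today or date.today()
    # Assumption: day, month and year are written without separators.
    return f"densenet_{today.strftime('%d%m%Y')}.{extension}"


def epochs_that_save(val_f1_per_epoch):
    """Return the 1-based epochs in which the model would be saved."""
    best = float("-inf")
    saved = []
    for epoch, f1 in enumerate(val_f1_per_epoch, start=1):
        if f1 > best:  # assumption: a tie does not trigger a new save
            best = f1
            saved.append(epoch)
    return saved
```

Because the file name only contains the date, a later save on the same day would overwrite the earlier file, so the file on disk corresponds to the best epoch so far.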

Parameters:

The parameter values can be set on the command line when starting the training:

python train.py --results_folder ./results --save_model_path ./models/ --save_model_format onnx

Parameters related to model training

A number of parameters are used for defining the conditions for model training.

Learning rate defines how much the model weights are tuned after each iteration based on the gradient of the loss function. In the code, there are different learning rates for the classification layer and the pretrained layers of the base model. The lr parameter defines the learning rate for the base model layers, and the learning rate for the classification layer is automatically set to be 10 times larger.
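The two learning rates can be expressed as parameter groups. PyTorch optimizers accept a list of dictionaries, each with a "params" entry and optional per-group settings such as "lr". The sketch below builds such a list without importing PyTorch. The function name is hypothetical, and whether train.py sets up its optimizer in exactly this way is an assumption.

```python
def build_param_groups(base_params, classifier_params, lr):
    """Return optimizer parameter groups with a 10x rate for the classifier."""
    return [
        # Pretrained base layers use the lr value given by the user.
        {"params": list(base_params), "lr": lr},
        # The classification layer uses a learning rate 10 times larger.
        {"params": list(classifier_params), "lr": lr * 10},
    ]


# Placeholder names stand in for real parameter tensors in this example.
groups = build_param_groups(["conv_w", "conv_b"], ["fc_w", "fc_b"], lr=0.0001)
```

With a torchvision DenseNet, the two iterables could for instance be model.features.parameters() and model.classifier.parameters(), and the resulting list could be passed as the first argument of an optimizer in torch.optim.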

Batch size defines the number of images that are processed before the model weights are updated. Number of epochs, on the other hand, defines how many times during the training the model goes through the entire training dataset. Early stopping is a method used for reducing overfitting by stopping training after a specific learning metric (loss, accuracy etc.) has not improved during a defined number of epochs.
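The relation between these parameters can be illustrated with a short sketch. It assumes that a final, smaller batch is still processed (the default behaviour of a PyTorch DataLoader), that a higher value of the monitored metric is better, and that a tie does not count as an improvement. The function names are hypothetical, and the metric monitored by train.py is not specified here.

```python
import math


def updates_per_epoch(num_images, batch_size):
    """Number of weight updates needed to go through the data once."""
    # Assumption: the last, possibly smaller, batch is not dropped.
    return math.ceil(num_images / batch_size)


def stopping_epoch(val_scores, threshold):
    """Return the 1-based epoch at which training stops, or None."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= threshold:
                return epoch
    return None
```

For example, 100 training images with a batch size of 16 give 7 weight updates per epoch, and with a threshold of 2 the training stops once the score has failed to improve in two consecutive epochs.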

The random seed parameter sets the seed used for initializing random number generation. This makes the training results reproducible when using the same seed, model and data.
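The effect of a fixed seed can be demonstrated with Python's standard library alone. The sketch below shuffles image indices with a seeded generator. The training code presumably also seeds PyTorch itself (for example with torch.manual_seed), which is an assumption and is not shown here.

```python
import random


def shuffled_indices(num_items, seed):
    """Return the indices 0..num_items-1 in a seed-dependent random order."""
    rng = random.Random(seed)  # separate generator, global state is untouched
    indices = list(range(num_items))
    rng.shuffle(indices)
    return indices


first_run = shuffled_indices(10, seed=8765)
second_run = shuffled_indices(10, seed=8765)
```

Both runs produce the same order, because the generator starts from the same seed each time.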

The device parameter defines whether the CPU or the GPU is used for model training. Currently the code does not support multi-GPU training.

Parameters:

The parameter values can be set on the command line when starting the training:

python train.py --lr 0.0001 --batch_size 16 --num_epochs 15 --early_stop_threshold 2 --random_seed 8765 --device cpu
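For readers who want to see how such flags are typically read in Python, here is a minimal argparse sketch. The flag names come from the command above. The default values are simply copied from that example and are an assumption, as is the idea that train.py uses argparse in this form.

```python
import argparse


def build_parser():
    """Build a parser for the training flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Fault detection model training")
    # Defaults below are assumptions copied from the example command.
    parser.add_argument("--lr", type=float, default=0.0001,
                        help="Learning rate for the base model layers.")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="Number of images per weight update.")
    parser.add_argument("--num_epochs", type=int, default=15,
                        help="Number of passes through the training data.")
    parser.add_argument("--early_stop_threshold", type=int, default=2,
                        help="Epochs without improvement before stopping.")
    parser.add_argument("--random_seed", type=int, default=8765,
                        help="Seed for random number generation.")
    parser.add_argument("--device", type=str, default="cpu",
                        help="Device used for training.")
    return parser


# Parse an explicit list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(["--lr", "0.001", "--batch_size", "32"])
```

Flags that are not given keep their default values, so args.num_epochs is 15 in this example while args.lr and args.batch_size hold the values passed in.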

Parameter for data augmentation

Data augmentations are used for increasing the diversity of the data and thus for helping to reduce overfitting. The available augmentation options are

More information and examples of the different image transform options are available here.

Parameter:

The parameter value can be set on the command line when starting the training:

python train.py --augment_choice identity