
Infrared People Counting

Context


This repository implements the image-level people counting models from the paper "Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting" (https://arxiv.org/pdf/2311.11974.pdf).

Installation

pip install -r requirements.txt

Dataset

The dataset must be organized as /dataset_name/split/people_count/image_name.img_ext, where split is train or test, people_count is the number of people in the image (0 to MAX_PEOPLE_COUNT), and img_ext is jpg or png.

Here is an example of the dataset folder structure:

    /LLVIP
        /train
            /0
                /im_00.png
                /im_01.jpg
                /im_02.jpg
                ...
            /1
                /im_67.jpg
                /im_87.png
                ...
        /test
            ...
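Because the label of each image is encoded in its parent folder name, a split can be turned into (image, count) pairs by walking the folders. The sketch below only illustrates that convention; `list_samples` is a hypothetical helper, not the repository's data loader, and it assumes count folders are named with plain integers.

```python
from pathlib import Path
from typing import List, Tuple

IMAGE_EXTENSIONS = {".jpg", ".png"}


def list_samples(dataset_root: str, split: str = "train") -> List[Tuple[Path, int]]:
    """Return (image_path, people_count) pairs for one split.

    Hypothetical helper: the people count of an image is read from the name
    of its parent folder, following
    /dataset_name/split/people_count/image_name.img_ext.
    """
    split_dir = Path(dataset_root) / split
    samples: List[Tuple[Path, int]] = []
    for count_dir in sorted(split_dir.iterdir()):
        # Skip anything that is not an integer-named people_count folder.
        if not count_dir.is_dir() or not count_dir.name.isdigit():
            continue
        count = int(count_dir.name)
        for image_path in sorted(count_dir.iterdir()):
            # Keep only jpg/png images; other files are ignored.
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, count))
    return samples
```

For the example tree, `list_samples("/LLVIP", "train")` would return pairs such as `(/LLVIP/train/0/im_00.png, 0)` and `(/LLVIP/train/1/im_67.jpg, 1)`.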

Training

Command & Arguments

mae [Boolean] : Whether to use Masked Autoencoder (MAE) pretraining.
model_type [String] : Backbone architecture to use (ViT | ConvNeXt).
pretrained [Boolean] : Whether to initialize the model from ImageNet-pretrained weights.
head [String] : Type of output head (classification | regression).
dataset_root [String] : Path to the dataset root.
mae_cp_path [String] : Path to the checkpoint of a previous Masked Autoencoder pretraining run.
sub_ratio [Float] : Fraction of the training data to use for training, mainly for experiments.
small [Boolean] : Whether to use the small or the normal version of the model.

python train.py --model_type 'ConvNeXt' --mae --head regression --dataset_root \Data\LLVIP

To use MAE pretraining, first train with the "--mae" flag; the pretrained model is then saved in /models/trained_models/. When training the classifier afterwards, pass the path to this pretrained model with "--mae_cp_path".

Results & Measurements

All training results and measurements are saved in the folder /runs/[MODEL_NAME]_[DATETIME_OF_TRAINING].

Evaluation

mode [String] : Type of evaluation to run {accuracy|localization|speed}
model_path [String] : Path to the trained model.pt file
dataset_path [String] : Path to the dataset
split [String] : Dataset split to evaluate on {test|validation|train}; defaults to test.

Count Accuracy

Count accuracy is the number of predictions whose predicted people count exactly matches the ground-truth count, divided by the total number of predictions.

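With $\hat{Y}_k$ the predicted count and $Y_k$ the ground-truth count for image $k$, and $M$ the number of images:

$$ \text{Count Accuracy} = \frac{\sum_{k=1}^{M} \mathbb{1}\left[\hat{Y}_k = Y_k\right]}{M} $$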
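As a quick illustration of the formula, the sketch below computes count accuracy from two lists of integer counts. `count_accuracy` is a hypothetical name, not the repository's evaluation code. With a regression head, predictions would first have to be converted to integer counts (for example by rounding); that step is an assumption and is not shown.

```python
from typing import Sequence


def count_accuracy(pred_counts: Sequence[int], gt_counts: Sequence[int]) -> float:
    """Fraction of images whose predicted count equals the ground-truth count."""
    if len(pred_counts) != len(gt_counts):
        raise ValueError("predictions and ground truths must have the same length")
    if not pred_counts:
        raise ValueError("need at least one prediction")
    # Indicator 1[Y_hat_k == Y_k], summed over all images.
    correct = sum(int(p == y) for p, y in zip(pred_counts, gt_counts))
    return correct / len(pred_counts)


print(count_accuracy([0, 2, 3, 1], [0, 2, 2, 1]))  # 3 of 4 counts match -> 0.75
```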

Localization

Localization is measured with the mean Absolute Euclidean Distance (mAED), computed between the predicted coordinates and the ground truth over all images in the test set. Both the $x$ and $y$ coordinates are normalized to the range 0 to 1. A penalty of 1 is given for every wrongly predicted point and for every ground-truth point that is not predicted.

The mAED is computed as:

$$ \mathrm{mAED} = \frac{1}{M} \sum_{j=1}^{M} \frac{1}{N_j} \sum_{i=1}^{N_j} \left\| p_{ij} - \hat{p}_{ij} \right\|_2^2 $$

where $M$ is the number of images in the dataset, $N_j$ is the number of points in image $j$, and $p_{ij}$ and $\hat{p}_{ij}$ are, respectively, the ground-truth point and the closest predicted point of the $i$-th pair in image $j$.
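The formula leaves the point-matching step open, so the sketch below is one possible reading and not the repository's implementation. It assumes greedy matching (closest remaining pair first), a penalty of 1 for every unmatched point (missed or extra), and $N_j$ taken as the larger of the ground-truth and predicted point counts. The names `image_aed` and `maed` are hypothetical; an optimal assignment (e.g. the Hungarian algorithm) could be swapped in for the greedy pairing.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), both normalized to [0, 1]


def image_aed(gt: Sequence[Point], pred: Sequence[Point], penalty: float = 1.0) -> float:
    """Per-image term of the mAED under the assumptions stated above."""
    n = max(len(gt), len(pred))  # assumption: N_j = max(#gt, #pred)
    if n == 0:
        return 0.0  # assumption: an empty image with no predictions costs nothing
    # All (squared distance, gt index, pred index) candidates, closest first.
    pairs = sorted(
        ((gx - px) ** 2 + (gy - py) ** 2, i, k)
        for i, (gx, gy) in enumerate(gt)
        for k, (px, py) in enumerate(pred)
    )
    used_gt, used_pred = set(), set()
    total = 0.0
    for d2, i, k in pairs:
        # Greedy matching: accept a pair only if both points are still free.
        if i not in used_gt and k not in used_pred:
            used_gt.add(i)
            used_pred.add(k)
            total += d2  # squared Euclidean distance, as in the formula
    # Points left without a partner (missed or extra) each cost `penalty`.
    total += penalty * (n - len(used_gt))
    return total / n


def maed(all_gt: List[Sequence[Point]], all_pred: List[Sequence[Point]]) -> float:
    """Average the per-image terms over the M images."""
    if len(all_gt) != len(all_pred) or not all_gt:
        raise ValueError("need the same, non-zero number of images on both sides")
    return sum(image_aed(g, p) for g, p in zip(all_gt, all_pred)) / len(all_gt)
```

For example, with two ground-truth points and one prediction close to the first of them, the image term is (0.01 + 1) / 2, since one point is matched and one is missed.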

Speed

The speed benchmark is reported in FPS for both CPU and GPU.
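A minimal, framework-agnostic sketch of how an FPS figure can be obtained, assuming inference is wrapped in a Python callable (called `infer` here, a hypothetical name) that processes one frame per call. Warm-up calls are excluded from timing. For GPU timings with PyTorch, kernels run asynchronously, so the device should be synchronized (e.g. with `torch.cuda.synchronize()`) before reading the clock; that is not shown here.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object], frames: Sequence[object], warmup: int = 5) -> float:
    """Return frames per second for calling `infer` once per frame."""
    if not frames:
        raise ValueError("need at least one frame")
    for frame in frames[:warmup]:
        infer(frame)  # warm-up runs are not timed
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```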

If something is missing, unclear, or not working in this repository, feel free to contact me.