Infrared People Counting

Context

This is an implementation of the image-level people counting models from the paper "Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting": https://arxiv.org/pdf/2311.11974.pdf

Installation

pip install -r requirements.txt

Dataset

The dataset must be structured as follows:<br/> <b>/<i>dataset_name</i>/<i>split</i>[train | test]/<i>people_count</i>[0 ... MAX_PEOPLE_COUNT]/<i>image_name.img_ext</i>[jpg | png]</b>

Here is an example of the dataset folder structure:

/LLVIP <br/>     /train <br/>         /0 <br/>             /im_00.png <br/>             /im_01.jpg <br/>             /im_02.jpg <br/>              ... <br/>         /1 <br/>             /im_67.jpg <br/>             /im_87.png <br/>              ... <br/>     /test <br/>         ... <br/>
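As a rough illustration of how this layout maps images to labels, the sketch below walks one split and pairs every image with the people count taken from its parent folder name. The function name `index_split` and the constant `IMAGE_EXTENSIONS` are hypothetical and not part of this repository; the training script may load the data differently.

```python
from pathlib import Path

# Extensions listed in the structure description above.
IMAGE_EXTENSIONS = {".jpg", ".png"}


def index_split(dataset_root, split):
    """Return sorted (image_path, people_count) pairs for one split.

    Hypothetical helper: it only assumes the folder layout described above,
    i.e. <dataset_root>/<split>/<people_count>/<image file>.
    """
    split_dir = Path(dataset_root) / split
    samples = []
    for count_dir in split_dir.iterdir():
        # Each sub-folder name is the people count shared by its images.
        if not count_dir.is_dir() or not count_dir.name.isdigit():
            continue
        for image_path in count_dir.iterdir():
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, int(count_dir.name)))
    return sorted(samples)
```

Calling `index_split("LLVIP", "train")` on the example tree would yield one entry per image, labelled 0 for the files under `/train/0` and 1 for the files under `/train/1`.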

Training

Command & Arguments

python train.py --model_type 'ConvNeXt' --mae --head regression --dataset_root \Data\LLVIP

To use MAE pretraining, first train with the "--mae" flag; after training, the model is saved in /models/trained_models/. Then, when training the classifier, pass the path to the pretrained model with "--mae_cp_path".

Results & Measurements

All training results and measurements are saved in the folder <b>/runs/[MODEL_NAME]_[DATETIME_OF_TRAINING]</b>

Evaluation

Count Accuracy

The count accuracy is the number of predictions in which the predicted number of people in the image exactly matches the true number, divided by the total number of predictions.

With $\hat{Y}_j$ the predicted count and $Y_j$ the ground-truth count for image $j$, over $M$ predictions: $$ \text{Count Accuracy} = \frac{1}{M} \sum_{j=1}^{M} \mathbb{1}[\hat{Y}_j = Y_j] $$
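The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's evaluation code: the function name `count_accuracy` is hypothetical, and predictions are assumed to already be integer counts.

```python
def count_accuracy(predictions, ground_truths):
    """Fraction of images whose predicted count equals the true count."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    # Count exact matches, then divide by the total number of predictions.
    exact = sum(int(p == y) for p, y in zip(predictions, ground_truths))
    return exact / len(predictions)


# Three of the four predicted counts are exactly right.
print(count_accuracy([0, 2, 3, 1], [0, 2, 2, 1]))  # 0.75
```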

Location

Localization results are measured with the mean Absolute Euclidean Distance (mAED), computed between the predicted coordinates and the ground truth over all images in the test set. Both the $x$ and $y$ coordinates are normalized to the range 0 to 1. A penalty of 1 is added for every wrongly predicted point and for every point that is not predicted.

The mAED is computed as:

$$ mAED = \frac{1}{M} \sum_{j=1}^{M} \frac{1}{N_j} \sum_{i=1}^{N_j} ||p_{ij}-\hat{p}_{ij}||_{2}^{2} $$

where $M$ is the number of images in the dataset, $N_j$ is the number of points in image $j$, and $p_{ij}$ and $\hat{p}_{ij}$ are, respectively, the closest ground-truth and predicted points.
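The sketch below shows one possible way to evaluate the mAED formula above. It rests on several assumptions that the text does not spell out: points are paired greedily by ascending distance, each unmatched ground-truth or predicted point adds a penalty of 1, $N_j$ is taken as the larger of the two point counts, and an image with no points on either side contributes 0. The names `image_aed` and `mean_aed` are hypothetical, and the repository's own evaluation may differ.

```python
def image_aed(gt_points, pred_points, penalty=1.0):
    """Per-image score: matched squared distances plus penalties, over N_j."""
    if not gt_points and not pred_points:
        return 0.0  # assumption: nothing to localize, nothing predicted
    # All candidate pairs, sorted by squared Euclidean distance
    # (the formula above uses the squared L2 norm).
    pairs = sorted(
        ((gx - px) ** 2 + (gy - py) ** 2, i, j)
        for i, (gx, gy) in enumerate(gt_points)
        for j, (px, py) in enumerate(pred_points)
    )
    used_gt, used_pred = set(), set()
    total = 0.0
    for dist_sq, i, j in pairs:
        # Greedy matching (assumption): closest still-unmatched pair first.
        if i in used_gt or j in used_pred:
            continue
        used_gt.add(i)
        used_pred.add(j)
        total += dist_sq
    # Penalty of 1 per missed ground-truth point or spurious prediction.
    unmatched = (len(gt_points) - len(used_gt)) + (len(pred_points) - len(used_pred))
    total += penalty * unmatched
    return total / max(len(gt_points), len(pred_points))


def mean_aed(gt_per_image, pred_per_image):
    """Average the per-image scores over the M images of the test set."""
    scores = [image_aed(g, p) for g, p in zip(gt_per_image, pred_per_image)]
    return sum(scores) / len(scores)
```

Coordinates are expected as `(x, y)` tuples already normalized to the range 0 to 1, so a matched pair can contribute at most 2 and a missed point contributes exactly 1.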

Speed

The speed benchmark is reported in FPS for both CPU and GPU.
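For reference, an FPS figure of this kind can be obtained with a simple timing loop. The sketch below is a generic illustration, not the benchmark script used in this repository: `measure_fps` is a hypothetical name and `infer` stands for any zero-argument callable that runs the model on one image.

```python
import time


def measure_fps(infer, n_warmup=5, n_runs=50):
    """Return frames per second for repeated calls to infer()."""
    # Warm-up calls are excluded so one-off setup costs do not skew the result.
    for _ in range(n_warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed
```

GPU execution is usually asynchronous, so when timing on a GPU, `infer` should wait for the device to finish before returning (with PyTorch, for example, by calling `torch.cuda.synchronize()`); otherwise the measured FPS would be overestimated.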

If something is missing, unclear, or not working in this repository, feel free to contact me.