
<div align="center"> <img src="https://github.com/spaceml-org/Active-Labeler/blob/main/active-simple-header.jpg" > <p align="center"> Published by <a href="http://spaceml.org/">SpaceML</a> • <a href="https://arxiv.org/abs/2012.10610">About SpaceML</a> </p>


Google Colab Notebook Example

</div>

The Active Labeler is a CLI tool that facilitates labeling datasets with just a SINGLE line of code. The tool comes fully equipped with various Active Learning strategies and other customizable features to suit one’s requirements.

What is Active Learning?

Deep learning models are hungry for labeled data. In the real world, obtaining a comprehensive set of unlabeled data is relatively simple, but labeling it manually comes at a high cost, especially in fields where labeling requires a high degree of expert insight. A way of maximizing a model's performance gain while labeling only a small number of images can significantly impact practical applications of AI in many fields. Active learning is such a method. It aims to select the most valuable samples from the unlabeled dataset and pass them to a human annotator for labeling. This selective sampling reduces the cost of labeling while still maintaining performance. Some of the strategies present in this tool include:

<br> <img src="https://github.com/spaceml-org/Active-Labeler/blob/main/ActiveLabeler_diagram.jpeg" >
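To make the idea of selecting the most valuable samples more concrete, here is a minimal sketch of uncertainty sampling next to a random baseline. It is an illustration only, not the tool's implementation: the function names and the dictionary of predicted probabilities are hypothetical, and it assumes a binary classifier whose output is the probability of the positive class.

```python
import random


def uncertainty_sample(probabilities, sample_size):
    """Return the names of the images the classifier is least sure about.

    probabilities: dict mapping image name -> predicted probability of the
    positive class (hypothetical input format, not the tool's own).
    """
    # A probability near 0.5 means a binary classifier is undecided,
    # so rank images by their distance from 0.5, smallest first.
    ranked = sorted(probabilities, key=lambda name: abs(probabilities[name] - 0.5))
    return ranked[:sample_size]


def random_sample(probabilities, sample_size, seed=0):
    """Baseline: pick images uniformly at random, ignoring the model."""
    rng = random.Random(seed)
    names = sorted(probabilities)
    return rng.sample(names, min(sample_size, len(names)))


# Made-up predictions for four unlabeled images.
predictions = {"a.png": 0.97, "b.png": 0.52, "c.png": 0.08, "d.png": 0.41}
picked = uncertainty_sample(predictions, sample_size=2)
print(picked)
```

Here the two images closest to 0.5 are sent to the annotator first, while confidently classified images are left for later.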

Swipe Labeler:

Swipe Labeler is a GUI-based tool to enable labeling of data. It supports:

Images are picked one by one from your unlabeled images directory and presented through the Swipe Labeler GUI. For each image, the user can classify it as the positive class or the negative/absent class using the “Accept” or “Reject” button. For more information on how to install and use it, see the Swipe Labeler documentation.

When running on Colab, Swipe Labeler cannot be accessed at https://localhost:5000/. Run the following code to obtain a URL that allows you to access the Swipe Labeler. <br>from google.colab.output import eval_js <br>print(eval_js('google.colab.kernel.proxyPort(5000)'))

Setup

Your data must be in the following format (inside a subdirectory) to be used by the pipeline:

Dataset/
└── Unlabeled
    ├── Image1.png
    ├── Image2.png
    └── Image3.png
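Before starting the pipeline, it can help to confirm that the directory matches the layout above. The following is a minimal sketch of such a check; `list_unlabeled_images` and the set of accepted file extensions are assumptions made for illustration and are not part of the Active Labeler.

```python
from pathlib import Path

# Assumed set of accepted image formats; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_unlabeled_images(data_path):
    """Return the sorted image paths found in <data_path>/Unlabeled."""
    unlabeled_dir = Path(data_path) / "Unlabeled"
    if not unlabeled_dir.is_dir():
        raise FileNotFoundError(f"Expected a subdirectory at {unlabeled_dir}")
    # Keep regular files with an image extension, ignoring anything else.
    return sorted(
        p for p in unlabeled_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling `list_unlabeled_images("Dataset")` on the example tree would return the three image paths, and raise `FileNotFoundError` if the `Unlabeled` subdirectory is missing.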

How to use?

Mandatory Arguments

Pipeline Config: Arguments used by main.py, defined in pipeline_config.yaml

| Argument | Description |
| --- | --- |
| verbose | 1 prints all logs, 0 does not print logs |
| seed | Seed for the random functions used in the main pipeline, for reproducibility |
| data_path | Path to dataset |
| runtime_path | Path to folder where all runtime files will be stored |
| swipe_label_batch_size | Labeling batch size for Swipe Labeler |
| model_type | Type of Self-Supervised Learning model used: SimCLR, SimSiam |
| model_path | Path to the .ckpt file containing the model (.ckpt file is obtained by training with Self-Supervised Learner) |
| image_size | Size of input images |
| embedding_size | Size of model's output representations |
| model_config_path | Path to model_config.yaml file |
| seed_nn | 1 to use nearest neighbour method on the reference image to curate seed dataset, 0 to use already existing seed_dataset |
| ref_img_path | Path to reference image used to curate seed dataset, needed if seed_nn is 1 |
| seed_data_path | Path to your existing seed_dataset, needed if seed_nn is 0 |
| num_trees | Affects the Annoy tree build time and the index size. A larger value gives more accurate results, but larger indexes. See the Annoy documentation for more information. |
| sample_size | Number of images to be picked by the Active Labeler strategy |
| sampling_strategy | Type of Active Labeler strategy used: random, gaussian, uncertainty |
| active_label_batch_size | Batch size for Active Labeler |
| sampling_nn | 1 to do nearest neighbour search on the images picked by the Active Labeler strategy in each iteration |
| n_closest | Number of nearest neighbour images for each strategy image |
| train_dataset_batch_size | Batch size for training dataset |
| metrics | 1 to obtain empirical metrics about the pipeline by using predictions on the validation dataset, 0 for no data |
| pos_class | Name of positive class, needed if metrics is 1 or if simulate_label is 1 |
| metric_csv_path | Path to csv containing metrics data |
| prob_csv_path | Path to csv containing prediction probabilities for each iteration |
| simulate_label | Simulates labeling for testing purposes by checking whether pos_class is part of the image name |
<br>
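Several of the arguments above are only needed when another argument has a particular value. The sketch below shows one way to check those dependencies on a configuration that has already been loaded into a dictionary. The helper `check_pipeline_config` is hypothetical and not part of the tool; it encodes only the three conditions stated in the descriptions above.

```python
def check_pipeline_config(config):
    """Return a list of problems found in a pipeline config dictionary."""
    problems = []

    def missing(key):
        # Treat an absent key, None, or an empty string as "not provided".
        return config.get(key) in (None, "")

    # ref_img_path is needed when the seed dataset is curated from a reference image.
    if config.get("seed_nn") == 1 and missing("ref_img_path"):
        problems.append("seed_nn is 1, so ref_img_path is needed")

    # seed_data_path is needed when an existing seed dataset is reused.
    if config.get("seed_nn") == 0 and missing("seed_data_path"):
        problems.append("seed_nn is 0, so seed_data_path is needed")

    # pos_class is needed for metrics and for simulated labeling.
    needs_pos_class = config.get("metrics") == 1 or config.get("simulate_label") == 1
    if needs_pos_class and missing("pos_class"):
        problems.append("metrics or simulate_label is 1, so pos_class is needed")

    return problems


# Example with made-up values: the reference image path is missing.
example = {"seed_nn": 1, "metrics": 0, "simulate_label": 0}
print(check_pipeline_config(example))
```

The dictionary could come from parsing pipeline_config.yaml, for example with PyYAML's `yaml.safe_load`; an empty list means none of the three conditions is violated.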

Model Config: Arguments related to model training, defined in model_config.yaml

| Argument | Description |
| --- | --- |
| encoder_type | Type of Self-Supervised Learning model used: SimCLR or SimSiam |
| encoder_path | Path to Self-Supervised Learning model |
| e_embedding_size | Size of encoder's output representations |
| e_lr | Learning rate for encoder |
| train_encoder | True for training encoder, False for freezing the encoder during training |
| classifier_type | Architecture for classifier model: SSLEvaluator for multiple layers, SSLEvaluatorOneLayer for single layer |
| c_num_classes | Number of classes to be classified |
| c_hidden_dim | Size of the classifier model's hidden dimension |
| c_linear_lr | Learning rate for classifier |
| c_dropout | Dropout rate for classifier |
| c_gamma | Gamma value for classifier |
| c_decay_epochs | Number of decay epochs for classifier |
| c_weight_decay | Weight decay for classifier |
| c_final_lr | Final learning rate for classifier |
| c_momentum | Momentum for classifier |
| c_scheduler_type | Type of scheduler used during training: cosine, step |
| seed | Seed for the random functions used in model training, for reproducibility |
| cpus | Number of CPUs available for training |
| device | cuda if GPU is available, cpu otherwise |
| epochs | Number of training epochs |
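For intuition about the two scheduler types listed above, the sketch below shows how step and cosine learning-rate schedules are commonly computed. It is a generic illustration, not the tool's code: the assumption that c_gamma and c_decay_epochs drive a step schedule and that c_final_lr is the floor of a cosine schedule is mine, and the function names are hypothetical.

```python
import math


def step_lr(base_lr, gamma, decay_epochs, epoch):
    """Multiply the learning rate by gamma once every decay_epochs epochs."""
    return base_lr * gamma ** (epoch // decay_epochs)


def cosine_lr(base_lr, final_lr, total_epochs, epoch):
    """Anneal from base_lr down to final_lr along half a cosine wave."""
    progress = epoch / total_epochs
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))


# Compare both schedules at a few epochs, using made-up hyperparameters.
for epoch in (0, 10, 50, 100):
    print(epoch, step_lr(0.1, 0.5, 10, epoch), cosine_lr(0.1, 0.0, 100, epoch))
```

A step schedule drops the rate abruptly at fixed intervals, while a cosine schedule lowers it smoothly over the whole run.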

Where can I find the trained model?

If needed, the finetuned model can be accessed from ./final_model.ckpt.

Citation

If you find the Active Labeler useful in your research, please consider citing the GitHub code for this tool:

@code{
  title={Active Labeler},
  author={Muthukrishnan, Subhiksha and Khokhar, Mandeep and Krishnan, Ajay and Narayanan, Tarun and Praveen, Satyarth and Koul, Anirudh},
  url={https://github.com/spaceml-org/Active-Labeler},
  year={2021}
}
</div>