Private Detector

This is the repo for Bumble's Private Detector™ model - an image classifier that can detect lewd images.

The internal repo has been heavily refactored and released as a fully open-source project so that the wider community can use and fine-tune a Private Detector model of their own. You can download the pretrained SavedModel, frozen model and checkpoint here

Model

The SavedModel can be found in saved_model/ within private_detector.zip above

The model is based on EfficientNetV2 and trained on our internal dataset of lewd images. More information can be found in the whitepaper here or here

Inference

Inference is simple, and an example is given in inference.py. The model is released as a SavedModel, so it can be deployed in many different ways, but here's a quick run-through of one way to get it working for those less familiar with Python/TensorFlow.

First, install Python and Conda on your system and open the Terminal/Command Prompt on your machine.

Then you can use the environment.yaml file to install the necessary packages to run the inference.

conda env create -f environment.yaml
conda activate private_detector

Once that's set up, you can run the inference script. Simply replace the sample .jpg file paths below with your own:

python3 inference.py \
    --model saved_model/ \
    --image_paths \
        Yes_samples/1.jpg \
        Yes_samples/2.jpg \
        Yes_samples/3.jpg \
        Yes_samples/4.jpg \
        Yes_samples/5.jpg \
        No_samples/1.jpg \
        No_samples/2.jpg \
        No_samples/3.jpg \
        No_samples/4.jpg \
        No_samples/5.jpg
<details> <summary>Sample Output</summary> <code>
Probability: 93.71% - Yes_samples/1.jpg
Probability: 93.43% - Yes_samples/2.jpg
Probability: 94.06% - Yes_samples/3.jpg
Probability: 94.08% - Yes_samples/4.jpg
Probability: 91.01% - Yes_samples/5.jpg
Probability: 9.76% - No_samples/1.jpg
Probability: 7.14% - No_samples/2.jpg
Probability: 8.83% - No_samples/3.jpg
Probability: 4.87% - No_samples/4.jpg
Probability: 5.29% - No_samples/5.jpg
</code> </details>
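
If you want to act on these scores in a script, the printed lines are straightforward to parse. The following is a minimal sketch, assuming the output format shown in the sample above. The `parse_scores` and `flag_images` helpers are hypothetical and not part of this repo, and the 0.5 cut-off is only an illustrative choice that you should tune for your own use case.

```python
import re
from typing import Dict, List

# Matches lines such as "Probability: 93.71% - Yes_samples/1.jpg"
LINE_RE = re.compile(r"^Probability:\s*([0-9.]+)%\s*-\s*(.+)$")


def parse_scores(output: str) -> Dict[str, float]:
    """Map each image path to its probability in the range [0, 1]."""
    scores = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not look like a score are ignored
            scores[match.group(2)] = float(match.group(1)) / 100.0
    return scores


def flag_images(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the paths whose probability is above the (assumed) threshold."""
    return [path for path, prob in scores.items() if prob > threshold]


sample = """Probability: 93.71% - Yes_samples/1.jpg
Probability: 9.76% - No_samples/1.jpg"""
print(flag_images(parse_scores(sample)))  # ['Yes_samples/1.jpg']
```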

Serving

See the TensorFlow Serving example

Additional Training

You can fine-tune the model on your own data. Doing so is fairly simple, though you will need the checkpoint files found in saved_checkpoint/ in private_detector.zip

Set up a JSON file with links to your image path lists for each class:

{
    "Yes": {
        "path": "/home/sofarrell/private_detector/Yes.txt",
        "label": 0
    },
    "No": {
        "path": "/home/sofarrell/private_detector/No.txt",
        "label": 1
    }
}

Each .txt file lists the paths to the images for that class, one per line:

/home/sofarrell/private_detector_images/Yes/1093840880_309463828.jpg
/home/sofarrell/private_detector_images/Yes/657954182_3459624.jpg
/home/sofarrell/private_detector_images/Yes/1503714421_3048734.jpg
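
If your images are already sorted into one folder per class, the path lists and the JSON file can be generated rather than written by hand. This is a rough sketch under that assumption. The `write_class_lists` helper, the folder layout, the accepted file extensions and the `train_classes.json` output name are all illustrative choices, not something the repo provides.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def write_class_lists(image_root, output_dir, labels):
    """Write one <class>.txt path list per class plus a classes JSON file.

    image_root: directory with one sub-directory per class (assumed layout)
    labels: mapping of class name to integer label, e.g. {"Yes": 0, "No": 1}
    """
    image_root = Path(image_root)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    classes = {}
    for name, label in labels.items():
        # Collect absolute paths of all image files in this class folder
        paths = sorted(
            str(p.resolve())
            for p in (image_root / name).iterdir()
            if p.suffix.lower() in IMAGE_SUFFIXES
        )
        list_file = output_dir / f"{name}.txt"
        list_file.write_text("\n".join(paths))  # one image path per line
        classes[name] = {"path": str(list_file.resolve()), "label": label}

    json_file = output_dir / "train_classes.json"
    json_file.write_text(json.dumps(classes, indent=4))
    return json_file
```

Calling `write_class_lists("private_detector_images", "private_detector", {"Yes": 0, "No": 1})` would produce `Yes.txt`, `No.txt` and a JSON file with the same shape as the example above. Run it a second time on a held-out folder to build the evaluation JSON.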

You can create the training environment with conda:

conda env create -f environment.yaml
conda activate private_detector

And then retrain like so:

python3 ./train.py \
    --train_json /home/sofarrell/private_detector/train_classes.json \
    --eval_json /home/sofarrell/private_detector/eval_classes.json \
    --checkpoint_dir saved_checkpoint/ \
    --train_id retrained_private_detector
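
If you launch several training runs, it can be convenient to assemble the command from Python. The sketch below only builds the argument list for the command shown above. The `build_train_command` helper is hypothetical, and the assumption that every extra parameter is passed as `--name value` is based on the four flags in that command, so check `train.py` for how boolean and list parameters are actually parsed.

```python
import subprocess
import sys


def build_train_command(train_json, eval_json, checkpoint_dir, train_id, **overrides):
    """Assemble the argument list for train.py.

    Extra keyword arguments become `--name value` flags; this naming is an
    assumption based on the flags shown in the command above.
    """
    cmd = [
        sys.executable, "./train.py",
        "--train_json", train_json,
        "--eval_json", eval_json,
        "--checkpoint_dir", checkpoint_dir,
        "--train_id", train_id,
    ]
    for name, value in overrides.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    command = build_train_command(
        "/home/sofarrell/private_detector/train_classes.json",
        "/home/sofarrell/private_detector/eval_classes.json",
        "saved_checkpoint/",
        "retrained_private_detector",
        batch_size=32,  # illustrative override
    )
    print(" ".join(command))
    # subprocess.run(command, check=True)  # uncomment to start training
```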

The training script has several parameters that can be tweaked:

| Command | Description | Type | Default |
| --- | --- | --- | --- |
| train_id | ID for this particular training run | str | |
| train_json | JSON file(s) which describes classes and contains lists of filenames of data files | List[str] | |
| eval_json | Validation json file which describes classes and contains lists of filenames of data files | str | |
| num_epochs | Number of epochs to train for | int | |
| batch_size | Number of images to process in a batch | int | 64 |
| checkpoint_dir | Directory to store checkpoints in | str | |
| model_dir | Directory to store graph in | str | . |
| data_format | Data format: [channels_first, channels_last] | str | channels_last |
| initial_learning_rate | Initial learning rate | float | 1e-4 |
| min_learning_rate | Minimal learning rate | float | 1e-6 |
| min_eval_metric | Minimal evaluation metric to start saving models | float | 0.01 |
| float_dtype | Float Dtype to use in image tensors: [16, 32] | int | 16 |
| steps_per_train_epoch | Number of steps per train epoch | int | 800 |
| steps_per_eval_epoch | Number of steps per evaluation epoch | int | 1 |
| reset_on_lr_update | Whether to reset to the best model after learning rate update | bool | False |
| rotation_augmentation | Rotation augmentation angle, value <= 0 disables it | float | 0 |
| use_augmentation | Add speckle, v0, random or color distortion augmentation | str | |
| scale_crop_augmentation | Resize image to the model's size times this scale and then randomly crop needed size | float | 1.4 |
| reg_loss_weight | L2 regularization weight | float | 0 |
| skip_saving_epochs | Do not save good checkpoint and update best metric for this number of the first epochs | int | 0 |
| sequential | Use sequential run over randomly shuffled filenames vs equal sampling from each class | bool | False |
| eval_threshold | Threshold above which to consider a prediction positive for evaluation | float | 0.5 |
| epochs_lr_update | Maximum number of epochs without improvement used to reset/decrease learning rate | int | 20 |
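
To make two of these defaults concrete, the sketch below works through the arithmetic: how many images one training epoch consumes at the default `steps_per_train_epoch` and `batch_size`, and what "resize to the model's size times this scale, then randomly crop" amounts to for `scale_crop_augmentation`. The helper names are hypothetical, the input size of 480 pixels is a placeholder assumption (the model's real input size is not stated here), and the rounding may differ from what `train.py` does.

```python
import random


def images_per_epoch(steps_per_train_epoch=800, batch_size=64):
    """Number of images seen in one training epoch."""
    return steps_per_train_epoch * batch_size


def random_crop_box(model_size, scale=1.4, rng=random):
    """Return the resized side length and a random square crop box.

    The image is first resized to model_size * scale, then a
    model_size x model_size window is cut out at a random offset.
    """
    resized = int(round(model_size * scale))
    max_offset = resized - model_size
    left = rng.randint(0, max_offset)
    top = rng.randint(0, max_offset)
    return resized, (left, top, left + model_size, top + model_size)


print(images_per_epoch())  # 51200 with the defaults above

# 480 is an assumed input size, used only for illustration
resized, box = random_crop_box(480)
print(resized, box)
```

With a scale of 1.4 the crop covers roughly half of the resized image's area (1 / 1.4² ≈ 0.51), so larger values make the augmentation more aggressive.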