DASS_Det_Inference
- Description: Original inference repository of the paper "Domain-Adaptive Self-Supervised Pre-training for Face & Body Detection in Drawings".
- Disclaimer: The model structure and code are largely adapted from the YOLOX model.
Requirements
- CUDA >= 10.2
- PyTorch >= 1.8.2
- Chainer >= 7.8.1
- ChainerCV >= 0.13.1
- OpenCV-Python >= 4.5.5
- Matplotlib >= 3.3.4
- xmltodict
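The minimum versions above can be checked against the local environment before running the notebooks. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_requirements` are hypothetical, the package names are the usual pip distribution names, and the CUDA version is not covered because it is not a pip package.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions taken from the list above (pip distribution names assumed).
MINIMUMS = {
    "torch": "1.8.2",
    "chainer": "7.8.1",
    "chainercv": "0.13.1",
    "opencv-python": "4.5.5",
    "matplotlib": "3.3.4",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep up to three leading numeric parts, e.g. '1.8.2+cu111' -> (1, 8, 2)."""
    public = text.split("+")[0]  # drop local build tags such as '+cu111'
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def check_requirements(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Return a per-package status: 'ok', 'missing', or a 'too old' message."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[package] = "ok"
        else:
            report[package] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

The version comparison here is deliberately simple and ignores pre-release suffixes; it is meant only as a quick sanity check.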
Abstract
Drawings are a powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings including digital arts, cartoons, and comics has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using vast amounts of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort.
Pre-trained Weights
You can find all the pre-trained model weights here. Please note that:
- If the model name includes `xs`, set the depth and width parameters to `depth, width = 0.33, 0.375`. If it includes `xl`, set `depth, width = 1.33, 1.25`.
- For the stage-2 weights (i.e., self-supervised, teacher-student), load the model with the `teacher_model` key in the weight dictionary. Otherwise, use the `model` key.
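The two rules above can be sketched as small helpers. This is an illustrative sketch under stated assumptions, not the repository's own loading code: the function names `size_params` and `select_state_dict` are hypothetical, and whether a checkpoint is a stage-2 checkpoint is passed in explicitly because the naming convention for stages is not described here.

```python
from typing import Any, Dict, Tuple


def size_params(model_name: str) -> Tuple[float, float]:
    """Return (depth, width) from the size tag found in the model name."""
    name = model_name.lower()
    has_xs, has_xl = "xs" in name, "xl" in name
    if has_xs == has_xl:
        # Either no size tag or both tags are present; refuse to guess.
        raise ValueError(f"cannot infer model size from {model_name!r}")
    return (0.33, 0.375) if has_xs else (1.33, 1.25)


def select_state_dict(checkpoint: Dict[str, Any], is_stage2: bool) -> Any:
    """Pick the weights stored under 'teacher_model' (stage 2) or 'model'."""
    key = "teacher_model" if is_stage2 else "model"
    if key not in checkpoint:
        raise KeyError(f"checkpoint has no {key!r} entry")
    return checkpoint[key]
```

With PyTorch, the checkpoint dictionary would typically come from `torch.load(path, map_location="cpu")`, and the selected entry would then be passed to the model's `load_state_dict`. How the model is built from `depth` and `width` depends on the repository code and is not shown here.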
Model Architecture
[Figures: Overall Pipeline, Model Architecture, Self-Supervised Design]
Results
The results below are averages over 5 separate training runs for the XS-sized models. For the XL-sized models, the result of a single run is given. The best-performing models among these runs are provided as the pre-trained weights. Please refer to the original paper for the complete set of results and ablation studies.
Face Results
Models | iCartoonFace | Manga109 | DCM772 |
---|---|---|---|
XS Stage-1 | 42.50 | 54.74 | 69.93 |
XS Stage-2 | 49.19 | 69.25 | 82.45 |
XS Stage-3 w/ Single Datasets | 87.75 | 87.86 | 75.87 |
XS Stage-3 w/ Mix of Datasets | 83.15 | 86.45 | 78.40 |
XL Stage-3 w/ Single Datasets | 90.01 | 87.88 | 77.40 |
XL Stage-3 w/ Mix of Datasets | 87.77 | 87.08 | 85.77 |
ACFD | 90.94 | - | - |
Ogawa et al. | - | 76.20 | - |
Nguyen et al. | - | - | 74.94 |
Body Results
Models | Manga109 | DCM772 | Comic2k | Watercolor2k | Clipart1k |
---|---|---|---|---|---|
XS Stage-1 | 42.72 | 65.46 | 56.80 | 67.36 | 55.65 |
XS Stage-2 | 69.41 | 77.83 | 67.38 | 71.60 | 64.12 |
XS Stage-3 w/ Single Datasets | 87.06 | 84.89 | 71.66 | 89.17 | 77.97 |
XS Stage-3 w/ Mix of Datasets | 86.54 | 83.52 | 75.60 | 82.68 | 75.96 |
XL Stage-3 w/ Single Datasets | 87.98 | 86.14 | 73.65 | 89.81 | 83.59 |
XL Stage-3 w/ Mix of Datasets | 87.50 | 87.24 | 76.00 | 84.75 | 79.63 |
Ogawa et al. | 79.60 | - | - | - | - |
Nguyen et al. | - | 76.76 | - | - | - |
Inoue et al. | - | - | 70.10 | 77.30 | 76.20 |
Required Datasets
Please do not change the default folder structures of these datasets.
Files in This Repository
- `evaluator.ipynb` evaluates all the existing dataset scores if a pre-trained model path is given.
- `visualizer.ipynb` visualizes a single image in the given path.