# Edge-guided Multi-domain RGB-to-TIR image Translation for Training Vision Tasks with Challenging Labels

Accepted to the Proceedings of ICRA 2023
<div align="left"> <a href="https://scholar.google.com/citations?user=u6VDnlgAAAAJ&hl=ko&oi=ao">Dong-Guw Lee</a>, <a href="https://scholar.google.co.kr/citations?user=ivOqySYAAAAJ">Myung-Hwan Jeon</a>, <a href="https://scholar.google.com/citations?user=W5MOKWIAAAAJ&hl=ko&oi=ao">Younggun Cho</a>, <a href="https://ayoungk.github.io/">Ayoung Kim</a> at <a href="https://rpm.snu.ac.kr">RPM Robotics Lab</a> </div>Overview of the edge-guided multi-domain RGB2TIR translation network
<div align="center"> </div>Proposed pipeline for training vision tasks with challenging labels
- Our target tasks are deep optical flow estimation and object detection in thermal images.
## Results
### Disclaimer
- The same model was used for both synthetic and real RGB-to-TIR image translation.
- The model was trained on identical datasets (sRGB: GTA, TIR: STheReO).
### Results on synthetic RGB-to-TIR translation
<div align="center"> </div>Results on real RGB to TIR translation
- The model trained on synthetic RGB images was adapted to translate real RGB images to TIR images.
### Results on thermal optical flow estimation using the proposed method
<div align="center"> </div>Video demonstration
## TODO
- Upload inference code
- Upload style selection code
- Upload training code for custom data training
## Environment Setup
- Download the repo:

  ```bash
  $ git clone https://github.com/rpmsnu/sRGB-TIR.git
  ```
- Docker support

  To make environment setup a lot easier, I have uploaded my Docker image to Docker Hub. Please use the following command to pull it:

  ```bash
  $ docker pull donkeymouse/donkeymouse:icra
  ```
  *If any problems persist, please file an issue!*
## How To Use: RGB-to-TIR translation
- Inference

  ```bash
  $ python3 inference_batch.py --input_folder {input dir to your RGB images} --output_folder {output dir to store your translated images} --checkpoint {weight_file address} --config {path to config file} --a2b 0 --seed {your choice} --num_style {number of TIR styles to sample} --synchronized --output_only
  ```
  For example, to translate RGB images stored in a folder called "input" and sample 5 TIR styles, run:

  ```bash
  $ python3 inference_batch.py --input_folder ./input --output_folder ./output --checkpoint ./translation_weights.pt --a2b 0 --seed 1234 --num_style 5 --synchronized --output_only --config configs/tir2rgb_folder.yaml
  ```
- Network weights

  Please download them from here: {link to google drive}
  *If the link doesn't work, please file an issue!*
## Network Details

*Edge-guided multi-domain RGB2TIR translation architecture*
- Network Architecture
  - Content Encoder: single 7x7 conv block + four 4x4 conv blocks + four residual blocks + Instance Normalization
  - Style Encoder: single 7x7 conv block + four 4x4 conv blocks + four residual blocks + GAP + FC layers
  - Decoder (Generator): 4x4 convs + residual blocks in an encoder-decoder architecture; 2 downsampling layers and reflection padding were used.
  - Discriminator: four 4x4 convolutions with LeakyReLU activations and reflection padding; LSGAN loss.
- Model code will be released after the review process has been cleared; in the meantime, a rough sketch of the content encoder is given below.
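Until the official code is out, here is a minimal PyTorch sketch of the content encoder described above. The layer counts, Instance Normalization, and reflection padding follow the list; the channel widths (`base_ch`), strides, and the 3x3 residual-block kernels are assumptions, not the released configuration.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Residual block with InstanceNorm; the 3x3 kernel size is an assumption."""

    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        return x + self.block(x)


class ContentEncoder(nn.Module):
    """Single 7x7 conv block + four 4x4 conv blocks + four residual blocks,
    all with Instance Normalization, per the architecture list above."""

    def __init__(self, in_ch=3, base_ch=64):  # base_ch = 64 is an assumption
        super().__init__()
        layers = [
            nn.ReflectionPad2d(3),
            nn.Conv2d(in_ch, base_ch, kernel_size=7),
            nn.InstanceNorm2d(base_ch),
            nn.ReLU(inplace=True),
        ]
        ch = base_ch
        for _ in range(4):  # four 4x4 conv blocks; stride 2 is an assumption
            layers += [
                nn.ReflectionPad2d(1),
                nn.Conv2d(ch, ch * 2, kernel_size=4, stride=2),
                nn.InstanceNorm2d(ch * 2),
                nn.ReLU(inplace=True),
            ]
            ch *= 2
        layers += [ResBlock(ch) for _ in range(4)]  # four residual blocks
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)


if __name__ == "__main__":
    enc = ContentEncoder()
    feats = enc(torch.randn(1, 3, 400, 640))  # 640x400 input, per the training details
    print(feats.shape)  # torch.Size([1, 1024, 25, 40])
```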
- Training details
  - Iterations: 60,000
  - Batch size: 1
  - Weight decay: 0.001
  - Optimizer: Adam with beta1 = 0.5, beta2 = 0.999
  - Initial learning rate: 0.0001
  - Step learning rate policy
  - Learning rate decay rate (gamma): 0.5
  - Input image size: 640 x 400 for both synthetic RGB and thermal images
- Config files will be released after the review process has been cleared; the sketch below shows how the hyperparameters above map to PyTorch in the meantime.
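This is only a reference sketch of the listed optimizer and scheduler settings, not the actual training script; the `model` placeholder and the scheduler's `step_size` are assumptions (the step interval is not listed above).

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, kernel_size=3)  # placeholder standing in for the translation network

# Adam with beta1 = 0.5, beta2 = 0.999, initial LR 0.0001, weight decay 0.001
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,
    betas=(0.5, 0.999),
    weight_decay=1e-3,
)

# Step LR policy with decay rate (gamma) = 0.5; step_size = 10,000 is an assumption
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10_000, gamma=0.5)

for iteration in range(60_000):  # 60,000 iterations at batch size 1
    # forward/backward on one 640x400 RGB/TIR pair would go here
    optimizer.step()
    scheduler.step()
```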
## Citation
Please consider citing the paper as:
```bibtex
@INPROCEEDINGS{lee-2023-edgemultiRGB2TIR,
  author={Lee, Dong-Guw and Kim, Ayoung},
  booktitle={IEEE International Conference on Robotics and Automation},
  title={Edge-guided Multi-domain RGB-to-TIR image Translation for Training Vision Tasks with Challenging Labels},
  year={2023}
}
```
Also, much of this code is built on top of MUNIT (ECCV 2018), so please consider citing their paper as well.
## Contact
If you have any questions, please contact: donkeymouse@snu.ac.kr