Home

Awesome

<h1 align="center">Bilateral Reference for High-Resolution Dichotomous Image Segmentation</h1> <div align='center'> <a href='https://scholar.google.com/citations?user=TZRzWOsAAAAJ' target='_blank'><strong>Peng Zheng</strong></a><sup> 1,4,5,6</sup>,&thinsp; <a href='https://scholar.google.com/citations?user=0uPb8MMAAAAJ' target='_blank'><strong>Dehong Gao</strong></a><sup> 2</sup>,&thinsp; <a href='https://scholar.google.com/citations?user=kakwJ5QAAAAJ' target='_blank'><strong>Deng-Ping Fan</strong></a><sup> 1*</sup>,&thinsp; <a href='https://scholar.google.com/citations?user=9cMQrVsAAAAJ' target='_blank'><strong>Li Liu</strong></a><sup> 3</sup>,&thinsp; <a href='https://scholar.google.com/citations?user=qQP6WXIAAAAJ' target='_blank'><strong>Jorma Laaksonen</strong></a><sup> 4</sup>,&thinsp; <a href='https://scholar.google.com/citations?user=pw_0Z_UAAAAJ' target='_blank'><strong>Wanli Ouyang</strong></a><sup> 5</sup>,&thinsp; <a href='https://scholar.google.com/citations?user=stFCYOAAAAAJ' target='_blank'><strong>Nicu Sebe</strong></a><sup> 6</sup> </div> <div align='center'> <sup>1 </sup>Nankai University&ensp; <sup>2 </sup>Northwestern Polytechnical University&ensp; <sup>3 </sup>National University of Defense Technology&ensp; <br /> <sup>4 </sup>Aalto University&ensp; <sup>5 </sup>Shanghai AI Laboratory&ensp; <sup>6 </sup>University of Trento&ensp; </div> <div align="center" style="display: flex; justify-content: center; flex-wrap: wrap;"> <a href='https://www.sciopen.com/article/pdf/10.26599/AIR.2024.9150038.pdf'><img src='https://img.shields.io/badge/Journal-Paper-red'></a>&ensp; <a href='https://arxiv.org/pdf/2401.03407'><img src='https://img.shields.io/badge/arXiv-Paper-red'></a>&ensp; <a href='https://drive.google.com/file/d/1FWvKDWTnK9RsiywfCsIxsnQzqv-dlO5u/view'><img src='https://img.shields.io/badge/中文版-Paper-red'></a>&ensp; <a href='https://www.birefnet.top'><img src='https://img.shields.io/badge/Page-Project-red'></a>&ensp; <a href='https://drive.google.com/drive/folders/1s2Xe0cjq-2ctnJBR24563yMSCOu4CcxM'><img src='https://img.shields.io/badge/GDrive-Stuff-green'></a>&ensp; <a href='LICENSE'><img src='https://img.shields.io/badge/License-MIT-yellow'></a>&ensp; <a href='https://huggingface.co/spaces/ZhengPeng7/BiRefNet_demo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HF-Space-blue'></a>&ensp; <a href='https://huggingface.co/ZhengPeng7/BiRefNet'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HF-Model-blue'></a>&ensp; </div> <div align="center" style="display: flex; justify-content: center; flex-wrap: wrap;"> <a href='https://colab.research.google.com/drive/14Dqg7oeBkFEtchaHLNpig2BcdkZEogba'><img src='https://img.shields.io/badge/Multiple_Images_Inference-F9AB00?style=for-the-badge&logo=googlecolab&color=525252'></a>&ensp; <a href='https://colab.research.google.com/drive/1MaEiBfJ4xIaZZn0DqKrhydHB8X97hNXl'><img src='https://img.shields.io/badge/Inference_&_Evaluation-F9AB00?style=for-the-badge&logo=googlecolab&color=525252'></a>&ensp; <a href='https://colab.research.google.com/drive/1B6aKZ3ekcvKMkSBn0N5mCASLUYMp0whK'><img src='https://img.shields.io/badge/Box_Guided_Segmentation-F9AB00?style=for-the-badge&logo=googlecolab&color=525252'></a>&ensp; </div>
DIS-Sample_1DIS-Sample_2
<img src="https://drive.google.com/thumbnail?id=1ItXaA26iYnE8XQ_GgNLy71MOWePoS2-g&sz=w400" /><img src="https://drive.google.com/thumbnail?id=1Z-esCujQF_uEa_YJjkibc3NUrW4aR_d4&sz=w400" />

This repo is the official implementation of "Bilateral Reference for High-Resolution Dichotomous Image Segmentation" (CAAI AIR 2024).

[!note] We need more GPU resources to push forward the performance of BiRefNet, especially on matting tasks, higher-resolution inference (2K), and more efficient model design. If you are happy to cooperate, please contact me at zhengpeng0108@gmail.com.

News :newspaper:

:rocket: Load BiRefNet in ONE LINE by HuggingFace, check more: BiRefNet

from transformers import AutoModelForImageSegmentation
birefnet = AutoModelForImageSegmentation.from_pretrained('zhengpeng7/BiRefNet', trust_remote_code=True)

:flight_arrival: Inference Partner:

We are really happy to collaborate with FAL to deploy the inference API of BiRefNet. You can access this service via the link below:

Our BiRefNet has achieved SOTA on many similar HR tasks:

DIS: PWC PWC PWC PWC PWC

<details><summary>Figure of Comparison on DIS Papers with Codes (by the time of this work):</summary> <img src="https://drive.google.com/thumbnail?id=1DLt6CFXdT1QSWDj_6jRkyZINXZ4vmyRp&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=1gn5GyKFlJbMIkre1JyEdHDSYcrFmcLD0&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=16CVYYOtafEeZhHqv0am2Daku1n_exMP6&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=10K45xwPXmaTG4Ex-29ss9payA9yBnyLn&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=16EuyqKFJOqwMmagvfnbC9hUurL9pYLLB&sz=w1620" /> </details> <br />

COD:PWC PWC PWC PWC

<details><summary>Figure of Comparison on COD Papers with Codes (by the time of this work):</summary> <img src="https://drive.google.com/thumbnail?id=1DLt6CFXdT1QSWDj_6jRkyZINXZ4vmyRp&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=1gn5GyKFlJbMIkre1JyEdHDSYcrFmcLD0&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=16CVYYOtafEeZhHqv0am2Daku1n_exMP6&sz=w1620" /> </details> <br />

HRSOD: PWC PWC PWC PWC PWC

<details><summary>Figure of Comparison on HRSOD Papers with Codes (by the time of this work):</summary> <img src="https://drive.google.com/thumbnail?id=1hNfQtlTAHT4-AVbk_47852zyRp1NOFLs&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=1bcVldUAxYkMI3OMTyaP_jNuOugDfYj-d&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=1p1zgyVz27cGEqQMtOKzm_6zoYK3Sw_Zk&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=1TubAvcoEbH_mHu3I-AxflnB71nkf35jJ&sz=w1620" /> <img src="https://drive.google.com/thumbnail?id=1A3V9HjVtcMQdnGPwuy-DBVhwKuo0q2lT&sz=w1620" /> </details> <br />

Try our online demos for inference:

<img src="https://drive.google.com/thumbnail?id=12XmDhKtO1o2fEvBu4OE4ULVB2BK0ecWi&sz=w1620" />

Model Zoo

For more general use of our BiRefNet, I extended the original academic one to more general ones for better real-life application.

Datasets and datasets are suggested to be downloaded from official pages. But you can also download the packaged ones: DIS, HRSOD, COD, Backbones.

Find performances (almost all metrics) of all models in the exp-TASK_SETTINGS folders in [stuff].

<details><summary>Models in the original paper, for <b>comparison on benchmarks</b>:</summary>
TaskTraining SetsBackboneDownload
DISDIS5K-TRswin_v1_largegoogle-drive
CODCOD10K-TR, CAMO-TRswin_v1_largegoogle-drive
HRSODDUTS-TRswin_v1_largegoogle-drive
HRSODHRSOD-TRswin_v1_largegoogle-drive
HRSODUHRSD-TRswin_v1_largegoogle-drive
HRSODDUTS-TR, HRSOD-TRswin_v1_largegoogle-drive
HRSODDUTS-TR, UHRSD-TRswin_v1_largegoogle-drive
HRSODHRSOD-TR, UHRSD-TRswin_v1_largegoogle-drive
HRSODDUTS-TR, HRSOD-TR, UHRSD-TRswin_v1_largegoogle-drive
</details> <details><summary>Models trained with customed data (general, matting), for <b>general use in practical application</b>:</summary>
TaskTraining SetsBackboneTest SetMetric (S, wF[, HCE])Download
general useDIS5K-TR,DIS-TEs, DUTS-TR_TE,HRSOD-TR_TE,UHRSD-TR_TE, HRS10K-TR_TE, TR-P3M-10k, TE-P3M-500-NP, TE-P3M-500-P, TR-humansswin_v1_largeDIS-VD0.911, 0.875, 1069google-drive
general useDIS5K-TR,DIS-TEs, DUTS-TR_TE,HRSOD-TR_TE,UHRSD-TR_TE, HRS10K-TR_TE, TR-P3M-10k, TE-P3M-500-NP, TE-P3M-500-P, TR-humansswin_v1_tinyDIS-VD0.882, 0.830, 1175google-drive
general useDIS5K-TR, DIS-TEsswin_v1_largeDIS-VD0.907, 0.865, 1059google-drive
matting segmentationP3M-10k, humansswin_v1_largeP3M-500-P0.983, 0.989google-drive
</details> <details><summary>Segmentation with box <b>guidance</b>:</summary> </details> <details><summary>Model <b>efficiency</b>:</summary>

Screenshot from the original paper. All tests are conducted on a single A100 GPU.

<img src="https://drive.google.com/thumbnail?id=1mTfSD_qt-rFO1t8DRQcyIa5cgWLf1w2-&sz=h300" /> <img src="https://drive.google.com/thumbnail?id=1F_OURIWILVe4u1rSz-aqt6ur__bAef25&sz=h300" />

</details> <details><summary><b>ONNX</b> conversion:</summary>

We converted from .pth weights files to .onnx files.
We referred a lot to the Kazuhito00/BiRefNet-ONNX-Sample, many thanks to @Kazuhito00.

</details>

Third-Party Creations

Concerning edge devices with less computing power, we provide a lightweight version with swin_v1_tiny as the backbone, which is x4+ faster and x5+ smaller. The details can be found in this issue and links there.

We found there've been some 3rd party applications based on our BiRefNet. Many thanks for their contribution to the community!
Choose the one you like to try with clicks instead of codes:

  1. Applications:
    • Thanks MoonHugo/ComfyUI-BiRefNet-Hugo: this project further upgrade the ComfyUI node for BiRefNet with our latest weights.

      <p align="center"><img src="https://github.com/MoonHugo/ComfyUI-BiRefNet-Hugo/blob/main/assets/d0a22b2a-ceb3-4205-9b4e-f6a68e4337c7.png" /></p>
    • Thanks lbq779660843/BiRefNet-Tensorrt and yuanyang1991/birefnet_tensorrt: they both provided the project to convert BiRefNet to TensorRT, which is faster and better for deployment. Their repos offer solid local establishment (Win and Linux) and colab demo, respectively. And @yuanyang1991 kindly offered the comparison among the inference efficiency of naive PyTorch, ONNX, and TensorRT on an RTX 4080S:

MethodsPytorchONNXTensorRT
       First Inference Time      0.71s5.32s0.17s
MethodsPytorchONNXTensorRT
Avg Inf Time (excluding 1st)0.15s4.43s0.11s
  1. More Visual Comparisons

    • Thanks twitter.com/ZHOZHO672070 for the comparison with more background-removal methods in images:

      <img src="https://drive.google.com/thumbnail?id=1nvVIFt_Ezs-crPSQxUDqkUBz598fTe63&sz=w1620" />
    • Thanks twitter.com/toyxyz3 for the comparison with more background-removal methods in videos:

    https://github.com/ZhengPeng7/BiRefNet/assets/25921713/40136198-01cc-4106-81f9-81c985f02e31

    https://github.com/ZhengPeng7/BiRefNet/assets/25921713/1a32860c-0893-49dd-b557-c2e35a83c160

Usage

Environment Setup

# PyTorch==2.0.1 is used for faster training with compilation.
conda create -n birefnet python=3.9 -y && conda activate birefnet
pip install -r requirements.txt

Dataset Preparation

Download combined training / test sets I have organized well from: DIS--COD--HRSOD or the single official ones in the single_ones folder, or their official pages. You can also find the same ones on my BaiduDisk: DIS--COD--HRSOD.

Weights Preparation

Download backbone weights from my google-drive folder or their official pages.

Run

# Train & Test & Evaluation
./train_test.sh RUN_NAME GPU_NUMBERS_FOR_TRAINING GPU_NUMBERS_FOR_TEST
# Example: ./train_test.sh tmp-proj 0,1,2,3,4,5,6,7 0

# See train.sh / test.sh for only training / test-evaluation.
# After the evaluation, run `gen_best_ep.py` to select the best ckpt from a specific metric (you choose it from Sm, wFm, HCE (DIS only)).

Well-trained weights:

Download the BiRefNet-{TASK}-{EPOCH}.pth from [stuff]. Info of the corresponding (predicted_maps/performance/training_log) weights can be also found in folders like exp-BiRefNet-{TASK_SETTINGS} in the same directory.

You can also download the weights from the release of this repo.

The results might be a bit different from those in the original paper, you can see them in the eval_results-BiRefNet-{TASK_SETTINGS} folder in each exp-xx, we will update them in the following days. Due to the very high cost I used (A100-80G x 8) which many people cannot afford to (including myself....), I re-trained BiRefNet on a single A100-40G only and achieve the performance on the same level (even better). It means you can directly train the model on a single GPU with 36.5G+ memory. BTW, 5.5G GPU memory is needed for inference in 1024x1024. (I personally paid a lot for renting an A100-40G to re-train BiRefNet on the three tasks... T_T. Hope it can help you.)

But if you have more and more powerful GPUs, you can set GPU IDs and increase the batch size in config.py to accelerate the training. We have made all this kind of things adaptive in scripts to seamlessly switch between single-card training and multi-card training. Enjoy it :)

Some of my messages:

This project was originally built for DIS only. But after the updates one by one, I made it larger and larger with many functions embedded together. Finally, you can use it for any binary image segmentation tasks, such as DIS/COD/SOD, medical image segmentation, anomaly segmentation, etc. You can eaily open/close below things (usually in config.py):

Quantitative Results

<p align="center"><img src="https://drive.google.com/thumbnail?id=1Ymkh8WN16XMTBOS8dmPTg5eAf-NIl2m5&sz=w1620" /></p> <p align="center"><img src="https://drive.google.com/thumbnail?id=1W0mi0ZiYbqsaGuohNXU8Gh7Zj4M3neFg&sz=w1620" /></p>

Qualitative Results

<p align="center"><img src="https://drive.google.com/thumbnail?id=1TYZF8pVZc2V0V6g3ik4iAr9iKvJ8BNrf&sz=w1620" /></p> <p align="center"><img src="https://drive.google.com/thumbnail?id=1ZGHC32CAdT9cwRloPzOCKWCrVQZvUAlJ&sz=w1620" /></p>

Citation

@article{zheng2024birefnet,
  title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
  author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
  journal={CAAI Artificial Intelligence Research},
  volume = {3},
  pages = {9150038},
  year={2024}
}

Contact

Any questions, discussions, or even complaints, feel free to leave issues here or send me e-mails (zhengpeng0108@gmail.com). You can also join the Discord Group (https://discord.gg/d9NN5sgFrq) or QQ Group (https://qm.qq.com/q/y6WPy7WOIK) if you want to talk a lot publicly.