
Text2Seg v0.1

Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Lan Mu, Mengxuan Hu, Sheng Li

Text2Seg design

Text2Seg is a pipeline that combines multiple Vision Foundation Models to perform semantic segmentation.

:fire: UPDATE:

2023/06/07: Update the codebase to solve some known problems with GroundingDINO.

Installation

Create a new conda environment

conda create --name text2seg python==3.8
conda activate text2seg
pip install chardet ftfy regex tqdm
mkdir Pretrained

Install a PyTorch version that fits your driver (tested on pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3).
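To see how an existing environment compares with the tested versions listed above, a small check like the following may help. It is only a sketch: the helper names `installed_version` and `compare_with_tested` are hypothetical and not part of Text2Seg, and it assumes the packages expose standard metadata to `importlib.metadata` (available from Python 3.8).

```python
from importlib import metadata

# Versions this README reports as tested.
TESTED_VERSIONS = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "torchaudio": "0.11.0",
}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_with_tested(tested=TESTED_VERSIONS):
    """Map each package name to (installed, tested, matches)."""
    report = {}
    for name, wanted in tested.items():
        found = installed_version(name)
        # Some builds carry a local suffix such as "+cu113"; ignore it here.
        base = found.split("+")[0] if found else None
        report[name] = (found, wanted, base == wanted)
    return report


if __name__ == "__main__":
    for name, (found, wanted, ok) in compare_with_tested().items():
        status = "matches" if ok else "differs"
        print(f"{name}: installed={found}, tested={wanted} ({status})")
```

A mismatch does not necessarily mean the pipeline will fail. It only indicates that the environment differs from the one the authors tested.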
Install Segment Anything and download weights:

pip install git+https://github.com/facebookresearch/segment-anything.git
cd Pretrained
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
cd ../

Install Grounding DINO

git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip3 install -q -e .
cd ..
cd Pretrained
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ../

Download CLIP Surgery repository

git clone https://github.com/xmed-lab/CLIP_Surgery.git

Install CLIP repository

pip install git+https://github.com/openai/CLIP.git
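Once the steps above are done, it can be useful to confirm that the downloaded weights and cloned repositories are where the commands put them. The sketch below assumes it is run from the repository root. The path list is inferred from the commands in this README, and `find_missing` is a hypothetical helper, not part of Text2Seg.

```python
from pathlib import Path

# Paths the installation commands are expected to create, relative to the
# repository root (inferred from the steps in this README).
EXPECTED_PATHS = [
    "Pretrained/sam_vit_h_4b8939.pth",
    "Pretrained/groundingdino_swint_ogc.pth",
    "GroundingDINO",
    "CLIP_Surgery",
]


def find_missing(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are present.")
```

The check only looks for the presence of each path. It does not verify that a checkpoint downloaded completely.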

<a name="GettingStarted"></a>Getting Started

You can test Text2Seg with the demo.ipynb notebook.

Citing Text2Seg

If you find Text2Seg useful, please use the following BibTeX entry.

@article{zhang2023text2seg,
  title={Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models},
  author={Zhang, Jielu and Zhou, Zhongliang and Mai, Gengchen and Mu, Lan and Hu, Mengxuan and Li, Sheng},
  journal={arXiv preprint arXiv:2304.10597},
  year={2023}
}