

<p align="center"> <a href="README.md"><img src="https://img.shields.io/badge/English-white" alt='English'></a> <a href="README.zh-CN-simplified.md"><img src="https://img.shields.io/badge/%E4%B8%AD%E6%96%87-white" alt='Chinese'></a> </p> <h2 align="center">GIM: Learning Generalizable Image Matcher From Internet Videos</h2> <div align="center"> <a href="https://www.youtube.com/embed/FU_MJLD8LeY"> <img src="assets/demo/video.png" width="50%" alt="Overview Video"> </a> </div> <p></p> <div align="center"> <!-- <a href="https://iclr.cc/Conferences/2024"><img src="https://img.shields.io/badge/%F0%9F%8C%9F_ICLR'2024_Spotlight-37414c" alt='ICLR 2024 Spotlight'></a> -->

<a href="https://xuelunshen.com/gim"><img src="https://img.shields.io/badge/Project_Page-3A464E?logo=gumtree" alt='Project Page'></a> <a href="https://arxiv.org/abs/2402.11095"><img src="https://img.shields.io/badge/arXiv-2402.11095-b31b1b?logo=arxiv" alt='arxiv'></a> <a href="https://huggingface.co/spaces/xuelunshen/gim-online"><img src="https://img.shields.io/badge/%F0%9F%A4%97_Hugging_Face-Space-F0CD4B?labelColor=666EEE" alt='HuggingFace Space'></a> <a href="https://www.youtube.com/watch?v=FU_MJLD8LeY"><img src="https://img.shields.io/badge/Video-E33122?logo=Youtube" alt='Overview Video'></a> <a href="https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Intel-Labs-Research-Work-Receives-Spotlight-Award-at-Top-AI/post/1575985"><img src="https://img.shields.io/badge/Blog-0071C5?logo=googledocs&logoColor=white" alt='Blog'></a> <a href="https://zhuanlan.zhihu.com/p/711361901"><img src="https://img.shields.io/badge/Zhihu-1767F5?logo=zhihu&logoColor=white" alt='Blog'></a>


<a href="https://en.xmu.edu.cn"><img src="https://img.shields.io/badge/XMU-183F9D?logo=Google%20Scholar&logoColor=white" alt='XMU'></a> <a href="https://www.intel.com"><img src="https://img.shields.io/badge/Labs-0071C5?logo=intel" alt='Intel'></a> <a href="https://www.dji.com"><img src="https://img.shields.io/badge/DJI-131313?logo=DJI" alt='DJI'></a>

</div>

✅ TODO List

We are actively continuing with the remaining open-source work and appreciate everyone's attention.

🤗 Online demo

Go to [Hugging Face](https://huggingface.co/spaces/xuelunshen/gim-online) to quickly try our model online.

⚙️ Environment

I set up the running environment on a new machine using the commands listed below.

<p></p> <details> <summary><b>[ Click to show commands ]</b></summary>

```bash
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install albumentations==1.0.1 --no-binary=imgaug,albumentations
pip install pytorch-lightning==1.5.10
pip install opencv-python==4.5.3.56
pip install imagesize==1.2.0
pip install kornia==0.6.10
pip install einops==0.3.0
pip install loguru==0.5.3
pip install joblib==1.0.1
pip install yacs==0.1.8
pip install h5py==3.1.0
```

</details> <p></p>
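
To confirm that an environment matches the pinned versions above, you can compare the installed distribution versions against the pins. This is a minimal sketch that is not part of the repository; it assumes Python 3.8+ (for `importlib.metadata`) and that the conda PyTorch packages register their metadata under the names `torch`, `torchvision` and `torchaudio`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Tuple

# Pins copied from the install commands above (distribution names, not import names).
PINS: Dict[str, str] = {
    "torch": "1.10.1",
    "torchvision": "0.11.2",
    "torchaudio": "0.10.1",
    "albumentations": "1.0.1",
    "pytorch-lightning": "1.5.10",
    "opencv-python": "4.5.3.56",
    "imagesize": "1.2.0",
    "kornia": "0.6.10",
    "einops": "0.3.0",
    "loguru": "0.5.3",
    "joblib": "1.0.1",
    "yacs": "0.1.8",
    "h5py": "3.1.0",
}


def check_pins(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> List[Tuple[str, str, str]]:
    """Return (name, expected, found) for every package that is missing or differs."""
    problems = []
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = "not installed"
        # Some builds append a local tag such as "+cu113"; compare only the base version.
        if found.split("+")[0] != expected:
            problems.append((name, expected, found))
    return problems


if __name__ == "__main__":
    for name, expected, found in check_pins(PINS):
        print(f"{name}: expected {expected}, found {found}")
```

An empty output means every pinned package was found at the expected version.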

🔨 Usage

1. Clone the repository

   ```bash
   git clone https://github.com/xuelunshen/gim.git
   cd gim
   ```

2. Download the `gim_dkm` model weights from Google Drive or OneDrive.
3. Put the weights in the `weights` folder.
4. Run one of the following commands:

<p></p> <details> <summary><b>[ Click to show commands ]</b></summary>

```bash
python demo.py --model gim_dkm
```

or

```bash
python demo.py --model gim_loftr
```

or

```bash
python demo.py --model gim_lightglue
```

</details> <p></p>
5. The code matches `a1.png` and `a2.png` in the `assets/demo` folder and outputs `a1_a2_match.png` and `a1_a2_warp.png`.
<details> <summary> <b> [ Click to show <code>a1.png</code> and <code>a2.png</code> ] </b> </summary> <p float="left"> <img src="assets/demo/a1.png" width="25%" /> <img src="assets/demo/a2.png" width="25%" /> </p> </details> <details> <summary> <b> [ Click to show <code>a1_a2_match.png</code> ] </b> </summary> <p align="left"> <img src="assets/demo/_a1_a2_match.png" width="50%"> </p> <p><code>a1_a2_match.png</code> visualizes the matches between the two images.</p> </details> <details> <summary> <b> [ Click to show <code>a1_a2_warp.png</code> ] </b> </summary> <p align="left"> <img src="assets/demo/_a1_a2_warp.png" width="50%"> </p> <p><code>a1_a2_warp.png</code> shows the effect of projecting <code>image a2</code> onto <code>image a1</code> using a homography.</p> </details> <p></p> There are more images in the <code>assets/demo</code> folder; you can try them out too. <p></p> <details> <summary> <b> [ Click to show other images ] </b> </summary> <p float="left"> <img src="assets/demo/b1.png" width="15%" /> <img src="assets/demo/b2.png" width="15%" /> <img src="assets/demo/c1.png" width="15%" /> <img src="assets/demo/c2.png" width="15%" /> <img src="assets/demo/d1.png" width="15%" /> <img src="assets/demo/d2.png" width="15%" /> </p> </details>
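
The `a1_a2_warp.png` output projects image a2 onto image a1 with a homography, but this README does not include the warping code itself. The snippet below is a minimal, dependency-free sketch of what that projection does to pixel coordinates: each point is lifted to homogeneous coordinates, multiplied by a 3×3 matrix `H`, and divided by the third component. It assumes `H` maps a2 coordinates into a1, and the matrix values are hypothetical. In practice, `H` would be estimated from the matched points, for example with OpenCV's `cv2.findHomography` using RANSAC.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Map (x, y) points from image a2 into image a1, assuming H maps a2 -> a1."""
    mapped = []
    for x, y in points:
        # Treat (x, y) as the homogeneous vector (x, y, 1) and multiply by H.
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError(f"point {(x, y)} maps to infinity")
        # Divide by w to return to pixel coordinates.
        mapped.append((u / w, v / w))
    return mapped


# Hypothetical H: scale by 2, then shift by (10, 5).
H = [[2.0, 0.0, 10.0],
     [0.0, 2.0, 5.0],
     [0.0, 0.0, 1.0]]
print(apply_homography(H, [(0.0, 0.0), (3.0, 4.0)]))  # [(10.0, 5.0), (16.0, 13.0)]
```

Warping a whole image usually works in the opposite direction: for each output pixel, the source image is sampled at the inverse-mapped location. This is what `cv2.warpPerspective` does.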

🕋 3D Reconstruction

The 3D reconstruction code in this repository is built on hloc.

First, install colmap and pycolmap according to hloc's README.

Then, download the semantic-segmentation model parameters from Google Drive or OneDrive and put them in the `weights` folder.

Next, create the input folders. For example, to reconstruct a room in 3D, run the following command:

```bash
mkdir -p inputs/room/images
```

Then, put the images of the room to be reconstructed into the `images` folder.

Finally, run the following command to perform a 3D reconstruction:

```bash
sh reconstruction.sh room
```
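
The folder steps above can also be scripted. The helper below is hypothetical and not part of the repository. It creates `inputs/<scene>/images` (the same layout as the `mkdir -p` command above) and copies the photos into it. The accepted extensions in `IMAGE_SUFFIXES` are an assumption; adjust them to whatever your copy of the reconstruction code reads.

```python
import shutil
from pathlib import Path
from typing import List

# Assumed set of image extensions; not taken from the repository.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def prepare_scene(scene: str, source_dir: Path, root: Path = Path("inputs")) -> List[Path]:
    """Copy images from source_dir into <root>/<scene>/images and return the copied paths."""
    images_dir = root / scene / "images"
    images_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    copied = []
    for src in sorted(source_dir.iterdir()):
        # Skip sub-folders and non-image files; the extension check is case-insensitive.
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            dst = images_dir / src.name
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied


# Hypothetical usage; afterwards run `sh reconstruction.sh room` from the repository root.
# prepare_scene("room", Path("/path/to/room_photos"))
```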

Tips:
By default, the 3D reconstruction code pairs every image with every other image (exhaustive pairing) and then performs image matching and reconstruction.
For better reconstruction results, we recommend modifying the code to suit your scene and adjusting which images are paired.
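
Exhaustive pairing produces N(N-1)/2 pairs for N images (100 images give 4,950 pairs), so matching time grows quadratically. For ordered captures such as video frames, one common alternative is to pair each image only with its next few neighbours. The sketch below illustrates both strategies. The function names and the `window` parameter are hypothetical. The output follows hloc's pairs-file convention of one space-separated `name0 name1` pair per line, but this README does not show where the reconstruction code builds its pairs, so check that before plugging this in.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[str, str]


def exhaustive_pairs(names: List[str]) -> List[Pair]:
    """Every unordered pair of images: N * (N - 1) / 2 pairs."""
    return list(combinations(names, 2))


def sequential_pairs(names: List[str], window: int = 5) -> List[Pair]:
    """Pair each image with the next `window` images in sorted order (for ordered captures)."""
    ordered = sorted(names)
    pairs = []
    for i, name in enumerate(ordered):
        for other in ordered[i + 1 : i + 1 + window]:
            pairs.append((name, other))
    return pairs


def write_pairs(pairs: List[Pair], path: str) -> None:
    """Write one 'name0 name1' pair per line."""
    with open(path, "w") as f:
        for a, b in pairs:
            f.write(f"{a} {b}\n")


names = [f"frame_{i:03d}.jpg" for i in range(100)]
print(len(exhaustive_pairs(names)))  # 4950
print(len(sequential_pairs(names, window=5)))  # 485
```

A sliding window scales linearly with the number of images but cannot close loops when the camera revisits a place, so a wider window or extra retrieval-based pairs may still be needed.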

📊 ZEB: Zero-shot Evaluation Benchmark

1. Create a folder named `zeb`.
2. Download the zip archives containing the ZEB data from the URL, put them into the `zeb` folder, and unzip them.
3. Run one of the following commands:
<p></p> <details> <summary><b>[ Click to show commands ]</b></summary>

The `1` at the end of each command below is the number of GPUs to use. To use 2 GPUs, change `1` to `2`.

```bash
sh TEST_GIM_DKM.sh 1
```

or

```bash
sh TEST_GIM_LOFTR.sh 1
```

or

```bash
sh TEST_GIM_LIGHTGLUE.sh 1
```

or

```bash
sh TEST_ROOT_SIFT.sh 1
```

</details> <p></p>
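
To run several benchmark scripts back to back with the same GPU count, a small wrapper is enough. This is a hypothetical helper, not part of the repository. It assumes only what the commands above show: the four scripts are in the current directory and take the GPU count as their single argument.

```python
import subprocess
from typing import List, Sequence

SCRIPTS = ["TEST_GIM_DKM.sh", "TEST_GIM_LOFTR.sh", "TEST_GIM_LIGHTGLUE.sh", "TEST_ROOT_SIFT.sh"]


def build_commands(num_gpus: int, scripts: Sequence[str] = SCRIPTS) -> List[List[str]]:
    """Return one `sh <script> <num_gpus>` command per script."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [["sh", script, str(num_gpus)] for script in scripts]


def run_all(num_gpus: int, scripts: Sequence[str] = SCRIPTS, dry_run: bool = False) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(num_gpus, scripts):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with a non-zero status.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(num_gpus=2, dry_run=True)  # only prints the four commands
```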
4. Run `python check.py` and confirm that every check outputs "Good".
5. Run `python analysis.py --dir dump/zeb --wid gim_dkm --version 100h --verbose` to get the results.
6. Paste the ZEB results into the Excel file `zeb.xlsx`.
<p></p> <details> <summary><b><font color="red">[ Click to show 📊 ZEB Result ]</font></b></summary>

The data in this table comes from ZEB, the <u>Zero-shot Evaluation Benchmark for Image Matching</u> proposed in the paper. The benchmark consists of 12 public datasets covering a variety of scenes, weather conditions, and camera models; they correspond to the 12 columns from GL3 to GTA in the table.

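| Method | Mean<br />AUC@5°<br />(%) ↑ | GL3 | BLE | ETI | ETO | KIT | WEA | SEA | NIG | MUL | SCE | ICL | GTA |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Handcrafted** | | | | | | | | | | | | | |
| RootSIFT | 31.8 | 43.5 | 33.6 | 49.9 | 48.7 | 35.2 | 21.4 | 44.1 | 14.7 | 33.4 | 7.6 | 14.8 | 35.1 |
| **Sparse Matching** | | | | | | | | | | | | | |
| SuperGlue (in) | 21.6 | 19.2 | 16.0 | 38.2 | 37.7 | 22.0 | 20.8 | 40.8 | 13.7 | 21.4 | 0.8 | 9.6 | 18.8 |
| SuperGlue (out) | 31.2 | 29.7 | 24.2 | 52.3 | 59.3 | 28.0 | 28.4 | 48.0 | 20.9 | 33.4 | 4.5 | 16.6 | 29.3 |
| GIM_SuperGlue<br />(50h) | 34.3 | 43.2 | 34.2 | 58.7 | 61.0 | 29.0 | 28.3 | 48.4 | 18.8 | 34.8 | 2.8 | 15.4 | 36.5 |
| LightGlue | 31.7 | 28.9 | 23.9 | 51.6 | 56.3 | 32.1 | 29.5 | 48.9 | 22.2 | 37.4 | 3.0 | 16.2 | 30.4 |
| ✅GIM_LightGlue<br />(100h) | 38.3 | 46.6 | 38.1 | 61.7 | 62.9 | 34.9 | 31.2 | 50.6 | 22.6 | 41.8 | 6.9 | 19.0 | 43.4 |
| **Semi-dense Matching** | | | | | | | | | | | | | |
| LoFTR (in) | 10.7 | 5.6 | 5.1 | 11.8 | 7.5 | 17.2 | 6.4 | 9.7 | 3.5 | 22.4 | 1.3 | 14.9 | 23.4 |
| LoFTR (out) | 33.1 | 29.3 | 22.5 | 51.1 | 60.1 | 36.1 | 29.7 | 48.6 | 19.4 | 37.0 | 13.1 | 20.5 | 30.3 |
| ✅GIM_LoFTR<br />(50h) | 39.1 | 50.6 | 43.9 | 62.6 | 61.6 | 35.9 | 26.8 | 47.5 | 17.6 | 41.4 | 10.2 | 25.6 | 45.0 |
| GIM_LoFTR<br />(100h) | TODO | | | | | | | | | | | | |
| **Dense Matching** | | | | | | | | | | | | | |
| DKM (in) | 46.2 | 44.4 | 37.0 | 65.7 | 73.3 | 40.2 | 32.8 | 51.0 | 23.1 | 54.7 | 33.0 | 43.6 | 55.7 |
| DKM (out) | 45.8 | 45.7 | 37.0 | 66.8 | 75.8 | 41.7 | 33.5 | 51.4 | 22.9 | 56.3 | 27.3 | 37.8 | 52.9 |
| GIM_DKM<br />(50h) | 49.4 | 58.3 | 47.8 | 72.7 | 74.5 | 42.1 | 34.6 | 52.0 | 25.1 | 53.7 | 32.3 | 38.8 | 60.6 |
| ✅GIM_DKM<br />(100h) | 51.2 | 63.3 | 53.0 | 73.9 | 76.7 | 43.4 | 34.6 | 52.5 | 24.5 | 56.6 | 32.2 | 42.5 | 61.6 |
| RoMa (in) | 46.7 | 46.0 | 39.3 | 68.8 | 77.2 | 36.5 | 31.1 | 50.4 | 20.8 | 57.8 | 33.8 | 41.7 | 57.6 |
| RoMa (out) | 48.8 | 48.3 | 40.6 | 73.6 | 79.8 | 39.9 | 34.4 | 51.4 | 24.2 | 59.9 | 33.7 | 41.3 | 59.2 |
| GIM_RoMa | TODO | | | | | | | | | | | | |
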
</details> <p></p>
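
The Mean column is the plain average of the 12 per-dataset scores. For example, the 12 values for ✅GIM_DKM (100h) sum to 614.8, and 614.8 / 12 ≈ 51.2. The per-dataset AUC@5° computation lives in `analysis.py`, which this README does not show. As an assumption, the sketch below follows the pose-AUC definition used in the SuperGlue and LoFTR evaluation code. Each image pair contributes one pose error in degrees (in those codebases, the larger of the rotation and translation angular errors), and the score is the normalized area under the recall curve up to the threshold.

```python
from bisect import bisect_left
from typing import Sequence


def pose_auc(errors: Sequence[float], threshold: float) -> float:
    """Area under the recall-vs-error curve on [0, threshold], normalized to [0, 1].

    SuperGlue/LoFTR-style definition (an assumption about analysis.py):
    `errors` holds one pose error in degrees per pair; failed pairs can be passed as inf.
    """
    if threshold <= 0 or not errors:
        raise ValueError("need a positive threshold and at least one error")
    n = len(errors)
    errs = [0.0] + sorted(errors)
    recall = [k / n for k in range(n + 1)]  # recall k/n is reached at the k-th smallest error
    last = bisect_left(errs, threshold)  # curve points strictly below the threshold
    xs = errs[:last] + [threshold]
    ys = recall[:last] + [recall[last - 1]]
    # Trapezoidal rule over the piecewise-linear curve, normalized by the threshold.
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))
    return area / threshold


def mean_auc(per_dataset: Sequence[float]) -> float:
    """Plain average over the per-dataset AUC values (the Mean column)."""
    return sum(per_dataset) / len(per_dataset)


print(pose_auc([1.0, 2.0, 3.0, 4.0], threshold=5.0))  # 0.6
gim_dkm_100h = [63.3, 53.0, 73.9, 76.7, 43.4, 34.6, 52.5, 24.5, 56.6, 32.2, 42.5, 61.6]
print(round(mean_auc(gim_dkm_100h), 1))  # 51.2
```

`pose_auc` returns a fraction; multiply it by 100 to get the percentages used in the table.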

🖼️ Poster

<div align="center"> <a href="https://www.youtube.com/embed/FU_MJLD8LeY"> <img src="assets/demo/poster.png" width="50%" alt="Poster"> </a> </div>

📌 Citation

If the GIM paper and code help your research, we kindly ask you to cite our paper ❤️. If you appreciate our work and find this repository useful, giving it a star ⭐️ would be a wonderful way to support us. Thank you very much.

```bibtex
@inproceedings{xuelun2024gim,
  title={GIM: Learning Generalizable Image Matcher From Internet Videos},
  author={Xuelun Shen and Zhipeng Cai and Wei Yin and Matthias Müller and Zijun Li and Kaixuan Wang and Xiaozhi Chen and Cheng Wang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}
```

🌟 Star History

<a href="https://star-history.com/#xuelunshen/gim&Date"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=xuelunshen/gim&type=Date&theme=dark" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=xuelunshen/gim&type=Date" /> <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=xuelunshen/gim&type=Date" /> </picture> </a>

License

This repository is under the MIT License. This content/model is provided here for research purposes only. Any use beyond this is your sole responsibility and subject to your securing the necessary rights for your purpose.