# Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection
[Project page] | [arXiv] | [Dataset Download]
This repository is the official implementation of:
**Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection**
Ke Li, Fuyu Dong, Di Wang, Shaofeng Li, Quan Wang, Xinbo Gao, Tat-Seng Chua
## Abstract
Remote sensing change detection aims to perceive changes occurring on the Earth's surface from remote sensing data in different periods, and feed these changes back to humans. However, most existing methods only focus on detecting change regions, lacking the ability to interact with users to identify changes that the users expect. In this paper, we introduce a new task named Change Detection Question Answering and Grounding (CDQAG), which extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence. To this end, we construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks. It encompasses 10 essential land-cover categories and 8 comprehensive question types, which provides a large-scale and diverse dataset for remote sensing applications. Based on this, we present VisTA, a simple yet effective baseline method that unifies the tasks of question answering and grounding by delivering both visual and textual answers. Our method achieves state-of-the-art results on both the classic CDVQA and the proposed CDQAG datasets. Extensive qualitative and quantitative experimental results provide useful insights for the development of better CDQAG models, and we hope that our work can inspire further research in this important yet underexplored direction.
<div align="center"> <img src="https://github.com/like413/VisTA/blob/main/fig/1.png?raw=true" width="100%" height="100%"/> </div><br/>

## Benchmark Dataset QAG-360K
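Each QAG-360K sample pairs a bi-temporal image pair with a question, a textual answer, and a pixel-level mask that serves as the visual evidence. The snippet below is a purely illustrative sketch of such a triplet; the field names, file paths, and values are assumptions rather than the actual annotation schema, so please refer to the dataset download for the real format.

```python
# Purely illustrative sketch of a single CDQAG triplet; the field names,
# paths, and values are hypothetical, not the actual QAG-360K schema.
example_sample = {
    "image_t1": "images/t1/000123.png",   # pre-change image
    "image_t2": "images/t2/000123.png",   # post-change image
    "question": "What has the water area changed into?",
    "answer": "bare land",                # interpretable textual answer
    "mask": "masks/000123.png",           # binary mask giving the visual evidence
    "question_type": "change_to",         # one of the 8 question types (name assumed)
    "category": "water",                  # one of the 10 land-cover categories (name assumed)
}
```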
<div align="center"> <img src="https://github.com/like413/VisTA/blob/main/fig/2.png?raw=true" width="100%" height="100%"/> </div><br/>

## Simple Baseline Model VisTA
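VisTA unifies question answering and grounding by returning both a textual answer and a segmentation mask, as illustrated in the figure below. For orientation only, the following minimal sketch shows the input/output contract such a CDQAG model exposes; the wrapper, the `model(images, tokens)` call signature, and the thresholding are assumptions for illustration, not the released VisTA API.

```python
import torch

def run_cdqag_model(model, tokenizer, img_t1, img_t2, question):
    """Hypothetical CDQAG inference wrapper (illustration only, not the VisTA API).

    Inputs : two 3xHxW image tensors (pre- and post-change) and a question string.
    Outputs: an index into an answer vocabulary and a binary HxW evidence mask.
    """
    tokens = tokenizer(question, return_tensors="pt")        # assumed HF-style tokenizer
    images = torch.stack([img_t1, img_t2]).unsqueeze(0)      # 1 x 2 x 3 x H x W
    with torch.no_grad():
        answer_logits, mask_logits = model(images, tokens)   # assumed to return both heads
    answer_id = answer_logits.argmax(dim=-1)                 # textual answer (class index)
    evidence_mask = (mask_logits.sigmoid() > 0.5).squeeze()  # visual answer (binary mask)
    return answer_id, evidence_mask
```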
<div align="center"> <img src="https://github.com/like413/VisTA/blob/main/fig/3.png?raw=true" width="100%" height="100%"/> </div><br/>

<!--
## Update

- **(2024/11/1)** The benchmark method [VisTA](https://github.com/like413/VisTA) is released.
- **(2024/11/1)** The first CDQAG dataset [QAG-360K](https://drive.google.com/drive/folders/1EiOJNr8bde7apUQwqoN6cjXWKxIz7QdI?usp=sharing) is released.

## Installation

The code is tested under CUDA 11.8, PyTorch 1.11.0, and Detectron2 0.6.

1. Install [Detectron2](https://github.com/facebookresearch/detectron2) following the [manual](https://detectron2.readthedocs.io/en/latest/)
2. Run `sh make.sh` under `gres_model/modeling/pixel_decoder/ops`
3. Install other required packages: `pip install -r requirements.txt`
4. Prepare the dataset following `datasets/DATASET.md`

## Inference

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto --eval-only \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```

## Training

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```

Note: You can append your own configuration options to the training command for customization. For example:

```
SOLVER.IMS_PER_BATCH 48
SOLVER.BASE_LR 0.00001
```

For the full list of base configs, see `configs/referring_R50.yaml` and `configs/Base-COCO-InstanceSegmentation.yaml`.
-->

## Results
### QAG-360K
<div align="center"> <img src="https://github.com/like413/VisTA/blob/main/fig/table1.png?raw=true" width="100%" height="100%"/> </div><br/>

### CDVQA
<div align="center"> <img src="https://github.com/like413/VisTA/blob/main/fig/table2.png?raw=true" width="100%" height="100%"/> </div><br/>

## Acknowledgement
The dataset is based on HiUCD, SECOND, LEVIR-CD, and CDVQA. The code is based on CRIS. We thank the authors for open-sourcing their datasets and code, and encourage users to cite their work when applicable.
## Citation
If you use our data or code in your research, or find it helpful, please cite this project:
```bibtex
@article{li2024show,
  title={Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection},
  author={Li, Ke and Dong, Fuyu and Wang, Di and Li, Shaofeng and Wang, Quan and Gao, Xinbo and Chua, Tat-Seng},
  journal={arXiv preprint arXiv:2410.23828},
  year={2024}
}
```
## License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license for non-commercial use only. Any commercial use requires prior formal permission.