<div align="center">

A Distractor-Aware Memory (DAM) for <br> Visual Object Tracking with SAM2

Jovana Videnović, Alan Lukežič, and Matej Kristan

Faculty of Computer and Information Science, University of Ljubljana

[Preprint] [Project page] [DiDi dataset]

https://github.com/user-attachments/assets/e90158ba-5c02-489d-9401-26f77f0592b0

https://github.com/user-attachments/assets/0203a96a-c5c9-46f8-90d6-2445d2c5ad73

</div>

Abstract

Memory-based trackers such as SAM2 demonstrate remarkable performance; however, they still struggle with distractors. We propose a new plug-in distractor-aware memory (DAM) and management strategy that substantially improves tracking robustness. The new model is demonstrated on SAM2.1, leading to SAM2.1++, which sets a new state-of-the-art on six benchmarks, including the most challenging VOT/S benchmarks, without additional training. We also propose a new distractor-distilled (DiDi) dataset to better study the distractor problem. See the preprint for more details.

Installation

To set up the repository locally, follow these steps:

  1. Clone the repository and navigate to the project directory:
    git clone https://github.com/jovanavidenovic/DAM4SAM.git
    cd DAM4SAM
    
  2. Create a new conda environment and activate it:
     conda create -n dam4sam_env python=3.10.15
     conda activate dam4sam_env
    
  3. Install torch and other dependencies:
    pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
    

If you experience problems as mentioned here, including ImportError: cannot import name '_C' from 'sam2', run the following command in the repository root:

python setup.py build_ext --inplace

Note that you can still use the repository even with the warning above, but some SAM2 post-processing steps may be skipped. For more information, consult the SAM2 installation instructions.
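To quickly check whether the extension built successfully, you can try the import that fails in the error above (a minimal sanity check, not part of the official instructions):

python -c "from sam2 import _C; print('SAM2 extension built successfully')"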

Getting started

Model checkpoints can be downloaded by running:

cd checkpoints && \
./download_ckpts.sh 
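You can verify that the download completed by listing the checkpoint files from the repository root (this assumes the script places the .pt files directly in the checkpoints/ directory):

ls -lh checkpoints/*.pt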

Our model configs are available in the sam2/ folder.

Running and evaluation

This repository supports evaluation on the following datasets: DiDi, VOT2020, VOT2022, LaSOT, LaSOText, and GOT-10k. Support for running on VOTS2024 will be added soon.

A quick demo

A demo script, run_bbox_example.py, is provided to quickly run the tracker on a directory containing a sequence of frames. The script first asks the user to draw an initial bounding box, which is used to automatically estimate a segmentation mask on the initialization frame. The script is run using the following command:

CUDA_VISIBLE_DEVICES=0 python run_bbox_example.py --dir <frames-dir> --ext <frame-ext> --output_dir <output-dir>

<frames-dir> is the path to the directory containing the sequence frames, <frame-ext> is the frame extension, e.g., jpg or png (optional, default: jpg), and <output-dir> is the path to the output directory where the predicted segmentation masks for all frames will be saved. The --output_dir argument is optional; if it is not given, the script only visualizes the results.
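For example, to run the tracker on a directory of png frames and store the predicted masks (the paths below are only illustrative):

CUDA_VISIBLE_DEVICES=0 python run_bbox_example.py --dir ./data/example-sequence --ext png --output_dir ./output/example-sequence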

DiDi dataset

Run on a single sequence and visualize results:

CUDA_VISIBLE_DEVICES=0 python run_on_didi.py --dataset_path <path-to-didi> --sequence <sequence-name>

Run on the whole dataset and save results to disk:

CUDA_VISIBLE_DEVICES=0 python run_on_didi.py --dataset_path <path-to-didi> --output_dir <output-dir-path>

After obtaining the raw results on DiDi using the previous command, you can compute the performance measures. This is done using the VOT toolkit, so we provide an empty VOT workspace in the didi-workspace directory. The sequences from the DiDi dataset should be placed in the didi-workspace/sequences directory. Alternatively, you can create a symbolic link named sequences in didi-workspace that points to the DiDi dataset on your disk (an example is shown after the command below). The raw results must be placed in the results subfolder, e.g., didi-workspace/results/DAM4SAM. If the results were obtained using run_on_didi.py, move them to the workspace using the following command:

python move_didi_results.py --dataset_path <path-to-didi> --src <source-results-directory> --dst ./didi-workspace/results/DAM4SAM
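If you chose the symbolic-link option mentioned above, the link can be created as follows (the DiDi path is illustrative; point it to your local copy of the dataset):

ln -s /path/to/didi ./didi-workspace/sequences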

The <source-results-directory> is the path to the directory given as the output_dir argument of the run_on_didi.py script. The move_didi_results.py script not only moves the results, but also converts them into bounding boxes, since DiDi is a bounding-box dataset. Finally, the performance measures are computed using the following commands:

vot analysis --workspace <path-to-didi-workspace> --format=json DAM4SAM
vot report --workspace <path-to-didi-workspace> --format=html DAM4SAM

Performance measures are available in the generated report under didi-workspace/reports. Note: if running the analysis multiple times, remember to clear the cache directory.
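For example, assuming the toolkit keeps its cache in a cache directory inside the workspace (verify the location in your setup before deleting anything):

rm -rf didi-workspace/cache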

VOT2020 and VOT2022 Challenges

Create a VOT workspace (for more information, see the instructions here). For VOT2020 use:

vot initialize vot2020/shortterm --workspace <workspace-dir-path>

and for VOT2022 use:

vot initialize vot2022/shortterm --workspace <workspace-dir-path>

You can use the integration files from the vot_integration/vot2022_st folder to run only the selected experiment. We provide two stack files: one for the baseline experiment and one for the real-time experiment. After workspace creation and tracker integration, you can evaluate the tracker on VOT using the following commands:

vot evaluate --workspace <path-to-vot-workspace> DAM4SAM
vot analysis --workspace <path-to-vot-workspace> --format=json DAM4SAM
vot report --workspace <path-to-vot-workspace> --format=html DAM4SAM

Bounding box datasets

Running our tracker is supported on the LaSOT, LaSOText, and GOT-10k datasets. The tracker is initialized with masks obtained from the ground-truth initialization bounding boxes using the SAM2 image predictor. You can download these masks for all datasets at this link. Before running the tracker, set the corresponding paths to the datasets and to the directory with the ground-truth masks in dam4sam_config.yaml (in the repo root directory).

Run on the whole dataset and save results to disk (<dataset-name> can be: got | lasot | lasot_ext):

CUDA_VISIBLE_DEVICES=0 python run_on_box_dataset.py --dataset_name=<dataset-name> --output_dir=<output-dir-path>
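For example, to evaluate on GOT-10k and write the raw results to a local directory (the output path is only illustrative):

CUDA_VISIBLE_DEVICES=0 python run_on_box_dataset.py --dataset_name=got --output_dir=./results/got10k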

Run on a single sequence and visualize results:

CUDA_VISIBLE_DEVICES=0 python run_on_box_dataset.py --dataset_name=<dataset-name> --sequence=<sequence-name>

DiDi: A distractor-distilled dataset

DiDi is a distractor-distilled tracking dataset created to address the limitation of low distractor presence in current visual object tracking benchmarks. To enhance the evaluation and analysis of tracking performance amidst distractors, we have semi-automatically distilled several existing benchmarks into the DiDi dataset. The dataset is available for download at this link.

<p align="center"> <img src="imgs/didi-examples.jpg" width="80%"> </p> <div align="center"> <i>Example frames from the DiDi dataset showing challenging distractors. Targets are denoted by green bounding boxes.</i> </div>

Experimental results on DiDi

See the project page for a qualitative comparison.

| Model | Quality | Accuracy | Robustness |
| --- | --- | --- | --- |
| TransT | 0.465 | 0.669 | 0.678 |
| KeepTrack | 0.502 | 0.646 | 0.748 |
| SeqTrack | 0.529 | 0.714 | 0.718 |
| AQATrack | 0.535 | 0.693 | 0.753 |
| AOT | 0.541 | 0.622 | 0.852 |
| Cutie | 0.575 | 0.704 | 0.776 |
| ODTrack | 0.608 | 0.740 :1st_place_medal: | 0.809 |
| SAM2.1Long | 0.646 | 0.719 | 0.883 |
| SAM2.1 | 0.649 :3rd_place_medal: | 0.720 | 0.887 :3rd_place_medal: |
| SAMURAI | 0.680 :2nd_place_medal: | 0.722 :3rd_place_medal: | 0.930 :2nd_place_medal: |
| SAM2.1++ (ours) | 0.694 :1st_place_medal: | 0.727 :2nd_place_medal: | 0.944 :1st_place_medal: |

Acknowledgments

Our work is built on top of SAM 2 by Meta FAIR.