Det-SAM2-pipeline

Our tech report is available at https://arxiv.org/abs/2411.18977.

Det-SAM2 is a pipeline built on the Segment Anything Model 2 (SAM2) that uses a YOLOv8 detection model to automatically generate prompts for SAM2. It then processes SAM2's segmentation results with scenario-specific business logic, achieving fully automated object tracking in videos without human intervention. This implementation is tailored to the billiard-table scenario.
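Conceptually, the detection-prompted workflow looks roughly like the sketch below. This is a minimal illustration using the public ultralytics and SAM2 video-predictor APIs, not the actual Det-SAM2 implementation (which adds prompts on a streaming basis and manages the memory bank); the weight, config, and frame paths are examples only.

# Minimal sketch of detection-prompted SAM2 (illustrative only; paths/configs are examples).
from ultralytics import YOLO
from sam2.build_sam import build_sam2_video_predictor

detector = YOLO("det_weights/train_referee12_960.pt")  # YOLOv8 prompt generator
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)
state = predictor.init_state(video_path="path/to/frames")  # folder of JPEG frames

# Use detection boxes on the first frame as box prompts for SAM2.
boxes = detector("path/to/frames/00000.jpg")[0].boxes.xyxy.tolist()
for obj_id, box in enumerate(boxes):
    predictor.add_new_points_or_box(state, frame_idx=0, obj_id=obj_id, box=box)

# Propagate the prompted objects through the rest of the video.
video_segments = {}
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    video_segments[frame_idx] = {
        obj_id: (mask_logits[i] > 0.0).cpu().numpy()
        for i, obj_id in enumerate(obj_ids)
    }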

For the SAM2-compatible predictor, our core contributions include:

If you are optimizing SAM2 for engineering use, we highly encourage you to refer to the Det-SAM2 implementation. On top of Det-SAM2, we also built a complete pipeline for the billiard-table scenario that handles business logic such as shot recognition, ball-collision detection, and boundary (cushion) rebound detection. Traditional non-SAM2 tracking algorithms previously struggled to handle these three cases accurately in fast-moving billiard scenes.

Note: the comments in our open-source scripts are written entirely in Chinese for development convenience; this does not affect functionality. If needed, you can use tools such as ChatGPT to translate the comments into your preferred language when studying the scripts.

Installation


Our project is built on SAM2.1. The environment dependencies are almost identical, so you can deploy it by following the SAM2.1 installation instructions: https://github.com/facebookresearch/sam2?tab=readme-ov-file#installation.

In addition, a few extra packages may need to be installed separately. Please install them as indicated by the error messages (these are common packages, and there should only be a few).

Alternatively, you can directly use the image we have published on AutoDL: https://www.codewithgpu.com/i/motern88/Det-SAM2/Det-SAM2.

Most of the executable files in our project are located in the det_sam2_inference folder under the root directory of segment-anything-2.

segment-anything-2/det_sam2_inference:
├──data
│   ├──Det-SAM2-Evaluation
│   │   ├──videos
│   │   └──postprocess.jsonl  # annotations
│   └──preload_memory_10frames  # frames used to build the preloaded memory bank
├──det_weights
│   └──train_referee12_960.pt  # YOLOv8n, our example weights for the billiards scenario
├──eval_output/eval_result
│   ├──eval_results.json
│   └──result_visualize.py  # visualize eval_results.json (output of eval_det-sam2.py)
├──output_inference_state
│   └──inference_state.pkl  # generated preloaded memory bank
├──pipeline_output
├──temp_output
│   ├──det_sam2_RT_output  # visualized output of det_sam2_RT.py
│   ├──prompt_results  # visualized SAM2 prompts (generated by the detection model)
│   └──video_frames
├──Det_SAM2_pipeline.py  # Det-SAM2 + post-processing pipeline
├──det_sam2_RT.py  # Det-SAM2 main processing functions
├──eval_det-sam2.py  # search for the optimal parameter combination
├──frames2video.py
└──postprocess_det_sam2.py  # post-processing example (billiards scenario)

Additionally, Det-SAM2 modifies the following SAM2 source files (our changes only add new features and do not remove any of the official SAM2.1 functionality):

segment-anything-2/sam2:
├──modeling
│   └──sam2_base.py
├──utils
│   └──misc.py
└──sam2_video_predictor.py

Checkpoints

We use the SAM2.1 weights:

For the detection model, you can use any YOLOv8 weights of your choice, or start with our example detection model trained specifically for the billiard scenario:

Getting Started


The scripts below demonstrate post-processing judgments in the billiard scenario by default. If you only need the Det-SAM2 framework itself, simply run det_sam2_RT.py.

1. Execute the Det-SAM2 segmentation-mask prediction and post-processing scripts separately:

Use the detection model to automatically provide prompts for SAM2, which then performs segmentation predictions on the video:

python det_sam2_inference/det_sam2_RT.py

The det_sam2_RT.py script defines the VideoProcessor class (parameter settings are explained below), with its primary method being VideoProcessor.run().
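A minimal usage sketch is shown below; the constructor arguments are intentionally left as placeholder comments because their exact names are documented in the Parameter Parsing section:

# Hypothetical usage sketch of det_sam2_RT.py; constructor arguments are placeholders.
from det_sam2_RT import VideoProcessor

processor = VideoProcessor(
    # e.g. SAM2 checkpoint/config, YOLOv8 weights, output directories, ...
    # (see "Parameter Parsing" below for the actual constructor parameters)
)
processor.run()  # detection-prompted SAM2 segmentation over the video / stream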

After running det_sam2_RT.py and saving the segmentation result dictionary self.video_segments, use the post-processing script to run business-logic analysis on the segmentation masks:

python det_sam2_inference/postprocess_det_sam2.py

The postprocess_det_sam2.py script defines the VideoPostProcessor class (parameter settings are explained below), with its primary method being VideoPostProcessor.run().
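Correspondingly, a minimal sketch of the post-processing step (again, the arguments are placeholders; see the Parameter Parsing section for the real ones):

# Hypothetical usage sketch of postprocess_det_sam2.py; arguments are placeholders.
from postprocess_det_sam2 import VideoPostProcessor

post_processor = VideoPostProcessor(
    # e.g. path to the saved segmentation results (self.video_segments), ball class IDs, ...
)
post_processor.run()  # shot / collision / boundary-rebound judgments on the masks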

2. Run the end-to-end pipeline script (it keeps GPU and memory usage constant), which performs segmentation-mask inference and post-processing judgments asynchronously and in parallel.

Execute the full pipeline script to infer long videos in one go:

python det_sam2_inference/Det_SAM2_pipeline.py

The Det_SAM2_pipeline.py script defines the DetSAM2Pipeline class (parameter settings explained below). This class combines the segmentation backbone (VideoProcessor) and the post-processing (VideoPostProcessor) functionality and runs them asynchronously and in parallel on a real-time video stream. Its primary method is DetSAM2Pipeline.inference().
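The asynchronous structure can be pictured as a simple producer–consumer pair, as in the conceptual sketch below (the worker functions are placeholders, not the actual implementation):

# Conceptual producer-consumer sketch of the pipeline (placeholder functions, not the real code).
import queue
import threading

mask_queue = queue.Queue(maxsize=8)  # bounded queue keeps memory usage roughly constant

def run_det_sam2_on_stream():
    """Placeholder for the segmentation backbone (VideoProcessor role)."""
    for frame_idx in range(3):
        yield frame_idx, {}  # {obj_id: mask} for each frame

def judge_shots_collisions_rebounds(frame_idx, masks):
    """Placeholder for the billiard business logic (VideoPostProcessor role)."""
    pass

def segmentation_worker():
    for frame_idx, masks in run_det_sam2_on_stream():
        mask_queue.put((frame_idx, masks))
    mask_queue.put(None)  # end-of-stream sentinel

def postprocess_worker():
    while (item := mask_queue.get()) is not None:
        judge_shots_collisions_rebounds(*item)

threads = [threading.Thread(target=segmentation_worker),
           threading.Thread(target=postprocess_worker)]
for t in threads:
    t.start()
for t in threads:
    t.join()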

3. Run the automated evaluation script to explore various parameter combinations:

python det_sam2_inference/eval_det-sam2.py

The eval_det-sam2.py script defines the EvalDetSAM2PostProcess class (parameter settings explained below). This class loops through multiple candidate parameter combinations to infer the entire evaluation dataset. For each sample inference, the segmentation backbone (VideoProcessor.run()) and post-processing (VideoPostProcessor.run()) are executed sequentially. The evaluation results are collected and written to eval_results.json. The primary method for this script is EvalDetSAM2PostProcess.eval_all_settings().
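Conceptually, the evaluation is a parameter-grid search like the sketch below (parameter names, candidate values, and the result schema are illustrative, not the ones actually used by eval_det-sam2.py):

# Conceptual sketch of the parameter-grid evaluation loop (names and values are illustrative).
import itertools
import json

def evaluate_one_setting(settings):
    """Placeholder: run VideoProcessor.run() + VideoPostProcessor.run() on every eval sample
    and score the post-processing judgments against data/Det-SAM2-Evaluation/postprocess.jsonl."""
    return 0.0

param_grid = {
    "detect_confidence": [0.5, 0.7, 0.9],  # hypothetical candidate values
    "frame_buffer_size": [15, 30],
}

results = []
for values in itertools.product(*param_grid.values()):
    settings = dict(zip(param_grid.keys(), values))
    results.append({"settings": settings, "score": evaluate_one_setting(settings)})

with open("eval_output/eval_result/eval_results.json", "w") as f:
    json.dump(results, f, indent=2)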

Once eval_results.json has been generated, you can visualize the evaluation results for the different parameter settings:

python det_sam2_inference/eval_output/eval_result/result_visualize.py

Parameter Parsing


Below is an explanation of the parameters for the key classes and their main functions in the scripts det_sam2_RT.py, postprocess_det_sam2.py, Det_SAM2_pipeline.py, and eval_det-sam2.py.