Eliminating Warping Shakes for Unsupervised Online Video Stitching
🚩 Recommendation
We have released the complete code of StabStitch++ (an extension of StabStitch) with better alignment, fewer distortions, and higher stability.
It contains the code for training, inference, and multi-video stitching.
Introduction
This is the official implementation for StabStitch (ECCV2024).
Lang Nie<sup>1</sup>, Chunyu Lin<sup>1</sup>, Kang Liao<sup>2</sup>, Yun Zhang<sup>3</sup>, Shuaicheng Liu<sup>4</sup>, Rui Ai<sup>5</sup>, Yao Zhao<sup>1</sup>
<sup>1</sup> Beijing Jiaotong University {nielang, cylin, yzhao}@bjtu.edu.cn
<sup>2</sup> Nanyang Technological University
<sup>3</sup> Communication University of Zhejiang
<sup>4</sup> University of Electronic Science and Technology of China
<sup>5</sup> HAMO.AI
Feature
Nowadays, videos captured by hand-held cameras are typically stable thanks to the advancement and widespread adoption of video stabilization in both hardware and software. Under these circumstances, we retarget video stitching to an emerging issue, warping shake, which describes the undesired content instability in non-overlapping regions, especially when image stitching technology is applied directly to videos. To address it, we propose StabStitch, the first unsupervised online video stitching framework, which generates stitching trajectories and then smooths them. The figure above shows the occurrence and elimination of warping shakes.
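To picture the core idea, the toy sketch below (not the StabStitch model) shows how a warp trajectory, assumed here to be stored as per-frame control-point offsets, can be temporally smoothed with a Gaussian-weighted average to suppress high-frequency shakes:

```python
# Toy illustration of trajectory smoothing (not the StabStitch pipeline).
# traj holds the 2D offsets of N warp control points for each of T frames.
import numpy as np

def smooth_trajectory(traj: np.ndarray, radius: int = 5, sigma: float = 2.0) -> np.ndarray:
    """traj: (T, N, 2) per-frame control-point offsets; returns the smoothed trajectory."""
    T = traj.shape[0]
    kernel = np.exp(-0.5 * (np.arange(-radius, radius + 1) / sigma) ** 2)
    smoothed = np.empty_like(traj, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        w = kernel[lo - (t - radius): hi - (t - radius)]
        w = w / w.sum()  # renormalize at sequence boundaries
        smoothed[t] = np.tensordot(w, traj[lo:hi], axes=(0, 0))
    return smoothed
```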
Video
Here, we provide a video (released on YouTube) to show the stitched results from StabStitch and other solutions.
📝 Changelog
- 2024.03.11: The arXiv version of the paper is online.
- 2024.07.11: We have replaced the original arXiv version with the final camera-ready version.
- 2024.07.11: The StabStitch-D dataset is available.
- 2024.07.11: The inference code and pre-trained models are available.
- 2024.07.12: We added a brief analysis of limitations and future prospects.
Dataset (StabStitch-D)
The details of the dataset can be found in our paper. (arXiv)
The dataset is available on Google Drive or Baidu Cloud (extraction code: 1234).
Code
Requirement
We implement StabStitch with a single RTX 4090Ti GPU. Refer to environment.yml for more details.
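To create the environment from environment.yml, the standard conda workflow should work (the environment name is whatever environment.yml defines; it is not fixed here):

```
conda env create -f environment.yml
conda activate <env-name-from-environment.yml>
```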
Pre-trained model
The pre-trained models (spatial_warp.pth, temporal_warp.pth, and smooth_warp.pth) are available on Google Drive or Baidu Cloud (extraction code: 1234). Please download them and place them in the 'model' folder.
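After downloading, the repository should look roughly like this (only the 'model' and 'Codes' paths come from this README; the rest of the layout is illustrative):

```
StabStitch/
├── Codes/
│   ├── test_online.py
│   └── test_metric.py
└── model/
    ├── spatial_warp.pth
    ├── temporal_warp.pth
    └── smooth_warp.pth
```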
Test on the StabStitch-D dataset
Modify the test_path in Codes/test_online.py and run:
python test_online.py
Then, a folder named 'result' will be created automatically to store the stitched videos.
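For reference, the change in Codes/test_online.py is a one-line path assignment; the value below is only a placeholder, not a real default:

```python
# In Codes/test_online.py: point test_path at your local copy of the StabStitch-D test set.
test_path = '/path/to/StabStitch-D/testing/'  # placeholder; replace with your own path
```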
For the TPS warping function, we provide two modes to warp frames:
- 'FAST' mode: It uses F.grid_sample to implement interpolation. It's fast but may produce thin black boundaries.
- 'NORMAL' mode: It uses our implemented interpolation function. It's a bit slower but avoids the black boundaries.
You can change the mode here.
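To make the trade-off concrete, here is a minimal, self-contained illustration of the 'FAST' idea with torch.nn.functional.grid_sample. This is not the repository's warping code, and parameter choices such as align_corners are assumptions:

```python
# Minimal illustration of the 'FAST' mode idea: warp a frame with F.grid_sample.
# Out-of-range grid locations are filled with zeros (padding_mode='zeros'),
# which is where the thin black boundaries come from; the 'NORMAL' mode replaces
# this with a custom interpolation to avoid them.
import torch
import torch.nn.functional as F

def warp_with_grid_sample(frame: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    """frame: (N, C, H, W); grid: (N, H, W, 2), coordinates normalized to [-1, 1]."""
    return F.grid_sample(frame, grid, mode='bilinear',
                         padding_mode='zeros', align_corners=True)
```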
Calculate the metrics on the StabStitch-D dataset
Modify the test_path in Codes/test_metric.py and run:
python test_metric.py
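The exact metrics are defined in the paper and implemented in test_metric.py. As background only, a common way to score alignment in stitching work is PSNR over the overlapping region of the two warped frames; the sketch below assumes that convention and is not the repository's metric code:

```python
# Background sketch (assumed convention, not the repository's implementation):
# PSNR computed only inside the overlap of the two warped frames' valid masks.
import numpy as np

def overlap_psnr(warp1: np.ndarray, warp2: np.ndarray,
                 mask1: np.ndarray, mask2: np.ndarray) -> float:
    """warp*: (H, W, 3) float images in [0, 1]; mask*: (H, W) boolean validity masks."""
    overlap = mask1 & mask2
    mse = np.mean((warp1[overlap] - warp2[overlap]) ** 2)
    return float(10.0 * np.log10(1.0 / (mse + 1e-12)))
```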
Limitation and Future Prospect
Generalization
To test model generalization, we use the model pre-trained on the StabStitch-D dataset to conduct tests on traditional video stitching datasets. Surprisingly, performance degrades severely, producing obvious distortions and artifacts, as illustrated in Figure (a) below. To further validate generalization, we collect additional video pairs from traditional video stitching datasets (over 30 video pairs) and retrain our model on this new dataset. As shown in Figure (b) below, the retrained model works well on the new dataset but fails to produce natural stitched videos on the StabStitch-D dataset.
Prospect
We found that the performance degradation mainly occurs in the spatial warp model. Without accurate spatial warps, the subsequent smoothing process will amplify the distortion.
This raises the question of how to ensure generalization in learning-based stitching models. A simple and intuitive idea is to establish a large-scale real-world stitching benchmark dataset with various complex scenes, which should benefit the generalization of various stitching networks. Another idea is to apply continual learning to stitching, enabling the network to work robustly across datasets with different distributions.
These are just a few simple proposals. We hope the bright minds in this field can help solve this problem and contribute to the advancement of the field. If you have ideas and would like to discuss them, please feel free to drop me an email. I'm open to any kind of collaboration.
Meta
If you have any questions about this project, please feel free to drop me an email.
NIE Lang -- nielang@bjtu.edu.cn
@inproceedings{nie2025eliminating,
title={Eliminating Warping Shakes for Unsupervised Online Video Stitching},
author={Nie, Lang and Lin, Chunyu and Liao, Kang and Zhang, Yun and Liu, Shuaicheng and Ai, Rui and Zhao, Yao},
booktitle={European Conference on Computer Vision},
pages={390--407},
year={2025},
organization={Springer}
}