1. Video object segmentation through semantic visual words matching.
- Author
Hao, Chuanyan, Chen, Yadang, Wu, Weimin, Yang, Zhi-Xin, and Wu, Enhua
- Subjects
COMPUTER vision, VISUAL fields, ERROR correction (Information theory), VIDEOS, VOCABULARY, OCCLUSION (Chemistry)
- Abstract
Video object segmentation (VOS) is widely used in computer vision. However, existing VOS algorithms have difficulty with object deformation, occlusion, and fast motion. We therefore propose an effective VOS algorithm based on semantic visual words matching. Specifically, given a support frame and its corresponding mask, the frame is first passed through an encoder with an embedding layer, and a clustering algorithm then generates a group of semantic visual words according to the mask. For a query frame to be segmented, a matching operation is performed against the words generated from the support frame. In this manner, each pixel of the query frame is assigned to an object category according to the obtained similarity. In addition, a self-attention mechanism is applied before the words matching to enhance the embedding features and capture global dependencies. To further handle object changes and global mismatches, our method also employs an online update mechanism and a correction mechanism. Experiments show that the proposed method achieves competitive results on the DAVIS 2016 and DAVIS 2017 datasets. J&F-mean, the mean of region similarity and contour accuracy, reached 83.2% on DAVIS 2016 and 72.3% on DAVIS 2017. [ABSTRACT FROM AUTHOR]
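The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder, self-attention, online update, and correction stages are omitted, and the choice of k-means for clustering and cosine similarity for matching is an assumption, since the abstract names only "a clustering algorithm" and "similarity". The function names `kmeans`, `build_words`, and `match_query` and the parameter `words_per_object` are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain k-means on the rows of x; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so updating centroids never touches x.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every row to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    return centroids


def build_words(support_emb, support_mask, words_per_object=2):
    """Cluster support-frame embeddings per mask label into visual words.

    support_emb: (H, W, D) float embeddings (assumed to come from an encoder).
    support_mask: (H, W) integer object labels, 0 for background.
    Returns (words, word_labels) with shapes (N, D) and (N,).
    """
    words, word_labels = [], []
    for obj in np.unique(support_mask):
        feats = support_emb[support_mask == obj]
        k = min(words_per_object, len(feats))
        words.append(kmeans(feats, k))
        word_labels.extend([obj] * k)
    return np.concatenate(words), np.array(word_labels)


def match_query(query_emb, words, word_labels):
    """Label each query pixel with the object of its most similar word."""
    h, w, d = query_emb.shape
    q = query_emb.reshape(-1, d)
    # Cosine similarity (an assumed choice of similarity measure).
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    wn = words / (np.linalg.norm(words, axis=1, keepdims=True) + 1e-8)
    sim = q @ wn.T
    return word_labels[sim.argmax(1)].reshape(h, w)
```

In this sketch, each object in the support mask contributes a few cluster centroids as its "words", and a query pixel inherits the label of whichever word its embedding is closest to in angle.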
- Published
- 2023