8 results for "Mei, Shaohui"
Search Results
2. Graph Convolutional Dictionary Selection With L₂,ₚ Norm for Video Summarization.
- Author
Ma, Mingyang, Mei, Shaohui, Wan, Shuai, Wang, Zhiyong, Hua, Xian-Sheng, and Feng, David Dagan
- Subjects
VIDEO summarization; ORTHOGONAL matching pursuit; IMAGE reconstruction
- Abstract
Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self-representation and sparse regularization has demonstrated its promise for VS by formulating the VS problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, which neglects the inherent structured information among video frames. In addition, the sparsity commonly enforced by the $L_{2,1}$ norm is not strong enough, which causes redundancy among keyframes, i.e., similar keyframes are selected. To address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with the $L_{2,p}$ ($0 < p \leq 1$) norm (GCDS$_{2,p}$) for both keyframe selection and skimming based summarization. First, we incorporate graph embedding into dictionary selection to generate a graph embedding dictionary, which takes the structured information depicted in videos into account. Second, we propose to use the $L_{2,p}$ ($0 < p \leq 1$) norm to constrain row sparsity, in which $p$ can be flexibly set for the two forms of video summarization: $0 < p < 1$ can be utilized to select diverse and representative keyframes, and $p = 1$ can be utilized to select key shots for skimming. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and its convergence is theoretically proved. Experimental results on four benchmark datasets, covering both keyframe selection and skimming based summarization, demonstrate the effectiveness and superiority of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
2022
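As a concrete illustration of the row-sparse dictionary selection idea behind GCDS$_{2,p}$, the sketch below solves the plain self-representation model $\min_W \|X - XW\|_F^2 + \lambda \sum_i \|w^i\|_2^p$ with iteratively reweighted least squares. It omits the paper's graph-convolutional embedding entirely, and the function name and toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def l2p_dictionary_selection(X, p=0.5, lam=1.0, n_iter=50, eps=1e-8):
    """X: (d, n) matrix whose columns are frame features.
    Returns the row norms of W; large rows mark candidate keyframes."""
    d, n = X.shape
    G = X.T @ X                      # (n, n) Gram matrix of frames
    W = np.eye(n)                    # start from the identity
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1)
        # IRLS weights for the L_{2,p} penalty: (p/2) * ||w_i||^(p-2)
        D = np.diag(0.5 * p * (row_norms + eps) ** (p - 2))
        # Stationarity condition of the surrogate: (G + lam * D) W = G
        W = np.linalg.solve(G + lam * D, G)
    return np.linalg.norm(W, axis=1)

# Toy usage: 64-dim features for 200 frames; keep the 5 strongest rows.
X = np.random.randn(64, 200)
scores = l2p_dictionary_selection(X, p=0.5, lam=5.0)
keyframes = np.argsort(scores)[-5:]
```

Smaller values of p push the row norms harder toward exact zeros, which is why the abstract reserves $0 < p < 1$ for diverse keyframe selection.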
3. Similarity Based Block Sparse Subset Selection for Video Summarization.
- Author
Ma, Mingyang, Mei, Shaohui, Wan, Shuai, Wang, Zhiyong, Feng, David Dagan, and Bennamoun, Mohammed
- Subjects
VIDEO summarization; SUBSET selection; VECTOR spaces; BLOCK codes; DEEP learning; VIDEOS; SPARSE matrices
- Abstract
Video summarization (VS) is generally formulated as a subset selection problem in which a set of representative keyframes or key segments is selected from an entire video frame set. Though many sparse subset selection based VS algorithms have been proposed in the past decade, most of them adopt a linear sparse formulation in the explicit feature vector space of video frames and do not consider the local or global relationships among frames. In this paper, we first extend conventional sparse subset selection for VS into kernel block sparse subset selection (KBS3) to exploit the advantages of kernel sparse coding and to introduce a local inter-frame relationship through the packing of frame blocks. Going a step further, we propose a similarity based block sparse subset selection (SB2S3) model by applying a specially designed transformation matrix to the KBS3 model, which introduces a global inter-frame relationship through similarity. Finally, a greedy pursuit based algorithm is devised to optimize the proposed NP-hard model. The proposed SB2S3 has the following advantages: 1) through the similarity between each frame and every other frame, the global relationship among all frames is considered; 2) through block sparse coding, the local relationship of adjacent frames is further considered; and 3) it has wider applicability, since similarity can be derived from features but not vice versa. The effect of modeling such global and local relationships among frames is believed to be similar to that of modeling long-range and short-range dependencies among frames in deep learning based methods. Experimental results on three benchmark datasets demonstrate that the proposed approach is superior not only to other sparse subset selection based VS methods but also to most unsupervised deep learning based VS methods. [ABSTRACT FROM AUTHOR]
- Published
2021
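A minimal sketch of the similarity-driven flavor of block subset selection described above: given only a frame-similarity matrix K (for example a Gram or RBF kernel), it greedily picks temporal blocks whose span best reconstructs all frames in the implicit feature space. This is a generic greedy surrogate, not the paper's SB2S3 pursuit; kernel_block_selection, block_size, and ridge are assumed names and parameters.

```python
import numpy as np

def kernel_block_selection(K, block_size=10, n_blocks=3, ridge=1e-6):
    """K: (n, n) frame similarity matrix. Greedily selects n_blocks
    temporal blocks maximizing the explained similarity
    tr(K_:S K_SS^{-1} K_S:), i.e., reconstruction in the implicit space."""
    n = K.shape[0]
    blocks = [list(range(s, min(s + block_size, n)))
              for s in range(0, n, block_size)]
    selected, chosen = [], []
    for _ in range(n_blocks):
        best_gain, best_b = -np.inf, None
        for b, blk in enumerate(blocks):
            if b in chosen:
                continue
            idx = selected + blk
            Kss = K[np.ix_(idx, idx)] + ridge * np.eye(len(idx))
            Ksa = K[idx, :]          # similarities of candidates to all frames
            # Explained similarity = sum(Ksa * Kss^{-1} Ksa)
            gain = float(np.sum(Ksa * np.linalg.solve(Kss, Ksa)))
            if gain > best_gain:
                best_gain, best_b = gain, b
        chosen.append(best_b)
        selected += blocks[best_b]
    return selected

# Usage: any similarity works, e.g. a linear kernel K = X.T @ X
X = np.random.randn(64, 200)
print(kernel_block_selection(X.T @ X, block_size=10, n_blocks=3))
```

Note how the routine never touches explicit feature vectors, which reflects the abstract's point that similarity can be derived from features but not vice versa.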
4. Keyframe Extraction From Laparoscopic Videos via Diverse and Weighted Dictionary Selection.
- Author
Ma, Mingyang, Mei, Shaohui, Wan, Shuai, Wang, Zhiyong, Ge, Zongyuan, Lam, Vincent, and Feng, Dagan
- Subjects
SURGICAL robots; LAPAROSCOPIC surgery; VIDEOS; CONVOLUTIONAL neural networks; MINIMALLY invasive procedures; MEDICAL personnel
- Abstract
Laparoscopic videos are increasingly acquired for various purposes, including surgical training and quality assurance, due to the wide adoption of laparoscopy in minimally invasive surgery. However, viewing a large number of laparoscopic videos is very time consuming, which prevents the value of laparoscopic video archives from being fully exploited. In this paper, a dictionary selection based video summarization method is proposed to effectively extract keyframes for fast access to laparoscopic videos. First, unlike the low-level features used in most existing summarization methods, deep features are extracted from a convolutional neural network to effectively represent video frames. Second, based on this deep representation, laparoscopic video summarization is formulated as a diverse and weighted dictionary selection model, in which image quality is taken into account to select high-quality keyframes and a diversity regularization term is added to reduce redundancy among the selected keyframes. Finally, an iterative algorithm with a rapid convergence rate is designed for model optimization, and the convergence of the proposed method is analyzed. Experimental results on a recently released laparoscopic dataset demonstrate the clear superiority of the proposed method. The proposed method can facilitate access to key information in surgeries, the training of junior clinicians, explanations to patients, and the archiving of case files. [ABSTRACT FROM AUTHOR]
- Published
2021
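The sketch below illustrates only the quality-weighting ingredient of such a model: a simple gradient-variance proxy scores each frame, and a quality-weighted $L_{2,1}$ dictionary selection (solved by IRLS, one column at a time) demands better reconstruction of high-quality frames. The diversity regularizer is omitted for brevity, and frame_quality / weighted_l21_selection are hypothetical names, not the paper's.

```python
import numpy as np

def frame_quality(frame):
    """Crude quality proxy: variance of the gradient magnitude.
    Sharp, well-exposed frames score higher than blurred ones."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.sqrt(gx ** 2 + gy ** 2).var()

def weighted_l21_selection(X, q, lam=1.0, n_iter=30, eps=1e-8):
    """min_W sum_j q_j ||x_j - X w_j||^2 + lam * ||W||_{2,1} via IRLS.
    X: (d, n) deep frame features; q: (n,) positive quality weights."""
    n = X.shape[1]
    G = X.T @ X
    W = np.eye(n)
    for _ in range(n_iter):
        d = 1.0 / (2.0 * (np.linalg.norm(W, axis=1) + eps))  # IRLS row weights
        for j in range(n):
            # Columns decouple because the quality weights are diagonal
            W[:, j] = np.linalg.solve(G + (lam / q[j]) * np.diag(d), G[:, j])
    return np.linalg.norm(W, axis=1)  # large row norm => keyframe candidate
```

A frame with a large weight q_j is expensive to reconstruct poorly, so the selected dictionary is biased toward covering high-quality content.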
5. Video summarization via block sparse dictionary selection.
- Author
Ma, Mingyang, Mei, Shaohui, Wan, Shuai, Hou, Junhui, Wang, Zhiyong, and Feng, David Dagan
- Subjects
ORTHOGONAL matching pursuit; VIDEO processing; VIDEOS
- Abstract
The explosive growth of video data has raised new challenges for many video processing tasks, such as video browsing and retrieval; hence, effective and efficient video summarization (VS), which automatically condenses a video into a succinct version, is urgently demanded. Recent years have witnessed the advancement of sparse representation based approaches for VS. However, in existing methods video frames are analyzed individually for keyframe selection, which can lead to redundancy among selected keyframes and poor robustness to outlier frames. Since adjacent frames are visually similar, candidate keyframes tend to occur in temporal blocks, in addition to being sparse. Therefore, in this paper the block-sparsity of candidate keyframes is taken into consideration, and the VS problem is formulated as a block sparse dictionary selection model. Moreover, a simultaneous block version of the Orthogonal Matching Pursuit algorithm (SBOMP) is designed for model optimization, and two keyframe selection strategies are explored for each block. Experimental results on two benchmark datasets, namely VSumm and TVSum, demonstrate that the proposed SBOMP based VS method clearly outperforms several state-of-the-art sparse representation based methods in terms of F-score, redundancy among keyframes, and robustness to outlier frames. [ABSTRACT FROM AUTHOR]
- Published
2020
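Since the abstract names SBOMP only at a high level, here is a hedged NumPy sketch of what a simultaneous block OMP for keyframe selection can look like: frames are packed into fixed-size temporal blocks, the block most correlated with the shared residual is added, all coefficients are refit jointly, and one keyframe per block is chosen. Picking the frame nearest the block mean is one plausible per-block strategy, not necessarily either of the paper's two.

```python
import numpy as np

def sbomp(X, block_size=15, n_blocks=4):
    """Simultaneous block OMP sketch. X: (d, n), columns = frame features.
    Assumes n_blocks does not exceed the number of temporal blocks."""
    d, n = X.shape
    blocks = [np.arange(s, min(s + block_size, n))
              for s in range(0, n, block_size)]
    R = X.copy()                      # residual of all frames, simultaneously
    chosen = []
    for _ in range(n_blocks):
        # Score unselected blocks by correlation energy with the residual
        scores = [np.linalg.norm(X[:, blk].T @ R) if b not in chosen else -1.0
                  for b, blk in enumerate(blocks)]
        chosen.append(int(np.argmax(scores)))
        idx = np.concatenate([blocks[b] for b in chosen])
        # Joint least-squares refit on all selected atoms (the "simultaneous" step)
        C, *_ = np.linalg.lstsq(X[:, idx], X, rcond=None)
        R = X - X[:, idx] @ C
    # One keyframe per selected block: the frame closest to the block mean
    keyframes = []
    for b in chosen:
        blk = blocks[b]
        mean = X[:, blk].mean(axis=1, keepdims=True)
        keyframes.append(int(blk[np.argmin(
            np.linalg.norm(X[:, blk] - mean, axis=0))]))
    return sorted(keyframes)
```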
6. Video Summarization with Global and Local Features.
- Author
Guan, Genliang, Wang, Zhiyong, Yu, Kaimin, Mei, Shaohui, He, Mingyi, and Feng, Dagan
- Abstract
Video summarization has become crucial for effective and efficient access to video content due to the ever-increasing amount of video data. Most existing keyframe based summarization approaches represent individual frames with global features, which neglects the local details of visual content. Considering that a video generally depicts a story with a number of scenes in different temporal orders and shooting angles, we formulate scene summarization as identifying a set of frames that best covers the key point pool constructed from the scene. Our approach is therefore a two-step process: identifying scenes and selecting representative content for each scene. Global features are utilized to identify scenes through clustering, owing to the visual similarity among video frames of the same scene, and local features are used to summarize each scene. We develop a key point based keyframe selection method to identify the representative content of a scene, which allows users to flexibly tune the summarization length. Our preliminary results indicate that the proposed approach is promising and potentially robust to clustering based scene identification. [ABSTRACT FROM PUBLISHER]
- Published
2012
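A compact sketch of the two-step recipe above (global features for scene clustering, local features for per-scene coverage), using k-means and a greedy set cover over quantized keypoint ids. scikit-learn's KMeans is used for brevity; frame_keypoints, n_scenes, and frames_per_scene are assumed inputs and parameters, not the paper's interface.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(global_feats, frame_keypoints, n_scenes=4, frames_per_scene=2):
    """global_feats: (n, d) global descriptors, one row per frame.
    frame_keypoints: list of sets of quantized local keypoint ids per frame
    (e.g., SIFT descriptors quantized against a visual vocabulary)."""
    labels = KMeans(n_clusters=n_scenes, n_init=10).fit_predict(global_feats)
    summary = []
    for s in range(n_scenes):
        frames = np.where(labels == s)[0]
        remaining = set(int(f) for f in frames)
        covered = set()
        for _ in range(min(frames_per_scene, len(frames))):
            # Greedy set cover: frame adding the most uncovered key points
            best = max(remaining,
                       key=lambda f: len(frame_keypoints[f] - covered))
            covered |= frame_keypoints[best]
            remaining.remove(best)
            summary.append(best)
    return sorted(summary)
```

Raising frames_per_scene lengthens the summary scene by scene, which matches the abstract's claim that users can flexibly tune the summarization length.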
7. Video summarization via minimum sparse reconstruction.
- Author
Mei, Shaohui, Guan, Genliang, Wang, Zhiyong, Wan, Shuai, He, Mingyi, and Dagan Feng, David
- Subjects
IMAGE reconstruction; VIDEO processing; WEB browsing; FEATURE extraction; PATTERN recognition systems
- Abstract
The rapid growth of video data demands both effective and efficient video summarization methods, so that users are empowered to quickly browse and comprehend a large amount of video content. In this paper, we formulate the video summarization task as a novel minimum sparse reconstruction (MSR) problem: the original video sequence should be best reconstructed from as few selected keyframes as possible. Unlike the recently proposed convex relaxation based sparse dictionary selection methods, our method employs the true sparsity constraint, the $L_0$ norm, instead of the relaxed $L_{2,1}$ norm, such that keyframes are directly selected as a sparse dictionary that can well reconstruct all the video frames. An online version is further developed owing to the real-time efficiency of the proposed MSR principle. In addition, a percentage of reconstruction (POR) criterion is proposed to intuitively guide users in obtaining a summary of appropriate length. Experimental results on two benchmark datasets with various types of videos demonstrate that the proposed methods outperform the state of the art. [ABSTRACT FROM AUTHOR]
- Published
2015
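The MSR idea with the POR stopping rule lends itself to a short greedy, OMP-style sketch: keep adding the frame most correlated with the joint residual until the percentage of reconstruction reaches a user-chosen target. A greedy loop like this is a standard surrogate for the $L_0$-constrained problem rather than the paper's exact algorithm; msr_summarize and por_target are illustrative names.

```python
import numpy as np

def msr_summarize(X, por_target=0.9):
    """Greedy minimum sparse reconstruction sketch. X: (d, n) frame features.
    por_target in (0, 1): percentage of reconstruction to reach."""
    total = np.linalg.norm(X) ** 2
    selected, R = [], X.copy()
    while 1.0 - np.linalg.norm(R) ** 2 / total < por_target:
        # Pick the frame most correlated with the current residual
        cand = [j for j in range(X.shape[1]) if j not in selected]
        j = max(cand, key=lambda j: np.linalg.norm(X[:, j] @ R))
        selected.append(j)
        # Refit all frames on the selected keyframes, update the residual
        C, *_ = np.linalg.lstsq(X[:, selected], X, rcond=None)
        R = X - X[:, selected] @ C
    return sorted(selected)
```

The POR criterion makes the trade-off explicit: a higher por_target yields a longer but more faithful summary.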
8. Multi-scale deep feature fusion based sparse dictionary selection for video summarization.
- Author
Wu, Xiao, Ma, Mingyang, Wan, Shuai, Han, Xiuxiu, and Mei, Shaohui
- Subjects
VIDEO summarization; DEEP learning; COMPUTER vision; CONVOLUTIONAL neural networks; GREEDY algorithms
- Abstract
The explosive growth of video data poses a series of new challenges in computer vision, and the role of video summarization (VS) is becoming more and more prominent. Recent works have shown the effectiveness of sparse dictionary selection (SDS) based VS, which selects a representative frame set that sufficiently reconstructs a given video. Existing SDS based VS methods use conventional handcrafted features or single-scale deep features, which can diminish summarization performance due to the underutilization of frame feature representations. Deep learning techniques based on convolutional neural networks (CNNs) exhibit powerful capabilities across various vision tasks, as CNNs provide excellent feature representations. Therefore, in this paper, a multi-scale deep feature fusion based sparse dictionary selection (MSDFF-SDS) method is proposed for VS. Specifically, the multi-scale features include features extracted directly from the last fully connected layer and global average pooling (GAP) processed features from intermediate layers; VS is then formulated as the problem of minimizing the reconstruction error using the fused multi-scale deep features. In our formulation, the contribution of each scale of features can be adjusted by a balance parameter, and the row-sparsity consistency of the simultaneous reconstruction coefficients is used to select as few keyframes as possible. The resulting MSDFF-SDS model is solved with an efficient greedy pursuit algorithm. Experimental results on two benchmark datasets demonstrate that the proposed MSDFF-SDS improves the F-score of keyframe based summarization by more than 3% compared with existing SDS methods, and performs better than most deep learning methods for skimming based summarization.
• Usage of multi-scale features from neural networks for video summarization.
• Multi-scale deep feature fusion based sparse dictionary selection.
• Efficient greedy optimization for video summarization.
• Explorations of feature configurations, network architectures, and pooling strategies.
[ABSTRACT FROM AUTHOR]
- Published
2023
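A hedged PyTorch sketch of the multi-scale fusion step: GAP-pooled activations from an intermediate stage are concatenated with the network's final fully connected output, with a balance parameter alpha weighting the two scales. ResNet-18, layer3, and alpha are stand-in choices (the paper explores several architectures and pooling configurations), and in practice pretrained weights would be loaded. The fused features can then be fed to any SDS solver, such as the IRLS sketch after result 2.

```python
import torch
import torchvision.models as models

def multiscale_features(frames, alpha=0.5):
    """frames: (B, 3, 224, 224) float tensor of video frames.
    Returns fused (B, 256 + 1000) multi-scale descriptors."""
    net = models.resnet18(weights=None).eval()  # load pretrained weights in practice
    grabbed = {}
    # Hook an intermediate stage; GAP-pool its (B, C, H, W) activation map
    net.layer3.register_forward_hook(
        lambda m, inp, out: grabbed.update(mid=out.mean(dim=(2, 3))))
    with torch.no_grad():
        last = net(frames)                      # last fully connected layer output
    mid = grabbed["mid"]
    # L2-normalize each scale, then weight by the balance parameter
    mid = mid / mid.norm(dim=1, keepdim=True).clamp_min(1e-8)
    last = last / last.norm(dim=1, keepdim=True).clamp_min(1e-8)
    return torch.cat([alpha * mid, (1.0 - alpha) * last], dim=1)
```

Normalizing each scale before concatenation keeps alpha interpretable as a genuine balance between intermediate and final-layer evidence rather than a correction for differing feature magnitudes.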