1,202 results on '"Skeleton (category theory)"'
Search Results
2. Identification Conditions for the Solvability of NP-Complete Problems for the Class of Prefractal Graphs
- Author
- Alexander V. Timoshenko, Rasul Kochkarov, and Azret Kochkarov
- Subjects
Discrete mathematics, Polynomial, discrete problems, Computer science, Monochromatic triangle, Information technology, Clique (graph theory), Skeleton (category theory), T58.5-58.64, Hamiltonian path, Telecommunications network, symbols.namesake, np-complete problems, Control and Systems Engineering, Independent set, Signal Processing, symbols, solvability conditions, Production (computer science), pre-fractal graphs, Software, MathematicsofComputing_DISCRETEMATHEMATICS
- Abstract
Modern network systems (groups of unmanned aerial vehicles, social networks, network production chains, transport and logistics networks, communication networks, cryptocurrency networks) are distinguished by their multi-element nature and the dynamics of the connections between their elements. A number of discrete problems on the construction of optimal substructures of network systems, described in the form of various classes of graphs, are NP-complete. The variability and dynamism of the structures of network systems further complicate the search for solutions to discrete optimization problems. At the same time, for some subclasses of dynamic graphs used to model the structures of network systems, conditions for the solvability of a number of NP-complete problems can be identified. These subclasses include pre-fractal graphs. The article investigates NP-complete problems on pre-fractal graphs: a Hamiltonian cycle, a skeleton with the maximum number of pendant vertices, a monochromatic triangle, a clique, and an independent set. It identifies the conditions under which, for some of these problems, it is possible to decide whether a solution exists and to construct algorithms for finding solutions that are polynomial when the number of seed vertices is fixed.
- Published
- 2022
- Full Text
- View/download PDF
3. Adaptive Graph Convolutional Network With Adversarial Learning for Skeleton-Based Action Prediction
- Author
- Nanjun Li, Guangxin Li, Faliang Chang, and Chunsheng Liu
- Subjects
Action prediction, business.industry, Computer science, Skeleton (category theory), Machine learning, computer.software_genre, Variety (cybernetics), Adversarial system, Action (philosophy), Artificial Intelligence, Action recognition, Graph (abstract data type), Artificial intelligence, Latency (engineering), business, computer, Software
- Abstract
The purpose of action prediction is to recognize an action before it is completed to reduce recognition latency. Because action prediction has lower latency than action recognition, it can be applied to a variety of surveillance scenarios and responds faster. However, action prediction is more difficult because it cannot obtain the complete action execution. In this paper, we study the action prediction which is based on skeleton data and propose a new network called adaptive graph convolutional network with adversarial learning (AGCN-AL) for it. The AGCN-AL uses adversarial learning to make the features of the partial sequences as similar as possible to the features of the full sequences to learn the potential global information in the partial sequences. Besides, partial sequences with different numbers of frames contain different amounts of information. We introduce temporal-dependent loss functions to prevent the network from paying too much attention to partial sequences whose observation ratios are small, and ignoring partial sequences whose observation ratios are large. Moreover, the AGCN-AL is combined with the local AGCN into a two-stream network to enhance the prediction, proving that the local information and the potential global information in partial sequences are complementary. We evaluate the proposed approach on two datasets and show excellent performance.
- Published
- 2022
- Full Text
- View/download PDF
4. A Cross View Learning Approach for Skeleton-Based Action Recognition
- Author
- Xinming Zhang and Hui Zheng
- Subjects
Computer science, business.industry, Pattern recognition, Skeleton (category theory), Discriminative model, Classifier (linguistics), Media Technology, Key (cryptography), Action recognition, Artificial intelligence, Electrical and Electronic Engineering, business, Pose, Block (data storage)
- Abstract
With the prevalence of accessible multi-modal sensors and the maturity of pose estimation algorithms, skeleton-based action recognition has gradually become the mainstream of human action recognition (HAR). The key issue is to mine the correlations and dependencies between different joints and bones. In this paper, we propose a cross view learning approach. First, the static and dynamic representations of skeletons, from two different views (joints and bones), are calculated and aggregated respectively. Then, the integrated representations of these two views are used as parallel inputs to the cross view learning model, which mainly includes two blocks, namely a multi-scale learning block and a multi-view fusion block. The former is used to excavate the intra-view’s discriminative and comprehensive features, and the latter is utilized to capture the complementary representations of the inter-view. Finally, the fused representations are input to the classifier for action recognition. It has been experimentally proven that our proposed approach outperforms several state-of-the-art baseline methods and achieves a very competitive performance.
- Published
- 2022
- Full Text
- View/download PDF
5. Multi-Stream Interaction Networks for Human Action Recognition
- Author
- Baosheng Yu, Linlin Zhang, Jiaqi Li, Dongyue Chen, and Haoran Wang
- Subjects
genetic structures, Computer science, business.industry, Human body, Multi stream, Skeleton (category theory), Object (computer science), Human skeleton, medicine.anatomical_structure, Robustness (computer science), Media Technology, medicine, RGB color model, Action recognition, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business
- Abstract
Skeleton-based human action recognition has received extensive attention due to its efficiency and robustness to complex backgrounds. Though the human skeleton can accurately capture the dynamics of human poses, it fails to recognize human actions induced by the interaction between humans and objects, which makes it important to further explore the interaction between humans and objects for human action recognition. In this paper, we devise the multi-stream interaction networks (MSIN) to simultaneously explore the dynamics of the human skeleton, objects, and the interaction between humans and objects. Specifically, apart from the traditional human skeleton stream, 1) the second stream explores the dynamics of object appearance from the objects surrounding the human body joints; and 2) the third stream captures the dynamics of object position in regard to the distance between the object and different human body joints. Experimental results on three popular skeleton-based human action recognition datasets, NTU RGB+D, NTU RGB+D 120, and SYSU, demonstrate the effectiveness of the proposed method, especially for recognizing human actions with human-object interactions.
- Published
- 2022
- Full Text
- View/download PDF
6. Pose-free assembly retrieval based on spatial-contact skeleton
- Author
- Jiazhen Pang, Li Yuan, Jie Zhang, and Yu Jian-feng
- Subjects
Surface (mathematics), Computer science, business.industry, Mechanical Engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Aerospace Engineering, Centroid, Skeleton (category theory), Measure (mathematics), Set (abstract data type), Hausdorff distance, Information engineering, Computer vision, Artificial intelligence, Invariant (mathematics), business
- Abstract
In complex product industries such as aircraft manufacturing, an assembly model contains abundant engineering information for use in design, manufacture, and maintenance. Assembly retrieval can be used to find relevant models for knowledge reuse. However, an assembly with rotatable joints may have many poses, which makes assembly retrieval difficult, since there is no pose principle for assembly design. Therefore, focusing on rotatable joints in assemblies, a skeleton-based descriptor for pose-free assembly retrieval is proposed. The centroid points of part surfaces and contact faces in an assembly are extracted to construct a spatial-contact skeleton. The skeleton-based distance is proposed to measure the distance between two surface points, which is invariant to the rotatable joints. The distribution of skeleton distances between two parts is used to describe the pair. Considering a part paired with all other parts in the assembly, the set of part pairs is used to represent a part, and the modified Hausdorff distance is used to measure the dissimilarity between parts for assembly retrieval. Experiments are conducted to compare the accuracy of the proposed descriptor to holistic and structureless descriptors. The proposed method is shown to retrieve assemblies with similar parts and structures regardless of their rotatable joints.
- Published
- 2022
- Full Text
- View/download PDF
7. Graph2Net: Perceptually-Enriched Graph Learning for Skeleton-Based Action Recognition
- Author
- Cong Wu, Xiaojun Wu, and Josef Kittler
- Subjects
Sequence, Theoretical computer science, Computer science, Skeleton (category theory), Human skeleton, medicine.anatomical_structure, Media Technology, Feature (machine learning), medicine, Graph (abstract data type), Adjacency list, Electrical and Electronic Engineering, Representation (mathematics), Spatial analysis
- Abstract
Skeleton representation has attracted a great deal of attention recently as an extremely robust feature for human action recognition. However, its non-Euclidean structural characteristics raise new challenges for conventional solutions. Recent studies have shown that there is a native superiority in modeling spatiotemporal skeleton information with a Graph Convolutional Network (GCN). Nevertheless, the skeleton graph modeling normally focuses on the physical adjacency of the elements of the human skeleton sequence, which contrasts with the requirement to provide a perceptually meaningful representation. To address this problem, in this paper, we propose a perceptually-enriched graph learning method by introducing innovative features to spatial and temporal skeleton graph modeling. For the spatial information modeling, we incorporate a Local-Global Graph Convolutional Network (LG-GCN) that builds a multifaceted spatial perceptual representation. This helps to overcome the limitations caused by over-reliance on the spatial adjacency relationships in the skeleton. For temporal modeling, we present a Region-Aware Graph Convolutional Network (RA-GCN), which directly embeds the regional relationships conveyed by a skeleton sequence into a temporal graph model. This innovation mitigates the deficiency of the original skeleton graph models. In addition, we strengthened the ability of the proposed channel modeling methods to extract multi-scale representations. These innovations result in a lightweight graph convolutional model, referred to as Graph2Net, that simultaneously extends the spatial and temporal perceptual fields, and thus enhances the capacity of the graph model to represent skeleton sequences. We conduct extensive experiments on NTU-RGB+D 60&120, Northwestern-UCLA, and Kinetics-400 datasets to show that our results surpass the performance of several mainstream methods while limiting the model complexity and computational overhead.
- Published
- 2022
- Full Text
- View/download PDF
8. Cross-Domain Self-Supervised Complete Geometric Representation Learning for Real-Scanned Point Cloud Based Pathological Gait Analysis
- Author
- Guang-Zhong Yang, Yao Guo, Benny Lo, Xiao Gu, and British Council (UK)
- Subjects
Ground truth, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Point cloud, Pattern recognition, Cloud Computing, Skeleton (category theory), Computer Science Applications, Domain (software engineering), Activity recognition, Health Information Management, Discriminative model, Depth map, Humans, Artificial intelligence, Electrical and Electronic Engineering, Gait Analysis, business, Gait, Pose, Algorithms, ComputingMethodologies_COMPUTERGRAPHICS, Biotechnology
- Abstract
Accurate lower-limb pose estimation is a prerequisite of skeleton-based pathological gait analysis. To achieve this goal in free-living environments for long-term monitoring, the use of a single depth sensor has been proposed in research. However, the depth map acquired from a single viewpoint encodes only partial geometric information of the lower limbs and exhibits large variations across different viewpoints. Existing off-the-shelf three-dimensional (3D) pose tracking algorithms and public datasets for depth-based human pose estimation are mainly targeted at activity recognition applications. They are relatively insensitive to skeleton estimation accuracy, especially at the foot segments. Furthermore, acquiring ground truth skeleton data for detailed biomechanics analysis also requires considerable effort. To address these issues, we propose a novel cross-domain self-supervised complete geometric representation learning framework, with knowledge transfer from the unlabelled synthetic point clouds of full lower-limb surfaces. The proposed method can significantly reduce the number of ground truth skeletons needed in the training phase (to only 1%), while ensuring accurate and precise pose estimation and capturing discriminative features across different pathological gait patterns compared to other methods.
- Published
- 2022
- Full Text
- View/download PDF
9. Multi‐stream adaptive spatial‐temporal attention graph convolutional network for skeleton‐based action recognition
- Author
- Yu Lubin, Qiliang Du, Jameel Ahmed Bhutto, and Lianfang Tian
- Subjects
business.industry, Computer science, Computer applications to medicine. Medical informatics, R858-859.7, Multi stream, Skeleton (category theory), computer vision, Computer graphics, Space-time adaptive processing, QA76.75-76.765, convolutional neural nets, graphics processing units, computer graphics, Action recognition, Graph (abstract data type), Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Computer software, business, space‐time adaptive processing, Software
- Abstract
Skeleton‐based action recognition algorithms have been widely applied to human action recognition. Graph convolutional networks (GCNs) generalize convolutional neural networks (CNNs) to non‐Euclidean graphs and achieve significant performance in skeleton‐based action recognition. However, existing GCN‐based models have several issues. The topology of the graph is defined by the natural skeleton of the human body and is fixed during training, so it may not suit every layer of the GCN model or every dataset. In addition, higher‐order information in the joint data, for example skeleton and dynamic information, is not fully utilised. This work proposes a novel multi‐stream adaptive spatial‐temporal attention GCN model that overcomes these issues. The method designs a learnable topology graph to adaptively adjust the connection relationship and strength, which is updated during training along with the other network parameters. Simultaneously, the adaptive connection parameters are utilised to optimise the connection of the natural skeleton graph and the adaptive topology graph. The spatial‐temporal attention module is embedded in each graph convolution layer to ensure that the network focuses on the more critical joints and frames. A multi‐stream framework is built to integrate multiple inputs, which further improves the performance of the network. The final network achieves state‐of‐the‐art performance on both the NTU‐RGBD and Kinetics‐Skeleton action recognition datasets. The simulation results show that the proposed method outperforms existing methods in all respects.
- Published
- 2022
10. Gait‐D: Skeleton‐based gait feature decomposition for gait recognition
- Author
- Shuo Gao, Limin Liu, Jing Yun, and Yumeng Zhao
- Subjects
biometrics, Biometrics, Computer science, business.industry, feature extraction, Feature extraction, Computer applications to medicine. Medical informatics, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, R858-859.7, Pattern recognition, Skeleton (category theory), pose estimation, computer vision, QA76.75-76.765, Gait (human), convolutional neural nets, Feature (computer vision), Decomposition (computer science), Computer Vision and Pattern Recognition, Artificial intelligence, Computer software, business, Pose, video signal processing, Software
- Abstract
General silhouette‐based gait recognition methods usually rely on a binary human silhouette, which is easily affected by external factors, making them unsuitable for situations in which the subject wears heavy clothes or carries objects. In this study, a new skeleton‐based gait recognition model is proposed. The model first extracts the spatial and temporal features of gait using the space and time relationships between body joints, and then eliminates redundant features by decomposing the feature map, to achieve better recognition accuracy in the presence of external factors. Through extensive experiments on two common datasets, CASIA‐B and OUMVLP‐Pose, the proposed model has been proved to have higher recognition accuracy and remarkable robustness.
- Published
- 2022
11. Real-time skeletonization for sketch-based modeling
- Author
- Dongliang Zhang, Jin Wang, Jing Ma, and Jituo Li
- Subjects
Computational Geometry (cs.CG), FOS: Computer and information sciences, Computer science, business.industry, Computer Science - Human-Computer Interaction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, General Engineering, Skeleton (category theory), Computer Graphics and Computer-Aided Design, Pipeline (software), Graphics (cs.GR), Sketch, Skeletonization, Human-Computer Interaction (cs.HC), Human-Computer Interaction, Computer Science - Graphics, Sketch-based modeling, Polygon, Character animation, Computer Science - Computational Geometry, Domain knowledge, Computer vision, Artificial intelligence, business, ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
Skeleton creation is an important phase in the character animation pipeline. However, handcrafting a skeleton takes extensive labor time and domain knowledge. Automatic skeletonization provides a solution, but most current approaches are far from real-time and lack the flexibility to control the skeleton complexity. In this paper, we present an efficient skeletonization method that can be seamlessly integrated into the sketch-based modeling process in real time. The method contains three steps: local sub-skeleton extraction, sub-skeleton connection, and global skeleton refinement. Firstly, the local skeleton is extracted from the processed polygon stroke and forms a subpart along with the sub-mesh. Then, local sub-skeletons are connected according to the intersecting relationships and the modeling sequence of subparts. Lastly, a global refinement method is proposed to give users coarse-to-fine control on the connected skeleton. We demonstrate the effectiveness of our method on a variety of examples created by both novices and professionals. Shape Modeling International 2021
- Published
- 2022
- Full Text
- View/download PDF
12. Gait Recognition Based on Local Graphical Skeleton Descriptor With Pairwise Similarity Network
- Author
- Tanfeng Sun, Xinghao Jiang, and Ke Xu
- Subjects
Structure (mathematical logic), Sequence, Similarity (geometry), Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Skeleton (category theory), ENCODE, Computer Science Applications, ComputingMethodologies_PATTERNRECOGNITION, Gait (human), Robustness (computer science), Signal Processing, Media Technology, RGB color model, Artificial intelligence, Electrical and Electronic Engineering, business, ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
Gait recognition aims to identify a human through a walking sequence. It is a challenging task in computer vision since a monocular camera loses most of the 3D information. Previous works described gait features with the contours of the shape or the global geometrical characteristics of the skeleton, and little work has examined the local patterns of the gait skeleton. In this paper, to resist dress changes and speed changes, a Local Graphical Skeleton Descriptor (LGSD) is proposed to describe both the inner and intra local graphical patterns of a human gait skeleton. The gait features from the same or different identities are paired up and a Pairwise Similarity Network (PSN) is proposed to maximize the similarity of true matched pairs and minimize the similarity of false matched pairs. The contributions of our method are: 1) LGSD is proposed to describe human gait by computing four novel local geometrical patterns of skeleton sequences, which makes use of the intuitive cognition of gait based on human prior knowledge. 2) PSN is implemented by a two-stream CNN structure to encode the features and train the gait model, which fuses two popular gait recognition strategies. 3) The robustness of our method to dress changes and speed changes is demonstrated on public datasets, on which we also achieve some state-of-the-art results. The proposed method is evaluated on three public gait datasets that have RGB or infrared frames: the CASIA-B dataset, the NLPR gait database, and the CASIA-C dataset. The performers in these datasets walk under different views, speeds, or dress. The results are further compared with previous approaches to confirm the effectiveness and the advantages of our method.
- Published
- 2022
- Full Text
- View/download PDF
13. Multi-Localized Sensitive Autoencoder-Attention-LSTM For Skeleton-based Action Recognition
- Author
- Wing W. Y. Ng, Ting Wang, and Mingyang Zhang
- Subjects
Computer science, business.industry, Pattern recognition, Skeleton (category theory), Autoencoder, Computer Science Applications, Robustness (computer science), Signal Processing, Classifier (linguistics), Media Technology, Key (cryptography), Action recognition, Artificial intelligence, Sensitivity (control systems), Electrical and Electronic Engineering, Focus (optics), business
- Abstract
One of the key challenges of skeleton-based action recognition (SAR) tasks is the complex nature of human motion patterns. Variations such as performers and viewpoints may have negative effects on action recognition accuracy. In this work, we propose the Multi-Localized Sensitive Autoencoder-Attention-LSTM (Multi-LiSAAL) for SAR. The Localized Stochastic Sensitive Autoencoder (LiSSA) encodes both spatial and temporal information, and extracts meaningful features from different parts (four limbs and a trunk) of the skeleton. The LiSSA is trained by minimizing the localized generalization error to enhance the robustness of autoencoders by reducing their sensitivity to small variations in inputs. We apply an attention mechanism to assign different weights to different skeleton parts and focus more on informative sections. Then, a backbone classifier network takes the weighted features as inputs to differentiate actions. Experimental results on five public benchmarking datasets show that the Multi-LiSAAL outperforms state-of-the-art methods.
- Published
- 2022
- Full Text
- View/download PDF
14. LAGA-Net: Local-and-Global Attention Network for Skeleton Based Action Recognition
- Author
- Yanshan Li, Wenhan Luo, and Rongjie Xia
- Subjects
Sequence, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Skeleton (category theory), ENCODE, Net (mathematics), Computer Science Applications, Discriminative model, Robustness (computer science), Signal Processing, Media Technology, Feature (machine learning), RGB color model, Artificial intelligence, Electrical and Electronic Engineering, business
- Abstract
Skeleton-based action recognition has attracted significant attention and obtained widespread applications due to the robustness of 3D skeleton data. One of the key challenges is how to extract discriminative and robust spatio-temporal features from sparse skeleton data to describe actions and improve recognition accuracy. To address this issue, this paper combines convolutions with attention mechanisms and proposes a deep network for skeleton-based action recognition, termed as local-and-global attention network (LAGA-Net). First, we encode skeleton sequences into joint feature evolution maps to compactly describe the spatial and temporal characteristics of skeleton sequences. Then, a motion guided channel attention module (MGCAM) is proposed to model the interdependencies between feature channels by calculating temporal frame-level motion and enhance motion-salient features in a channel-wise way. Further, a spatio-temporal attention module (STAM) is proposed to model spatio-temporal context-aware collaboration at sequence level and extract spatio-temporal attention features that involve long-range dependencies. Together, MGCAM and STAM are combined to form LAGA-Net, which extracts discriminative features integrating both local and global representations of skeleton sequences. Moreover, a two-stream architecture is proposed to learn complementary features from joint and bone aspects. We conduct extensive experiments to verify the effectiveness and superiority of our proposed method over state-of-the-art approaches on several benchmarks (e.g., NTU RGB+D, Northwestern-UCLA, UTD-MHAD and NTU RGB+D 120).
- Published
- 2022
- Full Text
- View/download PDF
15. Learning to Recognize Human Actions From Noisy Skeleton Data Via Noise Adaptation
- Author
- Sijie Song, Zongming Guo, Jiaying Liu, and Lilang Lin
- Subjects
business.industry, Computer science, Perspective (graphical), Skeleton (category theory), Space (commercial competition), Machine learning, computer.software_genre, Computer Science Applications, Signal Processing, Media Technology, Feature (machine learning), Action recognition, Artificial intelligence, Noise (video), Electrical and Electronic Engineering, business, Adaptation (computer science), computer
- Abstract
Recent studies have made great progress on skeleton-based action recognition. However, most of them are developed with relatively clean skeletons without the presence of intensive noise. We argue that models learned from relatively clean data do not generalize well to the noisy skeletons that commonly appear in the real world. In this paper, we address the challenge of recognizing human actions from noisy skeletons, which is seldom explored by previous methods. Beyond exploring the new problem, we further take a new perspective to address it, i.e., noise adaptation, which gets rid of explicit skeleton noise modeling and reliance on skeleton ground truths. Specifically, we develop regression-based and generation-based adaptation models according to whether pairs of noisy skeletons are available. The regression-based model aims to learn noise-suppressed intrinsic feature representations by mapping pairs of noisy skeletons into a noise-robust space. When only unpaired skeletons are accessible, the generation-based model aims to adapt the features from noisy skeletons to a low-noise space by adversarial learning. To verify our proposed model and facilitate research on noisy skeletons, we collect a new dataset, the Noisy Skeleton Dataset (NSD), whose skeletons contain much noise and are more similar to daily-life data than those of previous datasets. Extensive experiments are conducted on the NSD, VV-RGBD and N-UCLA datasets, and results consistently show the outstanding performance of our proposed model.
- Published
- 2022
- Full Text
- View/download PDF
16. Length scale control schemes for bi‐directional evolutionary structural optimization method
- Author
- Wenke Qiu, Liang Xia, Tielin Shi, and Shaomeng Jin
- Subjects
Length scale, Numerical Analysis, Computer science, Applied Mathematics, Topology optimization, General Engineering, Skeleton (category theory), Control (linguistics), Topology
- Published
- 2021
- Full Text
- View/download PDF
17. Dual-Stream Structured Graph Convolution Network for Skeleton-Based Action Recognition
- Author
- Hu Chunlong, Liu Rong, Zhang Tong, Yang Jian, Cui Zhen, and Xu Chunyan
- Subjects
Theoretical computer science, Computer Networks and Communications, Hardware and Architecture, Computer science, Graph (abstract data type), Action recognition, DUAL (cognitive architecture), Skeleton (category theory), Convolution
- Abstract
In this work, we propose a dual-stream structured graph convolution network (DS-SGCN) to solve the skeleton-based action recognition problem. The spatio-temporal coordinates and appearance contexts of the skeletal joints are jointly integrated into the graph convolution learning process on both the video and skeleton modalities. To effectively represent the skeletal graph of discrete joints, we create a structured graph convolution module specifically designed to encode partitioned body parts along with their dynamic interactions in the spatio-temporal sequence. In more detail, we build a set of structured intra-part graphs, each of which can be adopted to represent a distinctive body part (e.g., left arm, right leg, head). The inter-part graph is then constructed to model the dynamic interactions across different body parts; here each node corresponds to an intra-part graph built above, while an edge between two nodes is used to express these internal relationships of human movement. We implement the graph convolution learning on both intra- and inter-part graphs in order to obtain the inherent characteristics and dynamic interactions, respectively, of human action. After integrating the intra- and inter-levels of spatial context/coordinate cues, a convolution filtering process is conducted on time slices to capture these temporal dynamics of human motion. Finally, we fuse two streams of graph convolution responses in order to predict the category information of human action in an end-to-end fashion. Comprehensive experiments on five single/multi-modal benchmark datasets (including NTU RGB+D 60, NTU RGB+D 120, MSR-Daily 3D, N-UCLA, and HDM05) demonstrate that the proposed DS-SGCN framework achieves encouraging performance on the skeleton-based action recognition task.
- Published
- 2021
- Full Text
- View/download PDF
18. Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition
- Author
- Meng Sun, Tieyong Cao, Xiongwei Zhang, Zheng Fang, and Yunfei Zheng
- Subjects
Theoretical computer science, Computer science, business.industry, Action recognition, Graph (abstract data type), Graph theory, Computer Vision and Pattern Recognition, Artificial intelligence, Skeleton (category theory), business, Software
- Published
- 2021
- Full Text
- View/download PDF
19. Generalized zero-shot emotion recognition from body gestures
- Author
- Jinting Wu, Yujia Zhang, Qianzhong Li, Shiying Sun, and Xiaoguang Zhao
- Subjects
Computer science, business.industry, Emotion classification, Shot (filmmaking), Skeleton (category theory), computer.software_genre, Zero (linguistics), Body language, Artificial Intelligence, Emotional expression, Artificial intelligence, business, computer, Classifier (UML), Natural language processing, Gesture
- Abstract
In human-human interaction, body language is one of the most important forms of emotional expression. However, each emotion category contains abundant emotional body gestures, and the basic emotions used in most research can hardly describe complex and diverse emotional states. It is costly to collect sufficient samples of all emotional expressions, and new emotions or new body gestures that are not included in the training set may appear during testing. To address the above problems, we design a novel mechanism that treats each emotion category as a collection of multiple body gesture categories to make better use of gesture information for emotion recognition. A Generalized Zero-Shot Learning (GZSL) framework is introduced to recognize both seen and unseen body gesture categories with the help of semantic information, and emotion predictions are further provided based on the relationship between gestures and emotions. This framework consists of two branches. The first branch is a Hierarchical Prototype Network (HPN), which learns the prototypes of body gestures and uses them to calculate the emotion-attentive prototypes. This branch aims to obtain predictions on samples of the seen gesture categories. The second branch is a Semantic Auto-Encoder (SAE), which utilizes semantic representations to predict samples of unseen gesture categories. Thresholds are further trained to determine which branch result will be used during testing, and the emotion labels are finally obtained from these results. Comprehensive experiments are conducted on an emotion recognition dataset which contains skeleton data of multiple body gestures, and the performance of our framework is superior to both the traditional emotion classifier and state-of-the-art zero-shot learning methods.
- Published
- 2021
20. Symmetrical Enhanced Fusion Network for Skeleton-Based Action Recognition
- Author
-
Min Jiang, Haoyang Deng, and Jun Kong
- Subjects
Fusion ,Computer science ,business.industry ,Feature extraction ,Pattern recognition ,Skeleton (category theory) ,Field (computer science) ,Data modeling ,Media Technology ,Feature (machine learning) ,Graph (abstract data type) ,Artificial intelligence ,Electrical and Electronic Engineering ,Focus (optics) ,business - Abstract
A novel method for skeleton-based action recognition that fuses multi-level spatial features and multi-level temporal features is proposed in this article. Recently, Graph Convolutional Networks (GCNs) for skeleton-based action recognition have attracted the attention of many researchers and perform well in the field of action recognition. But most of these methods focus on changing the architecture of a single-stream network and use only simple methods such as average fusion to fuse different forms of skeleton data. In this article, we shift the focus to the problem that interactions between the different forms of features are insufficient, so that networks are unable to fully capture effective information from skeleton data. To tackle this problem, we propose a multi-stream network called Symmetrical Enhanced Fusion Network (SEFN). The network is composed of a spatial stream, a temporal stream and a fusion stream. The spatial stream extracts spatial features from skeleton data by GCN. The temporal stream is able to extract temporal features from skeleton data with the help of the embedded Motion Sequence Calculation Algorithm. The fusion stream provides an early fusion method and extra fusion information for the whole network. It gathers multi-level features from the two feature extractions and fuses them with the Multi-perspective Attention Fusion Module (MPAFM) we propose. The MPAFM enables different forms of data to enhance each other and can strengthen feature extraction. Finally, we generalize the skeleton data from joint data to bone data and evaluate our network on three large-scale benchmarks: NTU-RGBD, NTU-RGBD 120 and Kinetics-Skeleton. Experimental results demonstrate that our method achieves competitive performance.
- Published
- 2021
21. Fusion of Skeleton and Inertial Data for Human Action Recognition Based on Skeleton Motion Maps and Dilated Convolution
- Author
-
Mingshu He, Tianqi Lv, Xiaojuan Wang, Lei Jin, and Ziliang Gan
- Subjects
Modality (human–computer interaction) ,Computer science ,business.industry ,Deep learning ,Skeleton (category theory) ,Convolutional neural network ,Convolution ,Data set ,Activity recognition ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,F1 score ,business ,Instrumentation - Abstract
Human activity recognition (HAR) has become a hot research topic owing to its wide range of potential applications. Fusion-based methods can be used to complement single sensing modality methods. This paper presents the simultaneous utilization of skeleton data and inertial signals (captured at the same time using a Kinect depth camera and ten wearable inertial sensors) within a fusion framework, in order to achieve more robust human action recognition than when each sensing modality is used individually. Skeleton data captured by the Kinect depth camera are transformed into a weighted front-view skeleton motion map (WF-SMM), a weighted multi-view skeleton motion map (WM-SMM), and a 3D weighted skeleton motion map (3DW-SMM), which are then fed as inputs into a convolutional neural network. Meanwhile, the inertial data are transformed into 2D inertial images, then fed into a 2D dilated convolutional neural network. Two types of fusion are considered: decision-level fusion and feature-level fusion. Experiments were conducted using the publicly available Changzhou University Multimodal Human Action Data set (CZU-MHAD), in which simultaneous skeleton sequences and inertial signals were captured for a total of 22 actions. The results obtained indicate that both the decision- and feature-level fusion approaches yield higher recognition accuracies than the approaches where each sensing modality is used individually. The highest accuracy (98.90%) was obtained with the decision-level fusion approach using 3DW-SMM. In addition, experiments were conducted on continuous action streams generated from CZU-MHAD with different score thresholds; the highest F1 score, 82.05%, was obtained with a threshold of 0.4.
- Published
- 2021
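As a rough illustration of the two fusion types named in entry 21, the sketch below contrasts them in plain Python. The weighted-averaging rule, the function names, and the toy numbers are assumptions made for illustration; the abstract does not state which fusion rule the authors use.

```python
def decision_level_fusion(probs_a, probs_b, weight_a=0.5):
    """Fuse two classifiers' class-probability vectors by weighted averaging.

    Assumed rule (not taken from the abstract): each modality is classified
    separately and only the class scores are combined.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both modalities must score the same set of classes")
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]
    # Predicted label is the index of the largest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


def feature_level_fusion(features_a, features_b):
    """Concatenate per-modality feature vectors.

    A single joint classifier (not shown) would consume the result.
    """
    return list(features_a) + list(features_b)


# Toy scores for three actions from a skeleton model and an inertial model.
skeleton_probs = [0.6, 0.3, 0.1]
inertial_probs = [0.2, 0.7, 0.1]
label, fused = decision_level_fusion(skeleton_probs, inertial_probs)
```

In the first function the modalities interact only through their final scores, while in the second they are merged before any classification takes place.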
22. Integrating vertex and edge features with Graph Convolutional Networks for skeleton-based action recognition
- Author
-
Kai Liu, Lei Gao, Lin Qi, Naimul Mefraz Khan, and Ling Guan
- Subjects
Conditional random field ,0209 industrial biotechnology ,business.industry ,Computer science ,Cognitive Neuroscience ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,02 engineering and technology ,Function (mathematics) ,Skeleton (category theory) ,Computer Science Applications ,Convolution ,Vertex (geometry) ,020901 industrial engineering & automation ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,RGB color model ,020201 artificial intelligence & image processing ,Artificial intelligence ,Enhanced Data Rates for GSM Evolution ,business ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
Methods based on Graph Convolutional Networks (GCN) for skeleton-based action recognition have achieved great success due to their ability to exploit graph structural information from skeleton data. Recently, the bone information has attracted considerable attention as an effective modality which complements the more conventional joint information for action recognition. However, most existing GCN-based methods extract the joint and bone features with two separate GCN networks, ignoring the dependencies between them. In this paper, a novel GCN model is proposed to exploit the information across joints, bones and their relationship collaboratively on a single undirected graph instead of two separate networks. We call the proposed model Vertex-Edge Graph Convolutional Network (VE-GCN) since it conducts the graph convolution operation on the sampling area containing the designated vertexes from joints and edges from bones, respectively. In addition to conducting the Vertex-Edge graph convolution based on the physical connections of the skeleton, we further apply the Vertex-Edge graph convolution to the non-physical joint-joint and joint-bone connections to capture the distal dependencies, and then the convolution results on the non-physical connections are incorporated into the VE-GCN. Moreover, the Conditional Random Field (CRF) is adopted as the loss function to achieve the task of action recognition. Experimental results on four challenging benchmarks (NTU RGB+D, NTU RGB+D 120, N-UCLA, SYSU) show that the proposed model achieves state-of-the-art performance.
- Published
- 2021
23. Fusion in Dissimilarity Space Between RGB-D and Skeleton for Person Re-Identification
- Author
-
Mahmudul Hasan, Amran Bhuiyan, and Kamal Uddin
- Subjects
Fusion ,General Computer Science ,Dissimilarity space ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Skeleton (category theory) ,Re identification ,Mechanics of Materials ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Civil and Structural Engineering - Abstract
Person re-identification (Re-id) is one of the important tools of video surveillance systems, which aims to recognize an individual across the multiple disjoint sensors of a camera network. Despite the recent advances in RGB camera-based person re-identification methods under normal lighting conditions, Re-id researchers have failed to take advantage of the additional information provided by modern RGB-D sensors (e.g., depth and skeleton information). When traditional RGB-based cameras fail to capture video under poor illumination conditions, this additional RGB-D sensor information can help to overcome these constraints. This work takes depth images and skeleton joint points as additional information along with RGB appearance cues and proposes a person re-identification method. We combine 4-channel RGB-D image features with skeleton information using a score-level fusion strategy in dissimilarity space to increase re-identification accuracy. Moreover, our proposed method overcomes the illumination problem because we use illumination-invariant depth images and skeleton information. We carried out rigorous experiments on two publicly available RGBD-ID re-identification datasets and showed that the combined features of 4-channel RGB-D images and skeleton information boost the rank-1 recognition accuracy.
- Published
- 2021
24. Practical 3D human skeleton tracking based on multi-view and multi-Kinect fusion
- Author
-
Ching-Chun Hsiao, Wen-Huang Cheng, Ching-Chun Huang, and Manh-Hung Nguyen
- Subjects
Propagation of uncertainty ,Computer Networks and Communications ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Probabilistic logic ,Tracking system ,Skeleton (category theory) ,Tracking (particle physics) ,Computer graphics ,Consistency (database systems) ,Human skeleton ,medicine.anatomical_structure ,Hardware and Architecture ,Media Technology ,medicine ,Computer vision ,Artificial intelligence ,business ,Software ,ComputingMethodologies_COMPUTERGRAPHICS ,Information Systems - Abstract
In this paper, we propose a multi-view system for 3D human skeleton tracking based on multi-cue fusion. Multiple Kinect version 2 cameras are used to build a low-cost system. Though Kinect cameras can detect 3D skeletons from their depth sensors, some challenges of skeleton extraction still exist, such as left–right confusion and severe self-occlusion. Moreover, human skeleton tracking systems often have difficulty in dealing with lost tracking. These challenges make robust 3D skeleton tracking nontrivial. To address these challenges in a unified framework, we first correct the skeleton's left–right ambiguity by referring to the human joints extracted by OpenPose. Unlike Kinect, OpenPose extracts target joints by learning-based image analysis to differentiate a person's front side and back side. With help from 2D images, we can correct the left–right skeleton confusion. On the other hand, we find that self-occlusion severely degrades Kinect joint detection owing to incorrect joint depth estimation. To alleviate the problem, we reconstruct a reference 3D skeleton by back-projecting the corresponding 2D OpenPose joints from multiple cameras. The reconstructed joints are less sensitive to occlusion and can serve as 3D anchors for skeleton fusion. Finally, we introduce inter-joint constraints into our probabilistic skeleton tracking framework to trace all joints simultaneously. Unlike conventional methods that treat each joint individually, neighboring joints are utilized to position each other. In this way, when joints are missing due to occlusion, the inter-joint constraints can ensure skeleton consistency and preserve the length between neighboring joints. In the end, we evaluate our method with five challenging actions by building a real-time demo system. It shows that the system can track skeletons stably without error propagation and vibration. The experimental results also reveal that the average localization error is smaller than that of conventional methods.
- Published
- 2021
25. Robust gait recognition using hybrid descriptors based on Skeleton Gait Energy Image
- Author
-
Wankou Yang, Qiang Wu, Jian Zhang, Worapan Kusakunniran, Zhenmin Tang, and Lingxiang Yao
- Subjects
Computer science ,business.industry ,Pattern recognition ,Skeleton (category theory) ,Gait ,Image (mathematics) ,Identification (information) ,Gait (human) ,Artificial Intelligence ,Robustness (computer science) ,Signal Processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Representation (mathematics) ,Software ,Energy (signal processing) - Abstract
Gait features have been widely applied in human identification. The commonly used representations for gait recognition can be roughly classified into two categories: model-free features and model-based features. However, owing to view variances and clothing changes, model-free features are sensitive to appearance changes. For model-based features, there is great difficulty in extracting the underlying models from gait sequences. Based on the confidence maps and the part affinity fields produced by a two-branch multi-stage CNN, a new model-based representation, Skeleton Gait Energy Image (SGEI), is proposed in this paper. Another contribution is a hybrid representation, which uses SGEI to remedy the deficiency of model-free features such as Gait Energy Image (GEI). The experimental results indicate that our proposed methods are more robust to clothing changes, and contribute to increasing the robustness of gait recognition in unconstrained environments with view variances and clothing changes.
- Published
- 2021
26. Skeletonization via Local Separators
- Author
-
Eva Rotenberg and Andreas Bærentzen
- Subjects
Computational Geometry (cs.CG) ,FOS: Computer and information sciences ,Computer science ,Computation ,Vertex separator ,Skeleton (category theory) ,Computer Graphics and Computer-Aided Design ,Graphics (cs.GR) ,Skeletonization ,Computer Science - Graphics ,Computer Science - Computational Geometry ,Graph (abstract data type) ,Algorithm ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
We propose a new algorithm for curve skeleton computation which differs from previous algorithms by being based on the notion of local separators. The main benefits of this approach are that it is able to capture relatively fine details and that it works robustly on a range of shape representations. Specifically, our method works on shape representations that can be construed as spatially embedded graphs. Such representations include meshes, volumetric shapes, and graphs computed from point clouds. We describe a simple pipeline where geometric data is initially converted to a graph, optionally simplified, local separators are computed and selected, and finally a skeleton is constructed. We test our pipeline on polygonal meshes, volumetric shapes, and point clouds. Finally, we compare our results to other methods for skeletonization according to performance and quality. Comment: preprint, 25 pages, 17 figures
- Published
- 2021
27. An improved $\ell_1$ median model for extracting 3D human body curve-skeleton
- Author
-
Chen Lufei, Shaofan Wang, Baocai Yin, Yong Zhang, and Fei Tan
- Subjects
Computer Networks and Communications ,business.industry ,Computer science ,Frame (networking) ,Pattern recognition ,Animation ,Skeleton (category theory) ,Image (mathematics) ,Hardware and Architecture ,Media Technology ,Multimedia information systems ,Artificial intelligence ,business ,Pose ,Computer communication networks ,Software ,Interpolation - Abstract
The three-dimensional human body curve-skeleton is widely used in pose estimation, skeleton animation and other fields. This paper proposes an improved $\ell_1$ median model that can extract a three-dimensional human body curve-skeleton. The model includes three-dimensional human body reconstruction from multi-view images, interpolation curve-skeleton extraction, $\ell_1$ median skeleton completion, and continuous frame curve-skeleton optimization. Through the completion and optimization processes, the curve-skeleton we extract is smoother and more complete compared with previous methods. We conduct experiments on a multi-view human body image dataset collected with a light field acquisition system. Both quantitative and qualitative results demonstrate the effectiveness of our model.
- Published
- 2021
28. Focus on temporal graph convolutional networks with unified attention for skeleton-based action recognition
- Author
-
Bing-Kun Gao, Yun-Ze Bi, Hongbo Bi, and Le Dong
- Subjects
business.industry ,Computer science ,Pattern recognition ,Skeleton (category theory) ,Action (philosophy) ,Artificial Intelligence ,Basic block ,Action recognition ,Graph (abstract data type) ,Artificial intelligence ,business ,Focus (optics) ,Spatial analysis ,Communication channel - Abstract
Graph convolutional networks (GCNs) have received more and more attention in skeleton-based action recognition. Many existing GCN models pay more attention to spatial information and ignore temporal information, but the completion of actions must be accompanied by changes in temporal information. Besides, the channel, spatial, and temporal dimensions often contain redundant information. In this paper, we design a temporal graph convolutional network (FTGCN) module which can focus on more temporal information and properly balance it for each action. In order to better integrate channel, spatial and temporal information, we propose a unified channel, spatial and temporal attention model (CSTA). A basic block containing these two novelties is called FTC-GCN. Extensive experiments on two large-scale datasets, with comparisons against 17 methods on NTU-RGB+D and 8 methods on Kinetics-Skeleton, show that for skeleton-based human action recognition, our method achieves the best performance.
- Published
- 2021
29. Online human action recognition with spatial and temporal skeleton features using a distributed camera network
- Author
-
Ze Ji, Guohui Tian, Qinghui Zhang, Yichao Cao, and Guoliang Liu
- Subjects
Human-Computer Interaction ,Camera network ,Artificial Intelligence ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Action recognition ,Computer vision ,Artificial intelligence ,Skeleton (category theory) ,business ,Software ,Theoretical Computer Science - Abstract
Online action recognition is an important task for human-centered intelligent services. However, it remains a highly challenging problem due to the high variability and uncertainty of the spatial and temporal scales of human actions. In this paper, the following core ideas are proposed to deal with the online action recognition problem. First, we combine spatial and temporal skeleton features to represent human actions, which include not only geometrical features, but also multiscale motion features, such that both spatial and temporal information of the actions are covered. We use an efficient one-dimensional convolutional neural network to fuse spatial and temporal features and train them for action recognition. Second, we propose a group sampling method to combine the previous action frames and current action frames, which is based on the hypothesis that neighboring frames are largely redundant; the sampling mechanism ensures that long-term contextual information is also considered. Third, the skeletons from multiview cameras are fused in a distributed manner, which can improve the human pose accuracy in the case of occlusions. Finally, we propose a RESTful-style client-server service architecture to deploy the proposed online action recognition module on a remote server as a public service, so that camera networks with limited onboard computational resources can benefit from this architecture. We evaluated our model on the JHMDB and UT-Kinect data sets, where it achieved highly promising accuracy levels of 80.1% and 96.9%, respectively. Our online experiments show that our memory group sampling mechanism is far superior to the traditional sliding window.
- Published
- 2021
30. Scene image and human skeleton-based dual-stream human action recognition
- Author
-
Yibin Li, Chengjin Zhang, Yong Song, Qingyang Xu, Xianfeng Yuan, and Wanqiang Zheng
- Subjects
business.industry ,Computer science ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,DUAL (cognitive architecture) ,Skeleton (category theory) ,Expression (mathematics) ,Convolution ,Human skeleton ,medicine.anatomical_structure ,Artificial Intelligence ,Signal Processing ,medicine ,Graph (abstract data type) ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
The dual stream-based human action recognition model offers the advantage of high recognition accuracy, but the algorithm is less robust in case of lighting changes. The human skeleton has a strong ability to express human behavior and actions; however, the scene information is ignored. Drawing on the idea of the dual-stream model, this paper proposes a human skeleton and scene image-based dual-stream model for human action recognition. The motion features are extracted through the spatio-temporal graph convolution of the human skeleton, and a scene recognition model is proposed based on the sparse frame sampling of video and video-level consensus strategy to process the scene video and gather the visual scene information. The proposed model exploits the advantages of skeleton information in motion expression and the superiority of the image in scene presentation. The scene information and spatio-temporal graph convolution-based human skeleton limbs are fused complementarily to achieve human action recognition. Compared to the conventional optical flow-based dual-stream action recognition method, this model is verified by experimenting under unstable light conditions, and the performance of human action recognition is robust and promising.
- Published
- 2021
31. Towards a deep human activity recognition approach based on video to image transformation with skeleton data
- Author
-
Mourad Zaied, Nozha Jlidi, Olfa Jemai, Tahani Bouchrika, and Ahmed Snoun
- Subjects
Computer Networks and Communications ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Construct (python library) ,Skeleton (category theory) ,Image (mathematics) ,Activity recognition ,Superposition principle ,Hardware and Architecture ,Media Technology ,Body joints ,Image transformation ,Artificial intelligence ,Transfer of learning ,business ,Software - Abstract
One of the most recent challenging tasks in computer vision is Human Activity Recognition (HAR), which aims to analyze and detect human actions for the benefit of many fields such as video surveillance, behavior analysis and healthcare. Several works in the literature are based on the extraction and analysis of human skeletons for the purpose of action recognition. This paper introduces a new HAR approach based on the extraction of human skeletons from videos. Three feature extraction techniques are proposed in this work. They use the skeletons extracted from the video frames in order to construct a single image that summarizes the activity in that video. The first technique, called dynamic skeleton, is founded on the concept of dynamic images introduced in the literature, while the second one, called skeleton superposition, is based on the superposition of the extracted human skeletons in the same image. The third contribution is called body articulations and it uses only the body joints instead of the whole skeleton in order to recognize the ongoing activity. The images obtained from these three techniques are analyzed and classified using a classification system based on the transfer learning principle by fine-tuning three well-known pre-trained CNNs (MobileNet, ResNet-50, VGG16). The designed system is validated and tested on two well-known datasets for human activity recognition, the RGBD-HuDact and KTH datasets. The obtained results show that the implemented system outperforms the state-of-the-art approaches.
- Published
- 2021
32. Two-stream adaptive-attentional subgraph convolution networks for skeleton-based action recognition
- Author
-
Fengda Zhao, Fengwei Lou, Xianshan Li, Rong Jing, Dingding Guo, and Fengchan Meng
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,Pattern recognition ,Skeleton (category theory) ,Domain (software engineering) ,Convolution ,Human skeleton ,medicine.anatomical_structure ,Hardware and Architecture ,Media Technology ,medicine ,Graph (abstract data type) ,Artificial intelligence ,business ,Focus (optics) ,Software ,Communication channel ,Block (data storage) - Abstract
Recently, skeleton-based action recognition methods have modeled the human skeleton as a graph and applied graph convolution networks (GCNs), achieving remarkable results. However, most of these methods convolve directly on the whole graph, neglecting that the human skeleton is made up of multiple body parts, which limits their performance. We recognize that the physical properties of bones (i.e., length and direction) can provide identifiable information which helps to build the multi-level network structure effectively. As the existing methods treat the channel domain and the spatial domain with equal importance, many computing resources are wasted on negligible features. In our paper, we modify the Convolutional Block Attention Module (CBAM) and apply it to the adaptive network. By capturing the implicit weighted information in the channel domain and spatial domain, the network can focus more attention on the key channels and nodes. A new two-stream adaptive-attentional subgraph convolution network (2s-AASGCN) is proposed to extract features in the spatio-temporal domain. We validate 2s-AASGCN on two skeleton datasets, i.e., NTU-RGB+D60 and NTU-RGB+D120. Our model achieves excellent results on these two datasets.
- Published
- 2021
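The bone properties mentioned in entry 32 (length and direction) can be derived from joint coordinates as the difference between a joint and its parent. The sketch below shows one common way to do this; the three-joint skeleton, the parent list, and the function name are hypothetical and are not taken from the paper.

```python
import math


def bone_features(joints, parents):
    """Derive bone vectors, lengths and unit directions from joint positions.

    joints:  list of (x, y, z) coordinates, one per joint.
    parents: parents[i] is the index of joint i's parent, or -1 for the root.
    """
    bones = []
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the root joint has no incoming bone
        vector = tuple(c - p for c, p in zip(joints[child], joints[parent]))
        length = math.sqrt(sum(v * v for v in vector))
        # Guard against coincident joints to avoid division by zero.
        if length > 0:
            direction = tuple(v / length for v in vector)
        else:
            direction = tuple(0.0 for _ in vector)
        bones.append({
            "child": child,
            "parent": parent,
            "vector": vector,
            "length": length,
            "direction": direction,
        })
    return bones


# Hypothetical chain: root -> joint 1 -> joint 2.
joints = [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0), (1.0, 3.0, 4.0)]
parents = [-1, 0, 1]
bones = bone_features(joints, parents)
```

A two-stream model would typically feed the joint coordinates to one stream and bone vectors such as these to the other.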
33. Hierarchical Extraction of Skeleton Structures from Discrete Buildings
- Author
-
Xiao Wang and Dirk Burghardt
- Subjects
Cartographic generalization ,business.industry ,Computer science ,Process (computing) ,Extraction (military) ,Pattern recognition ,Artificial intelligence ,Network theory ,Skeleton (category theory) ,business ,Centrality ,Earth-Surface Processes - Abstract
Map generalization is a process of hierarchically reorganizing features whereby the global shape of the original datasets can be transferred in different scales. We propose a stroke and centrality-...
- Published
- 2021
34. Computing Melodic Templates in Oral Music Traditions
- Author
-
José Miguel Díaz-Báñez, Sergey Bereg, Nadine Kroher, and Inmaculada Ventura
- Subjects
Melody ,FOS: Computer and information sciences ,0209 industrial biotechnology ,Sound (cs.SD) ,Optimization problem ,Computer science ,02 engineering and technology ,Variation (game tree) ,Skeleton (category theory) ,computer.software_genre ,Computer Science - Sound ,Computer Science - Information Retrieval ,Piecewise linear function ,Set (abstract data type) ,020901 industrial engineering & automation ,Audio and Speech Processing (eess.AS) ,0202 electrical engineering, electronic engineering, information engineering ,FOS: Electrical engineering, electronic engineering, information engineering ,Continuous function ,business.industry ,Applied Mathematics ,020206 networking & telecommunications ,Term (logic) ,Computational Mathematics ,Artificial intelligence ,business ,computer ,Natural language processing ,Information Retrieval (cs.IR) ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The term melodic template or skeleton refers to a basic melody which is subject to variation during a music performance. In many oral music traditions, these templates are implicitly passed down through generations without ever being formalized in a score. In this work, we introduce a new geometric optimization problem, the spanning tube problem, to approximate a melodic template for a set of labeled performance transcriptions corresponding to a specific style in oral music traditions. Given a set of $n$ piecewise linear functions, we solve the problem of finding a continuous function, $f^*$, and a minimum value, $\varepsilon^*$, such that the vertical segment of length $2\varepsilon^*$ centered at $(x,f^*(x))$ intersects at least $p$ functions ($p\leq n$). The method explored here also provides a novel tool for quantitatively assessing the amount of melodic variation which occurs across performances.
- Published
- 2022
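To make the spanning tube condition in entry 34 concrete, the sketch below looks at a single abscissa $x$. There, the smallest half-height for which some vertical segment meets at least $p$ of the $n$ function values is half the width of the narrowest window of $p$ consecutive sorted values. Taking the maximum over sample points gives only a lower bound on $\varepsilon^*$, because it ignores the requirement that $f^*$ be continuous. This is not the authors' algorithm, and all function names and the toy data are assumptions.

```python
from bisect import bisect_right


def evaluate(breakpoints, x):
    """Evaluate a piecewise linear function given as sorted (x, y) breakpoints.

    x is assumed to lie within the function's domain.
    """
    xs = [bx for bx, _ in breakpoints]
    i = bisect_right(xs, x) - 1
    i = max(0, min(i, len(breakpoints) - 2))  # clamp to a valid segment
    (x0, y0), (x1, y1) = breakpoints[i], breakpoints[i + 1]
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


def pointwise_tube(values, p):
    """Return (center, half_height) of the shortest segment covering p values."""
    if not 1 <= p <= len(values):
        raise ValueError("p must satisfy 1 <= p <= n")
    ordered = sorted(values)
    # Narrowest window of p consecutive sorted values.
    width, start = min(
        (ordered[i + p - 1] - ordered[i], i)
        for i in range(len(ordered) - p + 1)
    )
    center = (ordered[start] + ordered[start + p - 1]) / 2
    return center, width / 2


def lower_bound_epsilon(functions, sample_xs, p):
    """Largest pointwise half-height over the samples (a lower bound only)."""
    return max(
        pointwise_tube([evaluate(f, x) for f in functions], p)[1]
        for x in sample_xs
    )


# Three toy "transcriptions" on the domain [0, 2].
functions = [
    [(0.0, 0.0), (2.0, 2.0)],
    [(0.0, 1.0), (2.0, 1.0)],
    [(0.0, 4.0), (2.0, 0.0)],
]
bound = lower_bound_epsilon(functions, [0.0, 1.0, 2.0], p=2)
```

Sampling more densely tightens the bound, but a full solution must also choose the centers so that they trace a continuous curve.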
35. The Log Skeleton Visualizer in ProM 6.9: The winning contribution to the process discovery contest 2019
- Author
-
H. M. W. Verbeek, Process Science, and Process Analytics
- Subjects
Event logs ,Theoretical computer science ,Relation (database) ,Computer science ,Event (computing) ,020208 electrical & electronic engineering ,Process discovery ,Process mining ,020206 networking & telecommunications ,02 engineering and technology ,Skeleton (category theory) ,Field (computer science) ,Business process discovery ,Set (abstract data type) ,Theory of computation ,0202 electrical engineering, electronic engineering, information engineering ,Process discovery contest ,Software ,Log skeletons ,Information Systems - Abstract
Process discovery is an important area in the field of process mining. To help advance this area, a process discovery contest (PDC) has been set up, which allows us to compare different approaches. At the moment of writing, there have been three instances of the PDC: in 2016, in 2017, and in 2019. This paper introduces the winning contribution to the PDC 2019, called the Log Skeleton Visualizer. This visualizer uses a novel type of process models called log skeletons. In contrast with many workflow net-based discovery techniques, these log skeletons do not rely on the directly follows relation. As a result, log skeletons offer circumstantial information on the event log at hand rather than only sequential information. Using this visualizer, we were able to classify 898 out of 900 traces correctly for the PDC 2019 and to win this contest.
- Published
- 2022
36. Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks
- Author
-
Van-Toi Nguyen, Cuong Pham, Ngon Nguyen, Linh T. Nguyen, and Anh Duy Nguyen
- Subjects
Modality (human–computer interaction) ,Computer Networks and Communications ,business.industry ,Computer science ,STRIDE ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Skeleton (category theory) ,Convolution ,Activity recognition ,Acceleration ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Feature (machine learning) ,Artificial intelligence ,business ,Feature learning ,Software - Abstract
A single sensing modality has been widely adopted for human activity recognition (HAR) for decades and has made significant strides. However, it often suffers from challenges such as noise, obstacles, or dropped signals, which might negatively impact recognition performance. In this paper, we propose a multi-sensing modality framework for human fine-grained activity recognition and abnormal behaviour detection by combining skeleton and acceleration data at feature level (so-called feature-level fusion). Firstly, deep temporal convolutional networks (TCN), consisting of the dilated causal convolution components, are utilized for feature learning and handling temporal properties. The feature map learnt and represented with convolutional layers in TCN is fed into two fully connected layers for the prediction. Secondly, we conduct an empirical experiment to verify our proposed method. Experimental results have shown that the proposed method could achieve 83% F1-score and surpassed several single modality models as well as early and late fusion methods on the Continuous Multimodal Multi-view Dataset of Human Fall Dataset (CMDFALL), comprised of 20 fine-grained normal and abnormal activities collected from 50 subjects. Moreover, our proposed architecture achieves 96.98% accuracy on the UTD-MHAD dataset, which has 8 subjects and 27 activities. These results indicate the effectiveness of our proposed method for the classification of human fine-grained normal and abnormal activities as well as the potential for HAR-based situated service applications.
- Published
- 2021
- Full Text
- View/download PDF
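The dilated causal convolution named in the abstract above can be sketched in a few lines. This is a generic single-channel illustration, not the authors' network: the function names, the example weights, and the zero padding before the start of the sequence are assumptions.

```python
def dilated_causal_conv1d(x, weights, dilation=1):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the sequence are treated as zero, so
    y[t] never depends on any x[s] with s > t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack with one convolution per dilation value."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel of size 2 and dilations 1, 2 and 4, the stack sees 8 time steps; doubling the dilation at each layer is what lets such networks cover long sequences with few layers.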
37. Cultural Heritage Use in the Twenty-first Century: The Politics of a Sami Skeleton Reburial
- Author
-
Anders Hansson
- Subjects
Cultural heritage ,Archeology ,Politics ,Computer science ,Twenty-First Century ,Ancient history ,Skeleton (category theory) - Published
- 2021
- Full Text
- View/download PDF
38. Predictively encoded graph convolutional network for noise-robust skeleton-based action recognition
- Author
-
Jongmin Yu, Yongsang Yoon, and Moongu Jeon
- Subjects
Current (mathematics) ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,02 engineering and technology ,Mutual information ,Skeleton (category theory) ,Interference (wave propagation) ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Action recognition ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,Noise (video) ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
In skeleton-based action recognition, graph convolutional networks (GCNs), which model human body skeletons using graphical components such as nodes and connections, have recently achieved remarkable performance. While the current state-of-the-art methods for skeleton-based action recognition usually assume that completely observed skeletons will be provided, it is problematic to realize this assumption in real-world scenarios since the captured skeletons may be incomplete or noisy. In this work, we propose a skeleton-based action recognition method that is robust to noise interference for the given skeleton features. The key insight of our approach is to train a model by maximizing the mutual information between normal and noisy skeletons using predictive coding in the latent space. We conducted comprehensive skeleton-based action recognition experiments with defective skeletons using the NTU-RGB+D and Kinetics-Skeleton datasets. The experimental results demonstrate that when the skeleton samples are noisy, our approach achieves outstanding performances compared with the existing state-of-the-art methods.
- Published
- 2021
- Full Text
- View/download PDF
39. Efficient Retrieval of Human Motion Episodes Based on Indexed Motion-Word Representations
- Author
-
Jan Horvath, Jan Sedmidubsky, Pavel Zezula, and Petra Budikova
- Subjects
Linguistics and Language ,Computer Networks and Communications ,business.industry ,Computer science ,Search engine indexing ,02 engineering and technology ,Skeleton (category theory) ,Human motion ,Motion (physics) ,Computer Science Applications ,Artificial Intelligence ,020204 information systems ,Similarity (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Focus (optics) ,business ,Software ,Word (computer architecture) ,Information Systems - Abstract
With the increasing availability of human motion data captured in the form of 2D or 3D skeleton sequences, more complex motion recordings need to be processed. In this paper, we focus on similarity-based indexing and efficient retrieval of motion episodes — medium-sized skeleton sequences that consist of multiple semantic actions and correspond to some logical motion unit (e.g. a figure skating performance). As a first step toward efficient retrieval, we apply the motion-word technique to transform spatio-temporal skeleton sequences into compact text-like documents. Based on these documents, we introduce a two-phase retrieval scheme that first finds a set of candidate query results and then re-ranks these candidates with more expensive application-specific methods. We further index the motion-word documents using inverted files, which allows us to retrieve the candidate documents in an efficient and scalable manner. We also propose additional query-reduction techniques that accelerate both the retrieval phases by removing semantically irrelevant parts of the motion query. Experimental evaluation is used to analyze the effects of the individual proposed techniques on the retrieval efficiency and effectiveness.
- Published
- 2021
- Full Text
- View/download PDF
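The two-phase scheme described in the abstract above (candidate selection through inverted files, then re-ranking with a costlier measure) can be sketched as follows. This is an illustration under assumptions, not the authors' system: motion words are represented as plain strings, candidates are scored by the number of shared distinct words, and edit distance stands in for the "more expensive application-specific methods".

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each motion word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, words in documents.items():
        for word in words:
            index[word].add(doc_id)
    return index


def candidates(index, query, k):
    """Phase 1: return up to k ids sharing the most distinct words with the query."""
    counts = defaultdict(int)
    for word in set(query):
        for doc_id in index.get(word, ()):
            counts[doc_id] += 1
    return sorted(counts, key=lambda d: (-counts[d], d))[:k]


def edit_distance(a, b):
    """Levenshtein distance between two word sequences (placeholder re-ranker)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def rerank(cands, documents, query, distance=edit_distance):
    """Phase 2: order the candidates by the more expensive distance."""
    return sorted(cands, key=lambda d: distance(documents[d], query))
```

Only documents that share at least one word with the query are ever touched in phase 1, which is what makes the inverted file cheap compared with scanning every sequence.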
40. The Smart Skeleton: an open-source, interactive tool for teaching muscle actions and joint movements
- Author
-
John M. Pattillo
- Subjects
020205 medical informatics ,Physiology ,Computer science ,Software tool ,02 engineering and technology ,Skeleton (category theory) ,Education ,Software ,Human–computer interaction ,0202 electrical engineering, electronic engineering, information engineering ,Humans ,Students ,Skeleton ,ComputingMethodologies_COMPUTERGRAPHICS ,business.industry ,Muscles ,Teaching ,05 social sciences ,050301 education ,General Medicine ,Test (assessment) ,Open source ,Joint (building) ,business ,0503 education - Abstract
This paper describes the design, construction, and use of an open-source hardware and software tool intended to help Anatomy and Physiology students test their knowledge of muscle actions and joint movements. Orientation sensors are attached to a model skeleton to turn the skeleton into an interactive, physical model for teaching limb movements. A detailed description of the construction of the tool is provided, as well as the configuration and use of companion software.
- Published
- 2021
- Full Text
- View/download PDF
41. Compact joints encoding for skeleton-based dynamic hand gesture recognition
- Author
-
Dongyang Ma, Yangke Li, Yuhang Yu, Guangshun Wei, and Yuanfeng Zhou
- Subjects
Convex hull ,Computer science ,business.industry ,General Engineering ,020207 software engineering ,02 engineering and technology ,Skeleton (category theory) ,Computer Graphics and Computer-Aided Design ,Human-Computer Interaction ,Gesture recognition ,Encoding (memory) ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Motion perception ,business ,Pose ,Gesture - Abstract
With the development of 3D hand pose estimation technologies, skeleton-based dynamic hand gesture recognition has attracted widespread attention. In this paper, we propose a novel framework for skeleton-based dynamic hand gesture recognition. In the spatial perception stream (SP-Stream), we design a compact joints encoding method. It can adaptively select compact joints based on the convex hull of the hand skeleton and encode them into a skeleton image for fully extracting spatial features. Besides, we present a global enhancement module (GEM) to enhance key feature maps. In the temporal perception stream (TP-Stream), we propose a motion perception module (MPM) to enhance the notable movement of hand gestures on X / Y / Z coordinate axes. Experimental results show that the proposed framework performs better than the state-of-the-art methods on two benchmark datasets.
- Published
- 2021
- Full Text
- View/download PDF
42. Sign language recognition using Kinect sensor based on color stream and skeleton points
- Author
-
Isack Bulugu
- Subjects
Discriminative model ,Computer science ,business.industry ,Feature vector ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Recognition system ,Pattern recognition ,Artificial intelligence ,Sign language ,Skeleton (category theory) ,business ,Dictionary learning ,Sign (mathematics) - Abstract
This paper presents a sign language recognition system based on color stream and skeleton points. Several approaches have been established to address sign language recognition problems. However, most of the previous approaches still have poor recognition accuracy. The proposed approach uses a Kinect sensor, based on the color stream and skeleton points from the depth stream, to improve recognition accuracy. Techniques within this approach use hand trajectories and hand shapes in combating sign recognition challenges. Therefore, for a particular sign a representative feature vector is extracted, which consists of hand trajectories and hand shapes. A sparse dictionary learning algorithm, Label Consistent K-SVD (LC-KSVD), is applied to obtain a discriminative dictionary. Based on that, the system was further developed to a new classification approach for better results. The proposed system was fairly evaluated based on 21 sign words including one-handed signs and two-handed signs. It was observed that the proposed system achieves a high recognition accuracy of 98.25%, and obtained an average accuracy of 95.34% for signer independent recognition. Keywords: Sign language, Color stream, Skeleton points, Kinect sensor, Discriminative dictionary.
- Published
- 2021
- Full Text
- View/download PDF
43. Adaptive most joint selection and covariance descriptions for a robust skeleton-based human action recognition
- Author
-
Dinh-Tan Pham, Thi-Lan Le, Van-Toi Nguyen, Tien-Nam Nguyen, and Hai Vu
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Skeleton (category theory) ,Covariance ,Human skeleton ,medicine.anatomical_structure ,Hardware and Architecture ,Robustness (computer science) ,Position (vector) ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,medicine ,Feature (machine learning) ,Benchmark (computing) ,Artificial intelligence ,business ,Representation (mathematics) ,Software - Abstract
In this paper, we propose two effective manners of utilizing skeleton data for human action recognition (HAR). The proposed method on one hand takes advantage of the skeleton data thanks to their robustness to human appearance change as well as the real-time performance. On the other hand, it avoids inherent drawbacks of the skeleton data such as noises, incorrect human skeleton estimation due to self-occlusion of human pose. To this end, in terms of feature designing, we propose to extract covariance descriptors from joint velocity and combine them with those of joint position. In terms of 3-D skeleton-based activity representation, we propose two schemes to select the most informative joints. The proposed method is evaluated on two benchmark datasets. On the MSRAction-3D dataset, the proposed method outperformed different hand-designed features-based methods. On the challenging dataset CMDFall, the proposed method significantly improves accuracy when compared with techniques based on recent neuronal networks. Finally, we investigate the robustness of the proposed method via a cross-dataset evaluation.
- Published
- 2021
- Full Text
- View/download PDF
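The covariance descriptors mentioned in the abstract above, built from joint positions and joint velocities, can be sketched as below. This is a minimal illustration, not the authors' pipeline: velocity is approximated by frame-to-frame differences, the joint selection step is omitted, and all function names are assumptions.

```python
def covariance(frames):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(frames), len(frames[0])
    if n < 2:
        raise ValueError("need at least two frames")
    mean = [sum(f[i] for f in frames) / n for i in range(d)]
    return [
        [
            sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def velocities(frames):
    """Approximate joint velocity by differences between consecutive frames."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]


def upper_triangle(matrix):
    """Flatten the upper triangle; a covariance matrix is symmetric."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def descriptor(frames):
    """Concatenate position and velocity covariance features (>= 3 frames)."""
    return upper_triangle(covariance(frames)) + upper_triangle(
        covariance(velocities(frames))
    )
```

Each frame here is a flat list of joint coordinates. The descriptor length depends only on the number of coordinates, not on the number of frames, so sequences of different durations map to vectors of the same size.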
44. Reisolation and Structure Revision of Asperspiropene A
- Author
-
Hwa-Sun Lee, Chang-Su Heo, Van Anh Cao, Hee Jae Shin, and Byeoung-Kyu Choi
- Subjects
Aquatic Organisms ,Computer science ,Structure (category theory) ,Pharmaceutical Science ,Decane ,Skeleton (category theory) ,01 natural sciences ,Analytical Chemistry ,chemistry.chemical_compound ,Computational chemistry ,Drug Discovery ,Animals ,Spiro Compounds ,Pharmacology ,Biological Products ,Molecular Structure ,010405 organic chemistry ,Organic Chemistry ,Absolute configuration ,Porifera ,0104 chemical sciences ,010404 medicinal & biomolecular chemistry ,Aspergillus ,Complementary and alternative medicine ,chemistry ,Molecular Medicine ,Two-dimensional nuclear magnetic resonance spectroscopy - Abstract
Asperspiropene A was originally reported to have a unique 1,8-dioxaspiro[4.5]decane skeleton. During the course of our ongoing research for novel marine natural products, we isolated compound 1, which has identical 1D and 2D NMR data to asperspiropene A. Detailed and careful analysis of spectroscopic data led us to revise the structure of asperspiropene A and to determine its absolute configuration.
- Published
- 2021
- Full Text
- View/download PDF
45. Skeleton-Based Parametric 2-D Region Representation: Disk B-Spline Curves
- Author
-
Feng Tian, Xingce Wang, Shaolong Liu, Hock Soon Seah, Zhongke Wu, and Quan Chen
- Subjects
Computer science ,B-spline ,Boundary (topology) ,020207 software engineering ,02 engineering and technology ,Skeleton (category theory) ,Object (computer science) ,Computer Graphics and Computer-Aided Design ,Medial axis ,0202 electrical engineering, electronic engineering, information engineering ,Representation (mathematics) ,De Boor's algorithm ,Algorithm ,Software ,Parametric statistics - Abstract
The skeleton, or medial axis, is an important attribute of 2-D shapes. The disk B-spline curve (DBSC) is a skeleton-based parametric freeform 2-D region representation, which is defined in the B-spline form. The DBSC describes not only a 2-D region, which is suitable for describing heterogeneous materials in the region, but also the center curve (skeleton) of the region explicitly, which is suitable for animation, simulation, and recognition. In addition to being useful for error estimation of the B-spline curve, the DBSC can be used in designing and animating freeform 2-D regions. Despite increasing DBSC applications, its theory and fundamentals have not been thoroughly investigated. In this article, we discuss several fundamental properties and algorithms, such as the de Boor algorithm for DBSCs. We first derive the explicit evaluation and derivatives formulas at arbitrary points of a 2-D region (interior and boundary) represented by a DBSC and then provide heterogeneous object representation. We also introduce modeling and interactive heterogeneous object design methods for a DBSC, which consolidates DBSC theory and supports its further applications.
- Published
- 2021
- Full Text
- View/download PDF
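Evaluating a disk B-spline curve with the de Boor algorithm, as discussed in the abstract above, can be sketched by treating each control disk as a triple (x, y, r) and blending all three components with the same B-spline weights. This rests on the assumption that the center curve and the radius function share one basis; the function name and the tuple layout are hypothetical, and the derivative formulas from the article are not covered.

```python
def de_boor(t, knots, ctrl, degree):
    """Evaluate a B-spline at parameter t with de Boor's recurrence.

    ctrl is a list of control disks (x, y, r). The result is the center
    (x, y) and the radius r of the disk at parameter t. Valid for
    knots[degree] <= t <= knots[len(ctrl)].
    """
    n = len(ctrl)
    # Locate the knot span k with knots[k] <= t < knots[k + 1].
    k = degree
    while k < n - 1 and t >= knots[k + 1]:
        k += 1
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            alpha = (t - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = [(1 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    return tuple(d[degree])
```

With three control disks on a clamped quadratic knot vector `[0, 0, 0, 1, 1, 1]`, the disk at `t = 0.5` is the usual Bézier blend with weights 0.25, 0.5 and 0.25 applied to centers and radii alike.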
46. Learning view‐invariant features using stacked autoencoder for skeleton‐based gait recognition
- Author
-
Mahedi Hasan and Hossen Asiful Mustafa
- Subjects
Computer science ,business.industry ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Pattern recognition ,Skeleton (category theory) ,Autoencoder ,QA76.75-76.765 ,Gait (human) ,Computer software ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Invariant (mathematics) ,business ,Software - Abstract
Human gait recognition in a multicamera environment is a challenging task in biometrics because of the large pose and illumination variations among different views. In this work, to address the problem of variations in view, we present a novel stacked autoencoder for learning discriminant view‐invariant gait representations. Our autoencoder can efficiently and progressively translate skeleton joint coordinates from any arbitrary view to a common canonical view without requiring the prior estimation of the view angle or covariate type and without losing temporal information. Then, we construct a discriminative gait feature vector by fusing the encoded features with two other spatiotemporal gait features to feed into the main recurrent neural network. Experimental evaluations on the challenging CASIA A and CASIA B gait datasets demonstrate that the proposed approach outperformed other state‐of‐the‐art methods on single‐view gait recognition. In particular, the proposed method achieved 46.31% and 33.86% average correct class recognition on probe sets ProbeBG and ProbeCL, respectively, of the CASIA B dataset while considering the view variation; this is 0.3% and 30.68% higher than previous best‐performing methods. Furthermore, in cross‐view recognition, our method shows better results than other state‐of‐the‐art methods when the view‐angle variation is larger than 36°.
- Published
- 2021
- Full Text
- View/download PDF
47. Inferring object properties from human interaction and transferring them to new motions
- Author
-
Qian Zheng, Hanting Pan, Niloy J. Mitra, Daniel Cohen-Or, Hui Huang, and Weikai Wu
- Subjects
business.industry ,Inertial motion capture ,Computer science ,Inference ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Skeleton (category theory) ,Object (computer science) ,Computer Graphics and Computer-Aided Design ,Motion (physics) ,Computer graphics ,Artificial Intelligence ,Human interaction ,0202 electrical engineering, electronic engineering, information engineering ,Action recognition ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business - Abstract
Humans regularly interact with their surrounding objects. Such interactions often result in strongly correlated motions between humans and the interacting objects. We thus ask: “Is it possible to infer object properties from skeletal motion alone, even without seeing the interacting object itself?” In this paper, we present a fine-grained action recognition method that learns to infer such latent object properties from human interaction motion alone. This inference allows us to disentangle the motion from the object property and transfer object properties to a given motion. We collected a large number of videos and 3D skeletal motions of performing actors using an inertial motion capture device. We analyzed similar actions and learned subtle differences between them to reveal latent properties of the interacting objects. In particular, we learned to identify the interacting object, by estimating its weight, or its spillability. Our results clearly demonstrate that motions and interacting objects are highly correlated and that related object latent properties can be inferred from 3D skeleton sequences alone, leading to new synthesis possibilities for motions involving human interaction. Our dataset is available at http://vcc.szu.edu.cn/research/2020/IT.html.
- Published
- 2021
- Full Text
- View/download PDF
48. Recognition of Hasta Mudra Using Star Skeleton—Preservation of Buddhist Heritage
- Author
-
Gopa Bhaumik and Mahesh Chandra Govil
- Subjects
Computer science ,business.industry ,Buddhism ,Gautama Buddha ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Skeleton (category theory) ,Star (graph theory) ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Zero (linguistics) ,Pattern recognition (psychology) ,Preprocessor ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer ,Natural language processing ,Gesture - Abstract
Nonverbal communication, primarily by way of hand gestures, is as old as human evolution. Before languages and, much later, texts were developed, hand gestures were the only way of human interaction. Hence, in historical artefacts across various civilizations and religions around the world, hand gestures play a predominant role, some more than others. While selecting a subject for the application of this research, we wanted to zero in on a practice or religion where non-verbal communication is a prevalent part, hence our inclination towards Buddhism. In Buddhism, hand gestures (mudras) are considered sacred gestures that represent the different Buddha deities and their significance. The article proposes a system that identifies Buddhist hand gestures (mudras) using computer-aided technology. The system comprises a preprocessing stage, which creates a contour plot of the image to obtain the boundary of the region of interest. The features are extracted by generating a star skeleton from the preprocessed image. The star skeleton, calculated by considering the local maxima of the distance signal obtained by joining the centroid with the boundary pixels, describes the mudras. Each of these mudras has a different star skeleton. The star skeletons computed from the known sample images are used to build a database for the recognition system. The recognition is achieved by choosing the template with the most similar skeleton retrieved from the database.
- Published
- 2021
- Full Text
- View/download PDF
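The star skeleton construction described in the abstract above (centroid, distance signal along the boundary, local maxima) can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the boundary is taken as an ordered, closed list of pixel coordinates, the centroid is computed from the boundary pixels only, and no smoothing is applied to the distance signal before peak detection.

```python
import math


def star_skeleton(boundary):
    """Return the centroid and the boundary points at local distance maxima.

    boundary: ordered (x, y) pixels tracing a closed contour.
    Each returned tip, joined to the centroid, forms one ray of the star.
    """
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    # Distance signal: distance from the centroid to each boundary pixel.
    dist = [math.hypot(x - cx, y - cy) for x, y in boundary]
    # Circular neighbours, because the contour is closed. Strict
    # inequalities mean flat plateaus are not reported as maxima.
    tips = [
        boundary[i]
        for i in range(n)
        if dist[i] > dist[i - 1] and dist[i] > dist[(i + 1) % n]
    ]
    return (cx, cy), tips
```

On real contours the raw distance signal is noisy, so some form of smoothing before taking maxima would likely be needed to avoid spurious tips.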
49. An adversarial framework for open-set human action recognition using skeleton data
- Author
-
Özge Öztimur Karadag
- Subjects
Adversarial system ,General Computer Science ,business.industry ,Computer science ,Open set ,Action recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,Skeleton (category theory) ,business - Published
- 2021
- Full Text
- View/download PDF
50. Pose recognition in sports scenes based on deep learning skeleton sequence model
- Author
-
Li You, Zhaoqimeng Shan, Fengjun Shen, Chen Li-quan, and Jiaxuan Chen
- Subjects
Statistics and Probability ,0209 industrial biotechnology ,Sequence model ,Computer science ,business.industry ,Deep learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,General Engineering ,Pattern recognition ,02 engineering and technology ,Skeleton (category theory) ,020901 industrial engineering & automation ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
Human skeleton extraction is a basic problem in the field of computer vision. With the rapid progress of science and technology, it has become a hot issue in target detection tasks such as pedestrian recognition, behavior monitoring, and pedestrian gesture recognition. In recent years, due to the development of deep neural networks, modeling of human joints in acquired images has made progress in skeleton extraction. However, most models suffer from low modeling accuracy, poor real-time performance, and poor model availability. To address these human target detection problems, this paper studies a gesture recognition method for sports scenes based on a deep learning skeleton sequence model, aiming to provide a gesture recognition method with strong noise resistance, good real-time performance, and an accurate model. This article uses motion video frame images to train the VGG16 network. Using the network to extract skeleton information strengthens the posture feature expression; HOG is used for feature extraction, and the Adam algorithm is used to optimize the network so that it extracts more posture features, thereby improving the network's posture recognition accuracy. The hyperparameters and network structure of the basic network are then adjusted according to the training results, and the key poses in the sports scene are obtained through the final classifier.
- Published
- 2021
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library