Search Results
8 results for "Wan, Shaohua"
2. CE-Text: A context-aware and embedded text detector in natural scene images.
- Author
- Wu, Yirui, Zhang, Wen, and Wan, Shaohua
- Subjects
- DEEP learning; TEXT recognition; CONVOLUTIONAL neural networks; DETECTORS
- Abstract
• A novel deep and context-aware CNN structure for accurate and fast text detection
• Hierarchical channel-wise attention scheme combining channel-wise and multi-layer features
• Adopts a frequency-based deep compression method to build a lightweight text detector

With the significant power of deep learning architectures, researchers have made much progress on the effectiveness and efficiency of text detection in the past few years. However, because the unique characteristics of text components are not taken into account, directly applying deep learning models to the text detection task is prone to low accuracy, and in particular to false positive detections. To ease this problem, we propose a lightweight and context-aware deep convolutional neural network (CNN) named CE-Text, which encodes multi-level channel attention information to construct a discriminative feature map for accurate and efficient text detection. To fit the low computational resources of embedded systems, we further transform CE-Text into a lighter version with a frequency-based deep CNN compression method, which extends the applicable scenarios of CE-Text to a variety of embedded systems. Experiments on several popular datasets show that CE-Text not only achieves accurate text detection in scene images, but also runs fast on embedded systems.
- Published
- 2022
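The channel-wise attention that this record describes can be pictured with a minimal squeeze-and-excitation-style block. This is a hedged PyTorch sketch of the general mechanism, not the authors' hierarchical scheme; the `ChannelAttention` module name and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: reweight channels using global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: one value per channel
        self.fc = nn.Sequential(                   # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # emphasize informative channels

feat = torch.randn(1, 64, 32, 32)                  # a feature map from some backbone
print(ChannelAttention(64)(feat).shape)            # torch.Size([1, 64, 32, 32])
```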
3. A motor imagery EEG signal classification algorithm based on recurrence plot convolution neural network.
- Author
- Meng, XianJia, Qiu, Shi, Wan, Shaohua, Cheng, Keyang, and Cui, Lei
- Subjects
- CONVOLUTIONAL neural networks; SIGNAL classification; BRAIN-computer interfaces; CLASSIFICATION algorithms; ELECTROENCEPHALOGRAPHY; BRAINWASHING
- Abstract
• Limited information in the time domain results in limited feature classification performance.
• The particularity of EEG signals makes them difficult to measure.
• The strong correlation of EEG signals makes it difficult to build a feature extraction network.

With the promotion of brain-computer interface technology, it has become possible in recent years to study brain-controlled systems through EEG signals. To solve the EEG signal classification problem effectively, a motor imagery classification algorithm based on a recurrence plot convolutional neural network is proposed. First, the EEG signals are preprocessed to enhance the signal intensity in the movement interval. Second, time-domain and frequency-domain features are extracted to construct the recurrence plot feature mode. Finally, a new neural network is established to achieve accurate recognition of left and right movements. This research can also be transferred to other fields.
- Published
- 2021
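The recurrence plot representation at the core of this record is straightforward to sketch: a binary matrix marking which pairs of time points are close. A minimal NumPy illustration, assuming a simple absolute-difference distance and an illustrative threshold; the paper's exact construction may differ.

```python
import numpy as np

def recurrence_plot(signal: np.ndarray, eps: float) -> np.ndarray:
    """Binary recurrence matrix: R[i, j] = 1 iff |x_i - x_j| < eps."""
    dist = np.abs(signal[:, None] - signal[None, :])   # pairwise distances
    return (dist < eps).astype(np.uint8)

t = np.linspace(0, 4 * np.pi, 256)
eeg_like = np.sin(t) + 0.1 * np.random.randn(t.size)   # synthetic stand-in for EEG
rp = recurrence_plot(eeg_like, eps=0.2)
print(rp.shape)  # (256, 256) -- an image a CNN can classify
```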
4. CDText: Scene text detector based on context-aware deformable transformer.
- Author
- Wu, Yirui, Kong, Qiran, Yong, Lai, Narducci, Fabio, and Wan, Shaohua
- Subjects
- TEXT recognition; DETECTORS; FEATURE extraction; COMPARATIVE method
- Abstract
• CDText detects texts of arbitrary shapes by encoding context information.
• The feature extractor refines the feature map with dilated context encoding blocks.
• The transformer aggregates text features of detection boxes for instance segmentation.

The scene text detection task aims to precisely locate text regions in natural scenes. However, existing methods still face challenges in detecting arbitrary-shaped text due to their limited feature representation capability. To alleviate this problem, we propose a scene text detector, CDText, based on a context-aware deformable transformer structure. Specifically, CDText first adopts different convolution kernel designs for feature extraction, providing receptive fields of different sizes for multi-scale feature perception and fusion. Meanwhile, a multi-head self-attention mechanism is used to strengthen the reasoning ability of CDText in a global sense, enhancing the feature maps with abundant context information by extracting implicit relationships between multi-scale text features. Moreover, CDText designs a segmentation head to segment text instances of arbitrary shapes from rectangular detection boxes. Experiments show that CDText is superior to comparative methods in detection accuracy, achieving F-scores of 92.7, 81.9, and 82.9 on the ICDAR2013, Total-Text, and CTW-1500 datasets, respectively.
- Published
- 2023
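The two ingredients this record names, dilated context encoding and transformer-style global reasoning, can be sketched generically. This is an assumed PyTorch sketch with invented shapes and module names; it mirrors the idea, not CDText's actual architecture.

```python
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    """Parallel dilated convolutions give branches different receptive fields."""
    def __init__(self, ch: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)]
        )
        self.fuse = nn.Conv2d(3 * ch, ch, 1)   # 1x1 conv fuses the multi-scale maps

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

ch = 32
feat = torch.randn(1, ch, 16, 16)
ctx = DilatedContextBlock(ch)(feat)          # context-enriched feature map
tokens = ctx.flatten(2).transpose(1, 2)      # (B, H*W, C) sequence of feature tokens
attn = nn.MultiheadAttention(embed_dim=ch, num_heads=4, batch_first=True)
out, _ = attn(tokens, tokens, tokens)        # global self-attention over all tokens
print(out.shape)  # torch.Size([1, 256, 32])
```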
5. GDRL: An interpretable framework for thoracic pathologic prediction.
- Author
- Wu, Yirui, Li, Hao, Feng, Xi, Casanova, Andrea, Abate, Andrea F., and Wan, Shaohua
- Subjects
- DECISION making; DEEP learning; FEATURE extraction; LATENT infection; IMAGE analysis; X-ray imaging
- Abstract
• Proposes a Group-Disentangled Representation Learning framework (GDRL).
• Introduces an implicit group-swap structure.
• Extracts the linking relationship between semantic concepts of pathology and visual features.
• Demonstrates that GDRL can significantly improve classification accuracy.

Deep learning methods have shown significant performance in medical image analysis tasks. However, they generally act like a "black box", offering no explanation of either the feature extraction or the decision process, which leads to a lack of clinical insight and to high-risk assessments. To help deep learning envision diseases with visual clues, we propose a novel Group-Disentangled Representation Learning framework (GDRL). The key contribution is that GDRL completely disentangles the latent space into disease concepts with abundant and non-overlapping feature-related explanations, thus enhancing interpretability in both the feature extraction and decision processes. Furthermore, we introduce an implicit group-swap structure that emphasizes the linking relationship between semantic concepts of disease and low-level visual features, rather than explicit explanations of general objects and their attributes. We demonstrate our framework on predicting four categories of disease from chest X-ray images. The AUROC values of GDRL on ChestX-ray14 for thoracic pathologic prediction are 0.8630, 0.8980, 0.9269, and 0.8653, respectively, and we showcase the potential of our framework in enhancing the interpretability of the factors contributing to different diseases.
- Published
- 2023
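The group-swap idea can be illustrated by splitting a latent code into equal per-concept groups and exchanging one group between two samples. A hedged sketch with invented dimensions; GDRL's actual swap is implicit and learned, unlike this literal version.

```python
import torch

def group_swap(z_a, z_b, group: int, groups: int = 4):
    """Exchange the `group`-th equal slice of two latent codes."""
    dim = z_a.shape[-1] // groups
    lo, hi = group * dim, (group + 1) * dim
    z_a2, z_b2 = z_a.clone(), z_b.clone()
    z_a2[..., lo:hi] = z_b[..., lo:hi]
    z_b2[..., lo:hi] = z_a[..., lo:hi]
    return z_a2, z_b2

z1, z2 = torch.randn(1, 64), torch.randn(1, 64)     # 4 groups of 16 dims each
s1, s2 = group_swap(z1, z2, group=2)
print(torch.equal(s1[..., 32:48], z2[..., 32:48]))  # True: group 2 was exchanged
```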
6. 3D dynamic facial expression recognition using low-resolution videos.
- Author
- Shao, Jie, Gori, Ilaria, Wan, Shaohua, and Aggarwal, J.K.
- Subjects
- HUMAN facial recognition software; THREE-dimensional imaging; DEFORMATIONS (Mechanics); IMAGE processing; RANDOM fields
- Abstract
In this paper, we focus on the problem of 3D dynamic (4D) facial expression recognition. While traditional methods rely on building deformation models on high-resolution 3D meshes, our approach works directly on low-resolution RGB-D sequences; this allows us to apply our algorithm to videos captured by widespread, standard low-resolution RGB-D sensors such as the Kinect. After preprocessing both the RGB and depth image sequences, sparse features are learned from spatio-temporal local cuboids. A Conditional Random Fields classifier is then employed for training and classification. The proposed system is fully automatic and achieves superior results on three low-resolution datasets built from the 4D facial expression recognition dataset BU-4DFE. Extensive evaluations of our approach and comparisons with state-of-the-art methods are presented.
- Published
- 2015
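The spatio-temporal local cuboids from which sparse features are learned can be sketched as fixed-size sub-volumes cut from a video volume. A minimal NumPy version with illustrative cuboid sizes and strides; the paper's sampling scheme may differ.

```python
import numpy as np

def extract_cuboids(video: np.ndarray, size=(8, 8, 8), stride=(8, 8, 8)):
    """video: (T, H, W) array; returns an (N, t, h, w) stack of local cuboids."""
    T, H, W = video.shape
    t, h, w = size
    st, sh, sw = stride
    cuboids = [
        video[i:i + t, j:j + h, k:k + w]
        for i in range(0, T - t + 1, st)
        for j in range(0, H - h + 1, sh)
        for k in range(0, W - w + 1, sw)
    ]
    return np.stack(cuboids)

depth_seq = np.random.rand(32, 64, 64)     # synthetic low-res depth sequence
print(extract_cuboids(depth_seq).shape)    # (256, 8, 8, 8)
```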
7. A method for user-customized compensation of metamorphopsia through video see-through enabled head mounted display.
- Author
- Cimmino, Lucia, Pero, Chiara, Ricciardi, Stefano, and Wan, Shaohua
- Subjects
- HEAD-mounted displays; STREAMING video & television; CAMCORDERS; VISION disorders; VIDEO processing; AUGMENTED reality
- Abstract
• We propose an approach to compensate for the visual defects caused by metamorphopsia.
• Our approach enables interactive measurement of the distortion in the user's visual field.
• We compensate the warped visual field through real-time processing of video streams.
• We conducted an experiment on 17 patients affected by metamorphopsia.
• The results show the proposed system is able to reduce visual field distortion.

Advances in Augmented Reality technologies and, particularly, the availability of video see-through enabled head-mounted displays (HMDs) make it possible to devise new strategies to help individuals with visual impairments in daily life. In this work, an approach is proposed to compensate for a serious visual impairment known as metamorphopsia, a vision disorder characterized by deformed images. The goal is to provide patients with a digitally restored visual field through real-time processing of the video see-through streams captured by the HMD. We present two contributions: an interactive discrete modeling of the patient's eye-specific vision distortion, and its compensation by means of a corresponding real-time counter-distortion of incoming frames. Our approach maps each of the video streams acquired by the stereoscopic video see-through cameras aboard the headset onto a 2D polygonal mesh, which is then counter-warped by moving its vertices according to the previously built distortion model and displayed, restored, on the HMD's screen. First user evaluations report promising results, along with usability issues related to HMD technology.
- Published
- 2021
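The counter-warping step, mapping each frame through a vertex grid displaced by the inverse of the measured distortion, can be approximated with a dense remap. A hedged OpenCV sketch: the displacement grid here is fabricated for illustration, not a measured patient model.

```python
import cv2
import numpy as np

h, w = 240, 320
frame = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)   # stand-in video frame

# Coarse 9x9 grid of per-vertex (dx, dy) displacements in pixels, upsampled to a
# dense field; a real system would fill this grid from the interactively
# measured, patient-specific distortion model.
grid = np.zeros((9, 9, 2), np.float32)
grid[4, 4] = (-12.0, -12.0)                 # counter-shift near the central vertex
dense = cv2.resize(grid, (w, h), interpolation=cv2.INTER_CUBIC)

xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                     np.arange(h, dtype=np.float32))
map_x = xs + dense[..., 0]
map_y = ys + dense[..., 1]
restored = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)    # counter-warped frame
print(restored.shape)  # (240, 320, 3)
```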
8. Image caption generation with high-level image features.
- Author
- Ding, Songtao, Qu, Shiru, Xi, Yuling, Sangaiah, Arun Kumar, and Wan, Shaohua
- Subjects
- HUMAN facial recognition software; IMAGE; REPRODUCTION
- Abstract
• Introduces the theory of attention from psychology to image captioning and uses it to filter image features.
• Combines low-level information with high-level features to detect the attention regions of an image.
• The LSTM variant model is affected not only by long-term information but also by the rules of attention.
• Quantitatively validates the good performance of our method on several benchmark datasets.

Recently, caption generation has attracted huge interest for images and videos. However, it is challenging for models to select the proper subjects in a complex background and generate the desired captions in high-level vision tasks. Inspired by recent works, we propose a novel image captioning model based on high-level image features. We combine low-level information, such as image quality, with high-level features, such as motion classification and face recognition, to detect the attention regions of an image. We demonstrate that our attention model produces good performance in experiments on the MSCOCO, Flickr30K, PASCAL, and SBU datasets.
- Published
- 2019
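The attention-based filtering of image features that this record describes is commonly realized as soft attention over region features conditioned on the decoder state. A hedged PyTorch sketch with invented dimensions, illustrating the general mechanism rather than the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)  # relevance score per region

    def forward(self, regions: torch.Tensor, h: torch.Tensor):
        # regions: (B, N, feat_dim) CNN region features; h: (B, hidden_dim) LSTM state
        h_exp = h.unsqueeze(1).expand(-1, regions.size(1), -1)
        alpha = F.softmax(self.score(torch.cat([regions, h_exp], dim=-1)), dim=1)
        context = (alpha * regions).sum(dim=1)            # attended context vector
        return context, alpha.squeeze(-1)

regions = torch.randn(2, 49, 512)                         # e.g. a 7x7 grid of features
h = torch.randn(2, 256)
ctx, alpha = SoftAttention(512, 256)(regions, h)
print(ctx.shape, alpha.shape)  # torch.Size([2, 512]) torch.Size([2, 49])
```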