Search Results
8 results for "Wan, Shaohua"
2. CE-Text: A context-aware and embedded text detector in natural scene images.
- Author
- Wu, Yirui, Zhang, Wen, and Wan, Shaohua
- Subjects
- DEEP learning; TEXT recognition; CONVOLUTIONAL neural networks; DETECTORS
- Abstract
• A novel deep and context-aware CNN structure for accurate and fast text detection
• Hierarchical channel-wise attention scheme combining channel-wise and multi-layer features
• Adopts a frequency-based deep compression method to build a lightweight text detector

With the significant power of deep learning architectures, researchers have made much progress on the effectiveness and efficiency of text detection in the past few years. However, because the unique characteristics of text components are not taken into account, directly applying deep learning models to the text detection task is prone to low accuracy, and in particular to false positive detections. To ease this problem, we propose a lightweight and context-aware deep convolutional neural network (CNN) named CE-Text, which encodes multi-level channel attention information to construct a discriminative feature map for accurate and efficient text detection. To fit the low computational resources of embedded systems, we further transform CE-Text into a lighter version with a frequency-based deep CNN compression method, which extends the applicable scenarios of CE-Text to a variety of embedded systems. Experiments on several popular datasets show that CE-Text not only achieves accurate text detection in scene images, but also runs fast on embedded systems.
- Published
- 2022
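The channel-wise attention that this record describes can be pictured with a minimal squeeze-and-excitation-style block. This is a hedged PyTorch sketch of the general mechanism, not the authors' hierarchical scheme; the `ChannelAttention` module name and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: reweight channels using global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: one value per channel
        self.fc = nn.Sequential(                   # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # emphasize informative channels

feat = torch.randn(1, 64, 32, 32)                  # a feature map from some backbone
print(ChannelAttention(64)(feat).shape)            # torch.Size([1, 64, 32, 32])
```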
3. A motor imagery EEG signal classification algorithm based on recurrence plot convolution neural network.
- Author
- Meng, XianJia, Qiu, Shi, Wan, Shaohua, Cheng, Keyang, and Cui, Lei
- Subjects
- CONVOLUTIONAL neural networks; SIGNAL classification; BRAIN-computer interfaces; CLASSIFICATION algorithms; ELECTROENCEPHALOGRAPHY; BRAINWASHING
- Abstract
• Limited information in the time domain results in limited feature classification performance.
• The particularity of EEG signals makes them difficult to measure.
• The strong correlation of EEG signals makes it difficult to build a feature extraction network.

With the promotion of brain-computer interface technology, it has become possible in recent years to study brain-controlled systems through EEG signals. To solve the EEG signal classification problem effectively, a motor imagery classification algorithm based on a recurrence plot convolutional neural network is proposed. First, the EEG signals are preprocessed to enhance the signal intensity in the movement interval. Second, time-domain and frequency-domain features are extracted to construct the recurrence plot feature mode. Finally, a new neural network is established to achieve accurate recognition of left and right movements. This research can also be transferred to other fields.
- Published
- 2021
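The recurrence plot representation at the core of this record is straightforward to sketch: a binary matrix marking which pairs of time points are close. A minimal NumPy illustration, assuming a simple absolute-difference distance and an illustrative threshold; the paper's exact construction may differ.

```python
import numpy as np

def recurrence_plot(signal: np.ndarray, eps: float) -> np.ndarray:
    """Binary recurrence matrix: R[i, j] = 1 iff |x_i - x_j| < eps."""
    dist = np.abs(signal[:, None] - signal[None, :])   # pairwise distances
    return (dist < eps).astype(np.uint8)

t = np.linspace(0, 4 * np.pi, 256)
eeg_like = np.sin(t) + 0.1 * np.random.randn(t.size)   # synthetic stand-in for EEG
rp = recurrence_plot(eeg_like, eps=0.2)
print(rp.shape)  # (256, 256) -- an image a CNN can classify
```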
4. CDText: Scene text detector based on context-aware deformable transformer.
- Author
- Wu, Yirui, Kong, Qiran, Yong, Lai, Narducci, Fabio, and Wan, Shaohua
- Subjects
- TEXT recognition; DETECTORS; FEATURE extraction; COMPARATIVE method
- Abstract
• CDText detects texts of arbitrary shapes by encoding context information.
• The feature extractor refines the feature map with dilated context encoding blocks.
• The transformer aggregates text features of detection boxes for instance segmentation.

The scene text detection task aims to precisely locate text regions in natural scenes. However, existing methods still face challenges in detecting arbitrary-shaped text due to their limited feature representation capability. To alleviate this problem, we propose a scene text detector, CDText, based on a context-aware deformable transformer structure. Specifically, CDText first adopts different convolution kernel designs for feature extraction, providing receptive fields of different sizes for multi-scale feature perception and fusion. Meanwhile, a multi-head self-attention mechanism is used to strengthen the reasoning ability of CDText in a global sense, enhancing the feature maps with abundant context information by extracting implicit relationships between multi-scale text features. Moreover, CDText designs a segmentation head to segment text instances of arbitrary shapes from rectangular detection boxes. Experiments show that CDText is superior to comparative methods in detection accuracy, achieving F-scores of 92.7, 81.9, and 82.9 on the ICDAR2013, Total-Text, and CTW-1500 datasets, respectively.
- Published
- 2023
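The two ingredients this record names, dilated context encoding and transformer-style global reasoning, can be sketched generically. This is an assumed PyTorch sketch with invented shapes and module names; it mirrors the idea, not CDText's actual architecture.

```python
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    """Parallel dilated convolutions give branches different receptive fields."""
    def __init__(self, ch: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)]
        )
        self.fuse = nn.Conv2d(3 * ch, ch, 1)   # 1x1 conv fuses the multi-scale maps

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

ch = 32
feat = torch.randn(1, ch, 16, 16)
ctx = DilatedContextBlock(ch)(feat)          # context-enriched feature map
tokens = ctx.flatten(2).transpose(1, 2)      # (B, H*W, C) sequence of feature tokens
attn = nn.MultiheadAttention(embed_dim=ch, num_heads=4, batch_first=True)
out, _ = attn(tokens, tokens, tokens)        # global self-attention over all tokens
print(out.shape)  # torch.Size([1, 256, 32])
```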
5. GDRL: An interpretable framework for thoracic pathologic prediction.
- Author
- Wu, Yirui, Li, Hao, Feng, Xi, Casanova, Andrea, Abate, Andrea F., and Wan, Shaohua
- Subjects
- DECISION making; DEEP learning; FEATURE extraction; LATENT infection; IMAGE analysis; X-ray imaging
- Abstract
• Proposes a Group-Disentangled Representation Learning framework (GDRL).
• Introduces an implicit group-swap structure.
• Extracts the linking relationship between semantic concepts of pathology and visual features.
• Demonstrates that GDRL can significantly improve classification accuracy.

Deep learning methods have shown significant performance in medical image analysis tasks. However, they generally act like a "black box", offering no explanation of either the feature extraction or the decision process, which leads to a lack of clinical insight and to high-risk assessments. To help deep learning envision diseases with visual clues, we propose a novel Group-Disentangled Representation Learning framework (GDRL). The key contribution is that GDRL completely disentangles the latent space into disease concepts with abundant and non-overlapping feature-related explanations, thus enhancing interpretability in both the feature extraction and decision processes. Furthermore, we introduce an implicit group-swap structure that emphasizes the linking relationship between semantic concepts of disease and low-level visual features, rather than explicit explanations of general objects and their attributes. We demonstrate our framework on predicting four categories of disease from chest X-ray images. The AUROC values of GDRL on ChestX-ray14 for thoracic pathologic prediction are 0.8630, 0.8980, 0.9269, and 0.8653, respectively, and we showcase the potential of our framework in enhancing the interpretability of the factors contributing to different diseases.
- Published
- 2023
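The group-swap idea can be illustrated by splitting a latent code into equal per-concept groups and exchanging one group between two samples. A hedged sketch with invented dimensions; GDRL's actual swap is implicit and learned, unlike this literal version.

```python
import torch

def group_swap(z_a, z_b, group: int, groups: int = 4):
    """Exchange the `group`-th equal slice of two latent codes."""
    dim = z_a.shape[-1] // groups
    lo, hi = group * dim, (group + 1) * dim
    z_a2, z_b2 = z_a.clone(), z_b.clone()
    z_a2[..., lo:hi] = z_b[..., lo:hi]
    z_b2[..., lo:hi] = z_a[..., lo:hi]
    return z_a2, z_b2

z1, z2 = torch.randn(1, 64), torch.randn(1, 64)     # 4 groups of 16 dims each
s1, s2 = group_swap(z1, z2, group=2)
print(torch.equal(s1[..., 32:48], z2[..., 32:48]))  # True: group 2 was exchanged
```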
6. 3D dynamic facial expression recognition using low-resolution videos.
- Author
- Shao, Jie, Gori, Ilaria, Wan, Shaohua, and Aggarwal, J.K.
- Subjects
- HUMAN facial recognition software; THREE-dimensional imaging; DEFORMATIONS (Mechanics); IMAGE processing; RANDOM fields
- Abstract
In this paper, we focus on the problem of 3D dynamic (4D) facial expression recognition. While traditional methods rely on building deformation models on high-resolution 3D meshes, our approach works directly on low-resolution RGB-D sequences; this allows us to apply our algorithm to videos captured by widespread, standard low-resolution RGB-D sensors such as the Kinect. After preprocessing both the RGB and depth image sequences, sparse features are learned from spatio-temporal local cuboids. A Conditional Random Fields classifier is then employed for training and classification. The proposed system is fully automatic and achieves superior results on three low-resolution datasets built from the 4D facial expression recognition dataset BU-4DFE. Extensive evaluations of our approach and comparisons with state-of-the-art methods are presented.
- Published
- 2015
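The spatio-temporal local cuboids from which sparse features are learned can be sketched as fixed-size sub-volumes cut from a video volume. A minimal NumPy version with illustrative cuboid sizes and strides; the paper's sampling scheme may differ.

```python
import numpy as np

def extract_cuboids(video: np.ndarray, size=(8, 8, 8), stride=(8, 8, 8)):
    """video: (T, H, W) array; returns an (N, t, h, w) stack of local cuboids."""
    T, H, W = video.shape
    t, h, w = size
    st, sh, sw = stride
    cuboids = [
        video[i:i + t, j:j + h, k:k + w]
        for i in range(0, T - t + 1, st)
        for j in range(0, H - h + 1, sh)
        for k in range(0, W - w + 1, sw)
    ]
    return np.stack(cuboids)

depth_seq = np.random.rand(32, 64, 64)     # synthetic low-res depth sequence
print(extract_cuboids(depth_seq).shape)    # (256, 8, 8, 8)
```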
7. A method for user-customized compensation of metamorphopsia through video see-through enabled head mounted display.
- Author
- Cimmino, Lucia, Pero, Chiara, Ricciardi, Stefano, and Wan, Shaohua
- Subjects
- HEAD-mounted displays; STREAMING video & television; CAMCORDERS; VISION disorders; VIDEO processing; AUGMENTED reality
- Abstract
• We propose an approach to compensate for the visual defects caused by metamorphopsia.
• Our approach enables interactive measurement of the distortion in the user's visual field.
• We compensate the warped visual field through real-time processing of video streams.
• We conducted an experiment on 17 patients affected by metamorphopsia.
• The results show the proposed system is able to reduce visual field distortion.

Advances in Augmented Reality technologies and, particularly, the availability of video see-through enabled head-mounted displays (HMDs) make it possible to devise new strategies to help individuals with visual impairments in daily life. In this work, an approach is proposed to compensate for a serious visual impairment known as metamorphopsia, a vision disorder characterized by deformed images. The goal is to provide patients with a digitally restored visual field through real-time processing of the video see-through streams captured by the HMD. We present two contributions: an interactive discrete modeling of the patient's eye-specific vision distortion, and its compensation by means of a corresponding real-time counter-distortion of incoming frames. Our approach maps each of the video streams acquired by the stereoscopic video see-through cameras aboard the headset onto a 2D polygonal mesh, which is then counter-warped by moving its vertices according to the previously built distortion model and displayed, restored, on the HMD's screen. First user evaluations report promising results, along with usability issues related to HMD technology.
- Published
- 2021
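The counter-warping step, mapping each frame through a vertex grid displaced by the inverse of the measured distortion, can be approximated with a dense remap. A hedged OpenCV sketch: the displacement grid here is fabricated for illustration, not a measured patient model.

```python
import cv2
import numpy as np

h, w = 240, 320
frame = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)   # stand-in video frame

# Coarse 9x9 grid of per-vertex (dx, dy) displacements in pixels, upsampled to a
# dense field; a real system would fill this grid from the interactively
# measured, patient-specific distortion model.
grid = np.zeros((9, 9, 2), np.float32)
grid[4, 4] = (-12.0, -12.0)                 # counter-shift near the central vertex
dense = cv2.resize(grid, (w, h), interpolation=cv2.INTER_CUBIC)

xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                     np.arange(h, dtype=np.float32))
map_x = xs + dense[..., 0]
map_y = ys + dense[..., 1]
restored = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)    # counter-warped frame
print(restored.shape)  # (240, 320, 3)
```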
8. Image caption generation with high-level image features.
- Author
- Ding, Songtao, Qu, Shiru, Xi, Yuling, Sangaiah, Arun Kumar, and Wan, Shaohua
- Subjects
- HUMAN facial recognition software; IMAGE; REPRODUCTION
- Abstract
• Introduces the theory of attention from psychology to image captioning and uses it to filter image features.
• Combines low-level information with high-level features to detect the attention regions of an image.
• The LSTM variant model is affected not only by long-term information but also by the rules of attention.
• Quantitatively validates the good performance of our method on several benchmark datasets.

Recently, caption generation has attracted huge interest for images and videos. However, it is challenging for models to select the proper subjects in a complex background and generate the desired captions in high-level vision tasks. Inspired by recent works, we propose a novel image captioning model based on high-level image features. We combine low-level information, such as image quality, with high-level features, such as motion classification and face recognition, to detect the attention regions of an image. We demonstrate that our attention model produces good performance in experiments on the MSCOCO, Flickr30K, PASCAL, and SBU datasets.
- Published
- 2019
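The attention-based filtering of image features that this record describes is commonly realized as soft attention over region features conditioned on the decoder state. A hedged PyTorch sketch with invented dimensions, illustrating the general mechanism rather than the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)  # relevance score per region

    def forward(self, regions: torch.Tensor, h: torch.Tensor):
        # regions: (B, N, feat_dim) CNN region features; h: (B, hidden_dim) LSTM state
        h_exp = h.unsqueeze(1).expand(-1, regions.size(1), -1)
        alpha = F.softmax(self.score(torch.cat([regions, h_exp], dim=-1)), dim=1)
        context = (alpha * regions).sum(dim=1)            # attended context vector
        return context, alpha.squeeze(-1)

regions = torch.randn(2, 49, 512)                         # e.g. a 7x7 grid of features
h = torch.randn(2, 256)
ctx, alpha = SoftAttention(512, 256)(regions, h)
print(ctx.shape, alpha.shape)  # torch.Size([2, 512]) torch.Size([2, 49])
```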