Author: "Shan, Hongming" / Topic: fos: computer and information sciences - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Shan, Hongming"' showing total 17 results

1. Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition

Author: Ye, Jiaxin, Wen, Xin-cheng, Wei, Yujie, Xu, Yong, Liu, Kunhong, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech emotion recognition (SER) plays a vital role in improving the interactions between humans and machines by inferring human emotion and affective states from speech signals. Whereas recent works primarily focus on mining spatiotemporal information from hand-crafted features, we explore how to model the temporal patterns of speech emotions from dynamic temporal scales. Towards that goal, we introduce a novel temporal emotional modeling approach for SER, termed Temporal-aware bI-direction Multi-scale Network (TIM-Net), which learns multi-scale contextual affective representations from various time scales. Specifically, TIM-Net first employs temporal-aware blocks to learn temporal affective representation, then integrates complementary information from the past and the future to enrich contextual representations, and finally, fuses multiple time scale features for better adaptation to the emotional variation. Extensive experimental results on six benchmark SER datasets demonstrate the superior performance of TIM-Net, gaining 2.34% and 2.61% improvements of the average UAR and WAR over the second-best on each corpus. The source code is available at https://github.com/Jiaxin-Ye/TIM-Net_SER., Accepted by ICASSP 2023
Published: 2023

2. Cross-Head Supervision for Crowd Counting with Noisy Annotations

Author: Dai, Mingliang, Huang, Zhizhong, Gao, Jiaqi, Shan, Hongming, and Zhang, Junping
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Noisy annotations such as missing annotations and location shifts often exist in crowd counting datasets due to multi-scale head sizes, high occlusion, etc. These noisy annotations severely affect the model training, especially for density map-based methods. To alleviate the negative impact of noisy annotations, we propose a novel crowd counting model with one convolution head and one transformer head, in which these two heads can supervise each other in noisy areas, called Cross-Head Supervision. The resultant model, CHS-Net, can synergize different types of inductive biases for better counting. In addition, we develop a progressive cross-head supervision learning strategy to stabilize the training process and provide more reliable supervision. Extensive experimental results on ShanghaiTech and QNRF datasets demonstrate superior performance over state-of-the-art methods. Code is available at https://github.com/RaccoonDML/CHSNet., accepted by ICASSP 2023
Published: 2023

3. Motion Matters: A Novel Motion Modeling for Cross-View Gait Feature Learning

Author: Li, Jingqi, Gao, Jiaqi, Zhang, Yuzhen, Shan, Hongming, and Zhang, Junping
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: As a unique biometric that can be perceived at a distance, gait has broad applications in person authentication, social security, and so on. Existing gait recognition methods suffer from changes in viewpoint and clothing and barely consider extracting diverse motion features, a fundamental characteristic in gaits, from gait sequences. This paper proposes a novel motion modeling method to extract the discriminative and robust representation. Specifically, we first extract the motion features from the encoded motion sequences in the shallow layer. Then we continuously enhance the motion feature in deep layers. This motion modeling approach is independent of mainstream work in building network architectures. As a result, one can apply this motion modeling method to any backbone to improve gait recognition performance. In this paper, we combine motion modeling with one commonly used backbone (GaitGL) as GaitGL-M to illustrate motion modeling. Extensive experimental results on two commonly-used cross-view gait datasets demonstrate the superior performance of GaitGL-M over existing state-of-the-art methods.
Published: 2023

4. Fan-Net: Fourier-Based Adaptive Normalization for Cross-Domain Stroke Lesion Segmentation

Author: Yu, Weiyi, Lei, Yiming, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Since stroke is the main cause of various cerebrovascular diseases, deep learning-based stroke lesion segmentation on magnetic resonance (MR) images has attracted considerable attention. However, the existing methods often neglect the domain shift among MR images collected from different sites, which has limited performance improvement. To address this problem, we intend to change style information without affecting high-level semantics via adaptively changing the low-frequency amplitude components of the Fourier transform so as to enhance model robustness to varying domains. Thus, we propose a novel FAN-Net, a U-Net-based segmentation network incorporated with a Fourier-based adaptive normalization (FAN) and a domain classifier with a gradient reversal layer. The FAN module is tailored for learning adaptive affine parameters for the amplitude components of different domains, which can dynamically normalize the style information of source images. Then, the domain classifier provides domain-agnostic knowledge to endow FAN with strong domain generalizability. The experimental results on the ATLAS dataset, which consists of MR images from 9 sites, show the superior performance of the proposed FAN-Net compared with baseline methods., Comment: Accepted by IEEE ICASSP 2023
Published: 2023

5. CLIP-Lung: Textual Knowledge-Guided Lung Nodule Malignancy Prediction

Author: Lei, Yiming, Li, Zilong, Shen, Yan, Zhang, Junping, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Lung nodule malignancy prediction has been enhanced by advanced deep-learning techniques and effective tricks. Nevertheless, current methods are mainly trained with cross-entropy loss using one-hot categorical labels, which results in difficulty in distinguishing those nodules with closer progression labels. Interestingly, we observe that clinical text information annotated by radiologists provides us with discriminative knowledge to identify challenging samples. Drawing on the capability of the contrastive language-image pre-training (CLIP) model to learn generalized visual representations from text annotations, in this paper, we propose CLIP-Lung, a textual knowledge-guided framework for lung nodule malignancy prediction. First, CLIP-Lung introduces both class and attribute annotations into the training of the lung nodule classifier without any additional overheads in inference. Second, we designed a channel-wise conditional prompt (CCP) module to establish consistent relationships between learnable context prompts and specific feature maps. Third, we align image features with both class and attribute features via contrastive learning, rectifying false positives and false negatives in latent space. The experimental results on the benchmark LIDC-IDRI dataset have demonstrated the superiority of CLIP-Lung, both in classification performance and interpretability of attention maps.
Published: 2023

6. LIT-Former: Linking In-plane and Through-plane Transformers for Simultaneous CT Image Denoising and Deblurring

Author: Chen, Zhihao, Niu, Chuang, Wang, Ge, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Computer Vision and Pattern Recognition, FOS: Physical sciences, Medical Physics (physics.med-ph), Electrical Engineering and Systems Science - Image and Video Processing, Physics - Medical Physics
Abstract: This paper studies 3D low-dose computed tomography (CT) imaging. Although various deep learning methods were developed in this context, typically they perform denoising due to low-dose and deblurring for super-resolution separately. To date, little work has been done for simultaneous in-plane denoising and through-plane deblurring, which is important to improve clinical CT images. For this task, a straightforward method is to directly train an end-to-end 3D network. However, it demands much more training data and expensive computational costs. Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring, termed LIT-Former, which can efficiently synergize in-plane and through-plane sub-tasks for 3D CT imaging and enjoy the advantages of both convolution and transformer networks. LIT-Former has two novel designs: efficient multi-head self-attention modules (eMSM) and efficient convolutional feed-forward networks (eCFN). First, eMSM integrates in-plane 2D self-attention and through-plane 1D self-attention to efficiently capture global interactions of 3D self-attention, the core unit of transformer networks. Second, eCFN integrates 2D convolution and 1D convolution to extract local information of 3D convolution in the same fashion. As a result, the proposed LIT-Former synergizes these two sub-tasks, significantly reducing the computational complexity as compared to 3D counterparts and enabling rapid convergence. Extensive experimental results on simulated and clinical datasets demonstrate superior performance over state-of-the-art models., 13 pages, 8 figures
Published: 2023

7. CORE: Learning Consistent Ordinal REpresentations for Image Ordinal Estimation

Author: Lei, Yiming, Li, Zilong, Li, Yangyang, Zhang, Junping, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: The goal of image ordinal estimation is to estimate the ordinal label of a given image with a convolutional neural network. Existing methods are mainly based on ordinal regression and particularly focus on modeling the ordinal mapping from the feature representation of the input to the ordinal label space. However, the manifold of the resultant feature representations does not maintain the intrinsic ordinal relations of interest, which hinders the effectiveness of the image ordinal estimation. Therefore, this paper proposes learning intrinsic Consistent Ordinal REpresentations (CORE) from ordinal relations residing in groundtruth labels while encouraging the feature representations to embody the ordinal low-dimensional manifold. First, we develop an ordinal totally ordered set (toset) distribution (OTD), which can (i) model the label embeddings to inherit ordinal information and measure distances between ordered labels of samples in a neighborhood, and (ii) model the feature embeddings to infer numerical magnitude with unknown ordinal information among the features of different samples. Second, through OTD, we convert the feature representations and labels into the same embedding space for better alignment, and then compute the Kullback Leibler (KL) divergence between the ordinal labels and feature representations to endow the latent space with consistent ordinal relations. Third, we optimize the KL divergence through ordinal prototype-constrained convex programming with dual decomposition; our theoretical analysis shows that we can obtain the optimal solutions via gradient backpropagation. Extensive experimental results demonstrate that the proposed CORE can accurately construct an ordinal latent space and significantly enhance existing deep ordinal regression methods to achieve better results., 13 pages
Published: 2023

8. BerDiff: Conditional Bernoulli Diffusion Model for Medical Image Segmentation

Author: Chen, Tao, Wang, Chenhui, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical image segmentation is a challenging task with inherent ambiguity and high uncertainty, attributed to factors such as unclear tumor boundaries and multiple plausible annotations. The accuracy and diversity of segmentation masks are both crucial for providing valuable references to radiologists in clinical practice. While existing diffusion models have shown strong capacities in various visual generation tasks, it is still challenging to deal with discrete masks in segmentation. To achieve accurate and diverse medical image segmentation masks, we propose a novel conditional Bernoulli Diffusion model for medical image segmentation (BerDiff). Instead of using the Gaussian noise, we first propose to use the Bernoulli noise as the diffusion kernel to enhance the capacity of the diffusion model for binary segmentation tasks, resulting in more accurate segmentation masks. Second, by leveraging the stochastic nature of the diffusion model, our BerDiff randomly samples the initial Bernoulli noise and intermediate latent variables multiple times to produce a range of diverse segmentation masks, which can highlight salient regions of interest that can serve as valuable references for radiologists. In addition, our BerDiff can efficiently sample sub-sequences from the overall trajectory of the reverse diffusion, thereby speeding up the segmentation process. Extensive experimental results on two medical image segmentation datasets with different modalities demonstrate that our BerDiff outperforms other recently published state-of-the-art methods. Our results suggest diffusion models could serve as a strong backbone for medical image segmentation., Comment: 14 pages, 7 figures
Published: 2023
Full Text: View/download PDF

9. FreeSeed: Frequency-band-aware and Self-guided Network for Sparse-view CT Reconstruction

Author: Ma, Chenglong, Li, Zilong, Zhang, Junping, Zhang, Yi, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Sparse-view computed tomography (CT) is a promising solution for expediting the scanning process and mitigating radiation exposure to patients; the reconstructed images, however, contain severe streak artifacts, compromising subsequent screening and diagnosis. Recently, deep learning-based image post-processing methods along with their dual-domain counterparts have shown promising results. However, existing methods usually produce over-smoothed images with loss of details due to (1) the difficulty in accurately modeling the artifact patterns in the image domain, and (2) the equal treatment of each pixel in the loss function. To address these issues, we concentrate on the image post-processing and propose a simple yet effective FREquency-band-awarE and SElf-guidED network, termed FreeSeed, which can effectively remove artifacts and recover missing details from the contaminated sparse-view CT images. Specifically, we first propose a frequency-band-aware artifact modeling network (FreeNet), which learns artifact-related frequency-band attention in the Fourier domain for better modeling the globally distributed streak artifacts on the sparse-view CT images. We then introduce a self-guided artifact refinement network (SeedNet), which leverages the predicted artifact to assist FreeNet in continuing to refine the severely corrupted details. Extensive experiments demonstrate the superior performance of FreeSeed and its dual-domain counterpart over the state-of-the-art sparse-view CT reconstruction methods. Source code is made available at https://github.com/Masaaki-75/freeseed., Comment: MICCAI 2023
Published: 2023
Full Text: View/download PDF

10. CoreDiff: Contextual Error-Modulated Generalized Diffusion Model for Low-Dose CT Denoising and Generalization

Author: Gao, Qi, Li, Zilong, Zhang, Junping, Zhang, Yi, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Physical sciences, Medical Physics (physics.med-ph), Electrical Engineering and Systems Science - Image and Video Processing, Physics - Medical Physics, Machine Learning (cs.LG)
Abstract: Low-dose computed tomography (CT) images suffer from noise and artifacts due to photon starvation and electronic noise. Recently, some works have attempted to use diffusion models to address the over-smoothness and training instability encountered by previous deep-learning-based denoising models. However, diffusion models suffer from long inference times due to the large number of sampling steps involved. Very recently, the cold diffusion model generalized classical diffusion models with greater flexibility. Inspired by cold diffusion, this paper presents a novel COntextual eRror-modulated gEneralized Diffusion model for low-dose CT (LDCT) denoising, termed CoreDiff. First, CoreDiff utilizes LDCT images to displace the random Gaussian noise and employs a novel mean-preserving degradation operator to mimic the physical process of CT degradation, significantly reducing sampling steps thanks to the informative LDCT images as the starting point of the sampling process. Second, to alleviate the error accumulation problem caused by the imperfect restoration operator in the sampling process, we propose a novel ContextuaL Error-modulAted Restoration Network (CLEAR-Net), which can leverage contextual information to constrain the sampling process from structural distortion and modulate time step embedding features for better alignment with the input at the next time step. Third, to rapidly generalize to a new, unseen dose level with as few resources as possible, we devise a one-shot learning framework to make CoreDiff generalize faster and better using only a single LDCT image (un)paired with NDCT. Extensive experimental results on two datasets demonstrate that our CoreDiff outperforms competing methods in denoising and generalization performance, with a clinically acceptable inference time., Comment: 11 pages, 12 figures
Published: 2023
Full Text: View/download PDF

11. Adaptive Nonlinear Latent Transformation for Conditional Face Editing

Author: Huang, Zhizhong, Ma, Siteng, Zhang, Junping, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent works for face editing usually manipulate the latent space of StyleGAN via the linear semantic directions. However, they usually suffer from the entanglement of facial attributes, need to tune the optimal editing strength, and are limited to binary attributes with strong supervision signals. This paper proposes a novel adaptive nonlinear latent transformation for disentangled and conditional face editing, termed AdaTrans. Specifically, our AdaTrans divides the manipulation process into several finer steps; i.e., the direction and size at each step are conditioned on both the facial attributes and the latent codes. In this way, AdaTrans describes an adaptive nonlinear transformation trajectory to manipulate the faces into target attributes while keeping other attributes unchanged. Then, AdaTrans leverages a predefined density model to constrain the learned trajectory in the distribution of latent codes by maximizing the likelihood of transformed latent code. Moreover, we also propose a disentangled learning strategy under a mutual information framework to eliminate the entanglement among attributes, which can further relax the need for labeled data. Consequently, AdaTrans enables a controllable face editing with the advantages of disentanglement, flexibility with non-binary attributes, and high fidelity. Extensive experimental results on various facial attributes demonstrate the qualitative and quantitative effectiveness of the proposed AdaTrans over existing state-of-the-art methods, especially in the most challenging scenarios with a large age gap and few labeled examples. The source code is available at https://github.com/Hzzone/AdaTrans., Comment: ICCV 2023
Published: 2023
Full Text: View/download PDF

12. Twin Contrastive Learning with Noisy Labels

Author: Huang, Zhizhong, Zhang, Junping, and Shan, Hongming
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Learning from noisy data is a challenging task that significantly degenerates the model performance. In this paper, we present TCL, a novel twin contrastive learning model to learn robust representations and handle noisy labels for classification. Specifically, we construct a Gaussian mixture model (GMM) over the representations by injecting the supervised model predictions into GMM to link label-free latent variables in GMM with label-noisy annotations. Then, TCL detects the examples with wrong labels as the out-of-distribution examples by another two-component GMM, taking into account the data distribution. We further propose a cross-supervision with an entropy regularization loss that bootstraps the true targets from model predictions to handle the noisy labels. As a result, TCL can learn discriminative representations aligned with estimated labels through mixup and contrastive learning. Extensive experimental results on several standard benchmarks and real-world datasets demonstrate the superior performance of TCL. In particular, TCL achieves 7.5% improvements on CIFAR-10 with 90% noisy labels, an extremely noisy scenario. The source code is available at https://github.com/Hzzone/TCL., Comment: CVPR 2023
Published: 2023
Full Text: View/download PDF

13. Convolutional Neural Network to Restore Low-Dose Digital Breast Tomosynthesis Projections in a Variance Stabilization Domain

Author: Vimieiro, Rodrigo de Barros, Niu, Chuang, Shan, Hongming, Borges, Lucas Rodrigues, Wang, Ge, and Vieira, Marcelo Andrade da Costa
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Image and Video Processing, Machine Learning (cs.LG)
Abstract: Digital breast tomosynthesis (DBT) exams should utilize the lowest possible radiation dose while maintaining sufficiently good image quality for accurate medical diagnosis. In this work, we propose a convolutional neural network (CNN) to restore low-dose (LD) DBT projections to achieve an image quality equivalent to a standard full-dose (FD) acquisition. The proposed network architecture benefits from priors in terms of layers that were inspired by traditional model-based (MB) restoration methods, considering a model-based deep learning approach, where the network is trained to operate in the variance stabilization transformation (VST) domain. To accurately control the network operation point, in terms of noise and blur of the restored image, we propose a loss function that minimizes the bias and matches residual noise between the input and the output. The training dataset was composed of clinical data acquired at the standard FD and low-dose pairs obtained by the injection of quantum noise. The network was tested using real DBT projections acquired with a physical anthropomorphic breast phantom. The proposed network achieved superior results in terms of the mean normalized squared error (MNSE), training time and noise spatial correlation compared with networks trained with traditional data-driven methods. The proposed approach can be extended to other medical imaging applications that require LD acquisitions., Comment: 12 pages, 9 figures
Published: 2022
Full Text: View/download PDF

14. Impact of loss functions on the performance of a deep neural network designed to restore low-dose digital mammography

Author: Shan, Hongming, Vimieiro, Rodrigo de Barros, Borges, Lucas Rodrigues, Vieira, Marcelo Andrade da Costa, and Wang, Ge
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Computer Vision and Pattern Recognition, FOS: Physical sciences, Medical Physics (physics.med-ph), Electrical Engineering and Systems Science - Image and Video Processing, Physics - Medical Physics
Abstract: Digital mammography is still the most common imaging tool for breast cancer screening. Although the benefits of using digital mammography for cancer screening outweigh the risks associated with the x-ray exposure, the radiation dose must be kept as low as possible while maintaining the diagnostic utility of the generated images, thus minimizing patient risks. Many studies investigated the feasibility of dose reduction by restoring low-dose images using deep neural networks. In these cases, choosing the appropriate training database and loss function is crucial and impacts the quality of the results. In this work, a modification of the ResNet architecture, with hierarchical skip connections, is proposed to restore low-dose digital mammography. We compared the restored images to the standard full-dose images. Moreover, we evaluated the performance of several loss functions for this task. For training purposes, we extracted 256,000 image patches from a dataset of 400 images of retrospective clinical mammography exams, where different dose levels were simulated to generate low and standard-dose pairs. To validate the network in a real scenario, a physical anthropomorphic breast phantom was used to acquire real low-dose and standard full-dose images in a commercially available mammography system, which were then processed through our trained model. An analytical restoration model for low-dose digital mammography, previously presented, was used as a benchmark in this work. Objective assessment was performed through the signal-to-noise ratio (SNR) and mean normalized squared error (MNSE), decomposed into residual noise and bias. Results showed that the perceptual loss function (PL4) is able to achieve virtually the same noise levels of a full-dose acquisition, while resulting in smaller signal bias compared to other loss functions., 15 pages, 12 figures
Published: 2021

15. MANAS: Multi-Scale and Multi-Level Neural Architecture Search for Low-Dose CT Denoising

Author: Lu, Zexin, Xia, Wenjun, Huang, Yongqiang, Shan, Hongming, Chen, Hu, Zhou, Jiliu, and Zhang, Yi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, FOS: Physical sciences, Medical Physics (physics.med-ph), Physics - Medical Physics
Abstract: Lowering the radiation dose in computed tomography (CT) can greatly reduce the potential risk to public health. However, the reconstructed images from the dose-reduced CT or low-dose CT (LDCT) suffer from severe noise, compromising the subsequent diagnosis and analysis. Recently, convolutional neural networks have achieved promising results in removing noise from LDCT images; the network architectures used are either handcrafted or built on top of conventional networks such as ResNet and U-Net. Recent advances in neural network architecture search (NAS) have proved that the network architecture has a dramatic effect on the model performance, which indicates that current network architectures for LDCT may be sub-optimal. Therefore, in this paper, we make the first attempt to apply NAS to LDCT and propose a multi-scale and multi-level NAS for LDCT denoising, termed MANAS. On the one hand, the proposed MANAS fuses features extracted by different scale cells to capture multi-scale image structural details. On the other hand, the proposed MANAS can search a hybrid cell- and network-level structure for better performance. Extensive experimental results on three different dose levels demonstrate that the proposed MANAS can achieve better performance in terms of preserving image structural details than several state-of-the-art methods. In addition, we also validate the effectiveness of the multi-scale and multi-level architecture for LDCT denoising.
Published: 2021
Full Text: View/download PDF

16. Cine Cardiac MRI Motion Artifact Reduction Using a Recurrent Neural Network

Author: Lyu, Qing, Shan, Hongming, Xie, Yibin, Li, Debiao, and Wang, Ge
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, FOS: Physical sciences, Medical Physics (physics.med-ph), Electrical Engineering and Systems Science - Image and Video Processing, Physics - Medical Physics, Machine Learning (cs.LG)
Abstract: Cine cardiac magnetic resonance imaging (MRI) is widely used for diagnosis of cardiac diseases thanks to its ability to present cardiovascular features in excellent contrast. As compared to computed tomography (CT), MRI, however, requires a long scan time, which inevitably induces motion artifacts and causes patients' discomfort. Thus, there has been a strong clinical motivation to develop techniques to reduce both the scan time and motion artifacts. Given its successful applications in other medical imaging tasks such as MRI super-resolution and CT metal artifact reduction, deep learning is a promising approach for cardiac MRI motion artifact reduction. In this paper, we propose a recurrent neural network to simultaneously extract both spatial and temporal features from under-sampled, motion-blurred cine cardiac images for improved image quality. The experimental results demonstrate substantially improved image quality on two clinical test datasets. Also, our method enables data-driven frame interpolation at an enhanced temporal resolution. Compared with existing methods, our deep learning approach gives a superior performance in terms of structural similarity (SSIM) and peak signal-to-noise ratio (PSNR)., Comment: 10 pages, 11 figures
Published: 2020
Full Text: View/download PDF

17. Precipitation Nowcasting with Star-Bridge Networks

Author: Cao, Yuan, Li, Qiuying, Shan, Hongming, Huang, Zhizhong, Chen, Lei, Ma, Leiming, and Zhang, Junping
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Precipitation nowcasting, which aims to precisely predict the short-term rainfall intensity of a local region, is gaining increasing attention in the artificial intelligence community. Existing deep learning-based algorithms use a single network to process various rainfall intensities together, compromising the predictive accuracy. Therefore, this paper proposes a novel recurrent neural network (RNN) based star-bridge network (StarBriNet) for precipitation nowcasting. The novelty of this work lies in the following three aspects. First, the proposed network comprises multiple sub-networks to deal with different rainfall intensities and duration separately, which can significantly improve the model performance. Second, we propose a star-shaped information bridge to enhance the information flow across RNN layers. Third, we introduce a multi-sigmoid loss function to take the precipitation nowcasting criterion into account. Experimental results demonstrate superior performance for precipitation nowcasting over existing algorithms, including the state-of-the-art one, on a natural radar echo dataset., Comment: 10 pages, 7 figures
Published: 2019
Full Text: View/download PDF
