150 results for "Zhong-Qiu Wang"
Search Results
52. TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
- Author
-
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe
- Subjects
Sound (cs.SD), Audio and Speech Processing (eess.AS) - Abstract
We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-frequency (T-F) domain, for monaural talker-independent speaker separation in anechoic conditions. The model stacks several multi-path blocks, each consisting of an intra-frame spectral module, a sub-band temporal module, and a full-band self-attention module, to leverage local and global spectro-temporal information for separation. The model is trained to perform complex spectral mapping, where the real and imaginary (RI) components of the input mixture are stacked as input features to predict target RI components. Besides using the scale-invariant signal-to-distortion ratio (SI-SDR) loss for model training, we include a novel loss term to encourage separated sources to add up to the input mixture. Without using dynamic mixing, we obtain 23.4 dB SI-SDR improvement (SI-SDRi) on the WSJ0-2mix dataset, outperforming the previous best by a large margin., in IEEE ICASSP 2023
- Published
- 2022
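The two training objectives in the abstract of entry 52, the SI-SDR loss and a term encouraging the separated sources to sum back to the mixture, can be sketched in NumPy as follows (a minimal illustration of ours; the function names and the exact form of the consistency term are assumptions, not the paper's code):

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB: project the estimate onto the reference,
    then compare the energy of that projection to the residual."""
    ref = ref - ref.mean()
    est = est - est.mean()
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    residual = est - target
    return 10.0 * np.log10(np.dot(target, target) / (np.dot(residual, residual) + eps))

def mixture_consistency_penalty(est_sources, mixture):
    """Penalty encouraging the separated sources to add up to the input
    mixture (our simplified stand-in for the paper's extra loss term)."""
    return np.mean((mixture - est_sources.sum(axis=0)) ** 2)
```

Because SI-SDR first rescales the reference by the projection coefficient, multiplying the estimate by any nonzero constant leaves the score unchanged.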
53. Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation
- Author
-
Gordon Wichern, Jonathan Le Roux, and Zhong-Qiu Wang
- Subjects
Sound (cs.SD), Audio and Speech Processing (eess.AS), Reverberation, Acoustics and Ultrasonics, Artificial neural network, Speech recognition, Deep learning, Supervised learning, Filter (signal processing), Monaural, Impulse response - Abstract
A promising approach for speech dereverberation is based on supervised learning, where a deep neural network (DNN) is trained to predict the direct sound from noisy-reverberant speech. This data-driven approach is based on leveraging prior knowledge of clean speech patterns and seldom explicitly exploits the linear-filter structure in reverberation, i.e., that reverberation results from a linear convolution between a room impulse response (RIR) and a dry source signal. In this work, we propose to exploit this linear-filter structure within a deep learning based monaural speech dereverberation framework. The key idea is to first estimate the direct-path signal of the target speaker using a DNN and then identify signals that are decayed and delayed copies of the estimated direct-path signal, as these can be reliably considered as reverberation. They can be either directly removed for dereverberation, or used as extra features for another DNN to perform better dereverberation. To identify the copies, we estimate the underlying filter (or RIR) by efficiently solving a linear regression problem per frequency in the time-frequency domain. We then modify the proposed algorithm for speaker separation in reverberant and noisy-reverberant conditions. State-of-the-art speech dereverberation and speaker separation results are obtained on the REVERB, SMS-WSJ, and WHAMR! datasets., Comment: in IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Published
- 2021
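The core step of entry 53, estimating the underlying filter by solving a linear regression per frequency in the T-F domain, can be sketched as follows (a NumPy sketch under our own shape conventions, not the authors' implementation):

```python
import numpy as np

def estimate_filter_per_freq(direct, mix, n_taps):
    """
    Least-squares estimate of an n_taps-long filter per frequency bin such
    that the filtered direct-path estimate approximates the reverberant
    mixture. direct, mix: complex STFTs of shape (T, F).
    Returns filters of shape (n_taps, F).
    """
    T, F = direct.shape
    g = np.zeros((n_taps, F), dtype=complex)
    for f in range(F):
        # design matrix of delayed copies of the direct-path signal
        A = np.zeros((T, n_taps), dtype=complex)
        for k in range(n_taps):
            A[k:, k] = direct[:T - k, f]
        g[:, f] = np.linalg.lstsq(A, mix[:, f], rcond=None)[0]
    return g
```

Filtering the direct-path estimate with `g` then yields delayed-and-decayed copies that can be treated as reverberation and removed.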
54. Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
- Author
-
Zhong-Qiu Wang, DeLiang Wang, and Peidong Wang
- Subjects
Sound (cs.SD), Audio and Speech Processing (eess.AS), Computation and Language (cs.CL), Beamforming, Microphone, Speech recognition, Deep learning, Speech processing, Minimum variance distortionless response, Utterance - Abstract
We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset., 14 pages, 6 figures. To appear in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Sound demo https://zqwang7.github.io/demos/SMSWSJ_demo/taslp20_SMSWSJ_demo.html
- Published
- 2021
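The MVDR integration step in entry 54 can be sketched per frequency bin from estimated target and noise spatial covariance matrices (a textbook MVDR formulation with a principal-eigenvector steering vector; this is an assumption, not necessarily the paper's exact variant):

```python
import numpy as np

def mvdr_weights(phi_target, phi_noise, eps=1e-6):
    """
    Per-frequency MVDR beamformer weights from estimated target and noise
    spatial covariance matrices (M x M, Hermitian). The steering vector is
    taken as the principal eigenvector of the target covariance; diagonal
    loading stabilizes the noise-covariance inverse.
    """
    M = phi_noise.shape[0]
    phi_n = phi_noise + eps * np.eye(M)
    _, vecs = np.linalg.eigh(phi_target)
    d = vecs[:, -1]                         # principal eigenvector
    num = np.linalg.solve(phi_n, d)
    return num / (d.conj() @ num)           # w = Phi_n^-1 d / (d^H Phi_n^-1 d)
```

The normalization enforces the distortionless constraint w^H d = 1, so the target direction passes with unit gain while noise power is minimized.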
55. Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement
- Author
-
DeLiang Wang, Zhong-Qiu Wang, Jorge Chang, and Hassan Taherian
- Subjects
Reverberation, Acoustics and Ultrasonics, Noise measurement, Covariance matrix, Speech recognition, Speaker recognition, Speech enhancement, Robustness, Spectrogram, Mel-frequency cepstrum - Abstract
Deep neural network (DNN) embeddings for speaker recognition have recently attracted much attention. Compared to i-vectors, they are more robust to noise and room reverberation, as DNNs leverage large-scale training. This article addresses the question of whether speech enhancement approaches are still useful when DNN embeddings are used for speaker recognition. We investigate single- and multi-channel speech enhancement for text-independent speaker verification based on x-vectors in conditions where strong diffuse noise and reverberation are both present. Single-channel (monaural) speech enhancement is based on complex spectral mapping and is applied to individual microphones. For multi-channel speech enhancement, we use a masking-based minimum variance distortionless response (MVDR) beamformer and its rank-1 approximation. We propose a novel method of deriving time-frequency masks from the estimated complex spectrogram. In addition, we investigate gammatone frequency cepstral coefficients (GFCCs) as robust speaker features. Systematic evaluations and comparisons on the NIST SRE 2010 retransmitted corpus show that both monaural and multi-channel speech enhancement significantly improve x-vector performance, and that our covariance matrix estimate is effective for the MVDR beamformer.
- Published
- 2020
56. Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR
- Author
-
Peidong Wang, DeLiang Wang, and Zhong-Qiu Wang
- Subjects
Beamforming, Acoustics and Ultrasonics, Deep learning, Pattern recognition, Speech enhancement, Minimum variance distortionless response, Word error rate, Spatial analysis - Abstract
This study proposes a complex spectral mapping approach for single- and multi-channel speech enhancement, where deep neural networks (DNNs) are used to predict the real and imaginary (RI) components of the direct-path signal from noisy and reverberant ones. The proposed system contains two DNNs. The first one performs single-channel complex spectral mapping. The estimated complex spectra are used to compute a minimum variance distortionless response (MVDR) beamformer. The RI components of beamforming results, which encode spatial information, are then combined with the RI components of the mixture to train the second DNN for multi-channel complex spectral mapping. With estimated complex spectra, we also propose a novel method of time-varying beamforming. State-of-the-art performance is obtained on the speech enhancement and recognition tasks of the CHiME-4 corpus. More specifically, our system obtains 6.82%, 3.19% and 2.00% word error rates (WER) respectively on the single-, two-, and six-microphone tasks of CHiME-4, significantly surpassing the current best results of 9.15%, 3.91% and 2.24% WER.
- Published
- 2020
57. Localization Based Sequential Grouping for Continuous Speech Separation
- Author
-
Zhong-Qiu Wang and DeLiang Wang
- Subjects
Sound (cs.SD), Audio and Speech Processing (eess.AS) - Abstract
This study investigates robust speaker localization for continuous speech separation and speaker diarization, where we use speaker directions to group non-contiguous segments of the same speaker. Assuming that speakers do not move and are located in different directions, the direction of arrival (DOA) information provides an informative cue for accurate sequential grouping and speaker diarization. Our system is block-online in the following sense. Given a block of frames with at most two speakers, we apply a two-speaker separation model to separate (and enhance) the speakers, estimate the DOA of each separated speaker, and group the separation results across blocks based on the DOA estimates. Speaker diarization and speaker-attributed speech recognition results on the LibriCSS corpus demonstrate the effectiveness of the proposed algorithm., 5 pages, 1 figure
- Published
- 2021
58. Count And Separate: Incorporating Speaker Counting For Continuous Speaker Separation
- Author
-
DeLiang Wang and Zhong-Qiu Wang
- Subjects
Speech enhancement, Reverberation, Noise, Speech recognition, Permutation invariant training - Abstract
This study leverages frame-wise speaker counting to switch between speech enhancement and speaker separation for continuous speaker separation. The proposed approach counts the number of speakers at each frame. If there is no speaker overlap, a speech enhancement model is used to suppress noise and reverberation. Otherwise, a speaker separation model based on permutation invariant training is utilized to separate multiple speakers in noisy-reverberant conditions. We stitch the results from the enhancement and separation models based on their predictions in a small augmented window of frames surrounding an overlapped segment. Assuming a fixed array geometry between training and testing, we use multi-microphone complex spectral mapping for enhancement and separation, where deep neural networks are trained to predict the real and imaginary (RI) components of direct sound from stacked reverberant-noisy RI components of multiple microphones. Experimental results on the LibriCSS dataset demonstrate the effectiveness of our approach.
- Published
- 2021
59. Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation
- Author
-
DeLiang Wang and Zhong-Qiu Wang
- Subjects
Beamforming, Acoustics and Ultrasonics, Microphone, Deep learning, Pattern recognition, Cluster analysis - Abstract
This study tightly integrates complementary spectral and spatial features for deep learning based multi-channel speaker separation in reverberant environments. The key idea is to localize individual speakers so that an enhancement network can be trained on spatial as well as spectral features to extract the speaker from an estimated direction and with specific spectral structures. The spatial and spectral features are designed in a way such that the trained models are blind to the number of microphones and microphone geometry. To determine the direction of the speaker of interest, we identify time-frequency (T-F) units dominated by that speaker and only use them for direction estimation. The T-F unit level speaker dominance is determined by a two-channel chimera++ network, which combines deep clustering and permutation invariant training at the objective function level, and integrates spectral and interchannel phase patterns at the input feature level. In addition, T-F masking based beamforming is tightly integrated in the system by leveraging the magnitudes and phases produced by beamforming. Strong separation performance has been observed on reverberant talker-independent speaker separation, which separates reverberant speaker mixtures based on a random number of microphones arranged in arbitrary linear-array geometry.
- Published
- 2019
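A spatial feature of the kind entry 59 combines with spectral input can be sketched for two microphones: compare the observed interchannel phase difference (IPD) with the IPD a plane wave from a hypothesized direction would produce (our simplified stand-in for the paper's directional features; the function name, geometry, and sign convention are assumptions):

```python
import numpy as np

def directional_feature(stft_pair, mic_dist, doa_deg, fs=16000, n_fft=512, c=343.0):
    """
    Compare observed interchannel phase differences (IPD) against the IPD a
    far-field plane wave from direction `doa_deg` would produce on a two-mic
    array. stft_pair: complex array of shape (2, T, F), F = n_fft // 2 + 1.
    Returns cos(IPD - expected IPD) of shape (T, F): near 1 in T-F units
    dominated by a source from that direction.
    """
    freqs = np.arange(stft_pair.shape[-1]) * fs / n_fft
    # inter-mic delay in seconds (sign depends on the mic ordering convention)
    tau = mic_dist * np.cos(np.deg2rad(doa_deg)) / c
    expected = 2.0 * np.pi * freqs * tau
    ipd = np.angle(stft_pair[1]) - np.angle(stft_pair[0])
    return np.cos(ipd - expected)
```

Taking the cosine makes the feature insensitive to 2-pi phase wrapping, and the same construction extends to any microphone pair, which is what keeps such features array-geometry agnostic.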
60. Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking
- Author
-
DeLiang Wang, Xueliang Zhang, and Zhong-Qiu Wang
- Subjects
Reverberation, Acoustics and Ultrasonics, Cross-correlation, Deep learning, Speech recognition, Direction of arrival, Monaural, Time-frequency masking, Robustness - Abstract
Deep learning-based time-frequency (T-F) masking has dramatically advanced monaural (single-channel) speech separation and enhancement. This study investigates its potential for direction of arrival (DOA) estimation in noisy and reverberant environments. We explore ways of combining T-F masking and conventional localization algorithms, such as generalized cross-correlation with phase transform, as well as newly proposed algorithms based on steered-response SNR and steering vectors. The key idea is to utilize deep neural networks (DNNs) to identify speech-dominant T-F units containing relatively clean phase for DOA estimation. Our DNN is trained using only monaural spectral information, and this makes the trained model directly applicable to arrays with various numbers of microphones arranged in diverse geometries. Although only monaural information is used for training, experimental results show strong robustness of the proposed approach in new environments with intense noise and room reverberation, outperforming traditional DOA estimation methods by large margins. Our study also suggests that the ideal ratio mask and its variants remain effective training targets for robust speaker localization.
- Published
- 2019
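The generalized cross-correlation with phase transform (GCC-PHAT) baseline discussed in entry 60 can be sketched as follows (a standard textbook implementation, not the paper's code):

```python
import numpy as np

def gcc_phat(x, y):
    """
    GCC-PHAT time-delay estimation: whiten the cross-spectrum to unit
    magnitude (keeping only phase), inverse-transform, and pick the peak.
    Returns the estimated delay of y relative to x, in samples.
    """
    n = 2 * max(len(x), len(y))
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12                  # phase transform (PHAT)
    cc = np.fft.irfft(cross, n=n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))  # center lag 0
    return int(np.argmax(np.abs(cc))) - n // 2
```

The PHAT weighting discards magnitude, which is exactly why the paper's masking idea helps: in noisy T-F units the retained phase is unreliable, so restricting the estimate to speech-dominant units sharpens the peak.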
61. On The Compensation Between Magnitude and Phase in Speech Separation
- Author
-
Jonathan Le Roux, Zhong-Qiu Wang, and Gordon Wichern
- Subjects
Sound (cs.SD), Audio and Speech Processing (eess.AS), Artificial neural network, Speech recognition, Monaural, Intelligibility, Speech enhancement, Signal-to-noise ratio, Signal Processing, Spectrogram, Time domain - Abstract
Deep neural network (DNN) based end-to-end optimization in the complex time-frequency (T-F) domain or time domain has shown considerable potential in monaural speech separation. Many recent studies optimize loss functions defined solely in the time or complex domain, without including a loss on magnitude. Although such loss functions typically produce better scores on objective time-domain metrics, they produce worse scores on speech quality and intelligibility metrics and usually lead to worse speech recognition performance, compared with including a loss on magnitude. While this phenomenon has been experimentally observed by many studies, it is often not accurately explained, and a thorough understanding of its fundamental cause is lacking. This paper provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. Analytical results based on monaural speech separation and robust automatic speech recognition (ASR) tasks in noisy-reverberant conditions support the validity of our view., Comment: in IEEE Signal Processing Letters
- Published
- 2021
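The compensation phenomenon analyzed in entry 61 has a compact closed form: with the phase fixed at an error of dtheta, the magnitude minimizing the complex-domain loss is the true magnitude shrunken by cos(dtheta). A toy NumPy illustration of the phenomenon (ours, not the paper's code):

```python
import numpy as np

def optimal_magnitude(mag_true, phase_err):
    """
    With the estimated phase off by `phase_err` radians, the magnitude `a`
    minimizing the complex-domain loss |a * exp(1j*phase_err) - mag_true|**2
    is mag_true * cos(phase_err): smaller than the true magnitude whenever
    the phase is wrong. Magnitude errors thus compensate for phase errors
    when only a complex-domain (or time-domain) loss is optimized.
    """
    return mag_true * np.cos(phase_err)
```

Setting the derivative of a^2 - 2*a*mag_true*cos(phase_err) + mag_true^2 to zero gives the closed form directly.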
62. Application of Unenhanced Computed Tomography Texture Analysis to Differentiate Pancreatic Adenosquamous Carcinoma from Pancreatic Ductal Adenocarcinoma
- Author
-
Shuai Ren, Hui-juan Tang, Rui Zhao, Shao-feng Duan, Rong Chen, and Zhong-qiu Wang
- Subjects
Adult ,Male ,Middle Aged ,Biochemistry ,Sensitivity and Specificity ,Diagnosis, Differential ,Pancreatic Neoplasms ,Carcinoma, Adenosquamous ,Predictive Value of Tests ,Genetics ,Humans ,Female ,Tomography, X-Ray Computed ,Aged ,Carcinoma, Pancreatic Ductal - Abstract
The objective of this study was to investigate the application of unenhanced computed tomography (CT) texture analysis in differentiating pancreatic adenosquamous carcinoma (PASC) from pancreatic ductal adenocarcinoma (PDAC). Preoperative CT images of 112 patients (31 with PASC, 81 with PDAC) were retrospectively reviewed. A total of 396 texture parameters were extracted with AnalysisKit software for further texture analysis. Texture features were selected for the differentiation of PASC and PDAC by the Mann-Whitney U test, univariate logistic regression analysis, and the minimum redundancy maximum relevance algorithm. Furthermore, receiver operating characteristic (ROC) curve analysis was performed to evaluate the diagnostic performance of the texture feature-based model built by the random forest (RF) method. Finally, the robustness and reproducibility of the predictive model were assessed by the 10-times leave-group-out cross-validation (LGOCV) method. In the present study, 10 texture features to differentiate PASC from PDAC were retained for RF model construction after feature selection. The predictive model had a good classification performance in differentiating PASC from PDAC, with the following characteristics: sensitivity, 95.7%; specificity, 92.5%; accuracy, 94.3%; positive predictive value (PPV), 94.3%; negative predictive value (NPV), 94.3%; and area under the ROC curve (AUC), 0.98. Moreover, the predictive model proved robust and reproducible under the 10-times LGOCV algorithm (sensitivity, 90.0%; specificity, 71.3%; accuracy, 76.8%; PPV, 59.0%; NPV, 95.2%; and AUC, 0.80). Unenhanced CT texture analysis thus has great potential for differentiating PASC from PDAC.
- Published
- 2020
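The ROC analysis used to score the texture model in entry 62 can be reproduced generically via the rank-sum (Mann-Whitney U) identity for AUC (a standard NumPy sketch, independent of the study's data):

```python
import numpy as np

def roc_auc(labels, scores):
    """
    Area under the ROC curve via the rank-sum identity: the probability
    that a randomly chosen positive case scores above a randomly chosen
    negative case, with ties counting half.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

This pairwise formulation is equivalent to integrating the ROC curve and makes clear why AUC is invariant to any monotonic rescaling of the classifier scores.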
63. Deep Learning Based Target Cancellation for Speech Dereverberation
- Author
-
DeLiang Wang and Zhong-Qiu Wang
- Subjects
Masking, Reverberation, Acoustics and Ultrasonics, Speech recognition, Deep learning, Impulse response, Speech processing, Minimum variance distortionless response - Abstract
This article investigates deep learning based single- and multi-channel speech dereverberation. For single-channel processing, we extend magnitude-domain masking and mapping based dereverberation to complex-domain mapping, where deep neural networks (DNNs) are trained to predict the real and imaginary (RI) components of the direct-path signal from reverberant (and noisy) ones. For multi-channel processing, we first compute a minimum variance distortionless response (MVDR) beamformer to cancel the direct-path signal, and then feed the RI components of the cancelled signal, which is expected to be a filtered version of non-target signals, as additional features to perform dereverberation. Trained on a large dataset of simulated room impulse responses, our models show excellent speech dereverberation and recognition performance on the test set of the REVERB challenge, consistently better than single- and multi-channel weighted prediction error (WPE) algorithms.
- Published
- 2020
64. Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement
- Author
-
Desh Raj, Shinji Watanabe, Kevin W. Wilson, Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, John R. Hershey, and Zhuo Chen
- Subjects
Sound (cs.SD), Machine Learning (cs.LG), Machine Learning (stat.ML), Audio and Speech Processing (eess.AS), Beamforming, Artificial neural network, Covariance function, Word error rate, Speech enhancement, Signal-to-noise ratio, Block size - Abstract
This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation. Our neural networks for separation use an advanced convolutional architecture trained with a novel stabilized signal-to-noise ratio loss function. For beamforming, we explore multiple ways of computing time-varying covariance matrices, including factorizing the spatial covariance into a time-varying amplitude component and a time-invariant spatial component, as well as using block-based techniques. In addition, we introduce a multi-frame beamforming method which improves the results significantly by adding contextual frames to the beamforming formulations. We extensively evaluate and analyze the effects of window size, block size, and multi-frame context size for these methods. Our best method utilizes a sequence of three neural separation and multi-frame time-invariant spatial beamforming stages, and demonstrates an average improvement of 2.75 dB in scale-invariant signal-to-noise ratio and 14.2% absolute reduction in a comparative speech recognition metric across four challenging reverberant speech enhancement and separation tasks. We also use our three-speaker separation model to separate real recordings in the LibriCSS evaluation set into non-overlapping tracks, and achieve a better word error rate as compared to a baseline mask based beamformer., 7 pages, 7 figures, IEEE SLT 2021 (slt2020.org)
- Published
- 2019
65. Atypical choroid plexus papilloma: clinicopathological and neuroradiological features
- Author
-
Yingying Zhuang, Genji Bo, Dan Kong, Yuzhen Shi, Li-Li Guo, Rui-Rui Zhang, Wei Huang, Xiao Chen, Mao-Zhen Chen, Zhong-Qiu Wang, and Yiming Xu
- Subjects
Adult, Male, Female, Middle Aged, Humans, Gadolinium DTPA, Contrast Media, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Diagnosis, Differential, Edema, Nuclear atypia, Retrospective Studies, Cerebellopontine angle, Hydrocephalus, Ventricle, Papilloma, Choroid Plexus - Abstract
Background Atypical choroid plexus papilloma (APP) is a rare, newly introduced entity with intermediate characteristics. To date, few reports have described its magnetic resonance (MR) findings. Purpose To analyze the clinicopathological and MR features of APP. Material and Methods The clinicopathological data and preoperative MR images of six patients with pathologically proven APP were retrospectively reviewed. The MR features analyzed included tumor location, contour, signal intensity, degree of enhancement, intratumoral cysts and necrosis, flow voids, borders, peritumoral edema, and associated hydrocephalus. Results The APPs were located in the ventricle (n = 4) and cerebellopontine angle (CPA, n = 2). Tumor dissemination along the spinal subarachnoid space was found in one patient. The tumors appeared as multi-lobulated (n = 5) or round (n = 1) masses, with slightly heterogeneous signals (n = 5) or mixed signals (n = 1) on T1-weighted and T2-weighted images. Heterogeneous and strong enhancement was found in five cases on contrast-enhanced images. Three of four intraventricular tumors had a partly blurred border with the ventricle wall. Four tumors had mild to moderate surrounding edema signals. A slight hydrocephalus was seen in four patients. An incomplete capsule was seen in four tumors at surgery. Histopathologically, mild nuclear atypia was seen in all tumors, with a mitotic rate of 2–5 per 10 high-power fields. Conclusion APP should be included in the differential diagnosis when an intraventricular or CPA tumor appears as a multi-lobulated solid mass with slight heterogeneity, heterogeneous strong enhancement, partly blurred borders, mild to moderate peritumoral edema, or slight hydrocephalus.
- Published
- 2017
66. Evidence of motor injury due to damaged corticospinal tract following acute hemorrhage in the basal ganglia region
- Author
-
Si Yuan Hou, Ling Shan Chen, Zhong Qiu Wang, Xue Hu Wei, Zheng Qiu Zhu, Xiao Kun Fang, Yong Kang Liu, and Jing Li
- Subjects
Adult, Male, Female, Middle Aged, Aged, Humans, Internal capsule, Pyramidal Tracts, Corona radiata, Basal ganglia, Basal Ganglia Hemorrhage, Fractional anisotropy, Paresis, Magnetic Resonance Imaging, Recovery of Function, Diffusion Tensor Imaging, Motor Skills, Corticospinal tract, Diffusion MRI - Abstract
The integrity of the corticospinal tract (CST) is significantly affected following basal ganglia haemorrhage. We aimed to assess the local features of the CST and to effectively predict motor function from the diffusion characteristics of the CST in patients with motor injury following acute haemorrhage in the basal ganglia region. We recruited 37 patients with paresis of the lateral limbs caused by acute basal ganglia haemorrhage. We tracked the CST with the automated fiber quantification method, assessed the characteristics of each CST segment between the affected and contralateral sides, and correlated these with the Fugl–Meyer (FM) and Barthel Index (BI) scores at 6 months after onset. The injured side of the CST showed significantly lower fractional anisotropy (FA) than the contralateral side along the tract profiles (p
- Published
- 2019
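Fractional anisotropy (FA), the diffusion measure entry 66 compares along the CST, has a standard closed form in the eigenvalues of the diffusion tensor (sketched here from the textbook definition, not the authors' pipeline):

```python
import numpy as np

def fractional_anisotropy(evals):
    """
    Fractional anisotropy from the three eigenvalues of a diffusion tensor:
    0 for perfectly isotropic diffusion, approaching 1 when diffusion is
    confined to a single axis, as along an intact white-matter tract.
    """
    lam = np.asarray(evals, dtype=float)
    dev = lam - lam.mean()
    num = np.sqrt((dev ** 2).sum())
    den = np.sqrt((lam ** 2).sum())
    return np.sqrt(1.5) * num / (den + 1e-12)
```

A damaged tract diffuses more isotropically, so its eigenvalues become more similar and FA drops, which is the effect the study measures on the injured side.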
67. Differentiation Between G1 and G2/G3 Phyllodes Tumors of Breast Using Mammography and Mammographic Texture Analysis
- Author
-
Xiao Chen, Wen Jing Cui, Cheng Wang, Ling Jia, Can Cui, Zhong Qiu Wang, Shuai Ren, and Shao Feng Duan
- Subjects
Cancer Research, Oncology, Mammography, Texture analysis, Receiver operating characteristic, Area under the curve, artificial intelligence, machine learning, classification, phyllodes tumors - Abstract
Purpose: To determine the potential of mammography (MG) and mammographic texture analysis for differentiating between Grade 1 (G1) and Grade 2/Grade 3 (G2/G3) phyllodes tumors (PTs) of the breast. Materials and methods: A total of 80 female patients with histologically proven PTs were included in this study. 45 subjects who underwent pretreatment MG from 2010 to 2017 were retrospectively analyzed, including 14 PTs G1 and 31 PTs G2/G3. Tumor size, shape, margin, density, homogeneity, presence of fat or calcifications, a halo sign, as well as some indirect manifestations were evaluated. Texture features were extracted using commercial software. Receiver operating characteristic (ROC) curve analysis was used to determine the sensitivity and specificity of prediction. Results: G2/G3 PTs showed a larger size (>4.0 cm) compared to G1 PTs (64.52 vs. 28.57%, p = 0.025). Strong lobulation or multinodular confluence was more common in G2/G3 PTs than in G1 PTs (64.52 vs. 14.29%, p = 0.004). Significant differences were also observed in the tumors' growth speed and clinical manifestations (p = 0.007 and 0.022, respectively). Ten texture features showed significant differences between the two groups (p < 0.05); Correlation_AllDirection_offset7_SD and ClusterProminence_AllDirection_offset7_SD were independent risk factors. The areas under the curve (AUC) of imaging-based diagnosis, texture analysis-based diagnosis, and the combination of the two approaches were 0.805, 0.730, and 0.843 (90.3% sensitivity and 85.7% specificity), respectively. Conclusions: Texture analysis has great potential to improve the diagnostic efficacy of MG in differentiating G1 PTs from G2/G3 PTs.
- Published
- 2019
68. Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective
- Author
-
DeLiang Wang, Ke Tan, and Zhong-Qiu Wang
- Subjects
FOS: Computer and information sciences ,Sound (cs.SD) ,Computer Science - Computation and Language ,Absolute phase ,Perspective (graphical) ,Short-time Fourier transform ,Computer Science - Sound ,Domain (mathematical analysis) ,symbols.namesake ,Fourier transform ,Audio and Speech Processing (eess.AS) ,FOS: Electrical engineering, electronic engineering, information engineering ,symbols ,Trigonometry ,Computation and Language (cs.CL) ,Algorithm ,Electrical Engineering and Systems Science - Audio and Speech Processing ,Sign (mathematics) ,Group delay and phase delay ,Mathematics - Abstract
This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key observation is that, for a mixture of two sources, with their magnitudes accurately estimated and under a geometric constraint, the absolute phase difference between each source and the mixture can be uniquely determined; in addition, the source phases at each time-frequency (T-F) unit can be narrowed down to only two candidates. To pick the right candidate, we propose three algorithms based on iterative phase reconstruction, group delay estimation, and phase-difference sign prediction. State-of-the-art results are obtained on the publicly available wsj0-2mix and 3mix corpora., Comment: 5 pages, in submission to ICASSP-2019
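The geometric constraint in the abstract can be made concrete: with a mixture Y = S1 + S2 and all three magnitudes known, the law of cosines pins down the absolute phase difference between S1 and Y, leaving exactly two candidate phases. A minimal sketch (my illustration, not the paper's code):

```python
import cmath
import math

def candidate_phases(y, mag1, mag2):
    """Given a complex mixture T-F unit y = s1 + s2 and the (estimated)
    magnitudes of the two sources, return the two candidate phases of
    source 1: the law of cosines fixes the absolute phase difference
    between s1 and y, but not its sign."""
    mag_y, theta_y = abs(y), cmath.phase(y)
    cos_d = (mag_y ** 2 + mag1 ** 2 - mag2 ** 2) / (2 * mag_y * mag1)
    cos_d = max(-1.0, min(1.0, cos_d))  # guard against rounding error
    d = math.acos(cos_d)
    return theta_y + d, theta_y - d

# toy check: build a mixture from two known sources; one of the two
# candidates recovers the true phase of s1
s1, s2 = cmath.rect(2.0, 0.9), cmath.rect(1.5, -0.4)
cands = candidate_phases(s1 + s2, abs(s1), abs(s2))
```

Picking between the two candidates is exactly the sign-prediction problem the three proposed algorithms address.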
- Published
- 2019
69. A Joint Training Framework for Robust Automatic Speech Recognition
- Author
-
Zhong-Qiu Wang and DeLiang Wang
- Subjects
Acoustics and Ultrasonics ,Artificial neural network ,Noise measurement ,business.industry ,Computer science ,Speech recognition ,Acoustic model ,Word error rate ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Computational Mathematics ,Discriminative model ,Robustness (computer science) ,Test set ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,Artificial intelligence ,Language model ,Electrical and Electronic Engineering ,0305 other medical science ,business - Abstract
Robustness against noise and reverberation is critical for ASR systems deployed in real-world environments. In robust ASR, corrupted speech is normally enhanced using speech separation or enhancement algorithms before recognition. This paper presents a novel joint training framework for speech separation and recognition. The key idea is to concatenate a deep neural network (DNN) based speech separation frontend and a DNN-based acoustic model to build a larger neural network, and jointly adjust the weights in each module. This way, the separation frontend is able to provide enhanced speech desired by the acoustic model and the acoustic model can guide the separation frontend to produce more discriminative enhancement. In addition, we apply sequence training to the jointly trained DNN so that the linguistic information contained in the acoustic and language models can be back-propagated to influence the separation frontend at the training stage. To further improve the robustness, we add more noise- and reverberation-robust features for acoustic modeling. At the test stage, utterance-level unsupervised adaptation is performed to adapt the jointly trained network by learning a linear transformation of the input of the separation frontend. The resulting sequence-discriminative jointly-trained multistream system with run-time adaptation achieves 10.63% average word error rate (WER) on the test set of the reverberant and noisy CHiME-2 dataset (task-2), which represents the best performance on this dataset and a 22.75% error reduction over the best existing method.
- Published
- 2016
70. Two-stage Deep Learning for Noisy-reverberant Speech Enhancement
- Author
-
DeLiang Wang, Zhong-Qiu Wang, and Yan Zhao
- Subjects
Reverberation ,Acoustics and Ultrasonics ,Noise measurement ,Computer science ,business.industry ,Speech recognition ,Noise reduction ,Deep learning ,Speaker recognition ,Article ,Speech enhancement ,Background noise ,Computational Mathematics ,Noise ,Computer Science (miscellaneous) ,Artificial intelligence ,Electrical and Electronic Engineering ,business - Abstract
In real-world situations, speech reaching our ears is commonly corrupted by both room reverberation and background noise. These distortions are detrimental to speech intelligibility and quality, and also pose a serious problem to many speech-related applications, including automatic speech and speaker recognition. In order to deal with the combined effects of noise and reverberation, we propose a two-stage strategy to enhance corrupted speech, where denoising and dereverberation are conducted sequentially using deep neural networks. In addition, we design a new objective function that incorporates clean phase during model training to better estimate spectral magnitudes, which would in turn yield better phase estimates when combined with iterative phase reconstruction. The two-stage model is then jointly trained to optimize the proposed objective function. Systematic evaluations and comparisons show that the proposed algorithm improves objective metrics of speech intelligibility and quality substantially, and significantly outperforms previous one-stage enhancement systems.
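The "objective function that incorporates clean phase" is in the spirit of phase-sensitive training targets. As a hedged sketch (this is the generic phase-sensitive target widely used in enhancement, not necessarily the paper's exact loss), the magnitude target is scaled by the cosine of the clean-to-mixture phase difference:

```python
import cmath
import math

def phase_sensitive_target(clean, mixture):
    """Generic phase-sensitive magnitude target |S| * cos(theta_s - theta_y):
    T-F units whose clean phase disagrees with the mixture phase get a
    smaller (possibly negative) target, so magnitude estimation errors
    there are penalized in a phase-aware way."""
    return [abs(s) * math.cos(cmath.phase(s) - cmath.phase(y))
            for s, y in zip(clean, mixture)]
```

Under this kind of target, the estimated magnitudes compensate for phase mismatch, which also gives iterative phase reconstruction a better starting point.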
- Published
- 2018
71. Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation
- Author
-
DeLiang Wang and Zhong-Qiu Wang
- Subjects
business.industry ,Computer science ,Separation (aeronautics) ,0202 electrical engineering, electronic engineering, information engineering ,020206 networking & telecommunications ,020201 artificial intelligence & image processing ,Computer vision ,02 engineering and technology ,Artificial intelligence ,business ,Multi channel - Published
- 2018
72. Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks
- Author
-
DeLiang Wang, Xueliang Zhang, and Zhong-Qiu Wang
- Subjects
0209 industrial biotechnology ,Computer science ,business.industry ,Pattern recognition ,02 engineering and technology ,Time frequency masking ,Multilateration ,030507 speech-language pathology & audiology ,03 medical and health sciences ,020901 industrial engineering & automation ,Deep neural networks ,Artificial intelligence ,0305 other medical science ,business - Published
- 2018
73. Alternative Objective Functions for Deep Clustering
- Author
-
Jonathan Le Roux, John R. Hershey, and Zhong-Qiu Wang
- Subjects
Scheme (programming language) ,Network architecture ,Linear programming ,business.industry ,Computer science ,Inference ,020206 networking & telecommunications ,02 engineering and technology ,Machine learning ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,Symmetric matrix ,Artificial intelligence ,0305 other medical science ,Cluster analysis ,business ,computer ,computer.programming_language - Abstract
The recently proposed deep clustering framework represents a significant step towards solving the cocktail party problem. This study proposes and compares a variety of alternative objective functions for training deep clustering networks. In addition, whereas the original deep clustering work relied on k-means clustering for test-time inference, here we investigate inference methods that are matched to the training objective. Furthermore, we explore the use of an improved chimera network architecture for speech separation, which combines deep clustering with mask-inference networks in a multiobjective training scheme. The deep clustering loss acts as a regularizer while training the end-to-end mask inference network for best separation. With further iterative phase reconstruction, our best proposed method achieves a state-of-the-art 11.5 dB signal-to-distortion ratio (SDR) result on the publicly available wsj0-2mix dataset, with a much simpler architecture than the previous best approach.
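The deep clustering loss compares embedding and label affinity matrices, and it can be evaluated without ever forming the large T-F-by-T-F affinities. A small pure-Python sketch of the standard objective (my illustration, not the paper's code):

```python
def frob2(m):
    # squared Frobenius norm of a matrix given as a list of rows
    return sum(x * x for row in m for x in row)

def matmul_t(a, b):
    # a^T @ b for row-major lists: a is n x d, b is n x k, result d x k
    n, d, k = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][p] * b[i][q] for i in range(n)) for q in range(k)]
            for p in range(d)]

def deep_clustering_loss(V, Y):
    """||V V^T - Y Y^T||_F^2 for n x d embeddings V and n x c one-hot
    labels Y, computed via the identity
    ||VV^T - YY^T||^2 = ||V^T V||^2 - 2||V^T Y||^2 + ||Y^T Y||^2,
    which avoids the n x n affinity matrices."""
    return (frob2(matmul_t(V, V)) - 2 * frob2(matmul_t(V, Y))
            + frob2(matmul_t(Y, Y)))
```

The loss is zero exactly when the embedding affinities match the ideal speaker-assignment affinities, which is why it works as a regularizer alongside the mask-inference head.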
- Published
- 2018
74. Mask-Weighted STFT Ratios for Relative Transfer Function Estimation and Its Application to Robust ASR
- Author
-
DeLiang Wang and Zhong-Qiu Wang
- Subjects
Masking (art) ,Beamforming ,Frequency band ,Covariance matrix ,Computer science ,Microphone ,Wiener filter ,Short-time Fourier transform ,Direction of arrival ,020206 networking & telecommunications ,02 engineering and technology ,030507 speech-language pathology & audiology ,03 medical and health sciences ,symbols.namesake ,Fourier transform ,Generalized eigenvector ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,0305 other medical science ,Algorithm ,Eigendecomposition of a matrix - Abstract
Deep learning based single-channel time-frequency (T-F) masking has shown considerable potential for beamforming and robust ASR. This paper proposes a simple but novel relative transfer function (RTF) estimation algorithm for microphone arrays, where the RTF between a reference signal and a non-reference signal at each frequency band is estimated as a weighted average of the ratios of the two STFT (short-time Fourier transform) coefficients of the speech-dominant T-F units. Similarly, the noise covariance matrix is estimated from noise-dominant T-F units. An MVDR beamformer is then constructed for robust ASR. Experiments on the two- and six-channel track of the CHiME-4 challenge show consistent improvement over a weighted delay-and-sum (WDAS) beamformer, a generalized eigenvector beamformer, a parameterized multi-channel Wiener filter, an MVDR beamformer based on conventional direction of arrival (DOA) estimation, and two MVDR beamformers both based on eigendecomposition.
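The RTF estimator and the resulting MVDR beamformer are simple enough to sketch for a two-microphone array at one frequency bin (illustrative pure-Python code with the complex 2x2 algebra written out by hand; not the paper's implementation):

```python
def rtf_from_masked_ratios(X1, X2, speech_mask):
    """RTF of mic 2 w.r.t. reference mic 1 at one frequency bin: a
    speech-mask-weighted average of per-frame STFT coefficient ratios."""
    num = sum(m * (x2 / x1) for x1, x2, m in zip(X1, X2, speech_mask))
    return num / sum(speech_mask)

def noise_covariance(X1, X2, noise_mask):
    """2x2 spatial covariance estimated from noise-dominant T-F units."""
    phi = [[0j, 0j], [0j, 0j]]
    for x1, x2, m in zip(X1, X2, noise_mask):
        x = (x1, x2)
        for i in range(2):
            for j in range(2):
                phi[i][j] += m * x[i] * x[j].conjugate()
    s = sum(noise_mask)
    return [[phi[i][j] / s for j in range(2)] for i in range(2)]

def mvdr_weights(phi, d):
    """MVDR beamformer w = phi^-1 d / (d^H phi^-1 d), 2x2 inverse
    written out explicitly."""
    det = phi[0][0] * phi[1][1] - phi[0][1] * phi[1][0]
    inv = [[phi[1][1] / det, -phi[0][1] / det],
           [-phi[1][0] / det, phi[0][0] / det]]
    pd = [inv[0][0] * d[0] + inv[0][1] * d[1],
          inv[1][0] * d[0] + inv[1][1] * d[1]]
    denom = d[0].conjugate() * pd[0] + d[1].conjugate() * pd[1]
    return [pd[0] / denom, pd[1] / denom]
```

The steering vector is simply `[1, rtf]`, and by construction the beamformer is distortionless toward it (w^H d = 1).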
- Published
- 2018
75. Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation
- Author
-
John R. Hershey, Jonathan Le Roux, and Zhong-Qiu Wang
- Subjects
Microphone array ,Computer science ,business.industry ,Phase (waves) ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Discriminative model ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,0305 other medical science ,business ,Cluster analysis ,Spatial analysis - Abstract
The recently-proposed deep clustering algorithm represents a fundamental advance towards solving the cocktail party problem in the single-channel case. When multiple microphones are available, spatial information can be leveraged to differentiate signals from different directions. This study combines spectral and spatial features in a deep clustering framework so that the complementary spectral and spatial information can be simultaneously exploited to improve speech separation. We find that simply encoding inter-microphone phase patterns as additional input features during deep clustering provides a significant improvement in separation performance, even with random microphone array geometry. Experiments on a spatialized version of the wsj0-2mix dataset show the strong potential of the proposed algorithm for speech separation in reverberant environments.
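"Encoding inter-microphone phase patterns as additional input features" can be illustrated as follows. One plausible encoding (an assumption for illustration, not necessarily the paper's exact feature set) uses the cosine and sine of the inter-microphone phase difference, which sidesteps the 2*pi wraparound of raw phase:

```python
import cmath
import math

def ipd_features(X_ref, X_other):
    """Per T-F unit, encode the inter-microphone phase difference (IPD)
    between a reference channel and another channel as (cos IPD, sin IPD),
    ready to be appended to spectral input features."""
    feats = []
    for xr, xo in zip(X_ref, X_other):
        ipd = cmath.phase(xo) - cmath.phase(xr)
        feats.append((math.cos(ipd), math.sin(ipd)))
    return feats
```

Because cos/sin are smooth in the phase difference, sources arriving from different directions produce consistently different feature values across frequency, which is what the clustering network exploits.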
- Published
- 2018
76. End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
- Author
-
Jonathan Le Roux, Zhong-Qiu Wang, John R. Hershey, and DeLiang Wang
- Subjects
Masking (art) ,FOS: Computer and information sciences ,Sound (cs.SD) ,Computer science ,Phase (waves) ,Inverse ,Machine Learning (stat.ML) ,02 engineering and technology ,Signal ,Computer Science - Sound ,Machine Learning (cs.LG) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,symbols.namesake ,Statistics - Machine Learning ,Audio and Speech Processing (eess.AS) ,0202 electrical engineering, electronic engineering, information engineering ,FOS: Electrical engineering, electronic engineering, information engineering ,Computer Science - Computation and Language ,Series (mathematics) ,Short-time Fourier transform ,020206 networking & telecommunications ,Function (mathematics) ,Computer Science - Learning ,Fourier transform ,symbols ,0305 other medical science ,Algorithm ,Computation and Language (cs.CL) ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
This paper proposes an end-to-end approach for single-channel speaker-independent multi-speaker speech separation, where time-frequency (T-F) masking, the short-time Fourier transform (STFT), and its inverse are represented as layers within a deep network. Previous approaches, rather than computing a loss on the reconstructed signal, used a surrogate loss based on the target STFT magnitudes. This ignores reconstruction error introduced by phase inconsistency. In our approach, the loss function is directly defined on the reconstructed signals, which are optimized for best separation. In addition, we train through unfolded iterations of a phase reconstruction algorithm, represented as a series of STFT and inverse STFT layers. While mask values are typically limited to lie between zero and one for approaches using the mixture phase for reconstruction, this limitation is less relevant if the estimated magnitudes are to be used together with phase reconstruction. We thus propose several novel activation functions for the output layer of the T-F masking, to allow mask values beyond one. On the publicly-available wsj0-2mix dataset, our approach achieves state-of-the-art 12.6 dB scale-invariant signal-to-distortion ratio (SI-SDR) and 13.1 dB SDR, revealing new possibilities for deep learning based phase reconstruction and representing fundamental progress towards solving the notoriously hard cocktail party problem., Comment: Submitted to Interspeech 2018
- Published
- 2018
- Full Text
- View/download PDF
77. The Overview in Safety Review of Human Factors Engineering and Control Room Design in Chinese AP1000 Nuclear Power Plant
- Author
-
Zhong-Qiu Wang, Jing-Bin Liu, Qi Wu, Yun-Bo Zhang, and Yan Feng
- Subjects
Workstation ,Habitability ,law ,Computer science ,Process (engineering) ,Nuclear power plant ,Task analysis ,Systems engineering ,Plan (drawing) ,Control room ,law.invention ,Verification and validation - Abstract
The first AP1000 nuclear power plant, which has creative and distinctive design characteristics, is constructed in SanMen county, Zhejiang Province, China. Human factors engineering (HFE) disciplines are applied to the design of the AP1000. In addition to the elements of the program review model, the minimum inventory of controls, displays, and alarms present in the main control room and at the remote shutdown workstation is reviewed according to standards, rules, and regulations. This article introduces the review process and several important issues raised during it, covering elements such as operating experience review, task analysis, human-system interface design, and verification and validation. The paper emphasizes the main control room design, including its environment and layout. Background noise in the as-built main control room may substantially exceed the design value. Another important issue concerns changes to the main control room habitability system (VES) to satisfy post-actuation performance requirements in an AP1000 design change proposal, under which all wall panel displays would be closed; the review process for this proposal is described. Since the AP1000 unit in China is the first constructed anywhere in the world, verification and validation (V&V) are especially important and necessary. The V&V plan, the V&V results summary report, and the report of human engineering discrepancies receive particular attention. V&V in human factors engineering includes HSI task support verification, HFE design verification, and integrated system validation. This paper introduces the review process of V&V and the issues found during V&V.
- Published
- 2017
78. A Study on Nuclear Plant Safety I&C System Verification and Validation Relevant Regulatory Standards
- Author
-
Jingbin Liu, Zhong-Qiu Wang, Yan Lu, Ning Qiao, and Yan Feng
- Subjects
Engineering ,business.industry ,Control system ,Computer software ,System verification ,Instrumentation (computer programming) ,Nuclear plant ,business ,Reliability engineering - Abstract
This article focuses on verification and validation (V&V) of nuclear plant safety I&C systems, which spans the whole life cycle. It discusses three issues: the domestic and foreign regulatory standards involved in the V&V process, the requirements applied during the review process, and how suppliers meet those requirements in practice. The V&V process for safety-class instrumentation and control systems adopts a two-step approach, covering platform software and application software. V&V activities are carried out by suppliers using review and evaluation, special analysis, and software testing. During the review, the focus is on the measures taken to ensure the independence of the V&V process, the software integrity level, and the V&V activities required to cover the entire software life cycle.
- Published
- 2017
79. Research on Special Analysis of Verification and Validation in Nuclear Power Instrument and Control System
- Author
-
Jing-Bin Liu, Zhong-Qiu Wang, Yun-Bo Zhang, Ning Qiao, and Yan Feng
- Subjects
Computer science ,business.industry ,Control system ,Computer software ,Verification ,Instrumentation (computer programming) ,Nuclear power ,business ,Software verification ,Reliability engineering ,Verification and validation - Abstract
At present, China still lacks detailed software V&V guidance standards, while a number of US nuclear power units and I&C platforms have been introduced and applied, so software verification and validation work in China usually follows the methods in IEEE 1012. With reference to the requirements of IEEE 1012, the software V&V process can be divided into three main forms: audit and evaluation, special analysis, and testing. This paper focuses on these parts and gives a detailed description and annotation of the technical methods in IEEE 1012 and the life-cycle stages they cover, which span multiple V&V phases. The authors also put forward their own understanding of the special analysis approaches and procedures, such as criticality analysis, interface analysis, traceability analysis, hazard analysis, risk analysis, and security analysis, and offer their experience and related recommendations.
- Published
- 2017
80. A two-stage algorithm for noisy and reverberant speech enhancement
- Author
-
Zhong-Qiu Wang, Yan Zhao, and DeLiang Wang
- Subjects
Reverberation ,Voice activity detection ,Noise measurement ,Computer science ,Speech recognition ,Noise reduction ,Speech coding ,Acoustic model ,02 engineering and technology ,Intelligibility (communication) ,Linear predictive coding ,Speaker recognition ,Speech processing ,Background noise ,Speech enhancement ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Active listening ,0305 other medical science - Abstract
In daily listening environments, speech is commonly corrupted by room reverberation and background noise. These distortions are detrimental to speech intelligibility and quality, and also severely degrade the performance of automatic speech and speaker recognition systems. In this paper, we propose a two-stage algorithm to deal with the confounding effects of noise and reverberation separately, where denoising and dereverberation are conducted sequentially using deep neural networks. In addition, we design a new objective function that incorporates clean phase information during training. Because the objective function emphasizes the more important time-frequency (T-F) units, better magnitude estimates are obtained during testing. By jointly training the two-stage model to optimize the proposed objective function, our algorithm improves objective metrics of speech intelligibility and quality significantly, and substantially outperforms one-stage enhancement baselines.
- Published
- 2017
81. A speech enhancement algorithm by iterating single- and multi-microphone processing and its application to robust ASR
- Author
-
Zhong-Qiu Wang, DeLiang Wang, and Xueliang Zhang
- Subjects
Masking (art) ,Beamforming ,Microphone array ,Voice activity detection ,Microphone ,Noise (signal processing) ,Computer science ,Speech recognition ,Word error rate ,020206 networking & telecommunications ,02 engineering and technology ,Speech processing ,Speech enhancement ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,0305 other medical science - Abstract
We propose a speech enhancement algorithm based on single- and multi-microphone processing techniques. The core of the algorithm estimates a time-frequency mask representing the target speech and uses masking-based beamforming to enhance corrupted speech. Specifically, in single-microphone processing, the received signals of a microphone array are treated as individual signals, and we estimate a mask for the signal of each microphone using a deep neural network (DNN). With these masks, in multi-microphone processing, we calculate the spatial covariance matrix of the noise and the steering vector for beamforming. In addition, we propose a masking-based post-filter to further suppress the noise in the beamformer output. The enhanced speech is then sent back to the DNN for mask re-estimation. After iterating these steps a few times, we obtain the final enhanced speech. The proposed algorithm is evaluated as a frontend for automatic speech recognition (ASR) and achieves a 5.05% average word error rate (WER) on the real-environment test set of CHiME-3, outperforming the current best algorithm by 13.34%.
- Published
- 2017
82. Speech emotion recognition based on Gaussian Mixture Models and Deep Neural Networks
- Author
-
Keith W. Godin, Zhong-Qiu Wang, and Ivan Tashev
- Subjects
Artificial neural network ,Computer science ,business.industry ,Time delay neural network ,Speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,Machine learning ,computer.software_genre ,Mixture model ,User experience design ,0202 electrical engineering, electronic engineering, information engineering ,Systems architecture ,Feature (machine learning) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Extreme learning machine ,Spoken dialog systems - Abstract
Recognition of speaker emotion during interaction in spoken dialog systems can enhance the user experience, and provide system operators with information valuable to ongoing assessment of interaction system performance and utility. Interaction utterances are very short, and we assume the speaker's emotion is constant throughout a given utterance. This paper investigates combinations of a GMM-based low-level feature extractor with a neural network serving as a high-level feature extractor. The advantage of this system architecture is that it combines fast-developing neural network-based solutions with the classic statistical approaches applied to emotion recognition. Experiments on a Mandarin data set compare different solutions under the same or close conditions.
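A GMM-based low-level feature extractor typically turns each observation into a vector of component posteriors, which the neural network then consumes. A minimal scalar sketch of that idea (my illustration of the general mechanism, not the paper's implementation):

```python
import math

def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each Gaussian component given a scalar
    feature x; the resulting posterior vector can serve as a low-level
    feature fed to a neural network."""
    lik = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
           for w, m, v in zip(weights, means, variances)]
    s = sum(lik)
    return [l / s for l in lik]
```

The posteriors sum to one by construction, so they behave like a soft assignment of the frame to the GMM's acoustic regions.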
- Published
- 2017
83. Discussion About Issues of Human Error in Digital Control Room of NPP
- Author
-
Zhong-Qiu Wang, Yin-Hui Guo, Jing-Bin Liu, and Yan Feng
- Subjects
Engineering ,business.industry ,Human error ,Control engineering ,Task (project management) ,law.invention ,ALARM ,Operator (computer programming) ,Mode (computer interface) ,law ,Digital human ,Nuclear power plant ,Digital control ,business ,Simulation - Abstract
This article briefly defines and classifies human errors, and analyzes representative human errors in nuclear power plants. It also describes the differences between the digital control room and the traditional analog control room of a nuclear plant, including composition, information display, procedures, the alarm system, and operating mode. The article introduces human characteristics in the digital control room, including the role of the operator, the operators' task load, operator ability and experience, and the digital human-system interface, and gives an example of human-error issues in the digital control room. Finally, it studies and discusses preventive countermeasures in digital control systems of nuclear plants. Because measures for preventing human errors are partly common across control room types but also differ substantially, much research work has been carried out.
- Published
- 2017
84. Qualification Test Standards Research of Electrical Equipment Important to Safety of Nuclear Power Plants
- Author
-
Jing-Bin Liu, Yun-Bo Zhang, Zhong-Qiu Wang, Yin-Hui Guo, Ning Qiao, and Yan Feng
- Subjects
Measure (data warehouse) ,Engineering ,business.industry ,International standard ,Nuclear engineering ,Nuclear power ,Reliability engineering ,law.invention ,Test (assessment) ,law ,Electrical equipment ,Nuclear power plant ,Equipment Qualification ,business ,Reliability (statistics) - Abstract
Safety digital I&C systems have gradually been adopted in the I&C systems of new nuclear power plants and in renovation projects of existing plants. The reliability of nuclear equipment directly affects the safe operation of a nuclear power plant, so safety digital I&C systems should undergo equipment qualification (EQ) in accordance with the relevant regulations and standards. EQ testing is an effective measure to prevent common-cause failures induced by operating and environmental conditions, and also helps nuclear I&C equipment meet the single failure criterion. Type testing is still the method of choice for safety electrical equipment. However, for the EQ of safety electrical equipment, the multiple international standard systems lack detailed guidance. This paper analyzes the required qualification standards and, drawing on international EQ experience, puts forward requirements for qualification test items, test levels, and the qualification process.
- Published
- 2017
85. The Several Issues in Safety Review of Digital Control System in Chinese Nuclear Power Plant
- Author
-
Yan Feng, Jing-Bin Liu, Zhong-Qiu Wang, Yun-Bo Zhang, and Yin-Hui Guo
- Subjects
Structure (mathematical logic) ,Engineering ,Configuration management ,business.industry ,Data_CODINGANDINFORMATIONTHEORY ,Key issues ,Civil engineering ,law.invention ,Risk analysis (engineering) ,law ,Nuclear power plant ,Digital control ,business ,Testability - Abstract
With the extensive application of digital control systems (DCS) in Chinese nuclear power plants, the safety of DCS is attracting attention. This paper first briefly introduces the overall structure of a DCS. It then summarizes several key issues concerning DCS from the perspective of nuclear safety review: the single failure criterion, testability, diversity, software verification and validation (V&V), and configuration management, which are common issues in safety review. These issues are analyzed according to the relevant regulations and standards. Finally, solutions are given for these common issues, and suggestions are made for future reviews.
- Published
- 2017
86. The Independence of Safety Digital I&C System in Nuclear Power Plant
- Author
-
Zhong-Qiu Wang, Yin-Hui Guo, Xiang Jia, and Yun-Bo Zhang
- Subjects
Digital instrumentation ,Computer science ,media_common.quotation_subject ,Nuclear engineering ,Control (management) ,Independence ,law.invention ,Electrical isolation ,law ,Physical separation ,Nuclear power plant ,Systems engineering ,Isolation (database systems) ,Reliability (statistics) ,media_common - Abstract
Independence is one way to improve the reliability of digital instrumentation and control (I&C) systems in nuclear power plants, and the relevant regulations and standards give clear requirements for it. Based on these regulations and standards, this paper briefly introduces the three means of achieving independence: electrical isolation, physical separation, and communication isolation. It then summarizes five aspects of the independence of digital I&C systems, which is useful for the design and review of such systems.
- Published
- 2017
87. Surf Crest Tracking Algorithm in Wave Image
- Author
-
Yan Xu, Zhong Qiu Wang, Hui Liu, and Hao Cui
- Subjects
Engineering ,business.industry ,Wind wave ,Process (computing) ,Point (geometry) ,Crest ,General Medicine ,Function (mathematics) ,Focus (optics) ,Tracking (particle physics) ,business ,Algorithm ,Image (mathematics) - Abstract
Non-contact ocean wave observation from a shore station depends on highly accurate 2D coordinates of surf crests in wave images. To address the shortcomings of previous approaches, such as low speed and accuracy, a novel surf crest tracking algorithm is presented with a crest-collapse function that terminates the tracking process through end-point adjustment. The algorithm not only processes the relevant part of the wave image automatically and at higher speed, but also accurately records the 2D coordinates of every point on the surf crest of interest and the connections between contiguous points, which is very practical for non-contact ocean wave observation.
- Published
- 2013
88. Research of Frequency Band by Multiple-Mode Extended for Longitudinal Vibration Acoustic Transducer
- Author
-
Su Yu Wang, Zhong Qiu Wang, Teng Li, and Hong Ru Wang
- Subjects
Coupling ,Engineering ,Field (physics) ,business.industry ,Frequency band ,Acoustics ,Resonance ,General Medicine ,Vibration ,Quality (physics) ,Transducer ,Computer Science::Sound ,Broadband ,business ,Computer Science::Formal Languages and Automata Theory - Abstract
The frequency band of acoustic transducers is a key research field. By summarizing the factors that influence Tonpilz and Janus-type transducers, such as transducer structure type and geometry parameters, the rule for adjusting the mechanical quality factor Qm is investigated, and the trend of extending the frequency band through multi-mode coupling is identified. By analyzing the multi-modal vibration forms of double- and triple-resonance designs, several types of frequency-band extension are categorized. The advantages and disadvantages of broadband longitudinal vibration transducers are examined through general theoretical analysis, and the state of the art and applications are presented, so that researchers can more easily understand these transducers and draw ideas for new designs.
- Published
- 2013
89. Ultra-Short Baseline Positioning System and Error Analysis based on Multi-Element Stereo Array
- Author
-
Hong-Ru Wang, Zhong-Qiu Wang, Suiping Qi, Tong Hu, and Jing Zou
- Published
- 2018
- Full Text
- View/download PDF
90. Analysis of Separation and Dislocation Characteristics of Layered Roof in the Mined-Out Areas
- Author
-
Shuren Wang and Zhong Qiu Wang
- Subjects
Materials science ,Separation (aeronautics) ,Cohesion (geology) ,Magnetic dip ,Geometry ,Geotechnical engineering ,General Medicine ,Gradual increase ,Arch ,Dislocation ,Roof ,Symmetry (physics) - Abstract
The computational model of the shallow mined-out areas was built using FLAC3D, and the separation and dislocation characteristics of the layered roof in the mined-out areas were analyzed under different conditions. The results showed that the maximum separation of the layered roof increased with mining width and decreased with increasing lateral pressure and inter-layer cohesion. As the dip angle of the coal seam increased, the separation curves of the layered roof changed shape from a symmetric flat arch to an asymmetric steeple arch and then to a linear form; the maximum separation value exhibited a three-stage tendency of gradual increase, steep reduction and slight change; and the maximum dislocation value showed a two-stage tendency of slow growth followed by rapid growth.
- Published
- 2012
91. Robust speech recognition from ratio masks
- Author
-
DeLiang Wang and Zhong-Qiu Wang
- Subjects
Artificial neural network ,Computer science ,business.industry ,Speech recognition ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Pattern recognition ,010501 environmental sciences ,01 natural sciences ,Convolutional neural network ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Computer Science::Sound ,Robustness (computer science) ,Artificial intelligence ,0305 other medical science ,business ,Utterance ,0105 earth and related environmental sciences - Abstract
Robustness against noise is crucial for automatic speech recognition (ASR) systems in real-world environments. In this paper, we propose a novel approach that performs robust ASR by directly recognizing ratio masks. In the proposed approach, a deep neural network (DNN) is first trained to estimate the ideal ratio mask (IRM) from a noisy utterance, and then a convolutional neural network (CNN) is employed to recognize the estimated IRMs. The proposed approach has been evaluated on the TIDigits corpus, and the results demonstrate that direct recognition of ratio masks outperforms both direct recognition of binary masks and a traditional MMSE-HMM based method for robust ASR.
- Published
- 2016
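The ideal ratio mask estimated in the entry above is commonly defined per time-frequency unit from the speech and noise energies; a minimal NumPy sketch (the square-root IRM formulation shown here is an assumption — the paper may use a different variant):

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Compute a ratio mask in [0, 1] from speech/noise magnitude spectrograms."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

# Toy example: one frame, three frequency bins.
speech = np.array([[1.0, 0.5, 0.0]])
noise = np.array([[0.0, 0.5, 1.0]])
mask = ideal_ratio_mask(speech, noise)
# Speech-dominated bins approach 1; noise-dominated bins approach 0.
```

In the paper's pipeline, such masks would be targets for the DNN estimator and then inputs to the recognizing CNN.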
92. Phoneme-specific speech separation
- Author
-
DeLiang Wang, Zhong-Qiu Wang, and Yan Zhao
- Subjects
Audio mining ,Voice activity detection ,business.industry ,Computer science ,Speech recognition ,Speech coding ,Acoustic model ,PSQM ,Intelligibility (communication) ,computer.software_genre ,Speech processing ,Linear predictive coding ,ComputingMethodologies_ARTIFICIALINTELLIGENCE ,01 natural sciences ,030507 speech-language pathology & audiology ,03 medical and health sciences ,ComputingMethodologies_PATTERNRECOGNITION ,0103 physical sciences ,Artificial intelligence ,Language model ,0305 other medical science ,business ,010301 acoustics ,computer ,Natural language processing - Abstract
Speech separation or enhancement algorithms seldom exploit information about phoneme identities. In this study, we propose a novel phoneme-specific speech separation method. Rather than training a single global model to enhance all the frames, we train a separate model for each phoneme to process its corresponding frames. A robust ASR system is employed to identify the phoneme identity of each frame. This way, the information from ASR systems and language models can directly influence speech separation by selecting a phoneme-specific model to use at the test stage. In addition, phoneme-specific models have fewer variations to model and do not exhibit the data imbalance problem. The improved enhancement results can in turn help recognition. Experiments on the corpus of the second CHiME speech separation and recognition challenge (task-2) demonstrate the effectiveness of this method in terms of objective measures of speech intelligibility and quality, as well as recognition performance.
- Published
- 2016
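The per-frame model selection described in the phoneme-specific entry above can be sketched as a simple dispatch: an ASR-provided phoneme label picks which enhancement model processes each frame (the "models" here are hypothetical stand-in gain functions, not the paper's DNNs):

```python
# Hypothetical per-phoneme "models": each is just a gain here, standing in
# for the phoneme-specific enhancement DNNs described in the abstract.
phoneme_models = {
    "aa":  lambda frame: [x * 0.9 for x in frame],
    "s":   lambda frame: [x * 0.5 for x in frame],
    "sil": lambda frame: [0.0 for _ in frame],
}

def enhance(frames, phoneme_labels, fallback="sil"):
    """Route each frame to the model selected by its ASR phoneme label."""
    out = []
    for frame, ph in zip(frames, phoneme_labels):
        model = phoneme_models.get(ph, phoneme_models[fallback])
        out.append(model(frame))
    return out

frames = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
labels = ["aa", "s", "sil"]  # one ASR-derived label per frame
enhanced = enhance(frames, labels)
```

Because each model only sees frames of one phoneme, it has fewer variations to capture, which is the data-imbalance argument the abstract makes.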
93. XPA A23G polymorphism and susceptibility to cancer: a meta-analysis
- Author
-
Xiao-Lin Cao, Zhen Zhang, Xinliang Pan, Zhong-Qiu Wang, Dapeng Lei, Jun Liu, and Tong Jin
- Subjects
Oncology ,medicine.medical_specialty ,Xeroderma pigmentosum ,Genotype ,Colorectal cancer ,Biology ,Bioinformatics ,Polymorphism, Single Nucleotide ,Breast cancer ,Gene Frequency ,Neoplasms ,Internal medicine ,Genetic model ,Odds Ratio ,Genetics ,medicine ,Humans ,Genetic Predisposition to Disease ,Allele ,Lung cancer ,Molecular Biology ,Genetic Association Studies ,Head and neck cancer ,General Medicine ,Esophageal cancer ,medicine.disease ,Xeroderma Pigmentosum Group A Protein ,Case-Control Studies ,Publication Bias - Abstract
Xeroderma pigmentosum group A (XPA) participates in modulating recognition of DNA damage during the DNA nucleotide excision repair process. The XPA A23G polymorphism has been investigated in case-control studies to evaluate the cancer risk attributable to the variant, but the results have been conflicting. To clarify the effect of the XPA A23G polymorphism on cancer risk, we conducted a meta-analysis of 30 published case-control studies. Overall, no significant association of the XPA A23G variant with cancer susceptibility was observed for any genetic model. However, a significant association was observed for colorectal cancer (GG vs. AA: OR = 1.68, 95% CI = 1.15-2.44; dominant genetic model GG + AG vs. AA: OR = 1.54, 95% CI = 1.08-1.17). For breast cancer, an increased but non-significant risk was found (GG vs. AA: OR = 1.27, 95% CI = 0.98-1.66; dominant genetic model GG + AG vs. AA: OR = 1.27, 95% CI = 0.99-1.63), and for head and neck cancer an increased risk was observed in the recessive model (OR = 1.19, 95% CI = 1.02-1.38). For lung cancer, a significantly reduced risk was observed (GG vs. AA: OR = 0.77, 95% CI = 0.66-0.90; dominant genetic model GG + AG vs. AA: OR = 0.76, 95% CI = 0.66-0.87), and it is worth noting that this inverse association was more apparent in the Asian population. In addition, in the Asian population a significantly decreased risk was found for esophageal cancer in the dominant genetic model (OR = 0.55; 95% CI = 0.43-0.70), and an increased risk was observed for head and neck cancer in the dominant genetic model (OR = 1.51, 95% CI = 1.03-2.23). The meta-analysis suggests that the XPA A23G G allele is a low-penetrant risk factor for cancer development.
- Published
- 2012
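Pooled odds ratios of the kind reported in the meta-analysis above are typically combined by inverse-variance weighting on the log-OR scale; a minimal fixed-effect sketch with made-up study data (not the actual studies in this meta-analysis, which would also usually be checked for heterogeneity before choosing a fixed- vs. random-effects model):

```python
import math

def pooled_or_fixed(ors, cis):
    """Fixed-effect inverse-variance pooling of odds ratios.

    ors: per-study odds ratios; cis: per-study (lower, upper) 95% CIs.
    Each standard error is recovered from the CI width on the log scale.
    """
    log_ors, weights = [], []
    for or_, (lo, hi) in zip(ors, cis):
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        log_ors.append(math.log(or_))
        weights.append(1.0 / se ** 2)
    pooled_log = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    return math.exp(pooled_log)

# Illustrative (fabricated) two-study input:
pooled = pooled_or_fixed([1.5, 1.2], [(1.0, 2.25), (0.9, 1.6)])
```

The pooled estimate lands between the individual study ORs, pulled toward the more precise (narrower-CI) study.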
94. Lymphangiogenesis and biological behavior in pancreatic carcinoma and other pancreatic tumors
- Author
-
Zhong-qiu Wang, Guojun Li, Xinhua Zhang, Mingmin Tong, Zhengcan Wu, Z Liu, and Jiang Wu
- Subjects
Adult ,Male ,Cancer Research ,Pathology ,medicine.medical_specialty ,Neuroendocrine tumors ,Biochemistry ,Article ,Metastasis ,D2-40 ,Sex Factors ,Pancreatic tumor ,lymhangiogenesis ,Genetics ,Lymphatic vessel ,medicine ,metastasis ,Humans ,Neoplasm Invasiveness ,Lymphangiogenesis ,Molecular Biology ,Aged ,Lymphatic Vessels ,business.industry ,Carcinoma ,Age Factors ,Middle Aged ,Prognosis ,medicine.disease ,Primary tumor ,Pancreatic Neoplasms ,Lymphatic system ,medicine.anatomical_structure ,Oncology ,Lymphatic Metastasis ,Molecular Medicine ,pancreatic tumor ,Female ,business ,Pancreas - Abstract
Lymphatic vessels in primary tumor tissue play an important role in lymphatic metastasis. Lymphatic metastasis of malignant neoplasms is significantly related to prognosis, influencing both recurrence and survival. The aim of this study was to investigate the correlation of intra-tumoral lymphatic vessel density (iLVD) and peri-tumoral lymphatic vessel density (pLVD) with biological behavior and prognostic parameters in pancreatic carcinoma (PC) and other pancreatic tumors. Lymphangiogenesis was examined using the D2-40 monoclonal antibody in 33 cases of PC, 7 neuroendocrine tumors of the pancreas (NETP), 7 solid pseudopapillary tumors of the pancreas (SPTP) and 3 cystadenomas of the pancreas (CP). Positively-stained microvessels were counted at magnification x400 in dense lymphatic vascular foci (hotspots). The LVD of PC was compared to 3 other pancreatic tumors. The relationships among the LVD, the extent of differentiation, lymphatic invasion, lymph node metastasis and other clinicopathological parameters of PC were analyzed. There was no difference in the iLVD among PC, NETP, SPTP and CP. The pLVD of NETP was markedly higher than that of PC, SPTP and CP. The pLVD of PC was significantly higher than that of SPTP and CP, but there was no difference between SPTP and CP. The pLVD of PC was significantly associated with the extent of differentiation, lymphatic invasion and lymph node metastasis, whereas it was not associated with age, gender, tumor size, tumor location and peri-pancreatic invasion. The iLVD of PC was not correlated with these clinicopathological parameters. There was no difference in iLVD and no marked difference in pLVD among the pancreatic tumors. Detection of pLVD is of greater importance than detecting iLVD in these pancreatic tumors, as pLVD can be utilized for the prediction of lymph node metastasis, thus aiding in the evaluation of patient prognosis.
- Published
- 2012
95. Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition
- Author
-
Zhong-Qiu Wang, Eric Fosler-Lussier, Andrew R. Plummer, Michael I. Mandel, Yanzhang He, and Deblin Bagchi
- Subjects
Front and back ends ,Beamforming ,Reverberation ,Noise ,Computer science ,Microphone ,Speech recognition ,Source separation ,Filter bank ,Speech processing - Abstract
Automatic Speech Recognition systems suffer from severe performance degradation in the presence of myriad complicating factors such as noise, reverberation, multiple speech sources, multiple recording devices, etc. Previous challenges have sparked much innovation when it comes to designing systems capable of handling these complications. In this spirit, the CHiME-3 challenge presents system builders with the task of recognizing speech in a real-world noisy setting wherein speakers talk to an array of 6 microphones in a tablet. In order to address these issues, we explore the effectiveness of first applying a model-based source separation mask to the output of a beamformer that combines the source signals recorded by each microphone, followed by a DNN-based front end spectral mapper that predicts clean filterbank features. The source separation algorithm MESSL (Model-based EM Source Separation and Localization) has been extended from two channels to multiple channels in order to meet the demands of the challenge. We report on interactions between the two systems, cross-cut by the use of a robust beamforming algorithm called BeamformIt. Evaluations of different system settings reveal that combining MESSL and the spectral mapper together on the baseline beamformer algorithm boosts the performance substantially.
- Published
- 2015
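The pipeline in the entry above (beamform, then apply a separation mask, then spectrally map) can be sketched at the signal level: average the microphone channels as a crude zero-delay beamformer and apply a time-frequency mask to the result. Real MESSL and BeamformIt are far more sophisticated; the mask here is a placeholder, not MESSL's EM-estimated one:

```python
import numpy as np

def beamform_and_mask(channel_stfts, mask):
    """Average multi-channel STFTs (naive beamformer), then apply a T-F mask.

    channel_stfts: (num_mics, frames, bins) complex array.
    mask: (frames, bins) real array in [0, 1], e.g. produced by MESSL.
    """
    beamformed = channel_stfts.mean(axis=0)  # delay-and-sum with zero delays
    return mask * beamformed

# Toy input: 2 mics, 1 frame, 2 frequency bins.
stfts = np.array([[[1 + 1j, 2 + 0j]],
                  [[1 - 1j, 0 + 0j]]])
mask = np.array([[1.0, 0.5]])
enhanced = beamform_and_mask(stfts, mask)
```

In the full system, the masked output would then be passed to the DNN spectral mapper that predicts clean filterbank features.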
96. Flow Stress Determination of Aluminum Alloy 7050-T7451 Using Inverse Analysis Method
- Author
-
Jun Zhou, Jie Sun, Feng Jiang, Zhong Qiu Wang, and Jianfeng Li
- Subjects
Work (thermodynamics) ,Source code ,Materials science ,business.industry ,Mechanical Engineering ,media_common.quotation_subject ,Alloy ,Constitutive equation ,chemistry.chemical_element ,Structural engineering ,Function (mathematics) ,engineering.material ,Flow stress ,Finite element method ,chemistry ,Mechanics of Materials ,Aluminium ,engineering ,General Materials Science ,business ,media_common - Abstract
In the present study, two-dimensional orthogonal slot milling experiments, in conjunction with an analytically based computer code, are used to determine flow stress data as a function of the high strains, strain rates and temperatures encountered in metal cutting. Using this method, the flow stress of Al7050-T7451 is modeled. Comparison of cutting forces between FEM and experiment shows that the FEM model using the predicted flow stress yields accurate cutting forces. This work provides a useful method for modeling material constitutive equations without conducting a large number of cutting experiments or expensive SHPB tests.
- Published
- 2010
97. A Study of Exit-Burr Formation Mechanism Using the Finite Element Method in Micro-Cutting of Aluminum Alloy
- Author
-
Jianfeng Li, Dong Lu, Jun Zhou, Zhong Qiu Wang, Jie Sun, and Yiming Rong
- Subjects
Materials science ,Burr height ,Mechanical Engineering ,Alloy ,Metallurgy ,chemistry.chemical_element ,Radius ,engineering.material ,Edge (geometry) ,Finite element method ,Burr formation ,Mechanism (engineering) ,chemistry ,Mechanics of Materials ,Aluminium ,engineering ,General Materials Science ,Composite material - Abstract
The burr formation process in micro-cutting of Al7075-T7451 was analyzed. Three stages of burr formation, namely the steady-state cutting stage, the pivoting stage, and the burr formation stage, are investigated, and the effects of uncut chip thickness, cutting speed and tool edge radius on burr formation are studied. The simulation results show that the generation of the negative shear zone is one of the primary causes of burr formation. Uncut chip thickness has a significant effect on burr height, whereas the effect of cutting speed is minor. Unlike in conventional cutting, in micro-cutting the effect of tool edge radius on the burr geometry can no longer be neglected.
- Published
- 2008
98. The Influence of Material Constitutive Constants on Numerical Simulation of Orthogonal Cutting of Titanium Alloy Ti6Al4V
- Author
-
Jianfeng Li, Zhong Qiu Wang, Jie Sun, Jian Ling Chen, and Zhi Ping Xu
- Subjects
Work (thermodynamics) ,Materials science ,Computer simulation ,Mechanics of Materials ,Bar (music) ,Mechanical Engineering ,Cutting force ,Constitutive equation ,Titanium alloy ,Material constants ,General Materials Science ,Sensitivity (control systems) ,Composite material - Abstract
The Johnson-Cook (JC) constitutive model is extensively used in the simulation of metal machining. Several different sets of JC material constants for the titanium alloy Ti6Al4V have been fitted from split-Hopkinson pressure bar (SHPB) tests, but little research has been done on their sensitivity with respect to cutting behavior. In this work, four different sets of material constants were used in a 2D numerical model to simulate the cutting process of Ti6Al4V. The effects of the four sets of constants on the predicted cutting forces, chip morphology and temperature were studied. It is shown that all the considered process outputs are very sensitive to the material constitutive constants. Quantitative comparisons with experimental results reported in the literature were also made.
- Published
- 2008
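The Johnson-Cook model named in the entry above expresses flow stress as a product of strain-hardening, strain-rate, and thermal-softening terms, σ = (A + Bε^n)(1 + C ln(ε̇/ε̇₀))(1 − T*^m). A sketch with one published-style set of Ti6Al4V constants — treat the exact values as illustrative, since the abstract's point is precisely that several fitted sets exist and disagree:

```python
import math

def johnson_cook_stress(strain, strain_rate, temp,
                        A, B, n, C, m,
                        ref_rate=1.0, temp_room=25.0, temp_melt=1630.0):
    """Johnson-Cook flow stress (MPa): hardening x rate x thermal softening."""
    hardening = A + B * strain ** n
    rate = 1.0 + C * math.log(strain_rate / ref_rate)
    t_star = (temp - temp_room) / (temp_melt - temp_room)  # homologous temp
    softening = 1.0 - t_star ** m
    return hardening * rate * softening

# Illustrative Ti6Al4V-like constants (one of several published sets):
sigma = johnson_cook_stress(strain=0.1, strain_rate=1e3, temp=300.0,
                            A=862.0, B=331.0, n=0.34, C=0.012, m=0.8)
```

Swapping in a different fitted constant set changes the predicted stress, which propagates directly to the simulated cutting forces, chip morphology and temperature the study compares.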
99. Prediction of Residual Stress in Hard Turning of AISI 52100 Using 2D FEM
- Author
-
Zhong Qiu Wang, I. Al-Zkeri, Jianfeng Li, and Jun Sheng Sun
- Subjects
Work (thermodynamics) ,Steady state ,Materials science ,Residual stress ,Subroutine ,Distortion ,Metallurgy ,Mechanical engineering ,General Medicine ,Tribology ,Material properties ,Finite element method - Abstract
Residual stress on the machined surface and subsurface is one of the most important factors influencing the service quality of a component, such as fatigue life, tribological properties, and distortion. In this work, a 2D FEM model of the AISI 52100 hard turning process is set up. A wide range of material properties is considered to describe the material behavior more precisely. Using a user subroutine named Konti-Cut, the steady state of the cutting process is simulated, and the cutting forces and residual stresses at that stage are investigated. Comparison of the cutting forces shows that the FEM model agrees well with experimental data, and the basic relations between residual stresses, cutting parameters and tool geometry are derived.
- Published
- 2007
100. Joint training of speech separation, filterbank and acoustic model for robust automatic speech recognition
- Author
-
DeLiang Wang and Zhong-Qiu Wang
- Subjects
Voice activity detection ,Computer science ,business.industry ,Speech recognition ,Word error rate ,Acoustic model ,Pattern recognition ,Speech processing ,Filter bank ,Speech enhancement ,Rule-based machine translation ,Robustness (computer science) ,Artificial intelligence ,business - Abstract
Robustness is crucial for automatic speech recognition systems in real-world environments. Speech enhancement/separation algorithms are normally used to enhance noisy speech before recognition. However, such algorithms typically introduce distortions unseen by acoustic models. In this study, we propose a novel joint training approach to reduce this distortion problem. At the training stage, we first concatenate a speech separation DNN, a filterbank and an acoustic model DNN to form a deeper network, and then jointly train all of them. This way, the separation frontend and filterbank can provide the enhanced speech desired by the acoustic model, and the linguistic information contained in the acoustic model can in turn have a positive effect on the frontend and filterbank. Besides the commonly used log mel-spectrogram feature, we also add more robust features for acoustic modeling. Our system obtains a 14.1% average word error rate on the noisy and reverberant CHiME-2 corpus (track 2), outperforming the previous best result by 8.4% relative.
- Published
- 2015
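The concatenation described in the joint-training entry above can be sketched as a single composed network: the separation frontend, a learnable filterbank, and the acoustic model are chained so that, during training, gradients from the recognition loss would flow back through all three stages. A forward-pass-only sketch with tiny random stand-in layers (hypothetical sizes, not the paper's DNNs):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Tiny stand-in layers (hypothetical sizes; the paper uses deep DNNs).
W_sep = rng.standard_normal((257, 257)) * 0.01    # separation frontend
W_fbank = np.abs(rng.standard_normal((257, 40)))  # learnable filterbank
W_am = rng.standard_normal((40, 10)) * 0.01       # acoustic model output layer

def joint_forward(noisy_spec):
    """noisy_spec: (frames, 257) magnitude spectrogram -> (frames, 10) scores."""
    enhanced = relu(noisy_spec @ W_sep)        # enhance the spectrogram
    feats = np.log(enhanced @ W_fbank + 1e-6)  # filterbank + log compression
    return feats @ W_am                        # phone/senone scores

scores = joint_forward(np.abs(rng.standard_normal((5, 257))))
# Joint training would backprop the ASR loss through W_am, W_fbank
# and W_sep together, so the frontend learns what the recognizer needs.
```

The key design point is that the filterbank sits between the two DNNs as an ordinary differentiable layer, so nothing in the chain is treated as a fixed, pre-trained black box.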