Author: "Fan, Zhiyun" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Fan, Zhiyun"' showing total 26 results

1. SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Author: Fan, Zhiyun, Dong, Linhao, Zhang, Jun, Lu, Lu, and Ma, Zejun
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Multi-talker automatic speech recognition plays a crucial role in scenarios involving multi-party interactions, such as meetings and conversations. Due to its inherent complexity, this task has been receiving increasing attention. Notably, the serialized output training (SOT) stands out among various approaches because of its simplistic architecture and exceptional performance. However, the frequent speaker changes in token-level SOT (t-SOT) present challenges for the autoregressive decoder in effectively utilizing context to predict output sequences. To address this issue, we introduce a masked t-SOT label, which serves as the cornerstone of an auxiliary training loss. Additionally, we utilize a speaker similarity matrix to refine the self-attention mechanism of the decoder. This strategic adjustment enhances contextual relationships within the same speaker's tokens while minimizing interactions between different speakers' tokens. We denote our method as speaker-aware SOT (SA-SOT). Experiments on the Librispeech datasets demonstrate that our SA-SOT obtains a relative cpWER reduction ranging from 12.75% to 22.03% on the multi-talker test sets. Furthermore, with more extensive training, our method achieves an impressive cpWER of 3.41%, establishing a new state-of-the-art result on the LibrispeechMix dataset.
Published: 2024

2. Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition

Author: Fan, Zhiyun, Dong, Linhao, Shen, Chen, Liang, Zhenlin, Zhang, Jun, Lu, Lu, and Ma, Zejun
Subjects: Computer Science - Sound
Abstract: Code-switching speech recognition (CSSR) transcribes speech that switches between multiple languages or dialects within a single sentence. The main challenge in this task is that different languages often have similar pronunciations, making it difficult for models to distinguish between them. In this paper, we propose a method for solving the CSSR task from the perspective of language-specific acoustic boundary learning. We introduce language-specific weight estimators (LSWE) to model acoustic boundary learning in different languages separately. Additionally, a non-autoregressive (NAR) decoder and a language change detection (LCD) module are employed to assist in training. Evaluated on the SEAME corpus, our method achieves state-of-the-art mixed error rates (MER) of 16.29% and 22.81% on the test_man and test_sge sets, respectively. We also demonstrate the effectiveness of our method on a 9000-hour in-house meeting code-switching dataset, where it achieves a relative MER reduction of 7.9%.
Published: 2023

3. Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Author: Fan, Zhiyun, Liang, Zhenlin, Dong, Linhao, Liu, Yi, Zhou, Shiyu, Cai, Meng, Zhang, Jun, Ma, Zejun, and Xu, Bo
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to segment the audio and then transcribe each segment. These two stages are addressed separately by speaker change detection (SCD) and automatic speech recognition (ASR). Most previous SCD systems rely solely on speaker information and ignore the importance of speech content. In this paper, we propose a novel SCD system that considers both speaker difference and speech content as cues. These two cues are converted into token-level representations by the continuous integrate-and-fire (CIF) mechanism and then combined to detect speaker changes on the token acoustic boundaries. We evaluate the performance of our approach on a public real-recorded meeting dataset, AISHELL-4. Experimental results show that our method outperforms a competitive frame-level baseline system by 2.45% equal coverage-purity (ECP). In addition, we demonstrate the importance of speech content and speaker difference to the SCD task, and the advantages of conducting SCD on the token acoustic boundaries rather than frame by frame.
Published: 2022

4. Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Author: Fan, Zhiyun, Dong, Linhao, Cai, Meng, Ma, Zejun, and Xu, Bo
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speaker change detection is an important task in multi-party interactions such as meetings and conversations. In this paper, we address the speaker change detection task from the perspective of sequence transduction. Specifically, we propose a novel encoder-decoder framework that directly converts the input feature sequence to the speaker identity sequence. The difference-based continuous integrate-and-fire mechanism is designed to support this framework. It detects speaker changes by integrating the speaker difference between the encoder outputs frame-by-frame and transfers encoder outputs to segment-level speaker embeddings according to the detected speaker changes. The whole framework is supervised by the speaker identity sequence, a weaker label than the precise speaker change points. The experiments on the AMI and DIHARD-I corpora show that our sequence-level method consistently outperforms a strong frame-level baseline that uses the precise speaker change labels. Comment: Signal Processing Letters 2022
Published: 2022
Full Text: View/download PDF

5. Exploring wav2vec 2.0 on speaker verification and language identification

Author: Fan, Zhiyun, Li, Meng, Zhou, Shiyu, and Xu, Bo
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially in ultra-low-resource cases. In this work, we attempt to extend this self-supervised framework to speaker verification and language identification. First, we use some preliminary experiments to show that wav2vec 2.0 can capture information about the speaker and language. Then we demonstrate the effectiveness of wav2vec 2.0 on each of the two tasks. For speaker verification, we obtain a new state-of-the-art result, an Equal Error Rate (EER) of 3.61% on the VoxCeleb1 dataset. For language identification, we obtain an EER of 12.02% on the 1-second condition and an EER of 3.47% on the full-length condition of the AP17-OLR dataset. Finally, we use a single model to unify the two tasks through multi-task learning. Comment: Self-supervised, speaker verification, language identification, multi-task learning, wav2vec 2.0
Published: 2020

6. Speaker-aware speech-transformer

Author: Fan, Zhiyun, Li, Jie, Zhou, Shiyu, and Xu, Bo
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recently, end-to-end (E2E) models have become a competitive alternative to conventional hybrid automatic speech recognition (ASR) systems. However, they still suffer from speaker mismatch between training and testing conditions. In this paper, we use Speech-Transformer (ST) as the study platform to investigate speaker-aware training of E2E models. We propose a model called Speaker-Aware Speech-Transformer (SAST), which is a standard ST equipped with a speaker attention module (SAM). The SAM has a static speaker knowledge block (SKB) that is made of i-vectors. At each time step, the encoder output attends to the i-vectors in the block and generates a weighted combined speaker embedding vector, which helps the model normalize speaker variations. The SAST model trained in this way becomes independent of specific training speakers and thus generalizes better to unseen testing speakers. We investigate different factors of SAM. Experimental results on the AISHELL-1 task show that SAST achieves a relative 6.5% CER reduction (CERR) over the speaker-independent (SI) baseline. Moreover, we demonstrate that SAST still works quite well even if the i-vectors in the SKB all come from a data source other than the acoustic training set.
Published: 2020

7. Unsupervised pre-training for sequence to sequence speech recognition

Author: Fan, Zhiyun, Zhou, Shiyu, and Xu, Bo
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper proposes a novel approach to pre-train an encoder-decoder sequence-to-sequence (seq2seq) model with unpaired speech and transcripts. Our pre-training method is divided into two stages, named acoustic pre-training and linguistic pre-training. In the acoustic pre-training stage, we use a large amount of speech to pre-train the encoder by predicting masked speech feature chunks from their context. In the linguistic pre-training stage, we generate synthesized speech from a large number of transcripts using a single-speaker text-to-speech (TTS) system, and use the synthesized paired data to pre-train the decoder. This two-stage pre-training method integrates rich acoustic and linguistic knowledge into the seq2seq model, which benefits downstream automatic speech recognition (ASR) tasks. The unsupervised pre-training is carried out on the AISHELL-2 dataset, and we apply the pre-trained model to multiple paired data ratios of AISHELL-1 and HKUST. We obtain relative character error rate reduction (CERR) from 38.24% to 7.88% on AISHELL-1 and from 12.00% to 1.20% on HKUST. In addition, we apply our pre-trained model to a cross-lingual case with the CALLHOME dataset. For all six languages in the CALLHOME dataset, our pre-training method consistently outperforms the baseline.
Published: 2019

8. SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Author: Fan, Zhiyun, primary, Dong, Linhao, additional, Zhang, Jun, additional, Lu, Lu, additional, and Ma, Zejun, additional
Published: 2024
Full Text: View/download PDF

9. Language-specific Boundary Learning for Improving Mandarin-English Code-switching Speech Recognition

Author: Fan, Zhiyun, primary, Dong, Linhao, additional, Shen, Chen, additional, Liang, Zhenlin, additional, Zhang, Jun, additional, Lu, Lu, additional, and Ma, Zejun, additional
Published: 2023
Full Text: View/download PDF

10. Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Author: Fan, Zhiyun, primary, Liang, Zhenlin, additional, Dong, Linhao, additional, Liu, Yi, additional, Zhou, Shiyu, additional, Cai, Meng, additional, Zhang, Jun, additional, Ma, Zejun, additional, and Xu, Bo, additional
Published: 2022
Full Text: View/download PDF

11. Sequence-Level Speaker Change Detection With Difference-Based Continuous Integrate-and-Fire

Author: Fan, Zhiyun, primary, Dong, Linhao, additional, Cai, Meng, additional, Ma, Zejun, additional, and Xu, Bo, additional
Published: 2022
Full Text: View/download PDF

12. Development of patch and spray formulations for enhancing topical delivery of sinomenine hydrochloride

Author: Li, Xinru, Li, Xiaoyan, Zhou, Yanxia, Liu, Yan, Guo, Ming, Zhu, Qingfen, Xie, Yuanchao, and Fan, Zhiyun
Published: 2010
Full Text: View/download PDF

13. Exploring wav2vec 2.0 on Speaker Verification and Language Identification

Author: Fan, Zhiyun, primary, Li, Meng, additional, Zhou, Shiyu, additional, and Xu, Bo, additional
Published: 2021
Full Text: View/download PDF

14. Two-Stage Pre-Training for Sequence to Sequence Speech Recognition

Author: Fan, Zhiyun, primary, Zhou, Shiyu, additional, and Xu, Bo, additional
Published: 2021
Full Text: View/download PDF

15. Syllable-Based Acoustic Modeling With Lattice-Free MMI for Mandarin Speech Recognition

Author: Li Yan, Xiaorui Wang, Fan Zhiyun, and Jie Li
Subjects: Noise measurement, Computer science, Speech recognition, Acoustic model, Networking & telecommunications, Engineering and technology, Mutual information, Mandarin Chinese, Speech-language pathology & audiology, Medical and health sciences, Electrical engineering, electronic engineering, information engineering, Benchmark (computing), Syllable, Hidden Markov model, Decoding methods
Abstract: Most automatic speech recognition (ASR) systems in past decades have used context-dependent (CD) phones as the fundamental acoustic units. However, these phone-based approaches lack an easy and efficient way of modeling long-term temporal dependencies. Compared with phone units, syllables span a longer time, typically several phones, and therefore have more stable acoustic realizations. In this work, we aim to train a syllable-based acoustic model (AM) for Mandarin ASR with the lattice-free maximum mutual information (LF-MMI) criterion. We expect that the combination of longer linguistic units, the RNN-based model structure, and the sequence-level objective function can result in better modeling of long-term temporal acoustic variations. We make multiple modifications to improve the performance of the syllable-based AM and benchmark our models on two large-scale databases. Experimental results show that the proposed syllable-based AM performs much better than the CD phone-based baseline, especially on noisy test sets, with faster decoding speed.
Published: 2021

16. Study on the model binary mixtures for actual EPS extracted from the activated sludge in MBR on membrane fouling

Author: Gao, Zhenfu, primary, Wang, Dong, additional, Fan, Zhiyun, additional, Liu, Ying, additional, and Wang, Zhan, additional
Published: 2021
Full Text: View/download PDF

17. Speaker-Aware Speech-Transformer

Author: Fan, Zhiyun, primary, Li, Jie, additional, Zhou, Shiyu, additional, and Xu, Bo, additional
Published: 2019
Full Text: View/download PDF

18. Neuroprotective Effects and Mechanisms of Zhenlong Xingnao Capsule in In Vivo and In Vitro Models of Hypoxia

Author: Xia Wei, Jibiao Wu, Sun Changhua, Na Liu, Fan Zhiyun, Zhu Qingfen, Mingqi Qiao, Defu Hu, Yang Wang, Lihua Xu, Peng Sun, Sheng Wei, and Zhao Yan
Subjects: Basic medicine, Antioxidant, Zhenlong Xingnao Capsule, Ischemia, Pharmacology, Neuroprotection, Superoxide dismutase, Medical and health sciences, Clinical medicine, In vivo, Pharmacology (medical), Cerebral ischemia-reperfusion injury, Original Research, Reactive oxygen species, Hypoxia (medical), Developmental biology, Therapeutics. Pharmacology, Oncology & carcinogenesis, Middle cerebral artery occlusion model, Amino acid neurotransmitter, BV-2 cell
Abstract: Zhenlong Xingnao Capsule (ZXC) is a Tibetan medicine used to treat ischemic stroke. In this study, we determined the in vitro and in vivo effects of ZXC on reactive oxygen species (ROS) in a mouse BV-2 microglial cell hypoxia-reoxygenation and rat middle cerebral artery occlusion infarction models. We aimed to clarify the role of ZXC in cerebral ischemia protection; reveal amino acid neurotransmitter changes in the frontal cortex after drug intervention; determine mRNA and protein expression changes in Bcl-2, Bax, caspase-3, P38, and nuclear factor (NF)-κB in the frontal cortex and changes in antioxidant indices in the brain; and elucidate the mechanisms underlying ZXC action. After hypoxia-reoxygenation, ROS levels were significantly increased in BV-2 cells, and their levels decreased after treatment with ZXC. ZXC had protective effects on ischemic/anoxic injury in vitro and in vivo by downregulating the expressions of caspase-3 and NF-κB mRNA during ischemia and reperfusion and that of p38 and caspase-3 during acute ischemia and reperfusion as well as the steady-state levels of excitatory amino acids/inhibitory amino acids and by improving the total antioxidant capacity and total superoxide dismutase activities during ischemia. These findings provide new molecular evidence for the mechanisms underlying ZXC action.
Published: 2019
Full Text: View/download PDF

19. Kinematics simulation analysis of six-degree-of-freedom loading and unloading robot based on ADAMS

Author: Qinglai Wang, Linan Gong, Pang Zaixiang, and Fan Zhiyun
Subjects: Computer science, Control theory, Robot, Kinematics
Published: 2019

20. Neuroprotective Effects and Mechanisms of Zhenlong Xingnao Capsule in In Vivo and In Vitro Models of Hypoxia

Author: Wei, Xia, primary, Zhu, Qingfen, additional, Liu, Na, additional, Xu, Lihua, additional, Wei, Sheng, additional, Fan, Zhiyun, additional, Sun, Changhua, additional, Zhao, Yan, additional, Qiao, Mingqi, additional, Wu, Jibiao, additional, Hu, Defu, additional, Wang, Yang, additional, and Sun, Peng, additional
Published: 2019
Full Text: View/download PDF

21. Kinematics simulation analysis of six-degree-of-freedom loading and unloading robot based on ADAMS

Author: Pang, Zaixiang, primary, Wang, Qinglai, primary, Fan, Zhiyun, primary, and Gong, Linan, primary
Published: 2019
Full Text: View/download PDF

22. Effects of reclaimed water irrigation on mineral elements content of turf

Author: Man Da, Fan Zhiyun, Hou Guohua, Chang Zhi-hui, Song Gui-long, Wang Qiong, and Han Lie-bao
Subjects: Irrigation, Agronomy, Festuca, Tap water, Environmental engineering, Environmental science, Heavy metals, Sewage treatment, Field survey, Effluent, Reclaimed water
Abstract: The effects of reclaimed water irrigation on the mineral elements of turf were evaluated through a plot experiment and a field survey at the Beijing Beixiaohe Wastewater Treatment Plant and the Beijing Gaobeidian Wastewater Treatment Plant. Results showed that, compared with tap water irrigation, the heavy metal, Ca2+, Cl− and Na+ contents of turf leaves are noticeably higher after long-term irrigation with common tertiary water and secondary effluent, but there is no negative effect on turf growth. There is no significant impact on turf leaf N and P content after either long-term or short-term irrigation with reclaimed water. In tolerating and absorbing soil salt, Festuca arundinacea L. has a more significant advantage than Poa pratensis L., whereas Poa pratensis L. is better at tolerating and absorbing heavy metals.
Published: 2011

23. Influence of reclaimed water irrigation on turf growth and soil

Author: Song Gui-long, Fan Zhiyun, Wang Qiong, Man Da, Han Lie-bao, Hou Guohua, and Chang Zhi-hui
Subjects: Irrigation, Nutrient, Agronomy, Wastewater, Tap water, Soil pH, Environmental engineering, Environmental science, Organic matter, Sewage treatment, Reclaimed water
Abstract: An experiment on turf grass irrigation with reclaimed water was carried out at the Beixiaohe Wastewater Treatment Plant in Beijing. Four kinds of reclaimed water, namely secondary treated wastewater (sec), ultra-filtration (uf), reverse osmosis (ro) and micro-filtration (mf) (general and deep tertiary treated wastewater), together with tap water (control), were used to irrigate six kinds of common turf grasses (Kentucky Bluegrass, Tall Fescue, Perennial Ryegrass, Creeping Bentgrass, Zoysiagrass and Buffalograss). Results showed that, compared with tap water irrigation: 1) secondary treated wastewater and general tertiary treated wastewater (uf) can evidently promote the growth of turf grass, significantly increasing chlorophyll content and appreciably enhancing adversity resistance, whereas deep tertiary treated wastewater (ro and mf) has no such effects; 2) except for mf, reclaimed water irrigation has no significant effect on turf nutrient factors; after mf irrigation, the organic matter of turf is sharply reduced, and NH4+ and NO2 are significantly elevated; 3) reclaimed water irrigation has no significant impact on soil pH and EC, soil heavy metal content, or soil Cl−, S2− and phenol content.
Published: 2011

24. Chlorination Disinfection By-Products and Its Control in Drinking Water

Author: Fan, Zhiyun, primary, Wang, Shaopo, additional, and Hou, Guohua, additional
Published: 2010
Full Text: View/download PDF

25. Influence of reclaimed water irrigation on turf growth and soil.

Author: Hou Guohua, Man Da, Wang Qiong, Chang Zhihui, Song Guilong, Han Liebao, and Fan Zhiyun
Published: 2011
Full Text: View/download PDF

26. Effects of reclaimed water irrigation on mineral elements content of turf.

Author: Hou Guohua, Wang Qiong, Man Da, Song Guilong, Chang Zhihui, Han Liebao, and Fan Zhiyun
Published: 2011
Full Text: View/download PDF
