Author: "Liu, Yukun" - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Liu, Yukun" returned 1,332 results

1. Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

Author: Wang, Zhiyong, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Xiaopeng, Xie, Yuankun, Qi, Xin, Shi, Shuchen, Lu, Yi, Liu, Yukun, Li, Chenxing, Liu, Xuefei, and Li, Guanjun
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech synthesis technology has posed a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods utilize pretrained models, and integrating features from various layers of the pretrained model further enhances detection performance. However, most of the previously proposed fusion methods require fine-tuning the pretrained models, resulting in excessively long training times and hindering model iteration when facing new speech synthesis technology. To address this issue, this paper proposes a feature fusion method based on the Mixture of Experts, which extracts and integrates features relevant to fake audio detection from layer features, guided by a gating network based on the last layer feature, while freezing the pretrained model. Experiments conducted on the ASVspoof2019 and ASVspoof2021 datasets demonstrate that the proposed method achieves competitive performance compared to those requiring fine-tuning.
Comment: submitted to ICASSP2025
Published: 2024

2. DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Author: Qi, Xin, Fu, Ruibo, Wen, Zhengqi, Wang, Tao, Qiang, Chunyu, Tao, Jianhua, Li, Chenxing, Lu, Yi, Shi, Shuchen, Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Liu, Yukun, Liu, Xuefei, and Li, Guanjun
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In recent years, speech diffusion models have advanced rapidly. Alongside the widely used U-Net architecture, transformer-based models such as the Diffusion Transformer (DiT) have also gained attention. However, current DiT speech models treat Mel spectrograms as general images, which overlooks the specific acoustic properties of speech. To address these limitations, we propose a method called Directional Patch Interaction for Text-to-Speech (DPI-TTS), which builds on DiT and achieves fast training without compromising accuracy. Notably, DPI-TTS employs a low-to-high frequency, frame-by-frame progressive inference approach that aligns more closely with acoustic properties, enhancing the naturalness of the generated speech. Additionally, we introduce a fine-grained style temporal modeling method that further improves speaker style similarity. Experimental results demonstrate that our method increases the training speed by nearly 2 times and significantly outperforms the baseline models.
Comment: Submitted to ICASSP2025
Published: 2024

3. Exploring the Role of Audio in Multimodal Misinformation Detection

Author: Liu, Moyang, Liu, Yukun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Liu, Xuefei, and Li, Guanjun
Subjects: Computer Science - Multimedia
Abstract: With the rapid development of deepfake technology, especially audio deepfake technology, misinformation detection on social media faces a great challenge. Social media data often contains multimodal information, including audio, video, text, and images. However, existing multimodal misinformation detection methods tend to focus only on some of these modalities, failing to comprehensively address information from all modalities. To comprehensively address the various modal information that may appear on social media, this paper constructs a comprehensive multimodal misinformation detection framework. By employing corresponding neural network encoders for each modality, the framework can fuse different modality information and support the multimodal misinformation detection task. Based on the constructed framework, this paper explores the importance of the audio modality in multimodal misinformation detection tasks on social media. By adjusting the architecture of the acoustic encoder, the effectiveness of different acoustic feature encoders in multimodal misinformation detection tasks is investigated. Furthermore, this paper finds that audio and video information must be carefully aligned; otherwise, misalignment between the audio and video modalities can severely impair model performance.
Published: 2024

4. Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

Author: Xie, Yuankun, Xiong, Chenxu, Wang, Xiaopeng, Wang, Zhiyong, Lu, Yi, Qi, Xin, Fu, Ruibo, Liu, Yukun, Wen, Zhengqi, Tao, Jianhua, Li, Guanjun, and Ye, Long
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based audio have become increasingly critical. This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio. Specifically, we collect 12 types of the latest ALM-based deepfake audio and use the latest CMs to evaluate them. Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving 0% equal error rate under most ALM test conditions, which exceeded our expectations. This indicates promising directions for future research in ALM-based deepfake audio detection.
Published: 2024

5. EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

Author: Qi, Xin, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Shi, Shuchen, Lu, Yi, Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Liu, Yukun, Li, Guanjun, Liu, Xuefei, and Li, Yongwei
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into the pre-introduced conditional parts of the speech models. This fixes the position of LoRA, limiting the flexibility and scalability of its application. Therefore, we propose the Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech (EELE) method. Starting from a general neutral speech model, we do not pre-introduce emotional information but instead use the LoRA plugin to design a flexible adaptive scheme that endows the model with emotional generation capabilities. Specifically, we initially train the model using only neutral speech data. After training is complete, we insert LoRA into different modules and fine-tune the model with emotional speech data to find the optimal insertion scheme. Through experiments, we compare and test the effects of inserting LoRA at different positions within the model and assess LoRA's ability to learn various emotions, effectively proving the validity of our method. Additionally, we explore the impact of the rank size of LoRA and the difference compared to directly fine-tuning the entire model.
Published: 2024

6. A Noval Feature via Color Quantisation for Fake Audio Detection

Author: Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Liu, Yukun, Li, Guanjun, Qi, Xin, Lu, Yi, Liu, Xuefei, and Li, Yongwei
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, such as wav2vec2.0 and Masked Auto Encoder, which are then transferred to fake audio detection training where the encoder is used to extract features. These methods have proven that using real audio for reconstruction pre-training can better help the model distinguish fake audio. However, the disadvantage lies in poor interpretability, meaning it is hard to intuitively present the differences between deepfake and real audio. This paper proposes a novel feature extraction method via color quantisation which constrains the reconstruction to use a limited number of colors for the spectral image-like input. The proposed method ensures the reconstructed input differs from the original, which allows for intuitive observation of the focus areas in the spectral reconstruction. Experiments conducted on the ASVspoof2019 dataset demonstrate that the proposed method achieves better classification performance compared to using the original spectral as input, and that pretraining the recolor network can also benefit fake audio detection.
Comment: accepted by ISCSLP2024
Published: 2024

7. Inefficiencies of Carbon Trading Markets

Author: Borri, Nicola, Liu, Yukun, Tsyvinski, Aleh, and Wu, Xi
Subjects: Quantitative Finance - General Finance
Abstract: The European Union Emission Trading System is a prominent market-based mechanism to reduce emissions. While the theory is well understood, we are the first to study the whole cap-and-trade mechanism as a financial market. Analyzing the universe of transactions in 2005-2020 (more than one million records of granular transaction data), we show that this market features significant inefficiencies undermining its goals. First, about 40% of firms never trade in a given year. Second, many firms only trade during surrendering months, when compliance is immediate and prices are predictably high. Third, a number of operators engage in speculative trading, exploiting private information.
Comment: 17 pages, 3 figures, 2 tables
Published: 2024

8. Effects from Dark Matter Halos on X-ray Pulsar Pulse Profiles

Author: Liu, Yukun, Li, Hong-Bo, Gao, Yong, Shao, Lijing, and Hu, Zexin
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: Neutron stars (NSs) can capture dark matter (DM) particles because of their deep gravitational potential and high density. The accumulated DM can affect the properties of NSs. In this work we use a general relativistic two-fluid formalism to solve the structure of DM-admixed NSs (DANSs) and the surrounding spacetime. Specifically, we pay attention to the situation where those DANSs possess DM halos. Due to the gravitational effect of the DM halo, the pulse profile of an X-ray pulsar is changed. Our study finds a universal relation between the peak flux deviation of the pulse profile and $M_{\rm halo}/R_{\rm BM}$, which is the ratio of the DM halo mass, $M_{\rm halo}$, to the baryonic matter (BM) core radius, $R_{\rm BM}$. Our results show that, when $M_{\rm halo}/R_{\rm BM}=0.292$ and the DM particle mass $m_f = 0.3\,$GeV, the maximum deviation of the profile can be larger than 100$\%$, which has implications for X-ray pulsar observations.
Comment: 10 pages, 11 figures; accepted by PRD
Published: 2024

9. Efficient estimation of partially linear additive Cox models and variance estimation under shape restrictions

Author: Lang, Junjun, Liu, Yukun, and Qin, Jing
Subjects: Mathematics - Statistics Theory
Abstract: Shape-restricted inferences have exhibited empirical success in various applications with survival data. However, certain works fall short in providing a rigorous theoretical justification and an easy-to-use variance estimator with theoretical guarantee. Motivated by Deng et al. (2023), this paper delves into an additive and shape-restricted partially linear Cox model for right-censored data, where each additive component satisfies a specific shape restriction, encompassing monotonic increasing/decreasing and convexity/concavity. We systematically investigate the consistencies and convergence rates of the shape-restricted maximum partial likelihood estimator (SMPLE) of all the underlying parameters. We further establish the asymptotic normality and semiparametric efficiency of the SMPLE for the linear covariate shift. To estimate the asymptotic variance, we propose an innovative data-splitting variance estimation method that boasts exceptional versatility and broad applicability. Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE has comparable performance with the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and Huang (1999)'s method when the Cox model is violated or the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates based on the SMPLE and its competitors.
Published: 2024

10. ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

Author: Fu, Ruibo, Qi, Xin, Wen, Zhengqi, Tao, Jianhua, Wang, Tao, Qiang, Chunyu, Wang, Zhiyong, Lu, Yi, Wang, Xiaopeng, Shi, Shuchen, Liu, Yukun, Liu, Xuefei, and Zhang, Shuai
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle with inadequate speaker representation accuracy and overfitting, particularly in limited reference speeches scenarios. To address these challenges, we propose an Agile Speaker Representation Reinforcement Learning strategy to enhance speaker similarity in speaker adaptation tasks. ASRRL is the first work to apply reinforcement learning to improve the modeling accuracy of speaker embeddings in speaker adaptation, addressing the challenge of decoupling voice content and timbre. Our approach introduces two action strategies tailored to different reference speeches scenarios. In the single-sentence scenario, a knowledge-oriented optimal routine searching RL method is employed to expedite the exploration and retrieval of refinement information on the fringe of speaker representations. In the few-sentence scenario, we utilize a dynamic RL method to adaptively fuse reference speeches, enhancing the robustness and accuracy of speaker modeling. To achieve optimal results in the target domain, a multi-scale fusion scoring mechanism based reward model that evaluates speaker similarity, speech quality, and intelligibility across three dimensions is proposed, ensuring that improvements in speaker similarity do not compromise speech quality or intelligibility. The experimental results on the LibriTTS and VCTK datasets within mainstream TTS frameworks demonstrate the extensibility and generalization capabilities of the proposed ASRRL method. The results indicate that the ASRRL method significantly outperforms traditional fine-tuning approaches, achieving higher speaker similarity and better overall speech quality with limited reference speeches.
Comment: The audio demo is available at https://7xin.github.io/ASRRL/
Published: 2024

11. Fake News Detection and Manipulation Reasoning via Large Vision-Language Models

Author: Jin, Ruihan, Fu, Ruibo, Wen, Zhengqi, Zhang, Shuai, Liu, Yukun, and Tao, Jianhua
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Fake news has become a growing threat to information security and public opinion with the rapid sprawl of media manipulation. Therefore, fake news detection has attracted widespread attention from the academic community. Traditional fake news detection models demonstrate remarkable performance on authenticity binary classification, but their ability to reason about detailed faked traces based on the news content remains under-explored. Furthermore, due to the lack of external knowledge, the performance of existing methods on fact-related news is questionable, leaving their practical implementation unclear. In this paper, we propose a new multi-media research topic, namely manipulation reasoning. Manipulation reasoning aims to reason about manipulations based on news content. To support the research, we introduce a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN). The benchmark highlights the centrality of humans and high factual relevance, with detailed manual annotations. HFFN encompasses four realistic domains with fake news samples generated through three manipulation approaches. Moreover, a Multi-modal news Detection and Reasoning langUage Model (M-DRUM) is presented not only to judge the authenticity of multi-modal news, but also to raise analytical reasoning about potential manipulations. On the feature extraction level, a cross-attention mechanism is employed to extract fine-grained fusion features from multi-modal inputs. On the reasoning level, a large vision-language model (LVLM) serves as the backbone to facilitate fact-related reasoning. A two-stage training framework is deployed to better activate the capacity of identification and reasoning. Comprehensive experiments demonstrate that our model outperforms state-of-the-art (SOTA) fake news detection models and powerful LVLMs like GPT-4 and LLaVA.
Published: 2024

12. ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

Author: Fu, Ruibo, Liu, Rui, Qiang, Chunyu, Gao, Yingming, Lu, Yi, Shi, Shuchen, Wang, Tao, Li, Ya, Wen, Zhengqi, Zhang, Chen, Bu, Hui, Liu, Yukun, Qi, Xin, and Li, Guanjun
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence
Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective perception in practical applications like companion robots for children and marketing bots. The core issue lies in the inconsistency between high-quality audio generation and the ultimate human subjective experience. Therefore, this challenge aims to enhance the persuasiveness and acceptability of synthesized audio, focusing on human-aligned, convincing, and inspirational audio generation. A total of 19 teams have registered for the challenge, and the competition and its results are described in this paper.
Comment: ISCSLP 2024 Challenge description and results
Published: 2024

13. MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Author: Fu, Ruibo, Shi, Shuchen, Guo, Hongming, Wang, Tao, Qiang, Chunyu, Wen, Zhengqi, Tao, Jianhua, Qi, Xin, Lu, Yi, Wang, Xiaopeng, Wang, Zhiyong, Liu, Yukun, Liu, Xuefei, Zhang, Shuai, and Li, Guanjun
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Computer Science - Sound
Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio (TTA) technology, which relies on detailed and acoustically relevant textual descriptions, falls short in practical video dubbing applications. Existing datasets like AudioSet, AudioCaps, Clotho, Sound-of-Story, and WavCaps do not fully meet the requirements of real-world foley audio dubbing tasks. To address this, we introduce the Multi-modal Image and Narrative Text Dubbing Dataset (MINT), designed to enhance mainstream dubbing tasks such as literary story audiobook dubbing and image/silent video dubbing. In addition, to address the limitations of existing TTA technology in understanding and planning complex prompts, a Foley Audio Content Planning, Generation, and Alignment (CPGA) framework is proposed, which includes a content planning module leveraging large language models for comprehension of complex multi-modal prompts. Additionally, the training process is optimized using Proximal Policy Optimization based reinforcement learning, significantly improving the alignment and auditory realism of generated foley audio. Experimental results demonstrate that our approach significantly advances the field of foley audio dubbing, providing robust solutions for the challenges of multi-modal dubbing. Even when utilizing the relatively lightweight GPT-2 model, our framework outperforms open-source multimodal large models such as LLaVA, DeepSeek-VL, and Moondream2. The dataset is available at https://github.com/borisfrb/MINT.
Published: 2024

14. Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Author: Lu, Yi, Xie, Yuankun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Zhiyong, Qi, Xin, Liu, Xuefei, Li, Yongwei, Liu, Yukun, Wang, Xiaopeng, and Shi, Shuchen
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skipping the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose the Codecfake dataset, which is generated by seven representative neural codec methods. Experimental results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set.
Comment: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880
Published: 2024

15. PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Author: Shi, Shuchen, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Tao, Qiang, Chunyu, Lu, Yi, Qi, Xin, Liu, Xuefei, Liu, Yukun, Li, Yongwei, Wang, Zhiyong, and Wang, Xiaopeng
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge about textual descriptions inherent in large language models to effectively enhance the robustness of TTA acoustic models without altering the acoustic training set. Furthermore, a Chain-of-Thought that mimics human verification is introduced to enhance the accuracy of audio descriptions, thereby improving the accuracy of generated content in practical applications. The experiments show that our method achieves a state-of-the-art Inception Score (IS) of 8.72, surpassing AudioGen, AudioLDM and Tango.
Comment: accepted by INTERSPEECH2024
Published: 2024

16. Generalized Fake Audio Detection via Deep Stable Learning

Author: Wang, Zhiyong, Fu, Ruibo, Wen, Zhengqi, Xie, Yuankun, Liu, Yukun, Wang, Xiaopeng, Liu, Xuefei, Li, Yongwei, Tao, Jianhua, Lu, Yi, Qi, Xin, and Shi, Shuchen
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate the training process. In this work, we propose a stable learning-based training scheme that involves a Sample Weight Learning (SWL) module, addressing distribution shift by decorrelating all selected features via learning weights from training samples. The proposed portable plug-in-like SWL is easy to apply to multiple base models and generalizes them without using extra data during training. Experiments conducted on the ASVspoof datasets clearly demonstrate the effectiveness of SWL in generalizing different models across three evaluation datasets from different distributions.
Comment: accepted by INTERSPEECH2024
Published: 2024

17. Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

Author: Wang, Xiaopeng, Fu, Ruibo, Wen, Zhengqi, Wang, Zhiyong, Xie, Yuankun, Liu, Yukun, Tao, Jianhua, Liu, Xuefei, Li, Yongwei, Qi, Xin, Lu, Yi, and Shi, Shuchen
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between genuine and known spoofed audio. We propose a Genuine-Focused Learning (GFL) framework, called GFL-FAD, aiming for highly generalized FAD. This method incorporates a Counterfactual Reasoning Enhanced Representation (CRER) based on audio reconstruction using the Mask AutoEncoder (MAE) architecture to accurately model genuine audio features. To reduce the influence of spoofed audio during training, we introduce a genuine audio reconstruction loss, maintaining the focus on learning genuine data features. In addition, content-related bottleneck (BN) features are extracted from the MAE to supplement the knowledge of the original audio. These BN features are adaptively fused with CRER to further improve robustness. Our method achieves state-of-the-art performance with an EER of 0.25% on ASVspoof2019 LA.
Comment: Accepted by INTERSPEECH 2024
Published: 2024

18. The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

Author: Xie, Yuankun, Lu, Yi, Fu, Ruibo, Wen, Zhengqi, Wang, Zhiyong, Tao, Jianhua, Qi, Xin, Wang, Xiaopeng, Liu, Yukun, Cheng, Haonan, Ye, Long, and Sun, Yi
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source large-scale dataset that includes 2 languages, over 1M audio samples, and various test conditions, focusing on ALM-based audio detection. As a countermeasure, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minima. In our experiments, we first demonstrate that ADD models trained with the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.
Published: 2024

19. Operando Analysis of Adsorption-Limited Hydrogen Oxidation Reaction at Palladium Surfaces

Author: Liu, Yukun, Koo, Kunmo, Mao, Zugang, Fu, Xianbiao, Hu, Xiaobing, and Dravid, Vinayak P.
Subjects: Condensed Matter - Materials Science
Abstract: Palladium (Pd) catalysts have been extensively studied for the direct synthesis of H2O through the hydrogen oxidation reaction at ambient conditions. This heterogeneous catalytic reaction not only holds considerable practical significance but also serves as a classical model for investigating fundamental mechanisms, including adsorption and reactions between adsorbates. Nonetheless, the governing mechanisms and kinetics of its intermediate reaction stages under varying gas conditions remain elusive. This is attributed to the intricate interplay between adsorption, atomic diffusion, and concurrent phase transformation of the catalyst. Herein, the Pd-catalyzed, water-forming hydrogen oxidation is studied, in situ, to investigate intermediate reaction stages via fluid cell transmission electron microscopy. The dynamic behaviors of water generation, associated with reversible palladium hydride formation, are captured in real time with a nanoscale spatial resolution. Our findings suggest that the hydrogen oxidation rate catalyzed by Pd is significantly affected by the sequence in which gases are introduced. Through direct evidence of electron diffraction and density functional theory calculation, we demonstrate that the hydrogen oxidation rate is limited by adsorption processes of gas precursors. These nanoscale insights help identify the optimal reaction conditions for Pd-catalyzed hydrogen oxidation, which has substantial implications for water production technologies. The developed understanding also advocates a broader exploration of analogous mechanisms in other metal-catalyzed reactions.
Comment: 30 pages, 4 figures
Published: 2024

20. One Factor to Bind the Cross-Section of Returns

Author: Borri, Nicola, Chetverikov, Denis, Liu, Yukun, and Tsyvinski, Aleh
Subjects: Quantitative Finance - General Finance, Economics - Econometrics
Abstract: We propose a new non-linear single-factor asset pricing model $r_{it}=h(f_{t}\lambda_{i})+\epsilon_{it}$. Despite its parsimony, this model represents exactly any non-linear model with an arbitrary number of factors and loadings -- a consequence of the Kolmogorov-Arnold representation theorem. It features only one pricing component $h(f_{t}\lambda_{i})$, comprising a nonparametric link function of the time-dependent factor and factor loading that we jointly estimate with sieve-based estimators. Using 171 assets across major classes, our model delivers superior cross-sectional performance with a low-dimensional approximation of the link function. Most known finance and macro factors become insignificant when controlling for our single factor.
Published: 2024

21. The Role of Pretarsal Fascia in Upper Eyelid Crease Formation and its Implications in East Asians

Author: Liu, Yukun, Wang, Yi, Cai, Changqi, and Wang, Haiping
Published: 2024
Full Text: View/download PDF

22. Two-step semiparametric empirical likelihood inference from capture–recapture data with missing covariates

Author: Liu, Yang, Liu, Yukun, Li, Pengfei, and Zhang, Riquan
Published: 2024
Full Text: View/download PDF

23. High-Performance Ammonia Detection of Polymeric BaTiO3/Ti3C2Tx MXene Composite-Based Sensor for Gas Emission and Leakage

Author: Sun, Guoqing, Wang, Chenglin, Jia, Jie, Zhang, Hao, Hu, Yaqing, Liu, Yukun, and Zhang, Dongzhi
Published: 2024
Full Text: View/download PDF

24. Clinical features and mutational spectrum of Chinese patients with primary hyperoxaluria type 2

Author: Liu, Yukun, Zhao, Zhenqiang, Ge, Yucheng, He, Longzhi, Qi, Siyu, and Wang, Wenying
Published: 2024
Full Text: View/download PDF

25. Structural connectome combining DTI features predicts postoperative language decline and its recovery in glioma patients

Author: Liu, Yukun, Cui, Meng, Gao, Xin, Yang, Hui, Chen, Hewen, Guan, Bing, and Ma, Xiaodong
Published: 2024
Full Text: View/download PDF

26. Two novel heterozygous ADCY10 variants identified in Chinese pediatric patients with absorptive hypercalciuria: case report and literature review

Author: Ge, Yucheng, Liu, Yukun, Zhan, Ruichao, Zhao, Zhenqiang, Wang, Wenying, and Tian, Ye
Published: 2024
Full Text: View/download PDF

27. Identification of Optimal Conditions for Human Placental Explant Culture and Extracellular Vesicle Release

Author: Tekkatte, Chandana, Duggan, Erika, Zhang, Yan, Zhou, Jun, Sebastian, Rachel, Liu, Yukun, Pontigon, Devin S, Meads, Morgan, Liu, Tzu Ning, Pizzo, Donald P, Nolan, John, Parast, Mana M, and Laurent, Louise C
Subjects: Placenta, explants, extracellular vesicles, syncytiotrophoblast, serum-free, hCG, PLAP, vesicle flow cytometry
Published: 2023

28. Hypothesis test on a mixture forward-incubation-time epidemic model with application to COVID-19 outbreak

Author: Wang, Chunlin, Li, Pengfei, Liu, Yukun, Zhou, Xiao-Hua, and Qin, Jing
Subjects: Statistics - Methodology, Statistics - Applications
Abstract: The distribution of the incubation period of the novel coronavirus disease that emerged in 2019 (COVID-19) has crucial clinical implications for understanding this disease and devising effective disease-control measures. Qin et al. (2020) designed a cross-sectional and forward follow-up study to collect the duration times between a specific observation time and the onset of COVID-19 symptoms for a number of individuals. They further proposed a mixture forward-incubation-time epidemic model, which is a mixture of an incubation-period distribution and a forward time distribution, to model the collected duration times and to estimate the incubation-period distribution of COVID-19. In this paper, we provide sufficient conditions for the identifiability of the unknown parameters in the mixture forward-incubation-time epidemic model when the incubation period follows a two-parameter distribution. Under the same setup, we propose a likelihood ratio test (LRT) for testing the null hypothesis that the mixture forward-incubation-time epidemic model is a homogeneous exponential distribution. The testing problem is non-regular because a nuisance parameter is present only under the alternative. We establish the limiting distribution of the LRT and identify an explicit representation for it. The limiting distribution of the LRT under a sequence of local alternatives is also obtained. Our simulation results indicate that the LRT has desirable type I errors and powers, and we analyze a COVID-19 outbreak dataset from China to illustrate the usefulness of the LRT., Comment: 34 pages, 2 figures, 2 tables
Published: 2022
Full Text: View/download PDF

29. Research on Identification of Primary and Secondary Cards with Dual SIMs Terminal

Author: Xie, Yichen, Li, Jiajun, and Liu, Yukun
Editor: Wang, Yue, Zou, Jiaqi, Xu, Lexi, Ling, Zhilei, and Cheng, Xinzhou
Series Editor: Angrisani, Leopoldo, Arteaga, Marco, Chakraborty, Samarjit, Chen, Jiming, Chen, Shanben, Chen, Tan Kay, Dillmann, Rüdiger, Duan, Haibin, Ferrari, Gianluigi, Ferre, Manuel, Jabbari, Faryar, Jia, Limin, Kacprzyk, Janusz, Khamis, Alaa, Kroeger, Torsten, Li, Yong, Liang, Qilian, Martín, Ferran, Ming, Tan Cher, Minker, Wolfgang, Misra, Pradeep, Mukhopadhyay, Subhas, Ning, Cun-Zheng, Nishida, Toyoaki, Oneto, Luca, Panigrahi, Bijaya Ketan, Pascucci, Federica, Qin, Yong, Seng, Gan Woon, Speidel, Joachim, Veiga, Germano, Wu, Haitao, Zamboni, Walter, and Tan, Kay Chen
Published: 2024
Full Text: View/download PDF

30. Algorithm for Generating Tire Defect Images Based on RS-GAN

Author: Li, Chunhua, Fu, Ruizhi, and Liu, Yukun
Editor: Luo, Biao, Cheng, Long, Wu, Zheng-Guang, Li, Hongyi, and Li, Chaojie
Editorial Board Member: Filipe, Joaquim, Ghosh, Ashish, Prates, Raquel Oliveira, and Zhou, Lizhu
Published: 2024
Full Text: View/download PDF

31. Nearly optimal capture-recapture sampling and empirical likelihood weighting estimation for M-estimation with big data

Author: Fan, Yan, Liu, Yang, Liu, Yukun, and Qin, Jing
Subjects: Statistics - Methodology
Abstract: Subsampling techniques can reduce the computational costs of processing big data. Practical subsampling plans typically involve initial uniform sampling and refined sampling. With a subsample, big data inferences are generally built on the inverse probability weighting (IPW), which becomes unstable when the probability weights are close to zero and cannot incorporate auxiliary information. First, we consider capture-recapture sampling, which combines an initial uniform sampling with a second Poisson sampling. Under this sampling plan, we propose an empirical likelihood weighting (ELW) estimation approach to an M-estimation parameter. Second, based on the ELW method, we construct a nearly optimal capture-recapture sampling plan that balances estimation efficiency and computational costs. Third, we derive methods for determining the smallest sample sizes with which the proposed sampling-and-estimation method produces estimators of guaranteed precision. Our ELW method overcomes the instability of IPW by circumventing the use of inverse probabilities, and utilizes auxiliary information including the size and certain sample moments of big data. We show that the proposed ELW method produces more efficient estimators than IPW, leading to more efficient optimal sampling plans and more economical sample sizes for a prespecified estimation precision. These advantages are confirmed through simulation studies and real data analyses.
Published: 2022

32. Maximum Full Likelihood Approach to Randomly Truncated Data

Author: Cheng, Manli, Liu, Yukun, Ma, Huijuan, and Qin, Jing
Published: 2024
Full Text: View/download PDF

33. Role of P53 Mediated Molecular Regulation in Starvation-Induced Autophagy in HCT-116 and HT-29 Colorectal Carcinoma Cells

Author: Wang, Jing, Liu, Yukun, Cai, Jie, Yang, Xinjiao, Xiong, Zhe, Zou, Di, Jiao, Deling, Xu, Kaixiang, Wei, Hong-Jiang, and Zhao, Hong-Ye
Published: 2023
Full Text: View/download PDF

34. Penalized empirical likelihood estimation and EM algorithms for closed-population capture-recapture models

Author: Liu, Yang, Li, Pengfei, and Liu, Yukun
Subjects: Statistics - Methodology
Abstract: Capture-recapture experiments are widely used to estimate the abundance of a finite population. Based on capture-recapture data, the empirical likelihood (EL) method has been shown to outperform the conventional conditional likelihood (CL) method. However, the current literature on EL abundance estimation ignores behavioral effects, and the EL estimates may not be stable, especially when the capture probability is low. We make three contributions in this paper. First, we extend the EL method to capture-recapture models that account for behavioral effects. Second, to overcome the instability of the EL method, we propose a penalized EL (PEL) estimation method that penalizes large abundance values. We then investigate the asymptotics of the maximum PEL estimator and the PEL ratio statistic. Third, we develop standard expectation-maximization (EM) algorithms for PEL to improve its practical performance. The EM algorithm is also applicable to EL and CL with slight modifications. Our simulation and a real-world data analysis demonstrate that the PEL method successfully overcomes the instability of the EL method and the proposed EM algorithm produces more reliable results than existing optimization algorithms.
Published: 2022

35. Research on the Characteristics of Mud Film on the Excavation Surface of Slurry Shield Tunneling Based on Transparent Soil Technology

Author: Liu, Yukun and Wang, Jiawei
Published: 2023
Full Text: View/download PDF

36. Tuning-parameter-free optimal propensity score matching approach for causal inference

Author: Liu, Yukun and Qin, Jing
Subjects: Mathematics - Statistics Theory
Abstract: Propensity score matching (PSM) is a pseudo-experimental method that uses statistical techniques to construct an artificial control group by matching each treated unit with one or more untreated units of similar characteristics. To date, the problem of determining the optimal number of matches per unit, which plays an important role in PSM, has not been adequately addressed. We propose a tuning-parameter-free PSM method based on the nonparametric maximum-likelihood estimation of the propensity score under the monotonicity constraint. The estimated propensity score is piecewise constant, and therefore automatically groups data. Hence, our proposal is free of tuning parameters. The proposed estimator is asymptotically semiparametric efficient for the univariate case, and achieves this level of efficiency in the multivariate case when the outcome and the propensity score depend on the covariate in the same direction. We conclude that matching methods based on the propensity score alone cannot, in general, be efficient., Comment: 28 pages, 2 figures, 3 tables
Published: 2022

37. Weighted-average quantile regression

Author: Chetverikov, Denis, Liu, Yukun, and Tsyvinski, Aleh
Subjects: Economics - Econometrics, Statistics - Machine Learning, 62J02
Abstract: In this paper, we introduce the weighted-average quantile regression framework, $\int_0^1 q_{Y|X}(u)\psi(u)du = X'\beta$, where $Y$ is a dependent variable, $X$ is a vector of covariates, $q_{Y|X}$ is the quantile function of the conditional distribution of $Y$ given $X$, $\psi$ is a weighting function, and $\beta$ is a vector of parameters. We argue that this framework is of interest in many applied settings and develop an estimator of the vector of parameters $\beta$. We show that our estimator is $\sqrt T$-consistent and asymptotically normal with mean zero and easily estimable covariance matrix, where $T$ is the size of available sample. We demonstrate the usefulness of our estimator by applying it in two empirical settings. In the first setting, we focus on financial data and study the factor structures of the expected shortfalls of the industry portfolios. In the second setting, we focus on wage data and study inequality and social welfare dependence on commonly used individual characteristics., Comment: 69 pages
Published: 2022

38. MXene-based self-adhesive, ultrasensitive, highly tough flexible hydrogel pressure sensors for motion monitoring and robotic tactile sensing

Author: Zhang, Pengfei, Wang, Weiwei, Ma, Yanhua, Zhang, Hao, Zhou, Dandi, Ji, Xinyi, Liu, Wenzhe, Liu, Yukun, and Zhang, Dongzhi
Published: 2024
Full Text: View/download PDF

39. Flexible self-supporting photonic crystals: Fabrications and responsive structural colors

Author: Meng, Zhipeng, Liu, Yukun, Huang, Haofei, and Wu, Suli
Published: 2024
Full Text: View/download PDF

40. Synergetic manipulation of components and multiple activator sites towards full-spectrum lighting in Eu2+-doped whitlockite phosphors for high color-rendering WLED

Author: Chen, Huan, Zhang, Ziqi, Mi, Ruiyu, Molokeev, Maxim S., Liu, Yukun, Wang, Baochen, Mei, Lefu, Fang, Minghao, Wu, Xiaowen, Min, Xin, and Liu, Yan-gai
Published: 2024
Full Text: View/download PDF

41. A prime red phosphor Ca3Gd(AlO)3(BO3)4:Eu3+ with high color purity for low color temperature pc-WLEDs

Author: Yu, Zheng, Liu, Yangai, Yang, ChenGuang, Liu, YuKun, Sun, Tonglu, Mi, Ruiyu, and Mei, Lefu
Published: 2024
Full Text: View/download PDF

42. Asymmetric Zn-N4 atomic sites embedded hollow fibers as stable Zn anode for high-performance Zn-ion hybrid capacitor

Author: Liu, Yukun, Li, Bing, Wang, Jin, Li, Caiyun, Yang, Hongrui, Song, Yang, Zhang, Sen, and Deng, Chao
Published: 2024
Full Text: View/download PDF

43. Eco-friendly triboelectric nanogenerator for self-powering stacked In2O3 nanosheets/PPy nanoparticles-based NO2 gas sensor

Author: Zhang, Hao, Zhang, Dongzhi, Yang, Yan, Zhou, Lina, Liu, Yukun, Liu, Wenzhe, Sun, Yuehang, Guo, Yihong, and Ji, Yuncheng
Published: 2024
Full Text: View/download PDF

44. Ultrafast response humidity sensor based on titanium dioxide quantum dots/silica and its multifunctional applications

Author: Liu, Wenzhe, Zhang, Dongzhi, Zhang, Hao, Sun, Yuehang, Wang, Zijian, Ji, Xinyi, Liu, Yukun, Wang, Jianghao, and Jiao, Gongao
Published: 2024
Full Text: View/download PDF

45. Rotational contact triboelectric nanogenerator driven by water flows inspired by waterwheels and their applications for lead ion removal

Author: Liu, Yukun, Zhang, Dongzhi, Ji, Xinyi, Xu, Zhenyuan, Zhang, Hao, Mao, Ruiyuan, Liu, Wenzhe, Wang, Jianghao, and Sun, Yuehang
Published: 2024
Full Text: View/download PDF

46. Identification of mutations in 15 nephrolithiasis-related genes leading to a molecular diagnosis in 85 Chinese pediatric patients

Author: Liu, Yukun, Ge, Yucheng, Zhan, Ruichao, Zhao, Zhenqiang, Li, Jun, and Wang, Wenying
Subjects: Gene mutations -- Identification and classification, Kidney stones -- Genetic aspects -- Diagnosis -- Demographic aspects, Health
Abstract: Background: The aim of this study was to describe the genotypic and phenotypic characteristics of Chinese pediatric patients with hereditary nephrolithiasis. Methods: Whole-exome sequencing (WES) was performed on 218 Chinese pediatric patients with kidney stones, and genetic and clinical data were collected and analyzed retrospectively. Results: The median age at onset in our cohort was 2.5 years (age range, 0.3-13 years). We detected 79 causative mutations in 15 genes, leading to a molecular diagnosis in 38.99% (85/218) of all cases. Monogenic mutations were present in 80 cases, and digenic mutations were present in 5 cases; 34.18% (27/79) of mutations were not included in the databases. Six common mutant genes, i.e., HOGA1, AGXT, GRHPR, SLC3A1, SLC7A9, and SLC4A1, were found in 84.71% of the patients overall. Furthermore, three mutations (A278A, c.834_834 + 1GG > TT, and C257G) in HOGA1, two mutations (K12QfX156 and S275RfX28) in AGXT, and one mutation (C289DfX22) in GRHPR represented hotspot mutations. The patients with HOGA1 mutations had the earliest onset age (0.8 years), followed by those with SLC7A9 (1.8 years), SLC4A1 (2.7 years), AGXT (4.3 years), SLC3A1 (4.8 years), and GRHPR (8 years) mutations (p = 0.002). Nephrocalcinosis was most commonly observed in patients with AGXT gene mutations. Conclusions: Fifteen causative genes were detected in 85 Chinese pediatric patients with kidney stone diseases. The most common mutant genes, novel mutations, hotspot mutations, and genotype-phenotype correlations were also found. This study contributes to the understanding of genetic profiles and clinical courses in pediatric patients with hereditary nephrolithiasis.
Published: 2023
Full Text: View/download PDF

47. Trimethylamine gas sensor based on bimetallic Ag/Cu@CuFe2O4: Experiment and DFT calculation

Author: Sun, Yuehang, Zhang, Dongzhi, Tang, Mingcong, Liu, Wenzhe, Liu, Yukun, Wang, Jianghao, Xi, Guangshuai, Xiong, Haotian, and Zhang, Lifa
Published: 2025
Full Text: View/download PDF

48. High color purity aluminate phosphor SryCa1-x-yAl12O19: xEu2+ for lighting and backlight display applications

Author: Tian, Zhaofeng, Liu, Yangai, Xie, Ci'an, Yang, Juyu, Liu, Yukun, Yang, Chenguang, Mi, Ruiyu, and Mei, Lefu
Published: 2025
Full Text: View/download PDF

49. Co-CoSe heterogeneous fibers with strong interfacial built-in electric field as bifunctional electrocatalyst for high-performance Zn-air battery

Author: Song, Yang, Li, Caiyun, Wang, Jin, Yang, Hongrui, He, Hanwen, Liu, Yukun, Zhang, Sen, and Deng, Chao
Published: 2025
Full Text: View/download PDF

50. Ethanol-responsive structural colors with multi-level information encryption based on the patterned three-layer inverse opal photonic crystal

Author: Liu, Yukun, Meng, Zhipeng, Miao, Senlin, Huang, Haofei, Ren, Jie, Han, Yaqun, and Wu, Suli
Published: 2025
Full Text: View/download PDF
