Author: "Huang, Zhen" / Search Limiters: Available in Library Collection - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Huang, Zhen"' showing total 4,762 results

Start Over Author "Huang, Zhen" Search Limiters Available in Library Collection

4,762 results on '"Huang, Zhen"'

1. O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Author: Huang, Zhen, Zou, Haoyang, Li, Xuefeng, Liu, Yixiu, Zheng, Yuxiang, Chern, Ethan, Xia, Shijie, Qin, Yiwei, Yuan, Weizhe, and Liu, Pengfei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper presents a critical examination of current approaches to replicating OpenAI's O1 model capabilities, with particular focus on the widespread but often undisclosed use of knowledge distillation techniques. While our previous work explored the fundamental technical path to O1 replication, this study reveals how simple distillation from O1's API, combined with supervised fine-tuning, can achieve superior performance on complex mathematical reasoning tasks. Through extensive experiments, we show that a base model fine-tuned on simply tens of thousands of samples O1-distilled long-thought chains outperforms O1-preview on the American Invitational Mathematics Examination (AIME) with minimal technical complexity. Moreover, our investigation extends beyond mathematical reasoning to explore the generalization capabilities of O1-distilled models across diverse tasks: hallucination, safety and open-domain QA. Notably, despite training only on mathematical problem-solving data, our models demonstrated strong generalization to open-ended QA tasks and became significantly less susceptible to sycophancy after fine-tuning. We deliberately make this finding public to promote transparency in AI research and to challenge the current trend of obscured technical claims in the field. Our work includes: (1) A detailed technical exposition of the distillation process and its effectiveness, (2) A comprehensive benchmark framework for evaluating and categorizing O1 replication attempts based on their technical transparency and reproducibility, (3) A critical discussion of the limitations and potential risks of over-relying on distillation approaches, our analysis culminates in a crucial bitter lesson: while the pursuit of more capable AI systems is important, the development of researchers grounded in first-principles thinking is paramount., Comment: 16 pages
Published: 2024

2. Renormalization of States and Quasiparticles in Many-body Downfolding

Author: Canestraight, Annabelle, Huang, Zhen, and Vlcek, Vojtech
Subjects: Physics - Computational Physics, Condensed Matter - Materials Science
Abstract: We explore the principles of many-body Hamiltonian complexity reduction via downfolding on an effective low-dimensional representation. We present a unique measure of fidelity between the effective (reduced-rank) description and the full many-body treatment for arbitrary (i.e., ground and excited) states. When the entire problem is mapped on a system of interacting quasiparticles [npj Computational Materials 9 (1), 126, 2023], the effective Hamiltonians can faithfully reproduce the physics only when a clear energy scale separation exists between the subsystems and its environment. We also demonstrate that it is necessary to include quasiparticle renormalization at distinct energy scales, capturing the distinct interaction between subsystems and their surrounding environments. Numerical results from simple, exactly solvable models highlight the limitations and strengths of this approach, particularly for ground and low-lying excited states. This work lays the groundwork for applying dynamical downfolding techniques to problems concerned with (quantum) interfaces.
Published: 2024

3. Unified analysis of non-Markovian open quantum systems in Gaussian environment using superoperator formalism

Author: Huang, Zhen, Lin, Lin, Park, Gunhee, and Zhu, Yuanran
Subjects: Quantum Physics, Mathematical Physics
Abstract: We present perturbative error bounds for the non-Markovian dynamics of observables in open quantum systems interacting with Gaussian environments, governed by general Liouville dynamics. This extends the work of [Mascherpa et al., Phys. Rev. Lett. 118, 100401, 2017], which demonstrated qualitatively tighter bounds over the standard Gr\"onwall-type analysis, where the joint system-environment evolution is unitary. Our results apply to systems with both bosonic and fermionic environments. Our approach utilizes a superoperator formalism, which avoids the need for formal coherent state path integral calculations, or the dilation of Lindblad dynamics into an equivalent unitary framework with infinitely many degrees of freedom. This enables a unified treatment of a wide range of open quantum systems. These findings provide a solid theoretical basis for various recently developed pseudomode methods in simulating open quantum system dynamics., Comment: 46 pages
Published: 2024

4. Real-time propagation of adaptive sampling selected configuration interaction wave function

Author: Shee, Avijit, Huang, Zhen, Head-Gordon, Martin, and Whaley, K. Birgitta
Subjects: Physics - Chemical Physics, Physics - Computational Physics, Quantum Physics
Abstract: We have developed a new time propagation method, time-dependent adaptive sampling configuration interaction (TD-ASCI), to describe the dynamics of a strongly correlated system. We employ the short iterative Lanczos (SIL) method as the time-integrator, which provides a unitary, norm-conserving, and stable long-time propagation scheme. We used the TD-ASCI method to evaluate the time-domain correlation functions of molecular systems. The accuracy of the correlation function was assessed by Fourier transforming (FT) into the frequency domain to compute the dipole-allowed absorption spectra. The FT has been carried out with a short-time signal of the correlation function to reduce the computation time, using an efficient alternative FT scheme based on the ESPRIT signal processing algorithm. We have applied the {TD-ASCI} method to prototypical strongly correlated molecular systems and compared the absorption spectra to spectra evaluated using the equation of motion coupled cluster (EOMCC) method with a truncation at single-doubles-triples (SDT) level.
Published: 2024

5. RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering

Author: Chen, Zhongwu, Xu, Chengjin, Wang, Dingmin, Huang, Zhen, Dou, Yong, and Guo, Jian
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Retrieval-augmented generation (RAG) framework has shown promising potential in knowledge-intensive question answering (QA) by retrieving external corpus and generating based on augmented context. However, existing approaches only consider the query itself, neither specifying the retrieval preferences for the retrievers nor informing the generators of how to refer to the retrieved documents for the answers, which poses a significant challenge to the QA performance. To address these issues, we propose Rule-Guided Retrieval-Augmented Generation with LMs, which explicitly introduces symbolic rules as demonstrations for in-context learning (RuleRAG-ICL) to guide retrievers to retrieve logically related documents in the directions of rules and uniformly guide generators to generate answers attributed by the guidance of the same set of rules. Moreover, the combination of queries and rules can be further used as supervised fine-tuning data to update retrievers and generators (RuleRAG-FT) to achieve better rule-based instruction following capability, leading to retrieve more supportive results and generate more acceptable answers. To emphasize the attribution of rules, we construct five rule-aware QA benchmarks, including three temporal and two static scenarios, and equip RuleRAG with several kinds of retrievers and generators. Experiments demonstrate that training-free RuleRAG-ICL effectively improves the retrieval quality of +89.2% in Recall@10 scores and generation accuracy of +103.1% in exact match scores over standard RAG on average across the five benchmarks, and further fine-tuned RuleRAG-FT consistently yields more significant performance enhancement. Extensive analyses indicate that RuleRAG scales well with increasing numbers of retrieved documents and exhibits generalization ability for untrained rules.
Published: 2024

6. O1 Replication Journey: A Strategic Progress Report -- Part 1

Author: Qin, Yiwei, Li, Xuefeng, Zou, Haoyang, Liu, Yixiu, Xia, Shijie, Huang, Zhen, Ye, Yixin, Yuan, Weizhe, Liu, Hector, Li, Yuanzhi, and Liu, Pengfei
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: This paper introduces a pioneering approach to artificial intelligence research, embodied in our O1 Replication Journey. In response to the announcement of OpenAI's groundbreaking O1 model, we embark on a transparent, real-time exploration to replicate its capabilities while reimagining the process of conducting and communicating AI research. Our methodology addresses critical challenges in modern AI research, including the insularity of prolonged team-based projects, delayed information sharing, and the lack of recognition for diverse contributions. By providing comprehensive, real-time documentation of our replication efforts, including both successes and failures, we aim to foster open science, accelerate collective advancement, and lay the groundwork for AI-driven scientific discovery. Our research progress report diverges significantly from traditional research papers, offering continuous updates, full process transparency, and active community engagement throughout the research journey. Technologically, we proposed the journey learning paradigm, which encourages models to learn not just shortcuts, but the complete exploration process, including trial and error, reflection, and backtracking. With only 327 training samples and without any additional tricks, journey learning outperformed conventional supervised learning by over 8\% on the MATH dataset, demonstrating its extremely powerful potential. We believe this to be the most crucial component of O1 technology that we have successfully decoded. We share valuable resources including technical hypotheses and insights, cognitive exploration maps, custom-developed tools, etc at https://github.com/GAIR-NLP/O1-Journey.
Published: 2024

7. Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling

Author: Fang, Xinyue, Huang, Zhen, Tian, Zhiliang, Fang, Minghui, Pan, Ziyi, Fang, Quntian, Wen, Zhihua, Pan, Hengyue, and Li, Dongsheng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: LLMs obtain remarkable performance but suffer from hallucinations. Most research on detecting hallucination focuses on the questions with short and concrete correct answers that are easy to check the faithfulness. Hallucination detections for text generation with open-ended answers are more challenging. Some researchers use external knowledge to detect hallucinations in generated texts, but external resources for specific scenarios are hard to access. Recent studies on detecting hallucinations in long text without external resources conduct consistency comparison among multiple sampled outputs. To handle long texts, researchers split long texts into multiple facts and individually compare the consistency of each pairs of facts. However, these methods (1) hardly achieve alignment among multiple facts; (2) overlook dependencies between multiple contextual facts. In this paper, we propose a graph-based context-aware (GCA) hallucination detection for text generations, which aligns knowledge facts and considers the dependencies between contextual knowledge triples in consistency comparison. Particularly, to align multiple facts, we conduct a triple-oriented response segmentation to extract multiple knowledge triples. To model dependencies among contextual knowledge triple (facts), we construct contextual triple into a graph and enhance triples' interactions via message passing and aggregating via RGCN. To avoid the omission of knowledge triples in long text, we conduct a LLM-based reverse verification via reconstructing the knowledge triples. Experiments show that our model enhances hallucination detection and excels all baselines.
Published: 2024

8. Contextualization of ASR with LLM using phonetic retrieval-based augmentation

Author: Lei, Zhihong, Na, Xingyu, Xu, Mingbin, Pusateri, Ernest, Van Gysel, Christophe, Zhang, Yuanyuan, Han, Shiyi, and Huang, Zhen
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
Abstract: Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. In this work, we start with a speech recognition task and propose a retrieval-based solution to contextualize the LLM: we first let the LLM detect named entities in speech without any context, then use this named entity as a query to retrieve phonetically similar named entities from a personal database and feed them to the LLM, and finally run context-aware LLM decoding. In a voice assistant task, our solution achieved up to 30.2% relative word error rate reduction and 73.6% relative named entity error rate reduction compared to a baseline system without contextualization. Notably, our solution by design avoids prompting the LLM with the full named entity database, making it highly efficient and applicable to large named entity databases.
Published: 2024

9. From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

Author: Chen, Wei, Huang, Zhen, Xie, Liang, Lin, Binbin, Li, Houqiang, Lu, Le, Tian, Xinmei, Cai, Deng, Zhang, Yonggang, Wang, Wenxiao, Shen, Xu, and Ye, Jieping
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) tend to prioritize adherence to user prompts over providing veracious responses, leading to the sycophancy issue. When challenged by users, LLMs tend to admit mistakes and provide inaccurate responses even if they initially provided the correct answer. Recent works propose to employ supervised fine-tuning (SFT) to mitigate the sycophancy issue, while it typically leads to the degeneration of LLMs' general capability. To address the challenge, we propose a novel supervised pinpoint tuning (SPT), where the region-of-interest modules are tuned for a given objective. Specifically, SPT first reveals and verifies a small percentage (<5%) of the basic modules, which significantly affect a particular behavior of LLMs. i.e., sycophancy. Subsequently, SPT merely fine-tunes these identified modules while freezing the rest. To verify the effectiveness of the proposed SPT, we conduct comprehensive experiments, demonstrating that SPT significantly mitigates the sycophancy issue of LLMs (even better than SFT). Moreover, SPT introduces limited or even no side effects on the general capability of LLMs. Our results shed light on how to precisely, effectively, and efficiently explain and improve the targeted ability of LLMs., Comment: Accepted by ICML 2024
Published: 2024

10. Quasi-Lindblad pseudomode theory for open quantum systems

Author: Park, Gunhee, Huang, Zhen, Zhu, Yuanran, Yang, Chao, Chan, Garnet Kin-Lic, and Lin, Lin
Subjects: Quantum Physics, Condensed Matter - Strongly Correlated Electrons, Physics - Chemical Physics, Physics - Computational Physics
Abstract: We introduce a new framework to study the dynamics of open quantum systems with linearly coupled Gaussian baths. Our approach replaces the continuous bath with an auxiliary discrete set of pseudomodes with dissipative dynamics, but we further relax the complete positivity requirement in the Lindblad master equation and formulate a quasi-Lindblad pseudomode theory. We show that this quasi-Lindblad pseudomode formulation directly leads to a representation of the bath correlation function in terms of a complex weighted sum of complex exponentials, an expansion that is known to be rapidly convergent in practice and thus leads to a compact set of pseudomodes. The pseudomode representation is not unique and can differ by a gauge choice. When the global dynamics can be simulated exactly, the system dynamics is unique and independent of the specific pseudomode representation. However, the gauge choice may affect the stability of the global dynamics, and we provide an analysis of why and when the global dynamics can retain stability despite losing positivity. We showcase the performance of this formulation across various spectral densities in both bosonic and fermionic problems, finding significant improvements over conventional pseudomode formulations., Comment: 13 pages, 6 figures (main text); 8 pages, 1 figure (Supplementary Material)
Published: 2024
Full Text: View/download PDF

11. Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models

Author: Haider, Adnan, Na, Xingyu, McDermott, Erik, Ng, Tim, Huang, Zhen, and Zhuang, Xiaodan
Subjects: Computer Science - Machine Learning
Abstract: This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and attention-based encoder-decoder (AED) loss. The proposed approach presents a novel framework to identify and improve a model's recognition on challenging segments of an audio. Notably, this training framework is independent of hidden Markov models (HMMs) and lattices, eliminating the need for substantial decision-making regarding HMM topology, lexicon, and graph generation, as typically required in standard discriminative training approaches. Compared to additional fine-tuning with MMI or MWER loss on the encoder, FDT is shown to be more effective in achieving greater reductions in Word Error Rate (WER) on streaming models trained on LibriSpeech. Additionally, this method is shown to be effective in further improving a converged word-piece streaming E2E model trained on 600k hours of assistant and dictation dataset., Comment: UK Speech 2024, Submitted to SLT 2024
Published: 2024

12. One-dimensional spin-flipping topological edge state laser

Author: Wu, Jhih-Sheng, Huang, Zhen-Ting, Han, Meng-Ting, Chen, Yen-Hsun, and Lu, Tien-Chang
Subjects: Physics - Optics
Abstract: Topological edge states manifest spin-momentum-locking propagation as a primary consequence of topological crystals. However, experimental studies on spin manipulation and the resulting propagation of these states are lacking. Here, we demonstrate experimentally spin manipulation of topological edge states by the boundary conditions of the one-dimensional path. Armchair boundaries at the endpoints of the path induce spin-flipping back-scattering, resulting in a novel one-dimensional resonance -- traveling resonance. Remarkably, we demonstrate lasing of this one-dimensional traveling resonance. Our findings hold significant potential for practical applications in spin manipulation of light., Comment: 9 pages, 6 figures
Published: 2024

13. Joint Optimization of Resource Allocation and Data Selection for Fast and Cost-Efficient Federated Edge Learning

Author: Jia, Yunjian, Huang, Zhen, Yan, Jiping, Zhang, Yulu, Luo, Kun, and Wen, Wanli
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Deploying federated learning at the wireless edge introduces federated edge learning (FEEL). Given FEEL's limited communication resources and potential mislabeled data on devices, improper resource allocation or data selection can hurt convergence speed and increase training costs. Thus, to realize an efficient FEEL system, this paper emphasizes jointly optimizing resource allocation and data selection. Specifically, in this work, through rigorously modeling the training process and deriving an upper bound on FEEL's one-round convergence rate, we establish a problem of joint resource allocation and data selection, which, unfortunately, cannot be solved directly. Toward this end, we equivalently transform the original problem into a solvable form via a variable substitution and then break it into two subproblems, that is, the resource allocation problem and the data selection problem. The two subproblems are mixed-integer non-convex and integer non-convex problems, respectively, and achieving their optimal solutions is a challenging task. Based on the matching theory and applying the convex-concave procedure and gradient projection methods, we devise a low-complexity suboptimal algorithm for the two subproblems, respectively. Finally, the superiority of our proposed scheme of joint resource allocation and data selection is validated by numerical results.
Published: 2024

14. OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

Author: Huang, Zhen, Wang, Zengzhi, Xia, Shijie, and Liu, Pengfei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In this report, we pose the following question: Who is the most intelligent AI model to date, as measured by the OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We specifically focus on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal Table approach to rank AI models based on their comprehensive performance across various disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet shows highly competitive overall performance over GPT-4o, even surpassing GPT-4o on a few subjects (i.e., Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V are ranked consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The performance of AI models from the open-source community significantly lags behind these proprietary models. (4) The performance of these models on this benchmark has been less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the performance of the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena)., Comment: 10 pages
Published: 2024

15. OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Author: Huang, Zhen, Wang, Zengzhi, Xia, Shijie, Li, Xuefeng, Zou, Haoyang, Xu, Ruijie, Fan, Run-Ze, Ye, Lyumanshan, Chern, Ethan, Ye, Yixin, Zhang, Yikai, Yang, Yuqing, Wu, Ting, Wang, Binjie, Sun, Shichao, Xiao, Yang, Li, Yiyuan, Zhou, Fan, Chern, Steffi, Qin, Yiwei, Ma, Yan, Su, Jiadi, Liu, Yixiu, Zheng, Yuxiang, Zhang, Shaoting, Lin, Dahua, Qiao, Yu, and Liu, Pengfei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration. Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features., Comment: 44 pages
Published: 2024

16. Adaptive Cooperative Streaming of Holographic Video Over Wireless Networks: A Proximal Policy Optimization Solution

Author: Wen, Wanli, Yan, Jiping, Zhang, Yulu, Huang, Zhen, Liang, Liang, and Jia, Yunjian
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Adapting holographic video streaming to fluctuating wireless channels is essential to maintain consistent and satisfactory Quality of Experience (QoE) for users, which, however, is a challenging task due to the dynamic and uncertain characteristics of wireless networks. To address this issue, we propose a holographic video cooperative streaming framework designed for a generic wireless network in which multiple access points can cooperatively transmit video with different bitrates to multiple users. Additionally, we model a novel QoE metric tailored specifically for holographic video streaming, which can effectively encapsulate the nuances of holographic video quality, quality fluctuations, and rebuffering occurrences simultaneously. Furthermore, we formulate a formidable QoE maximization problem, which is a non-convex mixed integer nonlinear programming problem. Using proximal policy optimization (PPO), a new class of reinforcement learning algorithms, we devise a joint beamforming and bitrate control scheme, which can be wisely adapted to fluctuations in the wireless channel. The numerical results demonstrate the superiority of the proposed scheme over representative baselines., Comment: This paper has been accepted for publication in IEEE Wireless Communications Letters
Published: 2024

17. Enhancing CTC-based speech recognition with diverse modeling units

Author: Han, Shiyi, Lei, Zhihong, Xu, Mingbin, Na, Xingyu, and Huang, Zhen
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Sound
Abstract: In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures like transformer. On top of E2E systems, researchers have achieved substantial accuracy improvement by rescoring E2E model's N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvements come from other than the system combination effect. We examine the underlying mechanisms driving these gains and propose an efficient joint training approach, where E2E models are trained jointly with diverse modeling units. This methodology does not only align the strengths of both phoneme and grapheme-based models but also reveals that using these diverse modeling units in a synergistic way can significantly enhance model accuracy. Our findings offer new insights into the optimal integration of heterogeneous modeling units in the development of more robust and accurate ASR systems.
Published: 2024

18. Path-wise Vulnerability Mitigation

Author: Huang, Zhen and Dokic, Hristina
Subjects: Computer Science - Cryptography and Security
Abstract: Software vulnerabilities are prevalent but fixing software vulnerabilities is not trivial. Studies have shown that a considerable prepatch window exists because it often takes weeks or months for software vendors to fix a vulnerability. Existing approaches aim to reduce the pre-patch window by generating and applying mitigation patches that prevent adversaries from exploiting vulnerabilities rather than fix vulnerabilities. Because mitigation patches typically terminate the execution of vulnerability-triggering program paths at the level of functions, they can have significant side-effects. This paper describes an approach called PAVER that generates and inserts mitigation patches at the level of program paths, i.e. path-wise vulnerability mitigation patches, in order to reduce their side-effects. PAVER generates a program path graph that includes the paths leading to vulnerabilities and the control dependencies on these paths, then identifies candidate patch locations based on the program path graph. For each candidate patch location, PAVER generates and inserts a mitigation patch, and tests the patched program to assess the side-effect of the patch. It ranks the patches by the extent of their side-effects. We evaluates the prototype of PAVER on real world vulnerabilities and the evaluation shows that our path-wise vulnerability mitigation patches can achieve minimum side-effects.
Published: 2024

19. Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering

Author: Wen, Zhihua, Tian, Zhiliang, Jian, Zexin, Huang, Zhen, Ke, Pei, Gao, Yifu, Huang, Minlie, and Li, Dongsheng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) are widely used for knowledge-seeking yet suffer from hallucinations. The knowledge boundary (KB) of an LLM limits its factual understanding, beyond which it may begin to hallucinate. Investigating the perception of LLMs' KB is crucial for detecting hallucinations and LLMs' reliable generation. Current studies perceive LLMs' KB on questions with a concrete answer (close-ended questions) while paying limited attention to semi-open-ended questions (SoeQ) that correspond to many potential answers. Some researchers achieve it by judging whether the question is answerable or not. However, this paradigm is unsuitable for SoeQ, which are usually partially answerable, containing both answerable and ambiguous (unanswerable) answers. Ambiguous answers are essential for knowledge-seeking, but they may go beyond the KB of LLMs. In this paper, we perceive the LLMs' KB with SoeQ by discovering more ambiguous answers. First, we apply an LLM-based approach to construct SoeQ and obtain answers from a target LLM. Unfortunately, the output probabilities of mainstream black-box LLMs are inaccessible to sample for low-probability ambiguous answers. Therefore, we apply an open-sourced auxiliary model to explore ambiguous answers for the target LLM. We calculate the nearest semantic representation for existing answers to estimate their probabilities, with which we reduce the generation probability of high-probability answers to achieve a more effective generation. Finally, we compare the results from the RAG-based evaluation and LLM self-evaluation to categorize four types of ambiguous answers that are beyond the KB of the target LLM. Following our method, we construct a dataset to perceive the KB for GPT-4. We find that GPT-4 performs poorly on SoeQ and is often unaware of its KB. Besides, our auxiliary model, LLaMA-2-13B, is effective in discovering more ambiguous answers.
Published: 2024

20. Vulnerability Detection in C/C++ Code with Deep Learning

Author: Huang, Zhen and Aumpansub, Amy
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Deep learning has been shown to be a promising tool in detecting software vulnerabilities. In this work, we train neural networks with program slices extracted from the source code of C/C++ programs to detect software vulnerabilities. The program slices capture the syntax and semantic characteristics of vulnerability-related program constructs, including API function call, array usage, pointer usage, and arithmetic expression. To achieve a strong prediction model for both vulnerable code and non-vulnerable code, we compare different types of training data, different optimizers, and different types of neural networks. Our result shows that combining different types of characteristics of source code and using a balanced number of vulnerable program slices and non-vulnerable program slices produce a balanced accuracy in predicting both vulnerable code and non-vulnerable code. Among different neural networks, BGRU with the ADAM optimizer performs the best in detecting software vulnerabilities with an accuracy of 92.49%.
Published: 2024

21. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Author: DeepSeek-AI, Liu, Aixin, Feng, Bei, Wang, Bin, Wang, Bingxuan, Liu, Bo, Zhao, Chenggang, Dengr, Chengqi, Ruan, Chong, Dai, Damai, Guo, Daya, Yang, Dejian, Chen, Deli, Ji, Dongjie, Li, Erhang, Lin, Fangyun, Luo, Fuli, Hao, Guangbo, Chen, Guanting, Li, Guowei, Zhang, H., Xu, Hanwei, Yang, Hao, Zhang, Haowei, Ding, Honghui, Xin, Huajian, Gao, Huazuo, Li, Hui, Qu, Hui, Cai, J. L., Liang, Jian, Guo, Jianzhong, Ni, Jiaqi, Li, Jiashi, Chen, Jin, Yuan, Jingyang, Qiu, Junjie, Song, Junxiao, Dong, Kai, Gao, Kaige, Guan, Kang, Wang, Lean, Zhang, Lecong, Xu, Lei, Xia, Leyi, Zhao, Liang, Zhang, Liyue, Li, Meng, Wang, Miaojun, Zhang, Mingchuan, Zhang, Minghua, Tang, Minghui, Li, Mingming, Tian, Ning, Huang, Panpan, Wang, Peiyi, Zhang, Peng, Zhu, Qihao, Chen, Qinyu, Du, Qiushi, Chen, R. J., Jin, R. L., Ge, Ruiqi, Pan, Ruizhe, Xu, Runxin, Chen, Ruyi, Li, S. S., Lu, Shanghao, Zhou, Shangyan, Chen, Shanhuang, Wu, Shaoqing, Ye, Shengfeng, Ma, Shirong, Wang, Shiyu, Zhou, Shuang, Yu, Shuiping, Zhou, Shunfeng, Zheng, Size, Wang, T., Pei, Tian, Yuan, Tian, Sun, Tianyu, Xiao, W. L., Zeng, Wangding, An, Wei, Liu, Wen, Liang, Wenfeng, Gao, Wenjun, Zhang, Wentao, Li, X. Q., Jin, Xiangyue, Wang, Xianzu, Bi, Xiao, Liu, Xiaodong, Wang, Xiaohan, Shen, Xiaojin, Chen, Xiaokang, Chen, Xiaosha, Nie, Xiaotao, Sun, Xiaowen, Wang, Xiaoxiang, Liu, Xin, Xie, Xin, Yu, Xingkai, Song, Xinnan, Zhou, Xinyi, Yang, Xinyu, Lu, Xuan, Su, Xuecheng, Wu, Y., Li, Y. K., Wei, Y. X., Zhu, Y. X., Xu, Yanhong, Huang, Yanping, Li, Yao, Zhao, Yao, Sun, Yaofeng, Li, Yaohui, Wang, Yaohui, Zheng, Yi, Zhang, Yichao, Xiong, Yiliang, Zhao, Yilong, He, Ying, Tang, Ying, Piao, Yishi, Dong, Yixin, Tan, Yixuan, Liu, Yiyuan, Wang, Yongji, Guo, Yongqiang, Zhu, Yuchen, Wang, Yuduan, Zou, Yuheng, Zha, Yukun, Ma, Yunxian, Yan, Yuting, You, Yuxiang, Liu, Yuxuan, Ren, Z. Z., Ren, Zehui, Sha, Zhangli, Fu, Zhe, Huang, Zhen, Zhang, Zhen, Xie, Zhenda, Hao, Zhewen, Shao, Zhihong, Wen, Zhiniu, Xu, Zhipeng, Zhang, Zhongyu, Li, Zhuoshu, Wang, Zihan, Gu, Zihui, Li, Zilin, and Xie, Ziwei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Published: 2024

22. LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions

Author: Zhao, Xiaoran, Wu, Tianhao, Lai, Yu, Tian, Zhiliang, Huang, Zhen, Liu, Yahui, He, Zejiang, and Li, Dongsheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Controllable text-to-image generation synthesizes visual text and objects in images with certain conditions, which are frequently applied to emoji and poster generation. Visual text rendering and layout-to-image generation tasks have been popular in controllable text-to-image generation. However, each of these tasks typically focuses on single modality generation or rendering, leaving yet-to-be-bridged gaps between the approaches correspondingly designed for each of the tasks. In this paper, we combine text rendering and layout-to-image generation tasks into a single task: layout-controllable text-object synthesis (LTOS) task, aiming at synthesizing images with object and visual text based on predefined object layout and text contents. As compliant datasets are not readily available for our LTOS task, we construct a layout-aware text-object synthesis dataset, containing elaborate well-aligned labels of visual text and object information. Based on the dataset, we propose a layout-controllable text-object adaptive fusion (TOF) framework, which generates images with clear, legible visual text and plausible objects. We construct a visual-text rendering module to synthesize text and employ an object-layout control module to generate objects while integrating the two modules to harmoniously generate and integrate text content and objects in images. To better the image-text integration, we propose a self-adaptive cross-attention fusion module that helps the image generation to attend more to important text information. Within such a fusion module, we use a self-adaptive learnable factor to learn to flexibly control the influence of cross-attention outputs on image generation. Experimental results show that our method outperforms the state-of-the-art in LTOS, text rendering, and layout-to-image tasks, enabling harmonious visual text rendering and object generation.
Published: 2024

23. Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game

Author: Xu, Qianqiao, Tian, Zhiliang, Wu, Hongyan, Huang, Zhen, Song, Yiping, Liu, Feng, and Li, Dongsheng
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: With the enhanced performance of large models on natural language processing tasks, potential moral and ethical issues of large models arise. There exist malicious attackers who induce large models to jailbreak and generate information containing illegal, privacy-invasive information through techniques such as prompt engineering. As a result, large models counter malicious attackers' attacks using techniques such as safety alignment. However, the strong defense mechanism of the large model through rejection replies is easily identified by attackers and used to strengthen attackers' capabilities. In this paper, we propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent. First, we construct a multi-agent framework to simulate attack and defense scenarios, playing different roles to be responsible for attack, disguise, safety evaluation, and disguise evaluation tasks. After that, we design attack and disguise game algorithms to optimize the game strategies of the attacker and the disguiser and use the curriculum learning process to strengthen the capabilities of the agents. The experiments verify that the method in this paper is more effective in strengthening the model's ability to disguise the defense intent compared with other methods. Moreover, our approach can adapt any black-box large model to assist the model in defense and does not suffer from model version iterations., Comment: 13 pages, 2 figures
Published: 2024

24. Isotoosendanin inhibits triple-negative breast cancer metastasis by reducing mitochondrial fission and lamellipodia formation regulated by the Smad2/3-GOT2-MYH9 signaling axis

Author: Zhang, Jing-nan, Zhang, Ze, Huang, Zhen-lin, Guo, Qian, Wu, Ze-qi, Ke, Chuang, Lu, Bin, Wang, Zheng-tao, and Ji, Li-li
Published: 2024
Full Text: View/download PDF

25. KC-GenRe: A Knowledge-constrained Generative Re-ranking Method Based on Large Language Models for Knowledge Graph Completion

Author: Wang, Yilin, Hu, Minghao, Huang, Zhen, Li, Dongsheng, Yang, Dong, and Lu, Xicheng
Subjects: Computer Science - Artificial Intelligence
Abstract: The goal of knowledge graph completion (KGC) is to predict missing facts among entities. Previous methods for KGC re-ranking are mostly built on non-generative language models to obtain the probability of each candidate. Recently, generative large language models (LLMs) have shown outstanding performance on several tasks such as information extraction and dialog systems. Leveraging them for KGC re-ranking is beneficial for leveraging the extensive pre-trained knowledge and powerful generative capabilities. However, it may encounter new problems when accomplishing the task, namely mismatch, misordering and omission. To this end, we introduce KC-GenRe, a knowledge-constrained generative re-ranking method based on LLMs for KGC. To overcome the mismatch issue, we formulate the KGC re-ranking task as a candidate identifier sorting generation problem implemented by generative LLMs. To tackle the misordering issue, we develop a knowledge-guided interactive training method that enhances the identification and ranking of candidates. To address the omission issue, we design a knowledge-augmented constrained inference method that enables contextual prompting and controlled generation, so as to obtain valid rankings. Experimental results show that KG-GenRe achieves state-of-the-art performance on four datasets, with gains of up to 6.7% and 7.7% in the MRR and Hits@1 metric compared to previous methods, and 9.0% and 11.1% compared to that without re-ranking. Extensive analysis demonstrates the effectiveness of components in KG-GenRe., Comment: This paper has been accepted for publication in the proceedings of LREC-COLING 2024
Published: 2024

26. Classical-Quantum correspondence in Lindblad evolution

Author: Galkowski, Jeffrey, Huang, Zhen, and Zworski, Maciej
Subjects: Mathematical Physics, Mathematics - Analysis of PDEs, Quantum Physics
Abstract: We show that for the Lindblad evolution defined using (at most) quadratically growing classical Hamiltonians and (at most) linearly growing classical jump functions (quantized into jump operators assumed to satisfy certain ellipticity conditions and modeling interaction with a larger system), the evolution of a quantum observable remains close to the classical Fokker--Planck evolution in the Hilbert--Schmidt norm for times vastly exceeding the Ehrenfest time (the limit of such agreement with no jump operators). The time scale is the same as in the recent papers by Hern\'andez--Ranard--Riedel but the statement and methods are different. The appendix presents numerical experiments illustrating the classical/quantum correspondence in Lindblad evolution and comparing it to the mathematical results., Comment: Main article by Jeffrey Galkowski and Maciej Zworski with an appendix by Zhen Huang and Maciej Zworski -- new appendix with numerical experiments and a new section providing estimates for the Hilbert--Schmidt norm under Lindblad evolution
Published: 2024

27. Automating psychological hypothesis generation with AI: when large language models meet causal graph

Author: Tong, Song, Mao, Kai, Huang, Zhen, Zhao, Yukun, and Peng, Kaiping
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using a LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on `well-being', then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of a LLM and causal graphs mirrored the expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p=0.007 and t(59) = 4.32, p<0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining LLM with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new enriched paradigm for data-driven hypothesis generation in psychological research.
Published: 2024
Full Text: View/download PDF

28. POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation

Author: Pan, Shilong, Tian, Zhiliang, Ding, Liang, Huang, Zhen, Wen, Zhihua, and Li, Dongsheng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Low-resource languages (LRLs) face challenges in supervised neural machine translation due to limited parallel data, prompting research into unsupervised methods. Unsupervised neural machine translation (UNMT) methods, including back-translation, transfer learning, and pivot-based translation, offer practical solutions for LRL translation, but they are hindered by issues like synthetic data noise, language bias, and error propagation, which can potentially be mitigated by Large Language Models (LLMs). LLMs have advanced NMT with in-context learning (ICL) and supervised fine-tuning methods, but insufficient training data results in poor performance in LRLs. We argue that LLMs can mitigate the linguistic noise with auxiliary languages to improve translations in LRLs. In this paper, we propose Probability-driven Meta-graph Prompter (POMP), a novel approach employing a dynamic, sampling-based graph of multiple auxiliary languages to enhance LLMs' translation capabilities for LRLs. POMP involves constructing a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs to mitigate the linguistic noise and improve translations during training. We use the BLEURT metric to evaluate the translations and back-propagate rewards, estimated by scores, to update the probabilities of auxiliary languages in the paths. Our experiments show significant improvements in the translation quality of three LRLs, demonstrating the effectiveness of our approach.
Published: 2024

29. Topological transmission in Suzuki phase sonic crystals

Author: Huang, Zhen, Cervera, Francisco, Wu, Jiu Hui, Ibarias, Martin, Liu, Chongrui, Garcia-Chocano, Victor M., Ma, Fuyin, and Sanchez-Dehesa, Jose
Subjects: Physics - Applied Physics, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: This work reports topological extraordinary properties of sound transmission through topological states in sonic crystals denominated Suzuki phase, consisting of a rectangular lattice of vacancies created in a triangular lattice. These low-symmetry crystals exhibit unique properties due to the embedded lattice of vacancies. A generalized folding method explains the band structure and the quasi-type-II Dirac point in the Suzuki phase, which is related to the underlying triangular lattice. In analogy to the acoustic valley Hall effect, the Suzuki phase contains three types of topological edge states on the four possible interfaces separating two Suzuki phase crystals with distinct topological phases. The edge states have defined symmetries with inherent directionality, which affect the topological sound transmission and are different from chirality, valley vorticity or helicity. Particularly, the existence of topological deaf bands is here reported. The propagation of topological eigenmodes on the same interface is also different, which is quantified using the acoustic Shannon entropy, making the topological transport dependent on the frequency of the edge states. Based on the abundant topological edge states of Suzuki phase crystals, a multifunctional device with acoustic diodes, multi-channel transmission, and selective acoustic transmission can be designed. Numerical simulations and measurements demonstrate the topological transmission. Our work extends the research platform of acoustic topological states to lattices with low symmetry, which opens new avenues for enriching topological states with broad engineering applications., Comment: 29 pages, 9 figures, and 17 pages of supplementary materials
Published: 2024

30. Realizing topological edge states in graphene-like elastic metamaterials

Author: Huang, Zhen, Gao, Penglin, Ramirez, Federico B., Garcia-Tiscar, Jorge, Broatch, Alberto, Wu, Jiu Hui, Ma, Fuyin, and Sanchez-Dehesa, Jose
Subjects: Physics - Applied Physics, Physics - Classical Physics
Abstract: The study of topological states in electronic structures, which allows robust transport properties against impurities and defects, has been recently extended to the realm of elasticity. This work shows that nontrivial topological flexural edge states located on the free boundary of the elastic graphene-like metamaterial can be realized without breaking the time reversal, mirror, or inversion symmetry of the system. Numerical calculations and experimental studies demonstrate the robust transport of flexural waves along the boundaries of the designed structure. The topological edge states on the free boundary are not limited by the size of the finite structure, which can reduce the scale of the topological state system. In addition, unlike the edge states localized on the free boundary in graphene where the group velocity is zero, the edge states on the elastic metamaterial plate have propagation states with non-zero group velocity. There is a frequency range for the edge states, and we introduce the concept of Shannon entropy for elastic waves and use it to assess the frequency range of the edge states in graphene-like elastic metamaterials. This work represents a relevant advance in the study of elastic wave topological states, providing a theoretical basis for engineering applications such as vibration reduction and vibration isolation of mechanical structures., Comment: 22 pages, 8 figures, and 12 pages of supplementary information
Published: 2024

31. Study on the High-Temperature Mechanical Behavior and Parameters of Joints: a Case of a Prefabricated Frame Tunnel

Author: Huang, Zhen, Ye, Zhangqian, Hu, Zhaojian, Zhang, Jiawei, and Cao, Chen
Published: 2024
Full Text: View/download PDF

32. Conformer-Based Speech Recognition On Extreme Edge-Computing Devices

Author: Xu, Mingbin, Jin, Alex, Wang, Sicheng, Su, Mu, Ng, Tim, Mason, Henry, Han, Shiyi, Lei, Zhihong, Deng, Yaqiao, Huang, Zhen, and Krishnamoorthy, Mahesh
Subjects: Computer Science - Machine Learning, Computer Science - Performance
Abstract: With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other smart home automation devices. In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to fit an advanced Conformer based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve over 5.26 times faster than realtime (0.19 RTF) speech recognition on smart wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.
Published: 2023

33. Enhanced Cross Z-Complementary Set and Its Application in Generalized Spatial Modulation

Author: Huang, Zhen-Ming, Pai, Cheng-Yu, Liu, Zilong, and Chen, Chao-Yu
Subjects: Computer Science - Information Theory
Abstract: Generalized spatial modulation (GSM) is a novel multiple-antenna technique offering flexibility among spectral efficiency, energy efficiency, and the cost of RF chains. In this paper, a novel class of sequence sets, called enhanced cross Zcomplementary set (E-CZCS), is proposed for efficient training sequence design in broadband GSM systems. Specifically, an E-CZCS consists of multiple CZCSs possessing front-end and tail-end zero-correlation zones (ZCZs), whereby any two distinct CZCSs have a tail-end ZCZ when a novel type of cross-channel aperiodic correlation sums is considered. The theoretical upper bound on the ZCZ width is first derived, upon which optimal E-CZCSs with flexible parameters are constructed. For optimal channel estimation over frequency-selective channels, we introduce and evaluate a novel GSM training framework employing the proposed E-CZCSs.
Published: 2023

34. FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and Design

Author: Huang, Zhen, Li, Yihao, Pei, Dong, Zhou, Jiapeng, Ning, Xuliang, Han, Jianlin, Han, Xiaoguang, and Chen, Xuejun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-driven fashion synthesis and design is an extremely valuable part of artificial intelligence generative content(AIGC), which has the potential to propel a tremendous revolution in the traditional fashion industry. To advance the research on text-driven fashion synthesis and design, we introduce a new dataset comprising a million high-resolution fashion images with rich structured textual(FIRST) descriptions. In the FIRST, there is a wide range of attire categories and each image-paired textual description is organized at multiple hierarchical levels. Experiments on prevalent generative models trained over FISRT show the necessity of FIRST. We invite the community to further develop more intelligent fashion synthesis and design systems that make fashion design more creative and imaginative based on our dataset. The dataset will be released soon., Comment: 11 pages, 8 figures
Published: 2023

35. Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Author: Lei, Zhihong, Pusateri, Ernest, Han, Shiyi, Liu, Leo, Xu, Mingbin, Ng, Tim, Travadi, Ruchir, Zhang, Youyuan, Hannemann, Mirko, Siu, Man-Hung, and Huang, Zhen
Subjects: Computer Science - Machine Learning
Abstract: Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. Building on previous work, we present a novel method for generating additional subword tokenizations for personal entities from their pronunciations. We show that using this technique in combination with two established techniques, contextual biasing and wordpiece prior normalization, we are able to achieve personal named entity accuracy on par with a competitive hybrid system.
Published: 2023

36. Acoustic Model Fusion for End-to-end Speech Recognition

Author: Lei, Zhihong, Xu, Mingbin, Han, Shiyi, Liu, Leo, Huang, Zhen, Ng, Tim, Zhang, Yuanyuan, Pusateri, Ernest, Hannemann, Mirko, Deng, Yaqiao, and Siu, Man-Hung
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.
Published: 2023

37. GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence

Author: Wen, Zhihua, Tian, Zhiliang, Wu, Wei, Yang, Yuxin, Shi, Yanqi, Huang, Zhen, and Li, Dongsheng
Subjects: Computer Science - Computation and Language
Abstract: Conditional story generation is significant in human-machine interaction, particularly in producing stories with complex plots. While Large language models (LLMs) perform well on multiple NLP tasks, including story generation, it is challenging to generate stories with both complex and creative plots. Existing methods often rely on detailed prompts to guide LLMs to meet target conditions, which inadvertently restrict the creative potential of the generated stories. We argue that leveraging information from exemplary human-written stories facilitates generating more diverse plotlines. Delving deeper into story details helps build complex and credible plots. In this paper, we propose a retrieval-au\textbf{G}mented sto\textbf{R}y generation framework with a f\textbf{O}rest of e\textbf{V}id\textbf{E}nce (GROVE) to enhance stories' complexity. We build a retrieval repository for target conditions to produce few-shot examples to prompt LLMs. Additionally, we design an ``asking-why'' prompting scheme that extracts a forest of evidence, providing compensation for the ambiguities that may occur in the generated story. This iterative process uncovers underlying story backgrounds. Finally, we select the most fitting chains of evidence from the evidence forest and integrate them into the generated story, thereby enhancing the narrative's complexity and credibility. Experimental results and numerous examples verify the effectiveness of our method., Comment: Findings of EMNLP 2023
Published: 2023

38. Competition of electronic correlation and reconstruction in La1-xSrxTiO3/SrTiO3 heterostructures

Author: Wang, Xueyan, Sun, Lin, Ye, Chen, Huang, Zhen, Han, Kun, Huang, Ke, Yang, Allen Jian, Zeng, Shengwei, Loh, Xian Jun, Zhu, Qiang, Venkatesan, T., Ariando, Ariando, and Wang, X. Renshaw
Subjects: Condensed Matter - Strongly Correlated Electrons
Abstract: Electronic correlation and reconstruction are two important factors that play a critical role in shaping the magnetic and electronic properties of correlated low-dimensional systems. Here, we report a competition between the electronic correlation and structural reconstruction in La1-xSrxTiO3/SrTiO3 heterostructures by modulating material polarity and interfacial strain, respectively. The heterostructures exhibit a critical thickness (tc) at which a metal-to-insulator transition (MIT) abruptly occurs at certain thickness, accompanied by the coexistence of two- and three-dimensional (2D and 3D) carriers. Intriguingly, the tc exhibits a V-shaped dependence on the doping concentration of Sr, with the smallest tc value at x = 0.5. We attribute this V-shaped dependence to the competition between the electronic reconstruction (modulated by the polarity) and the electronic correlation (modulated by strain), which are borne out by the experimental results, including strain-dependent electronic properties and the evolution of 2D and 3D carriers. Our findings underscore the significance of the interplay between electronic reconstruction and correlation in the realization and utilization of emergent electronic functionalities in low-dimensional correlated systems.
Published: 2023

39. Functional annotation of the Hippo pathway somatic mutations in human cancers

Author: Han, Han, Huang, Zhen, Xu, Congsheng, Seo, Gayoung, An, Jeongmin, Yang, Bing, Liu, Yuhan, Lan, Tian, Yan, Jiachen, Ren, Shanshan, Xu, Yue, Xiao, Di, Yan, Jonathan K., Ahn, Claire, Fishman, Dmitry A., Meng, Zhipeng, Guan, Kun-Liang, Qi, Ruxi, Luo, Ray, and Wang, Wenqi
Published: 2024
Full Text: View/download PDF

40. A 18F-FDG PET/CT-based deep learning-radiomics-clinical model for prediction of cervical lymph node metastasis in esophageal squamous cell carcinoma

Author: Yuan, Ping, Huang, Zhen-Hao, Yang, Yun-Hai, Bao, Fei-Chao, Sun, Ke, Chao, Fang-Fang, Liu, Ting-Ting, Zhang, Jing-Jing, Xu, Jin-Ming, Li, Xiang-Nan, Li, Feng, Ma, Tao, Li, Hao, Li, Zi-Hao, Zhang, Shan-Feng, Hu, Jian, and Qi, Yu
Published: 2024
Full Text: View/download PDF

41. Challenges and strategies toward oncolytic virotherapy for leptomeningeal metastasis

Author: Zhao, Jia-Li, Lin, Bi-Lin, Luo, Chen, Yi, Yan-ling, Huang, Peng, Chen, Yu, Zhao, Sha, Huang, Zhen-Jie, Ma, Xin-Yi, and Huang, Long
Published: 2024
Full Text: View/download PDF

42. Chronic thyrotoxic myopathy development is associated with thyroid hormone sensitivity index, predicted by lower-limb fatigue and the squat-up test

Author: Fu, Shi-en, Liang, Xing-huan, Huang, Li-li, Xian, Jing, Wu, Xi-zhen, Pan, Jie, Chen, Xue-lan, Kuang, Ya-qi, Wu, Chun-jiao, Li, Qiao-li, Liu, Xiao-fan, Huang, Zi-en, Wei, Ting-ting, Qin, Ying-fen, Huang, Zhen-xing, Yang, Hai-yan, Lan, Shan-shan, Lu, De-cheng, and Luo, Zuo-jie
Published: 2024
Full Text: View/download PDF

43. Survival and functional outcomes after hemiarthroplasty in children with proximal tibial osteosarcoma

Author: Li, Yuan, Xu, Hairong, Yang, Yongkun, Shan, Huachao, Huang, Zhen, Ma, Ke, Liu, Weifeng, and Niu, Xiaohui
Published: 2024
Full Text: View/download PDF

44. Experimental study on bioaerosols behavior and purification measures in a subway compartment

Author: Xu, Renze, Wu, Fan, Shen, Lian, Fan, Zhiqiang, Yu, Jianci, and Huang, Zhen
Published: 2024
Full Text: View/download PDF

45. Single-cell exome sequencing reveals polyclonal seeding and TRPS1 mutations in colon cancer metastasis

Author: Cai, Jianqiang, Zhang, Weilong, Lu, Yalan, Liu, Wenjie, Zhou, Haitao, Liu, Mei, Bi, Xinyu, Liu, Jianmei, Chen, Jinghua, Yin, Yanjiang, Deng, Yiqiao, Luo, Zhiwen, Yang, Yi, Chen, Qichen, Chen, Xiao, Xu, Zheng, Zhang, Yueyang, Wu, Chaoling, Long, Qizhao, Huang, Chunyuan, Yan, Changjian, Liu, Yan, Guo, Lei, Li, Weihua, Yuan, Pei, Jiao, Yucheng, Song, Wei, Wang, Xiaobing, Huang, Zhen, Ying, Jianming, and Zhao, Hong
Published: 2024
Full Text: View/download PDF

46. Utilizing full-spectrum sunlight for ammonia decomposition to hydrogen over GaN nanowires-supported Ru nanoparticles on silicon

Author: Li, Jinglin, Sheng, Bowen, Chen, Yiqing, Yang, Jiajia, Wang, Ping, Li, Yixin, Yu, Tianqi, Pan, Hu, Qiu, Liang, Li, Ying, Song, Jun, Zhu, Lei, Wang, Xinqiang, Huang, Zhen, and Zhou, Baowen
Published: 2024
Full Text: View/download PDF

47. PIEZO1 targeting in macrophages boosts phagocytic activity and foam cell apoptosis in atherosclerosis

Author: Pourteymour, Shirin, Fan, Jingxue, Majhi, Rakesh Kumar, Guo, Shuyuan, Sun, Xin, Huang, Zhen, Liu, Ying, Winter, Hanna, Bäcklund, Alexandra, Skenteris, Nikolaos-Taxiarchis, Chernogubova, Ekaterina, Werngren, Olivera, Li, Zhaolong, Skogsberg, Josefin, Li, Yuhuang, Matic, Ljubica, Hedin, Ulf, Maegdefessel, Lars, Ehrenborg, Ewa, Tian, Ye, and Jin, Hong
Published: 2024
Full Text: View/download PDF

48. Development and validation of an artificial intelligence model for predicting de novo distant bone metastasis in breast cancer: a dual-center study

Author: Zhang, Wen-hai, Tan, Yang, Huang, Zhen, Tan, Qi-xing, Zhang, Yue-mei, and Wei, Chang-yuan
Published: 2024
Full Text: View/download PDF

49. AI models predicting breast cancer distant metastasis using LightGBM with clinical blood markers and ultrasound maximum diameter

Author: Tan, Yang, Zhang, Wen-hai, Huang, Zhen, Tan, Qi-xing, Zhang, Yue-mei, Wei, Chang-yuan, and Feng, Zhen-Bo
Published: 2024
Full Text: View/download PDF

50. The effects of temperament type on infusion extravasation in newborns

Author: Huang, Fang, Huang, Li-xuan, Huang, Zhen-peng, Wei, Jiao-jiao, and Lu, Chang-jiang
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

4,762 results on '"Huang, Zhen"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources