Author: "Wang, Shuohuan" - Searchworks@Jio Institute Digital Library Search Results

50 results on '"Wang, Shuohuan"'

1. MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

Author: Chai, Yekun, Sun, Haoran, Fang, Huang, Wang, Shuohuan, Sun, Yu, and Wu, Hua
Subjects: Computer Science - Computation and Language
Abstract: Reinforcement learning from human feedback (RLHF) has demonstrated effectiveness in aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it challenging for the model to discern which actions contributed to successful outcomes. This hinders learning efficiency and slows convergence. In this paper, we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions -- sequences of tokens or higher-level language constructs -- into the learning process. By operating at this higher level of abstraction, our approach reduces the temporal distance between actions and rewards, facilitating faster and more accurate credit assignment. This results in more stable policy gradient estimates and enhances learning efficiency within each episode, all without increasing computational complexity during training or inference. We validate our approach through extensive experiments across various model sizes and tasks, including text summarization, dialogue generation, question answering, and program synthesis. Our method achieves substantial performance improvements over standard RLHF, with performance gains of up to 30% in text summarization and code generation, 18% in dialogue, and 8% in question answering tasks. Notably, our approach reaches parity with vanilla RLHF 1.7x to 2x faster in terms of training time and continues to outperform it with further training. We will make our code and data publicly available at https://github.com/ernie-research/MA-RLHF .
Published: 2024
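
Note: The macro-action idea in the abstract above can be illustrated with a small sketch. It assumes the simplest possible grouping rule, fixed-length runs of `n` tokens, and it averages per-token values inside each group. The function names `split_into_macro_actions` and `macro_level_values` are hypothetical and are not taken from the MA-RLHF code base; the paper's actual termination rules and value estimation may differ.

```python
from typing import List


def split_into_macro_actions(tokens: List[str], n: int) -> List[List[str]]:
    """Group consecutive tokens into macro actions of at most n tokens (assumed rule)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tokens[i:i + n] for i in range(0, len(tokens), n)]


def macro_level_values(token_values: List[float], n: int) -> List[float]:
    """Average the per-token values inside each macro action, one value per group."""
    if n < 1:
        raise ValueError("n must be at least 1")
    chunks = [token_values[i:i + n] for i in range(0, len(token_values), n)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


tokens = ["The", "cat", "sat", "on", "the", "mat", "."]
macros = split_into_macro_actions(tokens, n=3)
# Fewer decision steps means the final reward is fewer steps away from each action.
print(len(tokens), "token-level steps ->", len(macros), "macro-level steps")
```

With `n=3`, the seven tokens collapse into three decision steps, which is the shortening of the action-to-reward distance that the abstract describes.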

2. Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging

Author: Hui, Tingfeng, Zhang, Zhenyu, Wang, Shuohuan, Sun, Yu, Wu, Hua, and Su, Sen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Mixture-of-Experts (MoE) shines brightly in large language models (LLMs) and demonstrates outstanding performance in plentiful natural language processing tasks. However, existing methods transforming LLMs from dense to MoE face significant data requirements and typically rely on large-scale post-training. In this paper, we propose Upcycling Instruction Tuning (UpIT), a data-efficient approach for tuning a dense pre-trained model into a MoE instruction model. Specifically, we first point out that intermediate checkpoints during instruction tuning of the dense model are naturally suitable for specialized experts, and then propose an expert expansion stage to flexibly achieve models with flexible numbers of experts, where genetic algorithm and parameter merging are introduced to ensure sufficient diversity of new extended experts. To ensure that each specialized expert in the MoE model works as expected, we select a small amount of seed data that each expert excels to pre-optimize the router. Extensive experiments with various data scales and upcycling settings demonstrate the outstanding performance and data efficiency of UpIT, as well as stable improvement in expert or data scaling. Further analysis reveals the importance of ensuring expert diversity in upcycling., Comment: work in progress
Published: 2024

3. NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

Author: Chen, Yilong, Wang, Guoxia, Shang, Junyuan, Cui, Shiyao, Zhang, Zhenyu, Liu, Tingwen, Wang, Shuohuan, Sun, Yu, Yu, Dianhai, and Wu, Hua
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have ignited an innovative surge of AI applications, marking a new era of exciting possibilities equipped with extended context windows. However, hosting these models is cost-prohibitive mainly due to the extensive memory consumption of KV Cache involving long-context modeling. Despite several works proposing to evict unnecessary tokens from the KV Cache, most of them rely on the biased local statistics of accumulated attention scores and report performance using unconvincing metric like perplexity on inadequate short-text evaluation. In this paper, we propose NACL, a general framework for long-context KV cache eviction that achieves more optimal and efficient eviction in a single operation during the encoding phase. Due to NACL's efficiency, we combine more accurate attention score statistics in PROXY TOKENS EVICTION with the diversified random eviction strategy of RANDOM EVICTION, aiming to alleviate the issue of attention bias and enhance the robustness in maintaining pivotal tokens for long-context modeling tasks. Notably, our method significantly improves the performance on short- and long-text tasks by 80% and 76% respectively, reducing KV Cache by up to 50% with over 95% performance maintenance. The code is available at https://github.com/PaddlePaddle/Research/tree/master/NLP/ACL2024-NACL., Comment: Accepted by ACL 2024 (main conference, long paper)
Published: 2024

4. DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Author: Chen, Yilong, Zhang, Linhao, Shang, Junyuan, Zhang, Zhenyu, Liu, Tingwen, Wang, Shuohuan, and Sun, Yu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate substantial continued pre-training costs to restore performance. Based on the analysis of attention redundancy, we design a Decoupled-Head Attention (DHA) mechanism. DHA adaptively configures group sharing for key heads and value heads across various layers, achieving a better balance between performance and efficiency. Inspired by the observation of clustering similar heads, we propose to progressively transform the MHA checkpoint into the DHA model through linear fusion of similar head parameters step by step, retaining the parametric knowledge of the MHA checkpoint. We construct DHA models by transforming various scales of MHA checkpoints given target head budgets. Our experiments show that DHA remarkably requires a mere 0.25% of the original model's pre-training budgets to achieve 97.6% of performance while saving 75% of KV cache. Compared to Group-Query Attention (GQA), DHA achieves a 5x training acceleration, a maximum of 13.93% performance improvement under 0.01% pre-training budget, and 4% relative improvement under 0.05% pre-training budget., Comment: 10 pages, 9 figures, 3 tables
Published: 2024

5. HFT: Half Fine-Tuning for Large Language Models

Author: Hui, Tingfeng, Zhang, Zhenyu, Wang, Shuohuan, Xu, Weiran, Sun, Yu, and Wu, Hua
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) with one or more fine-tuning phases have become a necessary step to unlock various capabilities, enabling LLMs to follow natural language instructions or align with human preferences. However, it carries the risk of catastrophic forgetting during sequential training: the parametric knowledge or the ability learned in previous stages may be overwhelmed by incoming training data. In this paper, we find that by regularly resetting partial parameters, LLMs can restore some of the original knowledge. Inspired by this, we introduce Half Fine-Tuning (HFT) for LLMs, as a substitute for full fine-tuning (FFT), to mitigate the forgetting issues, where half of the parameters are selected to learn new tasks while the other half are frozen to retain previous knowledge. We provide a feasibility analysis from the perspective of optimization and interpret the parameter selection operation as a regularization term. Without changing the model architecture, HFT could be seamlessly integrated into existing fine-tuning frameworks. Extensive experiments and analysis on supervised fine-tuning, direct preference optimization, and continual learning consistently demonstrate the effectiveness, robustness, and efficiency of HFT. Compared with FFT, HFT not only significantly alleviates the forgetting problem, but also achieves the best performance in a series of downstream benchmarks, with an approximately 30% reduction in training time., Comment: Work in progress
Published: 2024
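
Note: The "train half, freeze half" idea in the abstract above can be illustrated with a toy sketch. It assumes uniform random selection over named scalar parameters and a plain gradient-descent update. The names `select_trainable_half` and `masked_sgd_step` are hypothetical, and the selection granularity used in the HFT paper (for example, per layer or per block) may differ.

```python
import random
from typing import Dict, List, Set


def select_trainable_half(param_names: List[str], seed: int) -> Set[str]:
    """Pick half of the parameter names at random (assumed selection rule)."""
    rng = random.Random(seed)
    return set(rng.sample(param_names, len(param_names) // 2))


def masked_sgd_step(
    params: Dict[str, float],
    grads: Dict[str, float],
    trainable: Set[str],
    lr: float,
) -> Dict[str, float]:
    """Apply a gradient step to trainable parameters; frozen ones are returned unchanged."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }


params = {"w0": 1.0, "w1": 1.0, "w2": 1.0, "w3": 1.0}
grads = {name: 0.5 for name in params}
trainable = select_trainable_half(list(params), seed=0)
updated = masked_sgd_step(params, grads, trainable, lr=0.1)
print(sorted(trainable), updated)
```

Frozen entries keep their previous values exactly, which is the mechanism the abstract credits with preserving earlier knowledge.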

6. Autoregressive Pre-Training on Pixels and Texts

Author: Chai, Yekun, Liu, Qingyi, Xiao, Jingwu, Wang, Shuohuan, Sun, Yu, and Wu, Hua
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: The integration of visual and textual information represents a promising direction in the advancement of language models. In this paper, we explore the dual modality of language--both visual and textual--within an autoregressive framework, pre-trained on both document images and texts. Our method employs a multimodal training strategy, utilizing visual data through next patch prediction with a regression head and/or textual data through next token prediction with a classification head. We focus on understanding the interaction between these two modalities and their combined impact on model performance. Our extensive evaluation across a wide range of benchmarks shows that incorporating both visual and textual data significantly improves the performance of pixel-based language models. Remarkably, we find that a unidirectional pixel-based model trained solely on visual data can achieve comparable results to state-of-the-art bidirectional models on several language understanding tasks. This work uncovers the untapped potential of integrating visual and textual modalities for more effective language modeling. We release our code, data, and model checkpoints at https://github.com/ernie-research/pixelgpt., Comment: EMNLP 2024
Published: 2024

7. On Training Data Influence of GPT Models

Author: Chai, Yekun, Liu, Qingyi, Wang, Shuohuan, Sun, Yu, Peng, Qiwei, and Wu, Hua
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Amidst the rapid advancements in generative language models, the investigation of how training data shapes the performance of GPT models is still emerging. This paper presents GPTfluence, a novel approach that leverages a featurized simulation to assess the impact of training examples on the training dynamics of GPT models. Our approach not only traces the influence of individual training instances on performance trajectories, such as loss and other key metrics, on targeted test points but also enables a comprehensive comparison with existing methods across various training scenarios in GPT models, ranging from 14 million to 2.8 billion parameters, across a range of downstream tasks. Contrary to earlier methods that struggle with generalization to new data, GPTfluence introduces a parameterized simulation of training dynamics, demonstrating robust generalization capabilities to unseen training data. This adaptability is evident across both fine-tuning and instruction-tuning scenarios, spanning tasks in natural language understanding and generation. We make our code and data publicly available at https://github.com/ernie-research/gptfluence., Comment: EMNLP 2024
Published: 2024

8. Tool-Augmented Reward Modeling

Author: Li, Lei, Chai, Yekun, Wang, Shuohuan, Sun, Yu, Tian, Hao, Zhang, Ningyu, and Wu, Hua
Subjects: Computer Science - Computation and Language
Abstract: Reward modeling (a.k.a., preference modeling) is instrumental for aligning large language models with human preferences, particularly within the context of reinforcement learning from human feedback (RLHF). While conventional reward models (RMs) have exhibited remarkable scalability, they oft struggle with fundamental functionality such as arithmetic computation, code execution, and factual lookup. In this paper, we propose a tool-augmented preference modeling approach, named Themis, to address these limitations by empowering RMs with access to external environments, including calculators and search engines. This approach not only fosters synergy between tool utilization and reward grading but also enhances interpretive capacity and scoring reliability. Our study delves into the integration of external tools into RMs, enabling them to interact with diverse external sources and construct task-specific tool engagement and reasoning traces in an autoregressive manner. We validate our approach across a wide range of domains, incorporating seven distinct external tools. Our experimental results demonstrate a noteworthy overall improvement of 17.7% across eight tasks in preference ranking. Furthermore, our approach outperforms Gopher 280B by 7.3% on TruthfulQA task in zero-shot evaluation. In human evaluations, RLHF trained with Themis attains an average win rate of 32% when compared to baselines across four distinct tasks. Additionally, we provide a comprehensive collection of tool-related RM datasets, incorporating data from seven distinct tool APIs, totaling 15,000 instances. We have made the code, data, and model checkpoints publicly available to facilitate and inspire further research advancements (https://github.com/ernie-research/Tool-Augmented-Reward-Model)., Comment: ICLR 2024 Spotlight
Published: 2023

9. ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

Author: Zhu, Pengfei, Pang, Chao, Chai, Yekun, Li, Lei, Wang, Shuohuan, Sun, Yu, Tian, Hao, and Wu, Hua
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In recent years, the burgeoning interest in diffusion models has led to significant advances in image and speech generation. Nevertheless, the direct synthesis of music waveforms from unrestricted textual prompts remains a relatively underexplored domain. In response to this lacuna, this paper introduces a pioneering contribution in the form of a text-to-waveform music generation model, underpinned by the utilization of diffusion models. Our methodology hinges on the innovative incorporation of free-form textual prompts as conditional factors to guide the waveform generation process within the diffusion model framework. Addressing the challenge of limited text-music parallel data, we undertake the creation of a dataset by harnessing web resources, a task facilitated by weak supervision techniques. Furthermore, a rigorous empirical inquiry is undertaken to contrast the efficacy of two distinct prompt formats for text conditioning, namely, music tags and unconstrained textual descriptions. The outcomes of this comparative analysis affirm the superior performance of our proposed model in terms of enhancing text-music relevance. Finally, our work culminates in a demonstrative exhibition of the excellent capabilities of our model in text-to-music generation. We further demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance., Comment: Accepted by AACL demo 2023
Published: 2023

10. ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Author: Chai, Yekun, Wang, Shuohuan, Pang, Chao, Sun, Yu, Tian, Hao, and Wu, Hua
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Programming Languages, Computer Science - Software Engineering
Abstract: Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa, erecting huge barriers to communication and working efficiency. Recent studies have demonstrated the effectiveness of generative pre-training in computer programs, yet they are always English-centric. In this work, we step towards bridging the gap between multilingual NLs and multilingual PLs for large language models (LLMs). We release ERNIE-Code, a unified pre-trained language model for 116 NLs and 6 PLs. We employ two methods for universal cross-lingual pre-training: span-corruption language modeling that learns patterns from monolingual NL or PL; and pivot-based translation language modeling that relies on parallel data of many NLs and PLs. Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of end tasks of code intelligence, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation. We further show its advantage of zero-shot prompting on multilingual code summarization and text-to-text translation. We release our code and pre-trained checkpoints., Comment: Accepted at ACL 2023 (Findings)
Published: 2022

11. X-PuDu at SemEval-2022 Task 6: Multilingual Learning for English and Arabic Sarcasm Detection

Author: Han, Yaqian, Chai, Yekun, Wang, Shuohuan, Sun, Yu, Huang, Hongyi, Chen, Guanghao, Xu, Yitong, and Yang, Yang
Subjects: Computer Science - Computation and Language
Abstract: Detecting sarcasm and verbal irony from people's subjective statements is crucial to understanding their intended meanings and real sentiments and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various settings of natural language understanding. Our solution finetunes pre-trained language models, such as ERNIE-M and DeBERTa, under the multilingual settings to recognize the irony from Arabic and English texts. Our system ranked second out of 43, and ninth out of 32 in Task A: one-sentence detection in English and Arabic; fifth out of 22 in Task B: binary multi-label classification in English; first out of 16, and fifth out of 13 in Task C: sentence-pair detection in English and Arabic., Comment: SemEval-2022 Task 6
Published: 2022

12. X-PuDu at SemEval-2022 Task 7: A Replaced Token Detection Task Pre-trained Model with Pattern-aware Ensembling for Identifying Plausible Clarifications

Author: Shang, Junyuan, Wang, Shuohuan, Sun, Yu, Yu, Yanjun, Zhou, Yue, Xiang, Li, and Yang, Guixiu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper describes our winning system on SemEval 2022 Task 7: Identifying Plausible Clarifications of Implicit and Underspecified Phrases in Instructional Texts. A replaced token detection pre-trained model is utilized with slightly different task-specific heads for SubTask-A: Multi-class Classification and SubTask-B: Ranking. Incorporating a pattern-aware ensemble method, our system achieves a 68.90% accuracy score and a 0.8070 Spearman's rank correlation score, surpassing the second-place system by a large margin of 2.7 and 2.2 percentage points for SubTask-A and SubTask-B, respectively. Our approach is simple and easy to implement, and we conducted ablation studies and qualitative and quantitative analyses for the working strategies used in our system., Comment: Accepted at the 16th International Workshop on Semantic Evaluation (SemEval-2022), NAACL
Published: 2022

13. ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation

Author: Shan, Bin, Han, Yaqian, Yin, Weichong, Wang, Shuohuan, Sun, Yu, Tian, Hao, Wu, Hua, and Wang, Haifeng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance. However, these models focus only on understanding tasks utilizing encoder-only architecture. In this paper, we propose ERNIE-UniX2, a unified cross-lingual cross-modal pre-training framework for both generation and understanding tasks. ERNIE-UniX2 integrates multiple pre-training paradigms (e.g., contrastive learning and language modeling) based on encoder-decoder architecture and attempts to learn a better joint representation across languages and modalities. Furthermore, ERNIE-UniX2 can be seamlessly fine-tuned for varieties of generation and understanding downstream tasks. Pre-trained on both multilingual text-only and image-text datasets, ERNIE-UniX2 achieves SOTA results on various cross-lingual cross-modal generation and understanding tasks such as multimodal machine translation and multilingual visual question answering., Comment: 13 pages, 2 figures
Published: 2022

14. ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

Author: Fan, Xiaoran, Pang, Chao, Yuan, Tian, Bai, He, Zheng, Renjie, Zhu, Pengfei, Wang, Shuohuan, Chen, Junkun, Chen, Zeyu, Huang, Liang, Sun, Yu, and Wu, Hua
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Speech representation learning has improved both speech understanding and speech synthesis tasks for single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both the training and the inference without any finetuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods.
Published: 2022

15. Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards

Author: Chai, Yekun, Wang, Shuohuan, Sun, Yu, Tian, Hao, Wu, Hua, and Wang, Haifeng
Subjects: Computer Science - Computation and Language
Abstract: Derivative-free prompt learning has emerged as a lightweight alternative to prompt tuning, which only requires model inference to optimize the prompts. However, existing work did not take full advantage of the over-parameterized characteristics of large pre-trained language models (PLMs). In this paper, we propose Clip-Tuning, a simple yet effective method that adopts diverse frozen "thinned" networks of PLMs to obtain a mixture of rewards and thus advance the derivative-free prompt learning. The thinned networks consist of all the hidden units that survive a stationary dropout strategy, whose inference predictions reflect an ensemble of partial views over prompted training samples. Our method outperforms previous gradient-free prompt learning methods and achieves parity with gradient-based counterparts on seven language understanding benchmarks under few-shot settings., Comment: EMNLP 2022 (Findings)
Published: 2022

16. Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters

Author: Xiang, Yang, Wu, Zhihua, Gong, Weibao, Ding, Siyu, Mo, Xianjie, Liu, Yuang, Wang, Shuohuan, Liu, Peng, Hou, Yongshuai, Li, Long, Wang, Bin, Shi, Shaohuai, Han, Yaqian, Yu, Yue, Li, Ge, Sun, Yu, Ma, Yanjun, and Yu, Dianhai
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The ever-growing model size and scale of compute have attracted increasing interests in training deep learning models over multiple nodes. However, when it comes to training on cloud clusters, especially across remote clusters, huge challenges are faced. In this work, we introduce a general framework, Nebula-I, for collaboratively training deep learning models over remote heterogeneous clusters, the connections between which are low-bandwidth wide area networks (WANs). We took natural language processing (NLP) as an example to show how Nebula-I works in different training phases that include: a) pre-training a multilingual language model using two remote clusters; and b) fine-tuning a machine translation model using knowledge distilled from pre-trained models, which run through the most popular paradigm of recent deep learning. To balance the accuracy and communication efficiency, in Nebula-I, parameter-efficient training strategies, hybrid parallel computing methods and adaptive communication acceleration techniques are jointly applied. Meanwhile, security strategies are employed to guarantee the safety, reliability and privacy in intra-cluster computation and inter-cluster communication. Nebula-I is implemented with the PaddlePaddle deep learning framework, which can support collaborative training over heterogeneous hardware, e.g. GPU and NPU. Experiments demonstrate that the proposed framework could substantially maximize the training efficiency while preserving satisfactory NLP performance. By using Nebula-I, users can run large-scale training tasks over cloud clusters with minimum developments, and the utility of existed large pre-trained models could be further promoted. We also introduced new state-of-the-art results on cross-lingual natural language inference tasks, which are generated based upon a novel learning framework and Nebula-I., Comment: 20 pages, 10 figures, technical report
Published: 2022

17. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Author: Wang, Shuohuan, Sun, Yu, Xiang, Yang, Wu, Zhihua, Ding, Siyu, Gong, Weibao, Feng, Shikun, Shang, Junyuan, Zhao, Yanbin, Pang, Chao, Liu, Jiaxiang, Chen, Xuyi, Lu, Yuxiang, Liu, Weixin, Wang, Xi, Bai, Yangfan, Chen, Qiuliang, Zhao, Li, Li, Shiyong, Sun, Peng, Yu, Dianhai, Ma, Yanjun, Tian, Hao, Wu, Hua, Wu, Tian, Zeng, Wei, Li, Ge, Gao, Wen, and Wang, Haifeng
Subjects: Computer Science - Computation and Language
Abstract: Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge enhanced models and trained a model with 10 billion parameters. ERNIE 3.0 outperformed the state-of-the-art models on various NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. To reduce the computation overhead and carbon emission, we propose an online distillation framework for ERNIE 3.0 Titan, where the teacher model will teach students and train itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets., Comment: arXiv admin note: text overlap with arXiv:2107.02137
Published: 2021

18. ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Author: Sun, Yu, Wang, Shuohuan, Feng, Shikun, Ding, Siyu, Pang, Chao, Shang, Junyuan, Liu, Jiaxiang, Chen, Xuyi, Zhao, Yanbin, Lu, Yuxiang, Liu, Weixin, Wu, Zhihua, Gong, Weibao, Liang, Jianzhong, Shang, Zhizhou, Sun, Peng, Liu, Wei, Ouyang, Xuan, Yu, Dianhai, Tian, Hao, Wu, Hua, and Wang, Haifeng
Subjects: Computer Science - Computation and Language
Abstract: Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%).
Published: 2021

19. ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

Author: Ding, Siyu, Shang, Junyuan, Wang, Shuohuan, Sun, Yu, Tian, Hao, Wu, Hua, and Wang, Haifeng
Subjects: Computer Science - Computation and Language
Abstract: Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention mechanism will incur the context fragmentation problem or lead to an inferior modeling capability against comparable model sizes. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the retrospective feed mechanism and the enhanced recurrence mechanism, enable ERNIE-Doc, which has a much longer effective context length, to capture the contextual information of a complete document. We pretrain ERNIE-Doc to explicitly learn the relationships among segments with an additional document-aware segment-reordering objective. Various experiments were conducted on both English and Chinese document-level tasks. ERNIE-Doc improved the state-of-the-art language modeling result of perplexity to 16.8 on WikiText-103. Moreover, it outperformed competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering., Comment: Accepted by ACL 2021 (main conference, long paper)
Published: 2020

20. ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora

Author: Ouyang, Xuan, Wang, Shuohuan, Pang, Chao, Sun, Yu, Tian, Hao, Wu, Hua, and Wang, Haifeng
Subjects: Computer Science - Computation and Language
Abstract: Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks. This improvement benefits from learning a large amount of monolingual and parallel corpora. Although it is generally acknowledged that parallel corpora are critical for improving the model performance, existing methods are often constrained by the size of parallel corpora, especially for low-resource languages. In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance. Our key insight is to integrate back-translation into the pre-training process. We generate pseudo-parallel sentence pairs on a monolingual corpus to enable the learning of semantic alignments between different languages, thereby enhancing the semantic modeling of cross-lingual models. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results in various cross-lingual downstream tasks., Comment: Accepted by EMNLP 2021 (main conference, long paper)
Published: 2020

21. Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

Author: Wang, Shuohuan, Liu, Jiaxiang, Ouyang, Xuan, and Sun, Yu
Subjects: Computer Science - Computation and Language
Abstract: This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language Models, ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A - Offensive Language Identification, we ranked first in terms of average F1 scores in all languages. We are also the only team which ranked among the top three across all languages. We also took the first place in Sub-task B - Automatic Categorization of Offense Types and Sub-task C - Offence Target Identification.
Comment: 8 pages, 2 figures, 6 tables. Accepted at Proceedings of 14th International Workshop on Semantic Evaluation (SemEval-2020)
Published: 2020

22. ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model

Author: Huang, Zhengjie, Feng, Shikun, Su, Weiyue, Chen, Xuyi, Wang, Shuohuan, Liu, Jiaxiang, Ouyang, Xuan, and Sun, Yu
Subjects: Computer Science - Computation and Language
Abstract: This paper describes the system designed by ERNIE Team which achieved the first place in SemEval-2020 Task 10: Emphasis Selection For Written Text in Visual Media. Given a sentence, we are asked to find out the most important words as the suggestion for automated design. We leverage the unsupervised pre-training model and finetune these models on our task. After our investigation, we found that the following models achieved an excellent performance in this task: ERNIE 2.0, XLM-ROBERTA, ROBERTA and ALBERT. We combine a pointwise regression loss and a pairwise ranking loss, which is closer to the final Match_m metric, to finetune our models. We also find that additional feature engineering and data augmentation can help improve the performance. Our best model achieves the highest score of 0.823 and ranks first for all kinds of metrics.
Published: 2020

23. kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification

Author: Liu, Jiaxiang, Chen, Xuyi, Feng, Shikun, Wang, Shuohuan, Ouyang, Xuan, Sun, Yu, Huang, Zhengjie, and Su, Weiyue
Subjects: Computer Science - Computation and Language
Abstract: Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language. With the increasing communication between groups with different languages, this phenomenon is more and more popular. However, there are little research and data in this area, especially in code-mixing sentiment classification. In this work, the domain transfer learning from state-of-the-art uni-language model ERNIE is tested on the code-mixing dataset, and surprisingly, a strong baseline is achieved. Furthermore, the adversarial training with a multi-lingual model is used to achieve 1st place of SemEval-2020 Task 9 Hindi-English sentiment classification competition.
Published: 2020

24. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding

Author: Sun, Yu, Wang, Shuohuan, Li, Yukun, Feng, Shikun, Tian, Hao, Wu, Hua, and Wang, Haifeng
Subjects: Computer Science - Computation and Language
Abstract: Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entity, semantic closeness and discourse relations. In order to extract to the fullest extent, the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0 which builds and learns incrementally pre-training tasks through constant multi-task learning. Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks including English tasks on GLUE benchmarks and several common tasks in Chinese. The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE .
Comment: 11 pages, 3 figures and 7 tables; Accepted by AAAI 2020
Published: 2019

25. ERNIE: Enhanced Representation through Knowledge Integration

Author: Sun, Yu, Wang, Shuohuan, Li, Yukun, Feng, Shikun, Chen, Xuyi, Zhang, Han, Tian, Xin, Zhu, Danxiang, Tian, Hao, and Wu, Hua
Subjects: Computer Science - Computation and Language
Abstract: We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration). Inspired by the masking strategy of BERT, ERNIE is designed to learn language representation enhanced by knowledge masking strategies, which includes entity-level masking and phrase-level masking. Entity-level strategy masks entities which are usually composed of multiple words. Phrase-level strategy masks the whole phrase which is composed of several words standing together as a conceptual unit. Experimental results show that ERNIE outperforms other baseline methods, achieving new state-of-the-art results on five Chinese natural language processing tasks including natural language inference, semantic similarity, named entity recognition, sentiment analysis and question answering. We also demonstrate that ERNIE has more powerful knowledge inference capacity on a cloze test.
Comment: 8 pages
Published: 2019

26. Dual Modalities of Text: Visual and Textual Generative Pre-training

Author: Chai, Yekun, Liu, Qingyi, Xiao, Jingwu, Wang, Shuohuan, Sun, Yu, and Wu, Hua
Abstract: Harnessing visual texts represents a burgeoning frontier in the evolution of language modeling. In this paper, we introduce a novel pre-training framework for a suite of pixel-based autoregressive language models, pre-training on a corpus of over 400 million documents rendered as RGB images. Our approach is characterized by a dual-modality training regimen, engaging both visual data through next patch prediction with a regression head and textual data via next token prediction with a classification head. This study is particularly focused on investigating the synergistic interplay between visual and textual modalities of language. Our comprehensive evaluation across a diverse array of benchmarks reveals that the confluence of visual and textual data substantially augments the efficacy of pixel-based language models. Notably, our findings show that a unidirectional pixel-based model, devoid of textual data during training, can match the performance levels of advanced bidirectional pixel-based models on various language understanding benchmarks. This work highlights the considerable untapped potential of integrating visual and textual information for language modeling purposes. We will release our code, data, and checkpoints to inspire further research advancement.
Published: 2024

27. On Training Data Influence of GPT Models

Author: Liu, Qingyi, Chai, Yekun, Wang, Shuohuan, Sun, Yu, Peng, Qiwei, Wang, Keze, and Wu, Hua
Abstract: Amidst the rapid advancements in generative language models, the investigation of how training data shapes the performance of GPT models is still emerging. This paper presents GPTfluence, a novel approach that leverages a featurized simulation to assess the impact of training examples on the training dynamics of GPT models. Our approach not only traces the influence of individual training instances on performance trajectories, such as loss and other key metrics, on targeted test points but also enables a comprehensive comparison with existing methods across various training scenarios in GPT models, ranging from 14 million to 2.8 billion parameters, across a range of downstream tasks. Contrary to earlier methods that struggle with generalization to new data, GPTfluence introduces a parameterized simulation of training dynamics, demonstrating robust generalization capabilities to unseen training data. This adaptability is evident across both fine-tuning and instruction-tuning scenarios, spanning tasks in natural language understanding and generation. We will make our code and data publicly available.
Published: 2024

28. ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Author: Chai, Yekun, primary, Wang, Shuohuan, additional, Pang, Chao, additional, Sun, Yu, additional, Tian, Hao, additional, and Wu, Hua, additional
Published: 2023
Full Text: View/download PDF

29. Retrieval-Augmented Domain Adaptation of Language Models

Author: Xu, Benfeng, primary, Zhao, Chunxu, additional, Jiang, Wenbin, additional, Zhu, PengFei, additional, Dai, Songtai, additional, Pang, Chao, additional, Sun, Zhuo, additional, Wang, Shuohuan, additional, and Sun, Yu, additional
Published: 2023
Full Text: View/download PDF

30. X-PuDu at SemEval-2022 Task 7: A Replaced Token Detection Task Pre-trained Model with Pattern-aware Ensembling for Identifying Plausible Clarifications

Author: Shang, Junyuan, primary, Wang, Shuohuan, additional, Sun, Yu, additional, Yu, Yanjun, additional, Zhou, Yue, additional, Xiang, Li, additional, and Yang, Guixiu, additional
Published: 2022
Full Text: View/download PDF

31. Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards

Author: Chai, Yekun, primary, Wang, Shuohuan, additional, Sun, Yu, additional, Tian, Hao, additional, Wu, Hua, additional, and Wang, Haifeng, additional
Published: 2022
Full Text: View/download PDF

32. X-PuDu at SemEval-2022 Task 6: Multilingual Learning for English and Arabic Sarcasm Detection

Author: Han, Ya, primary, Chai, Yekun, additional, Wang, Shuohuan, additional, Sun, Yu, additional, Huang, Hongyi, additional, Chen, Guanghao, additional, Xu, Yitong, additional, and Yang, Yang, additional
Published: 2022
Full Text: View/download PDF

33. Correcting Chinese Spelling Errors with Phonetic Pre-training

Author: Haifeng Wang, Zhang Chuanqiang, Wang Shuohuan, Zhongjun He, Yu Sun, Hua Wu, Pang Chao, and Zhang Ruiqing
Subjects: business.industry, Computer science, Training (meteorology), Artificial intelligence, business, computer.software_genre, computer, Spelling, Natural language processing
Published: 2021

34. ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

Author: Junyuan Shang, Hua Wu, Yu Sun, Ding Siyu, Hao Tian, Haifeng Wang, and Wang Shuohuan
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Perplexity, Computer science, Mechanism (biology), business.industry, Context (language use), computer.software_genre, Market fragmentation, Margin (machine learning), Question answering, Language model, Artificial intelligence, business, Computation and Language (cs.CL), computer, Natural language processing, Transformer (machine learning model)
Abstract: Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention mechanism will incur the context fragmentation problem or lead to an inferior modeling capability against comparable model sizes. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the retrospective feed mechanism and the enhanced recurrence mechanism, enable ERNIE-Doc, which has a much longer effective context length, to capture the contextual information of a complete document. We pretrain ERNIE-Doc to explicitly learn the relationships among segments with an additional document-aware segment-reordering objective. Various experiments were conducted on both English and Chinese document-level tasks. ERNIE-Doc improved the state-of-the-art language modeling result of perplexity to 16.8 on WikiText-103. Moreover, it outperformed competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering.
Comment: Accepted by ACL 2021 (main conference, long paper)
Published: 2021

35. abcbpc at SemEval-2021 Task 7: ERNIE-based Multi-task Model for Detecting and Rating Humor and Offense

Author: Xuan Ouyang, Pang Chao, Xiaoran Fan, Shikun Feng, Weiyue Su, Wang Shuohuan, Jiaxiang Liu, Xuyi Chen, and Yu Sun
Subjects: Artificial neural network, Computer science, Process (engineering), business.industry, computer.software_genre, Ensemble learning, SemEval, Task (project management), Language model, Artificial intelligence, Semantic information, business, computer, Natural language processing
Abstract: This paper describes our system participated in Task 7 of SemEval-2021: Detecting and Rating Humor and Offense. The task is designed to detect and score humor and offense which are influenced by subjective factors. In order to obtain semantic information from a large amount of unlabeled data, we applied unsupervised pre-trained language models. By conducting research and experiments, we found that the ERNIE 2.0 and DeBERTa pre-trained models achieved impressive performance in various subtasks. Therefore, we applied the above pre-trained models to fine-tune the downstream neural network. In the process of fine-tuning the model, we adopted multi-task training strategy and ensemble learning method. Based on the above strategy and method, we achieved RMSE of 0.4959 for subtask 1b, and finally won the first place.
Published: 2021

36. ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model

Author: Xuan Ouyang, Jiaxiang Liu, Xuyi Chen, Zhengjie Huang, Weiyue Su, Shikun Feng, Wang Shuohuan, and Yu Sun
Subjects: FOS: Computer and information sciences, Feature engineering, Computer Science - Computation and Language, Computer science, business.industry, computer.software_genre, SemEval, Leverage (statistics), Language model, Artificial intelligence, business, Computation and Language (cs.CL), computer, Sentence, Natural language processing
Abstract: This paper describes the system designed by ERNIE Team which achieved the first place in SemEval-2020 Task 10: Emphasis Selection For Written Text in Visual Media. Given a sentence, we are asked to find out the most important words as the suggestion for automated design. We leverage the unsupervised pre-training model and finetune these models on our task. After our investigation, we found that the following models achieved an excellent performance in this task: ERNIE 2.0, XLM-ROBERTA, ROBERTA and ALBERT. We combine a pointwise regression loss and a pairwise ranking loss which is more close to the final Match m metric to finetune our models. And we also find that additional feature engineering and data augmentation can help improve the performance. Our best model achieves the highest score of 0.823 and ranks first for all kinds of metrics.
Published: 2020

37. Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification Using Pre-trained Language Models

Author: Xuan Ouyang, Yu Sun, Jiaxiang Liu, and Wang Shuohuan
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Language identification, Computer science, business.industry, Offensive, computer.software_genre, SemEval, Task (project management), Identification (information), Categorization, Social media, Language model, Artificial intelligence, business, Computation and Language (cs.CL), computer, Natural language processing
Abstract: This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language Models, ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A - Offensive Language Identification, we ranked first in terms of average F1 scores in all languages. We are also the only team which ranked among the top three across all languages. We also took the first place in Sub-task B - Automatic Categorization of Offense Types and Sub-task C - Offence Target Identification.
Comment: 8 pages, 2 figures, 6 tables. Accepted at Proceedings of 14th International Workshop on Semantic Evaluation (SemEval-2020)
Published: 2020

38. Kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification

Author: Zhengjie Huang, Xuan Ouyang, Yu Sun, Weiyue Su, Xuyi Chen, Jiaxiang Liu, Shikun Feng, and Wang Shuohuan
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer science, business.industry, computer.software_genre, SemEval, Task (project management), Domain (software engineering), Code-mixing, Adversarial system, Artificial intelligence, Baseline (configuration management), Transfer of learning, business, Computation and Language (cs.CL), computer, Natural language processing
Abstract: Code switching is a linguistic phenomenon which may occur within a multilingual setting where speakers share more than one language. With the increasing communication between groups with different languages, this phenomenon is more and more popular. However, there are little research and data in this area, especially in code-mixing sentiment classification. In this work, the domain transfer learning from state-of-the-art uni-language model ERNIE is tested on the code-mixing dataset, and surprisingly, a strong baseline is achieved. Furthermore, the adversarial training with a multi-lingual model is used to achieve 1st place of SemEval-2020 Task 9 Hindi-English sentiment classification competition.
Published: 2020

39. ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

Author: Ding, SiYu, primary, Shang, Junyuan, additional, Wang, Shuohuan, additional, Sun, Yu, additional, Tian, Hao, additional, Wu, Hua, additional, and Wang, Haifeng, additional
Published: 2021
Full Text: View/download PDF

40. ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora

Author: Ouyang, Xuan, primary, Wang, Shuohuan, additional, Pang, Chao, additional, Sun, Yu, additional, Tian, Hao, additional, Wu, Hua, additional, and Wang, Haifeng, additional
Published: 2021
Full Text: View/download PDF

41. Correcting Chinese Spelling Errors with Phonetic Pre-training

Author: Zhang, Ruiqing, primary, Pang, Chao, additional, Zhang, Chuanqiang, additional, Wang, Shuohuan, additional, He, Zhongjun, additional, Sun, Yu, additional, Wu, Hua, additional, and Wang, Haifeng, additional
Published: 2021
Full Text: View/download PDF

42. abcbpc at SemEval-2021 Task 7: ERNIE-based Multi-task Model for Detecting and Rating Humor and Offense

Author: Pang, Chao, primary, Fan, Xiaoran, additional, Su, Weiyue, additional, Chen, Xuyi, additional, Wang, Shuohuan, additional, Liu, Jiaxiang, additional, Ouyang, Xuan, additional, Feng, Shikun, additional, and Sun, Yu, additional
Published: 2021
Full Text: View/download PDF

43. ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding

Author: Sun, Yu, primary, Wang, Shuohuan, additional, Li, Yukun, additional, Feng, Shikun, additional, Tian, Hao, additional, Wu, Hua, additional, and Wang, Haifeng, additional
Published: 2020
Full Text: View/download PDF

44. ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model

Author: Huang, Zhengjie, primary, Feng, Shikun, additional, Su, Weiyue, additional, Chen, Xuyi, additional, Wang, Shuohuan, additional, Liu, Jiaxiang, additional, Ouyang, Xuan, additional, and Sun, Yu, additional
Published: 2020
Full Text: View/download PDF

45. Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification Using Pre-trained Language Models

Author: Wang, Shuohuan, primary, Liu, Jiaxiang, additional, Ouyang, Xuan, additional, and Sun, Yu, additional
Published: 2020
Full Text: View/download PDF

46. Kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification

Author: Liu, Jiaxiang, primary, Chen, Xuyi, additional, Feng, Shikun, additional, Wang, Shuohuan, additional, Ouyang, Xuan, additional, Sun, Yu, additional, Huang, Zhengjie, additional, and Su, Weiyue, additional
Published: 2020
Full Text: View/download PDF

47. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding

Author: Haifeng Wang, Shikun Feng, Wang Shuohuan, Yu Sun, Hua Wu, Hao Tian, and Li Yukun
Subjects: FOS: Computer and information sciences, Language understanding, Computer Science - Computation and Language, Source code, Computer science, business.industry, media_common.quotation_subject, Closeness, 02 engineering and technology, General Medicine, Construct (python library), computer.software_genre, Training (civil), Syntax, Focus (linguistics), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Computation and Language (cs.CL), Natural language processing, media_common
Abstract: Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entity, semantic closeness and discourse relations. In order to extract to the fullest extent, the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0 which builds and learns incrementally pre-training tasks through constant multi-task learning. Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks including English tasks on GLUE benchmarks and several common tasks in Chinese. The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE .
Comment: 11 pages, 3 figures and 7 tables; Accepted by AAAI 2020
Published: 2019
Full Text: View/download PDF

48. OleNet at SemEval-2019 Task 9: BERT based Multi-Perspective Models for Suggestion Mining

Author: Wang Shuohuan, Jiaxiang Liu, and Yu Sun
Subjects: Computer science, business.industry, 02 engineering and technology, computer.software_genre, Convolutional neural network, Multi perspective, SemEval, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Encoder, computer, Sentence, Natural language processing, Transformer (machine learning model)
Abstract: This paper describes our system participated in Task 9 of SemEval-2019: the task is focused on suggestion mining and it aims to classify given sentences into suggestion and non-suggestion classes in domain specific and cross domain training setting respectively. We propose a multi-perspective architecture for learning representations by using different classical models including Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), Feed Forward Attention (FFA), etc. To leverage the semantics distributed in large amount of unsupervised data, we also have adopted the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model as an encoder to produce sentence and word representations. The proposed architecture is applied for both sub-tasks, and achieved f1-score of 0.7812 for subtask A, and 0.8579 for subtask B. We won the first and second place for the two tasks respectively in the final competition.
Published: 2019

49. OleNet at SemEval-2019 Task 9: BERT based Multi-Perspective Models for Suggestion Mining

Author: Liu, Jiaxiang, primary, Wang, Shuohuan, additional, and Sun, Yu, additional
Published: 2019
Full Text: View/download PDF

50. A stacking ensemble model for predicting the occurrence of carotid atherosclerosis.

Author: Zhang X, Tang C, Wang S, Liu W, Yang W, Wang D, Wang Q, and Tang F
Subjects: Humans, Male, Female, Middle Aged, Risk Factors, Aged, Support Vector Machine, Algorithms, Prognosis, Risk Assessment methods, Cohort Studies, Carotid Artery Diseases epidemiology, Machine Learning
Abstract: Background: Carotid atherosclerosis (CAS) is a significant risk factor for cardio-cerebrovascular events. The objective of this study is to employ stacking ensemble machine learning techniques to enhance the prediction of CAS occurrence, incorporating a wide range of predictors, including endocrine-related markers. Methods: Based on data from a routine health check-up cohort, five individual prediction models for CAS were established based on logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost) and gradient boosting decision tree (GBDT) methods. Then, a stacking ensemble algorithm was used to integrate the base models to improve the prediction ability and address overfitting problems. Finally, the SHAP value method was applied for an in-depth analysis of variable importance at both the overall and individual levels, with a focus on elucidating the impact of endocrine-related variables. Results: A total of 441 of the 1669 subjects in the cohort were finally diagnosed with CAS. Seventeen variables were selected as predictors. The ensemble model outperformed the individual models, with AUCs of 0.893 in the testing set and 0.861 in the validation set. The ensemble model has the optimal accuracy, precision, recall and F1 score in the validation set, with considerable performance in the testing set. Carotid stenosis and age emerged as the most significant predictors, alongside notable contributions from endocrine-related factors. Conclusion: The ensemble model shows enhanced accuracy and generalizability in predicting CAS risk, underscoring its utility in identifying individuals at high risk. This approach integrates a comprehensive analysis of predictors, including endocrine markers, affirming the critical role of endocrine dysfunctions in CAS development. It represents a promising tool in identifying high-risk individuals for the prevention of CAS and cardio-cerebrovascular diseases. Competing Interests: Author SW was employed by Shandong International Trust Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2024 Zhang, Tang, Wang, Liu, Yang, Wang, Wang and Tang.)
Published: 2024
Full Text: View/download PDF
