Author: "Zhao, Hanyu" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhao, Hanyu"' showing total 300 results

Start Over Author "Zhao, Hanyu"

300 results on '"Zhao, Hanyu"'

1. CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models

Author: Wang, Liangdong, Zhang, Bo-Wen, Wu, Chengwei, Zhao, Hanyu, Shi, Xiaofeng, Gu, Shuhao, Li, Jijie, Ma, Quanyue, Pan, TengFei, and Liu, Guang
Subjects: Computer Science - Computation and Language
Abstract: We present CCI3.0-HQ (https://huggingface.co/datasets/BAAI/CCI3-HQ), a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0)(https://huggingface.co/datasets/BAAI/CCI3-Data), developed using a novel two-stage hybrid filtering pipeline that significantly enhances data quality. To evaluate its effectiveness, we trained a 0.5B parameter model from scratch on 100B tokens across various datasets, achieving superior performance on 10 benchmarks in a zero-shot setting compared to CCI3.0, SkyPile, and WanjuanV1. The high-quality filtering process effectively distills the capabilities of the Qwen2-72B-instruct model into a compact 0.5B model, attaining optimal F1 scores for Chinese web data classification. We believe this open-access dataset will facilitate broader access to high-quality language models.
Published: 2024

2. Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging

Author: Ju, Yiming, Ni, Ziyi, Xing, Xingrun, Zeng, Zhixiong, Zhao, hanyu, Fan, Siqi, and Zhang, Zheng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks. In this work, we demonstrate that the order of training data can lead to significant training imbalances, potentially resulting in performance degradation. Consequently, we propose to mitigate this imbalance by merging SFT models fine-tuned with different data orders, thereby enhancing the overall effectiveness of SFT. Additionally, we introduce a novel technique, "parameter-selection merging," which outperforms traditional weighted-average methods on five datasets. Further, through analysis and ablation studies, we validate the effectiveness of our method and identify the sources of performance improvements., Comment: EMNLP 2024
Published: 2024

3. Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency

Author: Zhao, Hanyu, Du, Li, Ju, Yiming, Wu, Chengwei, and Pan, Tengfei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs). Previous research mainly focuses on selecting individual high-quality instructions. However, these works overlooked the joint interactions and dependencies between different categories of instructions, leading to suboptimal selection strategies. Moreover, the nature of these interaction patterns remains largely unexplored, let alone optimize the instruction set with regard to them. To fill these gaps, in this paper, we: (1) systemically investigate interaction and dependency patterns between different categories of instructions, (2) manage to optimize the instruction set concerning the interaction patterns using a linear programming-based method, and optimize the learning schema of SFT using an instruction dependency taxonomy guided curriculum learning. Experimental results across different LLMs demonstrate improved performance over strong baselines on widely adopted benchmarks.
Published: 2024

4. Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling

Author: Zhang, Xinyi, Zhao, Hanyu, Xiao, Wencong, Jia, Xianyan, Xu, Fei, Li, Yong, Lin, Wei, and Liu, Fangming
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The era of large deep learning models has given rise to advanced training strategies such as 3D parallelism and the ZeRO series. These strategies enable various (re-)configurable execution plans for a training job, which exhibit remarkably different requirements of multiple resource types. Existing cluster scheduling systems, however, treat such reconfigurable training jobs as black boxes: they rely on users to choose execution plans statically, and then make resource allocations without awareness of the chosen plans and their resource requirements. This approach results in mismatches between execution plans and resources, making both training performance and cluster utilization far from optimal. We introduce Rubick, a cluster scheduling system for deep learning training that exploits the reconfigurability to improve job performance and cluster efficiency. Rubick incorporates the job execution planning as a new dimension in cluster scheduling, by continuously reconfiguring jobs' execution plans and tuning multi-resource allocations across jobs jointly. Such a co-optimization is navigated by a performance model that understands the diverse resource requirements and performance characteristics of different jobs and execution plans. Rubick exploits such a model to make performance-aware scheduling decisions to maximize cluster throughput while providing performance guarantees to individual jobs. Evaluations on a 64-GPU high-performance training cluster show that Rubick improves average job completion time and makespan by up to 3.2x and 1.4x, respectively, compared against state-of-the-art systems.
Published: 2024

5. AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

Author: Zhang, Bo-Wen, Wang, Liangdong, Yuan, Ye, Li, Jijie, Gu, Shuhao, Zhao, Mengdi, Wu, Xinya, Liu, Guang, Wu, Chengwei, Zhao, Hanyu, Du, Li, Ju, Yiming, Ma, Quanyue, Ao, Yulong, Zhao, Yingli, Zhu, Songhe, Cao, Zhou, Liang, Dong, Lin, Yonghua, Zhang, Ming, Wang, Shunfei, Zhou, Yanxin, Ye, Min, Chen, Xuekai, Yu, Xinyang, Huang, Xiangjun, and Yang, Jian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention. In this paper, we present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale. This approach optimizes performance while minimizing data requirements through a two-stage process. The first stage, termed Scale-Up, initializes the larger model with weights from a pre-trained smaller model, enabling substantial knowledge transfer and continuous pretraining with significantly less data. The second stage, Scale-Out, uses a pre-trained dense model to initialize the MoE experts, further enhancing knowledge transfer and performance. Extensive validation experiments on 1.8B and 7B models compared various initialization schemes, achieving models that maintain and reduce loss during continuous pretraining. Utilizing the optimal scheme, we successfully trained a 16B model and subsequently the 8*16B AquilaMoE model, demonstrating significant improvements in performance and training efficiency.
Published: 2024

6. Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Author: Dong, Jianbo, Luo, Bin, Zhang, Jun, Zhang, Pengcheng, Feng, Fei, Zhu, Yikai, Liu, Ang, Chen, Zian, Shi, Yi, Jiao, Hairong, Lu, Gang, Guan, Yu, Zhai, Ennan, Xiao, Wencong, Zhao, Hanyu, Yuan, Man, Yang, Siran, Li, Xiang, Wang, Jiamang, Men, Rui, Zhang, Jianwei, Zhong, Huang, Cai, Dennis, Xie, Yuan, and Fu, Binzhang
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestions can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely the C4. The key insights of C4 are two folds. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion. C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.
Published: 2024

7. Llumnix: Dynamic Scheduling for Large Language Model Serving

Author: Sun, Biao, Huang, Ziming, Zhao, Hanyu, Xiao, Wencong, Zhang, Xinyi, Li, Yong, and Lin, Wei
Subjects: Computer Science - Hardware Architecture, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, as a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamentally limited in handling these characteristics and cause problems such as severe queuing delays, poor tail latencies, and SLO violations. We introduce Llumnix, an LLM serving system that reacts to such heterogeneous and unpredictable requests by runtime rescheduling across multiple model instances. Similar to context switching across CPU cores in modern operating systems, Llumnix reschedules requests to improve load balancing and isolation, mitigate resource fragmentation, and differentiate request priorities and SLOs. Llumnix implements the rescheduling with an efficient and scalable live migration mechanism for requests and their in-memory states, and exploits it in a dynamic scheduling policy that unifies the multiple rescheduling scenarios elegantly. Our evaluations show that Llumnix improves tail latencies by an order of magnitude, accelerates high-priority requests by up to 1.5x, and delivers up to 36% cost savings while achieving similar tail latencies, compared against state-of-the-art LLM serving systems. Llumnix is publicly available at https://github.com/AlibabaPAI/llumnix., Comment: To appear at OSDI '24; open-source repo will be available in June 2024
Published: 2024

8. Variable structure smooth switching strategy of LLC-C resonant converter based on state trajectory control

Author: Li, Wei, Li, Mengjun, Ji, Ruilin, Che, Chaochang, and Zhao, Hanyu
Published: 2024
Full Text: View/download PDF

9. Variational Continual Test-Time Adaptation

Author: Lyu, Fan, Du, Kaile, Li, Yuyang, Zhao, Hanyu, Zhang, Zhang, Liu, Guangcan, and Wang, Liang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The prior drift is crucial in Continual Test-Time Adaptation (CTTA) methods that only use unlabeled test data, as it can cause significant error propagation. In this paper, we introduce VCoTTA, a variational Bayesian approach to measure uncertainties in CTTA. At the source stage, we transform a pre-trained deterministic model into a Bayesian Neural Network (BNN) via a variational warm-up strategy, injecting uncertainties into the model. During the testing time, we employ a mean-teacher update strategy using variational inference for the student model and exponential moving average for the teacher model. Our novel approach updates the student model by combining priors from both the source and teacher models. The evidence lower bound is formulated as the cross-entropy between the student and teacher models, along with the Kullback-Leibler (KL) divergence of the prior mixture. Experimental results on three datasets demonstrate the method's effectiveness in mitigating prior drift within the CTTA framework.
Published: 2024

10. Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

Author: Lin, Bin, Zhang, Chen, Peng, Tao, Zhao, Hanyu, Xiao, Wencong, Sun, Minmin, Liu, Anmin, Zhang, Zhipeng, Li, Lanbo, Qiu, Xiafei, Li, Shen, Ji, Zhigang, Xie, Tao, Li, Yong, and Lin, Wei
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Hardware Architecture
Abstract: Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressive nature of LLMs results in highly dynamic behavior of the attention layers, showcasing significant differences in computational characteristics and memory requirements from the non-attention layers. This presents substantial challenges for resource management and performance optimization in service systems. Existing static model parallelism and resource allocation strategies fall short when dealing with this dynamicity. To address the issue, we propose Infinite-LLM, a novel LLM serving system designed to effectively handle dynamic context lengths. Infinite-LLM disaggregates attention layers from an LLM's inference process, facilitating flexible and independent resource scheduling that optimizes computational performance and enhances memory utilization jointly. By leveraging a pooled GPU memory strategy across a cluster, Infinite-LLM not only significantly boosts system throughput but also supports extensive context lengths. Evaluated on a dataset with context lengths ranging from a few to 2000K tokens across a cluster with 32 A100 GPUs, Infinite-LLM demonstrates throughput improvement of 1.35-3.4x compared to state-of-the-art methods, enabling efficient and elastic LLM deployment.
Published: 2024

11. ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout

Author: Shu, Huiyao, Wang, Ang, Shi, Ziji, Zhao, Hanyu, Li, Yong, and Lu, Lu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Databases
Abstract: As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques. In this paper, we propose ROAM which operates on computation graph level to derive memory-efficient execution plan with optimized operator order and tensor memory layout for models. We first propose sophisticated theories that carefully consider model structure and training memory load to support optimization for large complex graphs that have not been well supported in the past. An efficient tree-based algorithm is further proposed to search task divisions automatically, along with delivering high performance and effectiveness to solve the problem. Experiments show that ROAM achieves a substantial memory reduction of 35.7%, 13.3%, and 27.2% compared to Pytorch and two state-of-the-art methods and offers a remarkable 53.7x speedup. The evaluation conducted on the expansive GPT2-XL further validates ROAM's scalability.
Published: 2023

12. Artificial Intelligence Security Competition (AISC)

Author: Dong, Yinpeng, Chen, Peng, Deng, Senyou, L, Lianji, Sun, Yi, Zhao, Hanyu, Li, Jiaxing, Tan, Yunteng, Liu, Xinyu, Dong, Yangyi, Xu, Enhui, Xu, Jincai, Xu, Shu, Fu, Xuelin, Sun, Changfeng, Han, Haoliang, Zhang, Xuchong, Chen, Shen, Sun, Zhimin, Cao, Junyi, Yao, Taiping, Ding, Shouhong, Wu, Yu, Lin, Jian, Wu, Tianpeng, Wang, Ye, Fu, Yu, Feng, Lin, Gao, Kangkang, Liu, Zeyu, Pang, Yuanzhe, Duan, Chengqi, Zhou, Huipeng, Wang, Yajie, Zhao, Yuhang, Wu, Shangbo, Lyu, Haoran, Lin, Zhiyu, Gao, Yifei, Li, Shuang, Wang, Haonan, Sang, Jitao, Ma, Chen, Zheng, Junhao, Li, Yijia, Shen, Chao, Lin, Chenhao, Cui, Zhichao, Liu, Guoshuai, Shi, Huafeng, Hu, Kun, and Zhang, Mengxin
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track., Comment: Technical report of AISC
Published: 2022

13. A Class of Optimal Control Problems of Forward–Backward Systems with Input Constraint

Author: Huang, Jianhui, Li, Wenqiang, and Zhao, Hanyu
Published: 2023
Full Text: View/download PDF

14. EasyScale: Accuracy-consistent Elastic Training for Deep Learning

Author: Li, Mingzhen, Xiao, Wencong, Sun, Biao, Zhao, Hanyu, Yang, Hailong, Ren, Shiru, Luan, Zhongzhi, Jia, Xianyan, Liu, Yi, Li, Yong, Lin, Wei, and Qian, Depei
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Distributed synchronized GPU training is commonly used for deep learning. The resource constraint of using a fixed number of GPUs makes large-scale training jobs suffer from long queuing time for resource allocation, and lowers the cluster utilization. Adapting to resource elasticity can alleviate this but often introduces inconsistent model accuracy, due to lacking of capability to decouple model training procedure from resource allocation. We propose EasyScale, an elastic training system that achieves consistent model accuracy under resource elasticity for both homogeneous and heterogeneous GPUs. EasyScale preserves the data-parallel training behaviors strictly, traces the consistency-relevant factors carefully, utilizes the deep learning characteristics for EasyScaleThread abstraction and fast context-switching. To utilize heterogeneous cluster, EasyScale dynamically assigns workers based on the intra-/inter-job schedulers, minimizing load imbalance and maximizing aggregated job throughput. Deployed in an online serving cluster, EasyScale powers the training jobs to utilize idle GPUs opportunistically, improving overall cluster utilization by 62.1%., Comment: To be appeared at SC'23. Link: https://sc23.supercomputing.org/presentation/?id=pap262&sess=sess168
Published: 2022

15. Instance-wise Prompt Tuning for Pretrained Language Models

Author: Jiang, Yuezihan, Yang, Hao, Lin, Junyang, Zhao, Hanyu, Yang, An, Zhou, Chang, Yang, Hongxia, Yang, Zhi, and Cui, Bin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Prompt Learning has recently gained great popularity in bridging the gap between pretraining tasks and various downstream tasks. It freezes Pretrained Language Models (PLMs) and only tunes a few task-related parameters (prompts) for downstream tasks, greatly reducing the cost of tuning giant models. The key enabler of this is the idea of querying PLMs with task-specific knowledge implicated in prompts. This paper reveals a major limitation of existing methods that the indiscriminate prompts for all input data in a task ignore the intrinsic knowledge from input data, resulting in sub-optimal performance. We introduce Instance-wise Prompt Tuning (IPT), the first prompt learning paradigm that injects knowledge from the input data instances to the prompts, thereby providing PLMs with richer and more concrete context information. We devise a series of strategies to produce instance-wise prompts, addressing various concerns like model quality and cost-efficiency. Across multiple tasks and resource settings, IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
Published: 2022

16. A Roadmap for Big Model

Author: Yuan, Sha, Zhao, Hanyu, Zhao, Shuai, Leng, Jiahong, Liang, Yangxiao, Wang, Xiaozhi, Yu, Jifan, Lv, Xin, Shao, Zhou, He, Jiaao, Lin, Yankai, Han, Xu, Liu, Zhenghao, Ding, Ning, Rao, Yongming, Gao, Yizhao, Zhang, Liang, Ding, Ming, Fang, Cong, Wang, Yisen, Long, Mingsheng, Zhang, Jing, Dong, Yinpeng, Pang, Tianyu, Cui, Peng, Huang, Lingxiao, Liang, Zheng, Shen, Huawei, Zhang, Hui, Zhang, Quanshi, Dong, Qingxiu, Tan, Zhixing, Wang, Mingxuan, Wang, Shuo, Zhou, Long, Li, Haoran, Bao, Junwei, Pan, Yingwei, Zhang, Weinan, Yu, Zhou, Yan, Rui, Shi, Chence, Xu, Minghao, Zhang, Zuobai, Wang, Guoqiang, Pan, Xiang, Li, Mengjie, Chu, Xiaoyu, Yao, Zijun, Zhu, Fangwei, Cao, Shulin, Xue, Weicheng, Ma, Zixuan, Zhang, Zhengyan, Hu, Shengding, Qin, Yujia, Xiao, Chaojun, Zeng, Zheni, Cui, Ganqu, Chen, Weize, Zhao, Weilin, Yao, Yuan, Li, Peng, Zheng, Wenzhao, Zhao, Wenliang, Wang, Ziyi, Zhang, Borui, Fei, Nanyi, Hu, Anwen, Ling, Zenan, Li, Haoyang, Cao, Boxi, Han, Xianpei, Zhan, Weidong, Chang, Baobao, Sun, Hao, Deng, Jiawen, Zheng, Chujie, Li, Juanzi, Hou, Lei, Cao, Xigang, Zhai, Jidong, Liu, Zhiyuan, Sun, Maosong, Lu, Jiwen, Lu, Zhiwu, Jin, Qin, Song, Ruihua, Wen, Ji-Rong, Lin, Zhouchen, Wang, Liwei, Su, Hang, Zhu, Jun, Sui, Zhifang, Zhang, Jiajun, Liu, Yang, He, Xiaodong, Huang, Minlie, Tang, Jian, and Tang, Jie
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view., Comment: This report has been withdrawn by the authors due to critical issues in Section 2.3.1 of Article 2
Published: 2022

17. WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models

Author: Yuan, Sha, Zhao, Shuai, Leng, Jiahong, Xue, Zhao, Zhao, Hanyu, Liu, Peiyu, Gong, Zheng, Zhao, Wayne Xin, Li, Junyi, and Tang, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Compared with the domain-specific model, the vision-language pre-training models (VLPMs) have shown superior performance on downstream tasks with fast fine-tuning process. For example, ERNIE-ViL, Oscar and UNIMO trained VLPMs with a uniform transformers stack architecture and large amounts of image-text paired data, achieving remarkable results on downstream tasks such as image-text reference(IR and TR), vision question answering (VQA) and image captioning (IC) etc. During the training phase, VLPMs are always fed with a combination of multiple public datasets to meet the demand of large-scare training data. However, due to the unevenness of data distribution including size, task type and quality, using the mixture of multiple datasets for model training can be problematic. In this work, we introduce a large-scale multi-modal corpora named WuDaoMM, totally containing more than 650M image-text pairs. Specifically, about 600 million pairs of data are collected from multiple webpages in which image and caption present weak correlation, and the other 50 million strong-related image-text pairs are collected from some high-quality graphic websites. We also release a base version of WuDaoMM with 5 million strong-correlated image-text pairs, which is sufficient to support the common cross-modal model pre-training. Besides, we trained both an understanding and a generation vision-language (VL) model to test the dataset effectiveness. The results show that WuDaoMM can be applied as an efficient dataset for VLPMs, especially for the model in text-to-image generation task. The data is released at https://data.wudaoai.cn, Comment: Some data problems cannot be obtained
Published: 2022

18. ZOOMER: Boosting Retrieval on Web-scale Graphs by Regions of Interest

Author: Jiang, Yuezihan, Cheng, Yu, Zhao, Hanyu, Zhang, Wentao, Miao, Xupeng, He, Yu, Wang, Liang, Yang, Zhi, and Cui, Bin
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We introduce ZOOMER, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs. ZOOMER is designed for tackling two challenges presented by the massive user data at Taobao: low training/serving efficiency due to the huge scale of the graphs, and low recommendation quality due to the information overload which distracts the recommendation model from specific user intentions. ZOOMER achieves this by introducing a key concept, Region of Interests (ROI) in GNNs for recommendations, i.e., a neighborhood region in the graph with significant relevance to a strong user intention. ZOOMER narrows the focus from the whole graph and "zooms in" on the more relevant ROIs, thereby reducing the training/serving cost and mitigating the information overload at the same time. With carefully designed mechanisms, ZOOMER identifies the interest expressed by each recommendation request, constructs an ROI subgraph by sampling with respect to the interest, and guides the GNN to reweigh different parts of the ROI towards the interest by a multi-level attention module. Deployed as a large-scale distributed system, ZOOMER supports graphs with billions of nodes for training and thousands of requests per second for serving. ZOOMER achieves up to 14x speedup when downsizing sampling scales with comparable (even better) AUC performance than baseline methods. Besides, both the offline evaluation and online A/B test demonstrate the effectiveness of ZOOMER.
Published: 2022

19. Preparation and properties of sustainable superhydrophobic cotton fabrics modified with lignin nanoparticles, tannic acid and methyltrimethoxysilane

Author: Sha, Xinkang, Chen, Langqian, Jia, Ying, Zhao, Hanyu, Zuo, Shuai, Yuan, Pengfei, and Chen, Guangxue
Published: 2024
Full Text: View/download PDF

20. Synergistic addition of Cu and Ce enhanced sulfate reducing bacteria-assisted corrosion cracking resistance of 2205 duplex stainless steel

Author: Zhao, Hanyu, Gu, Yueyang, Zhang, Xinrui, Wei, Boxin, Xi, Tong, Zhao, Jinlong, Yang, Chunguang, and Yang, Ke
Published: 2024
Full Text: View/download PDF

21. An enhanced Energy-Based hysteresis model for electrical steel sheets considering mechanical stress

Author: Zhao, Hanyu, An, Yutao, and Liu, Jiabing
Published: 2024
Full Text: View/download PDF

22. Modifying the one-hot encoding technique can enhance the adversarial robustness of the visual model for symbol recognition

Author: Sun, Yi, Zheng, Jun, Zhao, Hanyu, Zhou, Huipeng, Li, Jiaxing, Li, Fan, Xiong, Zehui, Liu, Jun, and Li, Yuanzhang
Published: 2024
Full Text: View/download PDF

23. Excellent energy-storage performance in BNT-BT lead-free ceramics through optimized electromechanical breakdown

Author: Wang, Liang, Cao, Wenjun, Liang, Cen, Wang, Changyuan, Zhao, Hanyu, and Wang, Chunchang
Published: 2024
Full Text: View/download PDF

24. Hysteresis and Loss Characteristics of Soft Magnetic Materials Based on Nonlinear Preisach Model

Author: Zhao, Hanyu, Zhao, Xianlu, Xu, Shu, Liu, Weihao, Wu, Yujie, and An, Yutao
Published: 2023
Full Text: View/download PDF

25. Enhanced antibacterial and corrosion resistance of copper-containing 2205 duplex stainless steel against the corrosive bacterium Shewanella algae

Author: Li, Mankun, Zhou, Junye, Li, Yaqiang, Zhu, Guangqian, Hu, Zishuai, Liu, Shijia, Han, Baochen, Zhao, Hanyu, Liang, Yongmei, Liu, Dan, Xu, Dake, and Li, Jianhui
Published: 2024
Full Text: View/download PDF

26. Calculating Question Similarity is Enough: A New Method for KBQA Tasks

Author: Zhao, Hanyu, Yuan, Sha, Leng, Jiahong, Pan, Xiang, Wang, Guoqiang, Wu, Ledell, and Tang, Jie
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Knowledge Base Question Answering (KBQA) aims to answer natural language questions with the help of an external knowledge base. The core idea is to find the link between the internal knowledge behind questions and known triples of the knowledge base. Traditional KBQA task pipelines contain several steps, including entity recognition, entity linking, answering selection, etc. In this kind of pipeline methods, errors in any procedure will inevitably propagate to the final prediction. To address this challenge, this paper proposes a Corpus Generation - Retrieve Method (CGRM) with Pre-training Language Model (PLM) for the KBQA task. The major novelty lies in the design of the new method, wherein our approach, the knowledge enhanced T5 (kT5) model aims to generate natural language QA pairs based on Knowledge Graph triples and directly solve the QA by retrieving the synthetic dataset. The new method can extract more information about the entities from PLM to improve accuracy and simplify the processes. We test our method on NLPCC-ICCPOL 2016 KBQA dataset, and the results show that our method improves the performance of KBQA and the out straight-forward method is competitive with the state-of-the-art., Comment: We want to withdraw this submission, and add some experiments to make it more valuable
Published: 2021

27. Suppressing interfacial polarization via entropy increase strategy for superior energy-storage performance of Na0.5Bi0.5TiO3-based ceramics

Author: Zhao, Hanyu, Cao, Wenjun, Han, Donghao, Zhu, Xiyue, Liang, Cen, Wang, Changyuan, and Wang, Chunchang
Published: 2024
Full Text: View/download PDF

28. Equimolar high-entropy for excellent energy storage performance in Bi0.5Na0.5TiO3-based ceramics

Author: Wang, Changyuan, Cao, Wenjun, Liang, Cen, Zhao, Hanyu, and Wang, Chunchang
Published: 2024
Full Text: View/download PDF

29. Evolution of microstructure and changes in hydrogen embrittlement resistance of CLAM steel after long-term aging

Author: Gu, Yueyang, Zhao, Hanyu, Li, Xiaonan, Yan, Wei, Xiong, Liangyin, Chen, Wei, and Chen, Demin
Published: 2024
Full Text: View/download PDF

30. Cold deformation behavior and microstructure evolution of biomedical Cu-containing L605 alloy

Author: Yuan, Qiu, Zhao, Hanyu, Xi, Tong, Yang, Chunguang, Hao, Wenjun, and Yang, Ke
Published: 2024
Full Text: View/download PDF

31. Insights into the role of H2O2 on corrosion behavior of NiCu low alloy steel in simulated Beishan groundwater

Author: Sun, Yupeng, Wei, Xin, Dong, Junhua, Li, Dongyun, Zhao, Hanyu, Chen, Nan, Yin, Qili, Ren, Qiying, and Ke, Wei
Published: 2024
Full Text: View/download PDF

32. Controllable release of Cu ions contributes to the enhanced environmentally-friendly performance of antifouling Cu-bearing stainless steel coating prepared using high-velocity air fuel

Author: Zhao, Jinlong, Lian, Tongyu, Sun, Ziqing, Zhao, Hanyu, Yang, Chunguang, Fan, Xiujuan, Li, Shuangjian, Mao, Jie, Deng, Chunming, and Yang, Ke
Published: 2024
Full Text: View/download PDF

33. Research Status and Progress of Biomass-Based 3D Printing Materials

Author: Zhao, Hanyu, Jia, Ying, Chen, Guangxue, He, Minghui, Tian, Junfei, Chen, Qifeng, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Xu, Min, editor, Yang, Li, editor, Zhang, Linghao, editor, and Yan, Shu, editor
Published: 2023
Full Text: View/download PDF

34. Development of chlorine dioxide sustained-release device using carboxymethyl cellulose-polyvinyl alcohol-β-cyclodextrin ternary hydrogel and a new sustained-release kinetic model

Author: Wang, Yanan, Zhao, Hanyu, Huang, Lijie, Chen, Guangxue, Wei, Zhehao, Mo, Qi, Li, Yishan, Wang, Xiyue, Huang, Chongxing, and Chen, Qifeng
Published: 2023
Full Text: View/download PDF

35. Enhanced energy storage performance of NaNbO3-based ceramics via band and domain engineering

Author: Liang, Cen, Wang, Changyuan, Zhao, Hanyu, Cao, Wenjun, Huang, Xuechen, and Wang, Chunchang
Published: 2023
Full Text: View/download PDF

36. Significantly improved energy-storage performance of NaNbO3 lead-free ceramics with Ca0.7Bi 0.2TiO3 addition

Author: Liang, Cen, Wang, Changyuan, Zhao, Hanyu, Cao, Wenjun, Li, Feng, and Wang, Chunchang
Published: 2023
Full Text: View/download PDF

37. Lead-free medium-entropy (Na0.47(1-x)Bi0.47(1-x)Ba0.06(1-x)Sr0.7xNd0.2x)TiO3 relaxor ceramics with robust energy-storage performance

Author: Zhao, Hanyu, Cao, Wenjun, Liang, Cen, Wang, Changyuan, and Wang, Chunchang
Published: 2023
Full Text: View/download PDF

38. MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks

Author: Luan, Yunteng, Zhao, Hanyu, Yang, Zhi, and Dai, Yafei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: As the development of neural networks, more and more deep neural networks are adopted in various tasks, such as image classification. However, as the huge computational overhead, these networks could not be applied on mobile devices or other low latency scenes. To address this dilemma, multi-classifier convolutional network is proposed to allow faster inference via early classifiers with the corresponding classifiers. These networks utilize sophisticated designing to increase the early classifier accuracy. However, naively training the multi-classifier network could hurt the performance (accuracy) of deep neural networks as early classifiers throughout interfere with the feature generation process. In this paper, we propose a general training framework named multi-self-distillation learning (MSD), which mining knowledge of different classifiers within the same network and increase every classifier accuracy. Our approach can be applied not only to multi-classifier networks, but also modern CNNs (e.g., ResNet Series) augmented with additional side branch classifiers. We use sampling-based branch augmentation technique to transform a single-classifier network into a multi-classifier network. This reduces the gap of capacity between different classifiers, and improves the effectiveness of applying MSD. Our experiments show that MSD improves the accuracy of various networks: enhancing the accuracy of every classifier significantly for existing multi-classifier network (MSDNet), improving vanilla single-classifier networks with internal classifiers with high accuracy, while also improving the final accuracy.
Published: 2019

39. High energy-storage performance in X9R-type Na0.5Bi0.5TiO3-based lead-free ceramics

Author: Zhao, Hanyu, Cao, Wenjun, Liang, Cen, Wang, Changyuan, and Wang, Chunchang
Published: 2023
Full Text: View/download PDF

40. Research Status and Progress of Biomass-Based 3D Printing Materials

Author: Zhao, Hanyu, primary, Jia, Ying, additional, Chen, Guangxue, additional, He, Minghui, additional, Tian, Junfei, additional, and Chen, Qifeng, additional
Published: 2023
Full Text: View/download PDF

41. Understanding the role of alloyed Ni and Cu on improving corrosion resistance of low alloy steel in the simulated Beishan groundwater

Author: Sun, Yupeng, Wei, Xin, Dong, Junhua, Chen, Nan, Zhao, Hanyu, Ren, Qiying, and Ke, Wei
Published: 2022
Full Text: View/download PDF

42. A formal analysis method for composition protocol based on model checking

Author: Xiao, Meihua, Zhao, Hanyu, Yang, Ke, Ouyang, Ri, and Song, Weiwei
Published: 2022
Full Text: View/download PDF

43. Effect of waste tire rubber crumb on impermeability of cement-stabilized soil.

Author: Wang, Fengchi, Sun, Chang, Zhao, Hanyu, and Liu, Yanzhao
Subjects: RUBBER waste, SOIL permeability, HYDRAULIC conductivity, WASTE tires, SOIL mechanics, TIRE recycling
Abstract: To alleviate the application limitations of cement-stabilized soil in impermeable engineering and promote the recycling of waste rubber tires, crumb rubber produced from waste rubber tires was used to improve the engineering properties of the soil. The effect of crumb rubber on the permeability of the soil under different conditions was investigated using compression, compaction, and permeability tests. Crumb rubber can effectively improve the impermeability of cement-stabilized soil. The impermeability efficiency of crumb rubber is between 11% and 45%. Cleaning crumb rubber with water and Na2CO3 solution can reduce the hydraulic conductivity of rubberized cement-stabilized soil (RCSS) by 7.9%–63.6%. This, in turn, increases the unconfined compressive strength by 4.1%–25.5%. The hydraulic conductivity of the RCSS decreases with an increase in the cement content, curing duration, and void ratio. A crumb rubber concentration of 10%–20% is suitable for enhancing the impermeability of the RCSS and satisfying its strength requirements. The NOF of the equation used to predict the hydraulic conductivity of the RCSS by the rubber content, cement content, and curing duration is less than 0.35. The linear correlation between the predicted and measured values of hydraulic conductivity was determined to be 0.995 from the k-quc correlation model. The results show that the hydraulic conductivity of RCSS can be estimated reliably based on the mix ratio and compressive strength. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph

Author: Zhao, Hanyu, Yuan, Sha, Leng, Jiahong, Pan, Xiang, Xue, Zhao, Ma, Quanyue, Liang, Yangxiao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Li, Sheng, editor, Sun, Maosong, editor, Liu, Yang, editor, Wu, Hua, editor, Kang, Liu, editor, Che, Wanxiang, editor, He, Shizhu, editor, and Rao, Gaoqi, editor
Published: 2021
Full Text: View/download PDF

45. Analysis of GPS Technology in Surveying and Mapping Engineering Technology

Author: Zhao, Hanyu, primary
Published: 2024
Full Text: View/download PDF

46. Six new polyphenolic metabolites isolated from the Suillus granulatus and their cytotoxicity against HepG2 cells

Author: Zhao, Hanyu, primary, Xiong, Miaomiao, additional, Yang, Xiaomin, additional, Yao, Lan, additional, Wang, Zeyan, additional, Wang, Li-an, additional, Li, Zhuang, additional, Zhang, Jinxiu, additional, and Lv, Jianhua, additional
Published: 2024
Full Text: View/download PDF

47. Applying self-attention model to learn both Empirical Risk Minimization and Invariant Risk Minimization for multimedia recommendation

Author: Zhao, Hanyu, primary, Huang, Yangqi, additional, Zhao, Kunqi, additional, and Wang, Sizhuo, additional
Published: 2024
Full Text: View/download PDF

48. WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models

Author: Yuan, Sha, Zhao, Hanyu, Du, Zhengxiao, Ding, Ming, Liu, Xiao, Cen, Yukuo, Zou, Xu, Yang, Zhilin, and Tang, Jie
Published: 2021
Full Text: View/download PDF

49. High‐Entropy Design Toward Ultrahigh Energy Storage Density Under Moderate Electric Field in Bulk Lead‐Free Ceramics.

Author: Zhao, Hanyu, Cao, Wenjun, Liang, Cen, Wang, Changyuan, Wang, Chunchang, and Cheng, Zhenxiang
Subjects: *ENERGY density, *ENERGY storage, *ELECTRIC fields, *DIELECTRIC materials, *DIELECTRICS
Abstract: Electrostatic capacitors with ultrahigh energy‐storage density are crucial for the miniaturization of pulsed power devices. A long‐standing challenge is developing dielectric materials that achieve ultrahigh recoverable energy density Wrec ≥ 10 J cm−3 under moderate electric fields (30 ≤ E ≤ 50 kV mm−1). Herein, a specific high‐entropy strategy is proposed to modulate the phase structure and interfacial polarization of medium‐entropy base materials using linear dielectrics. This strategy ensures a sufficient polar phase and a high enough electric field for complete polarization, thereby achieving ultrahigh Wrec by enhancing polarization strength. The validity of this strategy is demonstrated in the (Na0.282Bi0.282Ba0.036Sr0.28Nd0.08)TiO3‐xCa0.7Bi0.2TiO3 (NBBSNT‐xCBT) (x = 0–0.15) system. The CBT‐modulated samples exhibit a polyphase structure of R3c, P4bm, and Pm‐3m with reduced remnant polarization (Pr). Additionally, the addition of CBT effectively suppresses interfacial polarization, enhancing the maximum polarization (Pmax). These factors significantly improve the value of ∆P = Pmax − Pr. As a result, an ultrahigh Wrec of 10.5 J cm−3 with a high‐efficiency η of 80.3% is obtained in the x = 0.1 sample under a moderate electric field of 45 kV mm−1 for the first time. This work paves the way for achieving superior energy‐storage performance under moderate electric fields. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. Optimization design of the modified SST based on adaptive genetic algorithm.

Author: Zhao, Hanyu, Zhang, Yueyue, Tu, Jianjian, Zheng, Fan, and Li, Jiale
Subjects: *MAGNETIC field measurements, *MAGNETIC structure, *POLYETHER ether ketone, *SOFT magnetic materials, *MAGNETIC measurements
Abstract: The coupling of magnetic, thermal and structural force fields exists in the design, manufacture and operation of the magnetic core of transformer and motor, which makes the experimental data obtained from the standard magnetic properties measurement inconsistent with the actual problem. In order to obtain the magnetic properties measurement data of soft magnetic materials such as electrical steel sheet under multiple physical factors, the magnetic circuit structure was improved on the basis of the standard single sheet tester. At the same time, anti-bending clips made of polyether ether ketone were installed in order to prevent the bending deformation of the sample sheet during the application of tensile stress which would lead to uneven force. Then0 this basic structure of the tester under the coupling of temperature and stress is designed. In order to further improve the excitation performance of the tester and the magnetization uniformity of the sample, combined with COMSOL and MATLAB co-simulation, adaptive genetic algorithm was used to optimize the size of the yoke and excitation coil, thereby ensuring the uniformity of the magnetic field distribution in the measurement area of the sample. Finally, by combining simulation and specific experiments, the loss measurement results of a single sheet tester under different ambient temperatures and different stresses are studied, which proves that the tester is feasible and accurate, and the variation law of complex magnetic characteristics of electrical steel sheets under multiple physical fields is summarized. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

300 results on '"Zhao, Hanyu"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources