Author: "Dai, Wenliang" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Dai, Wenliang"' showing total 116 results

1. NVLM: Open Frontier-Class Multimodal LLMs

Author: Dai, Wenliang, Lee, Nayeon, Wang, Boxin, Yang, Zhuolin, Liu, Zihan, Barker, Jon, Rintamaki, Tuomas, Shoeybi, Mohammad, Catanzaro, Bryan, and Ping, Wei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training. In terms of model design, we perform a comprehensive comparison between decoder-only multimodal LLMs (e.g., LLaVA) and cross-attention-based models (e.g., Flamingo). Based on the strengths and weaknesses of both approaches, we propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities. Furthermore, we introduce a 1-D tile-tagging design for tile-based dynamic high-resolution images, which significantly boosts performance on multimodal reasoning and OCR-related tasks. Regarding training data, we meticulously curate and provide detailed information on our multimodal pretraining and supervised fine-tuning datasets. Our findings indicate that dataset quality and task diversity are more important than scale, even during the pretraining phase, across all architectures. Notably, we develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks while maintaining and even improving text-only performance compared to their LLM backbones. To achieve this, we craft and integrate a high-quality text-only dataset into multimodal training, alongside a substantial amount of multimodal math and reasoning data, leading to enhanced math and coding capabilities across modalities. To advance research in the field, we release the model weights at https://huggingface.co/nvidia/NVLM-D-72B and will open-source the training code for the community soon., Comment: Fixed the typos. For more information, please visit our project page at: https://research.nvidia.com/labs/adlr/NVLM-1
Published: 2024

2. Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

Author: Lovenia, Holy, Dai, Wenliang, Cahyawijaya, Samuel, Ji, Ziwei, and Fung, Pascale
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects. However, the absence of a general measurement for evaluating object hallucination in VL models has hindered our understanding and ability to mitigate this issue. In this work, we present NOPE (Negative Object Presence Evaluation), a novel benchmark designed to assess object hallucination in VL models through visual question answering (VQA). We propose a cost-effective and scalable approach utilizing large language models to generate 29.5k synthetic negative pronoun (NegP) data of high quality for NOPE. We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions, where the ground truth answers are denoted as NegP (e.g., "none"). Additionally, we evaluate their standard performance on visual questions on 9 other VQA datasets. Through our experiments, we demonstrate that no VL model is immune to the vulnerability of object hallucination, as all models achieve accuracy below 10% on NegP. Furthermore, we uncover that lexically diverse visual questions, question types with large scopes, and scene-relevant objects capitalize the risk of object hallucination in VL models., Comment: Published in ALVR Workshop at ACL 2024
Published: 2023

3. Survey of Social Bias in Vision-Language Models

Author: Lee, Nayeon, Bang, Yejin, Lovenia, Holy, Cahyawijaya, Samuel, Dai, Wenliang, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In recent years, the rapid advancement of machine learning (ML) models, particularly transformer-based pre-trained models, has revolutionized Natural Language Processing (NLP) and Computer Vision (CV) fields. However, researchers have discovered that these models can inadvertently capture and reinforce social biases present in their training datasets, leading to potential social harms, such as uneven resource allocation and unfair representation of specific social groups. Addressing these biases and ensuring fairness in artificial intelligence (AI) systems has become a critical concern in the ML community. The recent introduction of pre-trained vision-and-language (VL) models in the emerging multimodal field demands attention to the potential social biases present in these models as well. Although VL models are susceptible to social bias, there is a limited understanding compared to the extensive discussions on bias in NLP and CV. This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL. By examining these perspectives, the survey aims to offer valuable guidelines on how to approach and mitigate social bias in both unimodal and multimodal settings. The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and non-biased AI models in various applications and research endeavors.
Published: 2023

4. Visual Instruction Tuning with Polite Flamingo

Author: Chen, Delong, Liu, Jianfeng, Dai, Wenliang, and Wang, Baoyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large Language Models (LLMs) using an assortment of annotated downstream vision-language datasets significantly enhances their performance. Yet, during this process, a side effect, which we termed as the "multi-modal alignment tax", surfaces. This side effect negatively impacts the model's ability to format responses appropriately -- for instance, its "politeness" -- due to the overly succinct and unformatted nature of raw annotations, resulting in reduced human preference. In this paper, we introduce Polite Flamingo, a multi-modal response rewriter that transforms raw annotations into a more appealing, "polite" format. Polite Flamingo is trained to reconstruct high-quality responses from their automatically distorted counterparts and is subsequently applied to a vast array of vision-language datasets for response rewriting. After rigorous filtering, we generate the PF-1M dataset and further validate its value by fine-tuning a multi-modal LLM with it. Combined with novel methodologies including U-shaped multi-stage tuning and multi-turn augmentation, the resulting model, Clever Flamingo, demonstrates its advantages in both multi-modal understanding and response politeness according to automated and human evaluations., Comment: In AAAI-24
Published: 2023

5. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

Author: Dai, Wenliang, Li, Junnan, Li, Dongxu, Tiong, Anthony Meng Huat, Zhao, Junqi, Wang, Weisheng, Li, Boyang, Fung, Pascale, and Hoi, Steven
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide variety of tasks and capabilities, and transform them into instruction tuning format. Additionally, we introduce an instruction-aware Query Transformer, which extracts informative features tailored to the given instruction. Trained on 13 held-in datasets, InstructBLIP attains state-of-the-art zero-shot performance across all 13 held-out datasets, substantially outperforming BLIP-2 and larger Flamingo models. Our models also lead to state-of-the-art performance when finetuned on individual downstream tasks (e.g., 90.7% accuracy on ScienceQA questions with image contexts). Furthermore, we qualitatively demonstrate the advantages of InstructBLIP over concurrent multimodal models. All InstructBLIP models are open-sourced at https://github.com/salesforce/LAVIS/tree/main/projects/instructblip., Comment: preprint
Published: 2023

6. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Author: Bang, Yejin, Cahyawijaya, Samuel, Lee, Nayeon, Dai, Wenliang, Su, Dan, Wilie, Bryan, Lovenia, Holy, Ji, Ziwei, Yu, Tiezheng, Chung, Willy, Do, Quyet V., Xu, Yan, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks. We find that it is better at understanding non-Latin script languages than generating them. It is able to generate multimodal content from textual prompts, via an intermediate code generation step. Moreover, we find that ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning, hence making it an unreliable reasoner. It is, for example, better at deductive than inductive reasoning. ChatGPT suffers from hallucination problems like other LLMs and it generates more extrinsic hallucinations from its parametric memory as it does not have access to an external knowledge base. Finally, the interactive feature of ChatGPT enables human collaboration with the underlying LLM to improve its performance, i.e., 8% ROUGE-1 on summarization and 2% ChrF++ on machine translation, in a multi-turn "prompt engineering" fashion. We also release a codebase for evaluation set extraction., Comment: 45 pages, AACL 2022
Published: 2023

7. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, and Purwarianti, Ayu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
Published: 2022

8. Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Author: Dai, Wenliang, Liu, Zihan, Ji, Ziwei, Su, Dan, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently, and models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP influence hallucination, including region-based, grid-based, and patch-based. Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination. Third, we decouple various VLP objectives and demonstrate that token-level image-text alignment and controlled generation are crucial to reducing hallucination. Based on that, we propose a simple yet effective VLP loss named ObjMLM to further mitigate object hallucination. Results show that it reduces object hallucination by up to 17.4% when tested on two benchmarks (COCO Caption for in-domain and NoCaps for out-of-domain evaluation)., Comment: Accepted at EACL 2023
Published: 2022

9. Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Author: Dai, Wenliang, Cahyawijaya, Samuel, Yu, Tiezheng, Barezi, Elham J, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, in this research field, most datasets are in major languages, such as English and Chinese. There is a huge data scarcity issue for low-resource languages, hindering the development of research and applications for broader communities. Therefore, it is crucial to have more benchmarks to raise awareness and motivate the research in low-resource languages. To mitigate this problem, we collect a new dataset, namely Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car speech recognition in the Cantonese language with video and audio data. Together with it, we propose Cantonese Audio-Visual Speech Recognition for In-car Commands as a new challenge for the community to tackle low-resource speech recognition under in-car scenarios.
Published: 2022

10. Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Author: Dai, Wenliang, Hou, Lu, Shang, Lifeng, Jiang, Xin, Liu, Qun, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: The recent large-scale vision-language pre-training (VLP) of dual-stream architectures (e.g., CLIP) with a tremendous amount of image-text pair data has shown its superiority on various multimodal alignment tasks. Despite its success, the resulting models are not capable of multimodal generative tasks due to the weak text encoder. To tackle this problem, we propose to augment the dual-stream VLP model with a textual pre-trained language model (PLM) via vision-language knowledge distillation (VLKD), enabling the capability for multimodal generation. VLKD is pretty data- and computation-efficient compared to the pre-training from scratch. Experimental results show that the resulting model has strong zero-shot performance on multimodal generation tasks, such as open-ended visual question answering and image captioning. For example, it achieves 44.5% zero-shot accuracy on the VQAv2 dataset, surpassing the previous state-of-the-art zero-shot model with 7× fewer parameters. Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks., Comment: Accepted to ACL 2022
Published: 2022

11. Survey of Hallucination in Natural Language Generation

Author: Ji, Ziwei, Lee, Nayeon, Frieske, Rita, Yu, Tiezheng, Su, Dan, Xu, Yan, Ishii, Etsuko, Bang, Yejin, Chen, Delong, Dai, Wenliang, Chan, Ho Shu, Madotto, Andrea, and Fung, Pascale
Subjects: Computer Science - Computation and Language, A.1
Abstract: Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into three parts: (1) a general overview of metrics, mitigation methods, and future directions; (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation; and (3) hallucinations in large language models (LLMs). This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
Published: 2022

12. CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Author: Dai, Wenliang, Cahyawijaya, Samuel, Yu, Tiezheng, Barezi, Elham J., Xu, Peng, Yiu, Cheuk Tung Shadow, Frieske, Rita, Lovenia, Holy, Winata, Genta Indra, Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram E., and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low-resource languages, hindering the development of research and applications. In this paper, we introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car command recognition in the Cantonese language with both video and audio data. It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers. Furthermore, we augment our dataset using common in-car background noises to simulate real environments, producing a dataset 10 times larger than the collected one. We provide detailed statistics of both the clean and the augmented versions of our dataset. Moreover, we implement two multimodal baselines to demonstrate the validity of CI-AVSR. Experiment results show that leveraging the visual signal improves the overall performance of the model. Although our best model can achieve considerable quality on the clean test set, the speech recognition quality on the noisy data is still inferior, and this remains an extremely challenging task for real in-car speech recognition systems. The dataset and code will be released at https://github.com/HLTCHKUST/CI-AVSR., Comment: 6 pages
Published: 2022

13. Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Author: Yu, Tiezheng, Frieske, Rita, Xu, Peng, Cahyawijaya, Samuel, Yiu, Cheuk Tung Shadow, Lovenia, Holy, Dai, Wenliang, Barezi, Elham J., Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram E., and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong. It comprises philosophy, politics, education, culture, lifestyle and family domains, covering a wide range of topics. We also review all existing Cantonese datasets and analyze them according to their speech type, data source, total size and availability. We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset. In addition, we create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.
Published: 2022

14. ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Author: Lovenia, Holy, Cahyawijaya, Samuel, Winata, Genta Indra, Xu, Peng, Yan, Xu, Liu, Zihan, Frieske, Rita, Yu, Tiezheng, Dai, Wenliang, Barezi, Elham J., Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram E., and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong. We report ASCEND's design and procedure for collecting the speech data, including annotations. ASCEND consists of 10.62 hours of clean speech, collected from 23 bilingual speakers of Chinese and English. Furthermore, we conduct baseline experiments using pre-trained wav2vec 2.0 models, achieving a best performance of 22.69% character error rate and 27.05% mixed error rate.
Published: 2021

15. Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

Author: Cahyawijaya, Samuel, Winata, Genta Indra, Lovenia, Holy, Wilie, Bryan, Dai, Wenliang, Ishii, Etsuko, and Fung, Pascale
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: While the recent advances in deep neural networks (DNN) bring remarkable success, the computational cost also increases considerably. In this paper, we introduce Greenformer, a toolkit to accelerate the computation of neural networks through matrix factorization while maintaining performance. Greenformer can be easily applied with a single line of code to any DNN model. Our experimental results show that Greenformer is effective for a wide range of scenarios. We provide the showcase of Greenformer at https://samuelcahyawijaya.github.io/greenformer-demo/.
Published: 2021

16. Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization

Author: Yu, Tiezheng, Dai, Wenliang, Liu, Zihan, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Multimodal abstractive summarization (MAS) models that summarize videos (vision modality) and their corresponding transcripts (text modality) are able to extract the essential information from massive multimodal data on the Internet. Recently, large-scale generative pre-trained language models (GPLMs) have been shown to be effective in text generation tasks. However, existing MAS models cannot leverage GPLMs' powerful generation ability. To fill this research gap, we aim to study two research questions: 1) how to inject visual information into GPLMs without hurting their generation ability; and 2) where is the optimal place in GPLMs to inject the visual information? In this paper, we present a simple yet effective method to construct vision guided (VG) GPLMs for the MAS task using attention-based add-on layers to incorporate visual information while maintaining their original text generation ability. Results show that our best model significantly surpasses the prior state-of-the-art model by 5.7 ROUGE-1, 5.3 ROUGE-2, and 5.1 ROUGE-L scores on the How2 dataset, and our visual guidance method contributes 83.6% of the overall improvement. Furthermore, we conduct thorough ablation studies to analyze the effectiveness of various modality fusion methods and fusion locations., Comment: Long Paper Accepted in EMNLP 2021
Published: 2021

17. Weakly-supervised Multi-task Learning for Multimodal Affect Recognition

Author: Dai, Wenliang, Cahyawijaya, Samuel, Bang, Yejin, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Multimodal affect recognition constitutes an important aspect for enhancing interpersonal relationships in human-computer interaction. However, relevant data is hard to come by and notably costly to annotate, which poses a challenging barrier to build robust multimodal affect recognition systems. Models trained on these relatively small datasets tend to overfit and the improvement gained by using complex state-of-the-art models is marginal compared to simple baselines. Meanwhile, there are many different multimodal affect recognition datasets, though each may be small. In this paper, we propose to leverage these datasets using weakly-supervised multi-task learning to improve the generalization performance on each of them. Specifically, we explore three multimodal affect recognition tasks: 1) emotion recognition; 2) sentiment analysis; and 3) sarcasm recognition. Our experimental results show that multi-tasking can benefit all these tasks, achieving an improvement up to 2.9% accuracy and 3.3% F1-score. Furthermore, our method also helps to improve the stability of model performance. In addition, our analysis suggests that weak supervision can provide a comparable contribution to strong supervision if the tasks are highly correlated., Comment: 13 pages, 2 figures
Published: 2021

18. Multimodal End-to-End Sparse Model for Emotion Recognition

Author: Dai, Wenliang, Cahyawijaya, Samuel, Liu, Zihan, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Existing works on multimodal affective computing tasks, such as emotion recognition, generally adopt a two-phase pipeline, first extracting feature representations for each single modality with hand-crafted algorithms and then performing end-to-end learning with the extracted features. However, the extracted features are fixed and cannot be further fine-tuned on different target tasks, and manually finding feature extraction algorithms does not generalize or scale well to different tasks, which can lead to sub-optimal performance. In this paper, we develop a fully end-to-end model that connects the two phases and optimizes them jointly. In addition, we restructure the current datasets to enable the fully end-to-end training. Furthermore, to reduce the computational overhead brought by the end-to-end model, we introduce a sparse cross-modal attention mechanism for the feature extraction. Experimental results show that our fully end-to-end model significantly surpasses the current state-of-the-art models based on the two-phase pipeline. Moreover, by adding the sparse cross-modal attention, our model can maintain performance with around half the computation in the feature extraction part., Comment: 12 pages, 6 figures
Published: 2021

19. CrossNER: Evaluating Cross-Domain Named Entity Recognition

Author: Liu, Zihan, Xu, Yan, Yu, Tiezheng, Dai, Wenliang, Ji, Ziwei, Cahyawijaya, Samuel, Madotto, Andrea, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains. However, most of the existing NER benchmarks lack domain-specialized entity types or do not focus on a certain domain, leading to a less effective cross-domain evaluation. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully-labeled collection of NER data spanning over five diverse domains with specialized entity categories for different domains. Additionally, we also provide a domain-related corpus since using it to continue pre-training language models (domain-adaptive pre-training) is effective for the domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and pre-training strategies to do domain-adaptive pre-training for the cross-domain task. Results show that focusing on the fractional corpus containing domain-specialized entities and utilizing a more challenging pre-training strategy in domain-adaptive pre-training are beneficial for the NER domain adaptation, and our proposed method can consistently outperform existing cross-domain NER baselines. Nevertheless, experiments also illustrate the challenge of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in the NER domain adaptation area. The code and data are available at https://github.com/zliucr/CrossNER., Comment: Accepted in AAAI-2021
Published: 2020

20. Dimsum @LaySumm 20: BART-based Approach for Scientific Document Summarization

Author: Yu, Tiezheng, Su, Dan, Dai, Wenliang, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Lay summarization aims to generate lay summaries of scientific papers automatically. It is an essential task that can increase the relevance of science for all of society. In this paper, we build a lay summary generation system based on the BART model. We leverage sentence labels as extra supervision signals to improve the performance of lay summarization. In the CL-LaySumm 2020 shared task, our model achieves 46.00% Rouge1-F1 score., Comment: 4 pages
Published: 2020

21. Multi-hop Question Generation with Graph Convolutional Network

Author: Su, Dan, Xu, Yan, Dai, Wenliang, Ji, Ziwei, Yu, Tiezheng, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered pieces of evidence from different paragraphs. It is a more challenging yet under-explored task compared to conventional single-hop QG, where the questions are generated from the sentence containing the answer or nearby sentences in the same paragraph without complex reasoning. To address the additional challenges in multi-hop QG, we propose Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which does context encoding in multiple hops with Graph Convolutional Network and encoding fusion via an Encoder Reasoning Gate. To the best of our knowledge, we are the first to tackle the challenge of multi-hop reasoning over paragraphs without any sentence-level information. Empirical results on the HotpotQA dataset demonstrate the effectiveness of our method, in comparison with baselines on automatic evaluation metrics. Moreover, from the human evaluation, our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation. The code is publicly available at https://github.com/HLTCHKUST/MulQG., Comment: Findings of EMNLP 2020
Published: 2020
Full Text: View/download PDF

22. Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition

Author: Dai, Wenliang, Liu, Zihan, Yu, Tiezheng, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationships between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially for unseen emotions. In this paper, we propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues. We use pre-trained word embeddings to represent emotion categories for textual data. Then, two mapping functions are learned to transfer these embeddings into visual and acoustic spaces. For each modality, the model calculates the representation distance between the input sequence and target emotions and makes predictions based on the distances. By doing so, our model can directly adapt to the unseen emotions in any modality since we have their pre-trained embeddings and modality mapping functions. Experiments show that our model achieves state-of-the-art performance on most of the emotion categories. In addition, our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions., Comment: 12 pages, 5 figures
Published: 2020

23. Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection

Author: Dai, Wenliang, Yu, Tiezheng, Liu, Zihan, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Nowadays, offensive content in social media has become a serious problem, and automatically detecting offensive language is an essential task. In this paper, we build an offensive language detection system, which combines multi-task learning with BERT-based models. Using a pre-trained language model such as BERT, we can effectively learn the representations for noisy text in social media. Besides, to boost the performance of offensive language detection, we leverage the supervision signals from other related tasks. In the OffensEval-2020 competition, our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place (92.23% F1). An empirical analysis is provided to explain the effectiveness of our approaches., Comment: Submitted to SemEval-2020 Workshop
Published: 2020

24. Visual Instruction Tuning with Polite Flamingo

Author: Chen, Delong, Liu, Jianfeng, Dai, Wenliang, and Wang, Baoyuan
Abstract: Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large Language Models (LLMs) using an assortment of annotated downstream vision-language datasets significantly enhances their performance. Yet, during this process, a side effect, which we termed as the "multi-modal alignment tax", surfaces. This side effect negatively impacts the model's ability to format responses appropriately - for instance, its "politeness" - due to the overly succinct and unformatted nature of raw annotations, resulting in reduced human preference. In this paper, we introduce Polite Flamingo, a multi-modal response rewriter that transforms raw annotations into a more appealing, "polite" format. Polite Flamingo is trained to reconstruct high-quality responses from their automatically distorted counterparts and is subsequently applied to a vast array of vision-language datasets for response rewriting. After rigorous filtering, we generate the PF-1M dataset and further validate its value by fine-tuning a multi-modal LLM with it. Combined with novel methodologies including U-shaped multi-stage tuning and multi-turn augmentation, the resulting model, Clever Flamingo, demonstrates its advantages in both multi-modal understanding and response politeness according to automated and human evaluations. Code and dataset are available at https://github.com/ChenDelong1999/polite-flamingo
Published: 2024

25. Complete pathologic response to neoadjuvant icotinib in stage IIIA EGFR-mutant lung adenosquamous carcinoma: A case report

Author: Cai, Zhongfu, primary, Huang, Jishui, additional, Dai, Wenliang, additional, Li, Xiaobin, additional, Hong, Wencong, additional, and Hong, Youzhi, additional
Published: 2024
Full Text: View/download PDF

26. LB-ADI: An Efficient Method for Transient Thermal Simulation of Integrated Chiplets and Packages

Author: Li, Jie, primary, Tang, Min, additional, Wu, Lin-Sheng, additional, Jiang, Liguo, additional, Dai, Wenliang, additional, and Mao, Junfa, additional
Published: 2024
Full Text: View/download PDF

27. Catastrophe risk assessment based on ARIMA and decision tree model

Author: Obaidat, Mohammad S., Mahalle, Parikshit N., Sun, Yuan, Yan, Tingting, Xu, Enlin, Dai, Wenliang, and He, Yuhang
Published: 2024
Full Text: View/download PDF

28. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, Lovenia, Holy, Wilie, Bryan, Su, Dan, Yu, Tiezheng, Dai, Wenliang, Xu, Yan, Ji, Ziwei, and Fung, Pascale Ngan
Abstract: We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are underrepresented despite being widely spoken. © 2023 Association for Computational Linguistics.
Published: 2023

29. mCLIP: Multilingual CLIP via Cross-lingual Transfer

Author: Chen, Guanhua, Hou, Lu, Chen, Yun, Dai, Wenliang, Shang, Lifeng, Jiang, Xin, Liu, Qun, Pan, Jia, and Wang, Wenping
Abstract: Large-scale vision-language pretrained (VLP) models like CLIP have shown remarkable performance on various downstream cross-modal tasks. However, they are usually biased towards English due to the lack of sufficient non-English image-text pairs. Existing multilingual VLP methods often learn retrieval-inefficient single-stream models by translation-augmented non-English image-text pairs. In this paper, we introduce mCLIP, a retrieval-efficient dual-stream multilingual VLP model, trained by aligning the CLIP model and a Multilingual Text Encoder (MTE) through a novel Triangle Cross-modal Knowledge Distillation (TriKD) method. It is parameter-efficient as only two light projectors on the top of them are updated during distillation. Furthermore, to enhance the token- and sentence-level multilingual representation of the MTE, we propose to train it with machine translation and contrastive learning jointly before the TriKD to provide a better initialization. Empirical results show that mCLIP achieves new state-of-the-art performance for both zero-shot and finetuned multilingual image-text retrieval task. © 2023 Association for Computational Linguistics.
Published: 2023

30. Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Author: Dai, Wenliang, Liu, Zihan, Ji, Ziwei, Su, Dan, and Fung, Pascale Ngan
Abstract: Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently and models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP influence hallucination, including region-based, grid-based, and patch-based. Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination. Third, we decouple various VLP objectives and demonstrate that token-level image-text alignment and controlled generation are crucial to reducing hallucination. Based on that, we propose a simple yet effective VLP loss named ObjMLM to further mitigate object hallucination. Results show that it reduces object hallucination by up to 17.4% when tested on two benchmarks (COCO Caption for in-domain and NoCaps for out-of-domain evaluation).
Published: 2023

31. mCLIP: Multilingual CLIP via Cross-lingual Transfer

Author: Chen, Guanhua, primary, Hou, Lu, additional, Chen, Yun, additional, Dai, Wenliang, additional, Shang, Lifeng, additional, Jiang, Xin, additional, Liu, Qun, additional, Pan, Jia, additional, and Wang, Wenping, additional
Published: 2023
Full Text: View/download PDF

32. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, primary, Lovenia, Holy, additional, Aji, Alham Fikri, additional, Winata, Genta, additional, Wilie, Bryan, additional, Koto, Fajri, additional, Mahendra, Rahmad, additional, Wibisono, Christian, additional, Romadhony, Ade, additional, Vincentio, Karissa, additional, Santoso, Jennifer, additional, Moeljadi, David, additional, Wirawan, Cahya, additional, Hudi, Frederikus, additional, Wicaksono, Muhammad Satrio, additional, Parmonangan, Ivan, additional, Alfina, Ika, additional, Putra, Ilham Firdausi, additional, Rahmadani, Samsul, additional, Oenang, Yulianti, additional, Septiandri, Ali, additional, Jaya, James, additional, Dhole, Kaustubh, additional, Suryani, Arie, additional, Putri, Rifki Afina, additional, Su, Dan, additional, Stevens, Keith, additional, Nityasya, Made Nindyatama, additional, Adilazuarda, Muhammad, additional, Hadiwijaya, Ryan, additional, Diandaru, Ryandito, additional, Yu, Tiezheng, additional, Ghifari, Vito, additional, Dai, Wenliang, additional, Xu, Yan, additional, Damapuspita, Dyah, additional, Wibowo, Haryo, additional, Tho, Cuk, additional, Karo Karo, Ichwanul, additional, Fatyanosa, Tirana, additional, Ji, Ziwei, additional, Neubig, Graham, additional, Baldwin, Timothy, additional, Ruder, Sebastian, additional, Fung, Pascale, additional, Sujaini, Herry, additional, Sakti, Sakriani, additional, and Purwarianti, Ayu, additional
Published: 2023
Full Text: View/download PDF

33. ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Author: Lovenia, Holy, Cahyawijaya, Samuel, Winata, Genta Indra, Xu, Peng, Yan, Xu, Liu, Zihan, Frieske, Rita Maria, Yu, Tiezheng, Dai, Wenliang, Barezi, Elham J, Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram Emil, and Fung, Pascale Ngan
Abstract: Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong. We report ASCEND's design and procedure for collecting the speech data, including annotations. ASCEND consists of 10.62 hours of clean speech, collected from 23 bilingual speakers of Chinese and English. Furthermore, we conduct baseline experiments using pre-trained wav2vec 2.0 models, achieving a best performance of 22.69% character error rate and 27.05% mixed error rate. © European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
Published: 2022

34. CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Author: Dai, Wenliang, Cahyawijaya, Samuel, Yu, Tiezheng, Jebalbarezi Sarbijan, Elham, Xu, Peng, Yiu, Cheuk Tung, Frieske, Rita Maria, Lovenia, Holy, Winata, Genta Indra, Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram Emil, and Fung, Pascale
Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low resource languages, hindering the development of research and applications. In this paper, we introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car command recognition in the Cantonese language with both video and audio data. It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers. Furthermore, we augment our dataset using common in-car background noises to simulate real environments, producing a dataset 10 times larger than the collected one. We provide detailed statistics of both the clean and the augmented versions of our dataset. Moreover, we implement two multimodal baselines to demonstrate the validity of CI-AVSR. Experiment results show that leveraging the visual signal improves the overall performance of the model. Although our best model can achieve a considerable quality on the clean test set, the speech recognition quality on the noisy data is still inferior and remains an extremely challenging task for real in-car speech recognition systems. The dataset and code will be released at https://github.com/HLTCHKUST/CI-AVSR.
Published: 2022

35. Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Author: Dai, Wenliang, primary, Hou, Lu, additional, Shang, Lifeng, additional, Jiang, Xin, additional, Liu, Qun, additional, and Fung, Pascale, additional
Published: 2022
Full Text: View/download PDF

36. CrossNER: Evaluating Cross-Domain Named Entity Recognition

Author: Liu, Zihan, Xu, Yan, Yu, Tiezheng, Dai, Wenliang, Ji, Ziwei, Cahyawijaya, Samuel, Madotto, Andrea, and Fung, Pascale Ngan
Abstract: Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains. However, most of the existing NER benchmarks lack domain-specialized entity types or do not focus on a certain domain, leading to a less effective cross-domain evaluation. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully-labeled collection of NER data spanning over five diverse domains with specialized entity categories for different domains. Additionally, we also provide a domain-related corpus since using it to continue pre-training language models (domain-adaptive pre-training) is effective for the domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and pre-training strategies to do domain-adaptive pre-training for the cross-domain task. Results show that focusing on the fractional corpus containing domain-specialized entities and utilizing a more challenging pre-training strategy in domain-adaptive pre-training are beneficial for the NER domain adaptation, and our proposed method can consistently outperform existing cross-domain NER baselines. Nevertheless, experiments also illustrate the challenge of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in the NER domain adaptation area. The code and data are available at https://github.com/zliucr/CrossNER.
Published: 2021

37. CrossNER: Evaluating Cross-Domain Named Entity Recognition

Author: Liu, Zihan, primary, Xu, Yan, additional, Yu, Tiezheng, additional, Dai, Wenliang, additional, Ji, Ziwei, additional, Cahyawijaya, Samuel, additional, Madotto, Andrea, additional, and Fung, Pascale, additional
Published: 2021
Full Text: View/download PDF

38. Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization

Author: Yu, Tiezheng, primary, Dai, Wenliang, additional, Liu, Zihan, additional, and Fung, Pascale, additional
Published: 2021
Full Text: View/download PDF

39. Multimodal End-to-End Sparse Model for Emotion Recognition

Author: Dai, Wenliang, primary, Cahyawijaya, Samuel, additional, Liu, Zihan, additional, and Fung, Pascale, additional
Published: 2021
Full Text: View/download PDF

40. Parameter extraction for on-chip interconnects by double-image Green's function method combined with hierarchical algorithm

Author: Dai, Wenliang, Li, Zhengfan, and Mao, Junfa
Subjects: Algorithms, Algorithm, Business, Computers, Electronics, Electronics and electrical industries
Abstract: A novel double-image Green's function approach combined with the hierarchical algorithm is proposed to compute the frequency-dependent capacitance and conductance for the on-chip transmission lines and interconnects embedded in multiple SiO₂ layers of the general CMOS process. The effects of a protective layer and lossy silicon substrate layer of the CMOS process are considered in the four-layer structure deduced by the equivalent dielectric-constant approach whose adaptability is further proven in this paper. This double-image Green's function approach is fast convergent with the increasing order of reflections and transmissions, which is further accelerated by the hierarchical algorithm for computation of the Green's function rapidly. Moreover, the proposed method avoids the computation of bound charges on the dielectric interfaces. The frequency-dependent capacitance and conductance gained from the proposed method are shown to be in good agreement with the data obtained by other relevant methods. Index Terms--CMOS process, double-image Green's function method, equivalent dielectric-constant approach, frequency-dependent parameter extraction, hierarchical algorithm.
Published: 2005

41. Dimsum @LaySumm 20

Author: Yu, Tiezheng, primary, Su, Dan, additional, Dai, Wenliang, additional, and Fung, Pascale, additional
Published: 2020
Full Text: View/download PDF

42. Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection

Author: Dai, Wenliang, primary, Yu, Tiezheng, additional, Liu, Zihan, additional, and Fung, Pascale, additional
Published: 2020
Full Text: View/download PDF

43. Multi-hop Question Generation with Graph Convolutional Network

Author: Su, Dan, primary, Xu, Yan, additional, Dai, Wenliang, additional, Ji, Ziwei, additional, Yu, Tiezheng, additional, and Fung, Pascale, additional
Published: 2020
Full Text: View/download PDF

44. Comparative Study of Convolution and Order Reduction Techniques for Blackbox Macromodeling Using Scattering Parameters

Author: Schutt-Aine, José E., primary, Goh, Patrick, additional, Mekonnen, Yidnekachew, additional, Tan, Jilin, additional, Al-Hawari, Feras, additional, Liu, Ping, additional, and Dai, Wenliang, additional
Published: 2011
Full Text: View/download PDF

45. Partitioned Latency Insertion Method With a Generalized Stability Criteria

Author: Goh, Patrick, primary, Schutt-Aine, José E., additional, Klokotov, Dmitri, additional, Tan, Jilin, additional, Liu, Ping, additional, Dai, Wenliang, additional, and Al-Hawari, Feras, additional
Published: 2011
Full Text: View/download PDF

46. Multi-objective optimal allocation of distributed generation in smart grid

Author: Cui, Hong, primary and Dai, Wenliang, additional
Published: 2011
Full Text: View/download PDF

47. Partitioned latency insertion method (PLIM) with stability considerations

Author: Goh, Patrick, primary, Schutt-Aine, Jose E., additional, Klokotov, Dmitri, additional, Tan, Jilin, additional, Liu, Ping, additional, Dai, Wenliang, additional, and Al-Hawari, Feras, additional
Published: 2011
Full Text: View/download PDF

48. Teaching language models to see : building robust and versatile vision-language models

Author: Dai, Wenliang, primary
Full Text: View/download PDF

49. Application of the latency insertion method to circuits with blackbox macromodel representation

Author: Schutt-Aine, Jose, primary, Klokotov, Dmitri, additional, Goh, Patrick, additional, Tan, Jilin, additional, Al-Hawari, Feras, additional, Liu, Ping, additional, and Dai, Wenliang, additional
Published: 2009
Full Text: View/download PDF

50. Power supply analysis in package and SiP design

Author: Dai, Wenliang, primary
Published: 2009
Full Text: View/download PDF
