Author: "Tran, Son" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Tran, Son"' showing total 1,356 results

1. DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

Author: Ram, Shwetha, Neiman, Tal, Feng, Qianli, Stuart, Andrew, Tran, Son, and Chilimbi, Trishul
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Given a small number of images of a subject, personalized image generation techniques can fine-tune large pre-trained text-to-image diffusion models to generate images of the subject in novel contexts, conditioned on text prompts. In doing so, a trade-off is made between prompt fidelity, subject fidelity and diversity. As the pre-trained model is fine-tuned, earlier checkpoints synthesize images with low subject fidelity but high prompt fidelity and diversity. In contrast, later checkpoints generate images with low prompt fidelity and diversity but high subject fidelity. This inherent trade-off limits the prompt fidelity, subject fidelity and diversity of generated images. In this work, we propose DreamBlend to combine the prompt fidelity from earlier checkpoints and the subject fidelity from later checkpoints during inference. We perform a cross attention guided image synthesis from a later checkpoint, guided by an image generated by an earlier checkpoint, for the same prompt. This enables generation of images with better subject fidelity, prompt fidelity and diversity on challenging prompts, outperforming state-of-the-art fine-tuning methods.
Comment: Accepted to WACV 2025
Published: 2024

2. Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology

Author: Tran, Son Quoc and Kretchmar, Matt
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper proposes a novel training method to improve the robustness of Extractive Question Answering (EQA) models. Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness against distribution shifts and adversarial attacks. Despite this, the inclusion of unanswerable questions in EQA training datasets is essential for ensuring real-world reliability. Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets. Models trained with our method maintain in-domain performance while achieving a notable improvement on out-of-domain datasets. This results in an overall F1 score improvement of 5.7 across all testing sets. Furthermore, our models exhibit significantly enhanced robustness against two types of adversarial attacks, with a performance decrease of only about a third compared to the default models.
Comment: EMNLP 2024 Findings
Published: 2024

3. Diffusion Models For Multi-Modal Generative Modeling

Author: Chen, Changyou, Ding, Han, Sisman, Bunyamin, Xu, Yi, Xie, Ouye, Yao, Benjamin Z., Tran, Son Dinh, and Zeng, Belinda
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to single-generation modeling. Can we generalize diffusion models with the ability of multi-modal generative training for more generalizable modeling? In this paper, we propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space. We define the forward diffusion process to be driven by an information aggregation from multiple types of task-data, e.g., images for a generation task and labels for a classification task. In the reverse process, we enforce information sharing by parameterizing a shared backbone denoising network with additional modality-specific decoder heads. Such a structure can simultaneously learn to generate different types of multi-modal data with a multi-task loss, which is derived from a new multi-modal variational lower bound that generalizes the standard diffusion model. We propose several multimodal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling. Extensive experimental results on ImageNet indicate the effectiveness of our framework for various multi-modal generative modeling, which we believe is an important research direction worthy of further exploration.
Comment: Published as a conference paper at ICLR 2024
Published: 2024

4. X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Author: Swetha, Sirnam, Yang, Jinyu, Neiman, Tal, Rizve, Mamshad Nayeem, Tran, Son, Yao, Benjamin, Chilimbi, Trishul, and Shah, Mubarak
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized the field of vision-language understanding by integrating visual perception capabilities into Large Language Models (LLMs). The prevailing trend in this field involves the utilization of a vision encoder derived from vision-language contrastive learning (CL), showing expertise in capturing overall representations while facing difficulties in capturing detailed local patterns. In this work, we focus on enhancing the visual representations for MLLMs by combining high-frequency and detailed visual representations, obtained through masked image modeling (MIM), with semantically-enriched low-frequency representations captured by CL. To achieve this goal, we introduce X-Former, a lightweight transformer module designed to exploit the complementary strengths of CL and MIM through an innovative interaction mechanism. Specifically, X-Former first bootstraps vision-language representation learning and multimodal-to-multimodal generative learning from two frozen vision encoders, i.e., CLIP-ViT (CL-based) and MAE-ViT (MIM-based). It further bootstraps vision-to-language generative learning from a frozen LLM to ensure visual features from X-Former can be interpreted by the LLM. To demonstrate the effectiveness of our approach, we assess its performance on tasks demanding detailed visual understanding. Extensive evaluations indicate that X-Former excels in visual reasoning tasks involving both structural and semantic categories in the GQA dataset. Assessment on a fine-grained visual perception benchmark further confirms its superior capabilities in visual understanding.
Comment: Accepted at ECCV 2024
Published: 2024

5. Open Vocabulary Multi-Label Video Classification

Author: Gupta, Rohit, Rizve, Mamshad Nayeem, Unnikrishnan, Jayakrishnan, Tawari, Ashish, Tran, Son, Shah, Mubarak, Yao, Benjamin, and Chilimbi, Trishul
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding, which requires the ability to simultaneously recognize multiple actions and entities (e.g., objects) in the video in an open vocabulary setting. We formulate this problem as open vocabulary multilabel video classification and propose a method to adapt a pre-trained VLM such as CLIP to solve this task. We leverage large language models (LLMs) to provide semantic guidance to the VLM about class labels to improve its open vocabulary performance with two key contributions. First, we propose an end-to-end trainable architecture that learns to prompt an LLM to generate soft attributes for the CLIP text-encoder to enable it to recognize novel classes. Second, we integrate a temporal modeling module into CLIP's vision encoder to effectively model the spatio-temporal dynamics of video concepts, and we propose a novel regularized finetuning technique to ensure strong open vocabulary classification performance in the video domain. Our extensive experimentation showcases the efficacy of our approach on multiple benchmark datasets.
Comment: Accepted at ECCV 2024
Published: 2024

6. VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding

Author: Do, Phong Nguyen-Thuan, Tran, Son Quoc, Hoang, Phu Gia, Van Nguyen, Kiet, and Nguyen, Ngan Luu-Thuy
Subjects: Computer Science - Computation and Language
Abstract: The success of Natural Language Understanding (NLU) benchmarks in various languages, such as GLUE for English, CLUE for Chinese, KLUE for Korean, and IndoNLU for Indonesian, has facilitated the evaluation of new NLU models across a wide range of tasks. To establish a standardized set of benchmarks for Vietnamese NLU, we introduce the first Vietnamese Language Understanding Evaluation (VLUE) benchmark. The VLUE benchmark encompasses five datasets covering different NLU tasks, including text classification, span extraction, and natural language understanding. To provide an insightful overview of the current state of Vietnamese NLU, we then evaluate seven state-of-the-art pre-trained models, including both multilingual and Vietnamese monolingual models, on our proposed VLUE benchmark. Furthermore, we present CafeBERT, a new state-of-the-art pre-trained model that achieves superior results across all tasks in the VLUE benchmark. Our model combines the proficiency of a multilingual pre-trained model with Vietnamese linguistic knowledge. CafeBERT is developed based on the XLM-RoBERTa model, with an additional pretraining step utilizing a significant amount of Vietnamese textual data to enhance its adaptation to the Vietnamese language. CafeBERT is made publicly available for future research.
Comment: Accepted at NAACL 2024 (Findings)
Published: 2024

7. VidLA: Video-Language Alignment at Scale

Author: Rizve, Mamshad Nayeem, Fei, Fan, Unnikrishnan, Jayakrishnan, Tran, Son, Yao, Benjamin Z., Zeng, Belinda, Shah, Mubarak, and Chilimbi, Trishul
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos. By employing a simple two-tower architecture, we are able to initialize our video-language model with pretrained image-text foundation models, thereby boosting the final performance. Second, existing video-language alignment works struggle due to the lack of semantically aligned large-scale training data. To overcome it, we leverage recent LLMs to curate the largest video-language dataset to date with better visual grounding. Furthermore, unlike existing video-text datasets which only contain short clips, our dataset is enriched with video clips of varying durations to aid our temporally hierarchical data tokens in extracting better representations at varying temporal scales. Overall, empirical results show that our proposed approach surpasses state-of-the-art methods on multiple retrieval benchmarks, especially on longer videos, and performs competitively on classification benchmarks.
Comment: Accepted to CVPR 2024
Published: 2024

8. Enhance Statistical Features with Changepoint Detection for Driver Behaviour Analysis

Author: Maktoubian, Jamal, Tran, Son N., Shillabeer, Anna, Amin, Muhammad Bilal, and Sambrooks, Lawrence
Editor: Hadfi, Rafik, Anthony, Patricia, Sharma, Alok, Ito, Takayuki, and Bai, Quan
Series Editors: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
Published: 2025

9. -Efp: Bridging Efficiency in Multi-agent Epistemic Planning with Heuristics

Author: Fabiano, Francesco, Platt, Theoderic, Tran, Son, and Pontelli, Enrico
Editor: Arisaka, Ryuta, Sanchez-Anguix, Victor, Stein, Sebastian, Aydoğan, Reyhan, van der Torre, Leon, and Ito, Takayuki
Series Editors: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
Published: 2025

10. Open Vocabulary Multi-label Video Classification

Author: Gupta, Rohit, Rizve, Mamshad Nayeem, Unnikrishnan, Jayakrishnan, Tawari, Ashish, Tran, Son, Shah, Mubarak, Yao, Benjamin, and Chilimbi, Trishul
Editor: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
Series Editors: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
Published: 2025

11. X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Author: Sirnam, Swetha, Yang, Jinyu, Neiman, Tal, Rizve, Mamshad Nayeem, Tran, Son, Yao, Benjamin, Chilimbi, Trishul, and Shah, Mubarak
Editor: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
Series Editors: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
Published: 2025

12. Deep Learning for Plant Identification and Disease Classification from Leaf Images: Multi-prediction Approaches

Author: Yao, Jianping, Tran, Son N., Garg, Saurabh, and Sawyer, Samantha
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning plays an important role in modern agriculture, especially in plant pathology using leaf images where convolutional neural networks (CNN) are attracting a lot of attention. While numerous reviews have explored the applications of deep learning within this research domain, there remains a notable absence of an empirical study to offer insightful comparisons due to the employment of varied datasets in the evaluation. Furthermore, a majority of these approaches tend to address the problem as a singular prediction task, overlooking the multifaceted nature of predicting various aspects of plant species and disease types. Lastly, there is an evident need for a more profound consideration of the semantic relationships that underlie plant species and disease types. In this paper, we start our study by surveying current deep learning approaches for plant identification and disease classification. We categorise the approaches into multi-model, multi-label, multi-output, and multi-task, in which different backbone CNNs can be employed. Furthermore, based on the survey of existing approaches in plant pathology and the study of available approaches in machine learning, we propose a new model named Generalised Stacking Multi-output CNN (GSMo-CNN). To investigate the effectiveness of different backbone CNNs and learning approaches, we conduct an intensive experiment on three benchmark datasets: Plant Village, Plant Leaves, and PlantDoc. The experimental results demonstrate that InceptionV3 can be a good choice for a backbone CNN as its performance is better than AlexNet, VGG16, ResNet101, EfficientNet, MobileNet, and a custom CNN developed by us. Interestingly, empirical results support the hypothesis that using a single model can be comparable or better than using two models. Finally, we show that the proposed GSMo-CNN achieves state-of-the-art performance on three benchmark datasets.
Comment: Jianping and Son are joint first authors (equal contribution)
Published: 2023

13. Machine Learning for Leaf Disease Classification: Data, Techniques and Applications

Author: Yao, Jianping, Tran, Son N., Sawyer, Samantha, and Garg, Saurabh
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The growing demand for sustainable development brings a series of information technologies to help agricultural production. In particular, the emergence of machine learning, a branch of artificial intelligence, has produced multiple breakthroughs that can enhance and revolutionize plant pathology approaches. In recent years, machine learning has been adopted for leaf disease classification in both academic research and industrial applications. Therefore, it is enormously beneficial for researchers, engineers, managers, and entrepreneurs to have a comprehensive view of the recent development of machine learning technologies and applications for leaf disease detection. This study provides a survey of different aspects of the topic, including data, techniques, and applications. The paper starts with publicly available datasets. After that, we summarize common machine learning techniques, including traditional (shallow) learning, deep learning, and augmented learning. Finally, we discuss related applications. This paper provides useful resources for future study and application of machine learning for smart agriculture in general and leaf disease classification in particular.
Published: 2023

14. Does septic arthritis after anterior cruciate ligament reconstruction lead to poor outcomes? A systematic review and meta-analysis of observational studies

Author: Lin, Ashleigh Peng, Nguyen, Bao Tu Thai, Tran, Son Quang, Kuo, Yi-Jie, Huang, Shu-Wei, and Chen, Yu-Pin
Published: 2024

15. Study on the temporal and spatial distribution of Culex mosquitoes in Hanoi, Vietnam

Author: Krambrich, Janina, Nguyen-Tien, Thang, Pham-Thanh, Long, Dang-Xuan, Sinh, Andersson, Ella, Höller, Patrick, Vu, Duoc Trong, Tran, Son Hai, Vu, Lieu Thi, Akaberi, Dario, Ling, Jiaxin, Pettersson, John H.-O., Hesson, Jenny C., Lindahl, Johanna F., and Lundkvist, Åke
Published: 2024

16. Determining the factors impacting the quality of life among the general population in coastal communities in central Vietnam

Author: Nguyen, Gia Thanh, Tran, Thang Binh, Le, Duong Dinh, Nguyen, Tu Minh, Van Nguyen, Hiep, Ho, Phuong Uyen, Van Tran, Son, Thuy, Linh Nguyen Hoang, Tran, Trung Dinh, Phan, Long Thanh, Anh, Thu Dang Thi, and Watanabe, Toru
Published: 2024

17. AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions

Author: Tran, Son Quoc, Do, Gia-Huy, Do, Phong Nguyen-Thuan, Kretchmar, Matt, and Du, Xinya
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The development of large high-quality datasets and high-performing models has led to significant advancements in the domain of Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable questions within the EQA domain. Training EQA models with unanswerable questions helps them avoid extracting misleading or incorrect answers for queries that lack valid responses. However, manually annotating unanswerable questions is labor-intensive. To address this, we propose AGent, a novel pipeline that automatically creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer. In this paper, we demonstrate the usefulness of this AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA. These created question sets exhibit low error rates. Additionally, models fine-tuned on these questions show comparable performance with those fine-tuned on the SQuAD 2.0 dataset on multiple EQA benchmarks.
Comment: 16 pages, 10 tables, 3 figures
Published: 2023

18. UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance

Author: Tran, Son, Tran, Cong, Tran, Anh, and Pham, Cuong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Object detection has long been a topic of high interest in computer vision literature. Motivated by the fact that annotating data for the multi-object tracking (MOT) problem is immensely expensive, recent studies have turned their attention to the unsupervised learning setting. In this paper, we push forward the state-of-the-art performance of unsupervised MOT methods by proposing UnsMOT, a novel framework that explicitly combines the appearance and motion features of objects with geometric information to provide more accurate tracking. Specifically, we first extract the appearance and motion features using CNN and RNN models, respectively. Then, we construct a graph of objects based on their relative distances in a frame, which is fed into a GNN model together with CNN features to output geometric embedding of objects optimized using an unsupervised loss function. Finally, associations between objects are found by matching not only similar extracted features but also geometric embedding of detections and tracklets. Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
Published: 2023

19. Single-Sentence Reader: A Novel Approach for Addressing Answer Position Bias

Author: Tran, Son Quoc and Kretchmar, Matt
Subjects: Computer Science - Computation and Language
Abstract: Machine Reading Comprehension (MRC) models tend to take advantage of spurious correlations (also known as dataset bias or annotation artifacts in the research community). Consequently, these models may perform the MRC task without fully comprehending the given context and question, which is undesirable since it may result in low robustness against distribution shift. The main focus of this paper is answer-position bias, where a significant percentage of training questions have answers located solely in the first sentence of the context. We propose a Single-Sentence Reader as a new approach for addressing answer position bias in MRC. Remarkably, in our experiments with six different models, our proposed Single-Sentence Readers trained on a biased dataset achieve results that nearly match those of models trained on a normal dataset, proving their effectiveness in addressing the answer position bias. Our study also discusses several challenges our Single-Sentence Readers encounter and proposes a potential solution.
Comment: 10 pages, 5 tables, 2 figures
Published: 2023

20. SurveyLM: A platform to explore emerging value perspectives in augmented language models' behaviors

Author: Bickley, Steve J., Chan, Ho Fai, Dao, Bang, Torgler, Benno, and Tran, Son
Subjects: Computer Science - Artificial Intelligence, Computer Science - Social and Information Networks, Economics - General Economics
Abstract: This white paper presents our work on SurveyLM, a platform for analyzing augmented language models' (ALMs) emergent alignment behaviors through their dynamically evolving attitude and value perspectives in complex social contexts. Social Artificial Intelligence (AI) systems, like ALMs, often function within nuanced social scenarios where there is no singular correct response, or where an answer is heavily dependent on contextual factors, thus necessitating an in-depth understanding of their alignment dynamics. To address this, we apply survey and experimental methodologies, traditionally used in studying social behaviors, to evaluate ALMs systematically, thus providing unprecedented insights into their alignment and emergent behaviors. Moreover, the SurveyLM platform leverages the ALMs' own feedback to enhance survey and experiment designs, exploiting an underutilized aspect of ALMs, which accelerates the development and testing of high-quality survey frameworks while conserving resources. Through SurveyLM, we aim to shed light on factors influencing ALMs' emergent behaviors, facilitate their alignment with human intentions and expectations, and thereby contribute to the responsible development and deployment of advanced social AI systems. This white paper underscores the platform's potential to deliver robust results, highlighting its significance to alignment research and its implications for future social AI systems.
Comment: 8 pages, 1 figure
Published: 2023

21. USING BLOOKET SOFTWARE IN TEACHING READING COMPREHENSION IN HIGH SCHOOLS

Author: Lam Tran Son Ngoc Thien Chuong, Truong Hoang Han
Subjects: blooket, teaching by blooket, literature, teaching methods, application of information technology, Technology, Social sciences (General), H1-99
Abstract: In the context of the 4.0 technological revolution, the use of software in teaching in general and Literature teaching in particular is an urgent need. The 2018 General Education Program aims to comprehensively develop students' abilities and qualities, including information technology capabilities. This research aims to promote initiative, independence, and creativity in learners through the use of Blooket software in teaching reading comprehension in high schools. Activities that can use Blooket software for teaching are also proposed. The results show that using Blooket in teaching reading comprehension activities has stimulated learners' interest. The research results will be a helpful reference for Literature teachers in the teaching process.
Published: 2024

22. Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension

Author: Tran, Son Quoc, Do, Phong Nguyen-Thuan, Van Nguyen, Kiet, and Nguyen, Ngan Luu-Thuy
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers still have to rely on multilingual models to develop state-of-the-art systems in Vietnamese Machine Reading Comprehension. This difficulty stems from the limited number of high-quality works in developing Vietnamese language models. In order to encourage more work in this research field, we present a comprehensive analysis of language weaknesses and strengths of current Vietnamese monolingual models using the downstream task of Machine Reading Comprehension. From the analysis results, we suggest new directions for developing Vietnamese language models. Besides this main contribution, we also successfully reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and suggest an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we also introduce a minor but valuable modification to the process of annotating unanswerable questions for Machine Reading Comprehension from previous work. Our proposed modification helps improve the quality of unanswerable questions to a higher level of difficulty for Machine Reading Comprehension systems to solve.
Comment: Accepted at The 2023 EACL Student Research Workshop
Published: 2023

23. Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Author: Jiang, Qian, Chen, Changyou, Zhao, Han, Chen, Liqun, Ping, Qing, Tran, Son Dinh, Xu, Yi, Zeng, Belinda, and Chilimbi, Trishul
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly match each other in the latent space. Yet it remains an open question how the modality alignment affects the downstream task performance. In this paper, based on an information-theoretic argument, we first prove that exact modality alignment is sub-optimal in general for downstream prediction tasks. Hence we advocate that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment. To this end, we propose three general approaches to construct latent modality structures. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization. Extensive experiments are conducted on two popular multi-modal representation learning frameworks: the CLIP-based two-tower model and the ALBEF-based fusion model. We test our model on a variety of tasks including zero/few-shot image classification, image-text retrieval, visual question answering, visual reasoning, and visual entailment. Our method achieves consistent improvements over existing methods, demonstrating the effectiveness and generalizability of our proposed approach on latent modality structure regularization.
Comment: 14 pages, 8 figures, CVPR 2023 accepted
Published: 2023

24. The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models

Author: Tran, Son Quoc, Do, Phong Nguyen-Thuan, Le, Uyen, and Kretchmar, Matt
Subjects: Computer Science - Artificial Intelligence
Abstract: Pretrained language models have achieved super-human performances on many Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative inability to defend against adversarial attacks has spurred skepticism about their natural language understanding. In this paper, we ask whether training with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC models against adversarial attacks. To explore that question, we fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks. Our experiments reveal that current models fine-tuned on SQuAD 2.0 do not initially appear to be any more robust than ones fine-tuned on SQuAD 1.1, yet they reveal a measure of hidden robustness that can be leveraged to realize actual performance gains. Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets. Finally, we introduce a new adversarial attack to reveal artifacts of SQuAD 2.0 that current MRC models are learning.
Comment: Accepted at The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)
Published: 2023

25. Rapid-Motion-Track: Markerless Tracking of Fast Human Motion with Deeper Learning

Author: Li, Renjie, Lao, Chun Yu, George, Rebecca St., Lawler, Katherine, Garg, Saurabh, Tran, Son N., Bai, Quan, and Alty, Jane
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Objective: The coordination of human movement directly reflects function of the central nervous system. Small deficits in movement are often the first sign of an underlying neurological problem. The objective of this research is to develop a new end-to-end, deep learning-based system, Rapid-Motion-Track (RMT) that can track the fastest human movement accurately when webcams or laptop cameras are used. Materials and Methods: We applied RMT to finger tapping, a well-validated test of motor control that is one of the most challenging human motions to track with computer vision due to the small keypoints of digits and the high velocities that are generated. We recorded 160 finger tapping assessments simultaneously with a standard 2D laptop camera (30 frames/sec) and a high-speed wearable sensor-based 3D motion tracking system (250 frames/sec). RMT and a range of DLC models were applied to the video data with tapping frequencies up to 8Hz to extract movement features. Results: The movement features (e.g. speed, rhythm, variance) identified with the new RMT system exhibited very high concurrent validity with the gold-standard measurements (97.3% of RMT measures were within +/-0.5Hz of the Optotrak measures), and outperformed DLC and other advanced computer vision tools (around 88.2% of DLC measures were within +/-0.5Hz of the Optotrak measures). RMT also accurately tracked a range of other rapid human movements such as foot tapping, head turning and sit-to-stand movements. Conclusion: With the ubiquity of video technology in smart devices, the RMT method holds potential to transform access and accuracy of human movement assessment.
Published: 2023

26. Sustainable Cold Storage Warehouse Site Selection in Vietnam: A TOPSIS Approach

Author: Le, Quoc An, Bui, Duy Xuan Bao, Ho, Dat Tan, Nguyen, Thinh Huu, Tran, Son Nam, Do, Ngoc-Hien, Nguyen, Duc Duy, Chlamtac, Imrich, Series Editor, Hai, Nguyen Thanh, editor, Huy, Nguyen Xuan, editor, Amine, Khalil, editor, and Lam, Tran Dai, editor
Published: 2024
Full Text: View/download PDF

27. Alternative Nature-Inspired Optimizers: An Attempt to Solve the Coverage and Connectivity Problem in Wireless Sensor Network Deployment

Author: Tran, Son, Phan, Duc Manh, Vu, Huy Nhat Minh, Hoang, Anh, Hoang, Duc Chinh, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Nguyen, Thi Dieu Linh, editor, Dawson, Maurice, editor, Ngoc, Le Anh, editor, and Lam, Kwok Yan, editor
Published: 2024
Full Text: View/download PDF

28. Bayesian Optimization of 2D Echocardiography Segmentation

Author: Tran, Son-Tung, Stough, Joshua V., Zhang, Xiaoyan, and Haggerty, Christopher M.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Mathematics - Optimization and Control
Abstract: Bayesian Optimization (BO) is a well-studied hyperparameter tuning technique that is more efficient than grid search for high-cost, high-parameter machine learning problems. Echocardiography is a ubiquitous modality for evaluating heart structure and function in cardiology. In this work, we use BO to optimize the architectural and training-related hyperparameters of a previously published deep fully convolutional neural network model for multi-structure segmentation in echocardiography. In a fair comparison, the resulting model outperforms this recent state-of-the-art on the annotated CAMUS dataset in both apical two- and four-chamber echo views. We report mean Dice overlaps of 0.95, 0.96, and 0.93 on left ventricular (LV) endocardium, LV epicardium, and left atrium respectively. We also observe significant improvement in derived clinical indices, including smaller median absolute errors for LV end-diastolic volume (4.9mL vs. 6.7), end-systolic volume (3.1mL vs. 5.2), and ejection fraction (2.6% vs. 3.7); and much tighter limits of agreement, which were already within inter-rater variability for non-contrast echo. These results demonstrate the benefits of BO for echocardiography segmentation over a recent state-of-the-art framework, although validation using large-scale independent clinical data is required.
Published: 2022
Full Text: View/download PDF

29. Deep cross-domain transfer for emotion recognition via joint learning

Author: Nguyen, Dung, Nguyen, Duc Thanh, Sridharan, Sridha, Abdelrazek, Mohamed, Denman, Simon, Tran, Son N., Zeng, Rui, and Fookes, Clinton
Published: 2024
Full Text: View/download PDF

30. Zoonotic flavivirus exposure in peri-urban and suburban pig-keeping in Hanoi, Vietnam, and the knowledge and preventive practices of pig farmers

Author: Pham-Thanh, Long, Nguyen-Tien, Thang, Magnusson, Ulf, Bui, Vuong Nghia, Bui, Anh Ngoc, Lundkvist, Ake, Vu, Duoc Trong, Tran, Son Hai, Can, Minh Xuan, Nguyen-Viet, Hung, and Lindahl, Johanna F
Published: 2022

31. A Comprehensive Review on Deep Supervision: Theories and Applications

Author: Li, Renjie, Wang, Xinyi, Huang, Guan, Yang, Wenli, Zhang, Kaining, Gu, Xiaotong, Tran, Son N., Garg, Saurabh, Alty, Jane, and Bai, Quan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at hidden layers of a neural network. This technique has been increasingly applied in deep neural network learning systems for various computer vision applications recently. There is a consensus that deep supervision helps improve neural network performance by alleviating the gradient vanishing problem, as one of the many strengths of deep supervision. Besides, in different computer vision applications, deep supervision can be applied in different ways. How to make the best use of deep supervision to improve network performance in different applications has not been thoroughly investigated. In this paper, we provide a comprehensive in-depth review of deep supervision in both theories and applications. We propose a new classification of different deep supervision networks, and discuss advantages and limitations of current deep supervision networks in computer vision applications.
Published: 2022

32. Credit booms and bank risk in Southeast Asian countries: does credit information sharing matter?

Author: Tran, Son, Nguyen, Dat, Nguyen, Khuong, and Nguyen, Liem
Published: 2024
Full Text: View/download PDF

33. VLSP 2021 - ViMRC Challenge: Vietnamese Machine Reading Comprehension

Author: Van Nguyen, Kiet, Tran, Son Quoc, Nguyen, Luan Thanh, Van Huynh, Tin, Luu, Son T., and Nguyen, Ngan Luu-Thuy
Subjects: Computer Science - Computation and Language
Abstract: One of the emerging research trends in natural language understanding is machine reading comprehension (MRC) which is the task to find answers to human questions based on textual data. Existing Vietnamese datasets for MRC research concentrate solely on answerable questions. However, in reality, questions can be unanswerable for which the correct answer is not stated in the given textual data. To address the weakness, we provide the research community with a benchmark dataset named UIT-ViQuAD 2.0 for evaluating the MRC task and question answering systems for the Vietnamese language. We use UIT-ViQuAD 2.0 as a benchmark dataset for the challenge on Vietnamese MRC at the Eighth Workshop on Vietnamese Language and Speech Processing (VLSP 2021). This task attracted 77 participant teams from 34 universities and other organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 77.24% in F1-score and 67.43% in Exact Match on the private test set. The Vietnamese MRC systems proposed by the top 3 teams use XLM-RoBERTa, a powerful pre-trained language model based on the transformer architecture. The UIT-ViQuAD 2.0 dataset motivates researchers to further explore the Vietnamese machine reading comprehension task and related tasks such as question answering, question generation, and natural language inference., Comment: The 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021)
Published: 2022
Full Text: View/download PDF

34. Multi-modal Alignment using Representation Codebook

Author: Duan, Jiali, Chen, Liqun, Tran, Son, Yang, Jinyu, Xu, Yi, Zeng, Belinda, and Chilimbi, Trishul
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Aligning signals from different modalities is an important step in vision-language representation learning as it affects the performance of later stages such as cross-modality fusion. Since image and text typically reside in different regions of the feature space, directly aligning them at instance level is challenging especially when features are still evolving during training. In this paper, we propose to align at a higher and more stable level using cluster representation. Specifically, we treat image and text as two "views" of the same entity, and encode them into a joint vision-language coding space spanned by a dictionary of cluster centers (codebook). We contrast positive and negative samples via their cluster assignments while simultaneously optimizing the cluster centers. To further smooth out the learning process, we adopt a teacher-student distillation paradigm, where the momentum teacher of one view guides the student learning of the other. We evaluated our approach on common vision language benchmarks and obtain new SoTA on zero-shot cross modality retrieval while being competitive on various other transfer tasks., Comment: Accepted by CVPR 2022
Published: 2022

35. Vision-Language Pre-Training with Triple Contrastive Learning

Author: Yang, Jinyu, Duan, Jiali, Tran, Son, Xu, Yi, Chanda, Sampath, Chen, Liqun, Zeng, Belinda, Chilimbi, Trishul, and Huang, Junzhou
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-language representation learning largely benefits from image-text alignment through contrastive losses (e.g., InfoNCE loss). The success of this alignment strategy is attributed to its capability in maximizing the mutual information (MI) between an image and its matched text. However, simply performing cross-modal alignment (CMA) ignores data potential within each modality, which may result in degraded representations. For instance, although CMA-based models are able to map image-text pairs close together in the embedding space, they fail to ensure that similar inputs from the same modality stay close by. This problem can get even worse when the pre-training data is noisy. In this paper, we propose triple contrastive learning (TCL) for vision-language pre-training by leveraging both cross-modal and intra-modal self-supervision. Besides CMA, TCL introduces an intra-modal contrastive objective to provide complementary benefits in representation learning. To take advantage of localized and structural information from image and text input, TCL further maximizes the average MI between local regions of image/text and their global summary. To the best of our knowledge, ours is the first work that takes into account local structure information for multi-modality representation learning. Experimental evaluations show that our approach is competitive and achieves the new state of the art on various common down-stream vision-language tasks such as image-text retrieval and visual question answering., Comment: CVPR 2022; code: https://github.com/uta-smile/TCL
Published: 2022

36. Parallel Multi-Scale Networks with Deep Supervision for Hand Keypoint Detection

Author: Li, Renjie, Tran, Son, Garg, Saurabh, Lawler, Katherine, Alty, Jane, and Bai, Quan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Keypoint detection plays an important role in a wide range of applications. However, predicting keypoints of small objects such as human hands is a challenging problem. Recent works fuse feature maps of deep Convolutional Neural Networks (CNNs), either via multi-level feature integration or multi-resolution aggregation. Despite achieving some success, the feature fusion approaches increase the complexity and the opacity of CNNs. To address this issue, we propose a novel CNN model named Multi-Scale Deep Supervision Network (P-MSDSNet) that learns feature maps at different scales with deep supervision to produce attention maps for adaptive feature propagation from layer to layer. P-MSDSNet has a multi-stage architecture which makes it scalable while its deep supervision with spatial attention improves transparency to the feature learning at each stage. We show that P-MSDSNet outperforms the state-of-the-art approaches on benchmark datasets while requiring fewer parameters. We also show the application of P-MSDSNet to quantify finger tapping hand movements in a neuroscience study.
Published: 2021

37. Logical Boltzmann Machines

Author: Tran, Son N. and Garcez, Artur d'Avila
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Logic in Computer Science, 68T01, I.2.4, I.2.6, I.2.11
Abstract: The idea of representing symbolic knowledge in connectionist systems has been a long-standing endeavour which has attracted much attention recently with the objective of combining machine learning and scalable sound reasoning. Early work has shown a correspondence between propositional logic and symmetrical neural networks which nevertheless did not scale well with the number of variables and whose training regime was inefficient. In this paper, we introduce Logical Boltzmann Machines (LBM), a neurosymbolic system that can represent any propositional logic formula in strict disjunctive normal form. We prove equivalence between energy minimization in LBM and logical satisfiability thus showing that LBM is capable of sound reasoning. We evaluate reasoning empirically to show that LBM is capable of finding all satisfying assignments of a class of logical formulae by searching fewer than 0.75% of the possible (approximately 1 billion) assignments. We compare learning in LBM with a symbolic inductive logic programming system, a state-of-the-art neurosymbolic system and a purely neural network-based system, achieving better learning performance in five out of seven data sets., Comment: 15 pages, 5 figures, 2 tables
Published: 2021

38. Hand gesture detection in tests performed by older adults

Author: Huang, Guan, Tran, Son N., Bai, Quan, and Alty, Jane
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Our team are developing a new online test that analyses hand movement features associated with ageing that can be completed remotely from the research centre. To obtain hand movement features, participants will be asked to perform a variety of hand gestures using their own computer cameras. However, it is challenging to collect high quality hand movement video data, especially for older participants, many of whom have no IT background. During the data collection process, one of the key steps is to detect whether the participants are following the test instructions correctly and also to detect similar gestures from different devices. Furthermore, we need this process to be automated and accurate as we expect many thousands of participants to complete the test. We have implemented a hand gesture detector to detect the gestures in the hand movement tests and our detection mAP is 0.782 which is better than the state-of-the-art. In this research, we have processed 20,000 images collected from hand movement tests and labelled 6,450 images to detect different hand gestures in the hand movement tests. This paper has the following three contributions. Firstly, we compared and analysed the performance of different network structures for hand gesture detection. Secondly, we have made many attempts to improve the accuracy of the model and have succeeded in improving the classification accuracy for similar gestures by implementing attention layers. Thirdly, we have created two datasets and included 20 percent of blurred images in the dataset to investigate how different network structures were impacted by noisy data; our experiments have also shown our network has better performance on the noisy dataset.
Published: 2021
Full Text: View/download PDF

39. Photodynamic treatment increases the lifespan and oxidative stress resistance of Caenorhabditis elegans

Author: Nguyen, Uyen Tran Tu, Youn, Esther, Le, Tram Anh Ngoc, Ha, Ngoc Minh, Tran, Son Hung, Lee, Sohyun, Cha, Jin Wook, Park, Jin-Soo, Kwon, Hak Cheol, and Kang, Kyungsu
Published: 2024
Full Text: View/download PDF

40. Deep Learning-Based Identification of Intraocular Pressure-Associated Genes Influencing Trabecular Meshwork Cell Morphology

Author: Greatbatch, Connor J., Lu, Qinyi, Hung, Sandy, Tran, Son N., Wing, Kristof, Liang, Helena, Han, Xikun, Zhou, Tiger, Siggs, Owen M., Mackey, David A., Liu, Guei-Sheung, Cook, Anthony L., Powell, Joseph E., Craig, Jamie E., MacGregor, Stuart, and Hewitt, Alex W.
Published: 2024
Full Text: View/download PDF

41. Taking Cognition Seriously: A generalised physics of cognition

Author: Taylor, Sophie Alyx, Tran, Son Cao, and Nicolau Jr, Dan V.
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Artificial Intelligence, Mathematics - Category Theory, Physics - Biological Physics
Abstract: The study of complex systems through the lens of category theory consistently proves to be a powerful approach. We propose that cognition deserves the same category-theoretic treatment. We show that by considering a highly-compact cognitive system, there are fundamental physical trade-offs resulting in a utility problem. We then examine how to do this systematically, and propose some requirements for "cognitive categories", before investigating the phenomena of topological defects in gauge fields over conceptual spaces.
Published: 2021

42. Prognostic value of in-hospital and 6-month mortality after acute coronary syndrome using GRACE, TIMI, and HEART scores

Author: Tran, An Viet, Truong, Dang Duy, Ngo, Toan Hoang, Nguyen, Oanh Thi Kim, Tran, Son Kim, and Huynh, Phuong Kim
Published: 2024
Full Text: View/download PDF

43. Coconut trees detection and segmentation in aerial imagery using mask region-based convolution neural network

Author: Iqbal, Muhammad Shakaib, Ali, Hazrat, Tran, Son N., and Iqbal, Talha
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Food resources face severe damages under extraordinary situations of catastrophes such as earthquakes, cyclones, and tsunamis. Under such scenarios, speedy assessment of food resources from agricultural land is critical as it supports aid activity in the disaster hit areas. In this article, a deep learning approach is presented for the detection and segmentation of coconut trees in aerial imagery provided through the AI competition organized by the World Bank in collaboration with OpenAerialMap and WeRobotics. A Mask Region-based Convolutional Neural Network approach was used for identification and segmentation of coconut trees. For the segmentation task, Mask R-CNN model with ResNet50 and ResNet101 based architectures was used. Several experiments with different configuration parameters were performed and the best configuration for the detection of coconut trees with more than 90% confidence factor was reported. For the purpose of evaluation, Microsoft COCO dataset evaluation metric namely mean average precision (mAP) was used. An overall 91% mean average precision for coconut trees detection was achieved., Comment: Published in IET Computer Vision, 09 April 2021
Published: 2021
Full Text: View/download PDF

44. An Effectiveness of Repeating a Spoken Digit for Speaker Verification

Author: Vo, Duy, Le, Si Minh, Do, Hao Duc, Tran, Son Thai, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Nguyen, Ngoc Thanh, editor, Boonsang, Siridech, editor, Fujita, Hamido, editor, Hnatkowska, Bogumiła, editor, Hong, Tzung-Pei, editor, Pasupa, Kitsuchart, editor, and Selamat, Ali, editor
Published: 2023
Full Text: View/download PDF

45. Wine Characterisation with Spectral Information and Predictive Artificial Intelligence

Author: Yao, Jianping, Tran, Son N., Nguyen, Hieu, Sawyer, Samantha, Longo, Rocco, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Tanveer, Mohammad, editor, Agarwal, Sonali, editor, Ozawa, Seiichi, editor, Ekbal, Asif, editor, and Jatowt, Adam, editor
Published: 2023
Full Text: View/download PDF

46. dpUGC: Learn Differentially Private Representation for User Generated Contents (Best Paper Award, Third Place, Shared)

Author: Vu, Xuan-Son, Tran, Son N., Jiang, Lili, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Gelbukh, Alexander, editor
Published: 2023
Full Text: View/download PDF

47. A comparative study on kinetics and dynamics of two dump truck lifting mechanisms using MATLAB simscape

Author: Hong, Thong Duc, Pham, Minh Quang, Tran, Son Cong, Tran, Lam Quang, and Nguyen, Truong Thanh
Published: 2024
Full Text: View/download PDF

48. A novel adaptive ensemble learning framework for automated Beggiatoa Spp. coverage estimation

Author: Chen, Yanyu, Zhou, Yunjue, Park, Mira, Tran, Son, Hadley, Scott, and Bai, Quan
Published: 2024
Full Text: View/download PDF

49. Enitociclib, a selective CDK9 inhibitor: in vitro and in vivo preclinical studies in multiple myeloma

Author: Tran, Son, Sipila, Patrick, Frigault, Melanie M., Stelte-Ludwig, Beatrix, Johnson, Amy J., Birkett, Joseph, Izumi, Raquel, Hamdy, Ahmed, Maity, Ranjan, Bahlis, Nizar J., Neri, Paola, and Narendran, Aru
Published: 2024
Full Text: View/download PDF

50. Rapid-Motion-Track: Markerless tracking of fast human motion with deep learning

Author: Li, Renjie, Lau, Chun-yu, St George, Rebecca J., Lawler, Katherine, Garg, Saurabh, Tran, Son N., Bai, Quan, and Alty, Jane
Published: 2024
Full Text: View/download PDF
