Author: "Luu, Khoa" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Luu, Khoa" returned 438 results

1. LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition

Author: Chappa, Naga Venkata Sai Raviteja and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Group Activity Recognition (GAR) remains challenging in computer vision due to the complex nature of multi-agent interactions. This paper introduces LiGAR, a LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition. LiGAR leverages LiDAR data as a structural backbone to guide the processing of visual and textual information, enabling robust handling of occlusions and complex spatial arrangements. Our framework incorporates a Multi-Scale LiDAR Transformer, Cross-Modal Guided Attention, and an Adaptive Fusion Module to integrate multi-modal data at different semantic levels effectively. LiGAR's hierarchical architecture captures group activities at various granularities, from individual actions to scene-level dynamics. Extensive experiments on the JRDB-PAR, Volleyball, and NBA datasets demonstrate LiGAR's superior performance, achieving state-of-the-art results with improvements of up to 10.6% in F1-score on JRDB-PAR and 5.9% in Mean Per Class Accuracy on the NBA dataset. Notably, LiGAR maintains high performance even when LiDAR data is unavailable during inference, showcasing its adaptability. Our ablation studies highlight the significant contributions of each component and the effectiveness of our multi-modal, multi-scale approach in advancing the field of group activity recognition.
Comment: 14 pages, 4 figures, 10 tables
Published: 2024

2. FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis

Author: Chappa, Naga VS Raviteja, Dobbs, Page Daniel, Raj, Bhiksha, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The proliferation of tobacco-related content on social media platforms poses significant challenges for public health monitoring and intervention. This paper introduces a novel multi-modal deep learning framework named Flow-Attention Adaptive Semantic Hierarchical Fusion (FLAASH) designed to analyze tobacco-related video content comprehensively. FLAASH addresses the complexities of integrating visual and textual information in short-form videos by leveraging a hierarchical fusion mechanism inspired by flow network theory. Our approach incorporates three key innovations, including a flow-attention mechanism that captures nuanced interactions between visual and textual modalities, an adaptive weighting scheme that balances the contribution of different hierarchical levels, and a gating mechanism that selectively emphasizes relevant features. This multi-faceted approach enables FLAASH to effectively process and analyze diverse tobacco-related content, from product showcases to usage scenarios. We evaluate FLAASH on the Multimodal Tobacco Content Analysis Dataset (MTCAD), a large-scale collection of tobacco-related videos from popular social media platforms. Our results demonstrate significant improvements over existing methods, outperforming state-of-the-art approaches in classification accuracy, F1 score, and temporal consistency. The proposed method also shows strong generalization capabilities when tested on standard video question-answering datasets, surpassing current models. This work contributes to the intersection of public health and artificial intelligence, offering an effective tool for analyzing tobacco promotion in digital media.
Comment: Under review at International Journal of Computer Vision; 20 pages, 4 figures, 5 tables
Published: 2024

3. DINTR: Tracking via Diffusion-based Interpolation

Author: Nguyen, Pha, Le, Ngan, Cothren, Jackson, Yilmaz, Alper, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Object tracking is a fundamental task in computer vision, requiring the localization of objects of interest across video frames. Diffusion models have shown remarkable capabilities in visual generation, making them well-suited for addressing several requirements of the tracking problem. This work proposes a novel diffusion-based methodology to formulate the tracking task. Firstly, their conditional process allows for injecting indications of the target object into the generation process. Secondly, diffusion mechanics can be developed to inherently model temporal correspondences, enabling the reconstruction of actual frames in video. However, existing diffusion models rely on extensive and unnecessary mapping to a Gaussian noise domain, which can be replaced by a more efficient and stable interpolation process. Our proposed interpolation mechanism draws inspiration from classic image-processing techniques, offering a more interpretable, stable, and faster approach tailored specifically for the object tracking task. By leveraging the strengths of diffusion models while circumventing their limitations, our Diffusion-based INterpolation TrackeR (DINTR) presents a promising new paradigm and achieves a superior multiplicity on seven benchmarks across five indicator representations.
Comment: Accepted at NeurIPS 2024
Published: 2024

4. A Novel Dataset for Video-Based Autism Classification Leveraging Extra-Stimulatory Behavior

Author: Serna-Aguilera, Manuel, Nguyen, Xuan Bac, Seo, Han-Seok, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Autism Spectrum Disorder (ASD) can affect individuals at varying degrees of intensity, from challenges in overall health, communication, and sensory processing, and this often begins at a young age. Thus, it is critical for medical professionals to be able to accurately diagnose ASD in young children, but doing so is difficult. Deep learning can be responsibly leveraged to improve productivity in addressing this task. The availability of data, however, remains a considerable obstacle. Hence, in this work, we introduce the Video ASD dataset--a dataset that contains video frame convolutional and attention map feature data--to foster further progress in the task of ASD classification. The original videos showcase children reacting to chemo-sensory stimuli, among auditory, touch, and vision. This dataset contains the features of the frames spanning 2,467 videos, for a total of approximately 1.4 million frames. Additionally, head pose angles are included to account for head movement noise, as well as full-sentence text labels for the taste and smell videos that describe how the facial expression changes before, immediately after, and long after interaction with the stimuli. In addition to providing features, we also test foundation models on this data to showcase how movement noise affects performance and the need for more data and more complex labels.
Published: 2024

5. Hierarchical Quantum Control Gates for Functional MRI Understanding

Author: Nguyen, Xuan-Bac, Nguyen, Hoang-Quan, Churchill, Hugh, Khan, Samee U., and Luu, Khoa
Subjects: Quantum Physics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Quantum computing has emerged as a powerful tool for solving complex problems intractable for classical computers, particularly in popular fields such as cryptography, optimization, and neurocomputing. In this paper, we present a new quantum-based approach named the Hierarchical Quantum Control Gates (HQCG) method for efficient understanding of Functional Magnetic Resonance Imaging (fMRI) data. This approach includes two novel modules: the Local Quantum Control Gate (LQCG) and the Global Quantum Control Gate (GQCG), which are designed to extract local and global features of fMRI signals, respectively. Our method operates end-to-end on a quantum machine, leveraging quantum mechanics to learn patterns within extremely high-dimensional fMRI signals, such as 30,000 samples, which is a challenge for classical computers. Empirical results demonstrate that our approach significantly outperforms classical methods. Additionally, we found that the proposed quantum model is more stable and less prone to overfitting than the classical methods.
Comment: Accepted to IEEE Workshop on Signal Processing Systems (SiPS 2024)
Published: 2024

6. ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models

Author: Truong, Thanh-Dat, Li, Xin, Raj, Bhiksha, Cothren, Jackson, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Vision-Language Foundation Model has recently shown outstanding performance in various perception learning tasks. The outstanding performance of the vision-language model mainly relies on large-scale pre-training datasets and different data augmentation techniques. However, the domain generalization problem of the vision-language foundation model needs to be addressed. This problem has limited the generalizability of the vision-language foundation model to unknown data distributions. In this paper, we introduce a new simple but efficient Diffusion Sampling approach to Domain Generalization (ED-SAM) to improve the generalizability of the vision-language foundation model. Our theoretical analysis in this work reveals the critical role and relation of the diffusion model to domain generalization in the vision-language foundation model. Then, based on the insightful analysis, we introduce a new simple yet effective Transport Transformation to diffusion sampling method. It can effectively generate adversarial samples to improve the generalizability of the foundation model against unknown data distributions. The experimental results on different scales of vision-language pre-training datasets, including CC3M, CC12M, and LAION400M, have consistently shown State-of-the-Art performance and scalability of the proposed ED-SAM approach compared to the other recent methods.
Published: 2024

7. EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding

Author: Truong, Thanh-Dat, Prabhu, Utsav, Wang, Dongyi, Raj, Bhiksha, Gauch, Susan, Subbiah, Jeyamkondan, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Unsupervised Domain Adaptation has been an efficient approach to transferring the semantic segmentation model across data distributions. Meanwhile, the recent Open-vocabulary Semantic Scene understanding based on large-scale vision language models is effective in open-set settings because it can learn diverse concepts and categories. However, these prior methods fail to generalize across different camera views due to the lack of cross-view geometric modeling. At present, there are limited studies analyzing cross-view learning. To address this problem, we introduce a novel Unsupervised Cross-view Adaptation Learning approach to modeling the geometric structural change across views in Semantic Scene Understanding. First, we introduce a novel Cross-view Geometric Constraint on Unpaired Data to model structural changes in images and segmentation masks across cameras. Second, we present a new Geodesic Flow-based Correlation Metric to efficiently measure the geometric structural changes across camera views. Third, we introduce a novel view-condition prompting mechanism to enhance the view-information modeling of the open-vocabulary segmentation network in cross-view adaptation learning. The experiments on different cross-view adaptation benchmarks have shown the effectiveness of our approach in cross-view modeling, demonstrating that we achieve State-of-the-Art (SOTA) performance compared to prior unsupervised domain adaptation and open-vocabulary semantic segmentation methods.
Comment: Accepted to NeurIPS'24
Published: 2024

8. CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos

Author: Nguyen, Trong-Thuan, Nguyen, Pha, Li, Xin, Cothren, Jackson, Yilmaz, Alper, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually comprehensive and precise collection of predicates that capture the intricate relationships and spatial arrangements among objects. To this end, we propose the novel Cyclic Graph Transformer (CYCLO) approach that allows the model to capture both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The proposed approach also allows one to handle sequences with inherent cyclical patterns and process object relationships in the correct sequential order. Therefore, it can effectively capture periodic and overlapping relationships while minimizing information loss. The extensive experiments on the AeroEye dataset demonstrate the effectiveness of the proposed CYCLO model, demonstrating its potential to perform scene understanding on drone videos. Finally, the CYCLO method consistently achieves State-of-the-Art (SOTA) results on two in-the-wild scene graph generation benchmarks, i.e., PVSG and ASPIRe.
Comment: Accepted to NeurIPS 2024
Published: 2024

9. Diffusion-Inspired Quantum Noise Mitigation in Parameterized Quantum Circuits

Author: Nguyen, Hoang-Quan, Nguyen, Xuan Bac, Chen, Samuel Yen-Chi, Churchill, Hugh, Borys, Nicholas, Khan, Samee U., and Luu, Khoa
Subjects: Quantum Physics, Computer Science - Machine Learning
Abstract: Parameterized Quantum Circuits (PQCs) have been acknowledged as a leading strategy to utilize near-term quantum advantages in multiple problems, including machine learning and combinatorial optimization. When applied to specific tasks, the parameters in the quantum circuits are trained to minimize the target function. Although there have been comprehensive studies to improve the performance of the PQCs on practical tasks, the errors caused by the quantum noise downgrade the performance when running on real quantum computers. In particular, when the quantum state is transformed through multiple quantum circuit layers, the effect of the quantum noise happens cumulatively and becomes closer to the maximally mixed state or complete noise. This paper studies the relationship between the quantum noise and the diffusion model. Then, we propose a novel diffusion-inspired learning approach to mitigate the quantum noise in the PQCs and reduce the error for specific tasks. Through our experiments, we illustrate the efficiency of the learning strategy and achieve state-of-the-art performance on classification tasks in the quantum noise scenarios.
Published: 2024

10. Quantum Visual Feature Encoding Revisited

Author: Nguyen, Xuan-Bac, Nguyen, Hoang-Quan, Churchill, Hugh, Khan, Samee U., and Luu, Khoa
Subjects: Quantum Physics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Although quantum machine learning has been introduced for a while, its applications in computer vision are still limited. This paper, therefore, revisits the quantum visual encoding strategies, the initial step in quantum machine learning. Investigating the root cause, we uncover that the existing quantum encoding design fails to ensure information preservation of the visual features after the encoding process, thus complicating the learning process of the quantum machine learning models. In particular, the problem, termed "Quantum Information Gap" (QIG), leads to a gap of information between classical and corresponding quantum features. We provide theoretical proof and practical demonstrations of that found and underscore the significance of QIG, as it directly impacts the performance of quantum machine learning algorithms. To tackle this challenge, we introduce a simple but efficient new loss function named Quantum Information Preserving (QIP) to minimize this gap, resulting in enhanced performance of quantum machine learning algorithms. Extensive experiments validate the effectiveness of our approach, showcasing superior performance compared to current methodologies and consistently achieving state-of-the-art results in quantum modeling.
Comment: Accepted to Quantum Machine Intelligence
Published: 2024

11. QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual Clustering

Author: Nguyen, Xuan-Bac, Nguyen, Hoang-Quan, Chen, Samuel Yen-Chi, Khan, Samee U., Churchill, Hugh, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Unsupervised vision clustering, a cornerstone in computer vision, has been studied for decades, yielding significant outcomes across numerous vision tasks. However, these algorithms involve substantial computational demands when confronted with vast amounts of unlabeled data. Conversely, quantum computing holds promise in expediting unsupervised algorithms when handling large-scale databases. In this study, we introduce QClusformer, a pioneering Transformer-based framework leveraging quantum machines to tackle unsupervised vision clustering challenges. Specifically, we design the Transformer architecture, including the self-attention module and transformer blocks, from a quantum perspective to enable execution on quantum hardware. In addition, we present QClusformer, a variant based on the Transformer architecture, tailored for unsupervised vision clustering tasks. By integrating these elements into an end-to-end framework, QClusformer consistently outperforms previous methods running on classical computers. Empirical evaluations across diverse benchmarks, including MS-Celeb-1M and DeepFashion, underscore the superior performance of QClusformer compared to state-of-the-art methods.
Published: 2024

12. BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning

Author: Nguyen, Xuan-Bac, Jang, Hojin, Li, Xin, Khan, Samee U., Sinha, Pawan, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The human brain is a highly efficient processing unit, and understanding how it works can inspire new algorithms and architectures in machine learning. In this work, we introduce a novel framework named Brain Activation Network (BRACTIVE), a transformer-based approach to studying the human visual brain. The main objective of BRACTIVE is to align the visual features of subjects with corresponding brain representations via fMRI signals. It allows us to identify the brain's Regions of Interest (ROI) of the subjects. Unlike previous brain research methods, which can only identify ROIs for one subject at a time and are limited by the number of subjects, BRACTIVE automatically extends this identification to multiple subjects and ROIs. Our experiments demonstrate that BRACTIVE effectively identifies person-specific regions of interest, such as face and body-selective areas, aligning with neuroscience findings and indicating potential applicability to various object categories. More importantly, we found that leveraging human visual brain activity to guide deep neural networks enhances performance across various benchmarks. It encourages the potential of BRACTIVE in both neuroscience and machine intelligence studies.
Published: 2024

13. Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy

Author: Nguyen, Hoang-Quan, Truong, Thanh-Dat, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Action recognition has become one of the popular research topics in computer vision. There are various methods based on Convolutional Networks and self-attention mechanisms as Transformers to solve both spatial and temporal dimensions problems of action recognition tasks that achieve competitive performances. However, these methods lack a guarantee of the correctness of the action subject that the models give attention to, i.e., how to ensure an action recognition model focuses on the proper action subject to make a reasonable action prediction. In this paper, we propose a multi-view attention consistency method that computes the similarity between two attentions from two different views of the action videos using Directed Gromov-Wasserstein Discrepancy. Furthermore, our approach applies the idea of Neural Radiance Field to implicitly render the features from novel views when training on single-view datasets. Therefore, the contributions in this work are three-fold. Firstly, we introduce the multi-view attention consistency to solve the problem of reasonable prediction in action recognition. Secondly, we define a new metric for multi-view consistent attention using Directed Gromov-Wasserstein Discrepancy. Thirdly, we built an action recognition model based on Video Transformers and Neural Radiance Fields. Compared to the recent action recognition methods, the proposed approach achieves state-of-the-art results on three large-scale datasets, i.e., Jester, Something-Something V2, and Kinetics-400.
Published: 2024

14. Hybrid Quantum Tabu Search for Solving the Vehicle Routing Problem

Author: Holliday, James, Morgan, Braeden, Churchill, Hugh, and Luu, Khoa
Subjects: Computer Science - Emerging Technologies
Abstract: There has never been a more exciting time for the future of quantum computing than now. Near-term quantum computing usage is now the next XPRIZE. With that challenge in mind we have explored a new approach as a hybrid quantum-classical algorithm for solving NP-Hard optimization problems. We have focused on the classic problem of the Capacitated Vehicle Routing Problem (CVRP) because of its real-world industry applications. Heuristics are often employed to solve this problem because it is difficult. In addition, meta-heuristic algorithms have proven to be capable of finding reasonable solutions to optimization problems like the CVRP. Recent research has shown that quantum-only and hybrid quantum/classical approaches to solving the CVRP are possible. Where quantum approaches are usually limited to minimal optimization problems, hybrid approaches have been able to solve more significant problems. Still, the hybrid approaches often need help finding solutions as good as their classical counterparts. In our proposed approach, we created a hybrid quantum/classical metaheuristic algorithm capable of finding the best-known solution to a classic CVRP problem. Our experimental results show that our proposed algorithm often outperforms other hybrid approaches.
Published: 2024

15. HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding

Author: Nguyen, Trong-Thuan, Nguyen, Pha, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual interactivity understanding within visual scenes presents a significant challenge in computer vision. Existing methods focus on complex interactivities while leveraging a simple relationship model. These methods, however, struggle with a diversity of appearance, situation, position, interaction, and relation in videos. This limitation hinders the ability to fully comprehend the interplay within the complex visual dynamics of subjects. In this paper, we delve into interactivities understanding within visual content by deriving scene graph representations from dense interactivities among humans and objects. To achieve this goal, we first present a new dataset containing Appearance-Situation-Position-Interaction-Relation predicates, named ASPIRe, offering an extensive collection of videos marked by a wide range of interactivities. Then, we propose a new approach named Hierarchical Interlacement Graph (HIG), which leverages a unified layer and graph within a hierarchical structure to provide deep insights into scene changes across five distinct tasks. Our approach demonstrates superior performance to other methods through extensive experiments conducted in various scenarios.
Comment: Accepted to CVPR 2024
Published: 2023

16. POP-HIT: Partially Order-Preserving Hash-Induced Transformation for Privacy Protection in Face Recognition Access Control

Author: Dubasi, Yatish, Li, Qinghua, and Luu, Khoa
Editors: Duan, Haixin, Debbabi, Mourad, de Carné de Carnavalet, Xavier, Luo, Xiapu, Du, Xiaojiang, and Au, Man Ho Allen
Editorial Board Members: Akan, Ozgur, Bellavista, Paolo, Cao, Jiannong, Coulson, Geoffrey, Dressler, Falko, Ferrari, Domenico, Gerla, Mario, Kobayashi, Hisashi, Palazzo, Sergio, Sahni, Sartaj, Shen, Xuemin, Stan, Mircea, Jia, Xiaohua, and Zomaya, Albert Y.
Published: 2025

17. Depth Perspective-Aware Multiple Object Tracking

Author: Quach, Kha Gia, Nguyen, Pha, Duong, Chi Nhan, Bui, Tien Dai, and Luu, Khoa
Series Editors: Yang, Xin-She, Dey, Nilanjan, and Fong, Simon
Published: 2025

18. Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI

Author: Nguyen, Xuan-Bac, Li, Xin, Sinha, Pawan, Khan, Samee U., and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Human perception plays a vital role in forming beliefs and understanding reality. A deeper understanding of brain functionality will lead to the development of novel deep neural networks. In this work, we introduce a novel framework named Brainformer, a straightforward yet effective Transformer-based framework, to analyze Functional Magnetic Resonance Imaging (fMRI) patterns in the human perception system from a machine-learning perspective. Specifically, we present the Multi-scale fMRI Transformer to explore brain activity patterns through fMRI signals. This architecture includes a simple yet efficient module for high-dimensional fMRI signal encoding and incorporates a novel embedding technique called 3D Voxels Embedding. Secondly, drawing inspiration from the functionality of the brain's Region of Interest, we introduce a novel loss function called Brain fMRI Guidance Loss. This loss function mimics brain activity patterns from these regions in the deep neural network using fMRI data. This work introduces a prospective approach to transfer knowledge from human perception to neural networks. Our experiments demonstrate that leveraging fMRI information allows the machine vision model to achieve results comparable to State-of-the-Art methods in various image recognition tasks.
Published: 2023

19. HAtt-Flow: Hierarchical Attention-Flow Mechanism for Group Activity Scene Graph Generation in Videos

Author: Chappa, Naga VS Raviteja, Nguyen, Pha, Le, Thi Hoang Ngan, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Group Activity Scene Graph (GASG) generation is a challenging task in computer vision, aiming to anticipate and describe relationships between subjects and objects in video sequences. Traditional Video Scene Graph Generation (VidSGG) methods focus on retrospective analysis, limiting their predictive capabilities. To enrich the scene understanding capabilities, we introduced a GASG dataset extending the JRDB dataset with nuanced annotations involving Appearance, Interaction, Position, Relationship, and Situation attributes. This work also introduces an innovative approach, Hierarchical Attention-Flow (HAtt-Flow) Mechanism, rooted in flow network theory to enhance GASG performance. Flow-Attention incorporates flow conservation principles, fostering competition for sources and allocation for sinks, effectively preventing the generation of trivial attention. Our proposed approach offers a unique perspective on attention mechanisms, where conventional "values" and "keys" are transformed into sources and sinks, respectively, creating a novel framework for attention-based models. Through extensive experiments, we demonstrate the effectiveness of our HAtt-Flow model and the superiority of our proposed Flow-Attention mechanism. This work represents a significant advancement in predictive video scene understanding, providing valuable insights and techniques for applications that require real-time relationship prediction in video data.
Comment: 11 pages, 5 figures, 6 tables
Published: 2023

20. FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding

Author: Truong, Thanh-Dat, Prabhu, Utsav, Raj, Bhiksha, Cothren, Jackson, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual Learning in semantic scene segmentation aims to continually learn new unseen classes in dynamic environments while maintaining previously learned knowledge. Prior studies focused on modeling the catastrophic forgetting and background shift challenges in continual learning. However, fairness, another major challenge that causes unfair predictions leading to low performance among major and minor classes, still needs to be well addressed. In addition, prior methods have yet to model the unknown classes well, thus resulting in producing non-discriminative features among unknown classes. This paper presents a novel Fairness Learning via Contrastive Attention Approach to continual learning in semantic scene understanding. In particular, we first introduce a new Fairness Contrastive Clustering loss to address the problems of catastrophic forgetting and fairness. Then, we propose an attention-based visual grammar approach to effectively model the background shift problem and unknown classes, producing better feature representations for different unknown classes. Through our experiments, our proposed approach achieves State-of-the-Art (SOTA) performance on different continual learning benchmarks, i.e., ADE20K, Cityscapes, and Pascal VOC. It promotes the fairness of the continual semantic segmentation model.
Published: 2023

21. REACT: Recognize Every Action Everywhere All At Once

Author: Chappa, Naga VS Raviteja, Nguyen, Pha, Dobbs, Page Daniel, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Group Activity Recognition (GAR) is a fundamental problem in computer vision, with diverse applications in sports video analysis, video surveillance, and social scene understanding. Unlike conventional action recognition, GAR aims to classify the actions of a group of individuals as a whole, requiring a deep understanding of their interactions and spatiotemporal relationships. To address the challenges in GAR, we present REACT (Recognize Every Action Everywhere All At Once), a novel architecture inspired by the transformer encoder-decoder model explicitly designed to model complex contextual relationships within videos, including multi-modality and spatio-temporal features. Our architecture features a cutting-edge Vision-Language Encoder block for integrated temporal, spatial, and multi-modal interaction modeling. This component efficiently encodes spatiotemporal interactions, even with sparsely sampled frames, and recovers essential local information. Our Action Decoder Block refines the joint understanding of text and video data, allowing us to precisely retrieve bounding boxes, enhancing the link between semantics and visual reality. At the core, our Actor Fusion Block orchestrates a fusion of actor-specific data and textual features, striking a balance between specificity and context. Our method outperforms state-of-the-art GAR approaches in extensive experiments, demonstrating superior accuracy in recognizing and understanding group activities. Our architecture's potential extends to diverse real-world applications, offering empirical evidence of its performance gains. This work significantly advances the field of group activity recognition, providing a robust framework for nuanced scene comprehension.
Comment: 10 pages, 4 figures, 5 tables
Published: 2023

22. Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding

Author: Nguyen, Hoang-Quan, Truong, Thanh-Dat, Nguyen, Xuan Bac, Dowling, Ashley, Li, Xin, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In precision agriculture, the detection and recognition of insects play an essential role in the ability of crops to grow healthy and produce a high-quality yield. The current machine vision model requires a large volume of data to achieve high performance. However, there are approximately 5.5 million different insect species in the world. None of the existing insect datasets can cover even a fraction of them due to varying geographic locations and acquisition costs. In this paper, we introduce a novel "Insect-1M" dataset, a game-changing resource poised to revolutionize insect-related foundation model training. Covering a vast spectrum of insect species, our dataset, including 1 million images with dense identification labels of taxonomy hierarchy and insect descriptions, offers a panoramic view of entomology, enabling foundation models to comprehend visual and semantic information about insects like never before. Then, to efficiently establish an Insect Foundation Model, we develop a micro-feature self-supervised learning method with a Patch-wise Relevant Attention mechanism capable of discerning the subtle differences among insect images. In addition, we introduce Description Consistency loss to improve micro-feature modeling via insect descriptions. Through our experiments, we illustrate the effectiveness of our proposed approach in insect modeling and achieve State-of-the-Art performance on standard benchmarks of insect-related tasks. Our Insect Foundation Model and Dataset promise to empower the next generation of insect-related vision models, bringing them closer to the ultimate goal of precision agriculture.
Published: 2023

23. Quantum Vision Clustering

Author: Nguyen, Xuan Bac, Churchill, Hugh, Luu, Khoa, and Khan, Samee U.
Subjects: Quantum Physics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Unsupervised visual clustering has garnered significant attention in recent times, aiming to characterize distributions of unlabeled visual images through clustering based on a parameterized appearance approach. Alternatively, clustering algorithms can be viewed as assignment problems, often characterized as NP-hard, yet precisely solvable for small instances on contemporary hardware. Adiabatic quantum computing (AQC) emerges as a promising solution, poised to deliver substantial speedups for a range of NP-hard optimization problems. However, existing clustering formulations face challenges in quantum computing adoption due to scalability issues. In this study, we present the first clustering formulation tailored for resolution using adiabatic quantum computing. An Ising model is introduced to represent the quantum mechanical system implemented on AQC. The proposed approach demonstrates high competitiveness compared to state-of-the-art optimization-based methods, even when utilizing off-the-shelf integer programming solvers. Lastly, this work showcases the solvability of the proposed clustering problem on current-generation real quantum computers for small examples and analyzes the properties of the obtained solutions., Comment: arXiv admin note: text overlap with arXiv:2202.08837 by other authors
Published: 2023

24. The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution

Author: Nguyen, Xuan-Bac, Liu, Xudong, Li, Xin, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This work presents our solutions to the Algonauts Project 2023 Challenge. The primary objective of the challenge revolves around employing computational models to anticipate brain responses captured during participants' observation of intricate natural visual scenes. The goal is to predict brain responses across the entire visual brain, as it is the region where the most reliable responses to images have been observed. We constructed an image-based brain encoder through a two-step training process to tackle this challenge. Initially, we created a pretrained encoder using data from all subjects. Next, we proceeded to fine-tune individual subjects. Each step employed different training strategies, such as different loss functions and objectives, to introduce diversity. Ultimately, our solution constitutes an ensemble of multiple unique encoders. The code is available at https://github.com/uark-cviu/Algonauts2023, Comment: The Algonauts Project 2023 Challenge
Published: 2023

25. UTOPIA: Unconstrained Tracking Objects without Preliminary Examination via Cross-Domain Adaptation

Author: Nguyen, Pha, Quach, Kha Gia, Gauch, John, Khan, Samee U., Raj, Bhiksha, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multiple Object Tracking (MOT) aims to find bounding boxes and identities of targeted objects in consecutive video frames. While fully-supervised MOT methods have achieved high accuracy on existing datasets, they cannot generalize well on a newly obtained dataset or a new unseen domain. In this work, we first address the MOT problem from the cross-domain point of view, imitating the process of new data acquisition in practice. Then, a new cross-domain MOT adaptation from existing datasets is proposed without any pre-defined human knowledge in understanding and modeling objects. It can also learn and update itself from the target data feedback. The intensive experiments are designed on four challenging settings, including MOTSynth to MOT17, MOT17 to MOT20, MOT17 to VisDrone, and MOT17 to DanceTrack. We then prove the adaptability of the proposed self-supervised learning strategy. The experiments also show superior performance on tracking metrics MOTA and IDF1, compared to fully supervised, unsupervised, and self-supervised state-of-the-art methods.
Published: 2023

26. Z-GMOT: Zero-shot Generic Multiple Object Tracking

Author: Tran, Kim Hoang, Dinh, Anh Duy Le, Nguyen, Tien Phat, Phan, Thinh, Nguyen, Pha, Luu, Khoa, Adjeroh, Donald, Doretto, Gianfranco, and Le, Ngan Hoang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite recent significant progress, Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories and struggles with unseen objects. To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach, requiring less prior information. However, current GMOT methods often rely on initial bounding boxes and struggle to handle variations in factors such as viewpoint, lighting, occlusion, and scale, among others. Our contributions commence with the introduction of the Referring GMOT dataset, a collection of videos, each accompanied by detailed textual descriptions of their attributes. Subsequently, we propose Z-GMOT, a cutting-edge tracking solution capable of tracking objects from never-seen categories without the need for initial bounding boxes or predefined categories. Within our Z-GMOT framework, we introduce two novel components: (i) iGLIP, an improved Grounded language-image pretraining, for accurately detecting unseen objects with specific characteristics; and (ii) MA-SORT, a novel object association approach that adeptly integrates motion and appearance-based matching strategies to tackle the complex task of tracking objects with high similarity. Our contributions are benchmarked through extensive experiments conducted on the Referring GMOT dataset for the GMOT task. Additionally, to assess the generalizability of the proposed Z-GMOT, we conduct ablation studies on the DanceTrack and MOT20 datasets for the MOT task. Our dataset, code, and models are released at: https://fsoft-aic.github.io/Z-GMOT.
Published: 2023

27. Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments

Author: Truong, Thanh-Dat, Nguyen, Hoang-Quan, Raj, Bhiksha, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual semantic segmentation aims to learn new classes while maintaining the information from the previous classes. Although prior studies have shown impressive progress in recent years, the fairness concern in the continual semantic segmentation needs to be better addressed. Meanwhile, fairness is one of the most vital factors in deploying the deep learning model, especially in human-related or safety applications. In this paper, we present a novel Fairness Continual Learning approach to the semantic segmentation problem. In particular, under the fairness objective, a new fairness continual learning framework is proposed based on class distributions. Then, a novel Prototypical Contrastive Clustering loss is proposed to address the significant challenges in continual learning, i.e., catastrophic forgetting and background shift. Our proposed loss has also been proven as a novel, generalized learning paradigm of knowledge distillation commonly used in continual learning. Moreover, the proposed Conditional Structural Consistency loss further regularized the structural constraint of the predicted segmentation. Our proposed approach has achieved State-of-the-Art performance on three standard scene understanding benchmarks, i.e., ADE20K, Cityscapes, and Pascal VOC, and promoted the fairness of the segmentation model., Comment: Accepted to NeurIPS 2023
Published: 2023

28. Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective

Author: Truong, Thanh-Dat and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Understanding action recognition in egocentric videos has emerged as a vital research topic with numerous practical applications. With the limitation in the scale of egocentric data collection, learning robust deep learning-based action recognition models remains difficult. Transferring knowledge learned from the large-scale exocentric data to the egocentric data is challenging due to the difference in videos across views. Our work introduces a novel cross-view learning approach to action recognition (CVAR) that effectively transfers knowledge from the exocentric to the egocentric view. First, we introduce a novel geometric-based constraint into the self-attention mechanism in Transformer based on analyzing the camera positions between two views. Then, we propose a new cross-view self-attention loss learned on unpaired cross-view data to enforce the self-attention mechanism learning to transfer knowledge across views. Finally, to further improve the performance of our cross-view learning approach, we present the metrics to measure the correlations in videos and attention maps effectively. Experimental results on standard egocentric action recognition benchmarks, i.e., Charades-Ego, EPIC-Kitchens-55, and EPIC-Kitchens-100, have shown our approach's effectiveness and state-of-the-art performance.
Published: 2023

29. Type-to-Track: Retrieve Any Object via Prompt-based Tracking

Author: Nguyen, Pha, Quach, Kha Gia, Kitani, Kris, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: One of the recent trends in vision problems is to use natural language captions to describe the objects of interest. This approach can overcome some limitations of traditional methods that rely on bounding boxes or category annotations. This paper introduces a novel paradigm for Multiple Object Tracking called Type-to-Track, which allows users to track objects in videos by typing natural language descriptions. We present a new dataset for the Grounded Multiple Object Tracking task, called GroOT, that contains videos with various types of objects and their corresponding textual captions describing their appearance and action in detail. Additionally, we introduce two new evaluation protocols and formulate evaluation metrics specifically for this task. We develop a new efficient method that models a transformer-based eMbed-ENcoDE-extRact framework (MENDER) using the third-order tensor decomposition. The experiments in five scenarios show that our MENDER approach outperforms another two-stage design in terms of accuracy and efficiency, with gains of up to 14.7% in accuracy and 4× in speed., Comment: Accepted at NeurIPS 2023. Project page: https://uark-cviu.github.io/Type-to-Track/
Published: 2023

30. SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition

Author: Chappa, Naga VS Raviteja, Nguyen, Pha, Nelson, Alexander H, Seo, Han-Seok, Li, Xin, Dobbs, Page Daniel, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using a self-supervised Transformer network that can effectively utilize unlabeled video data. To extract spatio-temporal information, we create local and global views with varying frame rates. Our self-supervised objective ensures that features extracted from contrasting views of the same video are consistent across spatio-temporal domains. Our proposed approach is efficient in using transformer-based encoders to alleviate the weakly supervised setting of group activity recognition. By leveraging the benefits of transformer models, our approach can model long-term relationships along spatio-temporal dimensions. Our proposed SoGAR method achieved state-of-the-art results on three group activity recognition benchmarks, namely JRDB-PAR, NBA, and Volleyball datasets, surpassing the current numbers in terms of F1-score, MCA, and MPCA metrics., Comment: Under review for PR journal; 32 pages, 7 figures. arXiv admin note: text overlap with arXiv:2303.12149
Published: 2023

31. Fairness in Visual Clustering: A Novel Transformer Clustering Approach

Author: Nguyen, Xuan-Bac, Duong, Chi Nhan, Savvides, Marios, Roy, Kaushik, Churchill, Hugh, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Promoting fairness for deep clustering models in unsupervised clustering settings to reduce demographic bias is a challenging goal. This is because of the limitation of large-scale balanced data with well-annotated labels for sensitive or protected attributes. In this paper, we first evaluate demographic bias in deep clustering models from the perspective of cluster purity, which is measured by the ratio of positive samples within a cluster to their correlation degree. This measurement is adopted as an indication of demographic bias. Then, a novel loss function is introduced to encourage a purity consistency for all clusters to maintain the fairness aspect of the learned clustering model. Moreover, we present a novel attention mechanism, Cross-attention, to measure correlations between multiple clusters, strengthening faraway positive samples and improving the purity of clusters during the learning process. Experimental results on a large-scale dataset with numerous attribute settings have demonstrated the effectiveness of the proposed approach on both clustering accuracy and fairness enhancement on several sensitive attributes.
Published: 2023

32. CoMaL: Conditional Maximum Likelihood Approach to Self-supervised Domain Adaptation in Long-tail Semantic Segmentation

Author: Truong, Thanh-Dat, Duong, Chi Nhan, Helton, Pierce, Dowling, Ashley, Li, Xin, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Research on self-supervised domain adaptation in semantic segmentation has recently received considerable attention. Although GAN-based methods have become one of the most popular approaches to domain adaptation, they have suffered from some limitations. They are insufficient to model both global and local structures of a given image, especially in small regions of tail classes. Moreover, they perform poorly on tail classes containing a limited number of pixels or fewer training samples. In order to address these issues, we present a new self-supervised domain adaptation approach to tackle long-tail semantic segmentation in this paper. Firstly, a new metric is introduced to formulate long-tail domain adaptation in the segmentation problem. Secondly, a new Conditional Maximum Likelihood (CoMaL) approach in an autoregressive framework is presented to solve the problem of long-tail domain adaptation. Although other segmentation methods work under the pixel independence assumption, the long-tailed pixel distributions in CoMaL are generally solved in the context of structural dependency, as that is more realistic. Finally, the proposed method is evaluated on popular large-scale semantic segmentation benchmarks, i.e., "SYNTHIA to Cityscapes" and "GTA to Cityscapes", and outperforms the prior methods by a large margin in both the standard and the proposed evaluation protocols.
Published: 2023

33. CROVIA: Seeing Drone Scenes from Car Perspective via Cross-View Adaptation

Author: Truong, Thanh-Dat, Duong, Chi Nhan, Dowling, Ashley, Phung, Son Lam, Cothren, Jackson, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Understanding semantic scene segmentation of urban scenes captured from the Unmanned Aerial Vehicles (UAV) perspective plays a vital role in building a perception model for UAV. With the limitations of large-scale densely labeled data, semantic scene segmentation for UAV views requires a broad understanding of an object from both its top and side views. Adapting from well-annotated autonomous driving data to unlabeled UAV data is challenging due to the cross-view differences between the two data types. Our work proposes a novel Cross-View Adaptation (CROVIA) approach to effectively adapt the knowledge learned from on-road vehicle views to UAV views. First, a novel geometry-based constraint to cross-view adaptation is introduced based on the geometry correlation between views. Second, cross-view correlations from image space are effectively transferred to segmentation space without any requirement of paired on-road and UAV view data via a new Geometry-Constraint Cross-View (GeiCo) loss. Third, the multi-modal bijective networks are introduced to enforce the global structural modeling across views. Experimental results on new cross-view adaptation benchmarks introduced in this work, i.e., SYNTHIA to UAVID and GTA5 to UAVID, show the State-of-the-Art (SOTA) performance of our approach over prior adaptation methods.
Published: 2023

34. Micron-BERT: BERT-based Facial Micro-Expression Recognition

Author: Nguyen, Xuan-Bac, Duong, Chi Nhan, Li, Xin, Gauch, Susan, Seo, Han-Seok, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Micro-expression recognition is one of the most challenging topics in affective computing. It aims to recognize tiny facial movements difficult for humans to perceive in a brief period, i.e., 0.25 to 0.5 seconds. Recent advances in pre-training deep Bidirectional Transformers (BERT) have significantly improved self-supervised learning tasks in computer vision. However, the standard BERT in vision problems is designed to learn only from full images or videos, and the architecture cannot accurately detect details of facial micro-expressions. This paper presents Micron-BERT (μ-BERT), a novel approach to facial micro-expression recognition. The proposed method can automatically capture these movements in an unsupervised manner based on two key ideas. First, we employ Diagonal Micro-Attention (DMA) to detect tiny differences between two frames. Second, we introduce a new Patch of Interest (PoI) module to localize and highlight micro-expression interest regions and simultaneously reduce noisy backgrounds and distractions. By incorporating these components into an end-to-end deep network, the proposed μ-BERT significantly outperforms all previous work in various micro-expression tasks. μ-BERT can be trained on a large-scale unlabeled dataset, i.e., up to 8 million images, and achieves high accuracy on new unseen facial micro-expression datasets. Empirical experiments show μ-BERT consistently outperforms state-of-the-art performance on four micro-expression benchmarks, including SAMM, CASME II, SMIC, and CASME3, by significant margins. Code will be available at https://github.com/uark-cviu/Micron-BERT, Comment: Accepted by CVPR2023
Published: 2023

35. FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding

Author: Truong, Thanh-Dat, Le, Ngan, Raj, Bhiksha, Cothren, Jackson, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Although Domain Adaptation in Semantic Scene Segmentation has shown impressive improvement in recent years, the fairness concerns in the domain adaptation have yet to be well defined and addressed. In addition, fairness is one of the most critical aspects when deploying the segmentation models into human-related real-world applications, e.g., autonomous driving, as any unfair predictions could influence human safety. In this paper, we propose a novel Fairness Domain Adaptation (FREDOM) approach to semantic scene segmentation. In particular, from the proposed formulated fairness objective, a new adaptation framework will be introduced based on the fair treatment of class distributions. Moreover, to generally model the context of structural dependency, a new conditional structural constraint is introduced to impose the consistency of predicted segmentation. Thanks to the proposed Conditional Structure Network, the self-attention mechanism has sufficiently modeled the structural information of segmentation. Through the ablation studies, the proposed method has shown the performance improvement of the segmentation models and promoted fairness in the model predictions. The experimental results on the two standard benchmarks, i.e., SYNTHIA → Cityscapes and GTA5 → Cityscapes, have shown that our method achieved State-of-the-Art (SOTA) performance., Comment: Accepted to CVPR'23
Published: 2023

36. SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition

Author: Chappa, Naga VS Raviteja, Nguyen, Pha, Nelson, Alexander H, Seo, Han-Seok, Li, Xin, Dobbs, Page Daniel, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose a new, simple, and effective Self-supervised Spatio-temporal Transformers (SPARTAN) approach to Group Activity Recognition (GAR) using unlabeled video data. Given a video, we create local and global Spatio-temporal views with varying spatial patch sizes and frame rates. The proposed self-supervised objective aims to match the features of these contrasting views representing the same video to be consistent with the variations in spatiotemporal domains. To the best of our knowledge, the proposed mechanism is one of the first works to alleviate the weakly supervised setting of GAR using the encoders in video transformers. Furthermore, using the advantage of transformer models, our proposed approach supports long-term relationship modeling along spatio-temporal dimensions. The proposed SPARTAN approach performs well on two group activity recognition benchmarks, including NBA and Volleyball datasets, by surpassing the state-of-the-art results by a significant margin in terms of MCA and MPCA metrics., Comment: Accepted to CVPRW 2023; 11 pages, 5 figures
Published: 2023

37. React: recognize every action everywhere all at once

Author: Chappa, Naga V. S. Raviteja, Nguyen, Pha, Dobbs, Page Daniel, and Luu, Khoa
Published: 2024

38. Contextual Explainable Video Representation: Human Perception-based Understanding

Author: Vo, Khoa, Yamazaki, Kashu, Nguyen, Phong X., Nguyen, Phat, Luu, Khoa, and Le, Ngan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video understanding is a growing field and a subject of intense research, which includes many interesting tasks for understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting a contextual visual representation from a given untrimmed video, due to the long and complicated temporal structure of unconstrained videos. Different from existing approaches, which apply a pre-trained backbone network as a black box to extract visual representation, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is crucial to design a contextual explainable video representation extraction that can capture each of these factors and model the relationships between them. In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation., Comment: Accepted in Asilomar Conference 2022
Published: 2022

39. Neural Cell Video Synthesis via Optical-Flow Diffusion

Author: Serna-Aguilera, Manuel, Luu, Khoa, Harris, Nathaniel, and Zou, Min
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The biomedical imaging world is notorious for working with small amounts of data, frustrating state-of-the-art efforts in the computer vision and deep learning worlds. With large datasets, it is easier to make progress, as we have seen with natural images. The same holds for microscopy videos of neuron cells moving in a culture. This problem presents several challenges as it can be difficult to grow and maintain the culture for days, and it is expensive to acquire the materials and equipment. In this work, we explore how to alleviate this data scarcity problem by synthesizing the videos. We, therefore, take the recent work of the video diffusion model to synthesize videos of cells from our training dataset. We then analyze the model's strengths and consistent shortcomings to guide us on improving video generation to be as high-quality as possible. To improve on such a task, we propose modifying the denoising function and adding motion information (dense optical flow) so that the model has more context regarding how video frames transition over time and how each pixel changes over time., Comment: 9 pages, 2 tables, 7 figures
Published: 2022

40. CONDA: Continual Unsupervised Domain Adaptation Learning in Visual Perception for Self-Driving Cars

Author: Truong, Thanh-Dat, Helton, Pierce, Moustafa, Ahmed, Cothren, Jackson David, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Although unsupervised domain adaptation methods have achieved remarkable performance in semantic scene segmentation in visual perception for self-driving cars, these approaches remain impractical in real-world use cases. In practice, the segmentation models may encounter new data that have not been seen yet. Also, the previous data training of segmentation models may be inaccessible due to privacy problems. Therefore, to address these problems, in this work, we propose a Continual Unsupervised Domain Adaptation (CONDA) approach that allows the model to continuously learn and adapt with respect to the presence of the new data. Moreover, our proposed approach is designed without the requirement of accessing previous training data. To avoid the catastrophic forgetting problem and maintain the performance of the segmentation models, we present a novel Bijective Maximum Likelihood loss to impose the constraint of predicted segmentation distribution shifts. The experimental results on the benchmark of continual unsupervised domain adaptation have shown the advanced performance of the proposed CONDA method., Comment: Accepted to CVPRW 2024
Published: 2022

41. Multi-Camera Multi-Object Tracking on the Move via Single-Stage Global Association Approach

Author: Nguyen, Pha, Quach, Kha Gia, Duong, Chi Nhan, Phung, Son Lam, Le, Ngan, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The development of autonomous vehicles generates a tremendous demand for a low-cost solution with a complete set of camera sensors capturing the environment around the car. It is essential for object detection and tracking to address these new challenges in multi-camera settings. In order to address these challenges, this work introduces novel Single-Stage Global Association Tracking approaches to associate one or more detections from multiple cameras with tracked objects. These approaches aim to solve fragment-tracking issues caused by inconsistent 3D object detection. Moreover, our models also improve the detection accuracy of the standard vision-based 3D object detectors in the nuScenes detection challenge. The experimental results on the nuScenes dataset demonstrate the benefits of the proposed method by outperforming prior vision-based tracking methods in multi-camera settings., Comment: In review PR journal. arXiv admin note: text overlap with arXiv:2204.09151
Published: 2022

42. Vec2Face-v2: Unveil Human Faces from their Blackbox Features via Attention-based Network in Face Recognition

Author: Truong, Thanh-Dat, Duong, Chi Nhan, Le, Ngan, Savvides, Marios, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we investigate the problem of face reconstruction given a facial feature representation extracted from a blackbox face recognition engine. Indeed, it is a very challenging problem in practice due to the limitations of abstracted information from the engine. We, therefore, introduce a new method named Attention-based Bijective Generative Adversarial Networks in a Distillation framework (DAB-GAN) to synthesize the faces of a subject given his/her extracted face recognition features. Given any unconstrained unseen facial features of a subject, the DAB-GAN can reconstruct his/her facial images in high definition. The DAB-GAN method includes a novel attention-based generative structure with the newly defined Bijective Metrics Learning approach. The framework starts by introducing a bijective metric so that the distance measurement and metric learning process can be directly adopted in the image domain for an image reconstruction task. The information from the blackbox face recognition engine will be optimally exploited using the global distillation process. Then an attention-based generator is presented for a highly robust generator to synthesize realistic faces with ID preservation. We have evaluated our method on the challenging face recognition databases, i.e., CelebA, LFW, CFP-FP, CP-LFW, AgeDB, CA-LFW, and consistently achieved state-of-the-art results. The advancement of DAB-GAN is also proven in both image realism and ID preservation properties., Comment: arXiv admin note: substantial text overlap with arXiv:2003.06958
Published: 2022

43. Depth Perspective-aware Multiple Object Tracking

Author: Quach, Kha Gia, Le, Huu, Nguyen, Pha, Duong, Chi Nhan, Bui, Tien Dai, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper aims to tackle Multiple Object Tracking (MOT), an important problem in computer vision that remains challenging due to many practical issues, especially occlusions. We propose a new real-time Depth Perspective-aware Multiple Object Tracking (DP-MOT) approach to tackle the occlusion problem in MOT. A simple yet efficient Subject-Ordered Depth Estimation (SODE) is first proposed to automatically order the depth positions of detected subjects in a 2D scene in an unsupervised manner. Using the output from SODE, a new Active pseudo-3D Kalman filter, a simple but effective extension of the Kalman filter with dynamic control variables, is then proposed to dynamically update the movement of objects. In addition, a new high-order association approach is presented in the data association step to incorporate first-order and second-order relationships between the detected objects. The proposed approach consistently achieves state-of-the-art performance compared to recent MOT methods on standard MOT benchmarks., Comment: In review PR journal
Published: 2022

44. VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning

Author: Yamazaki, Kashu, Truong, Sang, Vo, Khoa, Kidd, Michael, Rainwater, Chase, Luu, Khoa, and Le, Ngan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we leverage the human perceiving process, that involves vision and language interaction, to generate a coherent paragraph description of untrimmed videos. We propose vision-language (VL) features consisting of two modalities, i.e., (i) vision modality to capture global visual content of the entire scene and (ii) language modality to extract scene elements description of both human and non-human objects (e.g. animals, vehicles, etc), visual and non-visual elements (e.g. relations, activities, etc). Furthermore, we propose to train our proposed VLCap under a contrastive learning VL loss. The experiments and ablation studies on ActivityNet Captions and YouCookII datasets show that our VLCap outperforms existing SOTA methods on both accuracy and diversity metrics., Comment: accepted by The 29th IEEE International Conference on Image Processing (IEEE ICIP) 2022
Published: 2022

45. Self-supervised Domain Adaptation in Crowd Counting

Author: Nguyen, Pha, Truong, Thanh-Dat, Huang, Miaoqing, Liang, Yi, Le, Ngan, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-training crowd counting has not been thoroughly explored, though it is one of the important challenges in computer vision. In practice, fully supervised methods usually require intensive manual annotation. To address this challenge, this work introduces a new domain adaptation approach in crowd counting that utilizes existing datasets with ground truth to produce more robust predictions on unlabeled datasets. While the network is trained with labeled data, samples without labels from the target domain are also added to the training process. In this process, the entropy map is computed and minimized in addition to the adversarial training process designed in parallel. Experiments on Shanghaitech, UCF_CC_50, and UCF-QNRF datasets show that our method generalizes better than other state-of-the-art methods in the cross-domain setting., Comment: Accepted at ICIP 2022
Published: 2022

46. CapsNet for medical image segmentation

Author: Tran, Minh, primary, Vo-Ho, Viet-Khoa, additional, Quinn, Kyle, additional, Nguyen, Hien, additional, Luu, Khoa, additional, and Le, Ngan, additional
Published: 2024
Full Text: View/download PDF

47. Contributors

Author: Bian, Cheng, primary, Burt, Alastair D., additional, Cao, Xiaohuan, additional, Carass, Aaron, additional, Carneiro, Gustavo, additional, Cha, Kenny H., additional, Chen, Yang, additional, Dong, Qinglin, additional, Duncan, James, additional, Dvornek, Nicha, additional, Fan, Jingfan, additional, Fu, Huazhu, additional, Gao, Yue, additional, Ge, Bao, additional, Gossmann, Alexej, additional, Han, Shuo, additional, Hayat, Munawar, additional, He, Mengshen, additional, He, Yufan, additional, Hu, Xintao, additional, Huang, Heng, additional, Huang, Qiu, additional, Jeon, Eunjin, additional, Ji, Shuyi, additional, Jiang, Xi, additional, Khan, Fahad Shahbaz, additional, Khan, Muhammad Haris, additional, Khan, Salman, additional, Ko, Wonjun, additional, Le, Ngan, additional, Lei, Jianqin, additional, Li, Lei, additional, Li, Qing, additional, Li, Xiaoxiao, additional, Li, Yuexiang, additional, Liang, Dong, additional, Liu, Dingkun, additional, Liu, Luyan, additional, Liu, Tianming, additional, Liu, Yihao, additional, Liu, Yiheng, additional, Liu, Yuyuan, additional, Luu, Khoa, additional, Ma, Kai, additional, Maicas, Gabriel, additional, Mulyadi, Ahmad Wisnu, additional, Nguyen, Hien, additional, Oh, Gyutaek, additional, Petrick, Nicholas, additional, Prince, Jerry L., additional, Qiang, Ning, additional, Quinn, Kyle, additional, Roth, Holger R., additional, Sahiner, Berkman, additional, Samala, Ravi K., additional, Shamshad, Fahad, additional, Shen, Dinggang, additional, Shin, Seon Ho, additional, Singh, Rajvinder, additional, Staib, Lawrence H., additional, Suk, Heung-Il, additional, Sun, Kaicong, additional, Tian, Yu, additional, Tran, Minh, additional, Ventola, Pamela, additional, Verjans, Johan W., additional, Vo-Ho, Viet-Khoa, additional, Wang, Ge, additional, Wang, Han, additional, Wang, Jiyao, additional, Wang, Qiyuan, additional, Wang, Sihang, additional, Wang, Xiaosong, additional, Wen, Si, additional, Wu, Fuping, additional, Wu, Zihao, additional, Xu, Daguang, additional, Xu, Steven, additional, Xu, Ziyue, additional, Xue, Peng, additional, Xue, Zhong, additional, Yang, Dong, additional, Ye, Jong Chul, additional, Yoon, Jee Seok, additional, Zamir, Syed Waqas, additional, Zhang, Lu, additional, Zhang, Wei, additional, Zhao, Jun, additional, Zhao, Lin, additional, Zhao, Shijie, additional, Zheng, Yefeng, additional, Zhou, S. Kevin, additional, Zhu, Dajiang, additional, Zhuang, Juntang, additional, Zhuang, Xiahai, additional, Zorron Cheng Tao Pu, Leonardo, additional, and Zuo, Lianrui, additional
Published: 2024
Full Text: View/download PDF

48. Two-Dimensional Quantum Material Identification via Self-Attention and Soft-labeling in Deep Learning

Author: Nguyen, Xuan Bac, Bisht, Apoorva, Thompson, Ben, Churchill, Hugh, Luu, Khoa, and Khan, Samee U.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: In the quantum machine field, detecting two-dimensional (2D) materials in silicon chips is one of the most critical problems. Instance segmentation can be considered a potential approach to solving this problem. However, like other deep learning methods, instance segmentation requires a large-scale training dataset and high-quality annotation in order to achieve considerable performance. In practice, preparing the training dataset is a challenge since annotators have to deal with large images, e.g., 2K resolution, and extremely dense objects in this problem. In this work, we present a novel method to tackle the problem of missing annotation in instance segmentation in 2D quantum material identification. We propose a new mechanism for automatically detecting false negative objects and an attention-based loss strategy to reduce the negative impact of these objects on the overall loss function. We experiment on the 2D material detection datasets, and the experiments show our method outperforms previous works.
Published: 2022

49. OTAdapt: Optimal Transport-based Approach For Unsupervised Domain Adaptation

Author: Truong, Thanh-Dat, Chappa, Naga Venkata Sai Raviteja, Nguyen, Xuan Bac, Le, Ngan, Dowling, Ashley, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Unsupervised domain adaptation is one of the challenging problems in computer vision. This paper presents a novel approach to unsupervised domain adaptations based on the optimal transport-based distance. Our approach allows aligning target and source domains without the requirement of meaningful metrics across domains. In addition, the proposal can associate the correct mapping between source and target domains and guarantee a constraint of topology between source and target domains. The proposed method is evaluated on different datasets in various problems, i.e. (i) digit recognition on MNIST, MNIST-M, USPS datasets, (ii) Object recognition on Amazon, Webcam, DSLR, and VisDA datasets, (iii) Insect Recognition on the IP102 dataset. The experimental results show that our proposed method consistently improves performance accuracy. Also, our framework could be incorporated with any other CNN frameworks within an end-to-end deep network design for recognition problems to improve their performance., Comment: Accepted to ICPR 2022
Published: 2022

50. Multi-Camera Multiple 3D Object Tracking on the Move for Autonomous Vehicles

Author: Nguyen, Pha, Quach, Kha Gia, Duong, Chi Nhan, Le, Ngan, Nguyen, Xuan-Bac, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The development of autonomous vehicles provides an opportunity to have a complete set of camera sensors capturing the environment around the car. Thus, it is important for object detection and tracking to address new challenges, such as achieving consistent results across camera views. To address these challenges, this work presents a new Global Association Graph Model with Link Prediction approach to predict the locations of existing tracklets and link detections with tracklets via cross-attention motion modeling and appearance re-identification. This approach aims at solving issues caused by inconsistent 3D object detection. Moreover, our model is exploited to improve the detection accuracy of a standard 3D object detector in the nuScenes detection challenge. The experimental results on the nuScenes dataset demonstrate the benefits of the proposed method, which produces SOTA performance on the existing vision-based tracking dataset., Comment: Accepted at CVPRW 2022
Published: 2022
