Author: "Yang, Zhihan" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yang, Zhihan"' showing total 9 results

1. CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification

Author: Yang, Guangqian, Du, Kangrui, Yang, Zhihan, Du, Ye, Zheng, Yongping, and Wang, Shujun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Alzheimer's disease (AD) is an incurable neurodegenerative condition leading to cognitive and functional deterioration. Given the lack of a cure, prompt and precise AD diagnosis is vital, a complex process dependent on multiple factors and multi-modal data. While successful efforts have been made to integrate multi-modal representation learning into medical datasets, scant attention has been given to 3D medical images. In this paper, we propose Contrastive Masked Vim Autoencoder (CMViM), the first efficient representation learning method tailored for 3D multi-modal data. Our proposed framework is built on a masked Vim autoencoder to learn a unified multi-modal representation and the long-range dependencies contained in 3D medical images. We also introduce an intra-modal contrastive learning module to enhance the capability of the multi-modal Vim encoder for modeling the discriminative features in the same modality, and an inter-modal contrastive learning module to alleviate misaligned representation among modalities. Our framework consists of two main steps: 1) incorporate the Vision Mamba (Vim) into the masked autoencoder to reconstruct 3D masked multi-modal data efficiently; 2) align the multi-modal representations with contrastive learning mechanisms from both intra-modal and inter-modal aspects. Our framework is pre-trained and validated on the ADNI2 dataset and validated on the downstream task of AD classification. The proposed CMViM yields a 2.7% AUC performance improvement compared with other state-of-the-art methods. Comment: 11 pages, 1 figure
Published: 2024

2. PMP-Swin: Multi-Scale Patch Message Passing Swin Transformer for Retinal Disease Classification

Author: Yang, Zhihan, Cheng, Zhiming, Weng, Tengjin, He, Shucheng, Wang, Yaqi, Ye, Xin, and Wang, Shuai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Retinal disease is one of the primary causes of visual impairment, and early diagnosis is essential for preventing further deterioration. Nowadays, many works have explored Transformers for diagnosing diseases due to their strong visual representation capabilities. However, retinal diseases exhibit milder forms and often present with overlapping signs, which pose great difficulties for accurate multi-class classification. Therefore, we propose a new framework named Multi-Scale Patch Message Passing Swin Transformer for multi-class retinal disease classification. Specifically, we design a Patch Message Passing (PMP) module based on the Message Passing mechanism to establish global interaction for pathological semantic features and to further exploit the subtle differences between different diseases. Moreover, considering the various scales of pathological features, we integrate multiple PMP modules for different patch sizes. For evaluation, we have constructed a new dataset, named the OPTOS dataset, consisting of 1,033 high-resolution fundus images photographed by an Optos camera, and conducted comprehensive experiments to validate the efficacy of our proposed method. The results on both the public dataset and our dataset demonstrate that our method achieves remarkable performance compared to state-of-the-art methods. Comment: 9 pages, 7 figures
Published: 2023

3. DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification

Author: Wang, Yuanyuan, Zhang, Yang, Wu, Zhiyong, Yang, Zhihan, Wei, Tao, Zou, Kun, and Meng, Helen
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence
Abstract: Data augmentation is vital to the generalization ability and robustness of deep neural network (DNN) models. Existing augmentation methods for speaker verification manipulate the raw signal, which is time-consuming, and the augmented samples lack diversity. In this paper, we present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification, which can generate diversified training samples in speaker embedding space with negligible extra computing cost. Firstly, we augment training samples by perturbing speaker embeddings along semantic directions, which are obtained from speaker-wise covariance matrices. Secondly, accurate covariance matrices are estimated from robust speaker embeddings during training, so we introduce difficulty-aware additive margin softmax (DAAM-Softmax) to obtain optimal speaker embeddings. Finally, we assume the number of augmented samples goes to infinity and derive a closed-form upper bound of the expected loss with DASA, which achieves compatibility and efficiency. Extensive experiments demonstrate the proposed approach can achieve a remarkable performance improvement. The best result achieves a 14.6% relative reduction in the EER metric on the CN-Celeb evaluation set. Comment: Accepted by ICASSP 2023
Published: 2023

4. Hierarchical Reinforcement Learning under Mixed Observability

Author: Nguyen, Hai, Yang, Zhihan, Baisero, Andrea, Ma, Xiao, Platt, Robert, and Amato, Christopher
Subjects: Computer Science - Robotics
Abstract: The framework of mixed observable Markov decision processes (MOMDP) models many robotic domains in which some state variables are fully observable while others are not. In this work, we identify a significant subclass of MOMDPs defined by how actions influence the fully observable components of the state and how those, in turn, influence the partially observable components and the rewards. This unique property allows for a two-level hierarchical approach we call HIerarchical Reinforcement Learning under Mixed Observability (HILMO), which restricts partial observability to the top level while the bottom level remains fully observable, enabling higher learning efficiency. The top level produces desired goals to be reached by the bottom level until the task is solved. We further develop theoretical guarantees to show that our approach can achieve optimal and quasi-optimal behavior under mild assumptions. Empirical results on long-horizon continuous control tasks demonstrate the efficacy and efficiency of our approach in terms of improved success rate, sample efficiency, and wall-clock training time. We also deploy policies learned in simulation on a real robot. Comment: Accepted at the 15th International Workshop on the Algorithmic Foundations of Robotics (WAFR) 2022, University of Maryland, College Park. The first two authors contributed equally
Published: 2022

5. Recurrent Off-policy Baselines for Memory-based Continuous Control

Author: Yang, Zhihan and Nguyen, Hai
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: When the environment is partially observable (PO), a deep reinforcement learning (RL) agent must learn a suitable temporal representation of the entire history in addition to a strategy to control. This problem is not novel, and there have been model-free and model-based algorithms proposed for this problem. However, inspired by recent success in model-free image-based RL, we noticed the absence of a model-free baseline for history-based RL that (1) uses full history and (2) incorporates recent advances in off-policy continuous control. Therefore, we implement recurrent versions of DDPG, TD3, and SAC (RDPG, RTD3, and RSAC) in this work, evaluate them on short-term and long-term PO domains, and investigate key design choices. Our experiments show that RDPG and RTD3 can surprisingly fail on some domains and that RSAC is the most reliable, reaching near-optimal performance on nearly all domains. However, one task that requires systematic exploration still proved to be difficult, even for RSAC. These results show that model-free RL can learn good temporal representation using only reward signals; the primary difficulty seems to be computational cost and exploration. To facilitate future research, we have made our PyTorch implementation publicly available at https://github.com/zhihanyang2022/off-policy-continuous-control.
Published: 2021

6. PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback

Author: Bu, Yaohua, Ma, Tianyi, Li, Weijun, Zhou, Hang, Jia, Jia, Chen, Shengqi, Xu, Kaiyuan, Shi, Dachuan, Wu, Haozhe, Yang, Zhihan, Li, Kun, Wu, Zhiyong, Shi, Yuanchun, Lu, Xiaobo, and Liu, Ziwei
Subjects: Computer Science - Human-Computer Interaction
Abstract: Second language (L2) English learners often find it difficult to improve their pronunciations due to the lack of expressive and personalized corrective feedback. In this paper, we present Pronunciation Teacher (PTeacher), a Computer-Aided Pronunciation Training (CAPT) system that provides personalized exaggerated audio-visual corrective feedback for mispronunciations. Though the effectiveness of exaggerated feedback has been demonstrated, it is still unclear how to define the appropriate degrees of exaggeration when interacting with individual learners. To fill in this gap, we interview 100 L2 English learners and 22 professional native teachers to understand their needs and experiences. Three critical metrics are proposed for both learners and teachers to identify the best exaggeration levels in both audio and visual modalities. Additionally, we incorporate the personalized dynamic feedback mechanism given the English proficiency of learners. Based on the obtained insights, a comprehensive interactive pronunciation training course is designed to help L2 learners rectify mispronunciations in a more perceptible, understandable, and discriminative manner. Extensive user studies demonstrate that our system significantly promotes the learners' learning efficiency.
Published: 2021

7. Conditional Level Generation and Game Blending

Author: Sarkar, Anurag, Yang, Zhihan, and Cooper, Seth
Subjects: Computer Science - Machine Learning
Abstract: Prior research has shown variational autoencoders (VAEs) to be useful for generating and blending game levels by learning latent representations of existing level data. We build on such models by exploring the level design affordances and applications enabled by conditional VAEs (CVAEs). CVAEs augment VAEs by allowing them to be trained using labeled data, thus enabling outputs to be generated conditioned on some input. We studied how increased control in the level generation process and the ability to produce desired outputs via training on labeled game level data could build on prior PCGML methods. Through our results of training CVAEs on levels from Super Mario Bros., Kid Icarus and Mega Man, we show that such models can assist in level design by generating levels with desired level elements and patterns as well as producing blended levels with desired combinations of games. Comment: 6 pages, 8 figures, Experimental AI in Games Workshop at AIIDE 2020
Published: 2020

8. Game Level Clustering and Generation using Gaussian Mixture VAEs

Author: Yang, Zhihan, Sarkar, Anurag, and Cooper, Seth
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Variational autoencoders (VAEs) have been shown to be able to generate game levels but require manual exploration of the learned latent space to generate outputs with desired attributes. While conditional VAEs address this by allowing generation to be conditioned on labels, such labels have to be provided during training and thus require prior knowledge which may not always be available. In this paper, we apply Gaussian Mixture VAEs (GMVAEs), a variant of the VAE which imposes a mixture of Gaussians (GM) on the latent space, unlike regular VAEs which impose a unimodal Gaussian. This allows GMVAEs to cluster levels in an unsupervised manner using the components of the GM and then generate new levels using the learned components. We demonstrate our approach with levels from Super Mario Bros., Kid Icarus and Mega Man. Our results show that the learned components discover and cluster level structures and patterns and can be used to generate levels with desired characteristics. Comment: 6 pages, 5 figures, 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 2020)
Published: 2020

9. Controllable Level Blending between Games using Variational Autoencoders

Author: Sarkar, Anurag, Yang, Zhihan, and Cooper, Seth
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Previous work explored blending levels from existing games to create levels for a new game that mixes properties of the original games. In this paper, we use Variational Autoencoders (VAEs) for improving upon such techniques. VAEs are artificial neural networks that learn and use latent representations of datasets to generate novel outputs. We train a VAE on level data from Super Mario Bros. and Kid Icarus, enabling it to capture the latent space spanning both games. We then use this space to generate level segments that combine properties of levels from both games. Moreover, by applying evolutionary search in the latent space, we evolve level segments satisfying specific constraints. We argue that these affordances make the VAE-based approach especially suitable for co-creative level design and compare its performance with similar generative models like the GAN and the VAE-GAN. Comment: 6 pages, 11 figures, Sixth Experimental AI in Games Workshop at AIIDE
Published: 2020
