Author: "Deng, Chao" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Deng, Chao"' showing total 57 results

1. SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks

Author: Cao, Hongye, Wang, Yanming, Jing, Sijia, Peng, Ziyue, Bai, Zhixin, Cao, Zhe, Fang, Meng, Feng, Fan, Wang, Boyan, Liu, Jiaheng, Yang, Tianpei, Huo, Jing, Gao, Yang, Meng, Fanyu, Yang, Xi, Deng, Chao, and Feng, Junlan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: With the rapid advancement of Large Language Models (LLMs), the safety of LLMs has been a critical concern requiring precise assessment. Current benchmarks primarily concentrate on single-turn dialogues or a single jailbreak attack method to assess the safety. Additionally, these benchmarks have not taken into account the LLM's capability of identifying and handling unsafe information in detail. To address these issues, we propose a fine-grained benchmark SafeDialBench for evaluating the safety of LLMs across various jailbreak attacks in multi-turn dialogues. Specifically, we design a two-tier hierarchical safety taxonomy that considers 6 safety dimensions and generates more than 4000 multi-turn dialogues in both Chinese and English under 22 dialogue scenarios. We employ 7 jailbreak attack strategies, such as reference attack and purpose reverse, to enhance the dataset quality for dialogue generation. Notably, we construct an innovative assessment framework of LLMs, measuring capabilities in detecting, and handling unsafe information and maintaining consistency when facing jailbreak attacks. Experimental results across 17 LLMs reveal that Yi-34B-Chat and GLM4-9B-Chat demonstrate superior safety performance, while Llama3.1-8B-Instruct and o3-mini exhibit safety vulnerabilities.
Published: 2025

2. FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian Splatting with Depth-Feature Consistency

Author: Huang, Han, Wu, Yulun, Deng, Chao, Gao, Ge, Gu, Ming, and Liu, Yu-Shen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, Gaussian Splatting has sparked a new trend in the field of computer vision. Apart from novel view synthesis, it has also been extended to the area of multi-view reconstruction. The latest methods facilitate complete, detailed surface reconstruction while ensuring fast training speed. However, these methods still require dense input views, and their output quality significantly degrades with sparse views. We observed that the Gaussian primitives tend to overfit the few training views, leading to noisy floaters and incomplete reconstruction surfaces. In this paper, we present an innovative sparse-view reconstruction framework that leverages intra-view depth and multi-view feature consistency to achieve remarkably accurate surface reconstruction. Specifically, we utilize monocular depth ranking information to supervise the consistency of depth distribution within patches and employ a smoothness loss to enhance the continuity of the distribution. To achieve finer surface reconstruction, we optimize the absolute position of depth through multi-view projection features. Extensive experiments on DTU and BlendedMVS demonstrate that our method outperforms state-of-the-art methods with a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction without the need for costly pre-training.
Comment: Accepted by AAAI 2025. Project page: https://alvin528.github.io/FatesGS/
Published: 2025

3. Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

Author: Wu, Yulun, Huang, Han, Zhang, Wenyuan, Deng, Chao, Gao, Ge, Gu, Ming, and Liu, Yu-Shen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years, reconstructing indoor scene geometry from multi-view images has achieved encouraging accomplishments. Current methods incorporate monocular priors into neural implicit surface models to achieve high-quality reconstructions. However, these methods require hundreds of images for scene reconstruction. When only a limited number of views are available as input, the performance of monocular priors deteriorates due to scale ambiguity, leading to the collapse of the reconstructed scene geometry. In this paper, we propose a new method, named Sparis, for indoor surface reconstruction from sparse views. Specifically, we investigate the impact of monocular priors on sparse scene reconstruction, introducing a novel prior based on inter-image matching information. Our prior offers more accurate depth information while ensuring cross-view matching consistency. Additionally, we employ an angular filter strategy and an epipolar matching weight function, aiming to reduce errors due to view matching inaccuracies, thereby refining the inter-image prior for improved reconstruction accuracy. The experiments conducted on widely used benchmarks demonstrate superior performance in sparse-view scene reconstruction.
Comment: Accepted by AAAI 2025. Project page: https://yulunwu0108.github.io/Sparis/
Published: 2025

4. LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating

Author: Deng, Chao, Yuan, Jiale, Bu, Pi, Wang, Peijie, Li, Zhong-Zhi, Xu, Jian, Li, Xiao-Hui, Gao, Yuan, Song, Jun, Zheng, Bo, and Liu, Cheng-Lin
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large vision language models (LVLMs) have improved the document understanding capabilities remarkably, enabling the handling of complex document elements, longer contexts, and a wider range of tasks. However, existing document understanding benchmarks have been limited to handling only a small number of pages and fail to provide a comprehensive analysis of layout elements locating. In this paper, we first define three primary task categories: Long Document Understanding, numerical Reasoning, and cross-element Locating, and then propose a comprehensive benchmark, LongDocURL, integrating above three primary tasks and comprising 20 sub-tasks categorized based on different primary tasks and answer evidences. Furthermore, we develop a semi-automated construction pipeline and collect 2,325 high-quality question-answering pairs, covering more than 33,000 pages of documents, significantly outperforming existing benchmarks. Subsequently, we conduct comprehensive evaluation experiments on both open-source and closed-source models across 26 different configurations, revealing critical performance gaps in this field.
Published: 2024

5. MacLight: Multi-scene Aggregation Convolutional Learning for Traffic Signal Control

Author: Lee, Sunbowen, Lyu, Hongqin, Gong, Yicheng, Sun, Yingying, and Deng, Chao
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Reinforcement learning methods have proposed promising traffic signal control policy that can be trained on large road networks. Current SOTA methods model road networks as topological graph structures, incorporate graph attention into deep Q-learning, and merge local and global embeddings to improve policy. However, graph-based methods are difficult to parallelize, resulting in huge time overhead. Moreover, none of the current peer studies have deployed dynamic traffic systems for experiments, which is far from the actual situation. In this context, we propose Multi-Scene Aggregation Convolutional Learning for traffic signal control (MacLight), which offers faster training speeds and more stable performance. Our approach consists of two main components. The first is the global representation, where we utilize variational autoencoders to compactly compress and extract the global representation. The second component employs the proximal policy optimization algorithm as the backbone, allowing value evaluation to consider both local features and global embedding representations. This backbone model significantly reduces time overhead and ensures stability in policy updates. We validated our method across multiple traffic scenarios under both static and dynamic traffic systems. Experimental results demonstrate that, compared to general and domain SOTA methods, our approach achieves superior stability, optimized convergence levels and the highest time efficiency. The code is under https://github.com/Aegis1863/MacLight.
Comment: Accepted as full paper by AAMAS2025
Published: 2024

6. Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition

Author: Wang, Yulin, Zhang, Haoji, Yue, Yang, Song, Shiji, Deng, Chao, Feng, Junlan, and Huang, Gao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper presents a comprehensive exploration of the phenomenon of data redundancy in video understanding, with the aim to improve computational efficiency. Our investigation commences with an examination of spatial redundancy, which refers to the observation that the most informative region in each video frame usually corresponds to a small image patch, whose shape, size and location shift smoothly across frames. Motivated by this phenomenon, we formulate the patch localization problem as a dynamic decision task, and introduce a spatially adaptive video recognition approach, termed AdaFocus. Specifically, a lightweight encoder is first employed to quickly process the full video sequence, whose features are then utilized by a policy network to identify the most task-relevant regions. Subsequently, the selected patches are inferred by a high-capacity deep network for the final prediction. The full model can be conveniently trained end-to-end. Furthermore, AdaFocus can be extended by further considering temporal and sample-wise redundancies, i.e., allocating the majority of computation to the most task-relevant frames, and minimizing the computation spent on relatively "easier" videos. Our resulting approach, Uni-AdaFocus, establishes a comprehensive framework that seamlessly integrates spatial, temporal, and sample-wise dynamic computation, while it preserves the merits of AdaFocus in terms of efficient end-to-end training and hardware friendliness. In addition, Uni-AdaFocus is general and flexible as it is compatible with off-the-shelf efficient backbones (e.g., TSM and X3D), which can be readily deployed as our feature extractor, yielding a significantly improved computational efficiency. Empirically, extensive experiments based on seven benchmark datasets and three application scenarios substantiate that Uni-AdaFocus is considerably more efficient than the competitive baselines.
Comment: Accepted by IEEE TPAMI. Journal version of arXiv:2105.03245 (AdaFocusV1, ICCV 2021 Oral), arXiv:2112.14238 (AdaFocusV2, CVPR 2022), and arXiv:2209.13465 (AdaFocusV3, ECCV 2022). Code and pre-trained models: https://github.com/LeapLabTHU/Uni-AdaFocus
Published: 2024

7. Automatic Database Configuration Debugging using Retrieval-Augmented Language Models

Author: Chen, Sibei, Fan, Ju, Wu, Bin, Tang, Nan, Deng, Chao, Wang, Pengyi, Li, Ye, Tan, Jian, Li, Feifei, Zhou, Jingren, and Du, Xiaoyong
Subjects: Computer Science - Databases
Abstract: Database management system (DBMS) configuration debugging, e.g., diagnosing poorly configured DBMS knobs and generating troubleshooting recommendations, is crucial in optimizing DBMS performance. However, the configuration debugging process is tedious and, sometimes challenging, even for seasoned database administrators (DBAs) with sufficient experience in DBMS configurations and good understandings of the DBMS internals (e.g., MySQL or Oracle). To address this difficulty, we propose Andromeda, a framework that utilizes large language models (LLMs) to enable automatic DBMS configuration debugging. Andromeda serves as a natural surrogate of DBAs to answer a wide range of natural language (NL) questions on DBMS configuration issues, and to generate diagnostic suggestions to fix these issues. Nevertheless, directly prompting LLMs with these professional questions may result in overly generic and often unsatisfying answers. To this end, we propose a retrieval-augmented generation (RAG) strategy that effectively provides matched domain-specific contexts for the question from multiple sources. They come from related historical questions, troubleshooting manuals and DBMS telemetries, which significantly improve the performance of configuration debugging. To support the RAG strategy, we develop a document retrieval mechanism addressing heterogeneous documents and design an effective method for telemetry analysis. Extensive experiments on real-world DBMS configuration debugging datasets show that Andromeda significantly outperforms existing solutions.
Published: 2024

8. MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

Author: Zhou, Hao, Wang, Zhijun, Huang, Shujian, Huang, Xin, Han, Xue, Feng, Junlan, Deng, Chao, Luo, Weihua, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of the ability of original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, indicating the challenge of balancing language expansion while preventing forgetting. In this paper, we propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to alleviate this problem. MoE-LPR employs a two-stage training approach to enhance the multilingual capability. First, the model is post-pretrained into a Mixture-of-Experts (MoE) architecture by upcycling, where all the original parameters are frozen and new experts are added. In this stage, we focus on improving the ability on expanded languages, without using any original language data. Then, the model reviews the knowledge of the original languages with replay data amounting to less than 1% of post-pretraining, where we incorporate language priors routing to better recover the abilities of the original languages. Evaluations on multiple benchmarks show that MoE-LPR outperforms other post-pretraining methods. Freezing original parameters preserves original language knowledge while adding new experts preserves the learning ability. Reviewing with LPR enables effective utilization of multilingual knowledge within the parameters. Additionally, the MoE architecture maintains the same inference overhead while increasing total model parameters. Extensive experiments demonstrate MoE-LPR's effectiveness in improving expanded languages and preserving original language proficiency with superior scalability. Code and scripts are freely available at https://github.com/zjwang21/MoE-LPR.git.
Published: 2024

9. Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

Author: Hu, Peng, Liu, Sizhe, Gao, Changjiang, Huang, Xin, Han, Xue, Feng, Junlan, Deng, Chao, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separated components: knowledge retrieval and knowledge-free reasoning, and analyze the relationship between cross-lingual transferability and these two components. With adapted commonsense reasoning datasets and constructed knowledge-free reasoning datasets, we show that the knowledge-free reasoning capability can be nearly perfectly transferred across various source-target language directions despite the secondary impact of resource in some specific target languages, while cross-lingual knowledge retrieval significantly hinders the transfer. Moreover, by analyzing the hidden states and feed-forward network neuron activation during the reasoning, we show that higher similarity of hidden representations and larger overlap of activated neurons could explain the better cross-lingual transferability of knowledge-free reasoning than knowledge retrieval. Thus, we hypothesize that knowledge-free reasoning shares similar neurons in different languages for reasoning, while knowledge is stored separately in different languages. Our code and data are available at: https://github.com/NJUNLP/Knowledge-Free-Reasoning.
Published: 2024

10. GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model

Author: Gao, Yingying, Zhang, Shilei, Deng, Chao, and Feng, Junlan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Pre-trained speech language models such as HuBERT and WavLM leverage unlabeled speech data for self-supervised learning and offer powerful representations for numerous downstream tasks. Despite the success of these models, their high requirements for memory and computing resource hinder their application on resource restricted devices. Therefore, this paper introduces GenDistiller, a novel knowledge distillation framework which generates the hidden representations of the pre-trained teacher model directly by a much smaller student network. The proposed method takes the previous hidden layer as history and implements a layer-by-layer prediction of the teacher model autoregressively. Experiments on SUPERB reveal the advantage of GenDistiller over the baseline distilling method without an autoregressive framework, with 33% fewer parameters, similar time consumption and better performance on most of the SUPERB tasks. Ultimately, the proposed GenDistiller reduces the size of WavLM by 82%.
Comment: arXiv admin note: text overlap with arXiv:2310.13418
Published: 2024

11. PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

Author: Yang, Runyan, Yang, Huibao, Zhang, Xiqing, Ye, Tiantian, Liu, Ying, Gao, Yingying, Zhang, Shilei, Deng, Chao, and Feng, Junlan
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works directly demonstrated that joint optimization of diverse tasks in multitask speech models has positive influence on the performance of individual tasks. In this paper we present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis, and two speech classification tasks. PolySpeech takes multi-modal language model as its core structure and uses semantic representations as speech inputs. We introduce semantic speech embedding tokenization and speech reconstruction methods to PolySpeech, enabling efficient generation of high-quality speech for any given speaker. PolySpeech shows competitiveness across various tasks compared to single-task models. In our experiments, multitask optimization achieves performance comparable to single-task optimization and is especially beneficial for specific tasks.
Comment: 5 pages, 2 figures
Published: 2024

12. Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners

Author: Zhang, Shimao, Gao, Changjiang, Zhu, Wenhao, Chen, Jiajun, Huang, Xin, Han, Xue, Feng, Junlan, Deng, Chao, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: Recently, Large Language Models (LLMs) have shown impressive language capabilities. While most of the existing LLMs have very unbalanced performance across different languages, multilingual alignment based on translation parallel data is an effective method to enhance the LLMs' multilingual capabilities. In this work, we discover and comprehensively investigate the spontaneous multilingual alignment improvement of LLMs. We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages, even including those unseen during instruction-tuning. Additionally, we utilize different settings and mechanistic interpretability methods to analyze the LLM's performance in the multilingual scenario comprehensively. Our work suggests that LLMs have enormous potential for improving multilingual alignment efficiently with great language and task generalization.
Published: 2024

13. InjectTST: A Transformer Method of Injecting Global Information into Independent Channels for Long Time Series Forecasting

Author: Chi, Ce, Wang, Xing, Yang, Kexin, Song, Zhiyan, Jin, Di, Zhu, Lin, Deng, Chao, and Feng, Junlan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures with the observation that channel independence can alleviate noise and distribution drift issues, leading to more robustness. Nevertheless, it is essential to note that channel dependency remains an inherent characteristic of MTS, carrying valuable information. Designing a model that incorporates merits of both channel-independent and channel-mixing structures is a key to further improvement of MTS forecasting, which poses a challenging conundrum. To address the problem, an injection method for global information into channel-independent Transformer, InjectTST, is proposed in this paper. Instead of designing a channel-mixing model directly, we retain the channel-independent backbone and gradually inject global information into individual channels in a selective way. A channel identifier, a global mixing module and a self-contextual attention module are devised in InjectTST. The channel identifier can help Transformer distinguish channels for better representation. The global mixing module produces cross-channel global information. Through the self-contextual attention module, the independent channels can selectively concentrate on useful global information without robustness degradation, and channel mixing is achieved implicitly. Experiments indicate that InjectTST can achieve stable improvement compared with state-of-the-art models.
Published: 2024

14. Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network

Author: Chen, Yanan, Cui, Zihao, Gao, Yingying, Feng, Junlan, Deng, Chao, and Zhang, Shilei
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: The expectation to deploy a universal neural network for speech enhancement, with the aim of improving noise robustness across diverse speech processing tasks, faces challenges due to the existing lack of awareness within static speech enhancement frameworks regarding the expected speech in downstream modules. These limitations impede the effectiveness of static speech enhancement approaches in achieving optimal performance for a range of speech processing tasks, thereby challenging the notion of universal applicability. The fundamental issue in achieving universal speech enhancement lies in effectively informing the speech enhancement module about the features of downstream modules. In this study, we present a novel weighting prediction approach, which explicitly learns the task relationships from downstream training information to address the core challenge of universal speech enhancement. We found the role of deciding whether to employ data augmentation techniques as crucial downstream training information. This decision significantly impacts the expected speech and the performance of the speech enhancement module. Moreover, we introduce a novel speech enhancement network, the Plugin Speech Enhancement (Plugin-SE). The Plugin-SE is a dynamic neural network that includes the speech enhancement module, gate module, and weight prediction module. Experimental results demonstrate that the proposed Plugin-SE approach is competitive or superior to other joint training methods across various downstream tasks.
Published: 2024

15. Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face Recognition

Author: Xu, Ruizhuo, Wang, Ke, Deng, Chao, Wang, Mei, Chen, Xi, Huang, Wenhui, Feng, Junlan, and Deng, Weihong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: With the increasing availability of consumer depth sensors, 3D face recognition (FR) has attracted more and more attention. However, the data acquired by these sensors are often coarse and noisy, making them impractical to use directly. In this paper, we introduce an innovative Depth map denoising network (DMDNet) based on the Denoising Implicit Image Function (DIIF) to reduce noise and enhance the quality of facial depth images for low-quality 3D FR. After generating clean depth faces using DMDNet, we further design a powerful recognition network called Lightweight Depth and Normal Fusion network (LDNFNet), which incorporates a multi-branch fusion block to learn unique and complementary features between different modalities such as depth and normal images. Comprehensive experiments conducted on four distinct low-quality databases demonstrate the effectiveness and robustness of our proposed methods. Furthermore, when combining DMDNet and LDNFNet, we achieve state-of-the-art results on the Lock3DFace database.
Comment: Accepted by Pattern Recognition
Published: 2024

16. Collaborative Word-based Pre-trained Item Representation for Transferable Recommendation

Author: Yang, Shenghao, Wang, Chenyang, Liu, Yankai, Xu, Kangping, Ma, Weizhi, Liu, Yiqun, Zhang, Min, Zeng, Haitao, Feng, Junlan, and Deng, Chao
Subjects: Computer Science - Information Retrieval
Abstract: Item representation learning (IRL) plays an essential role in recommender systems, especially for sequential recommendation. Traditional sequential recommendation models usually utilize ID embeddings to represent items, which are not shared across different domains and lack the transferable ability. Recent studies use pre-trained language models (PLM) for item text embeddings (text-based IRL) that are universally applicable across domains. However, the existing text-based IRL is unaware of the important collaborative filtering (CF) information. In this paper, we propose CoWPiRec, an approach of Collaborative Word-based Pre-trained item representation for Recommendation. To effectively incorporate CF information into text-based IRL, we convert the item-level interaction data to a word graph containing word-level collaborations. Subsequently, we design a novel pre-training task to align the word-level semantic- and CF-related item representation. Extensive experimental results on multiple public datasets demonstrate that compared to state-of-the-art transferable sequential recommenders, CoWPiRec achieves significantly better performances in both fine-tuning and zero-shot settings for cross-scenario recommendation and effectively alleviates the cold-start issue. The code is available at: https://github.com/ysh-1998/CoWPiRec.
Comment: Accepted by ICDM 2023
Published: 2023

17. Cascaded Multi-task Adaptive Learning Based on Neural Architecture Search

Author: Gao, Yingying, Zhang, Shilei, Cui, Zihao, Deng, Chao, and Feng, Junlan
Subjects: Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing
Abstract: Cascading multiple pre-trained models is an effective way to compose an end-to-end system. However, fine-tuning the full cascaded model is parameter and memory inefficient, and our observations reveal that only applying adapter modules to the cascaded model cannot achieve performance comparable to fine-tuning. We propose an automatic and effective adaptive learning method to optimize end-to-end cascaded multi-task models based on the Neural Architecture Search (NAS) framework. The candidate adaptive operations on each specific module consist of freezing, inserting an adapter, and fine-tuning. We further add a penalty item to the loss to limit the learned structure, which takes the amount of trainable parameters into account. The penalty item successfully restricts the searched architecture, and the proposed approach is able to find a tuning scheme similar to the hand-crafted one, compressing the optimized parameters to 8.7% of those in full fine-tuning on SLURP with even better performance.
Published: 2023

18. GenDistiller: Distilling Pre-trained Language Models based on Generative Models

Author: Gao, Yingying, Zhang, Shilei, Cui, Zihao, Xu, Yanhan, Deng, Chao, and Feng, Junlan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing
Abstract: Self-supervised pre-trained models such as HuBERT and WavLM leverage unlabeled speech data for representation learning and offer significant improvements for numerous downstream tasks. Despite the success of these methods, their large memory and strong computational requirements hinder their application on resource restricted devices. Therefore, this paper introduces GenDistiller, a novel knowledge distillation framework to distill hidden representations from a teacher network based on a generative language model. The generative structure enables the proposed model to generate the target teacher hidden layers autoregressively, considering the interactions between hidden layers without introducing additional inputs. A two-dimensional attention mechanism is implemented to ensure the causality of hidden layers, while preserving bidirectional attention in the time dimension. Experiments reveal the advantage of the generative distiller over the baseline system that predicts the hidden layers of the teacher network directly without a generative model.
Published: 2023

19. Fine-grained Recognition with Learnable Semantic Data Augmentation

Author: Pu, Yifan, Han, Yizeng, Wang, Yulin, Feng, Junlan, Deng, Chao, and Huang, Gao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because their random editing-region behavior is prone to destroy the discriminative visual cues residing in the subtle regions. In this paper, we propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC Aircrafts, NABirds) demonstrate that our method significantly improves the generalization performance on several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. The source code will be released.
Published: 2023

20. SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation

Author: Qin, Lixiong, Wang, Mei, Deng, Chao, Wang, Ke, Chen, Xi, Hu, Jiani, and Deng, Weihong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years, vision transformers have been introduced into face recognition and analysis and have achieved performance breakthroughs. However, most previous methods generally train a single model or an ensemble of models to perform the desired task, which ignores the synergy among different tasks and fails to achieve improved prediction accuracy, increased data efficiency, and reduced training time. This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation (40 attributes including gender) based on a single Swin Transformer. Our design, the SwinFace, consists of a single shared backbone together with a subnet for each set of related tasks. To address the conflicts among multiple tasks and meet the different demands of tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis subnet, which can adaptively select the features from optimal levels and channels to perform the desired tasks. Extensive experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks. Especially, it achieves 90.97% accuracy on RAF-DB and 0.22 $\epsilon$-error on CLAP2015, which are state-of-the-art results on facial expression recognition and age estimation respectively. The code and models will be made publicly available at https://github.com/lxq1000/SwinFace.
Published: 2023

21. Dynamic Perceiver for Efficient Visual Recognition

Author: Han, Yizeng, Han, Dongchen, Liu, Zeyu, Wang, Yulin, Pan, Xuran, Pu, Yifan, Deng, Chao, Feng, Junlan, Song, Shiji, and Huang, Gao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Early exiting has become a promising approach to improving the inference efficiency of deep networks. By structuring models with multiple classifiers (exits), predictions for "easy" samples can be generated at earlier exits, negating the need for executing deeper layers. Current multi-exit networks typically implement linear classifiers at intermediate layers, compelling low-level features to encapsulate high-level semantics. This sub-optimal design invariably undermines the performance of later exits. In this paper, we propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task with a novel dual-branch architecture. A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks. Bi-directional cross-attention layers are established to progressively fuse the information of both branches. Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features. Dyn-Perceiver constitutes a versatile and adaptable framework that can be built upon various architectures. Experiments on image classification, action recognition, and object detection demonstrate that our method significantly improves the inference efficiency of different backbones, outperforming numerous competitive approaches across a broad range of computational budgets. Evaluation on both CPU and GPU platforms substantiates the superior practical efficiency of Dyn-Perceiver. Code is available at https://www.github.com/LeapLabTHU/Dynamic_Perceiver. Comment: Accepted at ICCV 2023
Published: 2023

22. MPPN: Multi-Resolution Periodic Pattern Network For Long-Term Time Series Forecasting

Author: Wang, Xing, Wang, Zhendong, Yang, Kexin, Feng, Junlan, Song, Zhiyan, Deng, Chao, and Zhu, Lin
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Long-term time series forecasting plays an important role in various real-world scenarios. Recent deep learning methods for long-term series forecasting tend to capture the intricate patterns of time series by decomposition-based or sampling-based methods. However, most of the extracted patterns may include unpredictable noise and lack good interpretability. Moreover, the multivariate series forecasting methods usually ignore the individual characteristics of each variate, which may affect the prediction accuracy. To capture the intrinsic patterns of time series, we propose a novel deep learning network architecture, named Multi-resolution Periodic Pattern Network (MPPN), for long-term series forecasting. We first construct context-aware multi-resolution semantic units of time series and employ multi-periodic pattern mining to capture the key patterns of time series. Then, we propose a channel adaptive module to capture the perceptions of multivariate towards different patterns. In addition, we present an entropy-based method for evaluating the predictability of time series and providing an upper bound on the prediction accuracy before forecasting. Our experimental evaluation on nine real-world benchmarks demonstrated that MPPN significantly outperforms the state-of-the-art Transformer-based, decomposition-based and sampling-based methods for long-term series forecasting. Comment: 21 pages
Published: 2023

23. Envisioning an Inclusive Metaverse: Student Perspectives on Accessible and Empowering Metaverse-Enabled Learning

Author: Mogavi, Reza Hadi, Hoffman, Jennifer, Deng, Chao, Du, Yiwei, Haq, Ehsan-Ul, and Hui, Pan
Subjects: Computer Science - Computers and Society, Computer Science - Human-Computer Interaction
Abstract: The emergence of the metaverse is being widely viewed as a revolutionary technology owing to a myriad of factors, particularly the potential to increase the accessibility of learning for students with disabilities. However, not much is yet known about the views and expectations of disabled students in this regard. The fact that the metaverse is still in its nascent stage exemplifies the need for such timely discourse. To bridge this important gap, we conducted a series of semi-structured interviews with 56 university students with disabilities in the United States and Hong Kong to understand their views and expectations concerning the future of metaverse-driven education. We have distilled student expectations into five thematic categories, referred to as the REEPS framework: Recognition, Empowerment, Engagement, Privacy, and Safety. Additionally, we have summarized the main design considerations in eight concise points. This paper is aimed at helping technology developers and policymakers plan ahead of time and improving the experiences of students with disabilities. Comment: This paper has been accepted for presentation at the L@S 2023 conference. The version provided here is the pre-print manuscript
Published: 2023

24. Exploring User Perspectives on ChatGPT: Applications, Perceptions, and Implications for AI-Integrated Education

Author: Mogavi, Reza Hadi, Deng, Chao, Kim, Justin Juho, Zhou, Pengyuan, Kwon, Young D., Metwally, Ahmed Hosny Saleh, Tlili, Ahmed, Bassanelli, Simone, Bucchiarone, Antonio, Gujar, Sujit, Nacke, Lennart E., and Hui, Pan
Subjects: Computer Science - Computers and Society, Computer Science - Human-Computer Interaction
Abstract: Understanding user perspectives on Artificial Intelligence (AI) in education is essential for creating pedagogically effective and ethically responsible AI-integrated learning environments. In this paper, we conduct an extensive qualitative content analysis of four major social media platforms (Twitter, Reddit, YouTube, and LinkedIn) to explore the user experience (UX) and perspectives of early adopters toward ChatGPT (an AI chatbot technology) in various education sectors. We investigate the primary applications of ChatGPT in education (RQ1) and the various perceptions of the technology (RQ2). Our findings indicate that ChatGPT is most popularly used in the contexts of higher education (24.18%), K-12 education (22.09%), and practical-skills learning (15.28%). On social media platforms, the most frequently discussed topics about ChatGPT are productivity, efficiency, and ethics. While some early adopters lean toward seeing ChatGPT as a revolutionary technology with the potential to boost students' self-efficacy and motivation to learn, others express concern that overreliance on the AI system may promote superficial learning habits and erode students' social and critical thinking skills. Our study contributes to the broader discourse on Human-AI Interaction and offers recommendations based on crowd-sourced knowledge for educators and learners interested in incorporating ChatGPT into their educational settings. Furthermore, we propose a research agenda for future studies that sets the foundation for continued investigation into the application of ChatGPT in education. Comment: Preprint version
Published: 2023

25. ESCL: Equivariant Self-Contrastive Learning for Sentence Representations

Author: Liu, Jie, Liu, Yixuan, Han, Xue, Deng, Chao, and Feng, Junlan
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Previous contrastive learning methods for sentence representations often focus on insensitive transformations to produce positive pairs, but neglect the role of sensitive transformations that are harmful to semantic representations. Therefore, we propose an Equivariant Self-Contrastive Learning (ESCL) method to make full use of sensitive transformations, which encourages the learned representations to be sensitive to certain types of transformations with an additional equivariant learning task. Meanwhile, in order to improve practicability and generality, ESCL simplifies the implementations of traditional equivariant contrastive methods to share model parameters from the perspective of multi-task learning. We evaluate our ESCL on semantic textual similarity tasks. The proposed method achieves better results while using fewer learning parameters compared to previous methods. Comment: Accepted by ICASSP 2023
Published: 2023

26. Adaptive Hybrid Spatial-Temporal Graph Neural Network for Cellular Traffic Prediction

Author: Wang, Xing, Yang, Kexin, Wang, Zhendong, Feng, Junlan, Zhu, Lin, Zhao, Juan, and Deng, Chao
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Cellular traffic prediction is an indispensable part of intelligent telecommunication networks. Nevertheless, due to the frequent user mobility and complex network scheduling mechanisms, cellular traffic often inherits complicated spatial-temporal patterns, making the prediction incredibly challenging. Although recent advanced algorithms such as graph-based prediction approaches have been proposed, they frequently model spatial dependencies based on static or dynamic graphs and neglect the coexisting multiple spatial correlations induced by traffic generation. Meanwhile, some works lack consideration of the diverse cellular traffic patterns, resulting in suboptimal prediction results. In this paper, we propose a novel deep learning network architecture, Adaptive Hybrid Spatial-Temporal Graph Neural Network (AHSTGNN), to tackle the cellular traffic prediction problem. First, we apply adaptive hybrid graph learning to learn the compound spatial correlations among cell towers. Second, we implement a Temporal Convolution Module with multi-periodic temporal data input to capture the nonlinear temporal dependencies. In addition, we introduce an extra Spatial-Temporal Adaptive Module to conquer the heterogeneity lying in cell towers. Our experiments on two real-world cellular traffic datasets show AHSTGNN outperforms the state-of-the-art by a significant margin, illustrating the superior scalability of our method for spatial-temporal cellular traffic prediction. Comment: To be published in IEEE International Conference on Communications (ICC)
Published: 2023

27. Your Favorite Gameplay Speaks Volumes about You: Predicting User Behavior and Hexad Type

Author: Mogavi, Reza Hadi, Deng, Chao, Hoffman, Jennifer, Haq, Ehsan-Ul, Gujar, Sujit, Bucchiarone, Antonio, and Hui, Pan
Subjects: Computer Science - Human-Computer Interaction
Abstract: In recent years, the gamification research community has widely and frequently questioned the effectiveness of one-size-fits-all gamification schemes. In consequence, personalization seems to be an important part of any successful gamification design. Personalization can be improved by understanding user behavior and Hexad player/user type. This paper comes with an original research idea: It investigates whether users' game-related data (collected via various gamer-archetype surveys) can be used to predict their behavioral characteristics and Hexad user types in non-game (but gamified) contexts. The affinity that exists between the concepts of gamification and gaming provided us with the impetus for running this exploratory research. We conducted an initial survey study with 67 Stack Exchange users (as a case study). We discovered that users' gameplay information could reveal valuable and helpful information about their behavioral characteristics and Hexad user types in a non-gaming (but gamified) environment. The results of testing three gamer archetypes (i.e., Bartle, Big Five, and BrainHex) show that they can all help predict users' most dominant Stack Exchange behavioral characteristics and Hexad user type better than a random labeler's baseline. That said, of all the gamer archetypes analyzed in this paper, BrainHex performs the best. In the end, we introduce a research agenda for future work. Comment: This manuscript is the pre-print version of our paper accepted at the conference of HCI International 2023
Published: 2023

28. Federated Learning over Coupled Graphs

Author: Lei, Runze, Wang, Pinghui, Zhao, Junzhou, Lan, Lin, Tao, Jing, Deng, Chao, Feng, Junlan, Wang, Xidian, and Guan, Xiaohong
Subjects: Computer Science - Machine Learning
Abstract: Graphs are widely used to represent the relations among entities. When one owns the complete data, an entire graph can be easily built, so performing analysis on the graph is straightforward. However, in many scenarios, it is impractical to centralize the data due to data privacy concerns. An organization or party only keeps a part of the whole graph data, i.e., graph data is isolated from different parties. Recently, Federated Learning (FL) has been proposed to solve the data isolation issue, mainly for Euclidean data. It is still a challenge to apply FL on graph data because graphs contain topological information which is notorious for its non-IID nature and is hard to partition. In this work, we propose a novel FL framework for graph data, FedCog, to efficiently handle coupled graphs that are a kind of distributed graph data, but widely exist in a variety of real-world applications such as mobile carriers' communication networks and banks' transaction networks. We theoretically prove the correctness and security of FedCog. Experimental results demonstrate that our method FedCog significantly outperforms traditional FL methods on graphs. Remarkably, our FedCog improves the accuracy of node classification tasks by up to 14.7%. Comment: Accepted by IEEE Transactions on Parallel and Distributed Systems
Published: 2023

29. Learning to Weight Samples for Dynamic Early-exiting Networks

Author: Han, Yizeng, Pu, Yifan, Lai, Zihang, Wang, Chaofei, Song, Shiji, Cao, Junfen, Huang, Wenhui, Deng, Chao, and Huang, Gao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Early exiting is an effective paradigm for improving the inference efficiency of deep networks. By constructing classifiers with varying resource demands (the exits), such networks allow easy samples to be output at early exits, removing the need for executing deeper layers. While existing works mainly focus on the architectural design of multi-exit networks, the training strategies for such models are largely left unexplored. The current state-of-the-art models treat all samples the same during training. However, the early-exiting behavior during testing has been ignored, leading to a gap between training and testing. In this paper, we propose to bridge this gap by sample weighting. Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training early classifiers. The training of hard samples (which mostly exit from deeper layers), however, should be emphasized by the late classifiers. Our work proposes to adopt a weight prediction network to weight the loss of different training samples at each exit. This weight prediction network and the backbone model are jointly optimized under a meta-learning framework with a novel optimization objective. By bringing the adaptive behavior during inference into the training phase, we show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency. Code is available at https://github.com/LeapLabTHU/L2W-DEN. Comment: ECCV 2022
Published: 2022

30. Meta Auxiliary Learning for Low-resource Spoken Language Understanding

Author: Gao, Yingying, Feng, Junlan, Deng, Chao, and Zhang, Shilei
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task and usually suffers from data scarcity. We exploit an ASR and NLU joint training method based on meta auxiliary learning to improve the performance of low-resource SLU tasks by only taking advantage of abundant manual transcriptions of speech data. One obvious advantage of such a method is that it provides a flexible framework to implement a low-resource SLU training task without requiring access to any further semantic annotations. In particular, an NLU model is taken as the label generation network to predict intent and slot tags from texts; a multi-task network trains the ASR task and the SLU task synchronously from speech; and the predictions of the label generation network are delivered to the multi-task network as semantic targets. The efficiency of the proposed algorithm is demonstrated with experiments on the public CATSLU dataset, which produces more suitable ASR hypotheses for the downstream NLU task.
Published: 2022

31. Global regularity of 2D generalized incompressible magnetohydrodynamic equations

Author: Deng, Chao, Ye, Zhuan, Yuan, Baoquan, and Zhao, Jiefeng
Subjects: Mathematics - Analysis of PDEs, 35Q35, 35B65, 76W05, 76D03
Abstract: In this paper, we are concerned with the two-dimensional (2D) incompressible magnetohydrodynamic (MHD) equations with velocity dissipation given by $(-\Delta)^{\alpha}$ and magnetic diffusion given by reducing about logarithmic diffusion from standard Laplacian diffusion. More precisely, we establish the global regularity of solutions to the system as long as the power $\alpha$ is a positive constant. In addition, we prove several global \emph{a priori} bounds for the case $\alpha=0$. In particular, our results significantly improve previous works and take us one step closer to a complete resolution of the global regularity issue on the 2D resistive MHD equations, namely, the case when the MHD equations only have standard Laplacian magnetic diffusion. Comment: 31 pages. We have added some materials in this updated version. arXiv admin note: text overlap with arXiv:2205.00688
Published: 2022

32. A CTC Triggered Siamese Network with Spatial-Temporal Dropout for Speech Recognition

Author: Gao, Yingying, Feng, Junlan, Wang, Tianrui, Deng, Chao, and Zhang, Shilei
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Siamese networks have shown effective results in unsupervised visual representation learning. These models are designed to learn an invariant representation of two augmentations for one input by maximizing their similarity. In this paper, we propose an effective Siamese network to improve the robustness of End-to-End automatic speech recognition (ASR). We introduce spatial-temporal dropout to support a more violent disturbance for the Siamese-ASR framework. Besides, we also relax the similarity regularization to maximize the similarities of distributions on the frames where connectionist temporal classification (CTC) spikes occur rather than on all of them. The efficiency of the proposed architecture is evaluated on two benchmarks, AISHELL-1 and Librispeech, resulting in 7.13% and 6.59% relative character error rate (CER) and word error rate (WER) reductions respectively. Analysis shows that our proposed approach brings a better uniformity for the trained model and enlarges the CTC spikes obviously.
Published: 2022

33. A Semantic Consistency Feature Alignment Object Detection Model Based on Mixed-Class Distribution Metrics

Author: Gou, Lijun, Yang, Jinrong, Yu, Hangcheng, Wang, Pan, Li, Xiaoping, and Deng, Chao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Unsupervised domain adaptation is critical in various computer vision tasks, such as object detection, instance segmentation, etc. They attempt to reduce domain bias-induced performance degradation while also promoting model application speed. Previous works in domain adaptation object detection attempt to align image-level and instance-level shifts to eventually minimize the domain discrepancy, but they may align single-class features to mixed-class features in image-level domain adaptation because each image in the object detection task may contain more than one class and object. In order to achieve single-class with single-class alignment and mixed-class with mixed-class alignment, we treat the mixed-class of the feature as a new class and propose a mixed-classes $H$-divergence for object detection to achieve homogenous feature alignment and reduce negative transfer. Then, a Semantic Consistency Feature Alignment Model (SCFAM) based on mixed-classes $H$-divergence is also presented. To improve single-class and mixed-class semantic information and accomplish semantic separation, the SCFAM model proposes Semantic Prediction Models (SPM) and Semantic Bridging Components (SBC). The weight of the pix domain discriminator loss is then changed based on the SPM result to reduce sample imbalance. Extensive unsupervised domain adaption experiments on widely used datasets illustrate our proposed approach's robust object detection in domain bias settings.
Published: 2022

34. Network Topology Optimization via Deep Reinforcement Learning

Author: Li, Zhuoran, Wang, Xing, Pan, Ling, Zhu, Lin, Wang, Zhendong, Feng, Junlan, Deng, Chao, and Huang, Longbo
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Topology impacts important network performance metrics, including link utilization, throughput and latency, and is of central importance to network operators. However, due to the combinatorial nature of network topology, it is extremely difficult to obtain an optimal solution, especially since topology planning in networks also often comes with management-specific constraints. As a result, local optimization with hand-tuned heuristic methods from human experts is often adopted in practice. Yet, heuristic methods cannot cover the global topology design space while taking into account constraints, and cannot guarantee to find good solutions. In this paper, we propose a novel deep reinforcement learning (DRL) algorithm, called Advantage Actor Critic-Graph Searching (A2C-GS), for network topology optimization. A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search. A2C-GS can efficiently search over large topology space and output topology with satisfying performance. We conduct a case study based on a real network scenario, and our experimental results demonstrate the superior performance of A2C-GS in terms of both efficiency and performance.
Published: 2022

35. OPAL: Occlusion Pattern Aware Loss for Unsupervised Light Field Disparity Estimation

Author: Li, Peng, Zhao, Jiayin, Wu, Jingyao, Deng, Chao, Wang, Haoqian, and Yu, Tao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Light field disparity estimation is an essential task in computer vision with various applications. Although supervised learning-based methods have achieved both higher accuracy and efficiency than traditional optimization-based methods, the dependency on ground-truth disparity for training limits the overall generalization performance, not to mention real-world scenarios where the ground-truth disparity is hard to capture. In this paper, we argue that unsupervised methods can achieve comparable accuracy, but, more importantly, much higher generalization capacity and efficiency than supervised methods. Specifically, we present the Occlusion Pattern Aware Loss, named OPAL, which successfully extracts and encodes the general occlusion patterns inherent in the light field for loss calculation. OPAL enables: i) accurate and robust estimation by effectively handling occlusions without using any ground-truth information for training and ii) much more efficient performance by significantly reducing the network parameters required for accurate inference. Besides, a transformer-based network and a refinement module are proposed for achieving even more accurate results. Extensive experiments demonstrate our method not only significantly improves the accuracy compared with the SOTA unsupervised methods, but also possesses strong generalization capacity, even for real-world data, compared with supervised methods. Our code will be made publicly available.
Published: 2022

36. GenAD: General Representations of Multivariate Time Series for Anomaly Detection

Author: Hua, Xiaolei, Zhu, Lin, Zhang, Shenglin, Li, Zeyan, Wang, Su, Zhou, Dong, Wang, Shuo, and Deng, Chao
Subjects: Computer Science - Networking and Internet Architecture, Electrical Engineering and Systems Science - Signal Processing
Abstract: The reliability of wireless base stations in China Mobile is of vital importance, because the cell phone users are connected to the stations and the behaviors of the stations are directly related to user experience. Although the monitoring of the station behaviors can be realized by anomaly detection on multivariate time series, due to complex correlations and various temporal patterns of multivariate series in large-scale stations, building a general unsupervised anomaly detection model with a higher F1-score remains a challenging task. In this paper, we propose a General representation of multivariate time series for Anomaly Detection (GenAD). First, we pre-train a general model on large-scale wireless base stations with self-supervision, which can be easily transferred to a specific station anomaly detection with a small amount of training data. Second, we employ Multi-Correlation Attention and Time-Series Attention to represent the correlations and temporal patterns of the stations. With the above innovations, GenAD increases F1-score by a total of 9% on real-world datasets in China Mobile, while the performance does not significantly degrade on public datasets with only 10% of the training data.
Published: 2022

37. Adaptive Multi-receptive Field Spatial-Temporal Graph Convolutional Network for Traffic Forecasting

Author: Wang, Xing, Zhao, Juan, Zhu, Lin, Zhou, Xu, Li, Zhao, Feng, Junlan, Deng, Chao, and Zhang, Yong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Mobile network traffic forecasting is one of the key functions in daily network operation. A commercial mobile network is large, heterogeneous, complex and dynamic. These intrinsic features make mobile network traffic forecasting far from being solved even with recent advanced algorithms such as graph convolutional network-based prediction approaches and various attention mechanisms, which have been proved successful in vehicle traffic forecasting. In this paper, we cast the problem as a spatial-temporal sequence prediction task. We propose a novel deep learning network architecture, Adaptive Multi-receptive Field Spatial-Temporal Graph Convolutional Networks (AMF-STGCN), to model the traffic dynamics of mobile base stations. AMF-STGCN extends GCN by (1) jointly modeling the complex spatial-temporal dependencies in mobile networks, (2) applying attention mechanisms to capture various Receptive Fields of heterogeneous base stations, and (3) introducing an extra decoder based on a fully connected deep network to conquer the error propagation challenge with multi-step forecasting. Experiments on four real-world datasets from two different domains consistently show AMF-STGCN outperforms the state-of-the-art methods. Comment: To be published in IEEE GLOBECOM
Published: 2021

38. 10-mega pixel snapshot compressive imaging with a hybrid coded aperture

Author: Zhang, Zhihong, Deng, Chao, Liu, Yang, Yuan, Xin, Suo, Jinli, and Dai, Qionghai
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Physics - Optics
Abstract: High resolution images are widely used in our daily life, whereas high-speed video capture is challenging due to the low frame rate of cameras working at the high resolution mode. Digging deeper, the main bottleneck lies in the low throughput of existing imaging systems. Towards this end, snapshot compressive imaging (SCI) was proposed as a promising solution to improve the throughput of imaging systems by compressive sampling and computational reconstruction. During acquisition, multiple high-speed images are encoded and collapsed to a single measurement. After this, algorithms are employed to retrieve the video frames from the coded snapshot. Recently developed Plug-and-Play (PnP) algorithms make it possible for SCI reconstruction in large-scale problems. However, the lack of high-resolution encoding systems still precludes SCI's wide application. In this paper, we build a novel hybrid coded aperture snapshot compressive imaging (HCA-SCI) system by incorporating a dynamic liquid crystal on silicon and a high-resolution lithography mask. We further implement a PnP reconstruction algorithm with cascaded denoisers for high quality reconstruction. Based on the proposed HCA-SCI system and algorithm, we achieve a 10-mega pixel SCI system to capture high-speed scenes, leading to a high throughput of 4.6G voxels per second. Both simulation and real data experiments verify the feasibility and performance of our proposed HCA-SCI scheme. Comment: 11 pages, 8 figures, accepted by Photonics Research
Published: 2021

39. Carton dataset synthesis method for domain shift based on foreground texture decoupling and replacement

Author: Gou, Lijun, Wu, Shengkai, Yang, Jinrong, Yu, Hangcheng, Lin, Chenxi, Li, Xiaoping, and Deng, Chao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: One major impediment in rapidly deploying object detection models for industrial applications is the lack of large annotated datasets. We have presented the Sacked Carton Dataset (SCD) that contains carton images from three scenarios: a comprehensive pharmaceutical logistics company (CPLC), an e-commerce logistics company (ECLC), and a fruit market (FM). However, due to domain shift, the model trained with one of the three scenarios in SCD has poor generalization ability when applied to the remaining scenarios. To solve this problem, a novel image synthesis method is proposed to replace the foreground texture of the source datasets with the texture of the target datasets. Our method can keep the context relationship of foreground objects and backgrounds unchanged and greatly augment the target datasets. First, we propose a surface segmentation algorithm to achieve texture decoupling of each instance. Second, a contour reconstruction algorithm is proposed to keep the occlusion and truncation relationship of the instance unchanged. Finally, the Gaussian fusion algorithm is used to replace the foreground texture from the source datasets with the texture from the target datasets. The novel image synthesis method can largely boost AP by at least 4.3%~6.5% on RetinaNet and 3.4%~6.8% on Faster R-CNN for the target domain. Code is available at https://github.com/hustgetlijun/RCAN.
Published: 2021

40. Relative wealth concerns with partial information and heterogeneous priors

Author: Deng, Chao, Su, Xizhi, and Zhou, Chao
Subjects: Quantitative Finance - Portfolio Management
Abstract: We establish a Nash equilibrium in a market with $N$ agents whose performance criterion is the relative wealth level, when the market return is unobservable. Each investor has a random prior belief on the return rate of the risky asset. The investors can be heterogeneous in both the mean and variance of the prior. By a separation result and a martingale argument, we show that the optimal investment strategy under a stochastic return rate model can be characterized by a fully-coupled linear FBSDE. Two sets of deep neural networks are used for the numerical computation to first find each investor's estimate of the mean return rate and then solve the FBSDEs. We establish the existence and uniqueness result for the class of FBSDEs with stochastic coefficients and solve the utility game under partial information using deep neural network function approximators. We demonstrate the efficiency and accuracy by a base-case comparison with the solution from the finite difference scheme in the linear case and apply the algorithm to the general case of a nonlinear hidden variable process. Simulations of investment strategies show a herd effect in which investors trade more aggressively under relative wealth concerns. Statistical properties of the investment strategies and the portfolio performance, including the Sharpe ratios and the Variance Risk ratios (VRRs), are examined. We observe that the agent with the most accurate prior estimate is likely to lead the herd, and the effect of competition on heterogeneous agents varies more with market characteristics compared to the homogeneous case.
Published: 2020

41. Hardware Impairments Aware Full-Duplex NOMA Networks Over Rician Fading Channels

Author: Deng, Chao, Liu, Meng, Li, Xingwang, and Liu, Yuanwei
Subjects: Computer Science - Information Theory
Abstract: A cooperative full duplex (FD) non-orthogonal multiple access (NOMA) scheme over Rician fading channels is considered. To be practical, imperfect successive interference cancellation (ipSIC) and residual hardware impairments (RHIs) at transceivers are taken into account. To evaluate the performance of the considered system, approximate analytical expressions for the outage probability (OP) and the ergodic rate (ER) are derived, and the asymptotic performance is also explored. Simulation results show that FD cooperative NOMA can improve the ergodic sum rate (ESR) of the system compared to the half duplex (HD) mode.
Published: 2020

42. Smart Prediction of the Complaint Hotspot Problem in Mobile Network

Author: Zhu, Lin, Zhao, Juan, Wang, Yiting, Feng, Juanlan, Deng, Chao, Huang, Zhenning, and Li, Hui
Subjects: Computer Science - Networking and Internet Architecture
Abstract: In mobile networks, a complaint hotspot problem often affects service for thousands of users and leads to significant economic losses and bulk complaints. In this paper, we propose an approach to predict a customer complaint based on real-time user signalling data. Through analyzing the network and user service procedure, 30 key data fields related to user experience have been extracted from XDR data collected from the S1 interface. Furthermore, we augment these basic features with derived features for user experience evaluation, such as one-hot features, statistical features and differential features. Considering the problem of unbalanced data, we use LightGBM as our prediction model. LightGBM has strong generalization ability and was designed to handle unbalanced data. Our experiments demonstrate the effectiveness and efficiency of this proposal. This approach has been deployed in daily routine use to locate the scope of the hot complaint problem as well as to report affected users and areas.
Published: 2020

43. Is mesh non-fixation safe in transabdominal preperitoneal (TAPP) inguinal hernia repair? A meta-analysis of randomized controlled trials

Author: Jiang, Tao, Zhang, Chen, Wang, Xiao-Ling, Yue, Da-Chun, Yuan, Xiao-Ping, and Wang, Deng-Chao
Published: 2024

44. Sogou Machine Reading Comprehension Toolkit

Author: Wu, Jindou, Yang, Yunlun, Deng, Chao, Tang, Hongyi, Wang, Bingning, Sun, Haoze, Yao, Ting, and Zhang, Qi
Subjects: Computer Science - Computation and Language
Abstract: Machine reading comprehension has been intensively studied in recent years, and neural network-based models have shown dominant performance. In this paper, we present a Sogou Machine Reading Comprehension (SMRC) toolkit that supports fast and efficient development of modern machine comprehension models, including both published models and original prototypes. To achieve this goal, the toolkit provides dataset readers, a flexible preprocessing pipeline, necessary neural network components, and built-in models, which make the whole process of data preparation, model construction, and training easier.
Published: 2019

45. High fidelity single-pixel imaging

Author: Deng, Chao, Hu, Xuemei, Li, Xiaoxu, Suo, Jinli, Zhang, Zhili, and Dai, Qionghai
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Single-pixel imaging (SPI) is an emerging technique which has attracted wide attention in various research fields. However, restricted by the low reconstruction quality and the large number of measurements required, its practical application is still in its infancy. Inspired by the fact that natural scenes exhibit unique degenerate structures in the low dimensional subspace, we propose to take advantage of the local prior in convolutional sparse coding to implement high fidelity single-pixel imaging. Specifically, using a statistical learning strategy, the target scene can be sparsely represented on an overcomplete dictionary. The dictionary is composed of various bases learned from a natural image database. We introduce the above local prior into the conventional SPI framework to improve the final reconstruction quality. Experiments on both synthetic data and real captured data demonstrate that our method can achieve better reconstruction from the same measurements, and consequently reduce the number of measurements required for the same reconstruction quality.
Comment: 5 pages, 6 figures
Published: 2018

46. Identifying viruses from metagenomic data by deep learning

Author: Ren, Jie, Song, Kai, Deng, Chao, Ahlgren, Nathan A., Fuhrman, Jed A., Li, Yi, Xie, Xiaohui, and Sun, Fengzhu
Subjects: Quantitative Biology - Genomics
Abstract: The recent development of metagenomic sequencing makes it possible to sequence microbial genomes including viruses in an environmental sample. Identifying viral sequences from metagenomic data is critical for downstream virus analyses. The existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences. Here we have developed a reference-free and alignment-free machine learning method, DeepVirFinder, for predicting viral sequences in metagenomic data using deep learning techniques. DeepVirFinder was trained based on a large number of viral sequences discovered before May 2015. Evaluated on the sequences after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths. Enlarging the training data by adding millions of purified viral sequences from environmental metavirome samples significantly improves the accuracy for predicting under-represented viruses. Applying DeepVirFinder to real human gut metagenomic samples from patients with colorectal carcinoma (CRC) identified 51,138 viral sequences belonging to 175 bins. Ten bins were associated with the cancer status, indicating their potential use for non-invasive diagnosis of CRC. In summary, DeepVirFinder greatly improved the precision and recall rates of viral identification, and it will significantly accelerate the discovery rate of viruses.
Published: 2018

47. Snapshot hyperspectral imaging via spectral basis multiplexing in Fourier domain

Author: Deng, Chao, Hu, Xuemei, Suo, Jinli, Zhang, Yuanlong, Zhang, Zhili, and Dai, Qionghai
Subjects: Physics - Instrumentation and Detectors, Physics - Optics
Abstract: Hyperspectral imaging is an important tool that has been applied in various fields, but it is still limited in the observation of dynamic scenes. In this paper, we propose a snapshot hyperspectral imaging technique which exploits both spectral and spatial sparsity of natural scenes. Under the computational imaging scheme, we apply spectral dimension reduction and spatial frequency truncation to the hyperspectral data cube and capture it in a single snapshot at low cost. Specifically, we modulate the spectral variations by several broadband spectral filters, and then map these modulated images into different regions in the Fourier domain. The encoded image, compressed in both the spectral and spatial dimensions, is finally collected by a monochrome detector. Correspondingly, the reconstruction is essentially a Fourier domain extraction and spectral dimensional back projection with low computational load. This Fourier-spectral multiplexing in a 2D sensor simplifies both the encoding and decoding process, and allows hyperspectral data to be captured at low cost. We demonstrate the high performance of our method by quantitative evaluation on simulation data and build a prototype system experimentally for further validation.
Comment: 13 pages, 8 figures
Published: 2018

48. Single-shot thermal ghost imaging using wavelength-division multiplexing

Author: Deng, Chao, Wang, Yuwang, Suo, Jinli, Zhang, Zhili, and Dai, Qionghai
Subjects: Physics - Optics, Experimental work, I.4.1
Abstract: Ghost imaging (GI) is a promising imaging technique that reconstructs the target scene from its correlated measurements with a sequence of patterns. Restricted by the multi-shot principle, GI usually requires long acquisition time and is limited in the observation of dynamic scenes. To handle this problem, this paper proposes a single-shot thermal ghost imaging scheme via a wavelength-division multiplexing technique. Specifically, we generate thousands of patterns simultaneously by modulating a broadband light source with a wavelength-dependent diffuser. These patterns carry the scene's spatial information, and the correlated measurements are then coupled into a spectrometer for the final reconstruction. This technique accelerates ghost imaging significantly and promotes applications in dynamic ghost imaging.
Comment: 10 pages, 4 figures
Published: 2017

49. Estimating the number of species to attain sufficient representation in a random sample

Author: Deng, Chao, Daley, Timothy, Calabrese, Peter, Ren, Jie, and Smith, Andrew D.
Subjects: Statistics - Methodology
Abstract: The statistical problem of using an initial sample to estimate the number of species in a larger sample has found important applications in fields far removed from ecology. Here we address the general problem of estimating the number of species that will be represented by at least a number r of observations in a future sample. The number r indicates species with sufficient observations, which are commonly used as a necessary condition for any robust statistical inference. We derive a procedure to construct consistent estimators that apply universally for a given population: once constructed, they can be evaluated as a simple function of r. Our approach is based on a relation between the number of species represented at least r times and the higher derivatives of the expected number of species discovered per unit of time. Combining this relation with a rational function approximation, we propose nonparametric estimators that are accurate for both large values of r and long-range extrapolations. We further show that our estimators retain asymptotic behaviors that are essential for applications on large-scale datasets. We evaluate the performance of this approach by both simulation and real data applications for inferences of the vocabulary of Shakespeare and Dickens, the topology of a Twitter social network, and molecular diversity in DNA sequencing data.
Comment: 19 pages, 5 figures, 3 tables
Published: 2016

50. Variational Autoencoders for Semi-supervised Text Classification

Author: Xu, Weidi, Sun, Haoze, Deng, Chao, and Tan, Ying
Subjects: Computer Science - Computation and Language, Computer Science - Learning
Abstract: Although the semi-supervised variational autoencoder (SemiVAE) works in image classification tasks, it fails in text classification tasks if a vanilla LSTM is used as its decoder. From a reinforcement learning perspective, it is verified that the decoder's capability to distinguish between different categorical labels is essential. Therefore, the Semi-supervised Sequential Variational Autoencoder (SSVAE) is proposed, which increases this capability by feeding the label into its decoder RNN at each time-step. Two specific decoder structures are investigated, and both are verified to be effective. In addition, to reduce the computational complexity of training, a novel optimization method is proposed, which estimates the gradient of the unlabeled objective function by sampling, along with two variance reduction techniques. Experimental results on the Large Movie Review Dataset (IMDB) and AG's News corpus show that the proposed approach significantly improves the classification accuracy compared with pure-supervised classifiers, and achieves competitive performance against previous advanced methods. State-of-the-art results can be obtained by integrating other pretraining-based methods.
Comment: 8 pages, 4 figures
Published: 2016
