Author: "Zhang, Malu" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang, Malu"' showing total 220 results

Start Over Author "Zhang, Malu" Publication Year Range Last 10 years

220 results on '"Zhang, Malu"'

1. QP-SNN: Quantized and Pruned Spiking Neural Networks

Author: Wei, Wenjie, Zhang, Malu, Zhou, Zijian, Belatreche, Ammar, Shan, Yimeng, Liang, Yu, Cao, Honglin, Zhang, Jieyuan, and Yang, Yang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to encode information and operate in an asynchronous event-driven manner, offering a highly energy-efficient paradigm for machine intelligence. However, the current SNN community focuses primarily on performance improvement by developing large-scale models, which limits the applicability of SNNs in resource-limited edge devices. In this paper, we propose a hardware-friendly and lightweight SNN, aimed at effectively deploying high-performance SNN in resource-limited scenarios. Specifically, we first develop a baseline model that integrates uniform quantization and structured pruning, called QP-SNN baseline. While this baseline significantly reduces storage demands and computational costs, it suffers from performance decline. To address this, we conduct an in-depth analysis of the challenges in quantization and pruning that lead to performance degradation and propose solutions to enhance the baseline's performance. For weight quantization, we propose a weight rescaling strategy that utilizes bit width more effectively to enhance the model's representation capability. For structured pruning, we propose a novel pruning criterion using the singular value of spatiotemporal spike activities to enable more accurate removal of redundant kernels. Extensive experiments demonstrate that integrating two proposed methods into the baseline allows QP-SNN to achieve state-of-the-art performance and efficiency, underscoring its potential for enhancing SNN deployment in edge intelligence computing., Comment: 26 pages, 17 figures, Published as a conference paper at ICLR 2025
Published: 2025

2. Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models

Author: Liu, Wanlong, Xiao, Yichen, Zeng, Dingyi, Zhao, Hongyang, Chen, Wenyu, and Zhang, Malu
Subjects: Computer Science - Computation and Language
Abstract: Post-Training Quantization (PTQ) is pivotal for deploying large language models (LLMs) within resource-limited settings by significantly reducing resource demands. However, existing PTQ strategies underperform at low bit levels < 3 bits due to the significant difference between the quantized and original weights. To enhance the quantization performance at low bit widths, we introduce a Mixed-precision Graph Neural PTQ (MG-PTQ) approach, employing a graph neural network (GNN) module to capture dependencies among weights and adaptively assign quantization bit-widths. Through the information propagation of the GNN module, our method more effectively captures dependencies among target weights, leading to a more accurate assessment of weight importance and optimized allocation of quantization strategies. Extensive experiments on the WikiText2 and C4 datasets demonstrate that our MG-PTQ method outperforms previous state-of-the-art PTQ method GPTQ, setting new benchmarks for quantization performance under low-bit conditions., Comment: ICASSP 2025
Published: 2025

3. Quantized Spike-driven Transformer

Author: Qiu, Xuerui, Zhang, Malu, Zhang, Jieyuan, Wei, Wenjie, Cao, Honglin, Guo, Junsheng, Zhu, Rui-Jie, Shan, Yimeng, Yang, Yang, and Li, Haizhou
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Spiking neural networks are emerging as a promising energy-efficient alternative to traditional artificial neural networks due to their spike-driven paradigm. However, recent research in the SNN domain has mainly focused on enhancing accuracy by designing large-scale Transformer structures, which typically rely on substantial computational resources, limiting their deployment on resource-constrained devices. To overcome this challenge, we propose a quantized spike-driven Transformer baseline (QSD-Transformer), which achieves reduced resource demands by utilizing a low bit-width parameter. Regrettably, the QSD-Transformer often suffers from severe performance degradation. In this paper, we first conduct empirical analysis and find that the bimodal distribution of quantized spike-driven self-attention (Q-SDSA) leads to spike information distortion (SID) during quantization, causing significant performance degradation. To mitigate this issue, we take inspiration from mutual information entropy and propose a bi-level optimization strategy to rectify the information distribution in Q-SDSA. Specifically, at the lower level, we introduce an information-enhanced LIF to rectify the information distribution in Q-SDSA. At the upper level, we propose a fine-grained distillation scheme for the QSD-Transformer to align the distribution in Q-SDSA with that in the counterpart ANN. By integrating the bi-level optimization strategy, the QSD-Transformer can attain enhanced energy efficiency without sacrificing its high-performance advantage.For instance, when compared to the prior SNN benchmark on ImageNet, the QSD-Transformer achieves 80.3% top-1 accuracy, accompanied by significant reductions of 6.0$\times$ and 8.1$\times$ in power consumption and model size, respectively. Code is available at https://github.com/bollossom/QSD-Transformer., Comment: Accepted by ICLR 2025
Published: 2025

4. Binary Event-Driven Spiking Transformer

Author: Cao, Honglin, Zhou, Zijian, Wei, Wenjie, Belatreche, Ammar, Liang, Yu, Zhang, Dehao, Zhang, Malu, Yang, Yang, and Li, Haizhou
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Transformer-based Spiking Neural Networks (SNNs) introduce a novel event-driven self-attention paradigm that combines the high performance of Transformers with the energy efficiency of SNNs. However, the larger model size and increased computational demands of the Transformer structure limit their practicality in resource-constrained scenarios. In this paper, we integrate binarization techniques into Transformer-based SNNs and propose the Binary Event-Driven Spiking Transformer, i.e. BESTformer. The proposed BESTformer can significantly reduce storage and computational demands by representing weights and attention maps with a mere 1-bit. However, BESTformer suffers from a severe performance drop from its full-precision counterpart due to the limited representation capability of binarization. To address this issue, we propose a Coupled Information Enhancement (CIE) method, which consists of a reversible framework and information enhancement distillation. By maximizing the mutual information between the binary model and its full-precision counterpart, the CIE method effectively mitigates the performance degradation of the BESTformer. Extensive experiments on static and neuromorphic datasets demonstrate that our method achieves superior performance to other binary SNNs, showcasing its potential as a compact yet high-performance model for resource-limited edge devices., Comment: 11 pages, 5 figures
Published: 2025

5. A Compressive Memory-based Retrieval Approach for Event Argument Extraction

Author: Liu, Wanlong, Zhang, Enqi, Zhou, Li, Zeng, Dingyi, Cheng, Shaohuan, Zhang, Chen, Zhang, Malu, and Chen, Wenyu
Subjects: Computer Science - Computation and Language
Abstract: Recent works have demonstrated the effectiveness of retrieval augmentation in the Event Argument Extraction (EAE) task. However, existing retrieval-based EAE methods have two main limitations: (1) input length constraints and (2) the gap between the retriever and the inference model. These issues limit the diversity and quality of the retrieved information. In this paper, we propose a Compressive Memory-based Retrieval (CMR) mechanism for EAE, which addresses the two limitations mentioned above. Our compressive memory, designed as a dynamic matrix that effectively caches retrieved information and supports continuous updates, overcomes the limitations of the input length. Additionally, after pre-loading all candidate demonstrations into the compressive memory, the model further retrieves and filters relevant information from memory based on the input query, bridging the gap between the retriever and the inference model. Extensive experiments show that our method achieves new state-of-the-art performance on three public datasets (RAMS, WikiEvents, ACE05), significantly outperforming existing retrieval-based EAE methods., Comment: 15 pages
Published: 2024

6. Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

Author: Jin, Yeying, Li, Xin, Wang, Jiadong, Zhang, Yan, and Zhang, Malu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing raindrop removal datasets have two shortcomings. First, they consist of images captured by cameras with a focus on the background, leading to the presence of blurry raindrops. To our knowledge, none of these datasets include images where the focus is specifically on raindrops, which results in a blurry background. Second, these datasets predominantly consist of daytime images, thereby lacking nighttime raindrop scenarios. Consequently, algorithms trained on these datasets may struggle to perform effectively in raindrop-focused or nighttime scenarios. The absence of datasets specifically designed for raindrop-focused and nighttime raindrops constrains research in this area. In this paper, we introduce a large-scale, real-world raindrop removal dataset called Raindrop Clarity. Raindrop Clarity comprises 15,186 high-quality pairs/triplets (raindrops, blur, and background) of images with raindrops and the corresponding clear background images. There are 5,442 daytime raindrop images and 9,744 nighttime raindrop images. Specifically, the 5,442 daytime images include 3,606 raindrop- and 1,836 background-focused images. While the 9,744 nighttime images contain 4,838 raindrop- and 4,906 background-focused images. Our dataset will enable the community to explore background-focused and raindrop-focused images, including challenges unique to daytime and nighttime conditions. Our data and code are available at: \url{https://github.com/jinyeying/RaindropClarity}, Comment: Accepted to ECCV2024, dataset and benchmark at: \url{https://github.com/jinyeying/RaindropClarity}
Published: 2024

7. Ternary Spike-based Neuromorphic Signal Processing System

Author: Wang, Shuai, Zhang, Dehao, Belatreche, Ammar, Xiao, Yichen, Qing, Hongyu, We, Wenjie, Zhang, Malu, and Yang, Yang
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Deep Neural Networks (DNNs) have been successfully implemented across various signal processing fields, resulting in significant enhancements in performance. However, DNNs generally require substantial computational resources, leading to significant economic costs and posing challenges for their deployment on resource-constrained edge devices. In this study, we take advantage of spiking neural networks (SNNs) and quantization technologies to develop an energy-efficient and lightweight neuromorphic signal processing system. Our system is characterized by two principal innovations: a threshold-adaptive encoding (TAE) method and a quantized ternary SNN (QT-SNN). The TAE method can efficiently encode time-varying analog signals into sparse ternary spike trains, thereby reducing energy and memory demands for signal processing. QT-SNN, compatible with ternary spike trains from the TAE method, quantifies both membrane potentials and synaptic weights to reduce memory requirements while maintaining performance. Extensive experiments are conducted on two typical signal-processing tasks: speech and electroencephalogram recognition. The results demonstrate that our neuromorphic signal processing system achieves state-of-the-art (SOTA) performance with a 94% reduced memory requirement. Furthermore, through theoretical energy consumption analysis, our system shows 7.5x energy saving compared to other SNN works. The efficiency and efficacy of the proposed system highlight its potential as a promising avenue for energy-efficient signal processing.
Published: 2024

8. Q-SNNs: Quantized Spiking Neural Networks

Author: Wei, Wenjie, Liang, Yu, Belatreche, Ammar, Xiao, Yichen, Cao, Honglin, Ren, Zhenbang, Wang, Guoqing, Zhang, Malu, and Yang, Yang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence. However, the current focus within the SNN community prioritizes accuracy optimization through the development of large-scale models, limiting their viability in resource-constrained and low-power edge devices. To address this challenge, we introduce a lightweight and hardware-friendly Quantized SNN (Q-SNN) that applies quantization to both synaptic weights and membrane potentials. By significantly compressing these two key elements, the proposed Q-SNNs substantially reduce both memory usage and computational complexity. Moreover, to prevent the performance degradation caused by this compression, we present a new Weight-Spike Dual Regulation (WS-DR) method inspired by information entropy theory. Experimental evaluations on various datasets, including static and neuromorphic, demonstrate that our Q-SNNs outperform existing methods in terms of both model size and accuracy. These state-of-the-art results in efficiency and efficacy suggest that the proposed method can significantly improve edge intelligent computing., Comment: 8 pages, 5 figures
Published: 2024

9. Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

Author: Wang, Shuai, Zhang, Dehao, Shi, Kexin, Wang, Yuchen, Wei, Wenjie, Wu, Jibin, and Zhang, Malu
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative modules: 1) Global-Local Spiking Convolution (GLSC) module and 2) Bottleneck-PLIF module. Compared to the hand-crafted feature extraction methods, the GLSC module achieves speech feature extraction that is sparser, more energy-efficient, and yields better performance. The Bottleneck-PLIF module further processes the signals from GLSC with the aim to achieve higher accuracy with fewer parameters. Extensive experiments are conducted on the Google Speech Commands Dataset (V1 and V2). The results show our method achieves competitive performance among SNN-based KWS models with fewer parameters.
Published: 2024

10. SFedCA: Credit Assignment-Based Active Client Selection Strategy for Spiking Federated Learning

Author: Zhan, Qiugang, Cao, Jinbo, Xie, Xiurui, Zhang, Malu, Tang, Huajin, and Liu, Guisong
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Emerging Technologies, Computer Science - Multimedia, Computer Science - Neural and Evolutionary Computing
Abstract: Spiking federated learning is an emerging distributed learning paradigm that allows resource-constrained devices to train collaboratively at low power consumption without exchanging local data. It takes advantage of both the privacy computation property in federated learning (FL) and the energy efficiency in spiking neural networks (SNN). Thus, it is highly promising to revolutionize the efficient processing of multimedia data. However, existing spiking federated learning methods employ a random selection approach for client aggregation, assuming unbiased client participation. This neglect of statistical heterogeneity affects the convergence and accuracy of the global model significantly. In our work, we propose a credit assignment-based active client selection strategy, the SFedCA, to judiciously aggregate clients that contribute to the global sample distribution balance. Specifically, the client credits are assigned by the firing intensity state before and after local model training, which reflects the local data distribution difference from the global model. Comprehensive experiments are conducted on various non-identical and independent distribution (non-IID) scenarios. The experimental results demonstrate that the SFedCA outperforms the existing state-of-the-art spiking federated learning methods, and requires fewer communication rounds., Comment: 9 pages
Published: 2024

11. Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

Author: Jin, Yeying, Li, Xin, Wang, Jiadong, Zhang, Yan, Zhang, Malu, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

12. Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning

Author: Shan, Yimeng, Zhang, Malu, Zhu, Rui-jie, Qiu, Xuerui, Eshraghian, Jason K., and Qu, Haicheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in neuroscience research have propelled the development of Spiking Neural Networks (SNNs), which not only have the potential to further advance neuroscience research but also serve as an energy-efficient alternative to Artificial Neural Networks (ANNs) due to their spike-driven characteristics. However, previous studies often neglected the multiscale information and its spatiotemporal correlation between event data, leading SNN models to approximate each frame of input events as static images. We hypothesize that this oversimplification significantly contributes to the performance gap between SNNs and traditional ANNs. To address this issue, we have designed a Spiking Multiscale Attention (SMA) module that captures multiscale spatiotemporal interaction information. Furthermore, we developed a regularization method named Attention ZoneOut (AZO), which utilizes spatiotemporal attention weights to reduce the model's generalization error through pseudo-ensemble training. Our approach has achieved state-of-the-art results on mainstream neural morphology datasets. Additionally, we have reached a performance of 77.1% on the Imagenet-1K dataset using a 104-layer ResNet architecture enhanced with SMA and AZO. This achievement confirms the state-of-the-art performance of SNNs with non-transformer architectures and underscores the effectiveness of our method in bridging the performance gap between SNN models and traditional ANN models.
Published: 2024

13. Beyond Single-Event Extraction: Towards Efficient Document-Level Multi-Event Argument Extraction

Author: Liu, Wanlong, Zhou, Li, Zeng, Dingyi, Xiao, Yichen, Cheng, Shaohuan, Zhang, Chen, Lee, Grandee, Zhang, Malu, and Chen, Wenyu
Subjects: Computer Science - Computation and Language
Abstract: Recent mainstream event argument extraction methods process each event in isolation, resulting in inefficient inference and ignoring the correlations among multiple events. To address these limitations, here we propose a multiple-event argument extraction model DEEIA (Dependency-guided Encoding and Event-specific Information Aggregation), capable of extracting arguments from all events within a document simultaneouslyThe proposed DEEIA model employs a multi-event prompt mechanism, comprising DE and EIA modules. The DE module is designed to improve the correlation between prompts and their corresponding event contexts, whereas the EIA module provides event-specific information to improve contextual understanding. Extensive experiments show that our method achieves new state-of-the-art performance on four public datasets (RAMS, WikiEvents, MLEE, and ACE05), while significantly saving the inference time compared to the baselines. Further analyses demonstrate the effectiveness of the proposed modules., Comment: Accepted to Findings of ACL 2024
Published: 2024

14. Event-Driven Learning for Spiking Neural Networks

Author: Wei, Wenjie, Zhang, Malu, Zhang, Jilin, Belatreche, Ammar, Wu, Jibin, Xu, Zijing, Qiu, Xuerui, Chen, Hong, Yang, Yang, and Li, Haizhou
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Brain-inspired spiking neural networks (SNNs) have gained prominence in the field of neuromorphic computing owing to their low energy consumption during feedforward inference on neuromorphic hardware. However, it remains an open challenge how to effectively benefit from the sparse event-driven property of SNNs to minimize backpropagation learning costs. In this paper, we conduct a comprehensive examination of the existing event-driven learning algorithms, reveal their limitations, and propose novel solutions to overcome them. Specifically, we introduce two novel event-driven learning methods: the spike-timing-dependent event-driven (STD-ED) and membrane-potential-dependent event-driven (MPD-ED) algorithms. These proposed algorithms leverage precise neuronal spike timing and membrane potential, respectively, for effective learning. The two methods are extensively evaluated on static and neuromorphic datasets to confirm their superior performance. They outperform existing event-driven counterparts by up to 2.51% for STD-ED and 6.79% for MPD-ED on the CIFAR-100 dataset. In addition, we theoretically and experimentally validate the energy efficiency of our methods on neuromorphic hardware. On-chip learning experiments achieved a remarkable 30-fold reduction in energy consumption over time-step-based surrogate gradient methods. The demonstrated efficiency and efficacy of the proposed event-driven learning methods emphasize their potential to significantly advance the fields of neuromorphic computing, offering promising avenues for energy-efficiency applications.
Published: 2024

15. LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization

Author: Liu, Qianhui, Yan, Jiaqi, Zhang, Malu, Pan, Gang, and Li, Haizhou
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient, making them well-suited for low-power edge devices. However, the pursuit of accuracy in current studies leads to large, long-timestep SNNs, conflicting with the resource constraints of these devices. In order to design lightweight and efficient SNNs, we propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process. Spatially, we present a novel Compressive Convolution block (CompConv) to expand the search space to support pruning and mixed-precision quantization. Temporally, we are the first to propose a compressive timestep search to identify the optimal number of timesteps under specific computation cost constraints. Finally, we formulate a joint optimization to simultaneously learn the architecture parameters and spatial-temporal compression strategies to achieve high performance while minimizing memory and computation costs. Experimental results on CIFAR-10, CIFAR-100, and Google Speech Command datasets demonstrate our proposed LitE-SNNs can achieve competitive or even higher accuracy with remarkably smaller model sizes and fewer computation costs.
Published: 2024

16. CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

Author: Yue, Xianghu, Tian, Xiaohai, Lu, Lu, Zhang, Malu, Wu, Zhizheng, and Li, Haizhou
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: There has been a long-standing quest for a unified audio-visual-text model to enable various multimodal understanding tasks, which mimics the listening, seeing and reading process of human beings. Humans tends to represent knowledge using two separate systems: one for representing verbal (textual) information and one for representing non-verbal (visual and auditory) information. These two systems can operate independently but can also interact with each other. Motivated by this understanding of human cognition, in this paper, we introduce CoAVT -- a novel cognition-inspired Correlated Audio-Visual-Text pre-training model to connect the three modalities. It contains a joint audio-visual encoder that learns to encode audio-visual synchronization information together with the audio and visual content for non-verbal information, and a text encoder to handle textual input for verbal information. To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text. Additionally, to leverage the correspondences between audio and vision with language respectively, we also establish the audio-text and visual-text bi-modal alignments upon the foundational audiovisual-text tri-modal alignment to enhance the multimodal representation learning. Finally, we jointly optimize CoAVT model with three multimodal objectives: contrastive loss, matching loss and language modeling loss. Extensive experiments show that CoAVT can learn strong multimodal correlations and be generalized to various downstream tasks. CoAVT establishes new state-of-the-art performance on text-video retrieval task on AudioCaps for both zero-shot and fine-tuning settings, audio-visual event classification and audio-visual retrieval tasks on AudioSet and VGGSound.
Published: 2024

17. A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Author: Zhang, Chen, D'Haro, Luis Fernando, Chen, Yiming, Zhang, Malu, and Li, Haizhou
Subjects: Computer Science - Computation and Language
Abstract: Automatic evaluation is an integral aspect of dialogue system research. The traditional reference-based NLG metrics are generally found to be unsuitable for dialogue assessment. Consequently, recent studies have suggested various unique, reference-free neural metrics that better align with human evaluations. Notably among them, large language models (LLMs), particularly the instruction-tuned variants like ChatGPT, are shown to be promising substitutes for human judges. Yet, existing works on utilizing LLMs for automatic dialogue evaluation are limited in their scope in terms of the number of meta-evaluation datasets, mode of evaluation, coverage of LLMs, etc. Hence, it remains inconclusive how effective these LLMs are. To this end, we conduct a comprehensive study on the application of LLMs for automatic dialogue evaluation. Specifically, we analyze the multi-dimensional evaluation capability of 30 recently emerged LLMs at both turn and dialogue levels, using a comprehensive set of 12 meta-evaluation datasets. Additionally, we probe the robustness of the LLMs in handling various adversarial perturbations at both turn and dialogue levels. Finally, we explore how model-level and dimension-level ensembles impact the evaluation performance. All resources are available at https://github.com/e0397123/comp-analysis., Comment: An extended version of AAAI-2024 camera-ready paper (appendix included, 16 pages)
Published: 2023

18. JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement

Author: Wu, Yuhui, Wang, Guoqing, Wang, Zhiwen, Yang, Yang, Li, Tianyu, Zhang, Malu, Li, Chongyi, and Shen, Heng Tao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models. Despite the success of some conditional methods, previous methods may neglect the importance of a sufficient formulation of task-specific condition strategy, resulting in suboptimal visual outcomes. In this study, we propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition to regulate the generating capabilities of the diffusion model. We first leverage pre-trained decomposition network to generate the Retinex prior, which is updated with better quality by an adjustment network and integrated into a refinement network to implement Retinex-based conditional generation at both feature- and image-levels. Moreover, the semantic prior is extracted from the input image with an off-the-shelf semantic segmentation model and incorporated through semantic attention layers. By treating Retinex- and semantic-based priors as the condition, JoReS-Diff presents a unique perspective for establishing an diffusion model for LLIE and similar image enhancement tasks. Extensive experiments validate the rationality and superiority of our approach., Comment: Accepted by ACM MM 2024
Published: 2023

19. SynA-ResNet: Spike-driven ResNet Achieved through OR Residual Connection

Author: Shan, Yimeng, Qiu, Xuerui, Zhu, Rui-jie, Eshraghian, Jason K., Zhang, Malu, and Qu, Haicheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Spiking Neural Networks (SNNs) have garnered substantial attention in brain-like computing for their biological fidelity and the capacity to execute energy-efficient spike-driven operations. As the demand for heightened performance in SNNs surges, the trend towards training deeper networks becomes imperative, while residual learning stands as a pivotal method for training deep neural networks. In our investigation, we identified that the SEW-ResNet, a prominent representative of deep residual spiking neural networks, incorporates non-event-driven operations. To rectify this, we propose a novel training paradigm that first accumulates a large amount of redundant information through OR Residual Connection (ORRC), and then filters out the redundant information using the Synergistic Attention (SynA) module, which promotes feature extraction in the backbone while suppressing the influence of noise and useless features in the shortcuts. When integrating SynA into the network, we observed the phenomenon of "natural pruning", where after training, some or all of the shortcuts in the network naturally drop out without affecting the model's classification accuracy. This significantly reduces computational overhead and makes it more suitable for deployment on edge devices. Experimental results on various public datasets confirmed that the SynA-ResNet achieved single-sample classification with as little as 0.8 spikes per neuron. Moreover, when compared to other residual SNN models, it exhibited higher accuracy and up to a 28-fold reduction in energy consumption., Comment: 12 pages, 5 figures and 10 tables
Published: 2023

20. ESVAE: An Efficient Spiking Variational Autoencoder with Reparameterizable Poisson Spiking Sampling

Author: Zhan, Qiugang, Tao, Ran, Xie, Xiurui, Liu, Guisong, Zhang, Malu, Tang, Huajin, and Yang, Yang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: In recent years, studies on image generation models of spiking neural networks (SNNs) have gained the attention of many researchers. Variational autoencoders (VAEs), as one of the most popular image generation models, have attracted a lot of work exploring their SNN implementation. Due to the constrained binary representation in SNNs, existing SNN VAE methods implicitly construct the latent space by an elaborated autoregressive network and use the network outputs as the sampling variables. However, this unspecified implicit representation of the latent space will increase the difficulty of generating high-quality images and introduces additional network parameters. In this paper, we propose an efficient spiking variational autoencoder (ESVAE) that constructs an interpretable latent space distribution and design a reparameterizable spiking sampling method. Specifically, we construct the prior and posterior of the latent space as a Poisson distribution using the firing rate of the spiking neurons. Subsequently, we propose a reparameterizable Poisson spiking sampling method, which is free from the additional network. Comprehensive experiments have been conducted, and the experimental results show that the proposed ESVAE outperforms previous SNN VAE methods in reconstructed & generated images quality. In addition, experiments demonstrate that ESVAE's encoder is able to retain the original image information more efficiently, and the decoder is more robust. The source code is available at https://github.com/QgZhan/ESVAE., Comment: This work has been submitted to the IEEE for possible publication
Published: 2023

21. Tensor Decomposition Based Attention Module for Spiking Neural Networks

Author: Deng, Haoyu, Zhu, Ruijie, Qiu, Xuerui, Duan, Yule, Zhang, Malu, and Deng, Liangjian
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: The attention mechanism has been proven to be an effective way to improve spiking neural network (SNN). However, based on the fact that the current SNN input data flow is split into tensors to process on GPUs, none of the previous works consider the properties of tensors to implement an attention module. This inspires us to rethink current SNN from the perspective of tensor-relevant theories. Using tensor decomposition, we design the \textit{projected full attention} (PFA) module, which demonstrates excellent results with linearly growing parameters. Specifically, PFA is composed by the \textit{linear projection of spike tensor} (LPST) module and \textit{attention map composing} (AMC) module. In LPST, we start by compressing the original spike tensor into three projected tensors using a single property-preserving strategy with learnable parameters for each dimension. Then, in AMC, we exploit the inverse procedure of the tensor decomposition process to combine the three tensors into the attention map using a so-called connecting factor. To validate the effectiveness of the proposed PFA module, we integrate it into the widely used VGG and ResNet architectures for classification tasks. Our method achieves state-of-the-art performance on both static and dynamic benchmark datasets, surpassing the existing SNN models with Transformer-based and CNN-based backbones., Comment: Accepted by Knowledge-Based Systems
Published: 2023

22. Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

Author: Sun, Pengfei, Wu, Jibin, Zhang, Malu, Devos, Paul, and Botteldooren, Dick
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing
Abstract: Recurrent Neural Networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies, making them highly prevalent in sequential data processing applications. Nevertheless, vanilla RNNs are confronted with the well-known issue of gradient vanishing and exploding, posing a significant challenge for learning and establishing long-range dependencies. Additionally, gated RNNs tend to be over-parameterized, resulting in poor computational efficiency and network generalization. To address these challenges, this paper proposes a novel Delayed Memory Unit (DMU). The DMU incorporates a delay line structure along with delay gates into vanilla RNN, thereby enhancing temporal interaction and facilitating temporal credit assignment. Specifically, the DMU is designed to directly distribute the input information to the optimal time instant in the future, rather than aggregating and redistributing it over time through intricate network dynamics. Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, utilizing considerably fewer parameters than other state-of-the-art gated RNN models in applications such as speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification., Comment: Accepted for publication in IEEE Transactions on Neural Networks and Learning Systems, 2024
Published: 2023
Full Text: View/download PDF

23. LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding

Author: Yang, Qu, Zhang, Malu, Wu, Jibin, Tan, Kay Chen, and Li, Haizhou
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: The biological neurons use precise spike times, in addition to the spike firing rate, to communicate with each other. The time-to-first-spike (TTFS) coding is inspired by such biological observation. However, there is a lack of effective solutions for training TTFS-based spiking neural network (SNN). In this paper, we put forward a simple yet effective network conversion algorithm, which is referred to as LC-TTFS, by addressing two main problems that hinder an effective conversion from a high-performance artificial neural network (ANN) to a TTFS-based SNN. We show that our algorithm can achieve a near-perfect mapping between the activation values of an ANN and the spike times of an SNN on a number of challenging AI tasks, including image classification, image reconstruction, and speech enhancement. With TTFS coding, we can achieve up to orders of magnitude saving in computation over ANN and other rate-based SNNs. The study, therefore, paves the way for deploying ultra-low-power TTFS-based SNNs on power-constrained edge computing platforms.
Published: 2023

24. Rethinking Relation Classification with Graph Meaning Representations

Author: Zhou, Li, Chen, Wenyu, Zeng, Dingyi, Zhang, Malu, and Hershcovich, Daniel
Subjects: Computer Science - Computation and Language
Abstract: In the field of natural language understanding, the intersection of neural models and graph meaning representations (GMRs) remains a compelling area of research. Despite the growing interest, a critical gap persists in understanding the exact influence of GMRs, particularly concerning relation extraction tasks. Addressing this, we introduce DAGNN-plus, a simple and parameter-efficient neural architecture designed to decouple contextual representation learning from structural information propagation. Coupled with various sequence encoders and GMRs, this architecture provides a foundation for systematic experimentation on two English and two Chinese datasets. Our empirical analysis utilizes four different graph formalisms and nine parsers. The results yield a nuanced understanding of GMRs, showing improvements in three out of the four datasets, particularly favoring English over Chinese due to highly accurate parsers. Interestingly, GMRs appear less effective in literary-domain datasets compared to general-domain datasets. These findings lay the groundwork for better-informed design of GMRs and parsers to improve relation classification, which is expected to tangibly impact the future trajectory of natural language understanding research., Comment: 10 pages
Published: 2023

25. Utilizing Contextual Clues and Role Correlations for Enhancing Document-level Event Argument Extraction

Author: Liu, Wanlong, Zeng, Dingyi, Zhou, Li, Xiao, Yichen, Kong, Weishan, Zhang, Malu, Cheng, Shaohuan, Zhao, Hongyang, and Chen, Wenyu
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Document-level event argument extraction is a crucial yet challenging task within the field of information extraction. Current mainstream approaches primarily focus on the information interaction between event triggers and their arguments, facing two limitations: insufficient context interaction and the ignorance of event correlations. Here, we introduce a novel framework named CARLG (Contextual Aggregation of clues and Role-based Latent Guidance), comprising two innovative components: the Contextual Clues Aggregation (CCA) and the Role-based Latent Information Guidance (RLIG). The CCA module leverages the attention weights derived from a pre-trained encoder to adaptively assimilates broader contextual information, while the RLIG module aims to capture the semantic correlations among event roles. We then instantiate the CARLG framework into two variants based on two types of current mainstream EAE approaches. Notably, our CARLG framework introduces less than 1% new parameters yet significantly improving the performance. Comprehensive experiments across the RAMS, WikiEvents, and MLEE datasets confirm the superiority of CARLG, showing significant superiority in terms of both performance and inference speed compared to major benchmarks. Further analyses demonstrate the effectiveness of the proposed modules., Comment: pre-submission
Published: 2023

26. Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks

Author: Song, Zeyang, Wu, Jibin, Zhang, Malu, Shou, Mike Zheng, and Li, Haizhou
Subjects: Computer Science - Sound, Computer Science - Neural and Evolutionary Computing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing. However, their performance in speech processing remains limited due to the lack of an effective auditory front-end. To address this limitation, we introduce Spiking-LEAF, a learnable auditory front-end meticulously designed for SNN-based speech processing. Spiking-LEAF combines a learnable filter bank with a novel two-compartment spiking neuron model called IHC-LIF. The IHC-LIF neurons draw inspiration from the structure of inner hair cells (IHC) and they leverage segregated dendritic and somatic compartments to effectively capture multi-scale temporal dynamics of speech signals. Additionally, the IHC-LIF neurons incorporate the lateral feedback mechanism along with spike regularization loss to enhance spike encoding efficiency. On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms both SOTA spiking auditory front-ends and conventional real-valued acoustic features in terms of classification accuracy, noise robustness, and encoding efficiency., Comment: Accepted by ICASSP2024
Published: 2023

27. Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

Author: Wang, Jiadong, Qian, Xinyuan, Zhang, Malu, Tan, Robby T., and Li, Haizhou
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Talking face generation, also known as speech-to-lip generation, reconstructs facial motions concerning lips given coherent speech input. The previous studies revealed the importance of lip-speech synchronization and visual quality. Despite much progress, they hardly focus on the content of lip movements i.e., the visual intelligibility of the spoken words, which is an important aspect of generation quality. To address the problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing the incorrect generation results. Moreover, to compensate for data scarcity, we train the lip-reading expert in an audio-visual self-supervised manner. With a lip-reading expert, we propose a novel contrastive learning to enhance lip-speech synchronization, and a transformer to encode audio synchronically with video, while considering global temporal dependency of audio. For evaluation, we propose a new strategy with two different lip-reading experts to measure intelligibility of the generated videos. Rigorous experiments show that our proposal is superior to other State-of-the-art (SOTA) methods, such as Wav2Lip, in reading intelligibility i.e., over 38% Word Error Rate (WER) on LRS2 dataset and 27.8% accuracy on LRW dataset. We also achieve the SOTA performance in lip-speech synchronization and comparable performances in visual quality., Comment: accepted by CVPR 2023
Published: 2023

28. Training Spiking Neural Networks with Local Tandem Learning

Author: Yang, Qu, Wu, Jibin, Zhang, Malu, Chua, Yansong, Wang, Xinchao, and Li, Haizhou
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient over their predecessors. However, there is a lack of an efficient and generalized training method for deep SNNs, especially for deployment on analog computing substrates. In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL). The LTL rule follows the teacher-student learning approach by mimicking the intermediate feature representations of a pre-trained ANN. By decoupling the learning of network layers and leveraging highly informative supervisor signals, we demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity. Our experimental results have also shown that the SNNs thus trained can achieve comparable accuracies to their teacher ANNs on CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Moreover, the proposed LTL rule is hardware friendly. It can be easily implemented on-chip to perform fast parameter calibration and provide robustness against the notorious device non-ideality issues. It, therefore, opens up a myriad of opportunities for training and deployment of SNN on ultra-low-power mixed-signal neuromorphic computing chips.10, Comment: Accepted by NeurIPS 2022
Published: 2022

29. TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks

Author: Zhu, Rui-Jie, Zhang, Malu, Zhao, Qihang, Deng, Haoyu, Duan, Yule, and Deng, Liang-Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatio-temporal information representation ability. Given the critical role of attention mechanisms in enhancing neural network performance, the integration of SNNs and attention mechanisms exhibits potential to deliver energy-efficient and high-performance computing paradigms. We present a novel Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN. The proposed TCJA-SNN framework can effectively assess the significance of spike sequence from both spatial and temporal dimensions. More specifically, our essential technical contribution lies on: 1) We employ the squeeze operation to compress the spike stream into an average matrix. Then, we leverage two local attention mechanisms based on efficient 1D convolutions to facilitate comprehensive feature extraction at the temporal and channel levels independently. 2) We introduce the Cross Convolutional Fusion (CCF) layer as a novel approach to model the inter-dependencies between the temporal and channel scopes. This layer breaks the independence of these two dimensions and enables the interaction between features. Experimental results demonstrate that the proposed TCJA-SNN outperforms SOTA by up to 15.7% accuracy on standard static and neuromorphic datasets, including Fashion-MNIST, CIFAR10-DVS, N-Caltech 101, and DVS128 Gesture. Furthermore, we apply the TCJA-SNN framework to image generation tasks by leveraging a variation autoencoder. To the best of our knowledge, this study is the first instance where the SNN-attention mechanism has been employed for image classification and generation tasks. Notably, our approach has achieved SOTA performance in both domains, establishing a significant advancement in the field. Codes are available at https://github.com/ridgerchu/TCJA., Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
Published: 2022

30. Delay learning based on temporal coding in Spiking Neural Networks

Author: Sun, Pengfei, Wu, Jibin, Zhang, Malu, Devos, Paul, and Botteldooren, Dick
Published: 2024
Full Text: View/download PDF

31. Federated learning for spiking neural networks by hint-layer knowledge distillation

Author: Xie, Xiurui, Feng, Jingxuan, Liu, Guisong, Zhan, Qiugang, Liu, Zhetong, and Zhang, Malu
Published: 2024
Full Text: View/download PDF

32. DPGNN: Dual-Perception Graph Neural Network for Representation Learning

Author: Zhou, Li, Chen, Wenyu, Zeng, Dingyi, Cheng, Shaohuan, Liu, Wanlong, Zhang, Malu, and Qu, Hong
Subjects: Computer Science - Machine Learning
Abstract: Graph neural networks (GNNs) have drawn increasing attention in recent years and achieved remarkable performance in many graph-based tasks, especially in semi-supervised learning on graphs. However, most existing GNNs are based on the message-passing paradigm to iteratively aggregate neighborhood information in a single topology space. Despite their success, the expressive power of GNNs is limited by some drawbacks, such as inflexibility of message source expansion, negligence of node-level message output discrepancy, and restriction of single message space. To address these drawbacks, we present a novel message-passing paradigm, based on the properties of multi-step message source, node-specific message output, and multi-space message interaction. To verify its validity, we instantiate the new message-passing paradigm as a Dual-Perception Graph Neural Network (DPGNN), which applies a node-to-step attention mechanism to aggregate node-specific multi-step neighborhood information adaptively. Our proposed DPGNN can capture the structural neighborhood information and the feature-related information simultaneously for graph representation learning. Experimental results on six benchmark datasets with different topological structures demonstrate that our method outperforms the latest state-of-the-art models, which proves the superiority and versatility of our method. To our knowledge, we are the first to consider node-specific message passing in the GNNs., Comment: Published in Knowledge-Based Systems
Published: 2021
Full Text: View/download PDF

33. Tensor decomposition based attention module for spiking neural networks

Author: Deng, Haoyu, Zhu, Ruijie, Qiu, Xuerui, Duan, Yule, Zhang, Malu, and Deng, Liang-Jian
Published: 2024
Full Text: View/download PDF

34. QDAP: Downsizing adaptive policy for cooperative multi-agent reinforcement learning

Author: Zhao, Zhitong, Zhang, Ya, Wang, Siying, Zhang, Fan, Zhang, Malu, and Chen, Wenyu
Published: 2024
Full Text: View/download PDF

35. A universal ANN-to-SNN framework for achieving high accuracy and low latency deep Spiking Neural Networks

Author: Wang, Yuchen, Liu, Hanwen, Zhang, Malu, Luo, Xiaoling, and Qu, Hong
Published: 2024
Full Text: View/download PDF

36. Sequential Random Network for Fine-grained Image Classification

Author: Li, Chaorong, Zhang, Malu, Huang, Wei, Qin, Fengqing, Zeng, Anping, and Huang, Yuanyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Deep Convolutional Neural Network (DCNN) and Transformer have achieved remarkable successes in image recognition. However, their performance in fine-grained image recognition is still difficult to meet the requirements of actual needs. This paper proposes a Sequence Random Network (SRN) to enhance the performance of DCNN. The output of DCNN is one-dimensional features. This one-dimensional feature abstractly represents image information, but it does not express well the detailed information of image. To address this issue, we use the proposed SRN which composed of BiLSTM and several Tanh-Dropout blocks (called BiLSTM-TDN), to further process DCNN one-dimensional features for highlighting the detail information of image. After the feature transform by BiLSTM-TDN, the recognition performance has been greatly improved. We conducted the experiments on six fine-grained image datasets. Except for FGVC-Aircraft, the accuracies of the proposed methods on the other datasets exceeded 99%. Experimental results show that BiLSTM-TDN is far superior to the existing state-of-the-art methods. In addition to DCNN, BiLSTM-TDN can also be extended to other models, such as Transformer., Comment: The performance of the model is very severely affected by the order of the test samples
Published: 2021

37. Efficient spiking neural network design via neural architecture search

Author: Yan, Jiaqi, Liu, Qianhui, Zhang, Malu, Feng, Lang, Ma, De, Li, Haizhou, and Pan, Gang
Published: 2024
Full Text: View/download PDF

38. Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network

Author: Pan, Zihan, Zhang, Malu, Wu, Jibin, and Li, Haizhou
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound
Abstract: Inspired by the mammal's auditory localization pathway, in this paper we propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment, and implement this algorithm in a real-time robotic system with a microphone array. The key of this model relies on the MTPC scheme, which encodes the interaural time difference (ITD) cues into spike patterns. This scheme naturally follows the functional structures of the human auditory localization system, rather than artificially computing of time difference of arrival. Besides, it highlights the advantages of SNN, such as event-driven and power efficiency. The MTPC is pipelined with two different SNN architectures, the convolutional SNN and recurrent SNN, by which it shows the applicability to various SNNs. This proposal is evaluated by the microphone collected location-dependent acoustic data, in a real-world environment with noise, obstruction, reflection, or other affects. The experiment results show a mean error azimuth of 1~3 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
Published: 2020

39. You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy

Author: P, Srivatsa, Chu, Kyle Timothy Ng, Amornpaisannon, Burin, Tavva, Yaswanth, Miriyala, Venkata Pavan Kumar, Wu, Jibin, Zhang, Malu, Li, Haizhou, and Carlson, Trevor E.
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: In the past decade, advances in Artificial Neural Networks (ANNs) have allowed them to perform extremely well for a wide range of tasks. In fact, they have reached human parity when performing image recognition, for example. Unfortunately, the accuracy of these ANNs comes at the expense of a large number of cache and/or memory accesses and compute operations. Spiking Neural Networks (SNNs), a type of neuromorphic, or brain-inspired network, have recently gained significant interest as power-efficient alternatives to ANNs, because they are sparse, accessing very few weights, and typically only use addition operations instead of the more power-intensive multiply-and-accumulate (MAC) operations. The vast majority of neuromorphic hardware designs support rate-encoded SNNs, where the information is encoded in spike rates. Rate-encoded SNNs could be seen as inefficient as an encoding scheme because it involves the transmission of a large number of spikes. A more efficient encoding scheme, Time-To-First-Spike (TTFS) encoding, encodes information in the relative time of arrival of spikes. While TTFS-encoded SNNs are more efficient than rate-encoded SNNs, they have, up to now, performed poorly in terms of accuracy compared to previous methods. Hence, in this work, we aim to overcome the limitations of TTFS-encoded neuromorphic systems. To accomplish this, we propose: (1) a novel optimization algorithm for TTFS-encoded SNNs converted from ANNs and (2) a novel hardware accelerator for TTFS-encoded SNNs, with a scalable and low-power design. Overall, our work in TTFS encoding and training improves the accuracy of SNNs to achieve state-of-the-art results on MNIST MLPs, while reducing power consumption by 1.46$\times$ over the state-of-the-art neuromorphic hardware., Comment: 10 pages, 4 figures. This work has been submitted to the IEEE for possible publication. This work is an extended version of the paper accepted to the 2nd Workshop on Accelerated Machine Learning (AccML 2020)
Published: 2020

40. Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks

Author: Zhang, Malu, Wang, Jiadong, Amornpaisannon, Burin, Zhang, Zhixuan, Miriyala, VPK, Belatreche, Ammar, Qu, Hong, Wu, Jibin, Chua, Yansong, Carlson, Trevor E., and Li, Haizhou
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Machine Learning
Abstract: Spiking Neural Networks (SNNs) use spatio-temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation. Motivated by the success of deep learning, the study of Deep Spiking Neural Networks (DeepSNNs) provides promising directions for artificial intelligence applications. However, training of DeepSNNs is not straightforward because the well-studied error back-propagation (BP) algorithm is not directly applicable. In this paper, we first establish an understanding as to why error back-propagation does not work well in DeepSNNs. To address this problem, we propose a simple yet efficient Rectified Linear Postsynaptic Potential function (ReL-PSP) for spiking neurons and propose a Spike-Timing-Dependent Back-Propagation (STDBP) learning algorithm for DeepSNNs. In STDBP algorithm, the timing of individual spikes is used to convey information (temporal coding), and learning (back-propagation) is performed based on spike timing in an event-driven manner. Our experimental results show that the proposed learning algorithm achieves state-of-the-art classification accuracy in single spike time based learning algorithms of DeepSNNs. Furthermore, by utilizing the trained model parameters obtained from the proposed STDBP learning algorithm, we demonstrate the ultra-low-power inference operations on a recently proposed neuromorphic inference accelerator. Experimental results show that the neuromorphic hardware consumes 0.751~mW of the total power consumption and achieves a low latency of 47.71~ms to classify an image from the MNIST dataset. Overall, this work investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective to design of future DeepSNNs and neuromorphic hardware systems., Comment: This work has been submitted to the IEEE for possible publication. Copyrightmay be transferred without notice, after which this version may no longer beaccessible
Published: 2020

41. Temporal-Sequential Learning with Columnar-Structured Spiking Neural Networks

Author: Luo, Xiaoling, Liu, Hanwen, Chen, Yi, Zhang, Malu, Qu, Hong, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Tanveer, Mohammad, editor, Agarwal, Sonali, editor, Ozawa, Seiichi, editor, Ekbal, Asif, editor, and Jatowt, Adam, editor
Published: 2023
Full Text: View/download PDF

42. A two-stage spiking meta-learning method for few-shot classification

Author: Zhan, Qiugang, Wang, Bingchao, Jiang, Anning, Xie, Xiurui, Zhang, Malu, and Liu, Guisong
Published: 2024
Full Text: View/download PDF

43. Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition

Author: Wu, Jibin, Yilmaz, Emre, Zhang, Malu, Li, Haizhou, and Tan, Kay Chen
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Computation and Language
Abstract: Artificial neural networks (ANN) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. The brain-inspired spiking neural networks (SNN) closely mimic the biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energyefficiency and rapid information processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance on several large vocabulary recognition scenarios. The experimental results demonstrate competitive ASR accuracies to their ANN counterparts, while require significantly reduced computational cost and inference time. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware, therefore, offer an attractive solution for ASR applications running locally on mobile and embedded devices., Comment: Submitted to Frontier of Neuroscience
Published: 2019

44. Neural Population Coding for Effective Temporal Classification

Author: Pan, Zihan, Wu, Jibin, Chua, Yansong, Zhang, Malu, and Li, Haizhou
Subjects: Computer Science - Neural and Evolutionary Computing, Quantitative Biology - Neurons and Cognition
Abstract: Neural encoding plays an important role in faithfully describing the temporally rich patterns, whose instances include human speech and environmental sounds. For tasks that involve classifying such spatio-temporal patterns with the Spiking Neural Networks (SNNs), how these patterns are encoded directly influence the difficulty of the task. In this paper, we compare several existing temporal and population coding schemes and evaluate them on both speech (TIDIGITS) and sound (RWCP) datasets. We show that, with population neural codings, the encoded patterns are linearly separable using the Support Vector Machine (SVM). We note that the population neural codings effectively project the temporal information onto the spatial domain, thus improving linear separability in the spatial dimension, achieving an accuracy of 95\% and 100\% for TIDIGITS and RWCP datasets classified using the SVM, respectively. This observation suggests that an effective neural coding scheme greatly simplifies the classification problem such that a simple linear classifier would suffice. The above datasets are then classified using the Tempotron, an SNN-based classifier. SNN classification results agree with the SVM findings that population neural codings help to improve classification accuracy. Hence, other than the learning algorithm, effective neural encoding is just as important as an SNN designed to recognize spatio-temporal patterns. It is an often neglected but powerful abstraction that deserves further study.
Published: 2019

45. An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks

Author: Pan, Zihan, Chua, Yansong, Wu, Jibin, Zhang, Malu, Li, Haizhou, and Ambikairajah, Eliathamby
Subjects: Computer Science - Sound, Computer Science - Neural and Evolutionary Computing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Auditory front-end is an integral part of a spiking neural network (SNN) when performing auditory cognitive tasks. It encodes the temporal dynamic stimulus, such as speech and audio, into an efficient, effective and reconstructable spike pattern to facilitate the subsequent processing. However, most of the auditory front-ends in current studies have not made use of recent findings in psychoacoustics and physiology concerning human listening. In this paper, we propose a neural encoding and decoding scheme that is optimized for speech processing. The neural encoding scheme, that we call Biologically plausible Auditory Encoding (BAE), emulates the functions of the perceptual components of the human auditory system, that include the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and the spike neural encoding by the auditory nerve. We evaluate the perceptual quality of the BAE scheme using PESQ; the performance of the BAE based on speech recognition experiments. Finally, we also built and published two spike-version of speech datasets: the Spike-TIDIGITS and the Spike-TIMIT, for researchers to use and benchmarking of future SNN research.
Published: 2019

46. A Tandem Learning Rule for Effective Training and Rapid Inference of Deep Spiking Neural Networks

Author: Wu, Jibin, Chua, Yansong, Zhang, Malu, Li, Guoqi, Li, Haizhou, and Tan, Kay Chen
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Spiking neural networks (SNNs) represent the most prominent biologically inspired computing model for neuromorphic computing (NC) architectures. However, due to the non-differentiable nature of spiking neuronal functions, the standard error back-propagation algorithm is not directly applicable to SNNs. In this work, we propose a tandem learning framework, that consists of an SNN and an Artificial Neural Network (ANN) coupled through weight sharing. The ANN is an auxiliary structure that facilitates the error back-propagation for the training of the SNN at the spike-train level. To this end, we consider the spike count as the discrete neural representation in the SNN, and design ANN neuronal activation function that can effectively approximate the spike count of the coupled SNN. The proposed tandem learning rule demonstrates competitive pattern recognition and regression capabilities on both the conventional frame-based and event-based vision datasets, with at least an order of magnitude reduced inference time and total synaptic operations over other state-of-the-art SNN implementations. Therefore, the proposed tandem learning rule offers a novel solution to training efficient, low latency, and high accuracy deep SNNs with low computing resources.
Published: 2019

47. Deep Spiking Neural Network with Spike Count based Learning Rule

Author: Wu, Jibin, Chua, Yansong, Zhang, Malu, Yang, Qu, Li, Guoqi, and Li, Haizhou
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Deep spiking neural networks (SNNs) support asynchronous event-driven computation, massive parallelism and demonstrate great potential to improve the energy efficiency of its synchronous analog counterpart. However, insufficient attention has been paid to neural encoding when designing SNN learning rules. Remarkably, the temporal credit assignment has been performed on rate-coded spiking inputs, leading to poor learning efficiency. In this paper, we introduce a novel spike-based learning rule for rate-coded deep SNNs, whereby the spike count of each neuron is used as a surrogate for gradient backpropagation. We evaluate the proposed learning rule by training deep spiking multi-layer perceptron (MLP) and spiking convolutional neural network (CNN) on the UCI machine learning and MNIST handwritten digit datasets. We show that the proposed learning rule achieves state-of-the-art accuracies on all benchmark datasets. The proposed learning rule allows introducing latency, spike rate and hardware constraints into the SNN learning, which is superior to the indirect approach in which conventional artificial neural networks are first trained and then converted to SNNs. Hence, it allows direct deployment to the neuromorphic hardware and supports efficient inference. Notably, a test accuracy of 98.40% was achieved on the MNIST dataset in our experiments with only 10 simulation time steps, when the same latency constraint is imposed during training.
Published: 2019

48. Bio-inspired Active Learning method in spiking neural network

Author: Zhan, Qiugang, Liu, Guisong, Xie, Xiurui, Zhang, Malu, and Sun, Guolin
Published: 2023
Full Text: View/download PDF

49. When to transfer: a dynamic domain adaptation method for effective knowledge transfer

Author: Xie, Xiurui, Cai, Qing, Zhang, Hongjie, Zhang, Malu, Yang, Zeheng, and Liu, Guisong
Published: 2022
Full Text: View/download PDF

50. An Automatic Sound Classification Framework with Non-volatile Memory

Author: Wu, Jibin, Chua, Yansong, Zhang, Malu, Li, Haizhou, Tan, Kay Chen, Lew, Wen Siang, editor, Lim, Gerard Joseph, editor, and Dananjaya, Putu Andhita, editor
Published: 2021
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

220 results on '"Zhang, Malu"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources