21,579 results on '"Xu, Bo"'
Search Results
2. In-situ Self-optimization of Quantum Dot Emission for Lasers by Machine-Learning Assisted Epitaxy
- Author
-
Shen, Chao, Zhan, Wenkang, Pan, Shujie, Hao, Hongyue, Zhuo, Ning, Xin, Kaiyao, Cong, Hui, Xu, Chi, Xu, Bo, Ng, Tien Khee, Chen, Siming, Xue, Chunlai, Liu, Fengqi, Wang, Zhanguo, and Zhao, Chao
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Computer Science - Machine Learning
- Abstract
Traditional methods for optimizing light source emissions rely on a time-consuming trial-and-error approach. While in-situ optimization of light source gain media emission during growth is ideal, it has yet to be realized. In this work, we integrate in-situ reflection high-energy electron diffraction (RHEED) with machine learning (ML) to correlate the surface reconstruction with the photoluminescence (PL) of InAs/GaAs quantum dots (QDs), which serve as the active region of lasers. A lightweight ResNet-GLAM model is employed for the real-time processing of RHEED data as input, enabling effective identification of optical performance. This approach guides the dynamic optimization of growth parameters, allowing real-time feedback control to adjust the QDs emission for lasers. We successfully optimized InAs QDs on GaAs substrates, with a 3.2-fold increase in PL intensity and a reduction in full width at half maximum (FWHM) from 36.69 meV to 28.17 meV under initially suboptimal growth conditions. Our automated, in-situ self-optimized lasers with 5-layer InAs QDs achieved electrically pumped continuous-wave operation at 1240 nm with a low threshold current of 150 A/cm² at room temperature, an excellent performance comparable to samples grown through traditional manual multi-parameter optimization methods. These results mark a significant step toward intelligent, low-cost, and reproducible light-emitter production. Comment: 5 figures
- Published
- 2024
3. Proposal of quantum repeater architecture based on Rydberg atom quantum processors
- Author
-
Zhang, Yan-Lei, Jie, Qing-Xuan, Li, Ming, Wu, Shu-Hao, Wang, Zhu-Bo, Zou, Xu-Bo, Zhang, Peng-Fei, Li, Gang, Zhang, Tiancai, Guo, Guang-Can, and Zou, Chang-Ling
- Subjects
Quantum Physics
- Abstract
Realizing large-scale quantum networks requires the generation of high-fidelity quantum entanglement states between remote quantum nodes, a key resource for quantum communication, distributed computation and sensing applications. However, entanglement distribution between quantum network nodes is hindered by optical transmission loss and local operation errors. Here, we propose a novel quantum repeater architecture that synergistically integrates Rydberg atom quantum processors with optical cavities to overcome these challenges. Our scheme leverages cavity-mediated interactions for efficient remote entanglement generation, followed by Rydberg interaction-based entanglement purification and swapping. Numerical simulations, incorporating realistic experimental parameters, demonstrate the generation of Bell states with 99% fidelity at rates of 1.1 kHz between two nodes in a local-area network (distance 0.1 km), and can be extended to metropolitan-area (25 km) or intercity (250 km, with the assistance of frequency converters) networks with a rate of 0.1 kHz. This scalable approach opens up near-term opportunities for exploring quantum network applications and investigating the advantages of distributed quantum information processing. Comment: 3 figures
- Published
- 2024
4. Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks
- Author
-
Liu, Xinyue, Gao, Yunlong, Zong, Linlin, and Xu, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Meta-learning has emerged as a prominent technology for few-shot text classification and has achieved promising performance. However, existing methods often encounter difficulties in drawing accurate class prototypes from support set samples, primarily due to probable large intra-class differences and small inter-class differences within the task. Recent approaches attempt to incorporate external knowledge or pre-trained language models to augment data, but this requires additional resources and thus does not suit many few-shot scenarios. In this paper, we propose a novel solution to address this issue by adequately leveraging the information within the task itself. Specifically, we utilize label information to construct a task-adaptive metric space, thereby adaptively reducing the intra-class differences and magnifying the inter-class differences. We further employ the optimal transport technique to estimate class prototypes with query set samples together, mitigating the problem of inaccurate and ambiguous support set samples caused by large intra-class differences. We conduct extensive experiments on eight benchmark datasets, and our approach shows obvious advantages over state-of-the-art models across all the tasks on all the datasets. For reproducibility, all the datasets and codes are available at https://github.com/YvoGao/LAQDA. Comment: Accepted by EMNLP 2024 Findings
- Published
- 2024
5. Towards Comprehensive Detection of Chinese Harmful Memes
- Author
-
Lu, Junyu, Xu, Bo, Zhang, Xiaokun, Wang, Hongbo, Zhu, Haohao, Zhang, Dongyu, Yang, Liang, and Lin, Hongfei
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
This paper has been accepted in the NeurIPS 2024 D & B Track. Harmful memes have proliferated on the Chinese Internet, while research on detecting Chinese harmful memes significantly lags behind due to the absence of reliable datasets and effective detectors. To this end, we focus on the comprehensive detection of Chinese harmful memes. We construct ToxiCN MM, the first Chinese harmful meme dataset, which consists of 12,000 samples with fine-grained annotations for various meme types. Additionally, we propose a baseline detector, Multimodal Knowledge Enhancement (MKE), incorporating contextual information of meme content generated by the LLM to enhance the understanding of Chinese memes. During the evaluation phase, we conduct extensive quantitative experiments and qualitative analyses on multiple baselines, including LLMs and our MKE. The experimental results indicate that detecting Chinese harmful memes is challenging for existing models while demonstrating the effectiveness of MKE. The resources for this paper are available at https://github.com/DUT-lujunyu/ToxiCN_MM.
- Published
- 2024
6. PclGPT: A Large Language Model for Patronizing and Condescending Language Detection
- Author
-
Wang, Hongbo, Li, Mingda, Lu, Junyu, Xia, Hebin, Yang, Liang, Xu, Bo, Liu, Ruizhu, and Lin, Hongfei
- Subjects
Computer Science - Computation and Language
- Abstract
Disclaimer: Samples in this paper may be harmful and cause discomfort! Patronizing and condescending language (PCL) is a form of speech directed at vulnerable groups. As an essential branch of toxic language, this type of language exacerbates conflicts and confrontations among Internet communities and detrimentally impacts disadvantaged groups. Traditional pre-trained language models (PLMs) perform poorly in detecting PCL due to its implicit toxicity traits like hypocrisy and false sympathy. With the rise of large language models (LLMs), we can harness their rich emotional semantics to establish a paradigm for exploring implicit toxicity. In this paper, we introduce PclGPT, a comprehensive LLM benchmark designed specifically for PCL. We collect, annotate, and integrate the Pcl-PT/SFT dataset, and then develop a bilingual PclGPT-EN/CN model group through a comprehensive pre-training and supervised fine-tuning staircase process to facilitate implicit toxic detection. Group detection results and fine-grained detection from PclGPT and other models reveal significant variations in the degree of bias in PCL towards different vulnerable groups, necessitating increased societal attention to protect them. Comment: Accepted for EMNLP2024 (Findings)
- Published
- 2024
7. Multiscale fusion enhanced spiking neural network for invasive BCI neural signal decoding
- Author
-
Song, Yu, Han, Liyuan, Xu, Bo, and Zhang, Tielin
- Subjects
Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, Quantitative Biology - Neurons and Cognition
- Abstract
Brain-computer interfaces (BCIs) are an advanced fusion of neuroscience and artificial intelligence, requiring stable and long-term decoding of neural signals. Spiking Neural Networks (SNNs), with their neuronal dynamics and spike-based signal processing, are inherently well-suited for this task. This paper presents a novel approach utilizing a Multiscale Fusion enhanced Spiking Neural Network (MFSNN). The MFSNN emulates the parallel processing and multiscale feature fusion seen in human visual perception to enable real-time, efficient, and energy-conserving neural signal decoding. Initially, the MFSNN employs temporal convolutional networks and channel attention mechanisms to extract spatiotemporal features from raw data. It then enhances decoding performance by integrating these features through skip connections. Additionally, the MFSNN improves generalizability and robustness in cross-day signal decoding through mini-batch supervised generalization learning. In two benchmark invasive BCI paradigms, including the single-hand grasp-and-touch and center-and-out reach tasks, the MFSNN surpasses traditional artificial neural network methods, such as MLP and GRU, in both accuracy and computational efficiency. Moreover, the MFSNN's multiscale feature fusion framework is well-suited for the implementation on neuromorphic chips, offering an energy-efficient solution for online decoding of invasive BCI signals.
- Published
- 2024
8. Knowing When to Ask -- Bridging Large Language Models and Data
- Author
-
Radhakrishnan, Prashanth, Chen, Jennifer, Xu, Bo, Ramaswami, Prem, Pho, Hannah, Olmos, Adriana, Manyika, James, and Guha, R. V.
- Subjects
Computer Science - Computation and Language, Computer Science - Information Retrieval
- Abstract
Large Language Models (LLMs) are prone to generating factually incorrect information when responding to queries that involve numerical and statistical data or other timely facts. In this paper, we present an approach for enhancing the accuracy of LLMs by integrating them with Data Commons, a vast, open-source repository of public statistics from trusted organizations like the United Nations (UN), Centers for Disease Control and Prevention (CDC) and global census bureaus. We explore two primary methods: Retrieval Interleaved Generation (RIG), where the LLM is trained to produce natural language queries to retrieve data from Data Commons, and Retrieval Augmented Generation (RAG), where relevant data tables are fetched from Data Commons and used to augment the LLM's prompt. We evaluate these methods on a diverse set of queries, demonstrating their effectiveness in improving the factual accuracy of LLM outputs. Our work represents an early step towards building more trustworthy and reliable LLMs that are grounded in verifiable statistical data and capable of complex factual reasoning. Comment: 39 pages (25-page paper, 14-page appendix), 7 figures, 9 tables
- Published
- 2024
9. Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks
- Author
-
Xu, Bo, Huang, Zefeng, Fang, Yuetong, Wang, Xin, Cheng, Bojun, Yu, Shaoliang, Wang, Zhongrui, and Xu, Renjing
- Subjects
Physics - Optics, Physics - Computational Physics
- Abstract
Optical neural networks (ONNs) perform extensive computations using photons instead of electrons, resulting in passively energy-efficient and low-latency computing. Among various ONNs, the diffractive optical neural networks (DONNs) particularly excel in energy efficiency, bandwidth, and parallelism, and therefore attract considerable attention. However, their performance is limited by the inherent constraints of traditional frame-based sensors, which process and produce dense and redundant information at low operating frequency. Inspired by the spiking neurons in the human neural system, which utilize a thresholding mechanism to transmit information sparsely and efficiently, we propose integrating a threshold-locking method into neuromorphic vision sensors to generate sparse and binary information, achieving microsecond-level accurate perception similar to human spiking neurons. By introducing novel Binary Dual Adaptive Training (BAT) and Optically Parallel Mixture of Experts (OPMoE) inference methods, the high-speed, spike-based diffractive optical neural network (S2NN) demonstrates an ultra-fast operating speed of 3649 FPS, which is 30-fold faster than that of reported DONNs, delivering a remarkable computational speed of 417.96 TOPS and a system energy efficiency of 12.6 TOPS/W. Our work demonstrates the potential of incorporating neuromorphic architecture to facilitate optical neural network applications in real-world scenarios for both low-level and high-level machine vision tasks.
- Published
- 2024
10. An innovation-based cycle-slip, multipath estimation, detection and mitigation method for tightly coupled GNSS/INS/Vision navigation in urban areas
- Author
-
Xu, Bo, Zhang, Shoujian, Wang, Jingrong, and Li, Jiancheng
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
Precise, consistent, and reliable positioning is crucial for a multitude of uses. In order to achieve high-precision global positioning services, multi-sensor fusion techniques, such as the Global Navigation Satellite System (GNSS)/Inertial Navigation System (INS)/Vision integration system, combine the strengths of various sensors. This technique is essential for localization in complex environments and has been widely used in the mass market. However, frequent signal deterioration and blocking in urban environments exacerbate the degradation of GNSS positioning and negatively impact the performance of the multi-sensor integration system. For GNSS pseudorange and carrier phase observation data in the urban environment, we offer an innovation-based cycle slip/multipath estimation, detection, and mitigation (I-EDM) method to reduce the influence of multipath effects and cycle slips on location induced by obstruction in urban settings. The method obtains the innovations of GNSS observations with the cluster analysis method. Then the innovations are used to detect the cycle slips and multipath. Compared with the residual-based method, the innovation-based method avoids the residual overfitting caused by the least-squares method, resulting in better detection of outliers within the GNSS observations. The vehicle tests carried out in urban settings verify the proposed approach. Experimental results indicate that accuracies of 0.23 m, 0.11 m, and 0.31 m in the east, north and up components can be achieved by the GNSS/INS/Vision tightly coupled system with the I-EDM method, which is an improvement of up to 21.6% compared with the residual-based EDM (R-EDM) method.
- Published
- 2024
11. LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models
- Author
-
Ma, Lipeng, Yang, Weidong, Jiang, Sihang, Fei, Ben, Zhou, Mingjie, Li, Shuhao, Xu, Bo, and Xiao, Yanghua
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence
- Abstract
Logs play a critical role in providing essential information for system monitoring and troubleshooting. Recently, with the success of pre-trained language models (PLMs) and large language models (LLMs) in natural language processing (NLP), smaller PLMs (such as BERT) and LLMs (like ChatGPT) have become the current mainstream approaches for log analysis. While LLMs possess rich knowledge, their high computational costs and unstable performance make LLMs impractical for analyzing logs directly. In contrast, smaller PLMs can be fine-tuned for specific tasks even with limited computational resources, making them more practical. However, these smaller PLMs face challenges in understanding logs comprehensively due to their limited expert knowledge. To better utilize the knowledge embedded within LLMs for log understanding, this paper introduces a novel knowledge enhancement framework, called LUK, which acquires expert knowledge from LLMs to empower log understanding on a smaller PLM. Specifically, we design a multi-expert collaboration framework based on LLMs consisting of different roles to acquire expert knowledge. In addition, we propose two novel pre-training tasks to enhance the log pre-training with expert knowledge. LUK achieves state-of-the-art results on different log analysis tasks and extensive experiments demonstrate expert knowledge from LLMs can be utilized more effectively to understand logs. Comment: Under review
- Published
- 2024
12. Reuse and Blend: Energy-Efficient Optical Neural Network Enabled by Weight Sharing
- Author
-
Xu, Bo, Fang, Yuetong, Yu, Shaoliang, and Xu, Renjing
- Subjects
Computer Science - Hardware Architecture
- Abstract
Optical neural networks (ONN) based on micro-ring resonators (MRR) have emerged as a promising alternative for significantly accelerating the massive matrix-vector multiplication (MVM) operations in artificial intelligence (AI) applications. However, the limited scale of MRR arrays presents a challenge for AI acceleration. The disparity between the small MRR arrays and the large weight matrices in AI necessitates extensive MRR writings, including reprogramming and calibration, resulting in considerable latency and energy overheads. To address this problem, we propose a novel design methodology to lessen the need for frequent weight reloading. Specifically, we propose a reuse and blend (R&B) architecture to support efficient layer-wise and block-wise weight sharing, which allows weights to be reused several times between layers/blocks. Experimental results demonstrate the R&B system can maintain comparable accuracy with 69% energy savings and 57% latency improvement. These results highlight the promise of the R&B to enable the efficient deployment of advanced deep learning models on photonic accelerators.
- Published
- 2024
13. Integrated photonic nonreciprocal devices based on susceptibility-programmable medium
- Author
-
Zhang, Yan-Lei, Li, Ming, Xu, Xin-Biao, Wang, Zhu-Bo, Dong, Chun-Hua, Guo, Guang-Can, Zou, Chang-Ling, and Zou, Xu-Bo
- Subjects
Physics - Optics
- Abstract
The switching and control of optical fields based on nonlinear optical effects are often limited to relatively weak nonlinear susceptibility and strong optical pump fields. Here, an optical medium with programmable susceptibility tensor based on polarizable atoms is proposed. Under a structured optical pump, the ground state population of atoms could be efficiently controlled by tuning the chirality and intensity of optical fields, and thus the optical response of the medium is programmable in both space and time. We demonstrate the potential of this approach by engineering the spatial distribution of the complex susceptibility tensor of the medium in photonic structures to realize nonreciprocal optical effects. Specifically, we investigate the advantages of chiral interaction between atoms and photons in an atom-cladded waveguide, theoretically showing that reconfigurable, strong, and rapidly switchable isolation of optical signals in a selected optical mode is possible. The susceptibility-programmable medium provides a promising way to efficiently control the optical field, opening up a wide range of applications for integrated photonic devices and structured optics. Comment: 7 pages, 4 figures
- Published
- 2024
14. Haploid Culture and Double Haploid Induction in Medicago sativa L. cv. XinJiangDaYe
- Author
-
Xu, Bo, Wu, Rina, Tang, Fang, Gao, Cuiping, Gao, Xia, and Shi, Fengling
- Published
- 2021
15. Don't Click the Bait: Title Debiasing News Recommendation via Cross-Field Contrastive Learning
- Author
-
Shu, Yijie, Zhang, Xiaokun, Wu, Youlin, Xu, Bo, Yang, Liang, and Lin, Hongfei
- Subjects
Computer Science - Information Retrieval
- Abstract
News recommendation has emerged as a primary means for users to access content of interest from the vast amount of news. Title clickbait is widespread in the news domain and makes it harder for news recommendation to offer satisfactory services for users. Fortunately, we find that the news abstract, as a critical field of news, aligns cohesively with the news authenticity. To this end, we propose a Title Debiasing News Recommendation with Cross-field Contrastive learning (TDNR-C2) to overcome the title bias by incorporating the news abstract. Specifically, a multi-field knowledge extraction module is devised to extract multi-view knowledge about news from various fields. Afterwards, we present a cross-field contrastive learning module to conduct bias removal via contrasting learned knowledge from title and abstract fields. Experimental results on a real-world dataset demonstrate the superiority of the proposed TDNR-C2 over existing state-of-the-art methods. Further analysis also indicates the significance of the news abstract for title debiasing.
- Published
- 2024
16. Autonomous, Self-driving Multi-Step Growth of Semiconductor Heterostructures Guided by Machine Learning
- Author
-
Shen, Chao, Zhan, Wenkang, Sun, Hongyu, Xin, Kaiyao, Xu, Bo, Wang, Zhanguo, and Zhao, Chao
- Subjects
Condensed Matter - Materials Science, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
- Abstract
The semiconductor industry has prioritized automating repetitive tasks by closed-loop, autonomous experimentation which enables accelerated optimization of complex multi-step processes. The emergence of machine learning (ML) has ushered in automated processes with minimal human intervention. In this work, we develop SemiEpi, a self-driving automation platform capable of executing molecular beam epitaxy (MBE) growth with multi-steps, continuous in-situ monitoring, and on-the-fly feedback control. By integrating standard hardware, homemade software, curve fitting, and multiple ML models, SemiEpi operates autonomously, eliminating the need for extensive expertise in MBE processes to achieve optimal outcomes. The platform actively learns from previous experimental results, identifying favorable conditions and proposing new experiments to achieve the desired results. We standardize and optimize growth for InAs/GaAs quantum dots (QDs) heterostructures to showcase the power of ML-guided multi-step growth. A temperature calibration was implemented to get the initial growth condition, and fine control of the process was executed using ML. Leveraging RHEED movies acquired during the growth, SemiEpi successfully identified and optimized a novel route for multi-step heterostructure growth. This work demonstrates the capabilities of closed-loop, ML-guided systems in addressing challenges in multi-step growth for any device. Our method is critical to achieve repeatable materials growth using commercially scalable tools. Our strategy facilitates the development of a hardware-independent process and enhances process repeatability and stability, even without exhaustive knowledge of growth parameters. Comment: 5 figures
- Published
- 2024
17. Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation
- Author
-
Ma, Hui, Zhang, Bo, Xu, Bo, Wang, Jian, Lin, Hongfei, and Sun, Xiao
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Empathetic response generation, which aims at understanding the user's situation and feelings and responding empathically, is crucial in building human-like dialogue systems. Previous methods mainly focus on using maximum likelihood estimation as the optimization objective for training response generation models, without taking into account the empathy level alignment between generated responses and target responses. To this end, we propose an empathetic response generation using reinforcement learning (EmpRL) framework. The framework designs an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. Given the powerful text generation capability of pre-trained language models, EmpRL utilizes the pre-trained T5 model as the generator and conducts further training to initialize the policy. To align the empathy level between generated responses and target responses in the context, an empathy reward function containing three empathy communication mechanisms, i.e., emotional reaction, interpretation, and exploration, is constructed using pre-designed and pre-trained empathy identifiers. Finally, the proximal policy optimization algorithm is used to further train the policy to produce empathetic responses. Both automatic and manual evaluations demonstrate that the proposed EmpRL framework can improve the quality of generated responses, enhance the empathy level similarity between generated and target responses, and produce empathetic responses covering both affective and cognitive aspects.
- Published
- 2024
18. Enhanced Radiation Hardness of InAs/GaAs Quantum Dot Lasers for Space Communication
- Author
-
Li, Manyang, Zhan, Wenkang, Pan, Shujie, Chen, Jinpeng, Cheng, Xiaotian, Ni, Zhibo, Xu, Bo, Yu, Jinling, Jin, Chaoyuan, Chen, Siming, Zhao, Chao, and Wang, Zhanguo
- Subjects
Physics - Applied Physics, Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Semiconductor lasers have great potential for space laser communication. However, excessive radiation in space can cause laser failure. Quantum dot (QD) lasers are more resistant to radiation compared to quantum well (QW) and bulk lasers due to better carrier confinement and a smaller active region. Therefore, it is crucial to find the most radiation-tolerant QD structures and compare the radiation tolerance of QD and QW structures at different radiation fluences where the QDs can show their advantages in the best way. Proton and ⁶⁰Co γ-ray radiation tests were conducted on different InAs/GaAs QD and InGaAs/GaAs QW materials and devices. The results show that the QD samples were more radiation-tolerant than QW samples within a certain fluence range, and more radiation-tolerant QD structures were identified. Dislocations were found near the QWs but not the QDs after 1 × 10¹¹ cm⁻² radiation. Defects were created in all samples after 7 × 10¹³ cm⁻² proton radiation. Additionally, ⁶⁰Co γ-ray radiation tests ranging from 10 to 12000 Gy were conducted, and all the samples exhibited good tolerance to total radiation dose effects.
- Published
- 2024
19. Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
- Author
-
Luo, Xinhao, Yao, Man, Chou, Yuhong, Xu, Bo, and Li, Guoqi
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Brain-inspired Spiking Neural Networks (SNNs) have bio-plausibility and low-power advantages over Artificial Neural Networks (ANNs). Applications of SNNs are currently limited to simple classification tasks because of their poor performance. In this work, we focus on bridging the performance gap between ANNs and SNNs on object detection. Our design revolves around network architecture and spiking neuron. First, the overly complex module design causes spike degradation when the YOLO series is converted to the corresponding spiking version. We design a SpikeYOLO architecture to solve this problem by simplifying the vanilla YOLO and incorporating meta SNN blocks. Second, object detection is more sensitive to quantization errors in the conversion of membrane potentials into binary spikes by spiking neurons. To address this challenge, we design a new spiking neuron that activates integer values during training while remaining spike-driven by extending virtual timesteps during inference. The proposed method is validated on both static and neuromorphic object detection datasets. On the static COCO dataset, we obtain 66.2% mAP@50 and 48.9% mAP@50:95, which is +15.0% and +18.7% higher than the prior state-of-the-art SNN, respectively. On the neuromorphic Gen1 dataset, we achieve 67.2% mAP@50, which is +2.5% greater than the ANN with equivalent architecture, and the energy efficiency is improved by 5.7×. Code: https://github.com/BICLab/SpikeYOLO. Comment: Accepted by ECCV2024; 19 pages, 4 figures
- Published
- 2024
20. RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding
- Author
-
Wu, Keming, Yao, Man, Chou, Yuhong, Qiu, Xuerui, Yang, Rui, Xu, Bo, and Li, Guoqi
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Spiking Neural Networks (SNNs) have received widespread attention due to their unique neuronal dynamics and low-power nature. Previous research empirically shows that SNNs with Poisson coding are more robust than Artificial Neural Networks (ANNs) on small-scale datasets. However, it is still unclear in theory how the adversarial robustness of SNNs is derived, and whether SNNs can still maintain their adversarial robustness advantage on large-scale dataset tasks. This work theoretically demonstrates that the inherent adversarial robustness of SNNs stems from their Poisson coding. We reveal the conceptual equivalence of Poisson coding and randomized smoothing in defense strategies, and analyze in depth the trade-off between accuracy and adversarial robustness in SNNs via the proposed Randomized Smoothing Coding (RSC) method. Experiments demonstrate that the proposed RSC-SNNs show remarkable adversarial robustness, surpassing ANNs and achieving state-of-the-art robustness results on the large-scale ImageNet dataset. Our open-source implementation code is available at https://github.com/KemingWu/RSC-SNN. Comment: Accepted by ACM MM 2024
- Published
- 2024
21. SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
- Author
-
Wang, Kexin, Zhang, Jiahong, Ren, Yong, Yao, Man, Shang, Di, Xu, Bo, and Li, Guoqi
- Subjects
Computer Science - Neural and Evolutionary Computing, Computer Science - Machine Learning
- Abstract
Brain-inspired Spiking Neural Networks (SNNs) have demonstrated their effectiveness and efficiency in vision, natural language, and speech understanding tasks, indicating their capacity to "see", "listen", and "read". In this paper, we design SpikeVoice, which performs high-quality Text-To-Speech (TTS) via SNN, to explore the potential of SNN to "speak". A major obstacle to using SNN for such generative tasks lies in the demand for models to grasp long-term dependencies. The serial nature of spiking neurons, however, leads to the invisibility of information at future spiking time steps, limiting SNN models to capture sequence dependencies solely within the same time step. We term this phenomenon "partial-time dependency". To address this issue, we introduce Spiking Temporal-Sequential Attention (STSA) in SpikeVoice. To the best of our knowledge, SpikeVoice is the first TTS work in the SNN field. We perform experiments using four well-established datasets that cover both Chinese and English languages, encompassing scenarios with both single-speaker and multi-speaker configurations. The results demonstrate that SpikeVoice can achieve results comparable to Artificial Neural Networks (ANN) with only 10.5% of the energy consumption of ANN. Comment: 9 pages
- Published
- 2024
22. Dilated convolution neural operator for multiscale partial differential equations
- Author
-
Xu, Bo, Liu, Xinliang, and Zhang, Lei
- Subjects
Computer Science - Machine Learning, Mathematics - Numerical Analysis
- Abstract
This paper introduces a data-driven operator learning method for multiscale partial differential equations, with a particular emphasis on preserving high-frequency information. Drawing inspiration from the representation of multiscale parameterized solutions as a combination of low-rank global bases (such as low-frequency Fourier modes) and localized bases over coarse patches (analogous to dilated convolution), we propose the Dilated Convolutional Neural Operator (DCNO). The DCNO architecture effectively captures both high-frequency and low-frequency features while maintaining a low computational cost through a combination of convolution and Fourier layers. We conduct experiments to evaluate the performance of DCNO on various datasets, including the multiscale elliptic equation, its inverse problem, Navier-Stokes equation, and Helmholtz equation. We show that DCNO strikes an optimal balance between accuracy and computational cost and offers a promising solution for multiscale operator learning.
- Published
- 2024
23. GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
- Author
-
Wang, Haonan, Liu, Jie, Tang, Jie, Wu, Gangshan, Xu, Bo, Chou, Yanbing, and Wang, Yong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches are difficult to apply in industry because of their large parameter counts and computational overhead. Efficient human pose estimation remains a hurdle, especially for whole-body pose estimation with numerous keypoints. While most current methods for efficient human pose estimation primarily rely on CNNs, we propose the Group-based Token Pruning Transformer (GTPT) that fully harnesses the advantages of the Transformer. GTPT alleviates the computational burden by gradually introducing keypoints in a coarse-to-fine manner. It minimizes the computation overhead while ensuring high performance. Besides, GTPT groups keypoint tokens and prunes visual tokens to improve model performance while reducing redundancy. We propose the Multi-Head Group Attention (MHGA) between different groups to achieve global interaction with little computational overhead. We conducted experiments on COCO and COCO-WholeBody. Compared to other methods, the experimental results show that GTPT can achieve higher performance with less computation, especially for whole-body estimation with numerous keypoints., Comment: ECCV 2024 accepted
- Published
- 2024
24. Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect
- Author
-
Lu, Junyu, Xu, Bo, Zhang, Xiaokun, Liu, Kaiyuan, Zhang, Dongyu, Yang, Liang, and Lin, Hongfei
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Current methods of toxic language detection (TLD) typically rely on specific tokens to make decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them indiscriminately, resulting in a degradation in the detection accuracy of the model. To this end, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the "useful impact" of lexical bias and eliminates the "misleading impact". Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied to several vanilla models. The generalization capability of our model outperforms current debiased models for out-of-distribution data.
- Published
- 2024
25. CN-DBpedia2: An Extraction and Verification Framework for Enriching Chinese Encyclopedia Knowledge Base
- Author
-
Xu, Bo, Liang, Jiaqing, Xie, Chenhao, Liang, Bin, Chen, Lihan, and Xiao, Yanghua
- Subjects
Information technology, T58.5-58.64
- Abstract
Knowledge bases play an important role in machine understanding and have been widely used in various applications, such as search engines, recommendation systems, and question answering. However, most knowledge bases are incomplete, which can cause many downstream applications to perform poorly because they cannot find the corresponding facts in the knowledge bases. In this paper, we propose an extraction and verification framework to enrich knowledge bases. Specifically, based on the existing knowledge base, we first extract new facts from the description texts of entities. However, not all newly extracted facts can be added directly to the knowledge base, because the extraction may introduce errors. We therefore propose a novel crowd-sourcing based verification step to verify the candidate facts. Finally, we apply this framework to the existing knowledge base CN-DBpedia and construct a new version, CN-DBpedia2, which additionally contains the high-confidence facts extracted from the description texts of entities.
- Published
- 2019
26. Concise Total Syntheses of (-)-Crinipellins A and B Enabled by a Controlled Cargill Rearrangement.
- Author
-
Xu, Bo, Zhang, Ziyao, Tantillo, Dean, and Dai, Mingji
- Subjects
Diterpenes, Stereoisomerism, Biological Products, Molecular Structure, Cycloaddition Reaction, Alkylation
- Abstract
Herein, we report concise total syntheses of diterpene natural products (-)-crinipellins A and B with a tetraquinane skeleton, three adjacent all-carbon quaternary centers, and multiple oxygenated and labile functional groups. Our synthesis features a convergent Kozikowski β-alkylation to unite two readily available building blocks with all the required carbon atoms, an intramolecular photochemical [2 + 2] cycloaddition to install three challenging and adjacent all-carbon quaternary centers and a 5-6-4-5 tetracyclic skeleton, and a controlled Cargill rearrangement to rearrange the 5-6-4-5 tetracyclic skeleton to the desired tetraquinane skeleton. These strategically enabling transformations allowed us to complete total syntheses of (-)-crinipellins A and B in 12 and 13 steps, respectively. The results of quantum chemical computations revealed that the Bronsted acid-catalyzed Cargill rearrangements likely involve stepwise paths to products and the AlR3-catalyzed Cargill rearrangements likely involve a concerted path with asynchronous alkyl shifting events to form the desired product.
- Published
- 2024
27. Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation
- Author
-
Bai, Fengshuo, Zhao, Rui, Zhang, Hongming, Cui, Sijia, Wen, Ying, Yang, Yaodong, Xu, Bo, and Han, Lei
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the learning loop, we propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques. Label smoothing reduces overfitting of the reward model by smoothing human preference labels. Additionally, we bootstrap a conservative estimate $\widehat{Q}$ using well-supported state-action pairs from the current replay memory to mitigate overestimation bias and utilize it for policy learning regularization. Our experimental results across a variety of complex tasks, both in online and offline settings, demonstrate that our approach improves feedback efficiency, outperforming state-of-the-art methods by a large margin. Ablation studies further reveal that SEER achieves a more accurate Q-function compared to prior work.
- Published
- 2024
28. High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost
- Author
-
Hu, JiaKui, Yao, Man, Qiu, Xuerui, Chou, Yuhong, Cai, Yuxuan, Qiao, Ning, Tian, Yonghong, Xu, Bo, and Li, Guoqi
- Subjects
Computer Science - Neural and Evolutionary Computing
- Abstract
Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boosts memory requirements during training and increases inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the forward propagation of SNNs. We turn off the temporal dynamics of most spiking neurons and design multi-level temporal reversible interactions at temporal turn-on spiking neurons, resulting in an $O(L)$ training memory. Combined with the temporal reversible nature, we redesign the input encoding and network organization of SNNs to achieve $O(1)$ inference energy cost. Then, we finely adjust the internal units and residual connections of the basic SNN block to ensure the effectiveness of sparse temporal information interaction. T-RevSNN achieves excellent accuracy on ImageNet, while the memory efficiency, training time acceleration, and inference energy efficiency can be significantly improved by $8.6 \times$, $2.0 \times$, and $1.6 \times$, respectively. This work is expected to break the technical bottleneck of significantly increasing memory cost and training time for large-scale SNNs while maintaining high performance and low inference energy cost. Source code and models are available at: https://github.com/BICLab/T-RevSNN., Comment: Accepted by ICML2024
- Published
- 2024
29. Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural Networks
- Author
-
Zhao, Xuanle, Sun, Yue, Zhang, Tielin, and Xu, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Spatiotemporal prediction plays an important role in solving natural problems and processing video frames, especially in weather forecasting and human action recognition. Recent advances attempt to incorporate prior physical knowledge into the deep learning framework to estimate the unknown governing partial differential equations (PDEs), which have shown promising results in spatiotemporal prediction tasks. However, previous approaches only restrict neural network architectures or loss functions to acquire physical or PDE features, which decreases the representative capacity of a neural network. Meanwhile, the updating process of the physical state cannot be effectively estimated. To solve the above mentioned problems, this paper proposes a physical-guided neural network, which utilizes the frequency-enhanced Fourier module and moment loss to strengthen the model's ability to estimate the spatiotemporal dynamics. Furthermore, we propose an adaptive second-order Runge-Kutta method with physical constraints to model the physical states more precisely. We evaluate our model on both spatiotemporal and video prediction tasks. The experimental results show that our model outperforms state-of-the-art methods and performs best in several datasets, with a much smaller parameter count., Comment: 11 pages, 8 figures
- Published
- 2024
30. Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation
- Author
-
Zhang, Bo, Ma, Hui, Ding, Jian, Wang, Jian, Xu, Bo, and Lin, Hongfei
- Subjects
Computer Science - Computation and Language, Computer Science - Multimedia
- Abstract
Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image-text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into LLMs. This enables the LLMs to generate dialogues that are not only coherent and engaging but also exhibit a deep understanding of the context through implicit multimodal cues, effectively overcoming the limitations of zero-resource scenarios. Our extensive experimentation across two dialogue datasets shows that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues. The code will be publicly available following acceptance., Comment: Under Review
- Published
- 2024
31. Co-learning-aided Multi-modal-deep-learning Framework of Passive DOA Estimators for a Heterogeneous Hybrid Massive MIMO Receiver
- Author
-
Bai, Jiatong, Shu, Feng, Zheng, Qinghe, Xu, Bo, Shi, Baihua, Chen, Yiwen, Zhang, Weibin, and Wang, Xianpeng
- Subjects
Electrical Engineering and Systems Science - Signal Processing, Computer Science - Artificial Intelligence, Computer Science - Information Theory
- Abstract
Due to its excellent performance in rate and resolution, the fully-digital (FD) massive multiple-input multiple-output (MIMO) antenna array has been widely applied in data transmission, direction of arrival (DOA) measurement, and other tasks. However, it faces two main challenges: high computational complexity and high circuit cost. Both problems may be addressed well by a hybrid analog-digital (HAD) structure, but HAD suffers from phase ambiguity, which leads to low efficiency or high latency. Does there exist a MIMO structure that offers low cost, low complexity, and high time efficiency at the same time? To satisfy these three properties, a novel heterogeneous hybrid MIMO receiver structure integrating FD and heterogeneous HAD ($\rm{H}^2$AD-FD) is proposed, and a corresponding multi-modal (MD) learning framework is developed. The framework includes three major stages: 1) generate the candidate sets via root multiple signal classification (Root-MUSIC) or deep learning (DL); 2) infer the class of true solutions from the candidate sets using machine learning (ML) methods; 3) fuse the two-part true solutions to achieve a better DOA estimation. The above process forms two methods named MD-Root-MUSIC and MDDL. To improve DOA estimation accuracy and reduce the clustering complexity, a co-learning-aided MD framework is proposed to form two enhanced methods named CoMDDL and CoMD-RootMUSIC. Moreover, the Cramer-Rao lower bound (CRLB) for the proposed $\rm{H}^2$AD-FD structure is also derived. Experimental results demonstrate that the four proposed methods can approach the CRLB for signal-to-noise ratio (SNR) > 0 dB, and that the proposed CoMDDL and MDDL perform better than CoMD-RootMUSIC and MD-RootMUSIC, particularly in the extremely low SNR region.
- Published
- 2024
32. Enhancing Textual Personality Detection toward Social Media: Integrating Long-term and Short-term Perspectives
- Author
-
Zhu, Haohao, Zhang, Xiaokun, Lu, Junyu, Wu, Youlin, Bai, Zewen, Min, Changrong, Yang, Liang, Xu, Bo, Zhang, Dongyu, and Lin, Hongfei
- Subjects
Computer Science - Computation and Language
- Abstract
Textual personality detection aims to identify personality characteristics by analyzing user-generated content on social media platforms. Numerous psychological studies have highlighted that personality encompasses both long-term stable traits and short-term dynamic states. However, existing studies often concentrate only on either long-term or short-term personality representations, without effectively combining both aspects. This limitation hinders a comprehensive understanding of individuals' personalities, as both stable traits and dynamic states are vital. To bridge this gap, we propose a Dual Enhanced Network (DEN) to jointly model users' long-term and short-term personality for textual personality detection. In DEN, a Long-term Personality Encoding is devised to effectively model long-term stable personality traits. Short-term Personality Encoding is presented to capture short-term dynamic personality states. The Bi-directional Interaction component facilitates the integration of both personality aspects, allowing for a comprehensive representation of the user's personality. Experimental results on two personality detection datasets demonstrate the effectiveness of the DEN model and the benefits of considering both the dynamic and stable nature of personality characteristics for textual personality detection., Comment: 11 pages, 9 figures
- Published
- 2024
33. FineRec:Exploring Fine-grained Sequential Recommendation
- Author
-
Zhang, Xiaokun, Xu, Bo, Wu, Youlin, Zhong, Yuan, Lin, Hongfei, and Ma, Fenglong
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
Sequential recommendation is dedicated to offering items of interest to users based on their historical behaviors. The attribute-opinion pairs, expressed by users in their reviews of items, offer the potential to capture user preferences and item characteristics at a fine-grained level. To this end, we propose a novel framework, FineRec, that explores the attribute-opinion pairs of reviews to finely handle sequential recommendation. Specifically, we utilize a large language model to extract attribute-opinion pairs from reviews. For each attribute, a unique attribute-specific user-opinion-item graph is created, where corresponding opinions serve as the edges linking heterogeneous user and item nodes. To tackle the diversity of opinions, we devise a diversity-aware convolution operation to aggregate information within the graphs, enabling attribute-specific user and item representation learning. Ultimately, we present an interaction-driven fusion mechanism to integrate attribute-specific user/item representations across all attributes for generating recommendations. Extensive experiments conducted on several real-world datasets demonstrate the superiority of our FineRec over existing state-of-the-art methods. Further analysis also verifies the effectiveness of our fine-grained manner in handling the task., Comment: This work has been accepted by SIGIR '24 as a full paper
- Published
- 2024
34. Disentangling ID and Modality Effects for Session-based Recommendation
- Author
-
Zhang, Xiaokun, Xu, Bo, Ren, Zhaochun, Wang, Xiaochen, Lin, Hongfei, and Ma, Fenglong
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
Session-based recommendation aims to predict intents of anonymous users based on their limited behaviors. Modeling user behaviors involves two distinct rationales: co-occurrence patterns reflected by item IDs, and fine-grained preferences represented by item modalities (e.g., text and images). However, existing methods typically entangle these causes, leading to their failure in achieving accurate and explainable recommendations. To this end, we propose a novel framework, DIMO, to disentangle the effects of ID and modality in the task. At the item level, we introduce a co-occurrence representation schema to explicitly incorporate co-occurrence patterns into ID representations. Simultaneously, DIMO aligns different modalities into a unified semantic space to represent them uniformly. At the session level, we present a multi-view self-supervised disentanglement, including a proxy mechanism and counterfactual inference, to disentangle ID and modality effects without supervised signals. Leveraging these disentangled causes, DIMO provides recommendations via causal inference and further creates two templates for generating explanations. Extensive experiments on multiple real-world datasets demonstrate the consistent superiority of DIMO over existing methods. Further analysis also confirms DIMO's effectiveness in generating explanations., Comment: This work has been accepted by SIGIR '24 as a full paper
- Published
- 2024
35. Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon
- Author
-
Wang, Jingrong, Xu, Bo, Jin, Ronghe, Zhang, Shoujian, Gao, Kefu, and Liu, Jingnan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Accurate, continuous, and reliable positioning is a critical component of achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of a stand-alone sensor and non-line-of-sight (NLOS) caused by high buildings, trees, and elevated structures seriously affect positioning results. To address these challenges, a sky-view image segmentation algorithm based on a Fully Convolutional Network (FCN) is proposed for GNSS NLOS detection. Building upon this, a novel NLOS detection and mitigation algorithm (named S-NDM) is extended to the tightly coupled Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), and visual feature system, which is called Sky-GVIO, with the aim of achieving continuous and accurate positioning in urban canyon environments. Furthermore, the system harmonizes Single Point Positioning (SPP) with Real-Time Kinematic (RTK) methodologies to bolster its operational versatility and resilience. In urban canyon environments, the positioning performance of the S-NDM algorithm proposed in this paper is evaluated under different tightly coupled SPP-related and RTK-related models. The results show that the Sky-GVIO system achieves meter-level accuracy under SPP mode and sub-decimeter precision with RTK, surpassing the performance of GNSS/INS/Vision frameworks devoid of S-NDM. Additionally, the sky-view image dataset, inclusive of training and evaluation subsets, has been made publicly accessible for scholarly exploration at https://github.com/whuwangjr/sky-view-images.
- Published
- 2024
36. Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning
- Author
-
Zhang, Duzhen, Wang, Qingyu, Zhang, Tielin, and Xu, Bo
- Subjects
Computer Science - Neural and Evolutionary Computing, Quantitative Biology - Neurons and Cognition
- Abstract
The success of Deep Reinforcement Learning (DRL) is largely attributed to utilizing Artificial Neural Networks (ANNs) as function approximators. Recent advances in neuroscience have unveiled that the human brain achieves efficient reward-based learning, at least by integrating spiking neurons with spatial-temporal dynamics and network topologies with biologically-plausible connectivity patterns. This integration process allows spiking neurons to efficiently combine information across and within layers via nonlinear dendritic trees and lateral interactions. The fusion of these two topologies enhances the network's information-processing ability, crucial for grasping intricate perceptions and guiding decision-making procedures. However, ANNs and brain networks differ significantly. ANNs lack intricate dynamical neurons and only feature inter-layer connections, typically achieved by direct linear summation, without intra-layer connections. This limitation leads to constrained network expressivity. To address this, we propose a novel alternative for function approximator, the Biologically-Plausible Topology improved Spiking Actor Network (BPT-SAN), tailored for efficient decision-making in DRL. The BPT-SAN incorporates spiking neurons with intricate spatial-temporal dynamics and introduces intra-layer connections, enhancing spatial-temporal state representation and facilitating more precise biological simulations. Diverging from the conventional direct linear weighted sum, the BPT-SAN models the local nonlinearities of dendritic trees within the inter-layer connections. For the intra-layer connections, the BPT-SAN introduces lateral interactions between adjacent neurons, integrating them into the membrane potential formula to ensure accurate spike firing., Comment: Work in Progress
- Published
- 2024
37. Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification
- Author
-
Wang, Qingyu, Zhang, Duzhen, Zhang, Tielin, and Xu, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
- Abstract
The energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) and the artificial Transformer, whereby Spiking Self-Attention (SSA) is used to achieve both higher accuracy and lower computational cost. However, it seems that self-attention is not always necessary, especially in sparse spike-form calculation manners. In this paper, we innovatively replace vanilla SSA (using dynamic bases calculated from Query and Key) with spike-form Fourier Transform, Wavelet Transform, and their combinations (using fixed triangular or wavelet bases), based on a key hypothesis that both of them use a set of basis functions for information transformation. Hence, the Fourier-or-Wavelet-based spikformer (FWformer) is proposed and verified in visual classification tasks, including both static image and event-based video datasets. The FWformer can achieve comparable or even higher accuracies ($0.4\%$-$1.5\%$), higher running speed ($9\%$-$51\%$ for training and $19\%$-$70\%$ for inference), reduced theoretical energy consumption ($20\%$-$25\%$), and reduced GPU memory usage ($4\%$-$26\%$), compared to the standard spikformer. Our results indicate that the continuous refinement of new Transformers, inspired either by biological discovery (spike-form) or by information theory (Fourier or Wavelet Transform), is promising., Comment: 18 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2308.02557
- Published
- 2024
38. An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation
- Author
-
Xu, Zewen, He, Yijia, Wei, Hao, Xu, Bo, Xie, BinJian, and Wu, Yihong
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Line features are valid complements for point features in man-made environments. 3D-2D constraints provided by line features have been widely used in Visual Odometry (VO) and Structure-from-Motion (SfM) systems. However, how to accurately solve three-view relative motion only with 2D observations of points and lines in real time has not been fully explored. In this paper, we propose a novel three-view pose solver based on rotation-translation decoupled estimation. First, a high-precision rotation estimation method based on normal vector coplanarity constraints that consider the uncertainty of observations is proposed, which can be solved by Levenberg-Marquardt (LM) algorithm efficiently. Second, a robust linear translation constraint that minimizes the degree of the rotation components and feature observation components in equations is elaborately designed for estimating translations accurately. Experiments on synthetic data and real-world data show that the proposed approach improves both rotation and translation accuracy compared to the classical trifocal-tensor-based method and the state-of-the-art two-view algorithm in outdoor and indoor environments.
- Published
- 2024
39. URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields
- Author
-
Xu, Bo, Liu, Ziao, Guo, Mengqi, Li, Jiancheng, and Lee, Gim Hee
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We propose a novel rolling shutter bundle adjustment method for neural radiance fields (NeRF), which utilizes unordered rolling shutter (RS) images to obtain the implicit 3D representation. Existing NeRF methods suffer from low-quality images and inaccurate initial camera poses due to the RS effect in the image, whereas the previous method that incorporates RS into NeRF requires strict sequential data input, limiting its widespread applicability. In contrast, our method recovers the physical formation of RS images by estimating camera poses and velocities, thereby removing the input constraints on sequential data. Moreover, we adopt a coarse-to-fine training strategy, in which the RS epipolar constraints of the pairwise frames in the scene graph are used to detect the camera poses that fall into local minima. The poses detected as outliers are corrected by the interpolation method with neighboring poses. The experimental results validate the effectiveness of our method over state-of-the-art works and demonstrate that the reconstruction of 3D representations is not constrained by the requirement of video sequence input.
- Published
- 2024
40. n-dimensional hyperchaotic discrete map with desired positive Lyapunov exponents and application to UART secure communication
- Author
-
Xu, Bo, Tang, Zhongmin, Ye, Xiaoxuan, Chen, Kai, Gou, Xuan, and Zhao, Jia
- Published
- 2024
41. Learning curve for the combined trans-oral and chest approach to endoscopic selective neck dissection: a cumulative sum (CUSUM) analysis
- Author
-
Chen, Zhen-Xin, Zhao, Xin-Ran, Pang, Feng-Shun, Chen, Jing-Bao, Song, Ya-Min, Cao, Ying, Lin, Zhan-Hong, Xu, Bo, and Qin, You
- Published
- 2024
42. Degradation of pectic polysaccharides by ascorbic acid/H2O2–pectinase system and its application in cotton scouring
- Author
-
Luo, Laipeng, Guo, Ziying, Wang, Ping, Wang, Qiang, Xu, Bo, and Yu, Yuanyuan
- Published
- 2024
43. Efficient preparation of cycloamylose from potato starch using recombinant 4-α-glucanotransferase
- Author
-
Huang, Yan, Zhu, Rong, Liu, Jiehu, Qiao, Xueyi, Xu, Bo, Wang, Lei, and Su, Lingqia
- Published
- 2024
44. Characterization of freshwater sludge generated in Singapore: exploring opportunities for a circular economy
- Author
-
Xu, Bo, Qin, Junde, Yang, Mingqian, and Yi, Yaolin
- Published
- 2024
45. The Influence of Core and Ring Power on the Formation of 5083 Aluminum Alloy Laser Weld Seam
- Author
-
Yu, Zhiyuan, Zhu, Guorong, Xu, Bo, Chen, Hu, Chen, Wenfei, Yu, Chun, Jiang, Lei, Ya, Yunqi, and Chen, Jieshi
- Published
- 2024
46. Frontiers in high entropy alloys and high entropy functional materials
- Author
-
Zhang, Wen-Tao, Wang, Xue-Qian, Zhang, Feng-Qi, Cui, Xiao-Ya, Fan, Bing-Bing, Guo, Jia-Ming, Guo, Zhi-Min, Huang, Rui, Huang, Wen, Li, Xu-Bo, Li, Meng-Ru, Ma, Yan, Shen, Zhi-Hua, Sun, Yong-Gang, Wang, De-Zhuang, Wang, Fei-Yang, Wang, Li-Qiang, Wang, Nan, Wang, Tian-Li, Wang, Wei, Wang, Xiao-Yang, Wang, Yi-Han, Yu, Fu-Jie, Yin, Yu-Zhen, Zhang, Ling-Kun, Zhang, Yi, Zhang, Jian-Yang, Zhao, Qi, Zhao, Yu-Ping, Zhu, Xin-Dong, Sohail, Yasir, Chen, Ya-Nan, Feng, Tao, Gao, Qi-Long, He, Hai-Yan, Huang, Yong-Jiang, Jiao, Zeng-Bao, Ji, Hua, Jiang, Yao, Li, Qiang, Li, Xiao-Ming, Liao, Wei-Bing, Lin, Huai-Jun, Liu, Hui, Liu, Qi, Liu, Qing-Feng, Liu, Wei-Di, Liu, Xiong-Jun, Lu, Yang, Lu, Yi-Ping, Ma, Wen, Miao, Xue-Fei, Pan, Jie, Wang, Qing, Wu, Hong-Hui, Wu, Yuan, Yang, Tao, Yang, Wei-Ming, Yu, Qian, Zhang, Jin-Yu, Chen, Zhi-Gang, Mao, Liang, Ren, Yang, Shen, Bao-Long, Wang, Xun-Li, Jia, Zhe, Zhu, He, Wu, Zhen-Duo, and Lan, Si
- Published
- 2024
47. Tuning Synaptic Connections Instead of Weights by Genetic Algorithm in Spiking Policy Network
- Author
-
Zhang, Duzhen, Zhang, Tielin, Jia, Shuncheng, Wang, Qingyu, and Xu, Bo
- Published
- 2024
48. Preparation of high-performance copper fluoride cathode material for thermal batteries: effect of heat treatment temperature of precursor ammonium copper fluoride
- Author
-
Xu, Bo, Tao, Bo, Yu, Kai, Cui, Yanhua, Bai, Xintao, Yin, Huayi, Song, Qiushi, Ning, Zhiqiang, and Xie, Hongwei
- Published
- 2024
49. TFAP2C Activates CST1 Transcription to Facilitate Breast Cancer Progression and Suppress Ferroptosis
- Author
-
Yuan, Lin, Zhou, Di, Li, Weiwen, Guan, Jianhua, Li, Junda, and Xu, Bo
- Published
- 2024
50. Three-Dimensional Phase-Field Simulation of Stress-Assisted Two-Way Shape Memory Effect and Its Cyclic Degradation of Single-Crystal NiTi Shape Memory Alloy
- Author
-
Xu, Bo, Yu, Chao, Wang, Chong, Wang, Qingyuan, and Kang, Guozheng
- Published
- 2024