Publication Type: 3 selected / Publication Year Range: Last 3 years / Publisher: academic press inc. / Search Limiters: Available in Library Collection and Full Text / Topic: artificial neural networks - Searchworks@Jio Institute Digital Library Search Results

Showing 50 results in total

Search Limiters: Available in Library Collection, Full Text / Topic: artificial neural networks / Publication Year Range: Last 3 years / Publication Type: Academic Journals, Reviews, eBooks / Publisher: academic press inc.


1. Performance modeling on DaVinci AI core.

Author: Tang, Yifeng and Wang, Cho-li
Subjects: *ARTIFICIAL neural networks, *ARTIFICIAL intelligence, *MATRIX multiplications, *ERROR rates, *LOGIC programming
Abstract: The extensive use of Deep Neural Networks (DNNs) encourages the design of domain-specific hardware called Artificial Intelligence (AI) processors. Such novel hardware is difficult to optimize without a proper performance model that reveals its working details and performance implications. This paper presents Verrocchio, a performance model for the Huawei DaVinci AI Core that predicts the execution time of real-world DaVinci kernels. We propose specially crafted micro-benchmarks to identify the contention sources, runtime behaviors, and bandwidth sharing that largely determine performance. Since the DaVinci Core uses a binary semaphore mechanism for synchronization, Verrocchio treats each instruction as a discrete event and manages its execution time based on the program logic. In evaluation, Verrocchio achieves average error rates of 2.62% and 2.30% on sample kernels for single-core and double-core execution, respectively. We demonstrate an optimization process for matrix multiplications with Verrocchio, achieving speedups of 1.70× for operators and 1.53× for applications, with error rates of 5.06% and 5.25%. • Detailed dissections of the Huawei DaVinci AI Core, a novel AI processor. • Benchmarking of DaVinci Core bandwidth contention, the key performance factor. • A performance model for accurate execution-time prediction of kernel programs. • Demonstration of DaVinci kernel optimization and evaluation of prediction accuracy. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

2. Electrochemical promotion of organic waste fermentation: Research advances and prospects.

Author: Wang, Nuohan, Gao, Ming, Liu, Shuo, Zhu, Wenbin, Zhang, Yuanchun, Wang, Xiaona, Sun, Haishu, Guo, Yan, and Wang, Qunhui
Subjects: *ORGANIC wastes, *ORGANIC acids, *FERMENTATION, *ARTIFICIAL neural networks, *SUCCINIC acid, *MICROBIAL metabolism, *NUTRIENT cycles
Abstract: Current methods of treating organic waste suffer from limited resource utilization and low product value, so the research and development of value-added products has become an unavoidable trend. Electro-fermentation (EF) stimulates cell proliferation, accelerates microbial metabolism, and enhances the production of value-added products by applying small voltages or currents to the fermentation system. It represents a novel research direction at the crossroads of electrochemistry and biology. This article reviews current progress in EF for a range of value-added products, including gaseous fuels, organic acids, and other organics, and presents novel value-added products such as 1,3-propanediol, 3-hydroxypropionic acid, succinic acid, acrylic acid, and lysine. Recent research trends point to EF for the cogeneration of value-added products, studies of microbial community structure and electroactive bacteria, exploration of electron transfer mechanisms in EF systems, effective methods for recovering nitrogen and phosphorus nutrients, optimization of EF conditions, and the use of biosensors and artificial neural networks in this area. The paper also analyzes current challenges in selecting conductive materials, optimizing electrode materials, and developing bioelectrochemical system (BES) coupling processes in EF systems. The aim is to provide references and ideas for developing more efficient, advanced, and value-added EF technologies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Understanding the impact on convolutional neural networks with different model scales in AIoT domain.

Author: Lin, Longxin, Xu, Zhenxiong, Chen, Chien-Ming, Wang, Ke, Hassan, Md. Rafiul, Alam, Md. Golam Rabiul, Hassan, Mohammad Mehedi, and Fortino, Giancarlo
Subjects: *ARTIFICIAL neural networks, *CONVOLUTIONAL neural networks, *MODELS & modelmaking, *DEEP learning, *NEURAL circuitry
Abstract: In recent years many powerful deep learning models have been developed, but in practical applications they often place high demands on hardware storage and computing power. In Artificial Intelligence of Things (AIoT) scenarios, the computing power at the edge or terminal side is relatively limited, so most conventional deep learning models are difficult to deploy on AIoT devices. It is therefore important to explore how performance varies across deep learning models of different scales. In this paper, we propose a method to analyze the impact of model size on deep learning models through a range of experiments. We employ a slimmable network as a Neural Architecture Search (NAS) tool to vary model size freely, and evaluate the resulting models in terms of FLOPs, robustness, and accuracy. The experimental results show how FLOPs, robustness, and accuracy vary with model size, which helps in understanding the performance impact of deep learning models of different scales in AIoT systems. • Interpreting the robustness of deep learning models with different model complexity. • Employing a slimmable network to compress neural networks. • Understanding the relationship between robustness and accuracy of deep learning models. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
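Entry 3 tracks how FLOPs change as a slimmable network is made narrower or wider. The sketch below illustrates why cost falls quickly with width, assuming a plain dense convolution layer and a hypothetical width multiplier applied to both input and output channels; it is not the paper's code or its exact FLOP accounting (it counts multiply-accumulates, and FLOPs are often reported as roughly twice that number).

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate count of a dense k x k convolution (no groups, bias ignored)."""
    return h_out * w_out * c_in * c_out * k * k


def slimmed_conv2d_macs(h_out, w_out, c_in, c_out, k, width):
    """MACs when a hypothetical width multiplier keeps only a fraction of the
    input and output channels, as a slimmable layer does at a narrower setting."""
    active_in = max(1, int(c_in * width))
    active_out = max(1, int(c_out * width))
    return conv2d_macs(h_out, w_out, active_in, active_out, k)


# Cost of one 3x3 layer (56x56 output, 64 -> 128 channels) at several widths.
full = conv2d_macs(56, 56, 64, 128, 3)
for width in (0.25, 0.5, 0.75, 1.0):
    macs = slimmed_conv2d_macs(56, 56, 64, 128, 3, width)
    print(f"width {width:.2f}: {macs:,} MACs ({macs / full:.4f} of full)")
```

Because both channel counts shrink together, cost scales roughly with the square of the width: a half-width layer needs about a quarter of the multiply-accumulates of the full layer.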

4. Forecasting of compound ocean-fluvial floods using machine learning.

Author: Moradian, Sogol, AghaKouchak, Amir, Gharbia, Salem, Broderick, Ciaran, and Olbert, Agnieszka I.
Subjects: *ARTIFICIAL neural networks, *MACHINE learning, *FLOOD damage prevention, *FLOOD damage, *FLOOD control, *FLOOD warning systems, *FLOOD risk, *WATER levels, *WATER depth
Abstract: Flood modelling and forecasting can enhance our understanding of flood mechanisms and facilitate effective management of flood risk. Conventional flood hazard and risk assessments usually consider one driver at a time, whether ocean, fluvial, or pluvial, without considering the compound nature of flood events. In this paper, we develop a novel approach for modelling and forecasting compound coastal-fluvial floods using a two-step framework: in step one, a hydrodynamic model simulates floodwater propagation; in step two, machine learning (ML) models generate flood forecasts. The hydrodynamic-ML forecasting system couples a hydrodynamic model covering a specific domain with individual ML models trained for each pixel. In total, seven ML models were applied in this study: Support Vector Regression (SVR), Support Vector Machine (SVM), Radial Basis Function (RBF), Linear Regression (LR), Gaussian Process Regression (GPR), Decision Tree (DT), and Artificial Neural Network (ANN). Compound floods are forecast from two sets of inputs: time series of river discharges in the upstream fluvial section and ocean water levels downstream in the coastal areas. The accuracy of the flood forecasting system is demonstrated for Cork City, Ireland, and modelling performance was evaluated using several statistical tools. Results show that the proposed models can provide reliable estimates of flood inundation and associated water depths, with the RBF model performing best overall. Despite the complexity of compound multi-driver floods, this study shows that the coupled hydrodynamic-ML approach can forecast coastal-fluvial floods with limited hydraulic and hydrological input data. 
This system overcomes a limitation of traditional systems based on hydrodynamic models, where the trade-off between numerical model accuracy and computational time prevents the model from being used for short-term flood forecasting. Once trained, the ML component of the coupled system can forecast floods in near real time and could potentially be integrated into a flood early warning system. Accurate flood forecasting has a wide range of positive societal impacts, including improved flood preparedness, increased confidence, better resource allocation, reduced flood damage, and potentially even flood prevention. • Novel technique to improve compound coastal-fluvial flood modelling. • Flood modelling technique based on seven machine learning methods. • The approach is successfully implemented, and results are evaluated over Cork, Ireland. • The results demonstrate minimal quantitative errors for the proposed models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Single-channel speech enhancement using colored spectrograms.

Author: Gul, Sania, Khan, Muhammad Salman, and Fazeel, Muhammad
Subjects: *SPEECH enhancement, *SPECTRUM analysis, *ARTIFICIAL neural networks, *DEEP learning, *SIGNAL denoising
Abstract: • A novel approach for single-channel speech enhancement is presented, using colored spectrograms. • A deep neural network denoises the spectrograms, and a regression neural network translates the colors of the denoised spectrograms into short-time Fourier transform (STFT) magnitudes. • Output is comparable to state-of-the-art baseline models at a much reduced computational cost. Speech enhancement concerns the processes required to remove unwanted background sounds from target speech to improve its quality and intelligibility. In this paper, a novel approach for single-channel speech enhancement is presented using colored spectrograms. We propose a deep neural network (DNN) architecture adapted from the pix2pix generative adversarial network (GAN) and train it on colored spectrograms of speech to denoise them. After denoising, the colors of the spectrograms are translated into STFT magnitudes using a shallow regression neural network. These estimated STFT magnitudes are then combined with the noisy phases to obtain enhanced speech. The results show an improvement of almost 0.84 points in the perceptual evaluation of speech quality (PESQ) and 1% in short-term objective intelligibility (STOI) over the unprocessed noisy data. These gains in quality and intelligibility over the unprocessed signal are almost equal to those achieved by the baseline methods used for comparison, but at a much reduced computational cost. The proposed solution achieves a comparable PESQ score at almost 10 times lower computational cost than a similar baseline model trained on grayscale spectrograms, which produced the highest PESQ score, and shows only a 1% deficit in STOI at 28 times lower computational cost than another baseline system based on a convolutional neural network GAN (CNN-GAN), which produces the most intelligible speech. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
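Entry 5 reconstructs speech by pairing the estimated STFT magnitudes with the phases of the noisy input. A minimal sketch of that recombination step, assuming both inputs are already available as frame-by-bin lists; the function name is hypothetical, and the inverse STFT that turns the result back into a waveform is omitted.

```python
import cmath


def combine_with_noisy_phase(estimated_mag, noisy_stft):
    """Build an enhanced complex STFT from estimated magnitudes and noisy phases.

    estimated_mag: frames x bins list of non-negative magnitudes (e.g. a network's output).
    noisy_stft:    frames x bins list of complex STFT values of the noisy input.
    """
    enhanced = []
    for mag_frame, noisy_frame in zip(estimated_mag, noisy_stft):
        enhanced.append([
            mag * cmath.exp(1j * cmath.phase(z))  # keep the estimated magnitude, reuse the noisy angle
            for mag, z in zip(mag_frame, noisy_frame)
        ])
    return enhanced


noisy = [[3 + 4j, -2j], [1 + 0j, -1 + 1j]]
estimate = [[1.0, 0.5], [0.2, 2.0]]
enhanced = combine_with_noisy_phase(estimate, noisy)
print(enhanced[0][0])  # approximately (0.6+0.8j)
```

Reusing the noisy phase keeps the model small (it only predicts magnitudes), at the cost of leaving phase distortion uncorrected.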

6. Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances.

Author: Zeng, Chang, Miao, Xiaoxiao, Wang, Xin, Cooper, Erica, and Yamagishi, Junichi
Subjects: *IDENTIFICATION, *TIME delay systems, *ARTIFICIAL neural networks, *AUTOMATIC speech recognition, *DISCRIMINANT analysis
Abstract: Conventional automatic speaker verification systems can usually be decomposed into a front-end model, such as a time delay neural network (TDNN), that extracts speaker embeddings, and a back-end model, such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA), that scores similarity. However, optimizing the front-end and back-end models sequentially may lead to a local minimum, which in theory prevents the whole system from reaching its best optimum. Although some methods have been proposed for jointly optimizing the two models, such as the generalized end-to-end (GE2E) model and the NPLDA E2E model, most of them have not fully investigated how to model the relationships among multiple enrollment utterances. In this paper, we propose a new E2E joint method for speaker verification designed specifically for the practical scenario of multiple enrollment utterances. To exploit the relationships among multiple enrollment utterances, our model is equipped with frame-level and utterance-level attention mechanisms. Additionally, focal loss is used to balance the importance of positive and negative samples within a mini-batch and to focus on difficult samples during training. We also apply several data augmentation techniques, including conventional noise augmentation using the MUSAN and RIRs datasets and a unique speaker embedding-level mixup strategy for better optimization. • Extending the attention back-end to end-to-end multi-enrollment speaker verification. • Introducing focal loss and proposing speaker embedding mixup for fine-tuning the end-to-end model. • Analyzing the effect of the number of enrollment utterances. • Analyzing the embedding-level attention weights for speech genres. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
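Entry 6 uses focal loss to balance positive and negative trials and to emphasize hard ones. The sketch below shows the widely used binary form, FL(p_t) = -α_t (1 - p_t)^γ log p_t, applied to hypothetical verification-trial probabilities; the defaults γ = 2 and α = 0.25 are common choices, not values taken from this paper, and the paper's exact loss setup is not described in the abstract.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss.

    probs:  predicted probability that each trial is a target (same-speaker) trial.
    labels: 1 for target trials, 0 for non-target trials.
    gamma:  focusing parameter; gamma = 0 gives alpha-weighted cross-entropy.
    alpha:  weight of the positive class (1 - alpha for the negative class).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)


# A confident, correct trial contributes far less than a hard one.
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.30], [1])
print(easy < hard)  # True
```

The (1 - p_t)^γ factor is what shifts the training signal toward difficult trials: as p_t approaches 1, the term vanishes much faster than plain cross-entropy.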

7. Energy-efficient offloading for DNN-based applications in edge-cloud computing: A hybrid chaotic evolutionary approach.

Author: Li, Zengpeng, Yu, Huiqun, Fan, Guisheng, Zhang, Jiayin, and Xu, Jin
Subjects: *ARTIFICIAL neural networks, *PROCESS capability, *BUDGET, *SIMULATED annealing, *EVOLUTIONARY algorithms, *CLOUD computing, *IMAGE encryption
Abstract: The rapid development of Deep Neural Networks (DNNs) lays a solid foundation for Internet of Things systems. However, mobile devices with limited processing capacity and short battery life have difficulty executing complex DNNs. To satisfy different Quality of Service requirements, a feasible solution is to offload DNN layers to edge nodes and the cloud. Energy-efficient offloading of DNN-based applications under deadline and budget constraints in the edge-cloud environment remains an open and challenging problem. To this end, this paper proposes a Hybrid Chaotic Evolutionary Algorithm (HCEA) incorporating diversification and intensification strategies, together with a DVFS-enabled version of it (HCEA-DVFS). The diversification strategy, based on the Archimedes Optimization Algorithm (AOA), exploits global and local guiding information to improve population diversity during updating and uses the Metropolis acceptance rule of Simulated Annealing to avoid premature convergence. The chaotic intensification strategy, based on the Genetic Algorithm (GA), is designed to enhance the local search capability of HCEA. Moreover, adjustment strategies enabled by Dynamic Voltage and Frequency Scaling (DVFS) can be embedded into HCEA to further reduce energy consumption by resetting frequency levels and reallocating DNN layers. Experimental results on four DNN-based applications show that, across different deadlines, budgets, and workloads, HCEA-DVFS reduces energy consumption on average by 7.93%, 9.68%, 11.02%, 11.84%, and 19.38% compared with HCEA, PSO-GA, MCEA, AOA, and Greedy, respectively. • An AOA-based diversification strategy is designed to exploit global and local guiding information for population updating. • A chaotic GA-based intensification strategy is developed to enhance local search capability and introduce randomness. • A DVFS-enabled adjustment strategy is presented to save energy by resetting frequency levels and reallocating DNN layers. 
• HCEA and HCEA-DVFS outperform baselines in energy consumption optimization under budget and deadline constraints. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
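Entry 7's diversification strategy relies on the Metropolis acceptance rule from simulated annealing. A minimal sketch of that rule for a minimization problem, where the cost would be something like the energy consumption of a candidate offloading plan; how HCEA computes that cost and schedules the temperature is not given in the abstract, so the numbers below are hypothetical.

```python
import math
import random


def metropolis_accept(current_cost, candidate_cost, temperature, rng=random):
    """Metropolis acceptance rule for a minimization problem.

    A candidate with lower (or equal) cost is always accepted; a worse one is
    accepted with probability exp(-(candidate - current) / temperature), which
    lets the search escape local optima while the temperature is high.
    """
    delta = candidate_cost - current_cost
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


# Hypothetical energy costs of two offloading plans; the candidate is 1 unit worse.
rng = random.Random(42)
accepted = sum(metropolis_accept(100.0, 101.0, 1.0, rng) for _ in range(10_000))
print(accepted / 10_000)  # close to exp(-1), about 0.368
```

As the temperature is lowered over iterations, worse candidates are accepted less often and the search gradually turns from exploration to exploitation.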

8. Rep-MCA-former: An efficient multi-scale convolution attention encoder for text-independent speaker verification.

Author: Liu, Xiaohu, Chen, Defu, Wang, Xianbao, Xiang, Sheng, and Zhou, Xuwen
Subjects: *AUTOMATIC speech recognition, *MULTI-factor authentication, *DATA extraction, *ARTIFICIAL neural networks, *NATURAL language processing, *PARAMETERIZATION
Abstract: In many speaker verification tasks, the quality of the speaker embedding is an important factor affecting system performance. Advanced speaker embedding extraction networks aim to capture richer speaker features through multi-branch network architectures. Recently, speaker verification systems based on transformer encoders (e.g., MFA-Conformer) have received much attention and achieved many satisfactory results, because transformer encoders can efficiently extract global speaker features. However, these approaches commonly suffer from large numbers of model parameters and high computational latency, which makes them difficult to apply on resource-constrained edge terminals. To address this issue, this paper proposes an effective, lightweight transformer model (MCA-former) with multi-scale convolutional self-attention (MCA), which performs multi-scale modeling and channel modeling along the temporal direction of the input at low computational cost. In addition, for the inference phase, we develop a systematic re-parameterization method that converts the multi-branch network structure into a single-path topology, effectively improving inference speed. We investigate the performance of the MCA-former for speaker verification on the VoxCeleb1 test set. The results show that the MCA-based transformer model is advantageous in terms of the number of parameters and inference efficiency. With re-parameterization, the inference speed of the model increases by about 30%, and memory consumption is significantly reduced. • Designing a lightweight multi-scale convolutional self-attention module. • An efficient transformer encoder for speaker verification. • Using the re-parameterization method to improve the model's inference efficiency. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
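Entry 8 converts a multi-branch structure into a single-path topology for inference. The sketch below shows the generic structural re-parameterization idea (as popularized by RepVGG-style models), assuming single-channel 1-D convolutions with odd kernel sizes whose outputs are summed and no batch normalization; MCA-former's actual branches and any normalization folding are not described in the abstract, and the function names are hypothetical.

```python
def conv1d_same(x, kernel, bias=0.0):
    """'Same'-padded 1-D cross-correlation (as in deep learning conv layers), odd kernel length."""
    k = len(kernel)
    assert k % 2 == 1, "this sketch assumes odd kernel lengths"
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [bias + sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def merge_parallel_branches(branches):
    """Fuse parallel (kernel, bias) branches whose outputs are summed into one branch.

    Each kernel is zero-padded symmetrically to the largest kernel size, then
    kernels and biases are added; by linearity the fused conv gives the same output.
    """
    k_max = max(len(kernel) for kernel, _ in branches)
    fused = [0.0] * k_max
    fused_bias = 0.0
    for kernel, bias in branches:
        offset = (k_max - len(kernel)) // 2
        for j, w in enumerate(kernel):
            fused[offset + j] += w
        fused_bias += bias
    return fused, fused_bias


x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
branches = [([0.2, 0.5, -0.1], 0.1), ([0.7], -0.3), ([0.05, -0.2, 0.4, 0.1, 0.3], 0.0)]

# Training-time form: run every branch and sum the outputs.
multi_branch = [sum(vals) for vals in zip(*(conv1d_same(x, k, b) for k, b in branches))]
# Inference-time form: a single convolution with the fused kernel and bias.
kernel, bias = merge_parallel_branches(branches)
single_path = conv1d_same(x, kernel, bias)
print(max(abs(a - b) for a, b in zip(multi_branch, single_path)))  # floating-point rounding only
```

The fused layer computes the same function with one pass over the input, which is where the inference-time speed and memory savings of re-parameterization come from.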

9. Tool for fast assessment of stormwater flood volumes for urban catchment: A machine learning approach.

Author: Szeląg, Bartosz, Majerek, Dariusz, Eusebi, Anna Laura, Kiczko, Adam, de Paola, Francesco, McGarity, Arthur, Wałek, Grzegorz, and Fatone, Francesco
Subjects: *ARTIFICIAL neural networks, *MACHINE learning, *WATERSHEDS, *RAINFALL, *RF values (Chromatography), *FLOODS
Abstract: Specific flood volume is an important criterion for evaluating the performance of sewer networks. Currently, mechanistic models (MCMs, e.g., SWMM) are usually used to predict it, but they require detailed information about the characteristics of the catchment and sewer network, which can be difficult to obtain, and model calibration is a complex task. This paper presents a methodology for developing simulators that predict specific flood volume using machine learning methods (DNN, Deep Neural Network; GAM, Generalized Additive Model). Sobol indices calculated with the global sensitivity analysis (GSA) method were used to select the ML model as an alternative to the MCM. The DNN model was shown to be suitable for flood prediction, since high agreement was obtained between the GSA results for rainfall data, catchment and sewer network characteristics, and calibrated SWMM parameters describing land use and sewer retention. Regression relationships (polynomial and exponential functions) were determined between Sobol indices (retention depth of impervious area, correction factor of impervious area, Manning's roughness coefficient of sewers) and sewer network characteristics (unit density of sewers, retention factor, i.e., the ratio of downstream and upstream retention), yielding R² = 0.55–0.78. The study demonstrates the feasibility of using the DNN model to predict sewer network flooding and plan modernization with a more limited range of input data than SWMM requires. The developed model can be applied to the management of urban catchments with limited access to data and at the urban planning stage. • ML simulator of specific flood volume as an alternative to the SWMM model. • Multi-criteria selection of a machine learning method for flooding simulation. • Sobol indices prediction in GSA via empirical models. • Correlation between Sobol indices and characteristics of the catchment and sewer. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Coupling machine learning and physical modelling for predicting runoff at catchment scale.

Author: Zubelzu, Sergio, Ghalkha, Abdulmomen, Ben Issaid, Chaouki, Zanella, Andrea, and Bennis, Medhi
Subjects: *MACHINE learning, *ARTIFICIAL neural networks, *RUNOFF models, *STORMS, *HYDROLOGIC models
Abstract: In this paper, we present an approach that combines data-driven and physical modelling to predict runoff occurrence and volume at catchment scale. First, we estimated the runoff volume of recorded storms with the aid of the Green-Ampt infiltration model. Then, we used machine learning algorithms, namely LightGBM (LGBM) and a Deep Neural Network (DNN), to predict the outputs of the physical model from a set of atmospheric variables (relative humidity, temperature, atmospheric pressure, and wind velocity) collected before or immediately after the beginning of the storm. Results for a small urban catchment in Madrid show that the DNN performed better in predicting runoff occurrence and volume. Moreover, enriching the primary atmospheric input variables with auxiliary variables (e.g., storm intensity recorded during the first hour, or rain volume and intensity estimates obtained from auxiliary regression methods) greatly increased model performance. We show that data-driven algorithms shaped by physical criteria can be successfully generated by letting the data-driven algorithm learn from the output of physical models. This is a novel approach to physics-informed data-driven algorithms that departs from common practice in hydrological modelling through machine learning. • Data-driven and physical modelling are merged to predict runoff. • The data-driven algorithm is fed with weather variables before the storm starts. • LightGBM and Deep Neural Networks were coupled with the Green-Ampt model. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. A physical exertion inspired multi-task learning framework for detecting out-of-breath speech.

Author: Sahoo, Sibasis and Dandapat, Samarendra
Subjects: *COMPUTER multitasking, *TELEMEDICINE, *AUTOMATIC speech recognition, *NEURAL computers, *ARTIFICIAL neural networks
Abstract: Physical exertion is a stress condition that affects how we normally produce speech, altering both the temporal and spectral patterns of speech. Speech utterances can therefore serve as a cost-effective telehealth solution for detecting whether a person is under physical exertion. This paper addresses the detection of shortness of breath (the out-of-breath condition) from speech under physical load using a multi-task learning (MTL) framework. The primary classification targets are the neutral and out-of-breath classes. These targets are binary and do not reflect the actual extent of exertion, which motivates a novel auxiliary learning target created in collaboration with a pre-trained expert system; the auxiliary targets indicate the level of exertion under physical load. The MTL framework, for both the convolutional neural network (CNN) and the CNN-long short-term memory (CLSTM) network, performs nearly 1.5% (F1-score) better than the single-task learning (STL) framework. In addition, the out-of-breath condition has more influence on the lower frequency spectrum, so warped spectrograms are given as input to the networks, enabling them to focus on lower spectral regions. The non-linear frequency warping is achieved with the Mel-scale transformation and the constant-Q transform (CQT). CQT, being less dependent on a fixed window size, shows at least a 6.57% (F1-score) improvement over Mel-based inputs. The MTL framework, combined with the warped spectrum, performs better at distinguishing out-of-breath speech from neutral speech. • Aims to detect out-of-breath (physical load) conditions from speech. • Efficiently determines the extent of physical exertion from speech utterances. • A lower-frequency-centric approach for better detection of the out-of-breath condition. • Multi-task learning further improves performance by combining the above two steps. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
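Entry 11 warps the frequency axis with the Mel scale and the constant-Q transform so that the networks see more detail at low frequencies. The sketch below illustrates both warpings, assuming the common HTK-style Mel formula and a standard geometric CQT bin layout; the paper's exact parameters are not given in the abstract, so the values used here (20 Mel bands up to 8 kHz, 12 bins per octave from 32.7 Hz, 16 kHz sample rate) are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Mel scale: roughly linear below about 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Band edges equally spaced on the Mel scale, returned in Hz."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


def cqt_bins(f_min, bins_per_octave, n_bins, sample_rate):
    """Geometric CQT centre frequencies and the window length each one needs.

    With a constant quality factor Q, the window length Q * sr / f_k grows as
    f_k falls, so there is no single fixed window size.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    centres = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    windows = [q * sample_rate / f for f in centres]
    return centres, windows


edges = mel_band_edges(0.0, 8000.0, 20)
widths = [b - a for a, b in zip(edges, edges[1:])]
print(f"narrowest Mel band {widths[0]:.0f} Hz, widest {widths[-1]:.0f} Hz")

centres, windows = cqt_bins(32.7, 12, 84, 16000)
print(f"CQT window {windows[0]:.0f} samples at {centres[0]:.1f} Hz, "
      f"{windows[-1]:.0f} samples at {centres[-1]:.0f} Hz")
```

Both warpings give low frequencies narrow bands and high frequencies wide ones; the CQT additionally adapts its analysis window per bin, which matches the abstract's remark that it is less dependent on a fixed window size.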

12. AI-based prediction of the improvement in air quality induced by emergency measures.

Author: Pari, Pavithra, Abbasi, Tasneem, and Abbasi, S.A.
Subjects: *ARTIFICIAL intelligence, *COVID-19 pandemic, *ARTIFICIAL neural networks, *AIR quality, *WIND speed
Abstract: Several cities in the developing world, such as the Indian capital New Delhi, often experience air quality in which pollutant levels rise far above those considered hazardous to human health. To bring air quality back within permissible limits quickly, the usual measures involve shutting down certain high-polluting activities for some time so that air quality can recover temporarily. This paper presents a first-ever model based on artificial neural networks to forecast the extent of reduction in air quality parameters that can be achieved, and the time period within which a change can be experienced, when the source of emissions is cut off temporarily. The model is based on extensive data on the reduction in air quality parameters that occurred during the lockdown imposed during the COVID-19 pandemic. The chosen model, based on a non-linear autoregressive exogenous (NARX) network, uses the hours since emissions stopped, relative humidity, wind speed, wind direction, and ambient temperature as inputs to predict the rate of change of PM2.5 relative to its concentration at the time emissions stopped. Air quality data from a key monitoring station in New Delhi were used to develop the model. The model predicted the rate of drop in PM2.5 with an MSE of 0.0044 and an R of 0.9736 during training, and an MSE of 0.0095 and an R of 0.9583 during testing. The model was then tested with data from 19 other stations in New Delhi and was found to be highly accurate, with correlations between measured and predicted PM2.5 levels ranging from 0.74 to 0.94 and MSE ranging from 0.0110 to 1.0746. The model can thus be used to determine the number of hours of temporary emission stoppage required for the PM2.5 concentration to reach safe levels. 
The methodology used to develop the model can be extended to build models tailored to other parts of the world. • A novel ANN-based model is reported to predict the rate of drop of PM2.5. • The model has been trained on an extensive dataset from the COVID-19 lockdown in Delhi. • The duration and efficacy of emergency measures to reduce emissions can be estimated. • The model uses easy-to-measure meteorological parameters as inputs. • The model has been validated for 19 stations of various land-use types in Delhi. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. DeepSE: Detecting super-enhancers among typical enhancers using only sequence feature embeddings.

Author: Ji, Qiao-Ying, Gong, Xiu-Jun, Li, Hao-Min, and Du, Pu-Feng
Subjects: *DNA sequencing, *CONVOLUTIONAL neural networks, *ARTIFICIAL neural networks, *NUCLEOTIDE sequence, *CELL lines, *SUPER enhancers
Abstract: A super-enhancer (SE) is a cluster of active typical enhancers (TEs) with high levels of the Mediator complex, master transcription factors, and chromatin regulators. SEs play a key role in controlling cell identity and disease. Traditionally, scientists have used a variety of high-throughput data on different transcription factors or chromatin marks to distinguish SEs from TEs. Such experimental methods are usually costly and time-consuming. In this paper, we propose DeepSE, a model based on a deep convolutional neural network, to distinguish SEs from TEs. DeepSE represents DNA sequences using dna2vec feature embeddings. Using only DNA sequence information, DeepSE outperformed all state-of-the-art methods. In addition, DeepSE generalizes well across different cell lines, which implies that cell-type-specific SEs may share hidden sequence patterns across cell lines. The source code and data are available on GitHub (https://github.com/QiaoyingJi/DeepSE). • A computational model for distinguishing super-enhancers from typical enhancers is proposed. • The model generalizes to different cell lines, even for cell-type-specific SEs. • DNA sequence is informative for identifying genome elements. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

14. An adaptive synthesis to handle imbalanced big data with deep siamese network for electricity theft detection in smart grids.

Author: Javaid, Nadeem, Jan, Naeem, and Javed, Muhammad Umar
Subjects: *ARTIFICIAL neural networks, *CONVOLUTIONAL neural networks, *ELECTRICITY, *THEFT, *ELECTRIC power consumption, *BIG data
Abstract: The bi-directional flow of energy and information in the smart grid makes it possible to record and analyze consumers' electricity consumption profiles. Because of rising inflation over the past few years, people have started looking for ways to use electricity illegally, a practice termed electricity theft. Many data analytics techniques for electricity theft detection (ETD) have been proposed in the literature; they help detect suspected illegal consumers. However, existing approaches have a low ETD rate, due either to improper handling of the class imbalance problem in the dataset or to the selection of an inappropriate classifier. In this paper, a robust big data analytics technique is proposed to resolve these concerns. First, adaptive synthesis (ADASYN) is applied to handle the class imbalance problem in the data. Second, a deep siamese network (DSN) integrating a convolutional neural network (CNN) and long short-term memory (LSTM) is proposed to discriminate between the features of honest and fraudulent consumers. Specifically, the CNN module extracts features from weekly energy consumption profiles, while the LSTM module performs sequence learning. Finally, the DSN uses the shared features provided by the CNN-LSTM to make the final judgment. The data analytics are performed on different train–test ratios of real-time smart meter data. The simulation results validate the proposed model's effectiveness in terms of high area under the curve, F1-score, precision, and recall. • A robust big data analytics technique for electricity theft detection in smart grids is proposed. • We propose adaptive synthesis to handle the class imbalance problem. • A deep siamese network underpinned by integrated convolutional neural network and long short-term memory networks is proposed to discriminate between the features of honest and fraudulent consumers. 
• Simulation results validate the proposed model in terms of high area under the curve, low false positive rate and low false negative rate. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
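The ADASYN oversampling step named in the electricity-theft abstract above can be illustrated with a small NumPy sketch of the general ADASYN idea. This is an assumption-laden illustration, not the authors' implementation: the function name `adasyn` and the parameters `k` (neighbourhood size), `beta` (target balance level) and `seed` are hypothetical choices, and the paper's actual settings are not given in the abstract.

```python
import numpy as np


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Minimal ADASYN-style oversampling sketch (hypothetical, not the authors' code).

    Minority points whose k nearest neighbours are mostly majority points
    ("hard" points) receive proportionally more synthetic samples.
    """
    rng = np.random.default_rng(seed)
    idx_min = np.flatnonzero(y == minority)
    X_min = X[idx_min]
    n_min, n_maj = len(idx_min), len(X) - len(idx_min)
    G = int(round((n_maj - n_min) * beta))  # total synthetic samples wanted
    if G <= 0 or n_min < 2:
        return X, y

    # Difficulty ratio r_i: fraction of majority points among the k nearest
    # neighbours of each minority point, searched over the whole dataset.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(n_min), idx_min] = np.inf  # exclude the point itself
    nn_all = np.argsort(d_all, axis=1)[:, :k]
    r = (y[nn_all] != minority).sum(axis=1) / k
    if r.sum() == 0:  # no majority neighbours at all: fall back to uniform
        r = np.ones(n_min)
    g = np.rint(r / r.sum() * G).astype(int)  # samples per minority point

    # Interpolate only towards neighbours from the minority class.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(k, n_min - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]

    synthetic = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            j = nn_min[i, rng.integers(k_min)]
            lam = rng.random()  # uniform in [0, 1)
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    if not synthetic:
        return X, y
    X_syn = np.asarray(synthetic)
    X_new = np.vstack([X, X_syn])
    y_new = np.concatenate([y, np.full(len(X_syn), minority, dtype=y.dtype)])
    return X_new, y_new
```

Minority points with more majority neighbours receive more synthetic samples, which is what distinguishes ADASYN from uniform SMOTE-style oversampling. For real use, the imbalanced-learn package provides an `ADASYN` class in `imblearn.over_sampling`.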

15. Speech enhancement approach for body-conducted unvoiced speech based on Taylor–Boltzmann machines trained DNN.

Author: Karthikeyan, C., Rajesh Kumar, T., Vijendra Babu, D., Baskar, M., Jayaraman, Ramkumar, and Shahid, Mohammad
Subjects: *AUTOMATIC speech recognition, *TAYLOR'S series, *BOLTZMANN machine, *ARTIFICIAL neural networks, *MACHINE learning, *DEEP learning, *CLASSIFIERS (Linguistics), *MICROPHONES
Abstract: • A novel voice conversion technique for body-conducted unvoiced speech is proposed based on Taylor-Boltzmann Machines. The proposed algorithm integrates the Taylor series with Boltzmann machine Deep Neural Networks. • A body-conductive microphone, referred to as the non-audible murmur (NAM) microphone, is used to capture the non-audible murmur from the neck behind the ear. • The classifier was trained with features like "pitch chroma, spectral centroid, and spectral skewness" that are derived from the incoming body-conducted low-voice speech signal. • In the initial training phase, a Taylor series approximation DNN model is trained from a collection of stereo data that encompasses NAM speech of the same phonemes, represented as log-power spectral features. • Moreover, in the recognition phase, a well-trained Boltzmann machine-based Taylor-DNN model takes in the speech features and generates "log-power-spectra features", which aid in recognizing the speech waveform. • The paper may be of high interest to readers and experts in speech processing, voice conversion techniques, and body-conducted unvoiced speech. Communication becomes effective when the speech signal reaches its destination with its characteristics intact. Applications based on the human voice attract researchers' attention, and this has led researchers toward speech conversion for people who speak in low, murmured voices. The researchers introduced a technique for improving body-conducted unvoiced speech for silent speech communication using a Taylor-Boltzmann machine-trained optimized Deep Neural Network (DNN). The proposed model includes two main stages: the training stage and the recognition stage. In the training stage, the following functionalities are performed: (i) pre-processing, (ii) feature extraction, and (iii) deep learning-based speech enhancement. 
Initially, the collected input Non-Audible Murmur (NAM) speech signal is subjected to the pre-processing stage, wherein improved Wiener filtering is employed to suppress the noise in the input signal. From the pre-processed speech signal, the most relevant characteristics, such as "spectral skewness, spectral chroma, spectral centroid and improved Mel Frequency Cepstral Coefficients (MFCC)", are extracted and fused together. Then, using these fused features, the Taylor series with optimized DNN and Boltzmann Machine (Taylor-DNN-BM) model is trained. The final enhanced speech is acquired from the Taylor-optimized DNN-BM. The recognition phase encapsulates the feature extraction and deep learning-based speech enhancement stages. The input speech data enters the feature extraction phase, wherein characteristics such as "spectral skewness, spectral chroma, spectral centroid and improved MFCC" are retrieved. These features are then fused and fed as input to the Taylor-optimized DNN-BM model, from which the enhanced speech signal is acquired. In the DNN, the weights are fine-tuned using the newly proposed Self-Improved Flower Pollination Algorithm (SI-FPA), a conceptual improvement of the Flower Pollination Algorithm (FPA). Finally, the proposed model is compared with existing models to validate its efficiency in terms of speech enhancement. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Trends and developments in automatic speech recognition research.

Author: O'Shaughnessy, Douglas
Subjects: *AUTOMATIC speech recognition, *MACHINE learning, *IMAGE recognition (Computer vision), *ORAL communication, *DEEP learning, *ARTIFICIAL neural networks
Abstract: This paper discusses how automatic speech recognition systems are and could be designed in order to best exploit the discriminative information encoded in human speech. This contrasts with many recent machine learning approaches that apply general recognition architectures to the signals to be identified, with little concern for the nature of the input. The implicit assumption has often been that training can automatically discover the useful properties that exist in signals, with minimal manual intervention. These approaches may be suitable for some tasks such as image recognition, where the diversity of visual input is vast; e.g., an image may be any (natural or synthetic) scene that a camera views. We first examine what makes speech special, i.e., a natural signal from a complex tube, driven by a source that is quasi-periodic and/or noisy, aiming to communicate a wide variety of information, using the different vocal systems of human speakers. Then, we examine how pertinent features are extracted from speech via efficient means, related to the objectives of communication. We see how to reliably and efficiently identify the different units of oral language. We learn from the history of attempts to do ASR, e.g., why they succeeded and how improved methods exploited the increasing availability of data and computer power (in particular, deep neural networks). Finally, we suggest ways to render ASR both more accurate and efficient. This work is aimed at both newcomers to ASR and experts, presenting issues broadly but without mathematical or algorithmic details, which are readily found in the references. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Multi-task learning neural framework for categorizing sexism.

Author: Abburi, Harika, Parikh, Pulkit, Chhaya, Niyati, and Varma, Vasudeva
Subjects: *SEXISM, *ARTIFICIAL neural networks, *COMPUTER multitasking, *MACHINE learning, *GENDER studies
Abstract: Sexism, a form of oppression based on one's sex, manifests itself in numerous ways and causes enormous suffering. In view of the growing number of experiences of sexism reported online, automatically classifying these recollections can aid in the battle against sexism by allowing gender studies researchers and government officials involved in policymaking to conduct more effective analyses. This paper investigates the 23-class fine-grained, multi-label classification of accounts (reports) of sexism. Moreover, we propose a knowledge-based cascaded multi-task framework for fine-grained multi-label sexism classification. We leverage several supporting tasks, including homogeneous and heterogeneous auxiliary tasks. Homogeneous tasks are set up without incurring any manual labeling cost, while heterogeneous tasks are chosen for their high correlation with the accounts of sexism. Unlabeled accounts of sexism are utilized through unsupervised learning to help construct our multi-task setup. In addition, we incorporate a knowledge module within the framework to infuse external knowledge features into the learning process. Further, we investigate transfer learning that employs weakly labeled accounts of sexism and transfers the learning to the multi-label sexism classification. We also devise objective functions that exploit label correlations in the training data explicitly. Multiple proposed multi-task methods outperform the state of the art in multi-label sexism classification across five standard metrics. • Propose a knowledge-based cascaded multi-task neural framework for multi-label sexism classification. • Explore two types of auxiliary tasks: (a) homogeneous tasks and (b) heterogeneous tasks. • Train the model with weakly labeled data and transfer the knowledge to the sexism classification task. • Propose loss functions aimed at tapping label correlations in the multi-label data explicitly. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. Rates of approximation by ReLU shallow neural networks.

Author: Mao, Tong and Zhou, Ding-Xuan
Subjects: *ARTIFICIAL neural networks, *MACHINE learning, *DEEP learning
Abstract: Neural networks activated by the rectified linear unit (ReLU) play a central role in the recent development of deep learning. The topic of approximating functions from Hölder spaces by these networks is crucial for understanding the efficiency of the induced learning algorithms. Although the topic has been well investigated in the setting of deep neural networks with many layers of hidden neurons, it is still open for shallow networks having only one hidden layer. In this paper, we provide rates of uniform approximation by these networks. We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the Hölder space $W_\infty^r([-1,1]^d)$ with rates $O\big((\log m)^{\frac{1}{2}+d}\, m^{-\frac{r}{d}\cdot\frac{d+2}{d+4}}\big)$ when $r < d/2 + 2$. Such rates are very close to the optimal one, $O(m^{-\frac{r}{d}})$, in the sense that $\frac{d+2}{d+4}$ is close to 1 when the dimension $d$ is large. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
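To see how close the shallow-network rate in the ReLU abstract above comes to the optimal rate, one can compare the exponents of $m$ directly, ignoring the logarithmic factor. The short Python sketch below does only that arithmetic; the function names are hypothetical, and $r = 1$ is an arbitrary illustrative smoothness value.

```python
def shallow_relu_exponent(r, d):
    """Exponent of m in O((log m)^(1/2 + d) * m^(-(r/d)(d+2)/(d+4))),
    the stated rate for r < d/2 + 2 (log factor ignored)."""
    if not r < d / 2 + 2:
        raise ValueError("the stated rate assumes r < d/2 + 2")
    return (r / d) * (d + 2) / (d + 4)


def optimal_exponent(r, d):
    """Exponent of m in the optimal rate O(m^(-r/d))."""
    return r / d


# Fraction of the optimal exponent achieved by the shallow-network rate.
for d in (2, 10, 100):
    ratio = shallow_relu_exponent(1.0, d) / optimal_exponent(1.0, d)
    print(d, round(ratio, 4))
# prints:
# 2 0.6667
# 10 0.8571
# 100 0.9808
```

The ratio equals $\frac{d+2}{d+4}$ regardless of $r$, so it approaches 1 as the dimension grows, which is the sense in which the abstract calls the rate near-optimal.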

19. Approximating smooth and sparse functions by deep neural networks: Optimal approximation rates and saturation.

Author: Liu, Xia
Subjects: *ARTIFICIAL neural networks, *SMOOTHNESS of functions, *APPROXIMATION theory
Abstract: Constructing neural networks for function approximation is a classical and longstanding topic in approximation theory. In this paper, we aim at constructing deep neural networks with three hidden layers using a sigmoidal activation function to approximate smooth and sparse functions. Specifically, we prove that the constructed deep nets with controllable magnitude of free parameters can reach the optimal approximation rate in approximating both smooth and sparse functions. In particular, we prove that neural networks with three hidden layers can avoid the phenomenon of saturation, i.e., the phenomenon that for some neural network architectures, the approximation rate stops improving for functions of very high smoothness. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

20. EEG-dependent automatic speech recognition using deep residual encoder based VGG net CNN.

Author: Chinta, Babu and M, Moorthi.
Subjects: *AUTOMATIC speech recognition, *SPEECH disorders, *ARTIFICIAL neural networks, *ELECTROENCEPHALOGRAPHY, *DEEP learning, *DISCRETE wavelet transforms, *EIGENVECTORS
Abstract: Speech difficulties are common in children and teenagers, but they can also occur in adults as a result of physical problems. A speech disorder is a condition in which an individual struggles to produce or form the spoken sounds necessary for interpersonal communication. As a result, the person's speech can be hard to comprehend. Articulation abnormalities are typical speech problems. In this situation, automatic speech recognition (ASR) technology may be used to detect and further rectify such deficiencies. The first attempts to detect speech abnormalities were made in the early 1970s, and they appear to have followed the same path as those on ASR. These early experiments relied heavily on signal processing techniques. Over time, more ideas from ASR technology have been incorporated into systems that deal with speech impairments. Many traditional techniques have been used in ASR systems. In this paper, we develop an automatic speech recognition technology based on deep learning techniques, and we investigate alternative electroencephalography (EEG) feature extraction and classification methods to help diagnose speech disorders (SD). The EEG data is prepared and then decomposed into multiple EEG sub-bands with a discrete wavelet transform to eliminate unimportant errors. For sharpening signals, the Eigenvector crack curvature wavelet method was used. A hyper-similarity abnormality coder is used for feature extraction in the EEG recording and to detect synchronization between EEG channels, which may show abnormalities in communication. The extracted features are then classified using the Deep Residual-encoder-based VGG net CNN classification method. The suggested technique attained better classification accuracy than the traditional methodologies. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
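The EEG abstract above decomposes signals into sub-bands with a discrete wavelet transform but does not name the wavelet or the number of levels. As a sketch under those assumptions, a multi-level orthonormal Haar DWT in NumPy looks like this; the function name `haar_wavedec`, the Haar wavelet and `levels=3` are hypothetical choices, not the authors' configuration.

```python
import numpy as np


def haar_wavedec(x, levels=3):
    """Multi-level orthonormal Haar DWT (illustrative sketch).

    Returns [cA_L, cD_L, ..., cD_1]: the coarsest approximation followed by
    detail (sub-band) coefficients from coarse to fine.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if len(a) % 2:  # pad odd lengths by repeating the last sample
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # high-pass half band
        a = (even + odd) / np.sqrt(2)              # low-pass half band
    return [a] + details[::-1]


# Example: split a toy 8-sample "EEG" segment into 3 levels of sub-bands.
coeffs = haar_wavedec(np.arange(8.0), levels=3)
print([len(c) for c in coeffs])  # [1, 1, 2, 4]
```

Each level halves the approximation band, so an 8-sample input yields sub-bands of 1, 1, 2 and 4 coefficients. Libraries such as PyWavelets (`pywt.wavedec`) offer many more wavelet families; their coefficient sign conventions may differ from this sketch.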

21. Adsorption kinetics of ciprofloxacin and ofloxacin by green-modified carbon nanotubes.

Author: Oliveira, Mariana G., Spaolonzi, Marcela P., Duarte, Emanuele D.V., Costa, Heloisa P.S., da Silva, Meuris G.C., and Vieira, Melissa G.A.
Subjects: *CARBON nanotubes, *ADSORPTION kinetics, *ARTIFICIAL neural networks, *CIPROFLOXACIN, *ASSERTIVENESS (Psychology)
Abstract: This paper investigated the uptake of ciprofloxacin (CIP) and ofloxacin (OFL) in single and multicomponent adsorptive systems using modified carbon nanotubes (CNTs) as adsorbent material. The characterization analyses of the pre- and post-process material by XPS, TG/DTG, FT-IR, SEM/EDS, and XRD helped elucidate the mechanisms, indicating greater involvement of n-π and π-π interactions. In the kinetic studies, the single systems with CIP and OFL were similar: both reached equilibrium in around 20–30 min and showed increased adsorptive capacity with increasing initial drug concentration. In the multicomponent system, different fractions of CIP and OFL were tested, and the time to reach equilibrium also varied between 20 and 30 min. In general, the adsorption capacity of CIP is slightly lower than that of OFL under the conditions tested. The selectivity analysis of the system showed that the selectivities of the two drugs are identical in equimolar fractions. The mathematical modeling of the kinetic data indicated that in monocomponent systems, the pseudo-second order (PSO) model adequately described both CIP and OFL kinetics. Furthermore, with the implementation of Artificial Neural Networks (ANN), it was possible to obtain a more accurate prediction of the behavior of single and binary systems. • Investigation of single and competitive adsorption of ciprofloxacin and ofloxacin. • Carbon nanotubes were functionalized with iron nanoparticles via the green route. • Ofloxacin showed a greater adsorption preference than ciprofloxacin. • The main adsorption mechanisms were hydrogen bonds, π-π, and n-π interactions. • Artificial neural network successfully predicted single and binary kinetic data. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

22. An optimal approach for text feature selection.

Author: El-Hajj, Wassim and Hajj, Hazem
Subjects: *FEATURE selection, *ARTIFICIAL neural networks, *CLASSIFICATION algorithms, *TEXT mining, *DATA mining
Abstract: • In this paper, we work on text categorization and propose a statistical-based feature selection method (MFX) that considers all documents from the same category as one extended document and chooses the most discriminative terms that are frequent and common across all documents of the same category but rarely present in other categories. MFX is language independent and backed up with a mathematical formulation that finds the optimal number of features that guarantees accurate text categorization. Experimental results show the superiority of MFX over existing state-of-the-art techniques. This work is significant and timely given its applicability to tasks such as spam filtering, opinion mining and topic spotting, among others. Traditionally, feature selection is conducted by first deriving a candidate list of features, then ranking and selecting the top features based on a predefined threshold. These methods are highly dependent on the choice of the threshold and therefore lead to sub-optimal text categorization results. In this paper, we address the selection problem by proposing a one-step method designed to optimally select the subset of features. The selection is formulated mathematically as an optimization problem with the objective of maximizing classification accuracy while simultaneously deriving and choosing the most discriminative features. Our method, MFX, is applicable to many of the conventional methods, with two distinguishing aspects. First, it considers all documents from the same category as one extended document, instead of analyzing individual documents. Second, it chooses the most discriminative terms that are frequent and common across all documents of the same category and minimally present in other categories. Moreover, MFX is language-independent. It was tested on the well-known benchmark Reuters RCV1 dataset. 
To showcase its language independence, MFX was also tested on Arabic datasets extracted from Arabic news sources. The results indicated that MFX always performed similarly to or better than other well-known feature selection methods. MFX with a Support Vector Machine (SVM) classifier was also shown to outperform recent text classification algorithms based on neural networks and word embeddings. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

23. Quantifying the groundwater total contamination risk using an inclusive multi-level modelling strategy.

Author: Gharekhani, Maryam, Nadiri, Ata Allah, Khatibi, Rahman, Nikoo, Mohammad Reza, Barzegar, Rahim, Sadeghfam, Sina, and Moghaddam, Asghar Asghari
Subjects: *MULTILEVEL models, *ARTIFICIAL neural networks, *SUPPORT vector machines, *WATER pollution potential, *GROUNDWATER, *POLLUTANTS, *ARSENIC removal (Water purification)
Abstract: This paper investigates aggregated risks in aquifers, where risk exposures may originate from different contaminants, e.g. nitrate-N (NO3–N), arsenic (As), boron (B), fluoride (F), and aluminium (Al). The main goal is to develop a new concept for the total risk problem under sparse data as an efficient planning tool for management through the following methodology: (i) mapping aquifer vulnerability by the DRASTIC and SPECTR frameworks; (ii) mapping risk indices to anthropogenic and geogenic contaminants by unsupervised methods; (iii) improving the anthropogenic and geogenic risks by a multi-level modelling strategy at three levels: Level 1 includes Artificial Neural Networks (ANN) and Support Vector Machines (SVM) models, Level 2 combines the outputs of Level 1 by unsupervised Entropy Model Averaging (EMA), and Level 3 integrates the risk maps of various contaminants (nitrate-N, arsenic, boron, fluoride, and aluminium) modelled at Level 2. The methodology offers new data layers to transform vulnerability indices into risk indices and thereby integrates risks by a heuristic scheme but without any learning, as no measured values are available for the integrated risk. The results reveal that the risk indexing methodology is fit-for-purpose. According to the integrated risk map, there are hotspots in the study area that are exposed to a number of contaminants (nitrate-N, arsenic, boron, fluoride, and aluminium). • Risk Index (RI) is the product of Passive and Active Vulnerability Indices (PVI and AVI). • RIs are aggregated from anthropogenic and geogenic sources for 5 contaminants. • By a Multiple-level Models strategy, risks are mapped at 3 levels, L1, L2, L3. • L1 uses ANN + SVM; L2 uses Entropy Model Averaging; L3 uses a scoring system. • ROC/AUC plots show defensible RIs for NO3–N, As, B, F, and Al and their aggregation. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

24. Multiple time-instances features based approach for reference-free speech quality measurement.

Author: Jaiswal, Rahul Kumar and Dubey, Rajesh Kumar
Subjects: *ARTIFICIAL neural networks, *AUTOMATIC speech recognition, *SPEECH processing systems, *ACOUSTIC signal processing, *ACOUSTIC signal detection, *FEATURE extraction, *GAUSSIAN mixture models
Abstract: This paper investigates the problem of measuring the speech quality of a received speech signal without employing the original speech signal. Speech quality deteriorates due to noise present in the surroundings. To this end, we propose a multiple time-instances (MTI) features-based approach for a reference-free speech quality measurement model. A voice activity detector (VAD) is primarily used to calculate the number of active speech chunks of a speech signal. For these chunks and their successive combinations, here called batches, multi-resolution auditory model (MRAM), mel-frequency cepstral coefficient (MFCC) and line spectral frequency (LSF) features are extracted and called MTI features. It is hypothesized that the MTI features are capable of capturing the distortions caused by time-localized effects of short-time transients, impulsive noise, and its differences from plosive sounds. The MTI metric estimates (MTI-ME) are calculated from these MTI features using the Gaussian mixture model (GMM) probabilistic technique. The overall objective speech quality of a speech signal is then determined as a linear combination of optimally weighted MTI-ME corresponding to the distinct active speech chunks and their successive combinations, that is, the batches of that speech signal. Either a minimum mean square error criterion or a Pearson's correlation maximization criterion is employed to compute the optimal weights. In addition, a deep neural network (DNN)-based speech quality model is also developed for calculating a single objective speech quality while considering all active speech chunks together. Pearson's correlation coefficient and weighted average correlation are used to evaluate the performance. Results demonstrate that the proposed model achieves a promising improvement over the standard speech quality model (P.563), improving correlation values by around 37%. 
• Addressing speech quality measuring problem by developing a reference-free speech quality model. • Developing a feature extraction technique incorporating distinct auditory features. • Developing a joint GMM training algorithm for computing objective speech quality. • Developing a deep neural network (DNN) framework for validating the proposed algorithm. • Demonstrating better performance as compared to the standard speech quality model (P.563). [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
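The speech-quality abstract above combines per-chunk and per-batch metric estimates linearly, with weights chosen by a minimum mean square error criterion, and evaluates the result with Pearson's correlation. A minimal least-squares sketch of that weighting step follows; it assumes a fixed number of units per signal and uses synthetic data, and the function names and toy weights are hypothetical rather than taken from the paper.

```python
import numpy as np


def fit_mmse_weights(E, mos):
    """Least-squares (minimum mean square error) weights w so that E @ w ~ mos.

    E   : (n_signals, n_units) metric estimates, one column per chunk/batch
          position (assumes every signal has the same number of units).
    mos : (n_signals,) subjective quality scores used as the target.
    """
    w, *_ = np.linalg.lstsq(E, mos, rcond=None)
    return w


def pearson(a, b):
    """Pearson's correlation coefficient between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])


# Toy data: 50 signals, 4 chunk/batch estimates each (synthetic, illustrative).
rng = np.random.default_rng(0)
E = rng.uniform(1.0, 5.0, size=(50, 4))
mos = E @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0.0, 0.05, size=50)
w = fit_mmse_weights(E, mos)
print(np.round(w, 2), round(pearson(E @ w, mos), 3))
```

Real signals contain varying numbers of active chunks; this sketch sidesteps that by assuming a fixed column count.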

25. Automatic offline annotation of turn-taking transitions in task-oriented dialogue.

Author: Brusco, Pablo and Gravano, Agustín
Subjects: *AUTOMATICITY (Learning process), *SPEECH processing systems, *ARTIFICIAL neural networks, *RANDOM forest algorithms, *LANGUAGE & languages
Abstract: As the volume of recorded conversations continues to surge, so does the need for their automatic processing. Plenty of information beyond words may be extracted from the speech signal that could be valuable in domains such as call-center quality assurance. In particular, describing the dynamics of turn-taking exchanges allows for a deeper understanding of the development and outcome of a dialogue. In this paper, we investigate the construction of an automatic turn-taking annotation tool based on recordings of entire conversations (in offline mode), an unexplored topic to our knowledge. We experiment with two supervised learning approaches, using recurrent neural networks and random forests, on a corpus of Argentine Spanish task-oriented dialogues annotated with 12 turn-taking categories following standard guidelines. Our models achieve promising results, with F1 scores ranging from 0.7 to 0.9 for the most frequent labels (e.g., smooth switches, backchannels), but much lower scores for the least frequent ones (various kinds of interruptions), for which further research is needed. We also evaluate our best-performing models considering their generalizability in scenarios of growing difficulty, including dialogues in two different languages (English and Slovak). Finally, to address the typical data scarcity issue, we analyze the impact of combining training data from different corpora, again including cross-linguistic data. • Machine learning (ML) models for offline annotation of turn-taking exchanges in entire dialogues. • Full taxonomy of turn-taking transitions, including smooth switches, interruptions, and backchannels. • Comparison of two ML approaches: random forests and a recurrent neural model (with a novel loss function strategy). • Generalizability assessment of our best models in scenarios of growing difficulty in Spanish. • Cross-linguistic evaluation in English and Slovak dialogues, as a way of dealing with labeled data scarcity. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

26. Language modelling for speaker diarization in telephonic interviews.

Author: India, Miquel, Hernando, Javier, and Fonollosa, José A.R.
Subjects: *LINGUISTICS, *ARTIFICIAL neural networks, *ACOUSTIC models, *LANGUAGE & languages, *TELEPHONE interviewing
Abstract: The aim of this paper is to investigate the benefit of combining language and acoustic modelling for speaker diarization. Although conventional systems only use acoustic features, in some scenarios linguistic data contain highly discriminative speaker information, even more reliable than the acoustic features. In this study we analyze how an appropriate fusion of both kinds of features is able to obtain good results in these cases. The proposed system is based on an iterative algorithm in which an LSTM network is used as a speaker classifier. The network is fed with character-level word embeddings and a GMM-based acoustic score created with the output labels from previous iterations. The presented algorithm has been evaluated on a Call-Center database composed of telephone interview audio. The combination of acoustic features and linguistic content shows an 84.29% improvement in terms of word-level DER compared to an HMM/VB baseline system. The results of this study confirm that linguistic content can be used efficiently for some speaker recognition tasks. • Linguistic content can be efficiently combined with acoustic features for the speaker diarization task. • Given a specific scenario, speaker diarization can be solved with only linguistic content. • Acoustic features lead to better diarization performance than linguistic content in large speaker segments. • Linguistic content is more discriminative than acoustic features for identifying speakers in short speech segments. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

27. A speaker verification backend with robust performance across conditions.

Author: Ferrer, Luciana, McLaren, Mitchell, and Brümmer, Niko
Subjects: *AUTOMATIC speech recognition, *FISHER discriminant analysis, *EMBEDDINGS (Mathematics), *ARTIFICIAL neural networks, *DEEP learning, *CALIBRATION, *LINEAR systems
Abstract: In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing them through a backend composed of probabilistic linear discriminant analysis (PLDA) and global logistic regression score calibration. This method is known to result in systems that work poorly on conditions different from those used to train the calibration model. We propose to modify the standard backend, introducing an adaptive calibrator that uses duration and other automatically extracted side-information to adapt to the conditions of the inputs. The backend is trained discriminatively to optimize binary cross-entropy. When trained on a number of diverse datasets that are labeled only with respect to speaker, the proposed backend consistently and, in some cases, dramatically improves calibration, compared to the standard PLDA approach, on a number of held-out datasets, some of which are markedly different from the training data. Discrimination performance is also consistently improved. We show that joint training of the PLDA and the adaptive calibrator is essential — the same benefits cannot be achieved when freezing PLDA and fine-tuning the calibrator. To our knowledge, the results in this paper are the first evidence in the literature that it is possible to develop a speaker verification system with robust out-of-the-box performance on a large variety of conditions. • A standard backend for speaker verification is based on PLDA followed by global calibration. • We propose a novel backend that consists of a PLDA-like formulation followed by condition-aware calibration. • The backend is trained discriminatively to optimize binary cross-entropy on a large variety of conditions. • We show that this backend results in significant gains over the standard PLDA approach. 
• The proposed approach results in a system that can be used out-of-the-box on data from different conditions. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

28. Multi-level context features extraction for named entity recognition.

Author: Chang, Jun and Han, Xiaohong
Subjects: *SHORT-term memory, *ARTIFICIAL neural networks, *FEATURE extraction, *ARTIFICIAL intelligence, *PATTERN recognition systems
Abstract: Bidirectional long short-term memory (Bi-LSTM), one of the effective networks for sequence labeling tasks, is widely used in named entity recognition (NER). However, the sequential nature of Bi-LSTM and its inability to process multiple sentences at the same time make it impossible to obtain global information. In this paper, to make up for the shortcomings of Bi-LSTM in extracting global information, we propose a hierarchical context model embedded with sentence-level and document-level feature extraction. In sentence-level feature extraction, we use the self-attention mechanism to extract sentence-level representations, considering the different contribution of each word to the sentence. For document-level feature extraction, a 3D convolutional neural network (CNN), which can extract features within sentences and also attends to the sequential relationship between sentences, is used to extract document-level representations. Furthermore, we investigate a layer-by-layer residual (LBL Residual) structure to optimize each Bi-LSTM block of our model, which can solve the degradation problem that appears as the number of model layers increases. Experiments show that our model achieves results competitive with the state-of-the-art records on the CoNLL-2003 and OntoNotes 5.0 English datasets. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

29. The Nature-Based Solutions Case-Based System: A hybrid expert system.

Author: Sarabi, Shahryar, Han, Qi, de Vries, Bauke, Romme, A. Georges L., and Almassy, Dora
Subjects: *EXPERT systems, *ARTIFICIAL neural networks, *CASE-based reasoning, *ARTIFICIAL intelligence, *INFORMATION architecture
Abstract: Deriving knowledge and learning from past experiences is essential for the successful adoption of Nature-Based Solutions (NBS) as novel integrative solutions that involve many uncertainties. Past experiences in implementing NBS have been collected in a number of repositories; however, it is a major challenge to derive knowledge from the huge amount of information provided by these repositories. This calls for information systems that can facilitate the knowledge extraction process. This paper introduces the NBS Case-Based System (NBS-CBS), an expert system that uses a hybrid architecture to derive information and recommendations from an NBS experience repository. The NBS-CBS combines a 'black-box' artificial neural networks model with a 'white-box' case-based reasoning model to deliver an intelligent, adaptive, and explainable system. Experts have tested this system to assess its functionality and accuracy. Accordingly, the NBS-CBS appears to provide inspirational recommendations and information for the NBS planning and design process. • Learning from experience is essential for enhancing the uptake of NBS. • A case-based system (NBS-CBS) is developed and tested in a Dutch city. • NBS-CBS integrates an artificial neural network with a case-based reasoning model. • NBS-CBS supports the extraction of knowledge from an NBS repository. • NBS-CBS recommends NBS measures and accordingly finds relevant past experiences. [ABSTRACT FROM AUTHOR]
Published: 2022

30. PLDA inspired Siamese networks for speaker verification.

Author: Ramoji, Shreyas, Krishnan, Prashant, and Ganapathy, Sriram
Subjects: *DEEP learning, *AUTOMATIC speech recognition, *DISCRIMINANT analysis, *FEATURE extraction, *TOPOLOGICAL embeddings, *ARTIFICIAL neural networks
Abstract: The deep learning methodologies in state-of-the-art speaker recognition systems are predominantly limited to the extraction of recording-level embeddings. This is usually followed by generative modeling of the embeddings to output the verification score. In this paper, we explore a fully neural approach where the neural model outputs the verification score directly, given the acoustic feature inputs. This model, termed the Siamese neural network (SiamNN), combines the embedding extraction and back-end modeling into a single processing pipeline. The back-end modeling is achieved using a neural approach to PLDA modeling, called neural probabilistic linear discriminant analysis (NPLDA). In the NPLDA model, the verification score is computed as a discriminative similarity function. The development of the single neural SiamNN model allows the joint optimization of all the modules using a verification cost. Several speaker recognition experiments are performed using SITW, VOiCES, and NIST SRE datasets where the proposed SiamNN model is shown to significantly improve over the state-of-the-art x-vector PLDA baseline system (relative improvements of up to 35% in the primary cost metric). We also provide a detailed analysis of the influence of hyper-parameters, choice of loss functions, and data sampling strategies for training the model. In particular, we highlight that the proposed soft detection cost function based optimization improves over other loss functions considered. [ABSTRACT FROM AUTHOR]
Published: 2022

31. Adversarial attack and defense strategies for deep speaker recognition systems.

Author: Jati, Arindam, Hsu, Chin-Cheng, Pal, Monisankha, Peri, Raghuveer, AbdAlmageed, Wael, and Narayanan, Shrikanth
Subjects: *AUTOMATIC speech recognition, *DEFENSIVENESS (Psychology), *SIGNAL-to-noise ratio, *SMART speakers, *ARTIFICIAL neural networks
Abstract: • Expository study on adversarial attacks and possible countermeasures for deep speaker recognition systems. • White-box attacks: FGSM, PGD, Carlini and Wagner. • Defensive countermeasures: Adversarial training, adversarial Lipschitz regularization. • Several ablation studies, e.g., varying strength of the attack, measuring signal-to-noise ratio and perceptibility, effect of noise augmentation, transferability analysis. • Strongest attacks: PGD, Carlini & Wagner; most imperceptible adversarial samples: Carlini & Wagner; best defense: PGD-based adversarial training. Robust speaker recognition, including in the presence of malicious attacks, is becoming increasingly important and essential, especially due to the proliferation of smart speakers and personal agents that interact with an individual's voice commands to perform diverse and even sensitive tasks. Adversarial attack is a recently revived domain which has been shown to be effective in breaking deep neural network-based classifiers, specifically, by forcing them to change their posterior distribution by only perturbing the input samples by a very small amount. Although significant progress in this realm has been made in the computer vision domain, advances within speaker recognition are still limited. We present an expository paper that considers several adversarial attacks on a deep speaker recognition system, employs strong defense methods as countermeasures, and reports a comprehensive set of ablation studies to better understand the problem. The experiments show that the speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to as low as 0%. The study also compares the performances of the employed defense methods in detail, and finds adversarial training based on Projected Gradient Descent (PGD) to be the best defense method in our setting. 
We hope that the experiments presented in this paper provide baselines that can be useful for the research community interested in further studying adversarial robustness of speaker recognition systems. [ABSTRACT FROM AUTHOR]
Published: 2021

32. Deep convolutional neural networks for construction and demolition waste classification: VGGNet structures, cyclical learning rate, and knowledge transfer.

Author: Lin, Kunsen, Zhou, Tao, Gao, Xiaofeng, Li, Zongshen, Duan, Huabo, Wu, Huanyu, Lu, Guanyou, and Zhao, Youcai
Subjects: *CONSTRUCTION & demolition debris, *KNOWLEDGE transfer, *ARTIFICIAL neural networks, *CONVOLUTIONAL neural networks, *WASTE treatment, *VIRUSES, *WASTE management, *DATA augmentation
Abstract: The sorting of Construction and Demolition (C&D) waste is a critical step to linking the recycling system and to the macro prediction, which helps to promote the development of the circular economy. Moreover, the effective classification and automated separation process will also help to stop the spread of pathogenic organisms, such as viruses and bacteria, by minimizing human intervention in the sorting process, while also helping to prevent further contamination by the COVID-19 virus. This study aims to develop an efficient method to sort C&D waste through deep learning combined with a knowledge transfer approach. In this paper, CVGGNet models, that is, four VGG structures (VGGNet-11, VGGNet-13, VGGNet-16, and VGGNet-19) based on knowledge transfer combined with data augmentation and a cyclical learning rate, are proposed to classify ten types of C&D waste images. Results show that 2.5 × 10⁻⁴, 1.8 × 10⁻⁴, 0.8 × 10⁻⁴, and 1.0 × 10⁻⁴ are the optimum learning rates for CVGGNet-11, CVGGNet-13, CVGGNet-16, and CVGGNet-19, respectively. Knowledge transfer helped shorten the training time from 1039.45 s to 991.05 s, and it also improved the performance of the CVGGNet-11 model on the training, validation, and test datasets. The average training time increases as the number of layers in the CVGGNet architecture rises: CVGGNet-11 (991.05 s) < CVGGNet-13 (1025.76 s) < CVGGNet-16 (1090.48 s) < CVGGNet-19 (1337.81 s). Compared to the other CVGGNet models, CVGGNet-16 showed excellent performance across the C&D waste types, in terms of accuracy (76.6%), weighted average precision (76.8%), weighted average recall (76.6%), weighted average F1-score (76.6%), and micro average ROC (87.0%). In addition, the t-distributed Stochastic Neighbor Embedding (t-SNE) approach can reduce the dataset to a lower dimension and distinctly separate each type of C&D waste. 
This study demonstrates the good performance of CVGGNet models that can be used to automatically sort most of the C&D waste, paving the way for better C&D waste management. • A dataset for construction and demolition waste classification has been collected. • Transfer learning was applied to sort construction and demolition waste. • Cyclical learning rate was adopted to quickly tune hyperparameters. [ABSTRACT FROM AUTHOR]
Published: 2022

33. Signal-aware direction-of-arrival estimation using attention mechanisms.

Author: Mack, Wolfgang, Wechsler, Julian, and Habets, Emanuël A.P.
Subjects: *DIRECTION of arrival estimation, *ACOUSTIC measurements, *MULTICHANNEL communication, *SPEECH enhancement, *AUTOMATIC tracking, *SOUND reverberation, *ARTIFICIAL neural networks
Abstract: The direction-of-arrival (DOA) of sound sources is an essential acoustic parameter used, e.g., for multi-channel speech enhancement or source tracking. Complex acoustic scenarios consisting of sources-of-interest, interfering sources, reverberation, and noise make the estimation of the DOAs corresponding to the sources-of-interest a challenging task. Recently proposed attention mechanisms allow DOA estimators to focus on the sources-of-interest and disregard interference and noise, i.e., they are signal-aware. The attention is typically obtained by a deep neural network (DNN) from a short-time Fourier transform (STFT)-based representation of a single microphone signal. Subsequently, attention has been applied as binary or ratio weighting to STFT-based microphone signal representations to reduce the impact of frequency bins dominated by noise, interference, or reverberation. The impact of attention on DOA estimators and different training strategies for attention and DOA DNNs have not yet been studied in depth. In this paper, we evaluate systems consisting of different DNNs and signal processing-based methods for DOA estimation when attention is applied. Additionally, we propose training strategies for attention-based DOA estimation optimized via a DOA objective, i.e., end-to-end. The evaluation of the proposed and the baseline systems is performed using data generated with simulated and measured room impulse responses of a uniform-linear microphone array under various acoustic conditions, such as reverberation times, noise, and source-array distances. The data contains a single source-of-interest, noise, and directional interference. The best-performing systems are also evaluated using measured data. Our experiments show that DNNs used for DOA estimation are biased toward the spectral source characteristics and the spectral attention distribution used during training (e.g., spectrally flat/sparse). 
We also show that this bias in the DOA estimator can be avoided if signal-processing methods are used in combination with attention. Overall, DOA estimation using attention in combination with signal-processing methods exhibits a far lower computational complexity than a fully DNN-based system; however, it yields comparable results. • Attention enables source-selective direction of arrival estimation. • Combining data-driven and signal-processing methods reduces complexity. • Attention can be estimated using masking concepts from source separation. [ABSTRACT FROM AUTHOR]
Published: 2022

34. Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.

Author: Subramanian, Aswin Shanmugam, Weng, Chao, Watanabe, Shinji, Yu, Meng, and Yu, Dong
Subjects: *DEEP learning, *SOFTWARE localization, *CONVERSATION analysis, *SUPERVISED learning, *ARTIFICIAL neural networks, *DIRECTION of arrival estimation, *AUTOMATIC speech recognition
Abstract: Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all the speakers simultaneously from the audio mixture. At the heart of the proposal is a source splitting mechanism that creates source-specific intermediate representations inside the network. This allows our model to give source-specific posteriors as the output unlike the traditional multi-label classification approach. Existing deep learning methods perform a frame level prediction, whereas our approach performs an utterance level prediction by incorporating temporal selection and averaging inside the network to avoid post-processing. We also experiment with various loss functions and show that a variant of earth mover distance (EMD) is very effective in classifying DOA at a very high resolution by modeling inter-class relationships. In addition to using the prediction error as a metric for evaluating our localization model, we also establish its potency as a frontend with automatic speech recognition (ASR) as the downstream task. We convert the estimated DOAs into a feature suitable for ASR and pass it as an additional input feature to a strong multi-channel and multi-talker speech recognition baseline. This added input feature drastically improves the ASR performance and gives a word error rate (WER) of 6.3% on the evaluation data of our simulated noisy two speaker mixtures, while the baseline which does not use explicit localization input has a WER of 11.5%. We also perform ASR evaluation on real recordings with the overlapped set of the MC-WSJ-AV corpus in addition to simulated mixtures. • An end-to-end multi-source direction of arrival (DOA) estimation method is proposed. • The method tackles multi-source DOA estimation via single-source DOA estimation. 
• The DOA classification network is optimized to learn inter-class relationships for a fine-grained angle resolution. • DOA estimation is used as a frontend for multi-talker automatic speech recognition. [ABSTRACT FROM AUTHOR]
Published: 2022

35. A simulation-optimization approach for supporting conservative water allocation under uncertainties.

Author: Cai, Yanpeng, Li, Tong, Zhang, Yi, and Zhang, Xiaodong
Subjects: *WATER rights, *ARTIFICIAL neural networks, *WATER shortages, *RESOURCE allocation, *WATER supply, *WATERSHEDS
Abstract: In this paper, a hybrid method that integrates an unbiased grey model (UGM) and an artificial neural network (ANN) into an interval two-stage fuzzy credibility-constrained programming (ITFCP) framework is proposed for water resources allocation in the Yalong River research area. Through grey correlation analysis and the eXtreme Gradient Boosting (XGBoost) algorithm, the economic and social indicators related to the water demands of different water sectors in different regions can be obtained for building the water demand prediction models. Based on the unbiased grey prediction of the socio-economic development data of each region in the Yalong River Basin (YRB), water demand prediction models are constructed using a neural network. The hybrid two-stage interval fuzzy credibility-constrained programming model can analyze the uncertainties in the water resources allocation process. Taking 2020, 2025, and 2030 as the planning years, the developed model reveals the system benefits at different credibility levels, the water shortage of each user in each sub-region, and the water resources allocation, providing suggestions for managers to optimize the allocation of water resources. Compared to previous methods, this integrated model can help decision-makers set management policies more sustainably and profitably. • A simulation-optimization method was developed. • The method combines an unbiased grey model (UGM) and an artificial neural network (ANN). • Interval two-stage fuzzy credibility-constrained programming was used as a general framework. • The developed method was applied to support water management in the Yalong River Basin, China. [ABSTRACT FROM AUTHOR]
Published: 2022

36. Named entity recognition using neural language model and CRF for Hindi language.

Author: Sharma, Richa, Morwal, Sudha, and Agarwal, Basant
Subjects: *ARTIFICIAL neural networks, *MULTILINGUAL communication, *HINDI language, *NATURAL language processing, *HUMAN-computer interaction
Abstract: • A state-of-the-art Hindi NER system based on the MuRIL language model and CRF. • Computation of token vectors through various combinations of MuRIL encoder layers. • Classification of entities using variants of MuRIL- and mBERT-based models. • Evaluation of models on a highly diverse set of Hindi named entities. Named Entity Recognition (NER) plays an important role in various Natural Language Processing (NLP) applications, extracting key information from huge amounts of unstructured text data. NER is the task of identifying and classifying the named entities in a given text into predefined categories. Recently, language models have performed strongly in several NLP tasks, as these state-of-the-art models give better results even when resources are scarce. In this paper, we perform the NER task on the Hindi language by incorporating the recently released multilingual language model MuRIL, which stands for Multilingual Representation for Indian Languages. MuRIL is specially trained for 16 Indian languages. We develop a Hindi NER system using MuRIL with a conditional random field (CRF) layer and fine-tune the model on the ICON 2013 Hindi NER dataset. Further, in the proposed approach, we compute the sum of the representations from the last 4 layers of the MuRIL model instead of using only the last layer's representation, and fine-tune the whole model. Several variants of this model are presented by applying different computations on the token representations provided by different layers of the 12-layer MuRIL architecture. The proposed model achieves state-of-the-art results, with 87.89% precision, 83.74% recall, and 85.77% F1-score, and outperforms all other existing Hindi NER systems developed on the ICON 2013 dataset. 
Additionally, we develop a similar Hindi NER system by replacing the MuRIL language model with another state-of-the-art language model, called multilingual Bidirectional Encoder Representations from Transformers (mBERT) to analyze the efficiency of both language models over the Hindi NER task. [ABSTRACT FROM AUTHOR]
Published: 2022

37. Automatic detection of behavioural codes in team interactions.

Author: Hasan, Madina, Jefferson, Nicholas, Hain, Thomas, and Dawson, Jeremy
Subjects: *ARTIFICIAL neural networks, *KNOWLEDGE transfer, *PHONETIC transcriptions, *LEXICAL access, *SPEECH
Abstract: This paper investigates the feasibility of the task of automatic behaviour coding of spoken interactions in teamwork settings. We introduce the coding schema used to classify the behaviours of the group members and the corpus we collected to assess the coding schema reliability in real teamwork meetings. The behaviours embedded in spoken utterances are modelled using a discriminative approach based on conditional random fields, and state-of-the-art neural-network-based models. Moreover, we fine-tune publicly available language models to fit our target domain and task and demonstrate how this type of knowledge transfer improves classification models' generalisation capacity. To utilise public resources, the AMI corpus was used for deploying the proposed framework. However, the models were evaluated on both AMI (matched task) and recordings of students solving an engineering challenge (mismatched task). Evaluation results reveal that neural networks are the best-performing models in matched tasks, but that CRF models outperform them in mismatched tasks. Mitigating the effect of noisy data by implementing a lightly supervised approach leads to relative improvements of 32% and 22% in the F1 measures of CRF and BERT, respectively. The proposed classifiers are used as part of the technological support for the training programme in collaborative skills for undergraduate students. • The viability of automatic verbal behaviour coding using speech transcriptions is investigated. • A new corpus with annotations following a precise metadiscourse coding scheme is introduced to the field of automatic behaviour analysis in teamwork meetings. The applied coding scheme is based on a well-studied psychological theory and is designed to support careful observation in order to provide valuable feedback. • A lightly supervised approach that produces a better version of manually coded training data is also proposed. 
Lexical and contextual text features are used to train sequential and neural network automatic metadiscourse tagging models. • The performance of these models is compared under different hierarchical levels of behaviour analysis. • To evaluate the proposed approach, the AMI test set (matched) and a real application test set taken from the ENG Challenge data (mismatched) are used. [ABSTRACT FROM AUTHOR]
Published: 2022

38. Prediction of speech intelligibility with DNN-based performance measures.

Author: Castro Martinez, Angel Mario, Spille, Constantin, Roßbach, Jana, Kollmeier, Birger, and Meyer, Bernd T.
Subjects: *INTELLIGIBILITY of speech, *AUTOMATIC speech recognition, *SPEECH audiometry, *ARTIFICIAL neural networks, *SPEECH processing systems
Abstract: This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model requires neither the clean speech reference nor the word labels during testing, as the ASR decoding step (which finds the most likely sequence of words given phoneme posterior probabilities) is omitted. The model is evaluated via the root-mean-squared error between the predicted and observed speech reception thresholds from eight normal-hearing listeners. The recognition task consists of identifying noisy words from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from speech-shaped stationary noise to a single-talker masker. The prediction performance is compared to five established models and an ASR model using word labels. Two combinations of features and networks were tested. Both include temporal information, either at the feature level (amplitude modulation filterbanks and a feed-forward network) or captured by the architecture (mel-spectrograms and a time-delay deep neural network, TDNN). The TDNN model is on par with the DNN while reducing the number of parameters by a factor of 37; this optimization allows parallel streams on dedicated hearing aid hardware, as a forward pass can be computed within the 10 ms of each frame. The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models. • New speech intelligibility model based on ASR probabilities and their estimated WER. • The model requires neither clean speech nor word labels during testing. • The model computes more accurate frame forward predictions within 10 ms. [ABSTRACT FROM AUTHOR]
Published: 2022

39. Predicting individual task contrasts from resting‐state functional connectivity using a surface‐based convolutional network.

Author: Ngo, Gia H., Khosla, Meenakshi, Jamison, Keith, Kuceyeski, Amy, and Sabuncu, Mert R.
Subjects: *FUNCTIONAL connectivity, *ARTIFICIAL neural networks, *DEEP learning, *CONVOLUTIONAL neural networks
Abstract: • Previous work has demonstrated that individual task-based brain activity can be predicted from resting-state functional connectivity. • We build on recent deep learning methods to create a surface-based fully-convolutional neural network model that works with a representation of the brain's cortical sheet. • The proposed model, BrainSurfCNN, can achieve state-of-the-art predictive accuracy on independent test data from the Human Connectome Project. • BrainSurfCNN yields individual-level predicted maps that are on par with the target-repeat reliability of the measured contrast maps. • We further demonstrate that BrainSurfCNN can generalize well to novel domains with limited training data. Task-based and resting-state represent the two most common experimental paradigms of functional neuroimaging. While resting-state offers a flexible and scalable approach for characterizing brain function, task-based techniques provide superior localization. In this paper, we build on recent deep learning methods to create a model that predicts task-based contrast maps from resting-state fMRI scans. Specifically, we propose BrainSurfCNN, a surface-based fully-convolutional neural network model that works with a representation of the brain's cortical sheet. BrainSurfCNN achieves exceptional predictive accuracy on independent test data from the Human Connectome Project, which is on par with the repeat reliability of the measured subject-level contrast maps. Conversely, our analyses reveal that a previously published benchmark is no better than group-average contrast maps. Finally, we demonstrate that BrainSurfCNN can generalize remarkably well to novel domains with limited training data. [ABSTRACT FROM AUTHOR]
Published: 2022

40. An automatic Alzheimer's disease classifier based on spontaneous spoken English.

Author: Bertini, Flavio, Allevi, Davide, Lutero, Gianluca, Calzà, Laura, and Montesi, Danilo
Subjects: *ALZHEIMER'S disease, *SPOKEN English, *DEMENTIA, *ARTIFICIAL neural networks
Abstract: According to the World Health Organization, the number of people suffering from dementia worldwide will grow to 150 million by mid-century, and Alzheimer's disease is the most common form of dementia, contributing to 60%–70% of cases. The problem is compounded by the fact that current pharmacologic treatments are only symptomatic, and therapies are ineffective at slowing down or curing the degenerative process. An automatic and standardized classifier for Alzheimer's disease is therefore extremely important for responding rapidly and delivering interventions that are as preventive as possible. Speech alterations might be one of the earliest signs of cognitive impairment, and researchers have recently shown that they can be observed well before other cognitive deficits become manifest. In this paper, we propose a fully automated method able to classify the spontaneous spoken production of the subjects. In particular, we trained an artificial neural network using the spectrogram of the audio signal, which is a visual representation of the subject's speech. Moreover, to overcome the problem of the large amount of annotated data usually required for training deep learning models, we used a specific data augmentation approach that avoids distorting the original samples. We evaluated the proposed method using the English Pitt Corpus from DementiaBank. The dataset used consists of 180 subjects: 43 healthy controls and 137 Alzheimer's disease patients. The proposed method outperformed the other approaches in the literature based on manual and semi-automatic transcription and annotation of speech, improving the classification capability by 5.93%, and obtained good classification results compared to the state-of-the-art neuropsychological screening tests (i.e., the Mini-Mental State Examination and the Activities of Daily Living portion of the Blessed Dementia Rating Scale), exhibiting an accuracy of 93.30% and an F1 score of 88.50%. [ABSTRACT FROM AUTHOR]
Published: 2022

41. Cross-language article linking with deep neural network based paragraph encoding.

Author: Wang, Yu-Chun, Chuang, Chia-Min, Wu, Chun-Kai, Pan, Chao-Lin, and Tsai, Richard Tzong-Han
Subjects: *CROSS-language information retrieval, *ARTIFICIAL neural networks, *PARAGRAPHS, *ENCODING, *WIKIS, *DATA
Abstract: Cross-language article linking (CLAL), the task of generating links between articles in different languages from different encyclopedias, is critical for facilitating sharing among online knowledge bases. Some previous CLAL research has been done on creating links among Wikipedia wikis, but much of this work depends heavily on simple language patterns and encyclopedia format or metadata. In this paper, we propose a new CLAL method based on deep learning paragraph embeddings to link English Wikipedia articles with articles in Baidu Baike, the most popular online encyclopedia in mainland China. To measure article similarity for link prediction, we employ several neural networks with attention mechanisms, such as CNN and LSTM, to train paragraph encoders that create vector representations of the articles' semantics based only on article text, rather than link structure, as input data. Using our "Deep CLAL" method, we compile a data set consisting of Baidu Baike entries and corresponding English Wikipedia entries. Our approach does not rely on linguistic or structural features and can be easily applied to other language pairs by using pre-trained word embeddings, regardless of whether the two languages are on the same encyclopedia platform. • Cross-language article linking helps create a multilingual unified knowledge base. • An attention-based neural network learns to attend to the vital parts of articles. • The novel method does not rely on feature engineering and scales to large data. [ABSTRACT FROM AUTHOR]
Published: 2022

42. Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection.

Author: Patil, Ankur T., Acharya, Rajul, Patil, Hemant A., and Guido, Rodrigo Capobianco
Subjects: *AUTOMATIC speech recognition, *CEPSTRUM analysis (Mechanics), *ERROR rates, *ARTIFICIAL neural networks
Abstract: In the scope of voice biometrics, the term replay attack (RA) refers to the dishonest attempt made by an impostor to spoof someone else's identity by replaying the subject's previously recorded speech close to the Automatic Speaker Verification (ASV) system under attack. State-of-the-art strategies for RA detection, such as the Enhanced Teager Energy Cepstral Coefficients (ETECC), have shown promising results due to their precision in measuring energy from high-frequency components of speech, as a function of two recently defined concepts: signal mass and the Enhanced Teager Energy Operator (ETEO). Nevertheless, since the replay mechanism prominently deteriorates the speech signal spectrum precisely in those spectral zones, we propose the association of ETEO with different strategies to further improve on the previous results in obtaining effective countermeasures against RAs. Specifically, comprehensive evaluations which include a detailed mathematical analysis, a simulation on amplitude and frequency modulated (AM–FM) signals, and a spectrographic inspection involving different filterbank structures, along with their experimental results, are provided in this paper. In addition, ETEO-derived features are contrasted with existing feature sets by using Paraconsistent Feature Engineering (PFE) for feature ranking, expanding our previously published results. Lastly, experiments are performed with the ASVspoof-2017 version 2.0 dataset, the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC), the BTAS-2016 dataset, the ASVspoof-2019 challenge dataset, and the ASVspoof-2015 challenge dataset, using Gaussian Mixture Models (GMMs), Convolutional Neural Networks (CNNs), and Light-CNN architectures as classifiers. The standalone ETECC-GMM system showed the best performance, producing equal error rates (EERs) of 5.55% and 10.75% on the development and evaluation sets, respectively. 
• An improved handcrafted feature extraction technique based on ETEO is presented, which provides a precise estimate of signal energy by means of the concept of signal mass, to improve on the performance of TEO. • We describe an innovative assessment, based on Paraconsistent Feature Engineering (PFE), which measures the efficacy of the proposed ETECC-based feature sets, along with the existing feature sets, for the intended classification task. • We comment on the results from applying state-of-the-art filterbanks, namely Gammatone and Cochlear, in SSD tasks. The experiments were also performed with the Ricker wavelet-based filterbank, i.e., the negative normalized second derivative of the Gaussian, and with the Gabor filterbank. • We performed experiments in an environment-dependent scenario on the ASVspoof-2017 version 2.0 dataset, the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC), BTAS-2016, the ASVspoof-2019 challenge dataset, and the ASVspoof-2015 challenge dataset. • We extended the experiments by considering deep learning architectures, namely Convolutional Neural Networks (CNNs) and Light-CNN architectures, as classifiers in conjunction with the extracted features, and reported the results on the ASVspoof-2017 version 2.0 dataset. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
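
The ETECC features in entry 42 build on the Teager Energy Operator (TEO). The abstract does not define the ETEO or its signal-mass formulation, so the following is only a minimal Python sketch of the classic discrete TEO, psi[n] = x[n]^2 - x[n-1]*x[n+1], offered as a reference point and not as the authors' ETEO. The function name `teager_energy` is hypothetical.

```python
import numpy as np


def teager_energy(x) -> np.ndarray:
    """Classic discrete Teager energy operator (not the paper's ETEO).

    psi[n] = x[n]**2 - x[n-1] * x[n+1], evaluated for 1 <= n <= len(x) - 2,
    so the output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Example: a pure tone A*cos(omega*n + phi) gives the constant A**2 * sin(omega)**2.
n = np.arange(200)
tone = 2.0 * np.cos(0.3 * n + 0.1)
psi = teager_energy(tone)
```

For a pure tone the output equals A² sin²(ω) at every sample, so the operator grows with both amplitude and frequency; this is why TEO-style energy emphasizes the high-frequency regions that, per the abstract, replay degrades.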

43. A multi-model data fusion methodology for seasonal drought forecasting under uncertainty: Application of Bayesian maximum entropy.

Author: Ghazipour, Fatemeh and Mahjouri, Najmeh
Subjects: *DROUGHT forecasting, *MULTISENSOR data fusion, *MONTE Carlo method, *OCEAN temperature, *ARTIFICIAL neural networks, *SEASONS
Abstract: In this paper, we present a new methodology for improving seasonal drought forecasts by developing a Bayesian Maximum Entropy-based fusion (BMEF) model. The BMEF model combines the forecasts made by four individual (single-source) data-driven models to achieve better outcomes. The regional drought indices Effective Drought Index (EDI) and Multiple Standard Precipitation Index (MSPI) are forecasted using individual Artificial Neural Network (ANN), Adaptive Neuro Fuzzy Inference System (ANFIS), Support Vector Regression (SVR), and M5tree models. The outputs of the best-performing individual models are selected for fusion with the BMEF model, and the results are analyzed and compared. The effect of different large-scale climate signals on rainfall and drought forecasting is analyzed, and the most effective climate variables are selected as predictors in the forecasting models. Next, uncertainty analysis of the results of the individual models and of the BMEF model is carried out by deriving the probability mass functions of the drought indices using a resampling technique and Monte Carlo analysis. Finally, the results of the uncertainty analysis are evaluated to compare how well the individual models and the BME-based fusion model reduce the uncertainty of seasonal drought forecasting. The proposed methodology is evaluated by using it to forecast seasonal drought conditions in southwest Iran. Based on the uncertainty analysis, the BMEF model provides more reliable forecasts than the individual models, particularly for severe drought events. Adding sea surface temperature (SST) to the predictors is also found to decrease the uncertainty of drought forecasts. • A new methodology is proposed to improve seasonal drought forecasting. • A BME-based fusion (BMEF) model is developed for combining the individual forecasts.
• Drought indices of EDI and MSPI are forecasted under uncertainty. • Favorable effect of sea surface temperature (SST) on drought forecasting is shown. • The BMEF model reduces the uncertainty of the drought forecasts. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
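
Entry 43 derives probability mass functions of drought indices through resampling and Monte Carlo analysis. The abstract does not give the exact procedure, so the following is a hedged Python sketch of one common variant: residuals (observed minus forecast) are resampled with replacement, added to a point forecast, and binned into drought classes. The class boundaries are hypothetical SPI-style cutoffs, not the paper's EDI/MSPI classes, and `drought_class_pmf` is a hypothetical name.

```python
import numpy as np

# Hypothetical SPI-style class boundaries for a standardized drought index;
# the paper's actual EDI/MSPI classes are not given in the abstract.
BOUNDARIES = [-2.0, -1.5, -1.0]
LABELS = ["extreme", "severe", "moderate", "no drought"]


def drought_class_pmf(point_forecast, residuals, n_draws=10_000, seed=0):
    """Monte Carlo estimate of P(drought class) for a single forecast.

    Residuals are resampled with replacement and added to the point forecast;
    the share of draws landing in each class is the empirical probability mass.
    """
    rng = np.random.default_rng(seed)
    draws = point_forecast + rng.choice(np.asarray(residuals, dtype=float),
                                        size=n_draws, replace=True)
    # right=True maps x <= -2.0 -> 0, (-2.0, -1.5] -> 1, (-1.5, -1.0] -> 2, x > -1.0 -> 3
    classes = np.digitize(draws, BOUNDARIES, right=True)
    counts = np.bincount(classes, minlength=len(LABELS))
    return {label: count / n_draws for label, count in zip(LABELS, counts)}


# Example: forecast of -1.0 with residuals drawn from {-1.5, +0.5}
pmf = drought_class_pmf(-1.0, [-1.5, 0.5])
```

A wider residual spread moves probability mass into neighbouring classes, so comparing these PMFs across models is one way to compare forecast uncertainty, which is the kind of comparison the abstract reports between the individual models and BMEF.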

44. A study of uncertainties in groundwater vulnerability modelling using Bayesian model averaging (BMA).

Author: Gharekhani, Maryam, Nadiri, Ata Allah, Khatibi, Rahman, Sadeghfam, Sina, and Asghari Moghaddam, Asghar
Subjects: *ARTIFICIAL neural networks, *GROUNDWATER, *SUPPORT vector machines, *ARTIFICIAL intelligence
Abstract: Bayesian Model Averaging (BMA) is used to study inherent uncertainties in assessing groundwater vulnerability with the Basic DRASTIC Framework (BDF) in a study area related to Lake Urmia. BMA is naturally an Inclusive Multiple Modelling (IMM) strategy operating at two levels. At Level 1, multiple models are constructed; the paper constructs three AI (Artificial Intelligence) models, namely ANN (Artificial Neural Network), GEP (Gene Expression Programming), and SVM (Support Vector Machines), whose outputs are fed to the next-level model. At Level 2, BMA combines ANN, GEP, and SVM (the Level 1 models) to quantify their inherent uncertainty in terms of within-model and between-model errors. Model performance is tested using the nitrate-N concentrations measured for the aquifer. The results show that in this particular study area the Level 1 models, and even BDF, are quite accurate, but the above modelling strategy maximises the information extracted from the local data, and BMA reveals that higher uncertainties occur in areas with higher vulnerability, whereas lower uncertainties are observed in areas with lower vulnerability. • DRASTIC vulnerability is studied using an Inclusive Multiple Modelling (IMM) strategy. • IMM uses 3 AI models at Level 1 and BMA (Bayesian) at Level 2 to study uncertainty. • Accuracy is good for Level 1 models (AUC = 0.998), even for basic DRASTIC (AUC = 0.995). • IMM at Level 2 is still needed to study error residuals and inherent uncertainties. • BMA results are defensible and show high uncertainty at high vulnerability zones. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
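
Entry 44 uses BMA at Level 2 to split uncertainty into within-model and between-model errors. A minimal Python sketch of the standard BMA moment decomposition follows. It assumes each Level 1 model (e.g. ANN, GEP, SVM) supplies a mean prediction and an error variance per location and that the posterior model weights are already known; how the paper estimates those weights is not stated in the abstract, and `bma_combine` is a hypothetical name.

```python
import numpy as np


def bma_combine(means, variances, weights):
    """Bayesian model averaging of K model predictions at N locations.

    means, variances: array-likes of shape (K, N); weights: shape (K,), summing to 1.
    Returns (bma_mean, within_variance, between_variance); their sum of
    variances is the total BMA predictive variance.
    """
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]  # broadcast weights over locations
    bma_mean = (w * m).sum(axis=0)
    within = (w * v).sum(axis=0)                    # weighted average of model variances
    between = (w * (m - bma_mean) ** 2).sum(axis=0)  # spread of model means around BMA mean
    return bma_mean, within, between


# Example: two equally weighted models disagreeing at one location
mean, within, between = bma_combine([[1.0], [3.0]], [[0.5], [0.5]], [0.5, 0.5])
```

The between-model term grows wherever the Level 1 models disagree, which is one way the high-uncertainty zones described in the abstract can arise even when each model is individually accurate.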

45. Spatial quantification method of grassland utilization intensity on the Qinghai-Tibetan Plateau: A case study on the Selinco basin.

Author: Ma, Changhui, Xie, Yaowen, Duan, Hanming, Wang, Xiaoyun, Bie, Qiang, Guo, Zecheng, He, Lei, and Qin, Wenhua
Subjects: *GRASSLAND soils, *FEEDFORWARD neural networks, *ARTIFICIAL intelligence, *GRASSLANDS, *ARTIFICIAL neural networks, *SPATIAL resolution
Abstract: Existing methods for spatial quantification of grassland utilization intensity cannot meet the demand for accurate, high-spatial-resolution detection of the spatial distribution of grassland utilization intensity on the Qinghai-Tibetan Plateau. In this paper, a method based on remote-sensing observations and simulations of grassland growth dynamics is proposed. The grassland enhanced vegetation index (EVI) time-series curve during the growing season characterizes grassland growth in the corresponding pixel. The deviation between the observed and potential EVI curves indicates the disturbance imposed on grassland growth by human activities and can characterize grassland utilization intensity during the growing season. Based on this idea, absolute and relative disturbances are calculated and used as quantitative indicators of grassland utilization intensity defined from different perspectives. Livestock amount at the pixel scale, a specific quantitative indicator that considers the mode of grassland utilization, is obtained by pixel-by-pixel calculations based on the functional relationship at the township scale between absolute disturbance and livestock density. In simulating the potential EVI of grassland, the lag and accumulation effects of meteorological factors are investigated at the daily scale using a multi-objective genetic algorithm. Further, the nonlinear functions between multiple environmental factors (e.g., grassland type, topography, soil, meteorology) and grassland EVI are established using an error back-propagation feedforward artificial neural network (ANN-BP) with parameter optimization. Finally, the potential EVIs of all grassland pixels are simulated on the basis of this model. The method is applied to the Selinco basin on the Qinghai-Tibetan Plateau and validated by examining the spatial consistency of the results with township-scale livestock density and grazing pressure.
The final results indicate that the proposed method can accurately detect the spatial distribution of grassland utilization intensity and is applicable to similar regions. • Based on MODIS images, the deviation of the observed EVI time-series curve of grassland from the potential EVI time-series curve during the growing season was used to characterize the intensity of human use of grassland. • A simulation method for grassland potential EVI based on artificial intelligence algorithms (MOGA, ANN-BP) was proposed. • Livestock amounts at pixel scale were calculated based on the logarithmic function relationship (P < 0.01, R² = 0.65) between absolute disturbance and livestock density at township scale. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
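
Entry 45 measures human disturbance as the deviation of observed EVI from simulated potential EVI over the growing season. The abstract does not give the exact formulas, so the following Python sketch uses hypothetical definitions for one pixel: absolute disturbance as the summed shortfall of observed below potential EVI, and relative disturbance as that shortfall divided by summed potential EVI. The function name `evi_disturbance` is also hypothetical.

```python
import numpy as np


def evi_disturbance(observed, potential):
    """Hypothetical per-pixel disturbance indicators for one growing season.

    observed, potential: EVI values at the same dates (same length).
    absolute: summed shortfall of observed EVI below potential EVI.
    relative: absolute shortfall as a fraction of summed potential EVI.
    """
    obs = np.asarray(observed, dtype=float)
    pot = np.asarray(potential, dtype=float)
    # Dates where observed >= potential contribute no disturbance in this sketch.
    shortfall = np.clip(pot - obs, 0.0, None)
    absolute = shortfall.sum()
    relative = absolute / pot.sum()
    return absolute, relative


# Example: three composite dates in one pixel
absolute, relative = evi_disturbance([0.2, 0.3, 0.5], [0.3, 0.4, 0.4])
```

Clipping at zero is a design assumption: it treats only below-potential growth as human disturbance. The paper may instead use signed deviations or a different aggregation.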

47. Arabic speech recognition by end-to-end, modular systems and human.

Author: Hussein, Amir, Watanabe, Shinji, and Ali, Ahmed
Subjects: *AUTOMATIC speech recognition, *ARABIC alphabet, *END-to-end delay, *TRANSCRIPTION, *ENGLISH language, *MARKOV processes, *ARTIFICIAL neural networks
Abstract: Recent advances in automatic speech recognition (ASR) have achieved accuracy levels comparable to human transcribers, which has led researchers to debate whether machines have reached human performance. Previous work focused on the English language and modular hidden Markov model-deep neural network (HMM–DNN) systems. In this paper, we perform a comprehensive benchmarking of end-to-end transformer ASR, modular HMM–DNN ASR, and human speech recognition (HSR) on the Arabic language and its dialects. For HSR, we evaluate the performance of linguists and lay native speakers on a new dataset collected as part of this study. For ASR, the end-to-end system achieved word error rates (WERs) of 12.5%, 27.5%, and 33.8%, a new performance milestone for the MGB2, MGB3, and MGB5 challenges, respectively. Our results suggest that human performance on Arabic is still considerably better than that of the machine, with an absolute WER gap of 3.5% on average. • Comprehensive benchmarking of end-to-end, modular, and human speech recognition on the Arabic language. • New milestone in Arabic speech recognition with E2E transformer-based ASR. • The machine arguably outperforms the native speaker. • Machine mistakes showed high similarity with the expert linguist transcription. • Expert linguist performance on Arabic is still considerably better than the machine. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
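
Entry 47 compares systems and humans by word error rate (WER). As a hedged illustration of the metric itself, not the MGB scoring pipeline (which may apply its own text normalization), here is a minimal Python sketch computing WER = (substitutions + deletions + insertions) / reference length via word-level edit distance. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER via word-level Levenshtein distance on whitespace-split words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# Example: one substitution (sat -> sit) and one deletion (the) over 6 words
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions.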

48. Sequential routing framework: Fully capsule network-based speech recognition.

Author: Lee, Kyungmin, Joe, Hyunwhan, Lim, Hyeontaek, Kim, Kwangyoun, Kim, Sungsoo, Han, Chang Woo, and Kim, Hong-Gee
Subjects: *AUTOMATIC speech recognition, *ARTIFICIAL neural networks, *SEQUENCE analysis, *ROUTING (Computer network management), *PARAMETER estimation
Abstract: • Capsule-network-only structures can successfully map sequences to sequences. • Mappings are refined by initializing routing iteration based on the previous output. • Sequence-wise routing iteration allows for non-iterative inference. • The structure of a capsule network is more important than its number of parameters. • Top-layer capsules become similar to the capsule corresponding to a sequence label. Capsule networks (CapsNets) have recently attracted attention as a novel neural architecture. This paper presents the sequential routing framework, which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized and then sliced by a window size. Each slice is classified to a label at the corresponding time through iterative routing mechanisms. Losses are then computed by connectionist temporal classification (CTC). During routing, the required number of parameters can be controlled by the window size, regardless of sequence length, by sharing learnable weights across the slices. We additionally propose a sequential dynamic routing algorithm to replace traditional dynamic routing. The proposed technique can minimize the decoding-speed degradation caused by routing iterations, since it can operate in a non-iterative manner without losing accuracy. The method achieves a word error rate of 16.9% on the Wall Street Journal corpus, 1.1% lower than bidirectional long short-term memory-based CTC networks. On the TIMIT corpus, it attains a phone error rate of 17.5%, 0.7% lower than convolutional neural network-based CTC networks (Zhang et al., 2016). [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
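
Entry 48 trains per-time-step labels with connectionist temporal classification (CTC). The sketch below is not the paper's capsule routing; it illustrates only the CTC convention those labels feed into, via best-path (greedy) decoding: take the top label per frame, merge consecutive repeats, then drop blanks. It assumes label index 0 is the blank, and `ctc_greedy_decode` is a hypothetical name.

```python
import numpy as np

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_decode(frame_scores, blank: int = BLANK) -> list:
    """Best-path CTC decoding from a (frames, labels) score matrix."""
    best = np.argmax(np.asarray(frame_scores), axis=1)  # top label per frame
    decoded = []
    prev = None
    for label in best:
        label = int(label)
        # Keep a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Example: frame-level best path 1,1,_,1,2,2,_ (where _ is blank)
scores = np.eye(3)[[1, 1, 0, 1, 2, 2, 0]]
labels = ctc_greedy_decode(scores)
```

The blank between the first run of 1s and the next 1 is what lets CTC emit the same label twice in a row. The CTC loss itself sums over all frame alignments that collapse to the target under this same rule.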

49. Glimpse-based estimation of speech intelligibility from speech-in-noise using artificial neural networks.

Author: Tang, Yan
Subjects: *INTELLIGIBILITY of speech, *ARTIFICIAL neural networks, *SPEECH perception, *SIGNAL-to-noise ratio, *NATURAL language processing, *SPEECH enhancement
Abstract: While human listeners can, to some extent, understand the information conveyed by the speech signal when it is mixed with noise, traditional objective intelligibility measures usually fail to operate without a priori knowledge of the clean speech signal. This limits the usability of those measures in situations where the clean speech signal is inaccessible. In this paper, a glimpse-based method is extended to make speech intelligibility predictions directly from speech-plus-noise mixtures. Using a neural network, the proposed method estimates from the mixture signal the time-frequency regions whose local speech-to-noise ratio exceeds a given threshold (known as glimpses), instead of separately comparing the speech signal against the noise signal. The number and locations of the glimpses can then be used to produce an intelligibility score. In Experiment I, where listener intelligibilities were measured in one stationary and nine fluctuating noise maskers, the predictions produced by the proposed method were highly correlated with the subjective data, with correlation coefficients above 0.90. In Experiment II, using the same neural network trained on normal natural speech as in Experiment I, the proposed method was used to predict the intelligibility of speech signals modified by intelligibility-enhancement algorithms and of synthetic speech. The method maintained its predictive power, performing similarly to its intrusive counterpart with an overall correlation coefficient of 0.81, which is superior to many modern traditional measures evaluated under the same conditions. Therefore, the proposed method can be used to estimate speech intelligibility in place of traditional measures in conditions where their capacity falls short. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
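
Entry 49's method estimates glimpses from the mixture with a neural network. The following hedged Python sketch shows instead the intrusive reference computation the abstract contrasts it with: when speech and noise energies are separately available, a glimpse is any time-frequency cell whose local SNR exceeds a threshold. The 3 dB default, the input layout, and the name `glimpse_proportion` are assumptions, and the paper's score also uses glimpse locations, which this sketch ignores.

```python
import numpy as np


def glimpse_proportion(speech_power, noise_power, threshold_db=3.0):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    speech_power, noise_power: non-negative arrays of the same shape
    (e.g. frequency x time energies of the separate speech and noise signals).
    Returns (proportion, boolean glimpse mask).
    """
    s = np.asarray(speech_power, dtype=float)
    n = np.asarray(noise_power, dtype=float)
    eps = np.finfo(float).tiny  # guards against log(0) and division by zero
    local_snr_db = 10.0 * np.log10((s + eps) / (n + eps))
    mask = local_snr_db > threshold_db
    return mask.mean(), mask


# Example: 2 x 2 grid with local SNRs of about 6 dB, 0 dB, 0 dB, and 10 dB
prop, mask = glimpse_proportion([[4.0, 1.0], [1.0, 10.0]], np.ones((2, 2)))
```

In the non-intrusive setting described in the abstract, a trained network would predict `mask` from the mixture alone, and the same counting step would follow.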

50. Exploring neural models for predicting dementia from language.

Author: Kong, Weirui, Jang, Hyeju, Carenini, Giuseppe, and Field, Thalia S.
Subjects: *ARTIFICIAL neural networks, *DIAGNOSIS of dementia, *DEMENTIA prevention, *DEMENTIA patients, *ALZHEIMER'S disease
Abstract: • We use neural network models for dementia prediction to avoid task-specific features. • Our model for combining multimodal features gives performance comparable to the baselines. • Coherence scores do not differ between dementia patients and healthy controls. • Our HAN-AGE model achieves state-of-the-art performance for dementia detection. Early prediction of neurodegenerative disorders such as Alzheimer's disease (AD) and related dementias may facilitate earlier access to medical and social supports. Further, detection of individuals with preclinical disease may help to enrich clinical trial populations for studies examining disease-modifying interventions. Changes in speech and language patterns may occur in the early stages of neurodegenerative diseases such as AD and frontotemporal dementia, worsening as the disease progresses. This has led to recent attempts to create automatic methods that predict cognitive impairment and dementia through language analysis. Previous works have improved prediction accuracy by introducing task-specific features in addition to task-agnostic linguistic and acoustic features. However, task-specific features prevent the model from generalizing to other tests and languages. In this paper, we explore the effectiveness of neural network models that require no task-specific features for dementia prediction in three different ways. First, we use a multimodal neural model to fuse linguistic and acoustic features, and investigate the performance change compared to simply concatenating these features. Second, we propose a novel coherence feature generated by a neural coherence model, and investigate the predictiveness of this new feature for dementia prediction. Finally, we apply an end-to-end neural method that requires no feature engineering and achieves a state-of-the-art classification result on a widely used dementia dataset. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
