123 results for '"Audio recognition"'
Search Results
2. Smart Education-Filtering Real Time Stream
- Author
- Jain, Shreyans, Soni, Sagar, Jain, Shubbham, Kamthania, Deepali, Li, Gang, Series Editor, Filipe, Joaquim, Series Editor, Xu, Zhiwei, Series Editor, and Malhotra, Manisha, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Toward Birds Conservation in Dry Forest Ecosystems Through Audio Recognition via Deep Learning
- Author
- Rodríguez, Tyrone, Guilindro, Adriana, Piedrahita, Paolo, Realpe, Miguel, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Yang, Xin-She, editor, Sherratt, R. Simon, editor, Dey, Nilanjan, editor, and Joshi, Amit, editor
- Published
- 2025
- Full Text
- View/download PDF
4. Recognition of the Sound of the Lonchura Maja Bird and the Threat of House Sparrows Using Edge Impulses Based on a Custom Deep Neural Network to Protect Rice Plants.
- Author
- Rachmad, Aeri, Setiawan, Eko, and Hasbullah, Abdul Wahib
- Subjects
- ARTIFICIAL neural networks, MACHINE learning, ENGLISH sparrow, BIRDHOUSES, ENVIRONMENTAL quality
- Abstract
The presence of birds can serve as a biological indicator of environmental health. However, pest birds are a threat to farmers. This paper applies edge machine learning to the audio recognition of the Lonchura Maja bird and the house sparrow, deployable on a low-power microcontroller. We also train on the sound of the turtledove, a bird often seen around the rice fields in Bangkalan, to act as noise or background sound, and we test the reliability of four machine learning (ML) models before embedding them in the RP2040 microcontroller. The first model is a custom two-layer 1D convolutional neural network (CNN), and the second uses a transfer-learning-based architecture. The Edge Impulse embedded machine learning platform is used for training and testing. The resulting models were implemented as an Arduino library, both as an unoptimized float (32-bit) variant and as an optimized integer-quantized (8-bit) variant. The estimates produced by the microcontroller are evaluated in four cases, using the EON compiler and TensorFlow Lite. The custom 1D CNN provides the best accuracy, 87.4% during training and 84.59% on testing, while using very efficient resources: 66.2 KB of Flash memory and 11.8 KB of peak RAM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
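The float32-versus-int8 deployment trade-off described in entry 4 can be reproduced with standard tooling. Below is a minimal sketch of a small two-layer 1D CNN and its post-training integer quantization in TensorFlow/Keras; the class count, input length, and layer sizes are illustrative assumptions, not the authors' exact Edge Impulse configuration.

```python
# Minimal sketch: a small 1D CNN for audio classification plus
# post-training int8 quantization, loosely mirroring the float32 vs.
# int8 options mentioned in entry 4. All shapes are assumptions.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 3          # e.g. Lonchura Maja, house sparrow, background
INPUT_LEN = 16000        # 1 s of 16 kHz audio (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(INPUT_LEN, 1)),
    tf.keras.layers.Conv1D(8, 9, strides=4, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(16, 9, strides=4, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... model.fit(...) on labelled audio segments ...

# Post-training integer quantization (the "optimized int8" variant).
def representative_data():
    for _ in range(100):
        yield [np.random.rand(1, INPUT_LEN, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
tflite_int8 = converter.convert()  # deployable on an RP2040-class MCU
```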
5. Noise control of audio recognition equipment for multimedia system.
- Author
- Qianru Li, Jinkun Liu, and Yu Sun
- Subjects
- *NOISE control, *MULTIMEDIA systems, *SOUND systems, *AUDIO equipment, *PARAMETER estimation
- Abstract
Noise control is one of the most critical technical indicators for improving the performance of an intelligent audio recognition system. Based on noise cancellation technology, a distributed low-noise amplification circuit design is proposed, and PE15-0P technology is applied to realize broadband low-noise amplification. The amplifier circuit uses diodes and resistors for voltage division, effectively achieving bias saturation across the transistor and diode structure. The noise output characteristics of the low-noise amplifier design were simulated and analyzed. An audio enhancement method based on noise-type recognition is also proposed; it optimizes noise estimation by selecting parameter combinations according to the noise type, improving the quality and intelligibility of noisy audio signals in various noise environments. Through combined hardware and algorithm design, the noise signal is comprehensively reduced and the accuracy of audio recognition is significantly improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Audio recognition of live pig states based on underdetermined blind source separation and deep learning.
- Author
- 潘伟豪, 盛卉子, 王春宇, 闫顺丕, 周小波, 辜丽川, and 焦 俊
- Published
- 2024
- Full Text
- View/download PDF
7. An Audio Correlation-Based Graph Neural Network for Depression Recognition
- Author
- Sun, Chenjian, Dong, Yihong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
8. Audio Recognition of the Percussion Sounds Generated by a 3D Auto-Drum Machine System via Machine Learning.
- Author
- Brezas, Spyros, Skoulakis, Alexandros, Kaliakatsos-Papakostas, Maximos, Sarantis-Karamesinis, Antonis, Orphanos, Yannis, Tatarakis, Michael, Papadogiannis, Nektarios A., Bakarezos, Makis, Kaselouris, Evaggelos, and Dimitriou, Vasilis
- Subjects
- MACHINE learning, DEEP learning, MICROPHONES, PERCUSSION instruments, SOUND recordings, SOUNDS
- Abstract
A novel 3D auto-drum machine system for the generation and recording of percussion sounds is developed and presented. The capabilities of the machine, along with a calibration, sound production, and collection protocol, are demonstrated. The sounds are generated by a drumstick at pre-defined positions and with known impact forces from the programmable 3D auto-drum machine. The generated percussion sounds are accompanied by the spatial excitation coordinates and the corresponding impact forces, allowing large databases to be built, as required by machine learning models. The recordings of the radiated sound captured by a microphone are analyzed using a pre-trained deep learning model, evaluating the consistency of the physical sample generation method. The results demonstrate the ability to perform regression and classification tasks when fine-tuning the deep learning model with the gathered data. The produced databases can properly train machine learning models, aiding the investigation of alternative and cost-effective materials and geometries with relevant sound characteristics, and the development of accurate vibroacoustic numerical models for studying percussion instrument sound synthesis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. A Proposed CNN Model for Audio Recognition on Embedded Device.
- Author
- Minh Pham Ngoc, Tan Ngo Duy, Hoan Huynh Duc, and Kiet Tran Anh
- Subjects
- MACHINE learning, CONVOLUTIONAL neural networks, MICROPHONES, SMART devices, AUTONOMOUS vehicles, DRIVERLESS cars, SOUND systems
- Abstract
The audio detection system enables autonomous cars to recognize their surroundings based on the noise produced by moving vehicles. This paper proposes the utilization of a machine learning model based on convolutional neural networks (CNN) integrated into an embedded system supported by a microphone. The system includes a specialized microphone and a main processor. The microphone enables the transmission of an accurate analog signal to the main processor, which then analyzes the recorded signal and provides a prediction in return. While designing an adequate hardware system is a crucial task that directly impacts the predictive capability of the system, it is equally imperative to train a CNN model with high accuracy. To achieve this goal, a dataset containing over 3000 up-to-5-second WAV files for four classes was obtained from open-source research. The dataset is then divided into training, validation, and testing sets. The training data is converted into images using the spectrogram technique before training the CNN. Finally, the generated model is tested on the testing segment, resulting in a model accuracy of 77.54%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
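Entry 9's pipeline (WAV clips converted to spectrogram images that feed a CNN) follows a common pattern. The sketch below shows one plausible realization with librosa and Keras; the file name, sample rate, FFT settings, and network shape are assumptions, not the authors' configuration.

```python
# Sketch of a WAV -> log-spectrogram -> 2D CNN pipeline, as summarized
# in entry 9. All paths and hyperparameters are illustrative.
import librosa
import numpy as np
import tensorflow as tf

def wav_to_logspec(path, sr=22050, duration=5.0):
    """Load up to 5 s of audio and return a log-magnitude spectrogram."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = np.pad(y, (0, max(0, int(sr * duration) - len(y))))  # pad short clips
    spec = np.abs(librosa.stft(y, n_fft=1024, hop_length=512))
    return librosa.amplitude_to_db(spec, ref=np.max)

spec = wav_to_logspec("example.wav")           # shape (513, n_frames)
x = spec[np.newaxis, ..., np.newaxis]          # add batch and channel axes

cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=spec.shape + (1,)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # four sound classes
])
```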
10. Research progress on the application of audio technology in livestock and poultry breeding and in fruit and vegetable cultivation.
- Author
- 李文伟, 郑永军, 杨圣慧, 江世界, 赵航行, 王 慧, 苏道毕力格, and 谭 彧
- Abstract
Audio technology is fast, accurate, cost-effective, non-contact, and noninvasive, and it has accordingly been widely used in livestock breeding and in fruit and vegetable cultivation to drive the digitization and intelligence of agriculture. This study presents a comprehensive overview of three audio technologies as applied to livestock, fruits, and vegetables: audio enhancement, audio recognition, and audio control. First, traditional filtering, short-time spectral estimation, and wavelet denoising have been employed for audio enhancement; however, standardized techniques have often been applied without considering outside noise, hampering the extraction of clean audio, so follow-up studies need to focus on audio enhancement for livestock. Second, audio recognition is reviewed for non-destructive testing of agricultural products, animal disease monitoring, species identification, and pest detection. Target detection models have been constructed from audio features according to the differences between animal spontaneous vocalizations and plant excited vocalizations. Current research is dominated by efforts to enhance the recognition models themselves, while theoretical investigation into the underlying processes of spontaneous and excited vocalization in plants and animals is still lacking. Moreover, denoising techniques are either overly simplistic or entirely absent in the pre-processing stage of audio recognition, and the stability and accuracy of audio recognition must account for the external environment. Third, audio control is examined in fruit and vegetable cultivation as well as livestock breeding. Existing studies predominantly focus on the influence of audio or music on specific states of livestock, fruits, and plants; determining the dynamic changes in these states over time, particularly in response to environmental variations, remains an open need. Finally, future directions for audio technology in this context are outlined: 1) audio enhancement with neural networks and multi-channel separation can provide high-quality audio for recognition without external noise interference; 2) the underlying mechanisms of both spontaneous and excited vocalizations in plants and animals should be clarified, providing the theoretical foundation for spontaneous audio recognition models in animals and technical support for the design of plant excitation devices; and 3) the mechanisms governing the dynamic impact of audio on livestock, fruits, and plants should be investigated, contributing to real-time dynamic control of their growth and physiological state. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Design and Optimization of Frequency Identification Algorithm for Monomelody Musical Instruments Based on Artificial Intelligence Technology
- Author
- Wang, Wenxiao, Yao, Sanjun, Tsihrintzis, George A., Series Editor, Virvou, Maria, Series Editor, Jain, Lakhmi C., Series Editor, Favorskaya, Margarita N., editor, Kountchev, Roumen, editor, and Patnaik, Srikanta, editor
- Published
- 2023
- Full Text
- View/download PDF
12. Design of Audio-Based Accident and Crime Detection and Its Optimization
- Author
- Afis Asryullah Pratama, Sritrusta Sukaridhoto, Mauridhi Hery Purnomo, Vita Lystianingrum, and Rizqi Putri Nourma Budiarti
- Subjects
- audio recognition, dataset manipulation, optimization, neural networks, surveillance system, Computer software, QA76.75-76.765
- Abstract
Transportation technology is developing every day, increasing the number of vehicles and their users. This increase positively impacts economic growth but also has negative effects, such as accidents and crime on the highway. In 2018, the number of accidents in Indonesia reached 109,215 cases, with a death toll of 29,472 people, mostly caused by late treatment of the casualties. In the same year, there were 8,423 mugging and 90,757 snatching cases in Indonesia, with only 23.99% of cases reported; this low reporting rate is mostly caused by a lack of awareness and knowledge about where to report. Therefore, a quick-response surveillance system is needed. In this study, an audio-based accident and crime detection system was built using a neural network. To improve the system's robustness, we enhance our dataset by mixing it with noises likely to occur on the road. The system was tested with several parameters (segment duration, bandpass filter cut-off frequencies, feature extraction, architecture, and threshold values) to obtain optimal accuracy and performance. Based on the tests, the best accuracy was obtained by a convolutional neural network architecture using a 200 ms segment duration, a 0.5 overlap ratio, 100 Hz and 12,000 Hz bandpass cut-off frequencies, and a threshold value of 0.9. With these parameters, our system gives 93.337% accuracy. In the future, we hope to implement this system in a real environment.
- Published
- 2023
- Full Text
- View/download PDF
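The preprocessing parameters reported in entry 12 (200 ms segments, 0.5 overlap, a 100-12,000 Hz bandpass, and a 0.9 decision threshold) translate naturally into a detection loop. The following is an assumption-laden sketch, with the classifier stubbed out, not the authors' code.

```python
# Rough sketch of entry 12's preprocessing: split the stream into
# 200 ms segments with 0.5 overlap, bandpass 100-12000 Hz, and flag a
# detection only when the model score exceeds 0.9. The classifier is
# a stub; the sample rate is an assumption.
import numpy as np
from scipy.signal import butter, sosfilt

SR = 44100                      # assumed sample rate
SEG = int(0.2 * SR)             # 200 ms segments
HOP = SEG // 2                  # 0.5 overlap ratio

sos = butter(4, [100, 12000], btype="bandpass", fs=SR, output="sos")

def detect(stream, model_score, threshold=0.9):
    """Yield start indices of segments whose score passes the threshold."""
    filtered = sosfilt(sos, stream)
    for start in range(0, len(filtered) - SEG + 1, HOP):
        segment = filtered[start:start + SEG]
        if model_score(segment) >= threshold:
            yield start
```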
13. Hearing to the Unseen: AudioMoth and BirdNET as a Cheap and Easy Method for Monitoring Cryptic Bird Species.
- Author
- Bota, Gerard, Manzano-Rubio, Robert, Catalán, Lidia, Gómez-Catasús, Julia, and Pérez-Granados, Cristian
- Subjects
- *DEEP learning, *SOUND recordings, *MACHINE learning, *FOREST birds, *ENVIRONMENTAL monitoring, *SPECIES
- Abstract
The efficient analysis of sound recordings obtained through passive acoustic monitoring (PAM) can be challenging owing to the vast amount of data collected with this technique. Species-specific acoustic recognizers (e.g., developed through deep learning) may reduce the time required to analyze sound recordings but are often difficult to create. Here, we evaluate the effectiveness of BirdNET, a machine learning tool freely available for automated recognition and acoustic data processing, for correctly identifying and detecting two cryptic forest bird species. BirdNET precision was high for both the Coal Tit (Periparus ater) and the Short-toed Treecreeper (Certhia brachydactyla), with mean values of 92.6% and 87.8%, respectively. Using the default values, BirdNET successfully detected the Coal Tit and the Short-toed Treecreeper in 90.5% and 98.4% of the annotated recordings, respectively. We also tested the impact of variable confidence scores on BirdNET performance and estimated the optimal confidence score for each species. Vocal activity patterns of both species, obtained using PAM and BirdNET, reached their peak during the first two hours after sunrise. We hope that our study may encourage researchers and managers to utilize this user-friendly and ready-to-use software, thus contributing to advancements in acoustic sensing and environmental monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
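Entry 13's thresholding of BirdNET confidence scores can be illustrated with simple post-processing: each detection carries a species label and a confidence, and a per-species cutoff decides what to keep. The threshold values and record structure below are hypothetical placeholders, not the paper's estimates.

```python
# Illustrative post-processing of BirdNET-style detections (entry 13):
# keep a detection only if it meets the species-specific confidence
# threshold. All values here are placeholders, not the paper's.
OPTIMAL_CONF = {
    "Coal Tit": 0.5,                  # hypothetical threshold
    "Short-toed Treecreeper": 0.5,    # hypothetical threshold
}

def filter_detections(detections):
    """Keep detections at or above the species-specific confidence."""
    return [
        d for d in detections
        if d["confidence"] >= OPTIMAL_CONF.get(d["species"], 0.5)
    ]

detections = [
    {"species": "Coal Tit", "confidence": 0.83, "start_s": 12.0},
    {"species": "Coal Tit", "confidence": 0.31, "start_s": 54.0},
]
print(filter_detections(detections))  # only the 0.83 detection survives
```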
14. Effect of Spectrogram Parameters and Noise Types on The Performance of Spectro-temporal Peaks Based Audio Search Method.
- Author
- KOSEOGLU, Murat and UYANIK, Hakan
- Subjects
- *HUMAN fingerprints, *SPECTROGRAMS, *FEATURE extraction, *NOISE, *POPULAR music genres, *DATABASES
- Abstract
Audio search algorithms are used to detect a queried file in large databases, especially in multimedia applications. These algorithms are expected to perform the detection reliably and robustly within the shortest possible time. In this study, an audio fingerprint algorithm based on the spectral peaks method, with a few minor modifications, was developed to detect a matching audio file in a target database. The method has two stages: audio fingerprint extraction and matching. In the first stage, fingerprint features are extracted from spectral peaks on the spectrograms of audio files by hash functions; this state-of-the-art technique considerably reduces the processing load and time compared to traditional methods. In the second stage, the fingerprint data of the queried file are compared with the data created in the first stage for the database. The algorithm was demonstrated, and the effect of the spectrogram parameters (window size, overlap, number of FFT points) on reliability and robustness was investigated under different noise sources, with the further aim of contributing to new audio retrieval studies based on the spectral peaks method. The variation in the spectrogram parameters significantly affected the number of matches, reliability, and robustness. Under high-noise conditions, the optimal spectrogram parameters were determined to be a window size of 512, 50% overlap, and a 512-point FFT. In general, with these parameters the algorithm successfully detected the queried file in the database even under high-noise conditions. No significant effect of music genre was observed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
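The two-stage scheme in entry 14 (spectral-peak fingerprint extraction, then matching) is in the Shazam family. The sketch below uses the paper's reported optimal spectrogram parameters (window 512, 50% overlap, 512-point FFT) but a generic peak-pair hashing scheme; the peak-picking and hash details are assumptions rather than the authors' exact formulation.

```python
# Minimal spectral-peaks fingerprinting sketch (entry 14): pick local
# spectrogram peaks, hash peak pairs, then match a query by counting
# hash collisions against a database track.
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import maximum_filter

def fingerprint(audio, fs, fan_out=5):
    f, t, sxx = spectrogram(audio, fs=fs, nperseg=512,
                            noverlap=256, nfft=512)
    # Local maxima of the magnitude spectrogram are candidate peaks.
    peaks_mask = (sxx == maximum_filter(sxx, size=20)) & (sxx > sxx.mean())
    fi, ti = np.nonzero(peaks_mask)
    order = np.argsort(ti)
    fi, ti = fi[order], ti[order]
    hashes = {}
    for i in range(len(ti)):
        for j in range(i + 1, min(i + 1 + fan_out, len(ti))):
            dt = ti[j] - ti[i]
            if dt > 0:
                hashes[(fi[i], fi[j], dt)] = ti[i]  # hash -> anchor time
    return hashes

def match_score(query_hashes, db_hashes):
    """Count how many query hashes also occur in a database track."""
    return sum(1 for h in query_hashes if h in db_hashes)
```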
15. Domestic pig sound classification based on TransformerCNN.
- Author
- Liao, Jie, Li, Hongxiang, Feng, Ao, Wu, Xuan, Luo, Yuanjiang, Duan, Xuliang, Ni, Ming, and Li, Jun
- Subjects
- ANIMAL sounds, SWINE, DEEP learning, ARTIFICIAL intelligence, FEATURE extraction, EMOTIONAL state, MANUFACTURING processes
- Abstract
Modern information technology has demonstrated excellent performance in challenging agricultural production processes, especially where artificial intelligence methods are used to improve production environments. However, most existing work uses visual methods that train models on image features of animals to analyze their behavior, which may not be truly intelligent. Because vocal animals transmit information through grunts, information obtained directly from the grunts of pigs is more useful for understanding their behavior and emotional state, which is important for monitoring and predicting the health conditions and abnormal behavior of pigs. We propose a sound classification model called TransformerCNN, which combines the advantages of CNN spatial feature representation and Transformer sequence encoding to form a powerful global feature perception and local feature extraction capability. Through detailed qualitative and quantitative evaluations, and by comparing state-of-the-art traditional animal sound recognition methods with deep learning methods, we demonstrate the advantages of our approach for classifying domestic pig sounds. Accuracy, AUC and recall for domestic pig sound recognition were 96.05%, 98.37% and 90.52%, respectively, all higher than those of the comparison models. In addition, the model has good robustness and generalization capability, with low variation in performance across different input features. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Acoustic Based Emergency Vehicle Detection Using Ensemble of deep Learning Models.
- Author
- Mittal, Usha and Chawla, Priyanka
- Subjects
- DEEP learning, ARTIFICIAL neural networks, EMERGENCY vehicles, RECURRENT neural networks, CONVOLUTIONAL neural networks, FEATURE extraction
- Abstract
Sound events possess temporal and spectral structure in the time-frequency domain, and analyzing and classifying acoustic environments from sound recordings is an emerging research area. Convolutional layers can quickly extract high-level, shift-invariant features from the time-frequency domain. In this work, emergency vehicle detection (EVD) for vehicles such as fire engines, ambulances, and police cars is performed based on their siren sounds. A dataset was collected from the Google AudioSet ontology, and features were extracted using Mel-frequency cepstral coefficients (MFCC). Three deep neural network (DNN) models (dense-layer, convolutional neural network (CNN) and recurrent neural network (RNN)) with different configurations and parameters were investigated. An ensemble model was then designed from the best-performing models, selected through experiments over various configurations with hyper-parameter tuning. The proposed ensemble model provides the highest accuracy of 98.7%, while the recurrent neural network (RNN) model provides an accuracy of 94.5%. The deep learning models are also compared against various machine learning models such as perceptron, SVM, and decision trees. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
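The MFCC front end and probability-averaging ensemble of entry 16 can be sketched as follows; the MFCC count, the clip-level summary statistic, and the class layout are assumptions, and the member models are treated as opaque scikit-learn-style classifiers.

```python
# Sketch of entry 16's feature and ensembling steps: MFCCs per clip,
# then averaged class posteriors across several trained models.
import librosa
import numpy as np

def mfcc_features(path, sr=22050, n_mfcc=40):
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)        # simple clip-level summary (assumed)

def ensemble_predict(models, features):
    """Average the class-probability outputs of the member models."""
    probs = np.mean([m.predict_proba([features])[0] for m in models], axis=0)
    return int(np.argmax(probs))    # e.g. 0=siren, 1=horn, 2=noise
```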
17. Shearer drum load identification method based on audio recognition
- Author
- ZHUANG Deyu
- Subjects
- shearer, drum load identification, audio recognition, coal-rock interface recognition, dynamic energy normalization, maximum dissimilarity coefficient, Mining engineering. Metallurgy, TN1-997
- Abstract
Existing shearer drum load identification methods suffer from algorithms that are difficult to implement, complex engineering deployment, and high application difficulty. By analyzing the characteristics of the audio signal during shearer operation, a drum load identification method based on audio recognition is proposed. To ensure that the audio signal in each analysis period reflects the same load condition under the same operating standard, the cutting current and traction speed are introduced as variables into the dynamic energy calculation, and a dynamic energy normalization algorithm (DENA) is used to normalize the original audio signal of the shearer. The normalized signal is compared against a library of signals recorded under standard operating conditions, and the difference between the two is judged by the maximum dissimilarity coefficient, which determines the characteristics of the drum load and enables its identification. Test results show that DENA can effectively suppress the noise energy in the audio signal and improve the resolution of key characteristic values. The characteristic parameters of the audio signal are clearly separated when the shearer cuts coal versus rock, with no cross-aliasing. Under ideal conditions, that is, when the maximum dissimilarity coefficient is less than 0.189, the total coal-rock interface recognition rate reaches 78.6%.
- Published
- 2022
- Full Text
- View/download PDF
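Entry 17's abstract does not give the DENA or dissimilarity formulas, so the following is only a guess at the general shape of the method: frame energy normalized by the operating point (cutting current, traction speed), then a normalized-distance comparison against a standard-condition library with the reported 0.189 decision bound.

```python
# Loose illustration of the idea behind entry 17. The formulas below
# are guesses at the general shape of the method, not the paper's
# actual DENA algorithm or dissimilarity coefficient.
import numpy as np

def normalized_energy(frame, cutting_current, traction_speed):
    # Assumed form: raw frame energy scaled by the operating point.
    return np.sum(frame ** 2) / (cutting_current * traction_speed + 1e-9)

def max_dissimilarity(candidate, library):
    """Smallest normalized distance to any standard-condition profile."""
    dists = [np.linalg.norm(candidate - ref) / (np.linalg.norm(ref) + 1e-9)
             for ref in library]
    return min(dists)

# if max_dissimilarity(profile, coal_library) < 0.189: classify as coal
```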
18. Incremental multiclass open-set audio recognition.
- Author
- Jleed, Hitham and Bouchard, Martin
- Subjects
- MACHINE learning, SUPPORT vector machines, RECOGNITION (Psychology)
- Abstract
Incremental learning aims to learn new classes as they emerge while maintaining performance on previously known classes, acquiring useful information from incoming data to update existing models. Open-set recognition, in addition, requires the ability to recognize examples from known classes and reject examples from new/unknown classes. There are two main challenges in this matter. First, new class discovery: the algorithm must not only recognize known classes but also detect unknown ones. Second, model extension: after new classes are identified, the model needs to be updated. Focusing on this matter, we introduce incremental open-set multiclass support vector machine algorithms that can classify examples from seen/unseen classes, using incremental learning to extend the current model with new classes without entirely retraining the system. Comprehensive evaluations are carried out on both open-set recognition and incremental learning. For open-set recognition, we adopt an openness test that examines the effectiveness of a varying number of known/unknown labels. For incremental learning, we adapt the model to detect a single novel class in each incremental phase and update the model with the unknown classes. Experimental results show promising performance for the proposed methods compared with representative previous methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. Drone Audio recognition based on Machine Learning Techniques.
- Author
- Lacava, Giovanni, Mercaldo, Francesco, Martinelli, Fabio, Santone, Antonella, and Pizzi, Mario
- Subjects
- DRONE aircraft
- Abstract
Unmanned aerial vehicles (drones) are widely available on today's market, both for hobby use and to meet specific industrial needs such as agriculture or parcel transport. This intensive use in various areas has also led to an increase in their illegitimate and criminal use, which has drawn interest in identification and detection techniques in the academic world. To address this topic, we propose a method that automates drone detection using their acoustic characteristics with different machine learning algorithms. In this study, we show the advantage of using machine learning techniques for drone detection. Experimental analysis demonstrates that the proposed method is promising in drone detection. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
20. Shearer drum load identification method based on audio recognition.
- Author
- 庄德玉
- Published
- 2022
- Full Text
- View/download PDF
21. Audio-Vision Emergency Vehicle Detection.
- Author
- Tran, Van-Thuan and Tsai, Wei-Ho
- Abstract
Emergency vehicles (EVs) are permitted to travel at high speed to quickly reach the destination with the aid of audible and visual warning signals, and other road users are required to clear the path for EVs. However, car drivers may sometimes be unaware of nearby EVs, leading to delay in response or even traffic collisions. This work proposes audio-based and vision-based EV detection systems (EVD) that can detect EVs and alert car drivers to respond appropriately. First, we propose a modified YOLO model tailored to the EVD problem, namely YOLO-EVD, and develop a novel image dataset for vision-based EVD (V-EVD). We utilize cross-stage partial connections at the YOLO-EVD’s neck to enhance the detection performance, in which YOLO-EVD achieves 95.5% mean average precision that is better than those of the other single-stage object detectors. Second, we propose WaveResNet, an end-to-end convolutional neural network, for audio-based EVD (A-EVD) based on the classification of siren sound and traffic noise. With raw waveform input of at least one second, the WaveResNet attains high accuracies, at above 98% in normal traffic, and is robust to noise. Both YOLO-EVD and WaveResNet meet the real-time operation requirement. Also, we integrate YOLO-EVD and WaveResNet to develop a prototype of the audio-vision EVD system (AV-EVD) that is a novel approach in the literature of the EVD problem. Our experiments show the promising results of the AV-EVD system as it produces a low misdetection rate of 1.54%. The proposed A-EVD, V-EVD, and AV-EVD systems can be applied to provide safety functions for private cars, self-driving cars, or smart road infrastructure. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
22. Open Set Audio Recognition for Multi-Class Classification With Rejection
- Author
- Hitham Jleed and Martin Bouchard
- Subjects
- Open-set recognition, audio recognition, sound event recognition, multi-class classification, support vector machine, peak side ratio, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Most supervised audio recognition systems developed to date use a testing set that includes the same categories as the training database; such systems are called closed-set recognition (CSR) systems. However, audio recognition in real applications can be more complicated: datasets can be dynamic, and novel categories can ceaselessly be detected. Hence, in practice, the usual methods will assign these novel classes labels that are often incorrect. This work investigates audio open-set recognition (OSR) suitable for multi-class classification, with a rejection option for classes never seen by the system. A probabilistic calibration of a support vector machine classifier is utilized and formulated under the open-set scenario. To this end, a threshold technique called the peak side ratio (PSR) is applied to the audio recognition task. A candidate label is first examined by a Platt-calibrated support vector machine (SVM) to produce posterior probabilities. The PSR is then used to characterize the distribution of posterior probability values; this helps determine a threshold for rejecting or accepting a particular class. Our proposed method is evaluated on different variations of open sets using well-known metrics. Experimental results reveal that it outperforms previous OSR approaches over a wide range of openness values.
- Published
- 2020
- Full Text
- View/download PDF
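The rejection rule of entry 22 can be approximated with scikit-learn: SVC(probability=True) performs Platt scaling, and a flat posterior distribution signals an unknown class. The peak side ratio below is a crude top-over-runner-up proxy; the paper's exact PSR definition may differ.

```python
# Sketch of an open-set decision rule in the spirit of entry 22:
# Platt-calibrated SVM posteriors, reject when the distribution is
# too flat. The PSR proxy and threshold value are assumptions.
import numpy as np
from sklearn.svm import SVC

svm = SVC(kernel="rbf", probability=True)  # Platt scaling built in
# svm.fit(X_train, y_train)  # fit on the known (seen) classes first

def classify_or_reject(svm, x, psr_threshold=2.0):
    probs = svm.predict_proba([x])[0]
    top2 = np.sort(probs)[::-1][:2]
    psr = top2[0] / (top2[1] + 1e-9)       # crude peak-side-ratio proxy
    if psr < psr_threshold:
        return "unknown"                   # likely an unseen class
    return svm.classes_[np.argmax(probs)]
```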
23. Acoustic-Based Emergency Vehicle Detection Using Convolutional Neural Networks
- Author
- Van-Thuan Tran and Wei-Ho Tsai
- Subjects
- Audio recognition, convolutional neural networks, emergency vehicle detection, siren sounds, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
This work investigates how to detect emergency vehicles such as ambulances, fire engines, and police cars based on their siren sounds. Recognizing that car drivers may sometimes be unaware of the siren warnings from the emergency vehicles, especially when in-vehicle audio systems are used, we propose to develop an automatic detection system that determines whether there are siren sounds from emergency vehicles nearby to alert other vehicles' drivers to pay attention. A convolutional neural network (CNN)-based ensemble model (SirenNet) with two network streams is designed to classify sounds of traffic soundscape to siren sounds, vehicle horns, and noise, in which the first stream (WaveNet) directly processes raw waveform, and the second one (MLNet) works with a combined feature formed by MFCC (Mel-frequency cepstral coefficients) and log-mel spectrogram. Our experiments conducted on a diverse dataset show that the raw data can complement the MFCC and log-mel features to achieve a promising accuracy of 98.24% in the siren sound detection. In addition, the proposed system can work very well with variable input length. Even for short samples of 0.25 seconds, the system still achieves a high accuracy of 96.89%. The proposed system could be helpful for not only drivers but also autopilot systems.
- Published
- 2020
- Full Text
- View/download PDF
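MLNet in entry 23 works on MFCCs combined with a log-mel spectrogram. A plausible construction of that combined feature with librosa is sketched below; the sample rate and the band/coefficient counts are assumptions.

```python
# Sketch of a combined MFCC + log-mel feature, as used by MLNet in
# entry 23. All parameter values are assumptions.
import librosa
import numpy as np

y, sr = librosa.load("traffic_clip.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
log_mel = librosa.power_to_db(mel)                    # (40, n_frames)
mfcc = librosa.feature.mfcc(S=log_mel, n_mfcc=40)     # (40, n_frames)
combined = np.concatenate([log_mel, mfcc], axis=0)    # (80, n_frames)
```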
24. Emotion Recognition in Sound
- Author
- Popova, Anastasiya S., Rassadin, Alexandr G., Ponomarenko, Alexander A., Kacprzyk, Janusz, Series editor, Kryzhanovsky, Boris, editor, Dunin-Barkowski, Witali, editor, and Redko, Vladimir, editor
- Published
- 2018
- Full Text
- View/download PDF
25. Ambient Sound Recognition of Daily Events by Means of Convolutional Neural Networks and Fuzzy Temporal Restrictions.
- Author
- Polo-Rodriguez, Aurora, Vilchez Chiachio, Jose Manuel, Paggetti, Cristiano, and Medina-Quero, Javier
- Subjects
- FUZZY neural networks, CONVOLUTIONAL neural networks, TIME-varying networks, STREAMING audio, KALMAN filtering
- Abstract
The use of multimodal sensors to describe activities of daily living in a noninvasive way is a promising research field in continuous development. In this work, we propose the use of ambient audio sensors to recognise events which are generated from the activities of daily living carried out by the inhabitants of a home. An edge–fog computing approach is proposed to integrate the recognition of audio events with smart boards where the data are collected. To this end, we compiled a balanced dataset which was collected and labelled in controlled conditions. A spectral representation of sounds was computed using convolutional network inputs to recognise ambient sounds with encouraging results. Next, fuzzy processing of audio event streams was included in the IoT boards by means of temporal restrictions defined by protoforms to filter the raw audio event recognition, which are key in removing false positives in real-time event recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
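The fuzzy temporal restriction idea of entry 25 (accept an audio event only when it has been observed recently and persistently) can be illustrated with a toy sliding-window filter; the window length, membership function, and protoform below are invented for illustration, not the paper's definitions.

```python
# Toy fuzzy temporal filter in the spirit of entry 25: suppress raw
# audio events unless they recur within a sliding window. The fuzzy
# membership function here is invented for illustration.
import time

class FuzzyEventFilter:
    def __init__(self, window_s=10.0, min_degree=0.6):
        self.window_s = window_s
        self.min_degree = min_degree
        self.history = {}              # event label -> list of timestamps

    def observe(self, label, now=None):
        now = time.time() if now is None else now
        hist = self.history.setdefault(label, [])
        hist.append(now)
        # Drop observations outside the sliding window.
        self.history[label] = [t for t in hist if now - t <= self.window_s]
        # Fuzzy degree: more recent repetitions -> higher membership.
        degree = min(1.0, len(self.history[label]) / 3.0)
        return degree >= self.min_degree   # accept or suppress the event
```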
26. AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition.
- Author
- Lau, Kin Wai, Rehman, Yasar Abbas Ur, and Po, Lai-Man
- Subjects
- *CONVOLUTIONAL neural networks, *MEMORY
- Abstract
Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt, a single-stream architecture. Its basic building block breaks down the parallel multi-branch depth-wise convolutions with descending scales of k × k kernels into a cascade of two multi-branch depth-wise convolutions. The first multi-branch stage consists of parallel multi-scale 1 × k depth-wise convolutional layers, followed by a similar multi-branch stage employing parallel multi-scale k × 1 depth-wise convolutional layers. This reduces the computational and memory footprint while separating the time and frequency processing of Mel-Spectrograms. The large kernels capture global frequencies and long activities, while the small kernels capture local frequencies and short activities. We also reparameterize the multi-branch design during inference to further boost speed without losing accuracy. Experiments show that AudioRepInceptionNeXt reduces parameters and computations by over 50% and improves inference speed 1.28× over state-of-the-art CNNs like SlowFast while maintaining comparable accuracy. It also learns robustly across a variety of audio recognition tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
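Entry 26's factorized block (parallel multi-scale 1 × k depthwise convolutions followed by a matching k × 1 stage) is easy to sketch in Keras. The kernel sizes, the use of addition to merge branches, and the channel counts are assumptions; the paper's inference-time reparameterization step is omitted.

```python
# Sketch of a factorized multi-branch depthwise block in the spirit of
# entry 26: 1 x k convs along time, then k x 1 convs along frequency.
import tensorflow as tf

def rep_inception_block(x, kernel_sizes=(3, 7, 11)):
    # Stage 1: parallel 1 x k depthwise convs (along the time axis).
    branches = [
        tf.keras.layers.DepthwiseConv2D((1, k), padding="same")(x)
        for k in kernel_sizes
    ]
    x = tf.keras.layers.Add()(branches)
    x = tf.keras.layers.Activation("gelu")(x)
    # Stage 2: parallel k x 1 depthwise convs (along the frequency axis).
    branches = [
        tf.keras.layers.DepthwiseConv2D((k, 1), padding="same")(x)
        for k in kernel_sizes
    ]
    x = tf.keras.layers.Add()(branches)
    return tf.keras.layers.Activation("gelu")(x)

inp = tf.keras.layers.Input(shape=(128, 400, 32))  # mel bins x frames x ch
out = rep_inception_block(inp)
```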
27. Lightning-generated Whistlers recognition for accurate disaster monitoring in China and its surrounding areas based on a homologous dual-feature information enhancement framework.
- Author
- Wang, Zijie, Yi, Jizheng, Yuan, Jing, Hu, Ronglong, Peng, Xiangji, Chen, Aibin, and Shen, Xuhui
- Subjects
- *RECOGNITION (Psychology), *SPACE environment, *BLOCK ciphers, *SOURCE code, *DISASTERS, *NATURAL disasters
- Abstract
Natural disasters, such as earthquakes and volcanic eruptions, pose a significant threat to Earth's biodiversity and ecological environment. The ability of Lightning-generated Whistlers (LWs) to foresee these events is invaluable. However, accurate recognition of LWs is hindered by spatial environmental interference and a lack of comprehensive information. This study proposes a novel framework called the Dual-feature Information Enhancement Framework (DIEF), with three versions: the cost-effective DIEF-B, the highly accurate DIEF-M, and the lightweight yet efficient DIEF-T. The framework integrates homologous dual features and mitigates space-environment effects for LW recognition. Specifically, the Dual-feature Information Enhancement (DIE) module, which is based on Transformers, merges the waveform signal with the time-frequency spectrum of LWs to enhance the information representation within the feature space. In addition, Multi-scale Feature Integration (MFI) is designed to address the challenge of recognizing faint LWs in the waveform signal. To correct errors in time-frequency spectrum recognition caused by space environmental interference, Mel-scale Frequency Cepstral Coefficients (MFCCs) are adopted to enhance the waveform signal features. Long-distance dependencies between signals are then captured by a Bi-directional Long Short-Term Memory (BiLSTM) network. Finally, an efficient Lightning-generated Whistlers Classifier (LWC) is developed. Numerous tests demonstrate the excellent performance and robustness of the DIEF series, which achieves 99.30% recognition accuracy on a dataset of 10,200 LW segments acquired by the Zhangheng-1 (ZH-1) satellite. The DIEF series also achieves 95.27% accuracy in audio recognition on the UrbanSound8k dataset, better than most current methods. Our framework can quickly and accurately recognize valuable LW events in an interference-laden environment, thereby benefiting global natural disaster monitoring. Source code is available at https://github.com/KotlinWang/DIEF. • Created with 10,200 lightning-generated whistlers from Zhangheng-1 datasets. • A homologous dual-feature information enhancement algorithm is proposed. • Three versions of the Dual-feature Information Enhancement Framework (DIEF) are developed. • Two different types of datasets demonstrate the good robustness of our DIEF. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Hearing to the Unseen: AudioMoth and BirdNET as a Cheap and Easy Method for Monitoring Cryptic Bird Species
- Author
- Universidad de Alicante. Departamento de Ecología, Bota, Gerard, Manzano-Rubio, Robert, Catalán, Lidia, Gómez-Catasús, Julia, and Pérez-Granados, Cristian
- Abstract
The efficient analysis of sound recordings obtained through passive acoustic monitoring (PAM) can be challenging owing to the vast amount of data collected with this technique. Species-specific acoustic recognizers (e.g., developed through deep learning) may reduce the time required to analyze sound recordings but are often difficult to create. Here, we evaluate the effectiveness of BirdNET, a machine learning tool freely available for automated recognition and acoustic data processing, for correctly identifying and detecting two cryptic forest bird species. BirdNET precision was high for both the Coal Tit (Periparus ater) and the Short-toed Treecreeper (Certhia brachydactyla), with mean values of 92.6% and 87.8%, respectively. Using the default values, BirdNET successfully detected the Coal Tit and the Short-toed Treecreeper in 90.5% and 98.4% of the annotated recordings, respectively. We also tested the impact of variable confidence scores on BirdNET performance and estimated the optimal confidence score for each species. Vocal activity patterns of both species, obtained using PAM and BirdNET, reached their peak during the first two hours after sunrise. We hope that our study may encourage researchers and managers to utilize this user-friendly and ready-to-use software, thus contributing to advancements in acoustic sensing and environmental monitoring.
- Published
- 2023
29. Ambient Sound Recognition of Daily Events by Means of Convolutional Neural Networks and Fuzzy Temporal Restrictions
- Author
- Aurora Polo-Rodriguez, Jose Manuel Vilchez Chiachio, Cristiano Paggetti, and Javier Medina-Quero
- Subjects
- activity recognition, audio recognition, fuzzy protoforms, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
The use of multimodal sensors to describe activities of daily living in a noninvasive way is a promising research field in continuous development. In this work, we propose the use of ambient audio sensors to recognise events which are generated from the activities of daily living carried out by the inhabitants of a home. An edge–fog computing approach is proposed to integrate the recognition of audio events with smart boards where the data are collected. To this end, we compiled a balanced dataset which was collected and labelled in controlled conditions. A spectral representation of sounds was computed using convolutional network inputs to recognise ambient sounds with encouraging results. Next, fuzzy processing of audio event streams was included in the IoT boards by means of temporal restrictions defined by protoforms to filter the raw audio event recognition, which are key in removing false positives in real-time event recognition.
- Published
- 2021
- Full Text
- View/download PDF
30. TimeScaleNet: A Multiresolution Approach for Raw Audio Recognition Using Learnable Biquadratic IIR Filters and Residual Networks of Depthwise-Separable One-Dimensional Atrous Convolutions.
- Author
- Bavu, Eric, Ramamonjy, Aro, Pujol, Hadrien, and Garcia, Alexandre
- Abstract
In this paper, we show the benefit of a multi-resolution approach that encodes the relevant information contained in unprocessed time-domain acoustic signals. TimeScaleNet aims at learning an efficient representation of a sound by learning time dependencies both at the sample level and at the frame level. The proposed approach improves the interpretability of the learning scheme by unifying advanced deep learning and signal processing techniques. In particular, TimeScaleNet's architecture introduces a new form of recurrent neural layer that is directly inspired by digital infinite impulse-response (IIR) signal processing. This layer acts as a learnable passband biquadratic digital IIR filterbank. The learnable filterbank builds a time-frequency-like feature map that self-adapts to the specific recognition task and dataset, with a large receptive field and very few learnable parameters. The resulting frame-level feature map is then processed using a residual network of depthwise-separable atrous convolutions. This second scale of analysis efficiently encodes relationships between the time fluctuations at the frame timescale, in different learnt pooled frequency bands, in the range of 20 ms to 200 ms. TimeScaleNet is tested on both the Speech Commands Dataset and the ESC-10 Dataset. We report a high mean accuracy of 94.87 ± 0.24% (macro-averaged F1-score: 94.9 ± 0.24%) for speech recognition, and a rather moderate accuracy of 69.71 ± 1.91% (macro-averaged F1-score: 70.14 ± 1.57%) for the environmental sound classification task. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
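TimeScaleNet's first stage (entry 30) is a learnable biquadratic IIR filterbank. As a fixed, non-learnable stand-in, the sketch below applies a bank of second-order peak filters to a raw waveform to produce a time-frequency-like feature map; the center frequencies and Q factor are illustrative, and the learnable version in the paper adapts these per task.

```python
# Fixed stand-in for the learnable biquad IIR filterbank of entry 30:
# each band is one second-order (biquad) peak filter applied to the
# raw waveform. Center frequencies and Q are illustrative.
import numpy as np
from scipy.signal import iirpeak, lfilter

def biquad_bank(audio, fs, centers_hz=(200, 500, 1000, 2000, 4000), q=5.0):
    """Return one filtered channel per band: shape (n_bands, n_samples)."""
    channels = []
    for f0 in centers_hz:
        b, a = iirpeak(f0, Q=q, fs=fs)   # one biquad (second-order) section
        channels.append(lfilter(b, a, audio))
    return np.stack(channels)

fs = 16000
audio = np.random.randn(fs)              # 1 s of placeholder audio
fmap = biquad_bank(audio, fs)            # (5, 16000) feature map
```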
31. A novel study for depression detecting using audio signals based on graph neural network.
- Author
- Sun, Chenjian, Jiang, Min, Gao, Linlin, Xin, Yu, and Dong, Yihong
- Subjects
- VIRTUAL networks, MENTAL illness, DEEP learning, REPRESENTATIONS of graphs, EMOTION recognition, MENTAL depression
- Abstract
Depression is a prevalent mental health disorder, and the absence of specific biomarkers makes clinical diagnosis highly subjective and a definitive diagnosis difficult. Recently, deep learning methods have shown promise for depression detection. However, current methods tend to focus solely on the connections within or between audio signals, limiting the model's ability to recognize depression-related cues in audio signals and affecting its classification performance. To address these limitations, we propose a graph neural network approach for depression recognition that incorporates potential connections both within and between audio signals. Specifically, we first use a gated recurrent unit (GRU) to extract time-series information from the frame-level features of audio signals. We then construct two graph neural network modules sequentially to explore the potential connections within and between audio signals. The first module constructs a graph using the frame-level features of each audio sample as nodes; after the graph convolution layers, the output is a graph-embedded feature vector representation. The output graph-embedding feature vectors of the first module are then used as the nodes of the second graph network, and the internal relationship between audio signals is encoded through node neighborhood information propagation. In addition, we use a pre-trained emotion recognition network to extract emotional features that are highly correlated with depression. By further strengthening the connection weights among nodes in the second graph network through a self-attention mechanism, relevant cues are provided for the model to complete depression detection from audio signals. We conducted extensive experiments on three depression datasets: DAIC-WOZ, MODMA, and D-Vlog. The proposed model achieves better results on several performance evaluation metrics, such as accuracy, F1-score, precision, and recall, than all compared algorithms, validating its effectiveness. • We introduce a novel GNN model for detecting depression from audio signals. • Our model explores the inter-class variability and intra-class consistency of audio signals. • We use a self-attention mechanism to fuse depression-related emotional features. • The proposed model achieves better results on three depression datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. IMPACT - Intelligent Memory Pool Assisted Cognition Tool: A Cueing Device for the Memory Impaired
- Author
- Naves, Samuel Cyril, Wyld, David C., editor, Wozniak, Michal, editor, Chaki, Nabendu, editor, Meghanathan, Natarajan, editor, and Nagamalai, Dhinaharan, editor
- Published
- 2011
- Full Text
- View/download PDF
33. Application of Discriminant Analysis to Distinction of Musical Instruments on the Basis of Selected Sound Parameters
- Author
- Wieczorkowska, Alicja, Kubik-Komar, Agnieszka, Kacprzyk, Janusz, editor, Cyran, Krzysztof A., editor, Kozielski, Stanisław, editor, Peters, James F., editor, Stańczyk, Urszula, editor, and Wakulicz-Deja, Alicja, editor
- Published
- 2009
- Full Text
- View/download PDF
34. A Study on the Sound Recognition Method of Autonomous Vehicle using CNN
- Author
- Kim, Taeho, Yoo, Minhyeok, Shin, Dae Kyeon, Park, Gooman, and Kim, Seongkweon
- Subjects
- Audio Recognition, Mel-spectrogram, Deep Learning, Autonomous Vehicle, Convolutional Neural Network
- Abstract
In this paper, a study of an algorithm that recognizes and judges sound sources using a convolutional neural network (CNN) is introduced. It is assumed that multiple microphones are attached to receive sound information. The received sound information is converted to visual information with the Mel-spectrogram, which expands 1-dimensional sound information into 2-dimensional information. However, reducing n_mels shortens the extraction time at the cost of a lower-resolution image and poorer performance as training data. A value of n_mels = 64 is suggested to minimize the Mel-spectrogram extraction time, because this algorithm is intended for use in an autonomous vehicle. In computational experiments, 95% accuracy was obtained with the CNN.
- Published
- 2022
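The n_mels trade-off discussed in entry 34 is directly visible in librosa: fewer mel bands compute faster but give the CNN a coarser image. The file name, sample rate, and FFT settings below are assumptions.

```python
# Mel-spectrogram extraction with n_mels=64, as suggested in entry 34.
import librosa
import numpy as np

y, sr = librosa.load("vehicle_sound.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=512, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)   # 2D image for the CNN
print(mel_db.shape)   # (64, n_frames)
```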
35. Piano Note Recognition: Classification Aided by Convolutional Neural Networks
- Author
- Linares Pellicer, Jordi Joan, Jakonen, Ismo, Girbés Mínguez, Juan, Universitat Politècnica de València. Escuela Politécnica Superior de Alcoy - Escola Politècnica Superior d'Alcoi, Universitat Politècnica de València. Instituto Universitario Valenciano de Investigación en Inteligencia Artificial - Institut Universitari Valencià de Recerca en Intel·ligència Artificial, and Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
- Abstract
[ES] A task that humans can perform with ease is the recognition of piano notes, so a Convolutional Neural Network was used to tackle this problem. Training was done on a dataset generated by software, and the problem was also extended to cases with random background sound. Since the dataset is very robust and the task is relatively simple, the objective was amply achieved: in different tests with different datasets, the network always predicts with an accuracy better than 95%. [EN] Artificial Neural Networks have changed how we solve problems that seemed unsolvable. One such problem is audio recognition. The aim of this thesis was to recognize piano musical notes using neural networks. The task chosen was Note Identification, a task of which most humans are not capable. It was tackled using a Convolutional Neural Network, as a supervised learning problem. The dataset was generated specially for this work, and in some cases it was mixed with background rain noise; the work was even expanded to the recognition of chords. Neural network efficiency was quantified as the recognition accuracy on unseen data. A Convolutional Neural Network was created that recognized piano notes, with 100% accuracy for one note and 96.94% for 3-note chords with background noise. In conclusion, a Convolutional Neural Network allows the recognition of notes in environments with background noise, and this approach could be used for more complex recognition, including other instruments or sounds.
- Published
- 2022
36. Audio Recognition in Incremental Open-set Environments
- Author
- Jleed, Hitham
- Subjects
- Audio Recognition, Incremental Learning, Open-set recognition
- Abstract
Machine learning algorithms have shown their ability to tackle difficult recognition problems, but they are still rife with challenges. Among these is how to deal with problems where new categories constantly occur and datasets can dynamically grow. Most contemporary learning algorithms developed to this point are governed by the assumption that all testing data classes must be the same as the training data classes, often with equal distribution. Under these assumptions, machine learning algorithms can perform very well, using their ability to handle large feature spaces and classify outliers. Systems built under these assumptions are called closed-set recognition (CSR) systems. However, these assumptions do not reflect practical applications, in which out-of-set data may be encountered, adversely affecting recognition performance. When samples from a new class occur, they will be classified as one of the known classes; even if such a sample is far from any of the training samples, the algorithm may classify it with high probability. That is, the algorithm will not only be wrong, but it may also be very confident in its result. A more practical problem is open-set recognition (OSR), where samples of classes not seen during training may show up at testing time. Two inherent problems arise: how the system can identify novel sound classes, and how it can update its models with new classes. This thesis highlights the problems of multi-class open-set recognition of sounds as well as incremental model adaptation, and proposes solutions towards addressing these problems. The proposed solutions are validated through extensive experiments and are shown to provide improved performance over a wide range of openness values for sound classification scenarios.
- Published
- 2022
- Full Text
- View/download PDF
37. Piano Note Recognition: Classification Aided by Convolutional Neural Networks
- Author
- Girbés Mínguez, Juan
- Subjects
- Artificial intelligence, Machine learning, Audio recognition, Piano notes, Neural network, Grado en Ingeniería Informática - Grau en Enginyeria Informàtica, LENGUAJES Y SISTEMAS INFORMATICOS
- Abstract
[ES] A task that humans can perform with ease is the recognition of piano notes, so a Convolutional Neural Network was used to tackle this problem. Training was done on a dataset generated by software, and the problem was also extended to cases with random background sound. Since the dataset is very robust and the task is relatively simple, the objective was amply achieved: in different tests with different datasets, the network always predicts with an accuracy better than 95%. [EN] Artificial Neural Networks have changed how we solve problems that seemed unsolvable. One such problem is audio recognition. The aim of this thesis was to recognize piano musical notes using neural networks. The task chosen was Note Identification, a task of which most humans are not capable. It was tackled using a Convolutional Neural Network, as a supervised learning problem. The dataset was generated specially for this work, and in some cases it was mixed with background rain noise; the work was even expanded to the recognition of chords. Neural network efficiency was quantified as the recognition accuracy on unseen data. A Convolutional Neural Network was created that recognized piano notes, with 100% accuracy for one note and 96.94% for 3-note chords with background noise. In conclusion, a Convolutional Neural Network allows the recognition of notes in environments with background noise, and this approach could be used for more complex recognition, including other instruments or sounds.
- Published
- 2022
38. Machine Hearing with the Help of Convolutional Neural Networks
- Author
- Bohl, Anna Tilly
- Subjects
- Machine Learning, Audio Recognition, Machine Hearing, Spectrogram, Speech Recognition, CNN
- Abstract
This thesis investigates how convolutional neural networks (CNNs) work and how they can be used for machine hearing tasks, despite originally being designed for computer vision. It also gives a general overview of artificial neural networks (ANNs) and their training. It discusses the possibilities of different preprocessing techniques that are used to visually represent audio data and pass it to CNNs. Contrary to prior assumptions, even raw time-domain waveforms have been found to work well as input data for CNNs, as shown by some of the work cited in this thesis. Possibilities for future work include the design of an ANN based on the human auditory cortex and, until then, deeper investigation of techniques for the visual representation of audio data.
- Published
- 2022
39. Erkennung von Audio Deepfakes mithilfe von Convolutional Neural Networks
- Author
-
Bohl, Anna Tilly
- Subjects
Deepfake ,Machine Learning ,Audio Recognition ,Künstliche Neuronale Netze ,Sprachsynthese ,Machine Hearing ,Spracherkennung ,Speech Synthesis ,Maschinelles Lernen ,CNN ,Feature Extraction - Abstract
This work explores the possibilities of detecting audio deepfakes using deep learning algorithms. Relevant background information on Convolutional Neural Networks (CNNs) and deepfake creation is given before state-of-the-art models for deepfake detection are presented. A model containing a shallow CNN is proposed and tested using a dataset made by the author. The proposed model does not work as intended on this dataset, but it still raises questions that may be answered in future work. A brief digression on feature extraction is made using the self-made dataset and a web application to compare feature maps of different layers of a CNN.
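Since the work compares feature maps captured at different CNN layers, the following is a minimal PyTorch sketch of that mechanism using forward hooks; the small network is a stand-in, not the author's model or web application.

```python
# Capture intermediate feature maps of a CNN with forward hooks.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)

feature_maps = {}

def capture(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()  # store activations per layer
    return hook

net[0].register_forward_hook(capture("conv1"))
net[2].register_forward_hook(capture("conv2"))

spec = torch.randn(1, 1, 64, 128)   # stand-in log-mel spectrogram
net(spec)
for name, fmap in feature_maps.items():
    print(name, tuple(fmap.shape))  # compare shapes / visualise channels
```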
- Published
- 2022
40. Fast Generation of 2D-Haar Acoustic Feature Supervectors (2D-Haar声学特征超向量快速生成方法).
- Author
-
谢尔曼, 罗森林, and 潘丽敏
- Published
- 2016
- Full Text
- View/download PDF
41. AIoT-based Audio Recognition System for Smart Home Applications
- Author
-
Tzu-Hsiung Chen, Ming-Hwa Sheu, Szu-Hong Wang, Bo-Wei Chen, Hao-Ting Pai, and Yu-Syuan Jhang
- Subjects
Raspberry pi ,Frequency conversion ,Noise measurement ,Computer science ,Analytics ,business.industry ,Home automation ,QUIET ,Speech recognition ,Audio recognition ,Line (text file) ,business - Abstract
In this paper, we design an audio recognition system, named the Audio Recognition System (ARS), to detect lighter-sound events. ARS is composed of an AIoT device (i.e., a Raspberry Pi), deep-learning-based analytics, and a real-time alarming advisory (e.g., LINE Notify). We conduct experiments with 8,000 observations. The results show that ARS achieves 97% accuracy in a quiet place and 94% accuracy in a noisy environment.
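A minimal sketch of the alerting stage described above: if the classifier flags a lighter sound, a message is pushed through LINE Notify. The model call, token, and threshold are placeholder assumptions; only the publicly documented LINE Notify endpoint is used.

```python
# Push an alert when the classifier's lighter-sound probability is high.
import requests

LINE_TOKEN = "YOUR_LINE_NOTIFY_TOKEN"  # hypothetical placeholder

def alert_if_lighter(probability: float, threshold: float = 0.9) -> None:
    if probability < threshold:
        return
    requests.post(
        "https://notify-api.line.me/api/notify",
        headers={"Authorization": f"Bearer {LINE_TOKEN}"},
        data={"message": f"Lighter sound detected (p={probability:.2f})"},
        timeout=5,
    )

# e.g. alert_if_lighter(model_probability_for_current_clip)
```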
- Published
- 2021
- Full Text
- View/download PDF
42. TimeScaleNet: A Multiresolution Approach for Raw Audio Recognition Using Learnable Biquadratic IIR Filters and Residual Networks of Depthwise-Separable One-Dimensional Atrous Convolutions
- Author
-
Eric Bavu, Alexandre Garcia, Aro Ramamonjy, Hadrien Pujol, Laboratoire de Mécanique des Structures et des Systèmes Couplés (LMSSC), and Conservatoire National des Arts et Métiers [CNAM] (CNAM)
- Subjects
Computer science ,Audio recognition ,Learnable Biquadratic filters ,01 natural sciences ,Convolution ,Separable space ,03 medical and health sciences ,Deep Learning ,0302 clinical medicine ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,0103 physical sciences ,Feature (machine learning) ,Electrical and Electronic Engineering ,Multiresolution ,030223 otorhinolaryngology ,010301 acoustics ,Infinite impulse response ,[SPI.ACOU]Engineering Sciences [physics]/Acoustics [physics.class-ph] ,Signal processing ,Artificial neural network ,business.industry ,Deep learning ,Machine hearing ,Time domain modelling ,Filter bank ,Signal Processing ,Artificial intelligence ,business ,Algorithm - Abstract
In this paper, we show the benefit of a multiresolution approach that allows us to encode the relevant information contained in unprocessed time-domain acoustic signals. TimeScaleNet aims at learning an efficient representation of a sound by learning time dependencies both at the sample level and at the frame level. The proposed approach improves the interpretability of the learning scheme by unifying advanced deep learning and signal processing techniques. In particular, TimeScaleNet's architecture introduces a new form of recurrent neural layer that is directly inspired by digital infinite impulse response (IIR) signal processing. This layer acts as a learnable passband biquadratic digital IIR filterbank, which builds a time-frequency-like feature map that self-adapts to the specific recognition task and dataset, with a large receptive field and very few learnable parameters. The obtained frame-level feature map is then processed using a residual network of depthwise-separable atrous convolutions. This second scale of analysis aims at efficiently encoding relationships between time fluctuations at the frame timescale, in different learnt pooled frequency bands, in the range of [20 ms; 200 ms]. TimeScaleNet is tested on both the Speech Commands dataset and the ESC-10 dataset. We report a high mean accuracy of $94.87 \pm 0.24\%$ (macro-averaged F1-score: $94.9 \pm 0.24\%$) for speech recognition, and a rather moderate accuracy of $69.71 \pm 1.91\%$ (macro-averaged F1-score: $70.14 \pm 1.57\%$) for the environmental sound classification task.
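The learnable biquadratic IIR filterbank is the distinctive first stage of the architecture; below is a minimal PyTorch sketch of one plausible rendering, using the standard RBJ band-pass coefficient formulas and torchaudio's lfilter for the recursion. The parameterisation (a learnable centre frequency and Q factor per filter) is an assumption, not the authors' code.

```python
# Sketch of a learnable biquadratic band-pass IIR filterbank layer.
import math
import torch
import torchaudio.functional as AF

class LearnableBiquadBandpass(torch.nn.Module):
    def __init__(self, n_filters: int):
        super().__init__()
        # Learnable centre frequencies (as a fraction of the sampling
        # rate) and Q factors, one per band-pass filter.
        self.centre = torch.nn.Parameter(torch.linspace(0.01, 0.45, n_filters))
        self.q = torch.nn.Parameter(torch.ones(n_filters))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time) raw waveform -> (batch, n_filters, time).
        bands = []
        for i in range(self.centre.shape[0]):
            w0 = 2 * math.pi * self.centre[i].clamp(1e-3, 0.499)
            q = torch.nn.functional.softplus(self.q[i]) + 1e-2
            alpha = torch.sin(w0) / (2 * q)
            # RBJ cookbook band-pass (0 dB peak gain) coefficients.
            b = torch.stack([alpha, torch.zeros_like(alpha), -alpha])
            a = torch.stack([1 + alpha, -2 * torch.cos(w0), 1 - alpha])
            bands.append(AF.lfilter(x, a, b, clamp=False))
        return torch.stack(bands, dim=1)

filterbank = LearnableBiquadBandpass(n_filters=8)
out = filterbank(torch.randn(2, 16000))  # -> (2, 8, 16000)
```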
- Published
- 2019
- Full Text
- View/download PDF
43. Ambient Sound Recognition of Daily Events by Means of Convolutional Neural Networks and Fuzzy Temporal Restrictions
- Author
-
Jose Manuel Vilchez Chiachio, Cristiano Paggetti, Javier Medina-Quero, and Aurora Polo-Rodriguez
- Subjects
Technology ,Computer science ,QH301-705.5 ,Speech recognition ,QC1-999 ,Ambient noise level ,02 engineering and technology ,Fuzzy logic ,Convolutional neural network ,Field (computer science) ,Activity recognition ,Raw audio format ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,activity recognition ,Biology (General) ,Instrumentation ,QD1-999 ,fuzzy protoforms ,Fluid Flow and Transfer Processes ,Event (computing) ,Process Chemistry and Technology ,Physics ,General Engineering ,020206 networking & telecommunications ,audio recognition ,Engineering (General). Civil engineering (General) ,Computer Science Applications ,Chemistry ,Filter (video) ,020201 artificial intelligence & image processing ,TA1-2040 - Abstract
The use of multimodal sensors to describe activities of daily living in a noninvasive way is a promising research field in continuous development. In this work, we propose the use of ambient audio sensors to recognise events generated by the activities of daily living carried out by the inhabitants of a home. An edge–fog computing approach is proposed to integrate the recognition of audio events with the smart boards where the data are collected. To this end, we compiled a balanced dataset which was collected and labelled under controlled conditions. A spectral representation of the sounds was computed and used as input to convolutional networks to recognise ambient sounds, with encouraging results. Next, fuzzy processing of audio event streams was included in the IoT boards by means of temporal restrictions defined by protoforms, which filter the raw audio event recognition and are key to removing false positives in real-time event recognition.
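A minimal sketch of a fuzzy temporal restriction over an event stream, in the spirit of protoform-based filtering: a raw detection is confirmed only if it has occurred persistently in the recent past. The window length, weight shape, and threshold are illustrative assumptions, not values from the paper.

```python
# Confirm an audio event only when recent raw detections support it.
from collections import deque

class FuzzyTemporalFilter:
    def __init__(self, window: int = 5, threshold: float = 0.6):
        self.history = deque(maxlen=window)  # recent raw confidences
        self.threshold = threshold

    def update(self, raw_confidence: float) -> bool:
        self.history.append(raw_confidence)
        n = len(self.history)
        # Linearly increasing weights: newer observations count more,
        # a simple membership function for "recently".
        weights = [(i + 1) / n for i in range(n)]
        degree = sum(w * c for w, c in zip(weights, self.history)) / sum(weights)
        return degree >= self.threshold  # True = confirmed event

# Usage: feed per-frame CNN confidences for one sound class.
f = FuzzyTemporalFilter()
for conf in [0.9, 0.2, 0.8, 0.85, 0.9]:
    print(f.update(conf))
```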
- Published
- 2021
44. Combining and Comparing Multiple Algorithms for Better Learning and Classification: A Case Study of MARF
- Author
-
Serguei Mokhov
- Subjects
Java ,business.industry ,Computer science ,Digital forensics ,Modular design ,Data structure ,Machine learning ,computer.software_genre ,Pipeline (software) ,Identification (information) ,ComputingMethodologies_PATTERNRECOGNITION ,Audio recognition ,Artificial intelligence ,business ,Design methods ,computer ,Algorithm ,computer.programming_language - Abstract
We presented an overview of MARF, a modular and extensible pattern recognition framework for a reasonably diverse spectrum of learning and recognition tasks. We outlined the pipeline and the data structures used in this open-source project in a practical manner, and provided some typical results one can obtain by running MARF's implementations on various learning and classification problems. 8.1 Advantages and disadvantages of the approach: The framework approach is both an advantage and a disadvantage. The advantage is obvious: a consistent and uniform environment and implementation platform for comparative studies with a plug-in architecture. However, as the number of algorithms grows, it becomes more difficult to adjust the framework's API itself without breaking all the modules that depend on it. The coverage of algorithms is only as good as the number of them implemented in, or contributed to, the project. In the results mentioned in Section 7, we could have attained better precision in some cases if better algorithm implementations had been available (or bugs in existing ones fixed).
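To illustrate the plug-in pipeline idea, here is a minimal sketch rendered in Python for brevity (MARF itself is Java); the class and method names are illustrative, not MARF's actual API.

```python
# A swappable preprocessing -> feature extraction -> classification pipeline.
from typing import Callable, List, Sequence

class Pipeline:
    def __init__(self,
                 preprocess: Callable[[Sequence[float]], Sequence[float]],
                 extract: Callable[[Sequence[float]], List[float]],
                 classify: Callable[[List[float]], str]):
        # Each stage is a plug-in module, so algorithms can be combined
        # and compared under one uniform environment.
        self.stages = (preprocess, extract, classify)

    def run(self, samples: Sequence[float]) -> str:
        pre, ext, cls = self.stages
        return cls(ext(pre(samples)))

# Swapping one stage changes the experiment without touching the rest:
normalize = lambda s: [x / (max(map(abs, s)) or 1.0) for x in s]
energy_features = lambda s: [sum(x * x for x in s) / len(s)]
threshold_classifier = lambda f: "speech" if f[0] > 0.1 else "silence"

p = Pipeline(normalize, energy_features, threshold_classifier)
print(p.run([0.0, 0.4, -0.7, 0.9]))
```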
- Published
- 2021
45. Slow-Fast Auditory Streams for Audio Recognition
- Author
-
Dima Damen, Andrew Zisserman, Arsha Nagrani, and Evangelos Kazakos
- Subjects
FOS: Computer and information sciences ,Sound (cs.SD) ,fusion ,Computer science ,Speech recognition ,Computer Vision and Pattern Recognition (cs.CV) ,channel capacity ,Computer Science - Computer Vision and Pattern Recognition ,02 engineering and technology ,Computer Science - Sound ,030218 nuclear medicine & medical imaging ,Convolution ,03 medical and health sciences ,Channel capacity ,conferences ,0302 clinical medicine ,Audio and Speech Processing (eess.AS) ,0202 electrical engineering, electronic engineering, information engineering ,FOS: Electrical engineering, electronic engineering, information engineering ,Audio recognition ,convolution ,visualization ,training ,action recognition ,multi-stream networks ,speech recognition ,audio recognition ,time-frequency analysis ,Time–frequency analysis ,Visualization ,Temporal resolution ,Spectrogram ,020201 artificial intelligence & image processing ,State (computer science) ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity, while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets, VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both. (Accepted for presentation at ICASSP 2021.)
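A minimal PyTorch sketch of the two-stream idea over log-mel spectrogram input, with one lateral connection fusing the fine-grained Fast pathway into the high-capacity Slow pathway; channel sizes, strides, and the fusion scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Slow-Fast style two-stream network over a spectrogram input.
import torch
import torch.nn as nn

class SlowFastAudio(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        # Slow pathway: many channels, coarse temporal stride.
        self.slow = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=(2, 1), padding=1),
            nn.ReLU(),
        )
        # Fast pathway: few channels, full temporal resolution.
        self.fast = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
        )
        # Lateral connection: downsample Fast in time to match Slow.
        self.lateral = nn.Conv2d(8, 16, kernel_size=3, stride=(2, 1), padding=1)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64 + 16, n_classes)
        )

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, time, mel_bins)
        s, f = self.slow(spec), self.fast(spec)
        fused = torch.cat([s, self.lateral(f)], dim=1)
        return self.head(fused)

logits = SlowFastAudio()(torch.randn(2, 1, 128, 64))  # -> (2, 10)
```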
- Published
- 2021
46. Audio recognition techniques: signal processing approaches with secure cloud storage
- Author
-
Murtadha Arif Bin Sahbudin
- Subjects
audio recognition ,signal processing ,fingerprint ,cloud storage ,digital music ,database management ,Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni - Published
- 2021
47. Piano Note Recognition: Classification Aided by Convolutional Neural Networks
- Author
-
Girbés Mínguez, Juan
- Subjects
Artificial intelligence ,Grado en Ingeniería Informática-Grau en Enginyeria Informàtica ,Machine learning ,Reconocimiento de audio ,Notas de piano ,Audio recognition ,Red neuronal ,Piano notes ,LENGUAJES Y SISTEMAS INFORMATICOS ,Inteligencia artificial ,Neural network - Abstract
[ES] A task that humans can perform with ease is the recognition of piano notes. For this reason, a Convolutional Neural Network was chosen to tackle this problem. Training was carried out on a software-generated dataset, and the problem was also extended to cases with random background sound. Since the dataset is very robust and relatively simple, the objective was met by a wide margin: in different tests with different datasets, the network always predicts with an accuracy better than 95%. [EN] Artificial Neural Networks have changed how we solve problems that once seemed unsolvable. One such problem is audio recognition. The aim of the thesis was to recognize piano musical notes using neural networks. The task chosen was note identification, a task most humans are not capable of. It was tackled using a Convolutional Neural Network, as a supervised learning problem. The dataset was generated specifically for this work and, in some cases, mixed with background rain noise; the task was even extended to the recognition of chords. Network performance was quantified as recognition accuracy on unseen data. The resulting Convolutional Neural Network recognized piano notes with 100% accuracy for single notes and 96.94% for 3-note chords with background noise. In conclusion, Convolutional Neural Networks allow the recognition of notes in environments with background noise, and this approach could be extended to more complex recognition tasks, including other instruments or sounds.
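A minimal sketch of generating labelled piano-like notes in software and mixing in background noise, in the spirit of the thesis's synthetic dataset; the harmonic recipe, decay envelope, and SNR are illustrative assumptions.

```python
# Software-generate labelled notes and mix in background noise.
import numpy as np

SR = 16000

def synth_note(midi: int, dur: float = 1.0) -> np.ndarray:
    f0 = 440.0 * 2 ** ((midi - 69) / 12)            # MIDI note -> frequency
    t = np.linspace(0, dur, int(SR * dur), endpoint=False)
    # A few decaying harmonics as a crude piano-like timbre.
    x = sum((0.5 ** k) * np.sin(2 * np.pi * f0 * (k + 1) * t) for k in range(4))
    return (x * np.exp(-3 * t)).astype(np.float32)  # percussive decay

def mix_noise(x: np.ndarray, snr_db: float = 10.0) -> np.ndarray:
    noise = np.random.randn(len(x)).astype(np.float32)  # stand-in for rain
    scale = np.sqrt(np.mean(x**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return x + scale * noise

# One octave of labelled (waveform, MIDI note) training examples.
dataset = [(mix_noise(synth_note(m)), m) for m in range(60, 72)]
```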
- Published
- 2021
48. Using Spasmodic Closure Patterns to Simplify Visual Voice Activity Detection
- Author
-
Ananth Goyal
- Subjects
Range (mathematics) ,Interval (music) ,Voice activity detection ,Movement (music) ,Computer science ,Speech recognition ,Audio recognition ,True positive rate - Abstract
While speaking, humans exhibit a number of recognizable patterns, most notably the repetitive movement of the mouth from closed to open. The following paper presents a novel method to computationally determine when video data contains a person speaking, through the recognition and tally of lip closures within a given interval. A combination of Haar-feature detection and eigenvectors is used to recognize when a target individual is present, but by detecting and quantifying spasmodic lip movements and comparing them to the ranges seen in true positives, we are able to predict when true speech occurs without the need for complex facial mappings. Although the results are within a reasonable accuracy range when compared to current methods, the comprehensibility and simplicity of the approach can reduce the strenuousness of current techniques and, if paired with synchronous audio recognition methods, can streamline the future of voice activity detection as a whole.
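A minimal OpenCV sketch of the counting idea: detect a face with a Haar cascade, track open/closed transitions of the mouth region over an interval, and flag speech when the closure rate falls in a plausible band. The dark-pixel mouth proxy and the 1-4 closures-per-second band are illustrative assumptions, not the paper's calibrated method.

```python
# Flag speech by counting mouth open->closed transitions in a video.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_open(gray, face) -> bool:
    x, y, w, h = face
    # Lower third of the face as a crude mouth region.
    roi = gray[y + 2 * h // 3 : y + h, x + w // 4 : x + 3 * w // 4]
    return (roi < 60).mean() > 0.08   # open mouths expose a dark cavity

def is_speaking(video_path: str, fps: float = 30.0) -> bool:
    cap = cv2.VideoCapture(video_path)
    closures, prev_open, frames = 0, False, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        if len(faces) == 0:
            continue
        now_open = mouth_open(gray, faces[0])
        if prev_open and not now_open:   # open -> closed transition
            closures += 1
        prev_open = now_open
    cap.release()
    seconds = max(frames / fps, 1e-6)
    return 1.0 <= closures / seconds <= 4.0  # plausible speech tempo
```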
- Published
- 2020
- Full Text
- View/download PDF
49. High-Precision Specific Audio Event Recognition Method Combining SVM and GMM.
- Author
-
LUO Sen-lin, WANG Kun, XIE Er-man, PAN Li-min, and LI Jin-yu
- Subjects
PATTERN recognition systems ,PRECISION (Information retrieval) ,SUPPORT vector machines ,GAUSSIAN mixture models ,COMPUTATIONAL complexity ,DATA analysis - Abstract
Recognition of short-duration audio events suffers from problems such as high time consumption and low recognition accuracy. In this paper, an audio event recognition method combining GMM and SVM is put forward. The method uses the statistical distribution description of GMM and the strong generalization ability of SVM, and fuses the individual recognition results of the two classifiers. Ten types of gunshots, from weapons such as handguns, rifles, and machine guns, were used as the experimental data. Compared with ordinary methods, which have to train 10 specific templates, one per gunshot type, the proposed method needs only 2 templates to fulfill the recognition task. Experimental results show that the proposed method yields an accuracy of 92.71%. Furthermore, because far fewer templates and training processes are required, the method is easy to implement and improves efficiency significantly with low algorithmic complexity. [ABSTRACT FROM AUTHOR]
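A minimal scikit-learn sketch of GMM/SVM score fusion for audio-event classification; the feature extraction, component count, and equal-weight fusion rule are illustrative assumptions rather than the paper's exact scheme.

```python
# Fuse per-class GMM likelihoods with SVM class probabilities.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def train(X, y, n_classes):
    gmms = [GaussianMixture(n_components=4, random_state=0).fit(X[y == c])
            for c in range(n_classes)]
    svm = SVC(probability=True, random_state=0).fit(X, y)
    return gmms, svm

def predict(gmms, svm, X):
    # Per-class GMM log-likelihoods, softmax-normalised to probabilities.
    ll = np.stack([g.score_samples(X) for g in gmms], axis=1)
    gmm_p = np.exp(ll - ll.max(axis=1, keepdims=True))
    gmm_p /= gmm_p.sum(axis=1, keepdims=True)
    svm_p = svm.predict_proba(X)
    return np.argmax(0.5 * gmm_p + 0.5 * svm_p, axis=1)  # fused decision

# Usage with toy features (e.g., per-clip MFCC means):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = (X[:, 0] > 0).astype(int)
gmms, svm = train(X, y, 2)
print((predict(gmms, svm, X) == y).mean())
```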
- Published
- 2014
50. Dual Stage Learning Based Dynamic Time-Frequency Mask Generation for Audio Event Classification
- Author
-
Hanseok Ko, David K. Han, Jaihyun Park, and Donghyeon Kim
- Subjects
Computer science ,Event (relativity) ,Speech recognition ,Audio recognition ,Learning based ,Dual stage ,Time–frequency analysis - Published
- 2020
- Full Text
- View/download PDF