1,680 results for "sound processing"
Search Results
2. Sound-Based Industrial Machine Malfunction Identification System by Deep Learning Approach.
- Author
- Prasetio, Barlian Henryranu and Syaifuddin, Tio
- Abstract
Malfunctions in industrial machines pose significant challenges and require timely identification to prevent disruptions in manufacturing processes. Traditional detection methods involve manual inspection by technicians, which is often inefficient. Nowadays, sound processing technology has been introduced as a potential way of identifying malfunctioning machines by integrating automated systems. This work offers a new system aimed at detecting improper functioning of industrial machines employing sound processing technology. The Mel-filterbank energy (MFE) technique is used for feature extraction because it has been proven to perform well in non-voice acoustic scenarios. The classification process incorporates Convolutional Neural Network (CNN) procedures to improve the accuracy of malfunction detection. The machine learning model is developed on the Edge Impulse platform and subsequently embedded in an ESP32 microcontroller, enabling real-time processing at the edge. To evaluate the system's performance, the publicly available Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset is utilized, comprising four distinct types of industrial machines: fan, valve, pump, and slide rail. Each machine type consists of two classes, normal and abnormal. The system is controlled and monitored through an Android smartphone application via Bluetooth communication. Experimental tests with 10 normal and 10 abnormal samples for each machine type demonstrate promising results: 80% accuracy for fan malfunctions, 75% for valve malfunctions, 95% for pump malfunctions, and 100% for slide rail malfunctions. The overall computing time is measured at 213 ms, highlighting the efficiency and real-time capabilities of the proposed system. This research provides an opportunity to discover various methods of industrial machine malfunction detection beyond conventional visual inspection. The combination of sound processing technology, MFE feature extraction and CNN classification on edge devices appears to be a more practical approach with possibilities in numerous industrial fields. [ABSTRACT FROM AUTHOR] (See the code sketch after this entry.)
- Published
- 2024
- Full Text
- View/download PDF
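The pipeline this entry describes (log-Mel filterbank energies feeding a small CNN for normal/abnormal classification) can be sketched in a few lines. The following is a minimal illustration, assuming librosa and PyTorch as stand-ins for the Edge Impulse toolchain the authors actually used; the sampling rate, network size, and the MIMII-style file path in the usage comment are assumptions, not details from the paper.

```python
# Hedged sketch: MFE front end plus a tiny CNN, in the spirit of the entry above.
import librosa
import torch
import torch.nn as nn

def mfe(path, sr=16000, n_mels=40):
    """Return a log Mel-filterbank energy 'image' of shape (n_mels, frames)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                       hop_length=512, n_mels=n_mels)
    return librosa.power_to_db(S)  # log energies, as MFE front ends use

class TinyCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))  # pools away the variable clip length
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, frames)
        return self.head(self.features(x).flatten(1))

# Hypothetical usage with a MIMII-style clip:
# x = torch.tensor(mfe("fan/abnormal/00000000.wav"))[None, None].float()
# logits = TinyCNN()(x)  # normal-vs-abnormal scores
```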
3. Machine Learning and Sound Processing in Vocal Disease Detection
- Author
- Mihai-Andrei Costandache
- Subjects
- machine learning, sound processing, medicine, Electronic computers. Computer science, QA75.5-76.95
- Abstract
We present in this paper some of the existing machine learning and sound processing techniques involved in the medical process and show how they can be applied in the context of a vocal disease detection task. The machine learning techniques are the usual ones, ranging from decision trees to neural networks, and some scientists are probably familiar with the sound processing approaches from the speech processing or music analysis areas. However, the techniques are adapted to the particularities of the medical field -- data collection, verification, etc. Through the concrete example of the vocal disease detection task we worked on, we made some interesting observations, both on the classification ability of the model and on the influence of the data.
- Published
- 2024
- Full Text
- View/download PDF
4. Lung Sound Classification for Respiratory Disease Identification Using Deep Learning: A Survey.
- Author
- Wanasinghe, Thinira, Bandara, Sakuni, Madusanka, Supun, Meedeniya, Dulani, Bandara, Meelan, and De La Torre Díez, Isabel
- Subjects
- NOSOLOGY, DEEP learning, RESPIRATORY diseases, LUNGS, MEDICAL personnel, EARLY diagnosis
- Abstract
Integrating artificial intelligence (AI) into lung sound classification has markedly improved respiratory disease diagnosis by analysing intricate patterns within audio data. This study is driven by the widespread issue of lung diseases, which affect around 500 million people globally. Early detection of respiratory diseases is crucial for delivering timely and effective treatment. Our study consists of a comprehensive survey of lung sound classification methodologies, exploring the advancements made in leveraging AI to identify and classify respiratory diseases. This survey thoroughly investigates lung sound classification models, along with data augmentation, feature extraction, explainable techniques and support tools to improve systems for diagnosing respiratory conditions. Our goal is to provide meaningful insights for healthcare professionals, researchers and technologists who are dedicated to developing methodologies for the early detection of pulmonary diseases. The paper provides a summary of the current status of lung sound classification research, highlighting both advancements and challenges in the use of AI for more accurate and efficient diagnostic methods in respiratory healthcare. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Is it too loud? Ask your brain!
- Author
- Philipp Zelger, Josef Seebacher, Simone Graf, and Sonja Rossi
- Subjects
- Sound processing, Loudness perception, Cognitive neuroscience, Event-related potentials, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
- Abstract
Purpose: In this study, the objectification of the subjective perception of loudness was investigated using electroencephalography (EEG). In particular, the emergence of objective markers in the domain of the acoustic discomfort threshold was examined. Methods: A cohort of 27 adults with normal hearing, aged between 18 and 30, participated in the study. The participants were presented with 500 ms long noise stimuli via in-ear headphones. The acoustic signals were presented with sound levels of [55, 65, 75, 85, 95 dB]. After each stimulus, the subjects provided their subjective assessment of the perceived loudness using a colored scale on a touchscreen. EEG signals were recorded, and afterward, event-related potentials (ERPs) locked to sound onset were analyzed. Results: Our findings reveal a linear dependency between the N100 component and both the sound level and the subjective loudness categorization of the sound. Additionally, the data demonstrated a nonlinear relationship between the P300 potential and the sound level as well as for the subjective loudness rating. The P300 potential was elicited exclusively when the stimuli had been subjectively rated as "very loud" or "too loud". Conclusion: The findings of the present study suggest the possibility of the identification of the subjective uncomfortable loudness level by objective neural parameters. (See the code sketch after this entry.)
- Published
- 2024
- Full Text
- View/download PDF
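The ERP analysis described in this entry (stimulus-locked epoching, baseline correction, averaging, then reading out N100 and P300 windows) follows a standard recipe that can be sketched with NumPy alone. Everything below is an illustrative assumption — the sampling rate, window bounds and the synthetic single-channel data — not the study's actual pipeline.

```python
# Hedged sketch: average stimulus-locked EEG epochs into an ERP and read out
# assumed N100/P300 windows. Data here are synthetic placeholders.
import numpy as np

fs = 1000                                    # Hz, assumed sampling rate
eeg = np.random.randn(60 * fs)               # placeholder single-channel EEG
onsets = np.arange(2 * fs, 58 * fs, 2 * fs)  # assumed stimulus onset samples

def erp(eeg, onsets, fs, tmin=-0.1, tmax=0.5):
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = np.stack([eeg[o - pre:o + post] for o in onsets])
    epochs = epochs - epochs[:, :pre].mean(axis=1, keepdims=True)  # baseline
    return epochs.mean(axis=0)               # trial average -> ERP waveform

wave = erp(eeg, onsets, fs)
t = np.arange(len(wave)) / fs - 0.1
n100 = wave[(t > 0.08) & (t < 0.12)].mean()  # amplitude in an N100 window
p300 = wave[(t > 0.25) & (t < 0.40)].mean()  # amplitude in a P300 window
print(n100, p300)
```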
6. An AI-Mediated VR Sound Installation
- Author
- Santini, Giovanni, Chen, Zhonghao, di Prisco, Marco, Series Editor, Chen, Sheng-Hong, Series Editor, Vayas, Ioannis, Series Editor, Kumar Shukla, Sanjay, Series Editor, Sharma, Anuj, Series Editor, Kumar, Nagesh, Series Editor, Wang, Chien Ming, Series Editor, Cui, Zhen-Dong, Series Editor, Di Marco, Giancarlo, editor, Lombardi, Davide, editor, and Tedjosaputro, Mia, editor
- Published
- 2024
- Full Text
- View/download PDF
7. Immersivity in music performance with original compositions
- Author
- Michael, Kyriacos
- Subjects
- Immersive, Immersivity, Music, Performance, Taxonomy, Proximity, Envelopment, Sound Processing, Visual Processing, Audience Engagement, Practice-Based, Practice-Led, Spatial Music, Sound Diffusion, Acousmatic Music, Electroacoustic Music, Soundscape Music, Binaural Audio, Surround-Sound, Ambisonics, Object-Based Audio, Mono, Stereophony, Phantom Imaging
- Abstract
The aim of this study is to critically investigate immersivity in music performance. It will evaluate how the combination of the performance space, musical material and delivery methods can produce unique and valuable sonic experiences for an audience. This investigative process hopes to highlight which compositional and spatial characteristics define this performance paradigm, with the goal of providing a taxonomy of key characteristics that creators must consider when devising an immersive music performance.
- Published
- 2023
- Full Text
- View/download PDF
8. Machine Learning and Sound Processing in Vocal Disease Detection.
- Author
- Costandache, Mihai-Andrei
- Subjects
- MUSICAL analysis, DECISION trees, SPEECH, ACQUISITION of data
- Abstract
We present in this paper some of the existing machine learning and sound processing techniques involved in the medical process and show how they can be applied in the context of a vocal disease detection task. The machine learning techniques are the usual ones, ranging from decision trees to neural networks, and some scientists are probably familiar with the sound processing approaches from the speech processing or music analysis areas. However, the techniques are adapted to the particularities of the medical field - data collection, verification, etc. Through the concrete example of the vocal disease detection task we worked on, we made some interesting observations, both on the classification ability of the model and on the influence of the data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. A Survey on Deep Learning Based Forest Environment Sound Classification at the Edge.
- Author
- MEEDENIYA, DULANI, ARIYARATHNE, ISURU, BANDARA, MEELAN, JAYASUNDARA, ROSHINIE, and PERERA, CHARITH
- Published
- 2024
- Full Text
- View/download PDF
10. Recording, Processing, and Reproduction of Vibrations Produced by Impact Noise Sources in Buildings †.
- Author
- Dolezal, Franz, Reichenauer, Andreas, Wilfling, Armin, Neusser, Maximilian, and Prislan, Rok
- Subjects
- ACOUSTIC field, WHOLE-body vibration, REPRODUCTION, LIGHTWEIGHT construction, AUDITORY perception, BUILT environment, UNDERWATER noise
- Abstract
Several studies on the perception of impact sounds question the correlation of standardized approaches with perceived annoyance, while more recent studies have come to inconsistent conclusions. All these studies neglected the aspect of whole-body vibrations, which are known to be relevant for the perception of low-frequency sound and can be perceived especially in lightweight constructions. Basically, the contribution of vibrations to impact sound annoyance is still unknown and could be the reason for the contradictory results. To investigate this aspect, we measured vibrations on different types of floors under laboratory conditions and in situ. For this purpose, a vibration-sensing device was developed to record vibrations more cost-effectively and independently of commercial recording instruments. The vibrations of predefined impact sequences were recorded together with the sound field using a higher-order ambisonics microphone. In addition, a vibration exposure device was developed to expose the test objects to the exact vibrations that occur in the built environment. The vibration exposure device is integrated into the ambisonics reproduction system, which consists of a large number of loudspeakers in a spherical configuration. The article presents the development and performance achieved using the vibration-sensing unit and the vibration exposure device. The study is relevant for conducting future impact sound listening tests under laboratory conditions, which can be extended to include the reproduction of vibrations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. ECOGEN: Bird sounds generation using deep learning
- Author
- Axel‐Christian Guei, Sylvain Christin, Nicolas Lecomte, and Éric Hervet
- Subjects
- bird songs, computer vision, deep learning, sound generation, sound processing, variational autoencoders, Ecology, QH540-549.5, Evolution, QH359-425
- Abstract
Large‐scale acoustic projects generate vast amounts of data that can now be efficiently processed using deep learning tools. However, these tools often face limitations due to sound labeling and imbalanced sampling. Data augmentation can help overcome such challenges, particularly through the generation of synthetic and lifelike sounds. Synthetic samples can be valuable not only for deep learning but also for species with limited available data. Despite advancements in computer power, sound generation remains a time‐consuming process, even requiring a substantial number of samples. We present ECOGEN, a novel deep learning approach designed to generate realistic bird songs for biologists and ecologists. The primary objective of ECOGEN is to enhance the number of samples in under‐represented bird song classes, thereby improving the performance and robustness of classifiers in ecological research. The ECOGEN framework employs spectrograms as a representation of bird songs and leverages proven image generation techniques to create new spectrograms, subsequently converted back to digital audio signals. As a class‐agnostic tool, ECOGEN is applicable to a wide range of biophonic sounds, including mammal and insect calls. We show that adding samples generated by ECOGEN to a bird song classifier improved the classification accuracy by 12% on average and improved results compared with classic data augmentation techniques 80% of the time. Our approach is both fast and efficient, enabling the generation of synthetic bird songs on standard computing resources. By facilitating the creation of synthetic bird songs, ECOGEN can contribute to the conservation of endangered bird species, while providing valuable insights into their vocalizations, behaviours and habitat preferences. Future development of ECOGEN can be easily implemented and could focus on incorporating additional configurable parameters during the generation phase for increased control over the output, catering to the specific needs of biologists. (See the code sketch after this entry.)
- Published
- 2024
- Full Text
- View/download PDF
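The representation round trip that ECOGEN builds on — audio to spectrogram, image-style generation or perturbation, spectrogram back to audio — can be illustrated compactly. In the sketch below, a synthetic sweep stands in for a bird recording, a random gain perturbation stands in for the paper's generative model, and Griffin-Lim phase recovery is used for the inversion; none of these are the authors' actual components.

```python
# Hedged sketch of the spectrogram round trip (librosa + soundfile assumed installed).
import numpy as np
import librosa
import soundfile as sf

sr = 22050
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
y = np.sin(2 * np.pi * (500 + 1500 * t) * t)      # synthetic sweep, not a real bird call

S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))  # magnitude spectrogram

# ... a generative model would create or perturb spectrograms here ...
S_aug = S * np.random.uniform(0.9, 1.1, S.shape)   # toy perturbation only

y_new = librosa.griffinlim(S_aug, n_iter=32, hop_length=256, n_fft=1024)
sf.write("synthetic_song.wav", y_new, sr)          # back to a digital audio signal
```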
12. Lung Sound Classification With Multi-Feature Integration Utilizing Lightweight CNN Model
- Author
- Thinira Wanasinghe, Sakuni Bandara, Supun Madusanka, Dulani Meedeniya, Meelan Bandara, and Isabel De La Torre Diez
- Subjects
- Artificial intelligence, explainability, respiratory diseases, sound processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Detecting respiratory diseases is of utmost importance, considering that respiratory ailments represent one of the most prevalent categories of diseases globally. The initial stage of lung disease detection involves auscultation conducted by specialists, relying significantly on their expertise. Therefore, automating the auscultation process for the detection of lung diseases can yield enhanced efficiency. Artificial intelligence (AI) has shown promise in improving the accuracy of lung sound classification by extracting features from lung sounds that are relevant to the classification task and learning the relationships between these features and the different pulmonary diseases. This paper utilizes two publicly available respiratory sound datasets, namely the ICBHI 2017 challenge dataset and another lung sound dataset available at Mendeley Data. First, we provide a detailed exposition of a Convolutional Neural Network (CNN) that utilizes feature extraction from Mel spectrograms, Mel frequency cepstral coefficients (MFCCs), and chromagrams. The highest accuracy achieved by the developed classifier is 91.04% for 10 classes. Extending the contribution, this paper explains the classification model's predictions by employing Explainable Artificial Intelligence (XAI). The novel contribution of this study is a CNN model that classifies lung sounds into 10 classes by combining audio-specific features to enhance the classification process. (See the code sketch after this entry.)
- Published
- 2024
- Full Text
- View/download PDF
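The multi-feature front end named in this entry — Mel spectrogram, MFCCs and a chromagram per clip — can be sketched as follows. The sampling rate, frame sizes and the simple row-wise stacking are assumptions for illustration; the paper's CNN itself is omitted, and the file name in the usage comment is hypothetical.

```python
# Hedged sketch: compute and stack three audio feature maps per recording.
import numpy as np
import librosa

def multi_features(path, sr=4000, n_fft=512, hop=128):
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop, n_mels=64))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20,
                                n_fft=n_fft, hop_length=hop)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop)
    # stack along the feature axis -> (64 + 20 + 12) x frames
    return np.vstack([mel, mfcc, chroma])

# feats = multi_features("icbhi_recording.wav")  # hypothetical ICBHI-style file
```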
13. Chainsaw Sound Detection Using DNN Algorithm
- Author
- Florin Bogdan MARIN and Mihaela MARIN
- Subjects
- sound processing, sound recognition, chainsaw sound detection, Mining engineering. Metallurgy, TN1-997
- Abstract
Deforestation and illegal logging are important environmental problems. In this paper we propose a DNN architecture for chainsaw sound recognition. Various parameters need to be tuned to identify the sound of a chainsaw without producing too many false positive detections. The task is challenging because many different sounds occur in the forest.
- Published
- 2023
- Full Text
- View/download PDF
14. MFCC Selection by LASSO for Honey Bee Classification.
- Author
- Libal, Urszula and Biernacki, Pawel
- Subjects
- HONEYBEES, BEEKEEPING, BEE colonies, BEES, SPRING, MACHINE learning, SUMMER
- Abstract
Featured Application: An automatic honey bee classification system based on audio signals for tracking the frequency of workers and drones entering and leaving a hive. The recent advances in smart beekeeping focus on remote solutions for bee colony monitoring and applying machine learning techniques for automatic decision making. One of the main applications is a swarming alarm, allowing beekeepers to prevent the bee colony from leaving their hive. Swarming is a naturally occurring phenomenon, mainly during late spring and early summer, but it is extremely hard to predict its exact time since it is highly dependent on many factors, including weather. Prevention of swarming is the most effective way to keep bee colonies; however, it requires constant monitoring by the beekeeper. Drone bees do not survive the winter and they occur in colonies seasonally with a peak in late spring, which is associated with the creation of drone congregation areas, where mating with young queens takes place. The paper presents a method of early swarming mood detection based on the observation of drone bee activity near the entrance to a hive. Audio recordings are represented by Mel Frequency Cepstral Coefficients and their first and second derivatives. The study investigates which MFCC coefficients, selected by the Least Absolute Shrinkage and Selection Operator, are significant for the worker bee and drone bee classification task. The classification results, obtained by an autoencoder neural network, improve the detection performance, achieving accuracy slightly above 95% for the set of signal features selected by the proposed method, compared with only up to 90% accuracy for the standard set of MFCC coefficients. [ABSTRACT FROM AUTHOR] (See the code sketch after this entry.)
- Published
- 2024
- Full Text
- View/download PDF
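The selection idea in this entry — represent each recording by MFCCs plus first and second derivatives, then let an L1 penalty zero out uninformative coefficients — can be sketched as below. L1-penalised logistic regression stands in for LASSO in this classification setting, and the data and labels are random placeholders; the autoencoder classifier from the paper is not reproduced.

```python
# Hedged sketch: MFCC + delta features and L1-based coefficient selection.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def bee_features(y, sr):
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(m)             # first derivative
    d2 = librosa.feature.delta(m, order=2)    # second derivative
    return np.concatenate([m, d1, d2]).mean(axis=1)  # 39-dim clip vector

# Placeholder feature matrix and labels (0 = worker bee, 1 = drone bee)
X = np.random.randn(200, 39)
labels = np.random.randint(0, 2, 200)

sel = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, labels)
kept = np.flatnonzero(sel.coef_)              # indices of surviving coefficients
print("selected MFCC-derived features:", kept)
```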
15. ECOGEN: Bird sounds generation using deep learning.
- Author
- Guei, Axel‐Christian, Christin, Sylvain, Lecomte, Nicolas, and Hervet, Éric
- Subjects
- BIRDSONGS, DEEP learning, RARE birds, SONGBIRDS, BIRD conservation, DIGITAL audio, BIRD populations
- Abstract
Large‐scale acoustic projects generate vast amounts of data that can now be efficiently processed using deep learning tools. However, these tools often face limitations due to sound labeling and imbalanced sampling. Data augmentation can help overcome such challenges, particularly through the generation of synthetic and lifelike sounds. Synthetic samples can be valuable not only for deep learning but also for species with limited available data. Despite advancements in computer power, sound generation remains a time‐consuming process, even requiring a substantial number of samples. We present ECOGEN, a novel deep learning approach designed to generate realistic bird songs for biologists and ecologists. The primary objective of ECOGEN is to enhance the number of samples in under‐represented bird song classes, thereby improving the performance and robustness of classifiers in ecological research. The ECOGEN framework employs spectrograms as a representation of bird songs and leverages proven image generation techniques to create new spectrograms, subsequently converted back to digital audio signals. As a class‐agnostic tool, ECOGEN is applicable to a wide range of biophonic sounds, including mammal and insect calls. We show that adding samples generated by ECOGEN to a bird song classifier improved the classification accuracy by 12% on average and improved results compared with classic data augmentation techniques 80% of the time. Our approach is both fast and efficient, enabling the generation of synthetic bird songs on standard computing resources. By facilitating the creation of synthetic bird songs, ECOGEN can contribute to the conservation of endangered bird species, while providing valuable insights into their vocalizations, behaviours and habitat preferences. Future development of ECOGEN can be easily implemented and could focus on incorporating additional configurable parameters during the generation phase for increased control over the output, catering to the specific needs of biologists. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Autistic Verbal Behavior Parameters
- Author
- López De Luise, Daniela, Pablo, Pescio, Saad, Ben Raul, Saliwonczyk, Christian, Ibacache, Tiago, Soria, Lucas, Bocîi, Liviu Sevastian, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Balas, Valentina Emilia, editor, Jain, Lakhmi C., editor, Balas, Marius Mircea, editor, and Baleanu, Dumitru, editor
- Published
- 2023
- Full Text
- View/download PDF
17. Audio recognition method of coal machine fault based on intelligent inspection robot
- Author
- LIAO Zhiwei, ZHAO Hongju, and CUI Mingming
- Subjects
- intelligent inspection robot, audio recognition of coal machine fault, sound processing, Caffe C++ deep learning, CNN + LSTM model, Mining engineering. Metallurgy, TN1-997
- Abstract
With the rapid development of robot technology and the higher requirements for safe and efficient production in coal mines, underground mechanical and electrical equipment has changed from traditional manual inspection to robot inspection with "monitoring, detection and early warning" functions. In this paper, sound processing and deep learning are introduced into the intelligent processing of the mining system to analyze the data intelligently. The key technologies of audio preprocessing, spectrogram generation, feature extraction and classification are studied, which address the problems that sound features cannot be described simultaneously in the time and frequency domains and that dynamic sequence information is not used effectively. The experimental data show that a CNN + LSTM + Softmax network based on the CNN + LSTM model and the Caffe C++ deep learning framework can effectively improve the accuracy and robustness of abnormal sound recognition for coal mine equipment, reduce the complexity of the algorithm so that it can run on embedded equipment, and realize audio-based fault identification and diagnosis of coal machinery by the robot. (See the code sketch after this entry.)
- Published
- 2023
- Full Text
- View/download PDF
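The CNN + LSTM model family named in this entry can be sketched as a spectrogram classifier: convolutional layers summarize each frame's spectral content and an LSTM consumes the frame sequence. PyTorch is substituted here for the Caffe C++ framework the authors used, and all layer sizes and the five-class output are illustrative assumptions.

```python
# Hedged sketch: CNN feature extractor feeding an LSTM over spectrogram frames.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_mels=64, n_classes=5):
        super().__init__()
        self.cnn = nn.Sequential(             # per-frame spectral features
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)))
        self.lstm = nn.LSTM(input_size=32 * (n_mels // 4),
                            hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)  # Softmax is applied in the loss

    def forward(self, x):                     # x: (batch, 1, n_mels, frames)
        z = self.cnn(x)                       # (batch, 32, n_mels // 4, frames)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, frames, 32 * n_mels // 4)
        out, _ = self.lstm(z)                 # captures dynamic sequence information
        return self.head(out[:, -1])          # classify from the last time step

logits = CNNLSTM()(torch.randn(2, 1, 64, 100))  # two 100-frame dummy spectrograms
```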
18. Neural Processing of Speech Sounds in ASD and First-Degree Relatives.
- Author
- Patel, Shivani P., Winston, Molly, Guilfoyle, Janna, Nicol, Trent, Martin, Gary E., Nayar, Kritika, Kraus, Nina, and Losh, Molly
- Subjects
- LANGUAGE disorder diagnosis, SPEECH perception, BIOMARKERS, COMMUNICATIVE competence, AUTISM, PEOPLE with disabilities, PHENOTYPES, PHYSIOLOGICAL aspects of speech
- Abstract
Efficient neural encoding of sound plays a critical role in speech and language, and when impaired, may have reverberating effects on communication skills. This study investigated disruptions to neural processing of temporal and spectral properties of speech in individuals with ASD and their parents and found evidence of inefficient temporal encoding of speech sounds in both groups. The ASD group further demonstrated less robust neural representation of spectral properties of speech sounds. Associations between neural processing of speech sounds and language-related abilities were evident in both groups. Parent–child associations were also detected in neural pitch processing. Together, results suggest that atypical neural processing of speech sounds is a heritable ingredient contributing to the ASD language phenotype. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
19. Residual Hearing Does Not Influence the Effectiveness of Beamforming when Using a Cochlear Implant in Conjunction with Contralateral Routing of Signals.
- Author
- Stronks, Hendrik Christiaan, Briaire, Jeroen Johannes, and Frijns, Johan Hubertus Maria
- Subjects
- COCHLEAR implants, SPEECH perception, BEAMFORMING, SIGNAL-to-noise ratio, INTELLIGIBILITY of speech, SPEECH
- Abstract
Introduction: Contralateral routing of signals (CROS) overcomes the head shadow effect by redirecting speech signals from the contralateral ear to the better-hearing cochlear implant (CI) ear. Here we tested the performance of an adaptive monaural beamformer (MB) and a fixed binaural beamformer (BB) using the CROS system of Advanced Bionics. Methods: In a group of 17 unilateral CI users, we evaluated the benefits of MB and BB for speech recognition by measuring speech reception threshold (SRT) with and without beamforming. MB and BB were additionally evaluated with signal-to-noise ratio (SNR) measurements using a KEMAR manikin. We also assessed the effect of residual hearing in the CROS ear on the benefits of MB and BB. Speech was delivered in front of the listener in a background of homogeneous 8-talker babble noise. Results: With CI-CROS in omnidirectional settings with the T-mic active on the CI as a reference, BB significantly improved SRT by 1.4 dB, whereas MB yielded no significant improvements. The difference in effects on SRT between the two beamformers was, however, not significant. SNR effects were substantially larger, at 2.1 dB for MB and 5.8 dB for BB. CI-CROS with default omnidirectional settings also improved SRT and SNR by 1 dB over CI alone. Residual hearing did not significantly affect beamformer performance. Discussion: We recommend the use of BB over MB for CI-CROS users. Residual hearing in the CROS ear is not a limiting factor for fitting a CROS device, although a bimodal option should be considered. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. FF-BTP Model for Novel Sound-Based Community Emotion Detection
- Author
- Arif Metehan Yildiz, Masayuki Tanabe, Makiko Kobayashi, Ilknur Tuncer, Prabal Datta Barua, Sengul Dogan, Turker Tuncer, Ru-San Tan, and U. Rajendra Acharya
- Subjects
- FF-BTP, sound community emotion classification, sound processing, textural feature extraction, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Most emotion classification schemes to date have concentrated on individual inputs rather than crowd-level signals. To address this gap, we introduce Sound-based Community Emotion Recognition (SCED) as a fresh challenge in the machine learning domain. In this pursuit, we crafted the FF-BTP-based feature engineering model inspired by deep learning principles, specifically designed for discerning crowd sentiments. Our unique dataset was derived from 187 YouTube videos, summing up to 2733 segments each of 3 seconds (sampled at 44.1 kHz). These segments, capturing overlapping speech, ambient sounds, and more, were meticulously categorized into negative, neutral, and positive emotional content. Our architectural design fuses the BTP, a textural feature extractor, and an innovative handcrafted feature selector inspired by Hinton’s FF algorithm. This combination identifies the most salient feature vector using calculated mean square error. Further enhancements include the incorporation of a multilevel discrete wavelet transform for spatial and frequency domain feature extraction, and a sophisticated iterative neighborhood component analysis for feature selection, eventually employing a support vector machine for classification. On testing, our FF-BTP model showcased an impressive 97.22% classification accuracy across three categories using the SCED dataset. This handcrafted approach, although inspired by deep learning’s feature analysis depth, requires significantly lower computational resources and still delivers outstanding results. It holds promise for future SCED-centric applications. (See the code sketch after this entry.)
- Published
- 2023
- Full Text
- View/download PDF
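Two ingredients this entry names — a multilevel discrete wavelet transform for frequency-domain features and a support vector machine for classification — can be sketched together. Per-sub-band summary statistics are a common simple stand-in here; the BTP textural extractor, the FF-inspired selector and iterative NCA are not reproduced, and the data are random placeholders.

```python
# Hedged sketch: multilevel DWT sub-band statistics + SVM on placeholder data.
import numpy as np
import pywt
from sklearn.svm import SVC

def dwt_features(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)  # approximation + details
    feats = []
    for c in coeffs:                           # summary stats per sub-band
        feats += [c.mean(), c.std(), np.abs(c).max()]
    return np.array(feats)

# Placeholder 3-second, 44.1 kHz segments with crowd-emotion labels
X = np.stack([dwt_features(np.random.randn(3 * 44100)) for _ in range(60)])
y = np.random.choice(["negative", "neutral", "positive"], size=60)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:3]))
```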
21. Autistic Verbal Behavior Language Parameterization
- Author
- De Luise, Daniela López, Saad, Ben Raúl, Ibacache, Tiago, Saliwonczyk, Christian, Pescio, Pablo, Soria, Lucas, Kacprzyk, Janusz, Series Editor, Jain, Lakhmi C., Series Editor, Lim, Chee-Peng, editor, Vaidya, Ashlesha, editor, Jain, Kiran, editor, and Mahorkar, Virag U., editor
- Published
- 2022
- Full Text
- View/download PDF
22. Automated Heart Murmur Detection using Sound Processing Techniques.
- Author
- Costandache, Mihai-Andrei, Cioată, Matei-Alexandru, and Iftene, Adrian
- Subjects
- HEART murmurs, SCIENTIFIC computing, MACHINE learning, MEDICAL personnel, ARTIFICIAL intelligence, COMPUTER science, TECHNOLOGICAL progress
- Abstract
The technological progress in computer science (particularly in machine learning) has contributed to the improvement of medical services, both in detecting and treating diseases. The large volumes of data that are overwhelming for human experts (doctors, nurses) can easily be managed by automated systems, as long as we have the computational resources. Obviously, human experts are still essential in the process - we think of the use of computer science in medicine as a collaboration between medical staff and artificial intelligence. The usual types of data that can be processed by automated systems are text, sound, and images. In this paper, we approach the subject of diagnosis and focus on data consisting of sound. We created a heart murmur detection system - it analyzes recordings and tells the user whether the sound samples indicate a heart murmur or not, based on a trained machine learning model. One of the main advantages of our system is the fact that we ran a large number of experiments, with different configurations of denoising techniques and features taken into consideration. We were able to draw some interesting conclusions; for example, we found out which features are the most important for the classification and which features are not worth computing. Our work also reflects a thorough understanding of sound processing. [ABSTRACT FROM AUTHOR] (See the code sketch after this entry.)
- Published
- 2023
- Full Text
- View/download PDF
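The kind of pipeline this entry describes — denoise a heart-sound recording, compute candidate features, then classify — can be sketched as follows. The band-pass range, resampling rate, feature set and file name are assumptions for illustration, not the configurations the authors actually compared.

```python
# Hedged sketch: band-pass denoising plus a small heart-sound feature vector.
import numpy as np
from scipy.signal import butter, sosfiltfilt
import librosa

def denoise(y, sr, lo=25.0, hi=400.0):
    """Band-pass to a range where heart sounds and murmurs mostly live."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, y)

def murmur_features(path):
    y, sr = librosa.load(path, sr=2000)        # assumed resampling rate
    y = denoise(y, sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    rms = librosa.feature.rms(y=y).mean()
    return np.concatenate([mfcc, [zcr, rms]])

# feats = murmur_features("patient_001.wav")   # hypothetical recording
# prediction = trained_model.predict(feats[None, :])  # trained_model assumed
```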
23. Classification of Engine Type of Vehicle Based on Audio Signal as a Source of Identification.
- Author
- Materlak, Mateusz and Majda-Zdancewicz, Ewelina
- Subjects
- DIESEL motors, INTELLIGENT transportation systems, SERVICE stations, ENGINES, SIGNAL processing, MACHINE learning
- Abstract
In this work, a combination of signal processing and machine learning techniques is applied for petrol and diesel engine identification based on engine sound. The research utilized real recordings acquired in car dealerships in Poland. The sound database recorded by the authors contains 80 audio signals, equally divided between the two engine types. The study was conducted using feature engineering techniques based on frequency analysis for the generation of sound signal features. The discriminatory ability of the feature vectors was evaluated using different machine learning techniques. In order to test the robustness of the proposed solution, the authors carried out a number of experimental tests of the system, including different working conditions. The results show that the proposed approach achieves a good accuracy of 91.7%. The proposed system can support intelligent transportation systems by employing a sound signal as a medium carrying information on the type of car moving along a road. Such solutions can be implemented in so-called 'clean transport zones', where only petrol-powered vehicles can move freely. Another potential application is to prevent misfuelling, i.e., adding diesel to a petrol engine or petrol to a diesel engine. This kind of system can be implemented at petrol stations to recognize the vehicle based on the sound of the engine. [ABSTRACT FROM AUTHOR] (See the code sketch after this entry.)
- Published
- 2023
- Full Text
- View/download PDF
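The frequency-analysis feature engineering this entry describes can be approximated by averaging the magnitude spectrum into coarse bands and training a standard classifier. The band count, the random-forest choice and the synthetic signals below are assumptions; the authors' exact features are not detailed in this abstract.

```python
# Hedged sketch: coarse spectral band energies + a standard classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def band_energies(y, n_bands=32):
    spec = np.abs(np.fft.rfft(y))              # magnitude spectrum
    bands = np.array_split(spec, n_bands)      # coarse frequency bands
    return np.log1p([b.sum() for b in bands])  # log band energies

rng = np.random.default_rng(1)
# Synthetic stand-ins for the 80 one-second engine recordings
X = np.stack([band_energies(rng.standard_normal(22050)) for _ in range(80)])
y = np.array(["petrol", "diesel"] * 40)        # placeholder labels

clf = RandomForestClassifier(n_estimators=200).fit(X, y)
print(clf.predict(X[:2]))
```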
24. Using the Bag-of-Audio-Words approach for emotion recognition
- Author
- Vetráb Mercedes and Gosztolya Gábor
- Subjects
- bag-of-audio-words, emotion detection, human voice, sound processing, 68R15, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The problem of varying-length recordings is a well-known issue in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words feature extraction approach. The steps of this technique involve preprocessing, clustering, quantization and normalization. The bag-of-audio-words technique is competitive in the area of speech emotion recognition, but the method has several parameters that need to be precisely tuned for good efficiency. The main aim of our study was to analyse the effectiveness of the bag-of-audio-words method and to find the best parameter values for emotion recognition. We optimized the parameters one by one, with each step building on the results of the previous ones. We performed the feature extraction using openSMILE. Next we transformed our features into same-sized vectors with openXBOW, and finally trained and evaluated SVM models with 10-fold cross-validation and UAR. In our experiments, we worked with a Hungarian emotion database. According to our results, the emotion classification performance improves with the bag-of-audio-words feature representation. Not every BoAW parameter has a single optimal setting, but we can make clear recommendations on how to set bag-of-audio-words parameters for emotion detection tasks. (See the code sketch after this entry.)
- Published
- 2022
- Full Text
- View/download PDF
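The bag-of-audio-words steps this entry lists — frame-level features, clustering into a codebook, quantisation into a fixed-size histogram, normalisation — can be sketched end to end. MFCCs, k-means and an SVM from scikit-learn are substituted for the openSMILE features and openXBOW quantiser the authors used, and the audio and labels are random placeholders.

```python
# Hedged sketch: bag-of-audio-words turns variable-length clips into fixed vectors.
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def frames(y, sr):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (n_frames, 13)

rng = np.random.default_rng(0)
train_clips = [rng.standard_normal(16000 * int(rng.integers(1, 4)))
               for _ in range(10)]              # placeholder variable-length audio

# 1) learn a codebook from all pooled training frames
pool = np.vstack([frames(y, 16000) for y in train_clips])
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(pool)

# 2) quantize each clip into a normalised histogram over codebook words
def boaw(y, sr=16000):
    idx = codebook.predict(frames(y, sr))
    hist = np.bincount(idx, minlength=64).astype(float)
    return hist / hist.sum()                    # same-sized vector per clip

X = np.stack([boaw(y) for y in train_clips])
labels = rng.integers(0, 3, len(train_clips))   # placeholder emotion labels
svm = SVC().fit(X, labels)                      # evaluate with CV + UAR in practice
```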
25. Estimation of Asymmetry in Head Related Transfer Functions
- Author
- Maciej Jasiński and Jan Żera
- Subjects
- acoustics, sound processing, virtual reality, spatial sound, head related transfer functions, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Telecommunication, TK5101-6720
- Abstract
The individual Head-Related Transfer Functions (HRTFs) typically show large left-right ear differences. This work evaluates HRTF left-right differences by means of an rms measure called the Root Mean Square Difference (RMSD). The RMSD was calculated for HRTFs measured with the participation of a group of 15 subjects in our laboratory, for the HRTFs taken from the LISTEN database and for an acoustic manikin. The results showed that the RMSD varies with frequency and, as expected, is small for the more symmetrical HRTFs at low frequencies (0.3÷1 kHz). For higher frequency bands (1÷5 kHz and above 5 kHz), the left-right differences are larger as an effect of the complex filtering caused by the anatomical shape of the head and the pinnae. Results obtained for the subjects and for data taken from the LISTEN database were similar, whereas they differed for the acoustic manikin. This means that measurements with the manikin cannot be considered a perfect average representation of the results obtained for people. The method and results of this study may be useful for assessing the symmetry of HRTFs and for further analysis and improvement of HRTF individualization and personalization algorithms. (See the code sketch after this entry.)
- Published
- 2022
- Full Text
- View/download PDF
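The symmetry measure this entry describes — an RMS difference between left- and right-ear HRTF magnitude responses, evaluated per frequency band — reduces to a few lines. The HRTF data below are random placeholders and the band edges simply follow the abstract's grouping; this is not the authors' measurement code.

```python
# Hedged sketch: per-band RMS difference between left/right HRTF magnitudes.
import numpy as np

fs, n = 44100, 512
freqs = np.fft.rfftfreq(n, 1 / fs)
hl = np.fft.rfft(np.random.randn(n))   # placeholder left-ear HRIR -> HRTF
hr = np.fft.rfft(np.random.randn(n))   # placeholder right-ear HRIR -> HRTF

def rmsd_db(hl, hr, freqs, f_lo, f_hi):
    band = (freqs >= f_lo) & (freqs < f_hi)
    dl = 20 * np.log10(np.abs(hl[band]))        # left magnitude response, dB
    dr = 20 * np.log10(np.abs(hr[band]))        # right magnitude response, dB
    return np.sqrt(np.mean((dl - dr) ** 2))     # root-mean-square difference

for lo, hi in [(300, 1000), (1000, 5000), (5000, 20000)]:
    print(f"{lo}-{hi} Hz: RMSD = {rmsd_db(hl, hr, freqs, lo, hi):.2f} dB")
```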
26. An Open-Set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments.
- Author
- Naranjo-Alcazar, Javier, Perez-Castanos, Sergi, Zuccarello, Pedro, Torres, Ana M., Lopez, Jose J., Ferri, Francesc J., and Cobos, Maximo
- Subjects
- DEEP learning, MACHINE learning, DOORBELLS, DETECTION alarms, FIRE alarms, CLASSIFICATION
- Abstract
• An audio dataset with FSL and OSR considerations is proposed.
• It contains real recordings from 34 classes of alarms and common domestic sounds.
• Recordings were captured in real environments using the same device.
• This dataset will avoid manipulations on existing datasets (common practice today).
• Two baselines are proposed based on a transfer learning approach as starting points.
The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as door bells or fire alarms, using a limited number of samples. These sounds typically occur in domestic environments where many events corresponding to a wide variety of sound classes take place. Therefore, the detection of such alarms in a practical scenario can be considered an open-set recognition (OSR) problem. To address the lack of a dedicated public dataset for audio FSL, researchers usually make modifications on other available datasets. This paper is aimed at providing the audio recognition community with a carefully annotated dataset for FSL in an OSR context comprised of 1360 clips from 34 classes divided into pattern sounds and unwanted sounds. To facilitate and promote research on this area, results with state-of-the-art baseline systems based on transfer learning are also presented. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
27. CLASSIFICATION OF ORIGINAL AND COUNTERFEIT GOLD MATTERS BY APPLYING DEEP NEURAL NETWORKS AND SUPPORT VECTOR MACHINES
- Author
- Yekta Said Can
- Subjects
- image processing, sound processing, counterfeit gold differentiation, support vector machines, convolutional neural network, image segmentation, Technology, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Gold is one of the most counterfeited precious metals. The color of copper is similar to that of gold; for this reason, copper is one of the most used materials for color counterfeiting. Where physical properties are concerned, tungsten (wolfram) is similar to gold (the densities of gold and tungsten are 19.30 g/ml and 19.25 g/ml, respectively), so it can be used to counterfeit gold's density. The purity of gold can be determined by X-ray, but this method is costly. In this paper, the current low-cost methods of jewelers have been experimented with for counterfeit gold detection. When a gold object is struck, the frequency of the resulting sound is higher than when the same experiment is performed with copper. Furthermore, the color of counterfeit gold is brighter than that of real gold. The color of gold is unique, and it is called "gold yellow". In this research, counterfeit and original gold are differentiated by employing sound and image processing. For the image processing part, first a Convolutional Neural Network (CNN)-based toolbox for segmenting the gold material is applied. Then, deep CNNs for differentiating the color of the gold and copper materials are employed. Promising results are achieved with both sound and image processing techniques.
- Published
- 2022
- Full Text
- View/download PDF
28. Is it too loud? Ask your brain!
- Author
- Zelger, Philipp, Seebacher, Josef, Graf, Simone, and Rossi, Sonja
- Subjects
- EVOKED potentials (Electrophysiology), EARBUDS, AUDITORY perception, COGNITIVE neuroscience, LOUDNESS
- Abstract
• P300 potential as a marker for the subjective loudness perception.
• Event-related potentials show a relation to uncomfortably loud stimuli.
• The P300 potential could serve as a tool for objectively assessing discomfort levels in infants or adult people who cannot self-report.
In this study, the objectification of the subjective perception of loudness was investigated using electroencephalography (EEG). In particular, the emergence of objective markers in the domain of the acoustic discomfort threshold was examined. A cohort of 27 adults with normal hearing, aged between 18 and 30, participated in the study. The participants were presented with 500 ms long noise stimuli via in-ear headphones. The acoustic signals were presented with sound levels of [55, 65, 75, 85, 95 dB]. After each stimulus, the subjects provided their subjective assessment of the perceived loudness using a colored scale on a touchscreen. EEG signals were recorded, and afterward, event-related potentials (ERPs) locked to sound onset were analyzed. Our findings reveal a linear dependency between the N100 component and both the sound level and the subjective loudness categorization of the sound. Additionally, the data demonstrated a nonlinear relationship between the P300 potential and the sound level as well as for the subjective loudness rating. The P300 potential was elicited exclusively when the stimuli had been subjectively rated as "very loud" or "too loud". The findings of the present study suggest the possibility of the identification of the subjective uncomfortable loudness level by objective neural parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Binaural Hearing with Devices
- Author
- Ricketts, Todd Andrew, Kan, Alan, Fay, Richard R., Series Editor, Popper, Arthur N., Series Editor, Avraham, Karen, Editorial Board Member, Bass, Andrew, Editorial Board Member, Cunningham, Lisa, Editorial Board Member, Fritzsch, Bernd, Editorial Board Member, Groves, Andrew, Editorial Board Member, Hertzano, Ronna, Editorial Board Member, Le Prell, Colleen, Editorial Board Member, Litovsky, Ruth, Editorial Board Member, Manis, Paul, Editorial Board Member, Manley, Geoffrey, Editorial Board Member, Moore, Brian, Editorial Board Member, Simmons, Andrea, Editorial Board Member, Yost, William, Editorial Board Member, Litovsky, Ruth Y., editor, and Goupell, Matthew J., editor
- Published
- 2021
- Full Text
- View/download PDF
30. Words and non-speech sounds access lexical and semantic knowledge differently
- Author
- Chen, Peiyao, Bartolotti, James, Schroeder, Scott R, Rochanavibhata, Sirada, and Marian, Viorica
- Subjects
- speech comprehension, sound processing, lexical competition, semantic competition, eye-tracking
- Abstract
Using an eye-tracking paradigm, we examined the strength and speed of access to lexical knowledge (e.g., our representation of the word dog in our mental vocabulary) and semantic knowledge (e.g., our knowledge that a dog is associated with a leash) via both spoken words (e.g., "dog") and characteristic sounds (e.g., a dog's bark). Results show that both spoken words and characteristic sounds activate lexical and semantic knowledge, but with different patterns. Spoken words activate lexical knowledge faster than characteristic sounds do, but with the same strength. In contrast, characteristic sounds access semantic knowledge more strongly than spoken words do, but with the same speed. These findings reveal similarities and differences in the activation of conceptual knowledge by verbal and non-verbal means and advance our understanding of how auditory input is cognitively processed.
- Published
- 2018
31. Extended liquid state machines for speech recognition.
- Author
- Deckers, Lucas, Ing Jyh Tsang, Van Leekwijck, Werner, and Latré, Steven
- Subjects
- SPEECH perception, ARTIFICIAL neural networks, LIQUIDS, MACHINERY
- Abstract
A liquid state machine (LSM) is a biologically plausible model of a cortical microcircuit. It consists of a random, sparse reservoir of recurrently connected spiking neurons with fixed synapses and a trainable readout layer. The LSM exhibits low training complexity and enables backpropagation-free learning in a powerful, yet simple computing paradigm. In this work, the liquid state machine is enhanced by a set of bio-inspired extensions to create the extended liquid state machine (ELSM), which is evaluated on a set of speech data sets. Firstly, we ensure excitatory/inhibitory (E/I) balance to enable the LSM to operate in the edge-of-chaos regime. Secondly, spike-frequency adaptation (SFA) is introduced in the LSM to improve the memory capabilities. Lastly, neuronal heterogeneity, by means of a differentiation in time constants, is introduced to extract a richer dynamical LSM response. By including E/I balance, SFA, and neuronal heterogeneity, we show that the ELSM consistently improves upon the LSM while retaining the benefits of the straightforward LSM structure and training procedure. The proposed extensions led to up to a 5.2% increase in accuracy while decreasing the number of spikes in the ELSM by up to 20.2% on benchmark speech data sets. On some benchmarks, the ELSM can even attain similar performance to the current state-of-the-art in spiking neural networks. Furthermore, we illustrate that the ELSM input-liquid and recurrent synaptic weights can be reduced to 4-bit resolution without any significant loss in classification performance. We thus show that the ELSM is a powerful, biologically plausible and hardware-friendly spiking neural network model that can attain near state-of-the-art accuracy on speech recognition benchmarks for spiking neural networks. [ABSTRACT FROM AUTHOR] (See the code sketch after this entry.)
- Published
- 2022
- Full Text
- View/download PDF
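The core LSM idea in this entry — a fixed, sparse, recurrent reservoir of spiking neurons whose filtered activity feeds a trained linear readout — can be reduced to a toy leaky integrate-and-fire simulation. All constants below are illustrative, the running-mean filter is a crude stand-in for a synaptic filter, and the paper's E/I balance, SFA and heterogeneity extensions are not modelled.

```python
# Hedged toy sketch of a liquid state machine: fixed spiking reservoir,
# trainable linear readout only.
import numpy as np

rng = np.random.default_rng(0)
N, T, dt, tau = 100, 500, 1e-3, 20e-3
W = rng.normal(0, 0.04, (N, N)) * (rng.random((N, N)) < 0.1)  # sparse, fixed
w_in = rng.normal(0, 2.0, N)                   # fixed input projection
u = np.sin(np.linspace(0, 20, T))              # placeholder input stream

v = np.zeros(N)                                # membrane potentials
spikes = np.zeros((T, N))
for t in range(T):
    rec = W @ spikes[t - 1] if t > 0 else 0.0  # recurrent synaptic input
    v = v + dt / tau * (-v + w_in * u[t]) + rec  # leaky integration + kicks
    fired = v > 1.0                            # threshold crossing
    spikes[t] = fired
    v = np.where(fired, 0.0, v)                # reset neurons that spiked

# running mean of spikes as a crude "liquid state", then a linear readout
state = np.cumsum(spikes, axis=0) / (np.arange(T)[:, None] + 1)
target = np.roll(u, 1)                         # toy task: delayed input
w_out, *_ = np.linalg.lstsq(state, target, rcond=None)
print("readout MSE:", np.mean((state @ w_out - target) ** 2))
```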
32. Behavioural Responses of Common Dolphins Delphinus delphis to a Bio-Inspired Acoustic Device for Limiting Fishery By-Catch.
- Author
- Lehnhoff, Loïc, Glotin, Hervé, Bernard, Serge, Dabin, Willy, Le Gall, Yves, Menut, Eric, Meheust, Eleonore, Peltier, Hélène, Pochat, Alain, Pochat, Krystel, Rimaud, Thomas, Sourget, Quiterie, Spitz, Jérôme, Van Canneyt, Olivier, and Mérigot, Bastien
- Abstract
By-catch is the most direct threat to marine mammals globally. Acoustic repellent devices (pingers) have been developed to reduce dolphin by-catch. However, mixed results regarding their efficiency have been reported. Here, we present a new bio-inspired acoustic beacon, emitting returning echoes from the echolocation clicks of a common dolphin 'Delphinus delphis' from a fishing net, to inform dolphins of its presence. Using surface visual observations and the automatic detection of echolocation clicks, buzzes, burst-pulses and whistles, we assessed wild dolphins' behavioural responses during sequential experiments (i.e., before, during and after the beacon's emission), with or without setting a net. When the device was activated, the mean number of echolocation clicks and whistling time of dolphins significantly increased by a factor of 2.46 and 3.38, respectively (p < 0.01). Visual surface observations showed attentive behaviours of dolphins, which kept a distance of several metres away from the emission source before calmly leaving. No differences were observed among sequences for buzzes/burst-pulses. Our results highlight that this prototype led common dolphins to echolocate more and communicate differently, and it would favour net detection. Complementary tests of the device during the fishing activities of professional fishermen should further contribute to assessment of its efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. CNN and Sound Processing-Based Audio Classifier for Alarm Sound Detection
- Author
- Ramesh, Babu Durai C., Vishnu, Ram S., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Dash, Subhransu Sekhar, editor, Lakshmi, C., editor, Das, Swagatam, editor, and Panigrahi, Bijaya Ketan, editor
- Published
- 2020
- Full Text
- View/download PDF
34. Investigating the Corpus Independence of the Bag-of-Audio-Words Approach
- Author
- Vetráb, Mercedes, Gosztolya, Gábor, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sojka, Petr, editor, Kopeček, Ivan, editor, Pala, Karel, editor, and Horák, Aleš, editor
- Published
- 2020
- Full Text
- View/download PDF
35. Extended liquid state machines for speech recognition
- Author
- Lucas Deckers, Ing Jyh Tsang, Werner Van Leekwijck, and Steven Latré
- Subjects
- spiking neural networks, liquid state machine, reservoir computing, sound processing, E/I balance, spike-frequency adaptation, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
- Abstract
A liquid state machine (LSM) is a biologically plausible model of a cortical microcircuit. It consists of a random, sparse reservoir of recurrently connected spiking neurons with fixed synapses and a trainable readout layer. The LSM exhibits low training complexity and enables backpropagation-free learning in a powerful, yet simple computing paradigm. In this work, the liquid state machine is enhanced by a set of bio-inspired extensions to create the extended liquid state machine (ELSM), which is evaluated on a set of speech data sets. Firstly, we ensure excitatory/inhibitory (E/I) balance to enable the LSM to operate in the edge-of-chaos regime. Secondly, spike-frequency adaptation (SFA) is introduced in the LSM to improve the memory capabilities. Lastly, neuronal heterogeneity, by means of a differentiation in time constants, is introduced to extract a richer dynamical LSM response. By including E/I balance, SFA, and neuronal heterogeneity, we show that the ELSM consistently improves upon the LSM while retaining the benefits of the straightforward LSM structure and training procedure. The proposed extensions led to up to a 5.2% increase in accuracy while decreasing the number of spikes in the ELSM by up to 20.2% on benchmark speech data sets. On some benchmarks, the ELSM can even attain similar performance to the current state-of-the-art in spiking neural networks. Furthermore, we illustrate that the ELSM input-liquid and recurrent synaptic weights can be reduced to 4-bit resolution without any significant loss in classification performance. We thus show that the ELSM is a powerful, biologically plausible and hardware-friendly spiking neural network model that can attain near state-of-the-art accuracy on speech recognition benchmarks for spiking neural networks.
- Published
- 2022
- Full Text
- View/download PDF
36. BanglaSER: A speech emotion recognition dataset for the Bangla language
- Author
- Rakesh Kumar Das, Nahidul Islam, Md. Rayhan Ahmed, Salekul Islam, Swakkhar Shatabda, and A.K.M. Muzahidul Islam
- Subjects
- Speech emotion recognition, Sound processing, Deep Learning, Bangla language, Computer applications to medicine. Medical informatics, R858-859.7, Science (General), Q1-390
- Abstract
The speech emotion recognition system determines a speaker's emotional state by analyzing his/her speech audio signal. It is an essential and, at the same time, challenging task in human-computer interaction systems and is one of the most demanding areas of research using artificial intelligence and deep machine learning architectures. Despite being the world's seventh most widely spoken language, Bangla is still classified as one of the low-resource languages for speech emotion recognition tasks because of the inadequate availability of data. There is an apparent lack of speech emotion recognition datasets for performing this type of research in the Bangla language. This article presents a Bangla language-based emotional speech-audio recognition dataset to address this problem. BanglaSER is a Bangla language-based speech emotion recognition dataset. It consists of speech-audio data of 34 participating speakers from diverse age groups between 19 and 47 years, with a balanced set of 17 male and 17 female nonprofessional participating actors. This dataset contains 1467 Bangla speech-audio recordings of five rudimentary human emotional states, namely angry, happy, neutral, sad, and surprise. Three trials are conducted for each emotional state. Hence, the total number of recordings involves 3 statements × 3 repetitions × 4 emotional states (angry, happy, sad, and surprise) × 34 participating speakers = 1224 recordings, plus 3 statements × 3 repetitions × 1 emotional state (neutral) × 27 participating speakers = 243 recordings, resulting in a total of 1467 recordings. The BanglaSER dataset was created by recording speech-audio through smartphones and laptops, has a balanced number of recordings in each category with evenly distributed participating male and female actors, and would serve as an essential training dataset for Bangla speech emotion recognition models in terms of generalization. BanglaSER is compatible with various deep learning architectures such as Convolutional neural networks, Long short-term memory, Gated recurrent unit, Transformer, etc. The dataset is available at https://data.mendeley.com/datasets/t9h6p943xy/5 and can be used for research purposes.
- Published
- 2022
- Full Text
- View/download PDF
37. Recognition of Fill and Empty Walnuts Using Acoustic Analysis and Fuzzy Logic
- Author
- Reza Khakrangin, Davood MohamadZamani, and Seyed Mohamad Javidan
- Subjects
- fuzzy logic, sorting, sound processing, walnut, Agriculture (General), S1-972
- Abstract
To increase the export volume and marketability of walnuts, a quick, cheap and non-destructive sorting approach should be used. The overall objective is to sort full, half-full and empty walnuts relying on fuzzy logic and sound analysis methods. The sound processing technique was used to sort the walnuts. In this regard, parameters affecting sorting and quality, such as walnut size and shape, were studied. For this purpose, 300 dried walnuts were randomly selected from a walnut orchard for use in the experiments. An electronic system consisting of a computer and a microphone, and a mechanical section consisting of a sound chamber, were designed to measure the sound intensity of a walnut. At this stage, each walnut was dropped in three orientations (back, side and abdomen) from 30 cm above the surface of the sound chamber. The sounds of the impacts on the sound chamber, made of wood with a 45-degree slope, were recorded by a microphone. The data from the sound signals were stored in the time domain on the computer and then processed in MATLAB. In order to eliminate ambient noise in the signals, the Kalman filter algorithm was used to achieve high accuracy and fast convergence. These data were then analyzed by the fuzzy logic method. In this research, the WEKA software and the J48 algorithm were used to classify walnuts based on their filling, using the features extracted from the walnut's collision with the wooden plate. In order to classify walnuts according to the fullness of the kernel, a scientific and innovative index called the Full Kernel Index (FK) was used. The results of this study showed that, for the classification of walnuts, decision trees, owing to their simple structure and to the fuzzy rules and membership-function threshold values they generate, yield a fuzzy inference system with high accuracy. The final fuzzy model was presented to classify walnuts into two classes with 0.087% separation accuracy and into 3 classes with 0.080% separation accuracy.
- Published
- 2021
- Full Text
- View/download PDF
38. Auditory Dysfunction in Animal Models of Autism Spectrum Disorder.
- Author
- Castro, Ana Carolina and Monteiro, Patricia
- Subjects
- AUTISM spectrum disorders, AUDITORY pathways, SENSORY perception, AUDITORY perception, ANIMAL models in research, NEUROBIOLOGY
- Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder mainly characterized by social-communication impairments, repetitive behaviors and altered sensory perception. Auditory hypersensitivity is the most common sensory-perceptual abnormality in ASD, however, its underlying neurobiological mechanisms remain elusive. Consistently with reports in ASD patients, animal models for ASD present sensory-perception alterations, including auditory processing impairments. Here we review the current knowledge regarding auditory dysfunction in rodent models of ASD, exploring both shared and distinct features among them, mechanistic and molecular underpinnings, and potential therapeutic approaches. Overall, auditory dysfunction in ASD models seems to arise from impaired central processing. Depending on the model, impairments may arise at different steps along the auditory pathway, from auditory brainstem up to the auditory cortex. Common defects found across models encompass atypical tonotopicity in different regions of the auditory pathway, temporal and spectral processing impairments and histological differences. Imbalance between excitation and inhibition (E/I imbalance) is one of the most well-supported mechanisms explaining the auditory phenotype in the ASD models studied so far and seems to be linked to alterations in GABAergic signaling. Such E/I imbalance may have a large impact on the development of the auditory pathway, influencing the establishment of connections responsible for normal sound processing. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
39. Estimation of Asymmetry in Head Related Transfer Functions.
- Author
-
Jasiński, Maciej and Żera, Jan
- Subjects
- *
HEAT transfer , *ACOUSTICS , *COMPUTER sound processing , *VIRTUAL reality , *ARTIFICIAL intelligence - Abstract
Individual Head-Related Transfer Functions (HRTFs) typically show large left-right ear differences. This work evaluates HRTF left-right differences by means of an rms measure, the Root Mean Square Difference (RMSD). The RMSD was calculated for HRTFs measured in our laboratory on a group of 15 subjects, for HRTFs taken from the LISTEN database, and for an acoustic manikin. The results showed that the RMSD varies with frequency and, as expected, is small at low frequencies (0.3–1 kHz), where HRTFs are more symmetrical. In higher frequency bands (1–5 kHz and above 5 kHz), the left-right differences are larger as an effect of the complex filtering caused by the anatomical shape of the head and the pinnae. Results obtained for the subjects were similar to those for the LISTEN database, but differed for the acoustic manikin. This means that measurements made with the manikin cannot be considered a perfect average representation of results obtained for people. The method and results of this study may be useful in assessing the symmetry of HRTFs, and in further analysis and improvement of HRTF individualization and personalization algorithms. (A minimal sketch of the RMSD computation follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
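The abstract does not spell out the exact RMSD formula; a common formulation, assumed here, is the rms of the left-right magnitude difference in dB within a frequency band. The band edges, FFT length, and the synthetic impulse responses in the usage example are illustrative stand-ins for measured HRIRs.

```python
import numpy as np

def hrtf_rmsd(h_left, h_right, fs, band=(300.0, 1000.0), nfft=1024):
    """RMS difference (in dB) between left- and right-ear HRTF magnitude
    spectra within a frequency band; one common RMSD formulation."""
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    mag_l = 20 * np.log10(np.abs(np.fft.rfft(h_left, nfft)) + 1e-12)
    mag_r = 20 * np.log10(np.abs(np.fft.rfft(h_right, nfft)) + 1e-12)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    return np.sqrt(np.mean((mag_l[sel] - mag_r[sel]) ** 2))

# usage with synthetic impulse responses standing in for measured HRIRs
fs = 48000
rng = np.random.default_rng(0)
h_l = rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0)
h_r = rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0)
# low band (0.3-1 kHz), where real HRTFs tend to be more symmetric
print(hrtf_rmsd(h_l, h_r, fs, band=(300, 1000)))
```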
40. Designing and Implementation a Simple Algorithm Considering the Maximum Audio Frequency of Persian Vocabulary in Order to Robot Speech Control Based on Arduino
- Author
-
Moshayedi, Ata Jahangir, Agda, Abolfazl Moradian, Arabzadeh, Morteza, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Ruediger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, and Montaser Kouhsari, Shahram, editor
- Published
- 2019
- Full Text
- View/download PDF
41. Auditory Dysfunction in Animal Models of Autism Spectrum Disorder
- Author
-
Ana Carolina Castro and Patricia Monteiro
- Subjects
autism spectrum disorder (ASD) ,sensory perception ,sound processing ,rodent models ,auditory dysfunction ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder mainly characterized by social-communication impairments, repetitive behaviors and altered sensory perception. Auditory hypersensitivity is the most common sensory-perceptual abnormality in ASD; however, its underlying neurobiological mechanisms remain elusive. Consistent with reports in ASD patients, animal models for ASD present sensory-perception alterations, including auditory processing impairments. Here we review the current knowledge regarding auditory dysfunction in rodent models of ASD, exploring both shared and distinct features among them, mechanistic and molecular underpinnings, and potential therapeutic approaches. Overall, auditory dysfunction in ASD models seems to arise from impaired central processing. Depending on the model, impairments may arise at different steps along the auditory pathway, from the auditory brainstem up to the auditory cortex. Common defects found across models encompass atypical tonotopicity in different regions of the auditory pathway, temporal and spectral processing impairments and histological differences. Imbalance between excitation and inhibition (E/I imbalance) is one of the most well-supported mechanisms explaining the auditory phenotype in the ASD models studied so far and seems to be linked to alterations in GABAergic signaling. Such E/I imbalance may have a large impact on the development of the auditory pathway, influencing the establishment of connections responsible for normal sound processing.
- Published
- 2022
- Full Text
- View/download PDF
42. Inhibition in the auditory cortex.
- Author
-
Studer, Florian and Barkat, Tania Rinaldi
- Subjects
- *
AUDITORY cortex , *AUDITORY perception , *AUDITORY pathways , *INTERNEURONS , *NEURONS - Abstract
• Intrinsic properties of inhibitory cells enable sound processing in the auditory cortex.
• Three main interneuron types play distinct and complementary roles in sound perception.
• Cortical inhibitory neurons orchestrate context-dependent modulation of sound.
The auditory system provides us with extremely rich and precise information about the outside world. Once a sound reaches our ears, the acoustic information it carries travels from the cochlea all the way to the auditory cortex, where its complexity and nuances are integrated. In the auditory cortex, functional circuits are formed by subpopulations of intermingled excitatory and inhibitory cells. In this review, we discuss recent evidence of the specific contributions of inhibitory neurons to sound processing and integration. We first examine the intrinsic properties of three main classes of inhibitory interneurons in the auditory cortex. Then, we describe how inhibition shapes the responsiveness of the auditory cortex to sound. Finally, we discuss how inhibitory interneurons contribute to the sensation and perception of sounds. Altogether, this review points out the crucial role of cortical inhibitory interneurons in integrating information about the context, history, or meaning of a sound. It also highlights open questions to be addressed for increasing our understanding of the staggering complexity leading to the subtlest auditory perception. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
43. Residual Hearing Affects Contralateral Routing of Signals in Cochlear Implant Users.
- Author
-
Stronks, H. Christiaan, Briaire, Jeroen J., and Frijns, Johan H.M.
- Subjects
- *
COCHLEAR implants , *VERBAL behavior testing , *INTELLIGIBILITY of speech - Abstract
Introduction: Contralateral routing of signals (CROS) can be used to eliminate the head shadow effect. In unilateral cochlear implant (CI) users, CROS can be achieved by placing a microphone on the contralateral ear and streaming its signal to the CI ear. CROS was originally developed for unilateral CI users without any residual hearing in the nonimplanted ear. However, the criteria for implantation are becoming progressively looser, and the nonimplanted ear can have substantial residual hearing. In this study, we assessed how residual hearing in the contralateral ear influences CROS effectiveness in unilateral CI users. Methods: In a group of unilateral CI users (N = 17) with varying amounts of residual hearing, we used free-field speech tests to determine the effects of CROS on the speech reception threshold (SRT) in amplitude-modulated noise. We compared two spatial configurations: (1) speech presented to the CROS ear and noise to the CI ear (S_CROS N_CI) and (2) the reverse (S_CI N_CROS). Results: Compared with the use of the CI only, CROS improved the SRT by 6.4 dB on average in the S_CROS N_CI configuration. In the S_CI N_CROS configuration, however, CROS deteriorated the SRT by 8.4 dB. The benefit and disadvantage of CROS both decreased significantly with the amount of residual hearing. Conclusion: CROS users need careful instructions about the potential disadvantage when listening in conditions where the CROS ear mainly receives noise, especially if they have residual hearing in the contralateral ear. The CROS device should be turned off when it is on the noise side (S_CI N_CROS). CI users with residual hearing in the CROS ear also should understand that contralateral amplification (i.e., a bimodal hearing solution) will yield better results than a CROS device. Unilateral CI users with no functional contralateral hearing should be considered the primary target population for a CROS device. (A minimal sketch of an adaptive SRT measurement follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
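The abstract reports SRTs from free-field speech tests but does not describe the tracking procedure. As one plausible illustration (not necessarily the authors' protocol), a 1-up/1-down adaptive staircase converges on the roughly 50%-correct SNR; the step size, trial count, and the simulated listener below are all assumptions.

```python
import math
import random

def staircase_srt(trial, start_snr=0.0, step=2.0, n_trials=30):
    """1-up/1-down adaptive staircase converging on the ~50%-correct SNR.
    `trial(snr)` runs one sentence-in-noise presentation and returns True
    if it was repeated correctly; step size and trial count are assumed."""
    snr, last, reversals = start_snr, None, []
    for _ in range(n_trials):
        correct = trial(snr)
        if last is not None and correct != last:
            reversals.append(snr)              # track direction reversals
        snr += -step if correct else step      # harder after a hit, easier after a miss
        last = correct
    tail = reversals[-6:]                      # SRT = mean SNR at late reversals
    return sum(tail) / len(tail) if tail else snr

# usage with a simulated listener whose true SRT is -5 dB
def fake_listener(snr, true_srt=-5.0, slope=1.0):
    p = 1.0 / (1.0 + math.exp(-slope * (snr - true_srt)))  # logistic psychometric curve
    return random.random() < p

print(staircase_srt(fake_listener))
```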
44. Predicting neuronal response properties from hemodynamic responses in the auditory cortex
- Author
-
Isma Zulfiqar, Martin Havlicek, Michelle Moerel, and Elia Formisano
- Subjects
Auditory cortex ,Sound processing ,Rostral and caudal belt ,Forward model ,Dynamic neuronal model ,Biophysical hemodynamic model ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
Recent functional MRI (fMRI) studies have highlighted differences in responses to natural sounds along the rostral-caudal axis of the human superior temporal gyrus. However, due to the indirect nature of the fMRI signal, it has been challenging to relate these fMRI observations to actual neuronal response properties. To bridge this gap, we present a forward model of the fMRI responses to natural sounds combining a neuronal model of the auditory cortex with physiological modeling of the hemodynamic BOLD response. Neuronal responses are modeled with a dynamic recurrent firing rate model, reflecting the tonotopic, hierarchical processing in the auditory cortex along with the spectro-temporal tradeoff in the rostral-caudal axis of its belt areas. To link modeled neuronal response properties with human fMRI data in the auditory belt regions, we generated a space of neuronal models, which differed parametrically in the spectral and temporal specificity of neuronal responses. Then, we obtained predictions of fMRI responses through a biophysical model of the hemodynamic BOLD response (P-DCM). Using Bayesian model comparison, our results showed that the hemodynamic BOLD responses of the caudal belt regions in the human auditory cortex were best explained by modeling faster temporal dynamics and broader spectral tuning of neuronal populations, while rostral belt regions were best explained through fine spectral tuning combined with slower temporal dynamics. These results support the hypothesis of complementary neural information processing along the rostral-caudal axis of the human superior temporal gyrus. (A minimal balloon-model BOLD sketch follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
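The paper links neuronal models to fMRI through the P-DCM hemodynamic model. The full P-DCM is beyond a short sketch, so the code below implements the classic balloon/Windkessel model that P-DCM extends, with standard literature parameter values rather than the paper's fitted ones.

```python
import numpy as np

def balloon_bold(drive, dt=0.01, kappa=0.65, gamma=0.41,
                 tau=0.98, alpha=0.32, E0=0.34, V0=0.02):
    """Classic balloon/Windkessel hemodynamic model, Euler-integrated.
    `drive` is a neuronal activity time series; all parameters are
    standard literature values, not the ones fitted in the paper."""
    k1, k2, k3 = 7.0 * E0, 2.0, 2.0 * E0 - 0.2   # common BOLD coefficients
    s, f, v, q = 0.0, 1.0, 1.0, 1.0              # vasodilatory signal, flow, volume, dHb
    bold = np.empty(len(drive))
    for i, u in enumerate(drive):
        E = 1.0 - (1.0 - E0) ** (1.0 / f)        # oxygen extraction fraction
        ds = u - kappa * s - gamma * (f - 1.0)
        dv = (f - v ** (1.0 / alpha)) / tau
        dq = (f * E / E0 - q * v ** (1.0 / alpha - 1.0)) / tau
        s, f = s + dt * ds, f + dt * s           # flow integrates the signal
        v, q = v + dt * dv, q + dt * dq
        bold[i] = V0 * (k1 * (1.0 - q) + k2 * (1.0 - q / v) + k3 * (1.0 - v))
    return bold

# usage: simulate the BOLD response to a 1-s burst of neuronal activity
dt = 0.01
t = np.arange(0.0, 30.0, dt)
drive = ((t >= 1.0) & (t < 2.0)).astype(float)
y = balloon_bold(drive, dt=dt)
```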
45. OVERSIZED ORE PIECES DETECTION METHOD BASED ON COMPUTER VISION AND SOUND PROCESSING FOR VALIDATION OF VIBRATIONAL SIGNALS IN DIAGNOSTICS OF MINING SCREEN.
- Author
-
Skoczylas, Artur, Anufriiev, Sergii, and Stefaniak, Paweł
- Subjects
- *
COMPUTER vision , *ORES , *SHALE shakers , *SIGNAL processing , *MINES & mineral resources , *PLANT assimilation - Abstract
The main purpose of a mineral processing plant is to obtain the highest possible value from the processed raw material. In the majority of concentrators in world mining, the first step of the processing technology is sieving of the ore, which is performed using vibrating screens. Usually there are only a few such machines in the whole technological system; their continuous operation is therefore critical, and they are often considered a bottleneck of the whole system. Although vibrating screens are expected to maintain the highest reliability indicators, their maintenance is still very often carried out using a planned, preventive strategy. Usually the diagnostics of such systems is based on vibro-acoustic methods, but the collected vibrational signals may be significantly disrupted by large pieces of ore hitting the screen. That is why the development of diagnostic techniques, especially for rotational elements, should include validation methods that identify the impact of processed material on the screen and eliminate it from the structure of the vibrational signal. The paper presents a method for detecting large rocks with the help of computer vision and audio signal processing. Using two sources of data makes it possible to cross-validate the results obtained by each method, which increases the robustness of the algorithm. The main purpose of the algorithm is to filter the vibrational signals from the screen for further analysis. (A minimal sketch of the audio impact detection and cross-validation step follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
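The abstract does not detail the audio detector, so the sketch below uses a generic short-time-energy threshold as a stand-in, plus a simple time-tolerance fusion with vision detections to illustrate the cross-validation idea. The window length, threshold multiplier, and tolerance are assumed values.

```python
import numpy as np

def detect_impacts(audio, fs, win=0.05, thresh_mult=4.0):
    """Flag windows (win seconds long) whose RMS energy exceeds thresh_mult
    times the median window RMS; a generic stand-in for the paper's detector."""
    n = int(win * fs)
    nwin = len(audio) // n
    rms = np.sqrt(np.mean(audio[:nwin * n].reshape(nwin, n) ** 2, axis=1))
    hits = np.flatnonzero(rms > thresh_mult * np.median(rms))
    return hits * win                       # start times (s) of flagged windows

def fuse(audio_times, vision_times, tol=0.1):
    """Keep audio-detected impacts that a vision detector confirmed within
    tol seconds -- the cross-validation idea described in the abstract."""
    vision_times = np.asarray(vision_times, dtype=float)
    return [t for t in audio_times
            if vision_times.size and np.min(np.abs(vision_times - t)) <= tol]

# usage: flagged intervals would then be cut from the vibration signal
fs = 8000
rng = np.random.default_rng(1)
audio = 0.01 * rng.standard_normal(fs * 2)
audio[fs:fs + 200] += 0.5 * rng.standard_normal(200)  # simulated rock impact at t = 1 s
print(fuse(detect_impacts(audio, fs), vision_times=[1.02]))
```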
46. 2008: Eight Years of Practice on the Hyper-Flute: Technological and Musical Perspectives
- Author
-
Palacio-Quintin, Cléo, Bader, Rolf, Series editor, Leman, Marc, Series editor, Godoy, Rolf-Inge, Series editor, Jensenius, Alexander Refsum, editor, and Lyons, Michael J., editor
- Published
- 2017
- Full Text
- View/download PDF
47. What If Your Instrument Is Invisible?
- Author
-
Naphtali, Dafna, Bovermann, Till, editor, de Campo, Alberto, editor, Egermann, Hauke, editor, Hardjowirogo, Sarah-Indriyati, editor, and Weinzierl, Stefan, editor
- Published
- 2017
- Full Text
- View/download PDF
48. The integration of Gaussian noise by long-range amygdala inputs in frontal circuit promotes fear learning in mice
- Author
-
Mattia Aime, Elisabete Augusto, Vladimir Kouskoff, Tiago Campelo, Christelle Martin, Yann Humeau, Nicolas Chenouard, and Frederic Gambino
- Subjects
fear learning ,dendritic integration ,sound processing ,cortical plasticity ,in vivo ,Medicine ,Science ,Biology (General) ,QH301-705.5 - Abstract
Survival depends on the ability of animals to select the appropriate behavior in response to threat and safety sensory cues. However, the synaptic and circuit mechanisms by which the brain learns to encode accurate predictors of threat and safety remain largely unexplored. Here, we show that frontal association cortex (FrA) pyramidal neurons of mice integrate auditory cues and basolateral amygdala (BLA) inputs non-linearly in an NMDAR-dependent manner. We found that the response of FrA pyramidal neurons was more pronounced to Gaussian noise than to pure frequency tones, and that the activation of BLA-to-FrA axons was strongest between conditioning pairings. Blocking BLA-to-FrA signaling specifically at the time of presentation of Gaussian noise (but not the 8 kHz tone) between conditioning trials impaired the formation of auditory fear memories. Taken together, our data reveal a circuit mechanism that facilitates the formation of fear traces in the FrA, thus providing a new framework for probing discriminative learning and related disorders.
- Published
- 2020
- Full Text
- View/download PDF
49. THE METHODOLOGICAL SUPPORT OF SOUND ENGINEERING TRAINING OF FUTURE MASTERS OF MUSICAL ART
- Author
-
Олексій Корякін
- Subjects
multimedia ,musical art ,sound processing ,sound processing training ,Power Point ,Adobe Audition ,Steinberg Cubase ,professional training ,master of musical art ,Education (General) ,L7-991 - Abstract
The article is devoted to specifying the basics of the methodological support of sound processing training for future masters of musical art of performing specializations in the specialty «Musical Art». The article analyzes the general direction of professional training in the specialty «Musical Art» and defines the main content of sound processing training for future masters of musical art of performing specializations; it also outlines the conditions for incorporating computer information technology into the content of this professional training. The article provides a few examples of the use of multimedia computer technology and software in sound processing training for future masters of musical art in the specialty «Musical Art». It contains a general characterization of the educational discipline «Sound processing and musical acoustics» from the cycle of professional training of future masters of musical art, as well as a few examples of teaching it with various multimedia technologies and computer programs. The article also defines the main stages of preparing a training session with future masters of musical art of performing specializations using multimedia computer technologies and software, as well as the basic conditions for including computer technology in the content of their professional training. Emphasis is placed on the use of multimedia technologies and software in the sound processing training of future masters of musical art of performing specializations in the specialty «Musical Art», as an important component of ensuring the competitiveness of graduates in today's labor market.
- Published
- 2020
- Full Text
- View/download PDF
50. Spectro-Temporal Processing in a Two-Stream Computational Model of Auditory Cortex
- Author
-
Isma Zulfiqar, Michelle Moerel, and Elia Formisano
- Subjects
auditory cortex ,sound processing ,dynamic neuronal modeling ,temporal coding ,rate coding ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
Neural processing of sounds in the dorsal and ventral streams of the (human) auditory cortex is optimized for analyzing fine-grained temporal and spectral information, respectively. Here we use a Wilson and Cowan firing-rate modeling framework to simulate spectro-temporal processing of sounds in these auditory streams and to investigate the link between neural population activity and behavioral results of psychoacoustic experiments. The proposed model consisted of two core (A1 and R, representing primary areas) and two belt (Slow and Fast, representing rostral and caudal processing, respectively) areas, differing in terms of their spectral and temporal response properties. First, we simulated the responses to amplitude modulated (AM) noise and tones. In agreement with electrophysiological results, we observed an area-dependent transition from a temporal (synchronization) to a rate code when moving from low to high modulation rates. Simulated neural responses in a task of amplitude modulation detection suggested that thresholds derived from population responses in core areas closely resembled those of psychoacoustic experiments in human listeners. For tones, simulated modulation threshold functions were found to be dependent on the carrier frequency. Second, we simulated the responses to complex tones with missing fundamental stimuli and found that synchronization of responses in the Fast area accurately encoded pitch, with the strength of synchronization depending on the number and order of harmonic components. Finally, using speech stimuli, we showed that the spectral and temporal structure of the speech was reflected in parallel by the modeled areas. The analyses highlighted that the Slow stream coded with high spectral precision the aspects of the speech signal characterized by slow temporal changes (e.g., prosody), while the Fast stream encoded primarily the faster changes (e.g., phonemes, consonants, temporal pitch). Interestingly, the pitch of a speaker was encoded both spatially (i.e., tonotopically) in the Slow area and temporally in the Fast area. Overall, the performed simulations showed that the model is valuable for generating hypotheses on how the different cortical areas/streams may contribute toward behaviorally relevant aspects of auditory processing. The model can be used in combination with physiological models of neurovascular coupling to generate predictions for human functional MRI experiments. (A minimal Wilson-Cowan simulation sketch follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
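The abstract names a Wilson and Cowan firing-rate framework. Below is a minimal single excitatory/inhibitory pair in that framework, driven by an 8-Hz amplitude-modulated input to illustrate envelope synchronization; all weights, time constants, and sigmoid parameters are illustrative defaults, not the fitted values of the paper's four-area model.

```python
import numpy as np

def sigmoid(x, a=1.0, theta=4.0):
    """Logistic activation used by the Wilson-Cowan equations."""
    return 1.0 / (1.0 + np.exp(-a * (x - theta)))

def wilson_cowan(stim, dt=0.001, tau_e=0.01, tau_i=0.02,
                 w_ee=16.0, w_ei=12.0, w_ie=15.0, w_ii=3.0):
    """One excitatory/inhibitory Wilson-Cowan pair driven by `stim`.
    Weights and time constants (seconds) are illustrative defaults,
    not the fitted values of the four-area model in the paper."""
    E, I = 0.0, 0.0
    out = np.empty(len(stim))
    for k, s in enumerate(stim):
        dE = (-E + sigmoid(w_ee * E - w_ei * I + s)) / tau_e
        dI = (-I + sigmoid(w_ie * E - w_ii * I)) / tau_i
        E, I = E + dt * dE, I + dt * dI
        out[k] = E                     # excitatory firing rate as the output
    return out

# usage: drive the pair with an 8-Hz amplitude-modulated input; the
# synchronization to the envelope can be read off the spectrum of `resp`
dt = 0.001
t = np.arange(0.0, 2.0, dt)
am = 3.0 * (1.0 + np.sin(2.0 * np.pi * 8.0 * t))
resp = wilson_cowan(am, dt=dt)
```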