96 results for "auditory modeling"
Search Results
2. Insights Into Electrophysiological Metrics of Cochlear Health in Cochlear Implant Users Using a Computational Model.
- Author
-
Takanen, Marko, Strahl, Stefan, and Schwarz, Konrad
- Subjects
COCHLEAR implants, ACTION potentials, ACOUSTIC nerve, ELECTROPHYSIOLOGY, NERVE fibers - Abstract
Purpose: The hearing outcomes of cochlear implant users depend on the functional status of the electrode-neuron interface inside the cochlea. This can be assessed by measuring electrically evoked compound action potentials (eCAPs). Variations in cochlear neural health and survival are reflected in eCAP-based metrics. The difficulty in translating promising results from animal studies into clinical use has raised questions about the degree to which eCAP-based metrics are influenced by non-neural factors. Here, we addressed these questions using a computational model. Methods: A 2-D computational model was designed to simulate how electrical signals from the stimulating electrode reach the auditory nerve fibers distributed along the cochlea, evoking action potentials that can be recorded as compound responses at the recording electrodes. Effects of physiologically relevant variations in neural survival and in electrode-neuron and stimulating-recording electrode distances on eCAP amplitude growth functions (AGFs) were investigated. Results: In line with existing literature, the predicted eCAP AGF slopes and the inter-phase gap (IPG) effects depended on neural survival, but only when the IPG effect was calculated as the difference between the slopes of the two AGFs expressed on a linear input–output scale. As expected, shallower eCAP AGF slopes were obtained for increased stimulating-recording electrode distance, and larger eCAP thresholds for greater electrode-neuron distance. These non-neural factors also had a minor influence on the predicted IPG effect. Conclusions: The model predictions demonstrate previously found dependencies of eCAP metrics on neural survival and non-neural aspects. The present findings confirm data from animal studies and provide insights into applying the described metrics in clinical practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
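The slope metric in the abstract above is simple enough to sketch. Below is a minimal illustration of computing eCAP AGF slopes on a linear input-output scale and taking the IPG effect as their difference; the synthetic growth functions, units, and the linear-fit choice are assumptions for illustration, not the paper's implementation.

```python
# Sketch: IPG effect as the difference between the slopes of two eCAP amplitude
# growth functions (AGFs), both on a linear input-output scale.
# Synthetic AGFs; units and values are invented for illustration.
import numpy as np

def agf_slope(levels, amplitudes):
    """Least-squares slope of an AGF (linear scales on both axes)."""
    slope, _intercept = np.polyfit(levels, amplitudes, 1)
    return slope

levels = np.linspace(100, 300, 9)                      # stimulus current, arbitrary units
agf_short_ipg = 0.8 * np.clip(levels - 120, 0, None)   # eCAP amplitude, short IPG
agf_long_ipg = 1.1 * np.clip(levels - 110, 0, None)    # eCAP amplitude, long IPG

ipg_effect = agf_slope(levels, agf_long_ipg) - agf_slope(levels, agf_short_ipg)
print(f"IPG effect (linear-scale slope difference): {ipg_effect:.3f}")
```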
3. Learnable axonal delay in spiking neural networks improves spoken word recognition.
- Author
-
Sun, Pengfei, Chua, Yansong, Devos, Paul, and Botteldooren, Dick
- Subjects
ARTIFICIAL neural networks, WORD recognition, RECURRENT neural networks, CONVOLUTIONAL neural networks, SPEECH - Abstract
Spiking neural networks (SNNs), which are composed of biologically plausible spiking neurons and combined with bio-physically realistic auditory periphery models, offer a means to explore and understand human auditory processing, especially in tasks where precise timing is essential. However, because of the inherent temporal complexity in spike sequences, the performance of SNNs has remained less competitive than that of artificial neural networks (ANNs). To tackle this challenge, a fundamental research topic is the configuration of spike-timing and the exploration of more intricate architectures. In this work, we demonstrate that a learnable axonal delay combined with local skip-connections yields state-of-the-art performance on challenging benchmarks for spoken word recognition. Additionally, we introduce an auxiliary loss term to further enhance accuracy and stability. Experiments on the neuromorphic speech benchmark datasets NTIDIDIGITS and SHD show improvements in performance when incorporating our delay module in comparison to vanilla feedforward SNNs. Specifically, with the integration of our delay module, the performance on NTIDIDIGITS and SHD improves by 14% and 18%, respectively. When paired with local skip-connections and the auxiliary loss, our approach surpasses both recurrent and convolutional neural networks, yet uses 10x fewer parameters for NTIDIDIGITS and 7x fewer for SHD. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
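As a rough illustration of the delay idea in the preceding entry, here is a sketch of a per-neuron axonal delay module in PyTorch. The fractional-shift implementation, module name, and all sizes are assumptions; the authors' actual module and training method are not reproduced here.

```python
# Sketch of a per-neuron axonal delay applied to spike trains (batch, time, neurons).
# The paper learns delays end-to-end; here the delay is a parameter applied via
# interpolated (fractional) time shifts so gradients can flow through the
# interpolation weights. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AxonalDelay(nn.Module):
    def __init__(self, n_neurons: int, max_delay: int):
        super().__init__()
        self.max_delay = max_delay
        self.delay = nn.Parameter(torch.rand(n_neurons) * max_delay)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, neurons); shift each neuron's train by its own delay.
        d = self.delay.clamp(0, self.max_delay)
        lo, frac = d.floor().long(), d - d.floor()
        out = torch.zeros_like(x)
        for n in range(x.shape[-1]):               # loop for clarity, not speed
            a = F.pad(x[..., n], (lo[n].item(), 0))[..., : x.shape[1]]
            b = F.pad(x[..., n], (lo[n].item() + 1, 0))[..., : x.shape[1]]
            out[..., n] = (1 - frac[n]) * a + frac[n] * b
        return out

spikes = (torch.rand(2, 100, 8) < 0.1).float()     # random spike trains
delayed = AxonalDelay(n_neurons=8, max_delay=10)(spikes)
print(delayed.shape)                               # torch.Size([2, 100, 8])
```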
4. A Study of Different Aspects of Neural Networks: Neural Representations, Connectivity and Computation
- Author
-
De, Anandita
- Subjects
Neurosciences, Applied mathematics, Physics, Auditory modeling, Expander graph, Manifold, Population coding, Working memory - Abstract
This dissertation is split into 3 parts. In the first part (Chapter 2) we look at shapes, or manifolds, on which neural activity over time lies. In a neural state space, where each axis represents a neuron, neural activity over time forms a point cloud. This point cloud often occupies a small region in the space of all possible activity patterns, thus revealing structure in the data. We consider point clouds from neural activities from common population codes known as "tuning curve models". In these models, the firing rate of each neuron is a function of a latent variable, which might be a stimulus variable or a variable related to an internal state, and a tuning curve parameter which labels each neuron. We address the question: how close are point clouds formed by such models to a linear subspace? To answer this, we define the linear dimension of the data to be the number of dimensions which captures a very high fraction of the variance, for example 95% of the variance in the data. We show that the linear dimension grows exponentially with the number of latent variables encoded by the population. Thus the manifolds formed by the neural activities from these models are extremely non-linear, and linear dimension is not a good measure of the intrinsic dimension of the manifold on which the point cloud lies. In the second part (Chapter 3), we model connections between distant brain regions by sparse random connections. We start by observing that such a network has a special property known as the expander property. Using this property, it can be shown that information can be transmitted efficiently from a source region to a target region even if the target region has fewer neurons than the source region. We also consider whether the compressed patterns in the target region can be re-coded or expanded to perform some computation. We show that the compressed patterns can be re-expanded by algorithms known as Locally Competitive Algorithms (LCA), and that the re-expanded patterns can be separated by a downstream neuron into arbitrarily defined classes. We next consider whether long-range reciprocal connections between two regions can be used to maintain persistent activity in both regions. Such activity is thought to be a substrate for working memory, the ability to hold things in mind. We show that the network can indeed maintain sparse patterns of activity through simple network dynamics. We conclude that sparse random connections can be used to transmit information effectively and improve the performance of certain computations compared to dense random connections. In the last part (Chapter 4), we built a computational rate model of the pre-cortical biological neural circuit responsible for the localization of sound in the vertical plane. Interaction of incoming sound waves with the outer ear filters out energy from specific frequency bands in the spectrum of the incoming sound. The frequency bands with zero or reduced power in them are known as notches. The position of the notches is a function of the angle of elevation of the sound source. There is a dedicated set of neurons in the auditory pathway which are sensitive to the position of these notches and hence thought to be responsible for the localization of sound in the vertical plane. These neurons show different levels of excitation or inhibition above or below their spontaneous rates for different combinations of frequencies and intensities of sound. We built a computational model to probe how this complex set of responses arises from the interaction between the various populations of neurons in the auditory pathway.
- Published
- 2023
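The "linear dimension" definition in the dissertation abstract above (the number of principal components capturing, e.g., 95% of the variance) can be made concrete in a few lines. This sketch uses Gaussian tuning curves over one circular latent variable; the tuning width and all sizes are arbitrary choices, not the dissertation's.

```python
# Sketch: "linear dimension" of a tuning-curve population code, defined here as
# the number of principal components needed to capture 95% of the variance.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_samples = 200, 2000
centers = np.linspace(0, 2 * np.pi, n_neurons, endpoint=False)  # tuning-curve parameter
stimuli = rng.uniform(0, 2 * np.pi, n_samples)                  # latent variable

# Firing rate of each neuron: Gaussian bump on the circle around its center.
d = np.angle(np.exp(1j * (stimuli[:, None] - centers[None, :])))
rates = np.exp(-d**2 / (2 * 0.2**2))            # (samples, neurons) point cloud

rates -= rates.mean(axis=0)
var = np.linalg.svd(rates, compute_uv=False) ** 2
frac = np.cumsum(var) / var.sum()
print("linear dimension (95% variance):", int(np.searchsorted(frac, 0.95) + 1))
```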
5. Binaural detection thresholds and audio quality of speech and music signals in complex acoustic environments.
- Author
-
Biberger, Thomas and Ewert, Stephan D.
- Subjects
SPEECH, STREAMING audio - Abstract
Everyday acoustic environments are often complex, typically comprising one attended target sound in the presence of interfering sounds (e.g., disturbing conversations) and reverberation. Here we assessed binaural detection thresholds and (supra-threshold) binaural audio quality ratings of four distortion types: spectral ripples, non-linear saturation, and intensity and spatial modifications applied to speech, guitar, and noise targets in such complex acoustic environments (CAEs). The target and (up to) two masker sounds were either co-located as if contained in a common audio stream, or were spatially separated as if originating from different sound sources. The amount of reverberation was systematically varied. Masker and reverberation had a significant effect on the distortion-detection thresholds of speech signals. Quality ratings were affected by reverberation, whereas the effect of maskers depended on the distortion. The results suggest that detection thresholds and quality ratings for distorted speech in anechoic conditions are also valid for rooms with mild reverberation, but not for moderate reverberation. Furthermore, for spectral ripples, a significant relationship between the listeners' individual detection thresholds and quality ratings was found. The current results provide baseline data for detection thresholds and audio quality ratings of different distortions of a target sound in CAEs, supporting the future development of binaural auditory models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. Binaural detection thresholds and audio quality of speech and music signals in complex acoustic environments
- Author
-
Thomas Biberger and Stephan D. Ewert
- Subjects
audio quality, detection thresholds, complex acoustic environments, auditory modeling, reverberation, Psychology, BF1-990 - Abstract
Everyday acoustic environments are often complex, typically comprising one attended target sound in the presence of interfering sounds (e.g., disturbing conversations) and reverberation. Here we assessed binaural detection thresholds and (supra-threshold) binaural audio quality ratings of four distortion types: spectral ripples, non-linear saturation, and intensity and spatial modifications applied to speech, guitar, and noise targets in such complex acoustic environments (CAEs). The target and (up to) two masker sounds were either co-located as if contained in a common audio stream, or were spatially separated as if originating from different sound sources. The amount of reverberation was systematically varied. Masker and reverberation had a significant effect on the distortion-detection thresholds of speech signals. Quality ratings were affected by reverberation, whereas the effect of maskers depended on the distortion. The results suggest that detection thresholds and quality ratings for distorted speech in anechoic conditions are also valid for rooms with mild reverberation, but not for moderate reverberation. Furthermore, for spectral ripples, a significant relationship between the listeners' individual detection thresholds and quality ratings was found. The current results provide baseline data for detection thresholds and audio quality ratings of different distortions of a target sound in CAEs, supporting the future development of binaural auditory models.
- Published
- 2022
- Full Text
- View/download PDF
7. Towards Personalized Auditory Models: Predicting Individual Sensorineural Hearing-Loss Profiles From Recorded Human Auditory Physiology.
- Author
-
Keshishzadeh, Sarineh, Garrett, Markus, and Verhulst, Sarah
- Subjects
AUDITORY cortex physiology, ACOUSTIC nerve, AUDITORY evoked response, AUDITORY perception, BIOLOGICAL models, COCHLEA, DEAFNESS, ELECTROPHYSIOLOGY, HAIR cells, HEARING aids - Abstract
Over the past decades, different types of auditory models have been developed to study the functioning of normal and impaired auditory processing. Several models can simulate frequency-dependent sensorineural hearing loss (SNHL) and can in this way be used to develop personalized audio-signal processing for hearing aids. However, to determine individualized SNHL profiles, we rely on indirect and noninvasive markers of cochlear and auditory-nerve (AN) damage. Our progressive knowledge of the functional aspects of different SNHL subtypes stresses the importance of incorporating them into the simulated SNHL profile, but has at the same time complicated the task of accomplishing this on the basis of noninvasive markers. In particular, different auditory-evoked potential (AEP) types can show a different sensitivity to outer-hair-cell (OHC), inner-hair-cell (IHC), or AN damage, but it is not clear which AEP-derived metric is best suited to develop personalized auditory models. This study investigates how simulated and recorded AEPs can be used to derive individual AN- or OHC-damage patterns and personalize auditory processing models. First, we individualized the cochlear model parameters using common methods of frequency-specific OHC-damage quantification, after which we simulated AEPs for different degrees of AN damage. Using a classification technique, we determined the recorded AEP metric that best predicted the simulated individualized cochlear synaptopathy profiles. We cross-validated our method using the data set at hand, but also applied the trained classifier to recorded AEPs from a new cohort to illustrate the generalizability of the method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
8. Towards a simplified and generalized monaural and binaural auditory model for psychoacoustics and speech intelligibility
- Author
-
Biberger, Thomas and Ewert, Stephan D.
- Subjects
auditory modeling, psychoacoustic masking, binaural hearing, speech intelligibility, Acoustics in engineering. Acoustical engineering, TA365-367, Acoustics. Sound, QC221-246 - Abstract
Auditory perception involves cues in the monaural auditory pathways, as well as binaural cues based on interaural differences. So far, auditory models have often focused on either monaural or binaural experiments in isolation. Although binaural models typically build upon stages of (existing) monaural models, only a few attempts have been made to extend a monaural model by a binaural stage using a unified decision stage for monaural and binaural cues. A typical prototype of binaural processing has been the classical equalization-cancellation mechanism, which either involves signal-adaptive delays and provides a single-channel output, or can be implemented with tapped delays providing a high-dimensional multichannel output. This contribution extends the (monaural) generalized envelope power spectrum model by a non-adaptive binaural stage with only a few, fixed output channels. The binaural stage resembles features of physiologically motivated hemispheric binaural processing, implemented as simplified signal-processing stages, yielding a 5-channel monaural and binaural matrix feature "decoder" (BMFD). The back end of the existing monaural model is applied to the BMFD output and calculates short-time envelope power and power features. The resulting model accounts for several published psychoacoustic and speech-intelligibility experiments and achieves a prediction performance comparable to existing state-of-the-art models with more complex binaural processing.
- Published
- 2022
- Full Text
- View/download PDF
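For readers unfamiliar with the classical equalization-cancellation (EC) mechanism mentioned in the entry above, here is a toy demonstration with synthetic signals: a diotic masker cancels when the ear signals are subtracted, while a target carrying a 180° interaural phase difference survives. This is the textbook idea only, not the BMFD model itself; all signal parameters are invented.

```python
# Sketch of equalization-cancellation (EC) in the N0S_pi configuration:
# identical masker at both ears, target in antiphase between the ears.
import numpy as np

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
rng = np.random.default_rng(1)

masker = rng.standard_normal(t.size)          # diotic masker (same at both ears)
target = 0.1 * np.sin(2 * np.pi * 500 * t)    # 500-Hz target, 180 deg IPD

left, right = masker + target, masker - target
residual = left - right                        # EC: masker drops out, 2x target remains

rms = lambda x: np.sqrt(np.mean(x ** 2))
print(f"target-to-masker ratio at one ear: {20*np.log10(rms(target)/rms(masker)):6.1f} dB")
print(f"masker left after cancellation:    {rms(residual - 2*target):.2e}")
```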
9. Sources of Variability in Consonant Perception and Implications for Speech Perception Modeling
- Author
-
Zaar, Johannes, Dau, Torsten, COHEN, IRUN R., Series Editor, LAJTHA, ABEL, Series Editor, LAMBRIS, JOHN D., Series Editor, PAOLETTI, RODOLFO, Series Editor, REZAEI, NIMA, Series Editor, van Dijk, Pim, editor, Başkent, Deniz, editor, Gaudrain, Etienne, editor, de Kleine, Emile, editor, Wagner, Anita, editor, and Lanting, Cris, editor
- Published
- 2016
- Full Text
- View/download PDF
10. Modeling Pitch Perception With an Active Auditory Model Extended by Octopus Cells
- Author
-
Tamas Harczos and Frank Markus Klefenz
- Subjects
auditory modeling, latency-phase coding, inter-spike interval histogram, time domain parameterization, pitch, pitch estimation, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571 - Abstract
Pitch is an essential category for musical sensations, and models of pitch perception are still vividly discussed today. Most of them rely on definitions of mathematical methods in the spectral or temporal domain. Our proposed pitch perception model is composed of an active auditory model extended by octopus cells. The active auditory model is the same as that used in Stimulation based on Auditory Modeling (SAM), a successful cochlear implant sound processing strategy; it is extended here by modeling the functional behavior of the octopus cells in the ventral cochlear nucleus and their connections to the auditory nerve fibers (ANFs). The neurophysiological parameterization of the extended model is fully described in the time domain. The model is based on latency-phase encoding and decoding, as octopus cells are latency-phase rectifiers in their local receptive fields. Pitch is ubiquitously represented by cascaded firing sweeps of octopus cells. Based on the firing patterns of octopus cells, inter-spike interval histograms can be aggregated, in which the place of the global maximum is assumed to encode the pitch.
- Published
- 2018
- Full Text
- View/download PDF
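The inter-spike-interval (ISI) histogram readout described in the entry above reduces, in its simplest form, to a histogram peak-pick. A toy sketch with a synthetic quasi-periodic spike train follows; the spike statistics and bin widths are invented, and no octopus-cell dynamics are modeled.

```python
# Sketch: pitch as the global maximum of an inter-spike interval (ISI) histogram.
import numpy as np

rng = np.random.default_rng(2)
f0 = 200.0
# Quasi-periodic spike train: ISIs of ~1/f0 seconds with a little jitter.
spikes = np.cumsum(rng.normal(1 / f0, 0.0004, size=500))

isis = np.diff(spikes)
hist, edges = np.histogram(isis, bins=np.arange(0.001, 0.02, 0.0001))
best = hist.argmax()
period = 0.5 * (edges[best] + edges[best + 1])  # center of the winning bin
print(f"estimated pitch: {1 / period:.1f} Hz")  # ~200 Hz
```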
11. Modeling Pitch Perception With an Active Auditory Model Extended by Octopus Cells.
- Author
-
Harczos, Tamas and Klefenz, Frank Markus
- Subjects
ABSOLUTE pitch, NERVE fibers, COCHLEAR nucleus, AUDITORY pathways, NEUROPHYSIOLOGY - Abstract
Pitch is an essential category for musical sensations, and models of pitch perception are still vividly discussed today. Most of them rely on definitions of mathematical methods in the spectral or temporal domain. Our proposed pitch perception model is composed of an active auditory model extended by octopus cells. The active auditory model is the same as that used in Stimulation based on Auditory Modeling (SAM), a successful cochlear implant sound processing strategy; it is extended here by modeling the functional behavior of the octopus cells in the ventral cochlear nucleus and their connections to the auditory nerve fibers (ANFs). The neurophysiological parameterization of the extended model is fully described in the time domain. The model is based on latency-phase encoding and decoding, as octopus cells are latency-phase rectifiers in their local receptive fields. Pitch is ubiquitously represented by cascaded firing sweeps of octopus cells. Based on the firing patterns of octopus cells, inter-spike interval histograms can be aggregated, in which the place of the global maximum is assumed to encode the pitch. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
12. Combination of binaural and harmonic masking release effects in the detection of a single component in complex tones.
- Author
-
Klein-Hennig, Martin, Dietz, Mathias, and Hohmann, Volker
- Subjects
*HEARING, *BINAURAL audio, *AUDITORY processing disorder, *AUDIO frequency, *DICHOTIC listening tests - Abstract
Both harmonic and binaural signal properties are relevant for auditory processing. To investigate how these cues combine in the auditory system, detection thresholds for an 800-Hz tone masked by a diotic (i.e., identical between the ears) harmonic complex tone were measured in six normal-hearing subjects. The target tone was presented either diotically or with an interaural phase difference (IPD) of 180° and in either harmonic or “mistuned” relationship to the diotic masker. Three different maskers were used, a resolved and an unresolved complex tone (fundamental frequency: 160 and 40 Hz) with four components below and above the target frequency and a broadband unresolved complex tone with 12 additional components. The target IPD provided release from masking in most masker conditions, whereas mistuning led to a significant release from masking only in the diotic conditions with the resolved and the narrowband unresolved maskers. A significant effect of mistuning was neither found in the diotic condition with the wideband unresolved masker nor in any of the dichotic conditions. An auditory model with a single analysis frequency band and different binaural processing schemes was employed to predict the data of the unresolved masker conditions. Sensitivity to modulation cues was achieved by including an auditory-motivated modulation filter in the processing pathway. The predictions of the diotic data were in line with the experimental results and literature data in the narrowband condition, but not in the broadband condition, suggesting that across-frequency processing is involved in processing modulation information. The experimental and model results in the dichotic conditions show that the binaural processor cannot exploit modulation information in binaurally unmasked conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
13. Precise motor mapping with transcranial magnetic stimulation
- Author
-
Weise, Konstantin, Numssen, Ole, Kalloch, Benjamin, Zier, Anna Leah, Thielscher, Axel, Haueisen, Jens, Hartwigsen, Gesa, and Knösche, Thomas R.
- Abstract
We describe a routine to precisely localize cortical muscle representations within the primary motor cortex with transcranial magnetic stimulation (TMS) based on the functional relation between induced electric fields at the cortical level and peripheral muscle activation (motor-evoked potentials; MEPs). Besides providing insights into structure–function relationships, this routine lays the foundation for TMS dosing metrics based on subject-specific cortical electric field thresholds. MEPs for different coil positions and orientations are combined with electric field modeling, exploiting the causal nature of neuronal activation to pinpoint the cortical origin of the MEPs. This involves constructing an individual head model using magnetic resonance imaging, recording MEPs via electromyography during TMS and computing the induced electric fields with numerical modeling. The cortical muscle representations are determined by relating the TMS-induced electric fields to the MEP amplitudes. Subsequently, the coil position to optimally stimulate the origin of the identified cortical MEP can be determined by numerical modeling. The protocol requires 2 h of manual preparation, 10 h for the automated head model construction, one TMS session lasting 2 h, 12 h of computational postprocessing and an optional second TMS session lasting 30 min. A basic level of computer science expertise and standard TMS neuronavigation equipment suffices to perform the protocol.
- Published
- 2022
14. Speech intelligibility prediction based on modulation frequency-selective processing
- Author
-
Relaño-Iborra, Helia and Dau, Torsten
- Abstract
Speech intelligibility models can provide insights regarding the auditory processes involved in human speech perception and communication. One successful approach to modelling speech intelligibility has been based on the analysis of the amplitude modulations present in speech as well as competing interferers. This review covers speech intelligibility models that include a modulation-frequency selective processing stage, i.e., a modulation filterbank, as part of their front end. The speech-based envelope power spectrum model [sEPSM, Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475-1487], several variants of the sEPSM including modifications with respect to temporal resolution, spectro-temporal processing and binaural processing, as well as the speech-based computational auditory signal processing and perception model [sCASP; Relaño-Iborra et al. (2019). J. Acoust. Soc. Am. 146(5), 3306-3317], which is based on an established auditory signal detection and masking model, are discussed. The key processing stages of these models for the prediction of speech intelligibility across a variety of acoustic conditions are addressed in relation to competing modeling approaches. The strengths and weaknesses of the modulation-based analysis are outlined and perspectives presented, particularly in connection with the challenge of predicting the consequences of individual hearing loss on speech intelligibility.
- Published
- 2022
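The core sEPSM decision variable named in the entry above, the envelope power signal-to-noise ratio (SNRenv), can be sketched compactly. The version below uses a single audio channel, a Hilbert envelope, and one Butterworth modulation band; the published model's gammatone front end, full modulation filterbank, and decision back end are omitted, and all signal parameters are invented.

```python
# Sketch of an SNRenv-style metric: normalized envelope power of noisy speech
# versus noise alone in one modulation band (2-8 Hz).
import numpy as np
from scipy.signal import hilbert, butter, sosfilt

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(3)

speechlike = (1 + np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(t.size)  # 4-Hz AM
noise = rng.standard_normal(t.size)

def env_power(x, lo=2.0, hi=8.0):
    env = np.abs(hilbert(x))                       # Hilbert envelope
    sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, env - env.mean())          # one modulation band
    return np.mean(band ** 2) / env.mean() ** 2    # normalized envelope power

p_sn, p_n = env_power(speechlike + noise), env_power(noise)
snr_env = 10 * np.log10(max(p_sn - p_n, 1e-6) / p_n)
print(f"SNRenv in the 2-8 Hz modulation band: {snr_env:.1f} dB")
```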
15. Synchrony-Based Feature Extraction for Robust Automatic Speech Recognition.
- Author
-
de-la-Calle-Silos, Fernando and Stern, Richard M.
- Subjects
AUTOMATIC speech recognition, FEATURE extraction, ECHO suppression - Abstract
This letter discusses the application of models of temporal patterns of auditory-nerve firings to enhance the robustness of automatic speech recognition systems. Most conventional feature extraction schemes (such as mel-frequency cepstral coefficients and perceptual linear prediction coefficients) are based on short-time energy in each frequency band, and the temporal patterns of auditory-nerve activity are discarded. We compare the impact on speech recognition accuracy of several types of feature extraction schemes based on the putative synchrony of auditory-nerve activity, including feature extraction based on a modified version of the generalized synchrony detector proposed by Seneff, and a modified version of the averaged localized synchrony response proposed by Young and Sachs. It was found that the use of features based on auditory-nerve synchrony can indeed improve speech recognition accuracy in the presence of additive noise, based on experiments using multiple standard speech databases. Recognition accuracy obtained using the synchrony-based features is further increased if some form of noise removal is applied to the signal before the synchrony measure is estimated. Signal processing for noise removal based on the noise suppression that is part of PNCC feature extraction is more effective toward this end than conventional spectral subtraction. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
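As a hedged stand-in for the synchrony measures discussed in the entry above (this is not Seneff's generalized synchrony detector, only a simplified analogue), one can score how strongly a band's rectified output repeats at the band's own characteristic period:

```python
# Sketch of a synchrony-style feature: normalized correlation of a half-wave
# rectified band output with itself one characteristic-frequency period later.
# Note that narrowband noise also shows some periodicity at its center frequency,
# so real synchrony detectors use more refined normalizations.
import numpy as np
from scipy.signal import butter, sosfilt

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 500 * t) + 0.3 * np.random.default_rng(4).standard_normal(t.size)

def synchrony(x, cf, fs, bw=100.0):
    sos = butter(2, [cf - bw / 2, cf + bw / 2], btype="bandpass", fs=fs, output="sos")
    y = np.maximum(sosfilt(sos, x), 0.0)       # half-wave rectified band output
    lag = int(round(fs / cf))                  # one CF period in samples
    a, b = y[:-lag], y[lag:]
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

for cf in (250, 500, 1000):
    print(f"synchrony at {cf:4d} Hz: {synchrony(x, cf, fs):.2f}")
```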
16. Speech intelligibility prediction based on modulation frequency-selective processing
- Author
-
Torsten Dau and Helia Relaño-Iborra
- Subjects
Auditory modeling, Acoustic Stimulation, Speech intelligibility, Speech Perception, Humans, Modulation processing, Auditory Threshold, Perceptual Masking, Speech Acoustics, Sensory Systems, Hearing impairment - Abstract
Speech intelligibility models can provide insights regarding the auditory processes involved in human speech perception and communication. One successful approach to modelling speech intelligibility has been based on the analysis of the amplitude modulations present in speech as well as competing interferers. This review covers speech intelligibility models that include a modulation-frequency selective processing stage, i.e., a modulation filterbank, as part of their front end. The speech-based envelope power spectrum model [sEPSM, Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475-1487], several variants of the sEPSM including modifications with respect to temporal resolution, spectro-temporal processing and binaural processing, as well as the speech-based computational auditory signal processing and perception model [sCASP; Relaño-Iborra et al. (2019). J. Acoust. Soc. Am. 146(5), 3306-3317], which is based on an established auditory signal detection and masking model, are discussed. The key processing stages of these models for the prediction of speech intelligibility across a variety of acoustic conditions are addressed in relation to competing modeling approaches. The strengths and weaknesses of the modulation-based analysis are outlined and perspectives presented, particularly in connection with the challenge of predicting the consequences of individual hearing loss on speech intelligibility.
- Published
- 2022
- Full Text
- View/download PDF
17. Sensitivity to Interaural Time Differences Conveyed in the Stimulus Envelope: Estimating Inputs of Binaural Neurons Through the Temporal Analysis of Spike Trains.
- Author
-
Dietz, Mathias, Wang, Le, Greenberg, David, and McAlpine, David
- Abstract
Sound-source localization in the horizontal plane relies on detecting small differences in the timing and level of the sound at the two ears, including differences in the timing of the modulated envelopes of high-frequency sounds (envelope interaural time differences (ITDs)). We investigated responses of single neurons in the inferior colliculus (IC) to a wide range of envelope ITDs and stimulus envelope shapes. By a novel means of visualizing neural activity relative to different portions of the periodic stimulus envelope at each ear, we demonstrate the role of neuron-specific excitatory and inhibitory inputs in creating ITD sensitivity (or the lack of it) depending on the specific shape of the stimulus envelope. The underlying binaural brain circuitry and synaptic parameters were modeled individually for each neuron to account for neuron-specific activity patterns. The model explains the effects of envelope shapes on sensitivity to envelope ITDs observed in both normal-hearing listeners and in neural data, and has consequences for understanding how ITD information in stimulus envelopes might be maximized in users of bilateral cochlear implants, for whom ITDs conveyed in the stimulus envelope are the only ITD cues available. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
18. A Model for Pitch Estimation Using Wavelet Packet Transform Based Cepstrum Method.
- Author
-
Muhaseena, T.K. and Lekshmi, M.S.
- Abstract
A computationally efficient model for pitch estimation of mixed audio signals is presented. Pitch estimation plays a significant role in music audition tasks such as music information retrieval, automatic music transcription, and melody extraction. The proposed system consists of channel separation and periodicity detection. The input signal is created by mixing two sound signals. The model first removes the short-time correlations of the mixed signal, then divides the signal into a number of channels using the wavelet packet transform, computes the cepstrum of each channel, and sums the cepstrum functions. The summary cepstrum function is further processed to extract the pitch frequencies of the two input signals separately. The model performance is demonstrated to be comparable to those of recent multichannel models. The proposed system is verified by simulation in MATLAB. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
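A bare-bones version of the summary-cepstrum step described in the entry above. Plain Butterworth band-pass channels stand in for the paper's wavelet packet tree, and a single harmonic tone replaces the two-source mixture; the band edges and search range are arbitrary choices.

```python
# Sketch: split the signal into channels, take each channel's real cepstrum,
# sum across channels, and read the pitch period from the summary peak.
import numpy as np
from scipy.signal import butter, sosfilt

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
f0 = 220.0
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))   # harmonic tone

def real_cepstrum(y):
    spec = np.abs(np.fft.rfft(y)) + 1e-12
    return np.fft.irfft(np.log(spec))

summary = np.zeros(t.size)
for lo, hi in [(100, 500), (500, 1000), (1000, 2000), (2000, 3500)]:
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    summary += real_cepstrum(sosfilt(sos, x))

lo_q, hi_q = int(fs / 500), int(fs / 80)        # search pitches from 80 to 500 Hz
q = lo_q + summary[lo_q:hi_q].argmax()          # winning quefrency (samples)
print(f"estimated pitch: {fs / q:.1f} Hz")      # ~220 Hz
```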
19. Source Separation with One Ear: Proposition for an Anthropomorphic Approach
- Author
-
Ramin Pichevar and Jean Rouat
- Subjects
auditory modeling, source separation, amplitude modulation, auditory scene analysis, spiking neurons, temporal correlation, Telecommunication, TK5101-6720, Electronics, TK7800-8360 - Abstract
We present an example of an anthropomorphic approach, in which auditory-based cues are combined with temporal correlation to implement a source separation system. The auditory features are based on spectral amplitude modulation and energy information obtained through 256 cochlear filters. Segmentation and binding of auditory objects are performed with a two-layered spiking neural network. The first layer performs the segmentation of the auditory images into objects, while the second layer binds the auditory objects belonging to the same source. The binding is further used to generate a mask (binary gain) to suppress the undesired sources from the original signal. Results are presented for a double-voiced (2 speakers) speech segment and for sentences corrupted with different noise sources. Comparative results are also given using PESQ (perceptual evaluation of speech quality) scores. The spiking neural network is fully adaptive and unsupervised.
- Published
- 2005
- Full Text
- View/download PDF
20. How much individualization is required to predict the individual effect of suprathreshold processing deficits? Assessing Plomp's distortion component with psychoacoustic detection thresholds and FADE.
- Author
-
Hülsmeier, David and Kollmeier, Birger
- Subjects
*SPEECH perception, *SENSORINEURAL hearing loss, *AUDIOGRAM, *HEARING disorders - Abstract
• Unique and comprehensive SRT data set with 40 listeners with various hearing losses.
• Compares machine-learning-based FADE with Plomp's empirical A & D model.
• Precise prediction of the SRT (RMS error of 3.3 dB) and of the A and D components.
• FADE is advantageous for consistent & precise modeling & individualization.
• For clinical purposes, a linear regression for the D component appears sufficient.
Plomp introduced an empirical separation of the increased speech recognition thresholds (SRTs) in listeners with a sensorineural hearing loss into an Attenuation (A) component (which can be compensated by amplification) and a non-compensable Distortion (D) component. Previous own research backed up this notion with speech recognition models that derive their SRT prediction from the individual audiogram, with or without a psychoacoustic measure of suprathreshold processing deficits. To determine the precision in separating the A and D components for the individual listener with various individual measures and individualized models, SRTs of 40 listeners with varied hearing impairment were obtained in quiet, stationary noise, and fluctuating noise (ICRA 5–250 and babble). Both the clinical audiogram and an adaptive, precise sweep audiogram were obtained, as well as tone-in-noise detection thresholds at four frequencies, to characterize the individual hearing impairment. For predicting the SRT, the FADE model (which is based on machine learning) was used with either of the two audiogram procedures and optionally the individual tone-in-noise detection thresholds. The results indicate that the precisely measured swept-tone audiogram allows for a more precise prediction of the individual SRT than the clinical audiogram (RMS error of 4.3 dB vs. 6.4 dB, respectively). While an estimation from the precise audiogram and FADE performed equally well in predicting the individual A and D components, the further refinement of including the tone-in-noise detection thresholds with FADE led to a slight improvement of prediction accuracy (RMS errors of 3.3 dB, 4.6 dB and 1.4 dB for the SRT, A and D components, respectively). Hence, applying FADE is advantageous for scientific purposes where a consistent modeling of different psychoacoustical effects in the same listener with a minimum amount of assumptions is desirable. For clinical purposes, however, a precisely measured audiogram and an estimation of the expected D component using a linear regression appear to be a satisfactory first step towards precision audiology. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
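Plomp's A/D bookkeeping referenced in the entry above amounts to simple arithmetic on SRTs measured in quiet and in noise. A sketch with invented numbers, under the usual reading that the SRT elevation in noise estimates D and the remaining elevation in quiet is A:

```python
# Sketch of Plomp's attenuation/distortion split. All values are made up.
srt_quiet_hi, srt_quiet_nh = 45.0, 20.0    # dB SPL, hearing-impaired vs. normal
srt_noise_hi, srt_noise_nh = -2.0, -7.0    # dB SNR, hearing-impaired vs. normal

d_component = srt_noise_hi - srt_noise_nh                   # non-compensable: 5 dB
a_component = (srt_quiet_hi - srt_quiet_nh) - d_component   # compensable: 20 dB
print(f"A = {a_component:.1f} dB, D = {d_component:.1f} dB")
```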
21. Investigating the role of harmonic cancellation in speech-on-speech masking.
- Author
-
Prud'homme, Luna, Lavandier, Mathieu, and Best, Virginia
- Subjects
*INTELLIGIBILITY of speech, *SPEECH, *AMPLITUDE modulation, *COCKTAIL parties - Abstract
• Speech intelligibility measured with maskers ranging from noise to speech.
• Behavioral data compared to predictions from speech intelligibility models.
• No strong evidence for harmonicity-based effects with speech maskers.
• Harmonic cancellation not a crucial component of speech intelligibility models.
This study investigated the role of harmonic cancellation in the intelligibility of speech in "cocktail party" situations. While there is evidence that harmonic cancellation plays a role in the segregation of simple harmonic sounds based on fundamental frequency (F0), its utility for mixtures of speech containing non-stationary F0s and unvoiced segments is unclear. Here we focused on the energetic masking of speech targets caused by competing speech maskers. Speech reception thresholds were measured using seven maskers: speech-shaped noise, monotonized and intonated harmonic complexes, monotonized speech, noise-vocoded speech, reversed speech and natural speech. These maskers enabled an estimate of how the masking potential of speech is influenced by harmonic structure, amplitude modulation and variations in F0 over time. Measured speech reception thresholds were compared to the predictions of two computational models, with and without a harmonic cancellation component. Overall, the results suggest a minor role of harmonic cancellation in reducing energetic masking in speech mixtures. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
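Harmonic cancellation, as examined in the entry above, is often modeled as a comb filter tuned to the masker's F0. A minimal single-F0 demonstration with synthetic signals (the study's models operate frame by frame on time-varying F0s, which this sketch does not attempt):

```python
# Sketch of harmonic cancellation: y[n] = x[n] - x[n - T0] nulls all components
# at multiples of F0 = fs/T0, suppressing a harmonic masker while keeping
# (and comb-filtering) an off-harmonic target.
import numpy as np

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
f0 = 100.0
masker = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 11))  # harmonic masker
target = np.sin(2 * np.pi * 550 * t)                                # off-harmonic target

T0 = int(round(fs / f0))
comb = lambda s: (s - np.concatenate([np.zeros(T0), s[:-T0]]))[T0:]  # skip transient

rms = lambda s: np.sqrt(np.mean(s ** 2))
print(f"masker RMS: {rms(masker):.2f} -> {rms(comb(masker)):.2e}")   # cancelled
print(f"target RMS: {rms(target):.2f} -> {rms(comb(target)):.2f}")   # survives
```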
22. Predicting speech intelligibility in hearing-impaired listeners using a physiologically inspired auditory model.
- Author
-
Zaar, Johannes and Carney, Laurel H.
- Subjects
*INTELLIGIBILITY of speech, *AUDITORY perception, *SPEECH, *HEARING disorders, *PREDICTION models - Abstract
• Across-frequency fluctuation profiles predict speech intelligibility in noise.
• Model predicts effects of level and different noise types, incl. masking release.
• Model predicts effects of individual hearing loss.
• Plausible effects of inner- and outer-hair-cell impairment.
This study presents a major update and full evaluation of a speech intelligibility (SI) prediction model previously introduced by Scheidiger, Carney, Dau, and Zaar [(2018), Acta Acust. United Ac. 104, 914-917]. The model predicts SI in speech-in-noise conditions via comparison of the noisy speech and the noise-alone reference. The two signals are processed through a physiologically inspired nonlinear model of the auditory periphery, for a range of characteristic frequencies (CFs), followed by a modulation analysis in the range of the fundamental frequency of speech. The decision metric of the model is the mean of a series of short-term, across-CF correlations between population responses to noisy speech and noise alone, with a sensitivity-limitation process imposed. The decision metric is assumed to be inversely related to SI and is converted to a percent-correct score using a single data-based fitting function. The model performance was evaluated in conditions of stationary, fluctuating, and speech-like interferers using sentence-based speech-reception thresholds (SRTs) previously obtained in 5 normal-hearing (NH) and 13 hearing-impaired (HI) listeners. For the NH listener group, the model accurately predicted SRTs across the different acoustic conditions (apart from a slight overestimation of the masking release observed for fluctuating maskers), as well as plausible effects in response to changes in presentation level. For HI listeners, the model was adjusted to account for the individual audiograms using standard assumptions concerning the amount of HI attributed to inner-hair-cell (IHC) and outer-hair-cell (OHC) impairment. HI model results accounted remarkably well for elevated individual SRTs and reduced masking release. Furthermore, plausible predictions of worsened SI were obtained when the relative contribution of IHC impairment to HI was increased. Overall, the present model provides a useful tool to accurately predict speech-in-noise outcomes in NH and HI listeners, and may yield important insights into auditory processes that are crucial for speech understanding. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
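The decision metric described in the entry above, short-term across-CF correlations between population responses to noisy speech and to noise alone, can be sketched with random arrays standing in for the model's neural outputs; the window length and array sizes here are arbitrary, and the sensitivity-limitation stage is omitted.

```python
# Sketch: mean of short-term correlations between two (CF channels x time frames)
# response matrices; lower correlation -> more audible speech information.
import numpy as np

rng = np.random.default_rng(5)
n_cf, n_frames, win = 30, 400, 20
noise_resp = rng.standard_normal((n_cf, n_frames))                      # noise alone
speech_resp = noise_resp + 0.8 * rng.standard_normal((n_cf, n_frames))  # noisy speech

def short_term_metric(a, b, win):
    cors = []
    for start in range(0, a.shape[1] - win + 1, win):
        x = a[:, start:start + win].ravel()
        y = b[:, start:start + win].ravel()
        cors.append(np.corrcoef(x, y)[0, 1])
    return float(np.mean(cors))

metric = short_term_metric(speech_resp, noise_resp, win)
print(f"mean short-term correlation: {metric:.2f}  (lower -> higher predicted SI)")
```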
23. A dynamic binaural harmonic-cancellation model to predict speech intelligibility against a harmonic masker varying in intonation, temporal envelope, and location.
- Author
-
Prud'homme, Luna, Lavandier, Mathieu, and Best, Virginia
- Subjects
*INTELLIGIBILITY of speech, *INTONATION (Phonetics), *AMPLITUDE modulation, *PREDICTION models - Abstract
• A binaural non-stationary harmonic-cancellation speech intelligibility model.
• Predictions of SRTs for speech masked by a harmonic complex.
• Accounts for spatial separation, intonation and amplitude modulation of the masker.
• Harmonic cancellation and binaural unmasking might be mutually exclusive mechanisms.
The aim of this study was to extend the harmonic-cancellation model proposed by Prud'homme et al. [J. Acoust. Soc. Am. 148 (2020) 3246–3254] to predict speech intelligibility against a harmonic masker, so that it takes into account binaural hearing, amplitude modulations in the masker and variations in masker fundamental frequency (F0) over time. This was done by segmenting the masker signal into time frames and combining the previous long-term harmonic-cancellation model with the binaural model proposed by Vicente and Lavandier [Hear. Res. 390 (2020) 107937]. The new model was tested on the data from two experiments involving harmonic complex maskers that varied in spatial location, temporal envelope and F0 contour. The interactions between the associated effects were accounted for in the model by varying the time frame duration and excluding the binaural unmasking computation when harmonic cancellation is active. Across both experiments, the correlation between data and model predictions was over 0.96, and the mean and largest absolute prediction errors were lower than 0.6 and 1.5 dB, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
24. 2D Psychoacoustic modeling of equivalent masking for automatic speech recognition.
- Author
-
Dai, Peng, Rudzicz, Frank, Soon, Ing Yann, Mihailidis, Alex, and Ding, Huijun
- Subjects
*AUTOMATIC speech recognition, *PSYCHOACOUSTICS, *TWO-dimensional models, *NOISE, *SOUND pressure, *MATHEMATICAL models - Abstract
Noise robustness has long been one of the most important goals in speech recognition. While the performance of automatic speech recognition (ASR) deteriorates in noisy situations, the human auditory system is relatively adept at handling noise. To mimic this adeptness, we study and apply psychoacoustic models in speech recognition as a means to improve the robustness of ASR systems. Psychoacoustic models are usually implemented in a subtractive manner with the intention of removing noise. However, this is not the only possible approach to the challenge. This paper presents a novel algorithm which implements psychoacoustic models additively. The algorithm is motivated by the fact that weak sound elements below the masking threshold are equivalent to the human auditory system, regardless of their actual sound pressure level. Another important contribution of the proposed algorithm is a superior implementation of the masking effect: only those sounds that fall below the masking threshold are modified, which better reflects physical masking effects. We give detailed experimental results showing the relationships between the subtractive and additive approaches. Since all the parameters of the proposed filters are positive or zero, they are named 2D psychoacoustic P-filters. Detailed theoretical analysis is provided to show the noise removal ability of these filters. Experiments are carried out on the AURORA2 database. Experimental results show that the word recognition rate using our proposed feature extraction method is effectively increased. Given models trained with clean speech, our proposed method achieves up to 84.23% word recognition on noisy data. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
25. Analysis of Machine Learning Methods for Classifying Mosquito Species and Sex from Their Sounds
- Author
-
Mahmoud Siai, Tim Ziemer, and Udo Frese
- Subjects
tropical diseases, auditory modeling, malaria, dengue fever, mosquito, yellow fever - Abstract
Bachelor's thesis submitted for the academic degree of Bachelor of Science (B.Sc.)
- Published
- 2021
- Full Text
- View/download PDF
26. Gammatone wavelet Cepstral Coefficients for robust speech recognition.
- Author
-
Adiga, Aniruddha, Magimai, Mathew, and Seelamantula, Chandra Sekhar
- Abstract
We develop noise-robust features using Gammatone wavelets derived from the popular Gammatone functions. These wavelets incorporate the characteristics of human peripheral auditory systems, in particular the spatially varying frequency response of the basilar membrane. We refer to the new features as Gammatone Wavelet Cepstral Coefficients (GWCC). The procedure involved in extracting GWCC from a speech signal is similar to that of the conventional Mel-Frequency Cepstral Coefficients (MFCC) technique, the difference being the type of filterbank used. We replace the conventional mel filterbank in MFCC with a Gammatone wavelet filterbank, which we construct using Gammatone wavelets. We also explore the effect of Gammatone-filterbank-based features (Gammatone Cepstral Coefficients (GCC)) on robust speech recognition. On the AURORA 2 database, a comparison of GWCCs and GCCs with MFCCs shows that Gammatone-based features yield a better recognition performance at low SNRs. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
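A simplified gammatone-cepstral pipeline in the spirit of the GCC features above: gammatone channel energies replace the mel filterbank, followed by log compression and a DCT. The filter parameters and framing below are textbook defaults rather than the paper's, and the Gammatone-wavelet variant (GWCC) is not reproduced.

```python
# Sketch: gammatone filterbank -> frame energies -> log -> DCT (GCC-style features).
import numpy as np
from scipy.fft import dct
from scipy.signal import fftconvolve

fs = 16000

def gammatone_ir(cf, fs, dur=0.032, order=4, b=1.019):
    """4th-order gammatone impulse response with ERB-scaled bandwidth."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * cf / 1000 + 1)
    return t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * cf * t)

def gcc(x, fs, cfs, n_ceps=13, frame=400, hop=160):   # 25-ms frames, 10-ms hop
    outs = [fftconvolve(x, gammatone_ir(cf, fs))[: x.size] for cf in cfs]
    n_frames = 1 + (x.size - frame) // hop
    feats = np.empty((n_frames, len(cfs)))
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame)
        feats[i] = [np.log(np.mean(ch[seg] ** 2) + 1e-10) for ch in outs]
    return dct(feats, type=2, norm="ortho", axis=1)[:, :n_ceps]

x = np.random.default_rng(6).standard_normal(fs)      # 1 s of noise as dummy input
cfs = np.geomspace(100, 6000, 26)                      # 26 channels, log-spaced
print(gcc(x, fs, cfs).shape)                           # (n_frames, 13)
```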
27. Towards personalized auditory models : predicting individual sensorineural hearing-loss profiles from recorded human auditory physiology
- Author
-
Markus Garrett, Sarineh Keshishzadeh, and Sarah Verhulst
- Subjects
Technology and Engineering, Speech and Hearing, Otorhinolaryngology, Audiology, Hearing Loss, Sensorineural, cochlear synaptopathy, auditory-evoked potentials, envelope following response, electrophysiology, Auditory Threshold, Cochlea, Evoked Potentials, Auditory, Brain Stem, Humans, individualized hearing-loss profile, auditory modeling, Auditory Physiology, ISAAR 2019 Special Collection: Original Article - Abstract
Over the past decades, different types of auditory models have been developed to study the functioning of normal and impaired auditory processing. Several models can simulate frequency-dependent sensorineural hearing loss (SNHL) and can in this way be used to develop personalized audio-signal processing for hearing aids. However, to determine individualized SNHL profiles, we rely on indirect and noninvasive markers of cochlear and auditory-nerve (AN) damage. Our progressive knowledge of the functional aspects of different SNHL subtypes stresses the importance of incorporating them into the simulated SNHL profile, but has at the same time complicated the task of accomplishing this on the basis of noninvasive markers. In particular, different auditory-evoked potential (AEP) types can show a different sensitivity to outer-hair-cell (OHC), inner-hair-cell (IHC), or AN damage, but it is not clear which AEP-derived metric is best suited to develop personalized auditory models. This study investigates how simulated and recorded AEPs can be used to derive individual AN- or OHC-damage patterns and personalize auditory processing models. First, we individualized the cochlear model parameters using common methods of frequency-specific OHC-damage quantification, after which we simulated AEPs for different degrees of AN damage. Using a classification technique, we determined the recorded AEP metric that best predicted the simulated individualized cochlear synaptopathy profiles. We cross-validated our method using the data set at hand, but also applied the trained classifier to recorded AEPs from a new cohort to illustrate the generalizability of the method.
- Published
- 2021
28. Duifhuis pitch: neuromagnetic representation and auditory modeling.
- Author
-
Andermann, Martin, Patterson, Roy D., Geldhauser, Michael, Sieroka, Norman, and Rupp, André
- Subjects
*AUDITORY cortex physiology, *MAGNETOENCEPHALOGRAPHY, *ACOUSTIC nerve, *NEURAL physiology, *SENSORY perception - Abstract
When a high harmonic is removed from a cosine-phase harmonic complex, we hear a sine tone pop out of the perception; the sine tone has the pitch of the high harmonic, while the tone complex has the pitch of its fundamental frequency, f0. This phenomenon is commonly referred to as Duifhuis Pitch (DP). This paper describes, for the first time, the cortical representation of DP observed with magnetoencephalography. In experiment 1, conditions that produce the perception of a DP were observed to elicit a classic onset response in auditory cortex (P1m, N1m, P2m), and an increment in the sustained field (SF) established in response to the tone complex. Experiment 2 examined the effect of the phase spectrum of the complex tone on the DP activity: Schroeder-phase negative waves elicited a transient DP complex with a similar shape to that observed with cosine-phase waves but with much longer latencies. Following the transient DP activity, the responses of the negative and positive Schroeder-phase waves converged, and the increment in the SF slowly died away. In the absence of DP, the two Schroeder-phase conditions with low peak factors both produced larger SFs than cosine-phase waves with large peak factors. A model of the auditory periphery that includes coupling between adjacent frequency channels is used to explain the early neuromagnetic activity observed in auditory cortex. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
29. Applying Biophysical Auditory Periphery Models for Real-time Applications and Studies of Hearing Impairment
- Author
-
Van Den Broucke, Arthur, Drakopoulos, Fotios, Baby, Deepak, and Verhulst, Sarah (Ghent University, Belgium)
- Subjects
Real-time models, Machine Learning, Acoustics, Vibrations, Technology and Engineering, Medicine and Health Sciences, otorhinolaryngologic diseases, Hearing loss, Auditory Modeling - Abstract
Biophysically realistic models of the cochlea are based on cascaded transmission-line (TL) models, which capture longitudinal coupling, cochlear nonlinearities, and human frequency selectivity. However, these models are slow to compute (on the order of seconds to minutes), explaining why less accurate descriptions of cochlear processing (e.g., gammatone, DRNL, MFCC) are still the standard for feature extractors or auditory front-ends. To overcome this gap, we present a hybrid approach in which convolutional neural network (CNN) techniques are combined with computational modelling to yield a real-time model of the human auditory periphery. A CNN was trained on speech corpus material to mimic a state-of-the-art biophysical model that can accurately represent the human cochlea and the ascending auditory pathway. The performance was compared against human data and simulations of the original model using basic stimuli (pure tones, clicks, etc.). Because the original peripheral model can simulate different degrees of sensorineural hearing loss, the normal-hearing CNN model can be adjusted in the same fashion to simulate hearing impairment. The neural-network character of these architectures allows for real-time, parallel and differentiable computations, which can serve in the next generation of hearing-aid and machine-hearing applications. Work supported by European Research Council grant ERC-StG-678120 (RobSpear).
- Published
- 2020
- Full Text
- View/download PDF
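The hybrid CNN-approximation strategy above is essentially model distillation: train a network on input/output pairs generated by the slow reference model. A toy sketch with a placeholder "teacher" follows; in reality the teacher would be the transmission-line cochlear model, and the CNN would be far larger.

```python
# Sketch: distill a (placeholder) slow reference model into a small 1-D CNN.
import torch
import torch.nn as nn

def slow_reference_model(audio):          # stand-in for the TL cochlear model
    return torch.tanh(3 * audio)          # arbitrary compressive nonlinearity

net = nn.Sequential(
    nn.Conv1d(1, 16, 9, padding=4), nn.Tanh(),
    nn.Conv1d(16, 1, 9, padding=4),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    audio = torch.randn(8, 1, 512)        # batch of waveform snippets
    target = slow_reference_model(audio)  # "expensive" teacher output
    loss = nn.functional.mse_loss(net(audio), target)
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```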
30. An improved model of masking effects for robust speech recognition system
- Author
-
Dai, Peng and Soon, Ing Yann
- Subjects
*ROBUST control, *MASKING (Psychology), *AUTOMATIC speech recognition, *AUDITORY pathways, *SIMULATION methods & models, *FEATURE extraction, *HIDDEN Markov models, *SIGNAL processing - Abstract
Abstract: The performance of an automatic speech recognition system drops dramatically in the presence of background noise, unlike the human auditory system, which is far more adept at noisy speech recognition. This paper proposes a novel auditory modeling algorithm which is integrated into the feature extraction front-end for Hidden Markov Models (HMMs). The proposed algorithm, named LTFC, simulates properties of the human auditory system and applies them to the speech recognition system to enhance its robustness. It integrates simultaneous masking, temporal masking, and cepstral mean and variance normalization into the ordinary mel-frequency cepstral coefficient (MFCC) feature extraction algorithm for robust speech recognition. The proposed method sharpens the power spectrum of the signal in both the frequency domain and the time domain. Evaluation tests are carried out on the AURORA2 database. Experimental results show that the word recognition rate using our proposed feature extraction method is effectively increased. [Copyright © Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
31. Auditory model based direction estimation of concurrent speakers from binaural signals
- Author
-
Dietz, Mathias, Ewert, Stephan D., and Hohmann, Volker
- Subjects
*MATHEMATICAL models, *FEATURE extraction, *LECTURERS, *ROBUST control, *ESTIMATION theory, *ACOUSTIC localization, *MATHEMATICAL functions, *TIME delay systems - Abstract
Abstract: Humans show a very robust ability to localize sounds in adverse conditions. Computational models of binaural sound localization and technical approaches to direction-of-arrival (DOA) estimation also show good performance; however, both their binaural feature extraction and their strategies for further analysis partly differ from what is currently known about the human auditory system. This study investigates auditory-model-based DOA estimation emphasizing known features and limitations of auditory binaural processing, such as (i) high temporal resolution, (ii) a restricted frequency range for exploiting temporal fine-structure, (iii) use of temporal envelope disparities, and (iv) a limited range to compensate for interaural time delay. DOA estimation performance was investigated for up to five concurrent speakers in free field and for up to three speakers in the presence of noise. The DOA errors in these conditions were always smaller than 5°. A condition with moving speakers was also tested, and up to three moving speakers could be tracked simultaneously. Analysis of DOA performance as a function of the binaural temporal resolution showed that the short time constants of about 5 ms employed by the auditory model were crucial for robustness against concurrent sources. [Copyright © Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
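A stripped-down version of ITD-based DOA estimation as discussed in the entry above: frame-wise cross-correlation between the ear signals, best-lag ITD, and the far-field sine relation ITD = (d/c)·sin(azimuth). The full model's gammatone front end and fine-structure/envelope split are omitted, and the head geometry is an assumed value.

```python
# Sketch: per-frame cross-correlation DOA estimate for one broadband source.
import numpy as np

fs, d_ears, c = 44100, 0.18, 343.0
rng = np.random.default_rng(7)
azim_true = np.deg2rad(30.0)
shift = int(round(d_ears / c * np.sin(azim_true) * fs))   # ITD in samples (~12)

src = rng.standard_normal(fs)                              # 1 s broadband source
left, right = src, np.concatenate([np.zeros(shift), src[:-shift]])

frame, max_lag = 220, 40                                   # ~5-ms frames
estimates = []
for s in range(0, fs - frame, frame):
    xl, xr = left[s:s + frame], right[s:s + frame]
    lags = range(-max_lag, max_lag + 1)
    cc = [np.dot(xl[max(0, -k):frame - max(0, k)],
                 xr[max(0, k):frame - max(0, -k)]) for k in lags]
    best = list(lags)[int(np.argmax(cc))]                  # best lag = ITD estimate
    estimates.append(np.rad2deg(np.arcsin(np.clip(best / fs * c / d_ears, -1, 1))))
print(f"median DOA estimate: {np.median(estimates):.1f} deg (true: 30 deg)")
```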
32. Binaural pitch perception in normal-hearing and hearing-impaired listeners
- Author
-
Santurette, Sébastien and Dau, Torsten
- Subjects
*HEARING disorders, *EAR diseases, *AUDITORY perception, *HEARING - Abstract
Abstract: The effects of hearing impairment on the perception of binaural-pitch stimuli were investigated. Several experiments were performed with normal-hearing and hearing-impaired listeners, including detection and discrimination of binaural pitch, and melody recognition using different types of binaural pitches. For the normal-hearing listeners, all types of binaural pitches could be perceived immediately and were musical. The hearing-impaired listeners could be divided into three groups based on their results: (a) some perceived all types of binaural pitches, but with decreased salience or musicality compared to normal-hearing listeners; (b) some could only perceive the strongest pitch types; (c) some were unable to perceive any binaural pitch at all. The performance of the listeners was not correlated with audibility. Additional experiments investigated the correlation between performance in binaural-pitch perception and performance in measures of spectral and temporal resolution. Reduced frequency discrimination appeared to be linked to poorer melody recognition skills. Reduced frequency selectivity was also found to impede the perception of binaural-pitch stimuli. Overall, binaural-pitch stimuli might be very useful tools within clinical diagnostics for detecting specific deficiencies in the auditory system. [Copyright © Elsevier]
- Published
- 2007
- Full Text
- View/download PDF
33. Human auditory steady-state responses to changes in interaural correlation
- Author
-
Dajani, Hilmi R. and Picton, Terence W.
- Subjects
- *
BINAURAL hearing aids , *HEARING , *HEARING impaired , *DIAGNOSTIC imaging - Abstract
Abstract: Steady-state responses were evoked by noise stimuli that alternated between two levels of interaural correlation ρ at a frequency f_m. With ρ alternating between +1 and 0, responses at f_m dropped steeply above 4 Hz, but persisted up to 64 Hz. Two time constants of 47 and 4.4 ms, with delays of 198 and 36 ms, respectively, were obtained by fitting responses to a transfer function based on symmetric exponential windows. The longer time constant, possibly reflecting cortical integration, is consistent with perceptual binaural "sluggishness". The shorter time constant may reflect running cross-correlation in the high brainstem or primary auditory cortex. Responses at 2f_m peaked with an amplitude of 848 ± 479 nV (f_m = 4 Hz). Investigation of this robust response revealed that: (1) changes in ρ and lateralization evoked similar responses, suggesting a common neural origin, (2) the response was most dependent on stimulus frequencies below 1000 Hz, but frequencies up to 4000 Hz also contributed, and (3) when ρ alternated between 0.2–1 and 0, response amplitude varied linearly with ρ, and the physiological response threshold was close to the average behavioral threshold (ρ = 0.31). This steady-state response may prove useful in the objective investigation of binaural hearing. [Copyright Elsevier]
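The stimulus construction is easy to reproduce: a pair of noises with a target interaural correlation ρ can be generated with the standard common-plus-independent mixing trick. A minimal sketch (segments are concatenated abruptly here, and the function names and sampling parameters are our own):

```python
import numpy as np

def correlated_noise_pair(n, rho, rng):
    """Return (left, right) Gaussian noises with interaural correlation rho,
    via the standard mixing trick: right = rho*common + sqrt(1-rho^2)*indep."""
    common = rng.standard_normal(n)
    indep = rng.standard_normal(n)
    return common, rho * common + np.sqrt(1.0 - rho**2) * indep

def alternating_correlation_stimulus(fs=32000, fm=4.0, n_cycles=8,
                                     rho_hi=1.0, rho_lo=0.0, seed=1):
    """Noise that alternates between two interaural correlations at rate fm,
    as in the steady-state-response paradigm above (sketch; no smoothing
    at the segment boundaries)."""
    rng = np.random.default_rng(seed)
    half = int(fs / (2 * fm))              # samples per half-cycle
    segs_l, segs_r = [], []
    for k in range(2 * n_cycles):
        rho = rho_hi if k % 2 == 0 else rho_lo
        l, r = correlated_noise_pair(half, rho, rng)
        segs_l.append(l)
        segs_r.append(r)
    return np.concatenate(segs_l), np.concatenate(segs_r)
```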
- Published
- 2006
- Full Text
- View/download PDF
34. Localization Uncertainty In Time-Amplitude Stereophonic Reproduction
- Author
-
Toon van Waterschoot, Marc Moonen, Enzo De Sena, Huseyin Hacihabiboglu, and Zoran Cvetkovic
- Subjects
Stereophony ,Acoustics and Ultrasonics ,Computer science ,recording and reproduction ,01 natural sciences ,law.invention ,030507 speech-language pathology & audiology ,03 medical and health sciences ,symbols.namesake ,Position (vector) ,law ,Audio and Speech Processing (eess.AS) ,0103 physical sciences ,Computer Science (miscellaneous) ,FOS: Electrical engineering, electronic engineering, information engineering ,Active listening ,Electrical and Electronic Engineering ,010301 acoustics ,Sweet spot ,Pearson product-moment correlation coefficient ,Computational Mathematics ,Stereophonic sound ,Amplitude ,auditory modeling ,symbols ,localization uncertainty ,panning ,0305 other medical science ,Algorithm ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, defined as how difficult it is for a listener to tell where a sound source is located. To this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues and compares them to those associated with free-field point-like sources. The comparison is carried out using a particular distance functional that replicates the increased uncertainty observed experimentally with inconsistent inter-aural time and level difference cues. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups with a listener in central and off-central positions. Results show that amplitude methods achieve a slightly lower localization uncertainty for a listener positioned exactly in the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving a lower localization uncertainty.
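The two reproduction parameters under study, inter-channel time and level differences, can be applied to a mono source as follows. This is only a hypothetical rendering helper for experimenting with time-amplitude panning, not the paper's uncertainty model:

```python
import numpy as np

def pan_time_amplitude(mono, fs, icld_db=0.0, ictd_ms=0.0):
    """Render a mono source to two loudspeaker feeds using an inter-channel
    level difference (ICLD, dB) and time difference (ICTD, ms), the two
    panning parameters whose perceptual trade-off the model above evaluates.
    Positive values favor the left channel. (Illustrative sketch.)"""
    delay = int(round(abs(ictd_ms) * 1e-3 * fs))
    g = 10.0 ** (icld_db / 20.0)
    g_l, g_r = (g, 1.0) if icld_db >= 0 else (1.0, 1.0 / g)
    norm = np.sqrt(g_l**2 + g_r**2)          # keep overall power constant
    g_l, g_r = g_l / norm, g_r / norm
    # delay the lagging channel; the leading channel gets no padding
    left = np.concatenate([np.zeros(delay if ictd_ms < 0 else 0), mono])
    right = np.concatenate([np.zeros(delay if ictd_ms >= 0 else 0), mono])
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return g_l * left, g_r * right
```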
- Published
- 2020
- Full Text
- View/download PDF
35. A Model for Pitch Estimation Using Wavelet Packet Transform Based Cepstrum Method
- Author
-
T.K. Muhaseena and M.S. Lekshmi
- Subjects
cepstrum ,wavelet packet transform ,Computer science ,Speech recognition ,02 engineering and technology ,01 natural sciences ,Signal ,Wavelet packet decomposition ,0203 mechanical engineering ,0103 physical sciences ,Cepstrum ,Music information retrieval ,010301 acoustics ,periodicity detection ,General Environmental Science ,Audio signal ,business.industry ,Transcription (music) ,Pattern recognition ,ComputingMethodologies_PATTERNRECOGNITION ,020303 mechanical engineering & transports ,auditory modeling ,General Earth and Planetary Sciences ,Multipitch analysis ,Artificial intelligence ,Mel-frequency cepstrum ,business ,Communication channel - Abstract
A computationally efficient model for pitch estimation of mixed audio signals is presented. Pitch estimation plays a significant role in music audition tasks such as music information retrieval, automatic music transcription, and melody extraction. The proposed system consists of channel separation and periodicity detection. The input signal is created by mixing two sound signals. The model first removes the short-time correlations of the mixed signal, then divides it into a number of channels using the wavelet packet transform. It computes the cepstrum of each channel and sums the cepstral functions; the summary cepstrum is further processed to extract the pitch frequencies of the two input signals separately. The model's performance is demonstrated to be comparable to that of recent multichannel models. The system was verified by simulation in MATLAB.
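The periodicity-detection half of such a system rests on the classic cepstrum pitch estimator: the real cepstrum of a voiced frame peaks at the quefrency equal to the pitch period. A single-channel sketch (the paper sums such cepstra across wavelet-packet channels):

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Classic cepstrum pitch estimator: FFT -> log magnitude -> inverse FFT,
    then pick the peak quefrency within the plausible pitch-period range."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    q_min, q_max = int(fs / fmax), int(fs / fmin)
    peak = q_min + np.argmax(cepstrum[q_min:q_max])
    return fs / peak

# Example on a synthetic 220 Hz pulse-like tone
fs = 16000
t = np.arange(0, 0.04, 1 / fs)
frame = np.sign(np.sin(2 * np.pi * 220 * t)) + 0.6 * np.sin(2 * np.pi * 440 * t)
print(round(cepstral_pitch(frame, fs), 1))  # close to 220.0
```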
- Published
- 2016
- Full Text
- View/download PDF
36. Biologically inspired assessment of noticeability of sound events in context
- Author
-
Filipan, Karlo, De Coensel, Bert, Verhulst, Sarah, and Botteldooren, Dick
- Subjects
Technology and Engineering ,Sound events ,noticeability ,auditory modeling - Abstract
Annoyance caused by environmental noise intruding into the private dwelling, and the perception of the sonic environment in public spaces, share a critical dependence on the detection of salient sound events. During everyday activities, the probability of noticing a sound in a complex sonic environment is proportional to how much that sound stands out from its context. Based on a thorough review of human auditory processing, scene analysis, and attention, we propose a computational model that identifies salient sounds in a complex environment. The tonotopic model possesses a unique capability to trace amplitude modulations and phase sweeps, features to which the human auditory system is highly sensitive. The model is validated by exploring its response to sound environments with known annoying characteristics, such as short rise times and impulsive content, and the results are contrasted against Zwicker loudness.
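Tracing amplitude modulations, one of the features the model is built around, can be illustrated with a simple modulation-spectrum analysis: extract the temporal envelope and inspect its spectrum. A generic sketch, not the tonotopic model itself:

```python
import numpy as np
from scipy.signal import hilbert

def modulation_spectrum(x, fs):
    """Amplitude-modulation analysis of the kind the saliency model relies on:
    extract the temporal envelope with the Hilbert transform, remove the DC
    component, and look at the spectrum of the envelope itself."""
    envelope = np.abs(hilbert(x))
    envelope -= envelope.mean()
    spec = np.abs(np.fft.rfft(envelope * np.hanning(len(envelope))))
    mod_freqs = np.fft.rfftfreq(len(envelope), 1.0 / fs)
    return mod_freqs, spec

# A 1 kHz carrier, 80% amplitude-modulated at 4 Hz, shows a 4 Hz envelope line
fs = 16000
t = np.arange(0, 2.0, 1 / fs)
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
f, s = modulation_spectrum(x, fs)
print(f[np.argmax(s[1:]) + 1])  # ~4.0 Hz
```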
- Published
- 2018
37. Enhancing the sensitivity of the envelope-following response for cochlear synaptopathy screening in humans: The role of stimulus envelope.
- Author
-
Vasilkov, Viacheslav, Garrett, Markus, Mauermann, Manfred, and Verhulst, Sarah
- Subjects
- *
TEMPORAL bone , *AUDITORY evoked response , *HUMAN beings - Abstract
• Modifying the stimulus envelope to evoke more synchronous auditory-nerve firing enhances the EFR. • Improved EFR analysis methods include multiple harmonics and correct for the individual noise floor. • EFRs to stimuli with rectangular envelopes (duty cycle 20–25%) are optimal for synaptopathy diagnosis. • Older NH and HI listeners had reduced RAM-EFRs, suggesting age-related synaptopathy. • Sensitive diagnostic markers of synaptopathy are crucial when studying its perceptual consequences. Auditory de-afferentation, a permanent reduction in the number of inner hair cells and auditory-nerve synapses due to cochlear damage or synaptopathy, can reliably be quantified using temporal bone histology and immunostaining. However, there is an urgent need for non-invasive markers of synaptopathy to study its perceptual consequences in living humans and to develop effective therapeutic interventions. While animal studies have identified candidate auditory-evoked-potential (AEP) markers for synaptopathy, their interpretation in humans has suffered from translational issues related to neural generator differences, unknown hearing-damage histopathologies, or lack of measurement sensitivity. To render AEP-based markers of synaptopathy more sensitive, and differential to the synaptopathy aspect of sensorineural hearing loss, we followed a combined computational and experimental approach. Starting from the known characteristics of auditory-nerve physiology, we optimized the stimulus envelope to stimulate the available auditory-nerve population optimally and synchronously, generating strong envelope-following responses (EFRs). We further used model simulations to explore which stimuli evoked a response that was sensitive to synaptopathy while being maximally insensitive to possible co-existing outer-hair-cell pathologies. We compared the model-predicted trends to AEPs recorded in younger and older listeners (N=44, 24f) who had normal or impaired audiograms, with suspected age-related synaptopathy in the older cohort. We conclude that optimal stimulation paradigms for EFR-based quantification of synaptopathy should have sharply rising envelope shapes, a minimal plateau duration of 1.7–2.1 ms for a 120-Hz modulation rate, and inter-peak intervals which contain near-zero amplitudes. From our recordings, the optimal EFR-evoking stimulus had a rectangular envelope shape with a 25% duty cycle and a 95% modulation depth. Older listeners with normal or impaired audiometric thresholds showed significantly reduced EFRs, consistent with how (age-induced) synaptopathy affected these responses in the model. [ABSTRACT FROM AUTHOR]
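The optimal stimulus the study identifies (rectangular envelope, 25% duty cycle, 95% modulation depth, 120 Hz modulation rate) is straightforward to synthesize. A minimal sketch; the carrier frequency and duration below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def ram_tone(fc=4000.0, fm=120.0, duty=0.25, mod_depth=0.95,
             fs=48000, dur=0.4):
    """Rectangularly amplitude-modulated (RAM) tone with the parameters the
    study found optimal: 25% duty cycle, 95% modulation depth, 120 Hz
    modulation rate. Carrier frequency is an assumption for this sketch."""
    t = np.arange(int(fs * dur)) / fs
    phase = (t * fm) % 1.0                       # position within each cycle
    envelope = np.where(phase < duty, 1.0, 1.0 - mod_depth)
    return envelope * np.sin(2 * np.pi * fc * t)
```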
- Published
- 2021
- Full Text
- View/download PDF
38. Further validation of a binaural model predicting speech intelligibility against envelope-modulated noises.
- Author
-
Vicente, Thibault and Lavandier, Mathieu
- Subjects
- *
INTELLIGIBILITY of speech , *MODEL validation , *FORECASTING , *NOISE , *PREDICTION models - Abstract
Collin and Lavandier [J. Acoust. Soc. Am. 134, 1146–1159 (2013)] proposed a binaural model predicting speech intelligibility against envelope-modulated noises, evaluated in 24 acoustic conditions involving similar masker types. The aim of the present study was to test the model's robustness by modeling 80 additional conditions, and to evaluate the influence of its parameters using an approach inspired by variance-based sensitivity analysis. First, the data from four experiments from the literature and one designed specifically for the present study were used to evaluate the prediction performance of the model, investigate potential interactions between its parameters, and define the parameter values leading to the best predictions. A revision of the model made it possible to account for binaural sluggishness. Finally, the optimized model was tested on an additional dataset not used to define its parameters. Overall, one hundred conditions, split into six experiments, were modeled. Correlation between data and predictions ranged from 0.85 to 0.96 across experiments, and mean absolute prediction errors were between 0.5 and 1.4 dB. • A binaural model predicting speech intelligibility in modulated noises is tested. • Speech reception thresholds measured in 100 conditions are accurately predicted. • The effect of binaural sluggishness on binaural unmasking is implemented. • Better-ear listening probably involves monaural and binaural temporal resolutions. [ABSTRACT FROM AUTHOR]
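The reported figures of merit are easy to reproduce for any such model: the Pearson correlation between measured and predicted speech reception thresholds (SRTs) and the mean absolute error. A sketch with hypothetical numbers:

```python
import numpy as np

def evaluate_predictions(measured_srt, predicted_srt):
    """Compute the two figures of merit reported above: Pearson correlation
    between measured and predicted SRTs (in dB) and the mean absolute
    prediction error."""
    measured = np.asarray(measured_srt)
    predicted = np.asarray(predicted_srt)
    r = np.corrcoef(measured, predicted)[0, 1]
    mae = np.mean(np.abs(measured - predicted))
    return r, mae

# Hypothetical SRTs for a handful of masker conditions
measured = [-12.1, -9.4, -7.8, -14.0, -10.5]
predicted = [-11.3, -9.9, -7.1, -13.2, -11.4]
print(evaluate_predictions(measured, predicted))
```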
- Published
- 2020
- Full Text
- View/download PDF
39. The efficiency of the human auditory system in recognizing natural sounds (L'efficacité du système auditif humain pour la reconnaissance de sons naturels)
- Author
-
Isnard, Vincent, Sciences et Technologies de la Musique et du Son (STMS), Institut de Recherche et Coordination Acoustique/Musique (IRCAM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), Université Pierre et Marie Curie - Paris VI, and Isabelle Viaud-Delmon
- Subjects
Auditory modeling, Auditory processing speed, Signal detection theory, Auditory recognition, Natural sounds, Sparse coding, Timbre, [SDV.NEU.SC]Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC]/Cognitive Sciences
The efficacy of auditory recognition relies on two aspects: the quantity of information necessary and the processing speed. The objective of this thesis was to evaluate these two aspects experimentally. In a first experimental part, we explored the quantity of information by creating sparse representations of original natural sounds, called auditory sketches. We showed that an auditory sketch is recognizable despite the very limited quantity of auditory information in the stimuli. To achieve these results, we devoted a substantial part of our work to developing tools suited to the tested sound categories. For the analysis of the auditory stimuli, we developed an auditory distance model between sound categories. For the analysis of the participants' performance, we developed a model, grounded in signal detection theory, that computes sensitivity per sound category while taking response bias into account. These analyses showed that the results are in fact not equivalent across sound categories: the voice stands out from the other categories tested (e.g., musical instruments), and the technique used to select the sparse information does not seem suited to voice features. In a second experimental part, we investigated the temporal course of auditory recognition. To estimate the time the auditory system needs to recognize a sound, we used a recent paradigm of Rapid Audio Sequential Presentation (RASP). We showed that less than 50 ms is enough to recognize a short natural sound, with better recognition for the human voice.
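The per-category sensitivity analysis the thesis describes is grounded in signal detection theory; the standard computation of d′ and criterion c from hit and false-alarm counts is sketched below. The counts are hypothetical, and the log-linear correction is one common choice, not necessarily the thesis's exact model:

```python
from scipy.stats import norm

def dprime_and_bias(hits, misses, false_alarms, correct_rejections):
    """Signal-detection-theory sensitivity d' and criterion c from raw counts,
    with a log-linear correction to avoid infinite z-scores when a hit or
    false-alarm rate is exactly 0 or 1."""
    h = (hits + 0.5) / (hits + misses + 1.0)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = norm.ppf(h) - norm.ppf(fa)
    criterion = -0.5 * (norm.ppf(h) + norm.ppf(fa))
    return d_prime, criterion

# Example: 45 hits / 5 misses, 10 false alarms / 40 correct rejections
print(dprime_and_bias(45, 5, 10, 40))  # d' ~ 2.1, slightly liberal criterion
```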
- Published
- 2016
40. Source separation with one ear : proposition for an anthropomorphic approach
- Abstract
We present an example of an anthropomorphic approach in which auditory-based cues are combined with temporal correlation to implement a source separation system. The auditory features are based on spectral amplitude modulation and energy information obtained through 256 cochlear filters. Segmentation and binding of auditory objects are performed with a two-layered spiking neural network. The first layer performs the segmentation of the auditory images into objects, while the second layer binds the auditory objects belonging to the same source. The binding is further used to generate a mask (binary gain) to suppress the undesired sources from the original signal. Results are presented for a double-voiced (two-speaker) speech segment and for sentences corrupted with different noise sources. Comparative results are also given using PESQ (perceptual evaluation of speech quality) scores. The spiking neural network is fully adaptive and unsupervised.
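The final masking stage can be illustrated independently of the spiking network. The sketch below applies a binary time-frequency gain to an STFT of the mixture; unlike the unsupervised system above, it derives the mask from a clean reference of the target (an ideal binary mask), purely for illustration:

```python
import numpy as np
from scipy.signal import stft, istft

def binary_mask_separation(mixture, reference, fs, threshold_db=0.0):
    """Suppress undesired sources with a binary time-frequency gain, the same
    kind of mask the spiking network above produces from its object binding.
    Here the mask is an ideal binary mask computed from a clean reference."""
    f, t, mix_tf = stft(mixture, fs=fs, nperseg=512)
    _, _, ref_tf = stft(reference, fs=fs, nperseg=512)
    noise_tf = mix_tf - ref_tf
    # Keep cells where the target dominates the residual by threshold_db
    mask = (20 * np.log10(np.abs(ref_tf) + 1e-12)
            - 20 * np.log10(np.abs(noise_tf) + 1e-12)) > threshold_db
    _, separated = istft(mix_tf * mask, fs=fs, nperseg=512)
    return separated
```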
- Published
- 2017
42. Biotemp: an auditory-based spectrotemporal model for robust classification of music and speech
- Author
-
Spencer, Jeffrey
- Abstract
This thesis concerns the development of a template-based computational model of sound recognition called Biotemp. Biotemp is built from neurobiologically plausible processing stages and is based on the observation that sound recognition initiates early in auditory processing. It is developed to recognize both music and speech, with normal-hearing inputs and with the degraded inputs available to cochlear implant users. The development of a general sound recognition model that accepts both speech and music contrasts with most current models, which are typically developed to recognize either speech or music and optimized to run on digital computers with little regard for neurobiologically plausible processing stages. Biotemp is first optimized with musical chords of harmonic complexes. This shows the importance of the lateral inhibition, adaptation, and saturation processes seen in the auditory brainstem for increasing the recognition selectivity of the model, especially in noisy conditions. The optimized model is then used to recognize Klatt-synthesized vowels, and recognition rates are above 80% at signal-to-noise ratios (SNRs) down to 0 dB with added babble noise. Furthermore, the Biotemp model outperforms a mel-frequency cepstral coefficient (MFCC) model for the vowels and a chroma-feature model for the chords in clean and noisy conditions. Biotemp is then extended to the temporal domain for recognition of chords with varying rates of amplitude modulation (AM). Temporal inhibition increases the recognition selectivity of the model, and reliable recognition above 15 Hz AM is seen, which is used to explain the perceptual transition from beating to roughness for amplitude-modulated signals. Furthermore, an onset detector was developed and optimized for recognition of spoken digits; Biotemp's maximum recognition accuracy on the spoken digits was 95%. Finally, Biotemp is used with the degraded pathways of cochlear implant
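Of the brainstem-inspired stages the thesis credits with sharpening selectivity, lateral inhibition is the easiest to sketch: a center-surround kernel applied across frequency channels. A minimal, hypothetical version follows; the kernel shape and strength are our own choices, not Biotemp's:

```python
import numpy as np

def lateral_inhibition(channel_energies, strength=0.8):
    """Center-surround lateral inhibition across frequency channels: each
    channel is excited by itself and inhibited by its spectral neighbors.
    Kernel shape and strength are illustrative, not Biotemp's values."""
    kernel = np.array([-strength / 2, 1.0, -strength / 2])
    sharpened = np.convolve(channel_energies, kernel, mode='same')
    return np.maximum(sharpened, 0.0)   # half-wave rectify: no negative rates

def width_at_half_max(profile):
    return int(np.sum(profile > 0.5 * profile.max()))

# A broad spectral peak becomes narrower after inhibition (5 -> 3 channels)
x = np.arange(64)
peak = np.exp(-0.5 * ((x - 32) / 2.0) ** 2)
print(width_at_half_max(peak), width_at_half_max(lateral_inhibition(peak)))
```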
- Published
- 2016
43. A new procedure for automatic fitting of the basilar-membrane input-output function to individual behavioral data.
- Author
-
Kowalewski, Borys, Fereczkowski, Michal, MacDonald, Ewen, and Dau, Torsten
- Abstract
The basilar-membrane input-output function (BM I/O) in a healthy cochlea is highly nonlinear. One of the consequences of sensorineural hearing loss (SNHL) is a partial or full loss of this nonlinearity. Behavioral estimates of the individual BM I/O can be useful for modeling the impaired auditory system and, potentially, for clinical diagnostics. Computational algorithms are available that mimic the functioning of nonlinear cochlear processing. One such algorithm is the dual resonance non-linear (DRNL) filterbank [6]. Its parameters can be modified to account for individual hearing loss, e.g., based on behavioral temporal masking curve (TMC) data. This approach was used within the framework of the computational auditory signal-processing and perception (CASP) model to account for various aspects of SNHL [4]. However, due to the computational complexity, on-line fitting of the DRNL parameters is difficult. Until recently, the parameters were adjusted manually and the fitting process was indirect. A new approach is described here, based on a search through a lookup table of pre-computed filterbank input-output functions. The aim of this approach is to provide a fast, stable, and more objective fitting procedure.
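The lookup-table idea can be sketched with a toy broken-stick I/O family: precompute curves over a parameter grid, then pick the one with the smallest squared error against the behavioral estimates. Everything below (the parameterization, the grid, and the data points) is illustrative, not the CASP/DRNL implementation:

```python
import numpy as np

def build_lookup_table(input_levels, knee_points, compression_ratios):
    """Precompute a family of broken-stick basilar-membrane I/O curves
    (linear below the knee, compressive above), indexed by parameter pair."""
    table = {}
    for knee in knee_points:
        for cr in compression_ratios:
            out = np.where(input_levels < knee,
                           input_levels,
                           knee + (input_levels - knee) / cr)
            table[(float(knee), float(cr))] = out
    return table

def fit_io_function(behavioral_levels, behavioral_io, table, input_levels):
    """Pick the precomputed curve with the smallest squared error against the
    behaviorally estimated I/O points -- the fast, objective search the
    abstract describes (sketch)."""
    best, best_err = None, np.inf
    for params, curve in table.items():
        pred = np.interp(behavioral_levels, input_levels, curve)
        err = np.sum((pred - behavioral_io) ** 2)
        if err < best_err:
            best, best_err = params, err
    return best, best_err

levels = np.arange(0.0, 101.0, 1.0)            # input axis in dB SPL
lut = build_lookup_table(levels, knee_points=np.arange(20, 61, 5),
                         compression_ratios=np.arange(1.5, 5.1, 0.5))
# Hypothetical TMC-derived I/O estimates (input dB, output dB)
obs_in = np.array([30.0, 50.0, 70.0, 90.0])
obs_out = np.array([30.0, 42.0, 49.0, 55.0])
print(fit_io_function(obs_in, obs_out, lut, levels))
```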
- Published
- 2016
44. Modeling spectro-temporal modulation perception in normal-hearing listeners
- Author
-
Kropp, Wolfgang, Sanchez Lopez, Raul, and Dau, Torsten
- Published
- 2016
45. Source Separation with One Ear: Proposition for an Anthropomorphic Approach
- Author
-
Rouat, Jean and Pichevar, Ramin
- Published
- 2005
- Full Text
- View/download PDF
46. Towards a Unifying Basis of Auditory Thresholds: The Effects of Hearing Loss on Temporal Integration Reconsidered
- Author
-
Neubauer, Heinrich and Heil, Peter
- Published
- 2004
- Full Text
- View/download PDF
47. The sensitivity matrix for a spectro-temporal auditory model
- Author
-
Plasberg, Jan H., Zhao, D. Y., and Kleijn, W. B.
- Abstract
Perceptually optimal processing of speech and audio signals demands distortion measures that are based on sophisticated auditory models. High-rate theory can simplify these models by means of a sensitivity matrix. We present a method to derive the sensitivity matrix for distortion measures based on spectro-temporal auditory models under the assumption of small errors. The method is applied to an example auditory model; the region of validity of the approximation is discussed, along with a way to analyze the characteristics of the model using subspace methods.
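The small-error idea is that the distortion behaves as a quadratic form, D(x, x+e) ≈ eᵀSe. One numerical way to obtain S for any smooth distortion measure is finite differences on its Hessian; the paper derives S analytically from the auditory model, so the sketch below is only a generic illustration:

```python
import numpy as np

def sensitivity_matrix(distortion, x, eps=1e-4):
    """Estimate the sensitivity matrix S of a distortion measure D(x, x+e)
    around a signal x, so that D(x, x+e) ~= e^T S e for small errors e.
    S is half the Hessian of D in its second argument, obtained here by
    central finite differences (illustrative, not the paper's derivation)."""
    n = len(x)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            def d(si, sj):
                p = np.zeros(n)
                p[i] += si * eps
                p[j] += sj * eps
                return distortion(x, x + p)
            # mixed second partial derivative wrt e_i and e_j, halved
            S[i, j] = (d(1, 1) - d(1, -1) - d(-1, 1) + d(-1, -1)) / (8 * eps**2)
    return S

# Toy distortion: weighted squared error, whose true S is diag(w)
w = np.array([1.0, 2.0, 0.5])
D = lambda a, b: np.sum(w * (a - b) ** 2)
print(np.round(sensitivity_matrix(D, np.zeros(3)), 3))
```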
- Published
- 2015
48. Effects of Specific Cochlear Pathologies on the Auditory Functions : Modelling, Simulations and Clinical Implications
- Author
-
Saremi, Amin G.
- Subjects
noise-induced hearing loss ,Auditory modeling ,age-related hearing loss ,Medicin och hälsovetenskap ,sensorineural hearing impairment ,inner ear pathologies ,cochlear mechanics ,otorhinolaryngologic diseases ,Medical and Health Sciences - Abstract
A hearing impairment is primarily diagnosed by measuring hearing thresholds at a range of auditory frequencies (air-conduction audiometry). Although this clinical procedure is simple, affordable, reliable, and fast, it does not offer differential information about the origins of the hearing impairment. The main goal of this thesis is to quantitatively link specific cochlear pathologies to certain changes in the spectral and temporal characteristics of the auditory system. This can help better understand the mechanisms underlying sensorineural hearing impairments, beyond what is shown in the audiogram. Here, an electromechanical signal-transmission model is devised in MATLAB, in which the parameters convey biological interpretations of mammalian cochlear structures. The model is used to simulate the cell-level cochlear pathologies associated with two common types of sensorineural hearing impairment: (1) presbyacusis (age-related hearing impairment) and (2) noise-induced hearing impairment. Furthermore, a clinical study consisting of different psychoacoustic and physiological tests was performed to trace and validate the model predictions in humans. The results of the clinical tests were collated and compared with the model predictions, showing reasonable agreement. In summary, the present model provides a biophysical foundation for simulating the effect of specific cellular lesions, due to different inner-ear diseases and external insults, on the entire cochlear mechanism and thereby on the whole auditory system. This is a multidisciplinary work in the sense that it connects 'biological processes' with 'acoustic modelling' and 'clinical audiology' in a translational context.
- Published
- 2014
49. Auditory Modeling in Sport: Theoretical Framework and Practical Applications
- Author
-
Sors, Fabrizio, Gerbino, Walter, Agostini, Tiziano, Bernardis, Paolo, and Fantoni, Carlo
- Subjects
sound, Auditory Models, auditory modeling, Theory of Event Coding, Second Order Biofeedback, Perception, sport, Movement Sonification
Visual models, i.e., live demonstrations or film clips, are widely used in sport as training instruments. Nevertheless, in recent years research has demonstrated that the well-known capacity of sounds to represent the temporal structure of a given task and to promote its accurate reproduction holds not only for simple motor gestures but also for the complex movements that characterize sport performance. As a consequence, there is growing interest in the study and implementation of auditory models as an alternative to the visual ones traditionally used. The present work begins by theoretically framing the use of auditory modeling in sport according to the Theory of Event Coding. Then, some practical applications of the two auditory modeling techniques, i.e., movement sonification and second-order biofeedback, are briefly reviewed.
- Published
- 2014
50. Making use of auditory models for better mimicking of normal hearing processes with cochlear implants: the SAM coding strategy
- Author
-
Anja Chilian, Tamas Harczos, Peter Husar, and Publica
- Subjects
Engineering ,jitter ,music perception ,hair cells ,Speech recognition ,Loudness Perception ,Biomedical Engineering ,adaptation ,computer.software_genre ,Models, Biological ,cochlear delays ,basilar membrane ,Hearing ,otorhinolaryngologic diseases ,Humans ,Psychoacoustics ,Electrical and Electronic Engineering ,binaural processing ,Audio signal processing ,outer and middle ear ,Electrodes ,speech processing strategy ,Cochlea ,Signal processing ,pitch discrimination ,business.industry ,fine structure processing ,nerve fibers ,Filter (signal processing) ,Neurophysiology ,compression ,Basilar membrane ,Cochlear Implants ,Acoustic Stimulation ,Nonlinear Dynamics ,auditory modeling ,cochlear implant evaluation ,Audiometry, Pure-Tone ,sense organs ,business ,computer ,Coding (social sciences) - Abstract
Mimicking the human ear on the basis of auditory models has by now become a viable approach in many applications. However, only a few attempts have been made to extend the scope of physiological ear models to cochlear implants (CI). Contemporary CI systems rely on much simpler filter banks and simulate the natural signal processing of a healthy cochlea to only a very limited extent. When looking at rehabilitation outcomes, current systems seem to have reached their peak potential, which signals the need for better algorithms and/or technologies. In this paper, we present a novel sound processing strategy, SAM (Stimulation based on Auditory Modeling), which is based on neurophysiological models of the human ear and can be employed in auditory prostheses. It incorporates active cochlear filtering (basilar membrane and outer hair cells) along with the mechanoelectrical transduction of the inner hair cells, so that several psychoacoustic phenomena are accounted for inherently. Although parallel stimulation of the electrodes is possible, the current implementation does not make use of it, which matches state-of-the-art CI hardware. This paper elaborates on SAM's signal processing and provides a computational evaluation of the strategy. Results show that aspects of normal cochlear processing that are missing in common strategies can be replicated by SAM. This should improve overall CI user performance, which we have at least partly demonstrated in a pilot study with implantees.
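For contrast with SAM, the "much simpler filter bank" processing of contemporary CI systems can be sketched as a continuous-interleaved-sampling (CIS)-style chain: a static bandpass filterbank, rectification, and an envelope lowpass, one channel per electrode. The band edges and filter orders below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def cis_like_strategy(audio, fs, n_channels=12, env_cutoff=200.0):
    """Minimal CIS-style coding for contrast with SAM: static bandpass
    filterbank followed by envelope extraction (rectify + lowpass), one
    envelope per electrode. None of the cochlear nonlinearities or IHC
    transduction that SAM models appear here. Assumes fs >= 16 kHz."""
    edges = np.logspace(np.log10(250), np.log10(8000), n_channels + 1)
    env_sos = butter(2, env_cutoff, btype='low', fs=fs, output='sos')
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(2, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfilt(band_sos, audio)
        env = sosfilt(env_sos, np.maximum(band, 0.0))  # half-wave rectify
        envelopes.append(env)
    return np.array(envelopes)  # (n_channels, n_samples) stimulation pattern
```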
- Published
- 2013