68 results on '"Sound separation"'
Search Results
2. ANALYSIS OF THE SOUND EVENT DETECTION METHODS AND SYSTEMS
- Author
-
Andriy Kovalenko and Anton Poroshenko
- Subjects
sound event detection ,sound event recognition ,monophonic sounds ,polyphonic sounds ,standard deviation ,median filter ,dynamic threshold ,sound separation ,Computer software ,QA76.75-76.765 ,Information theory ,Q350-390 - Abstract
Detection and recognition of loud sounds and characteristic noises can significantly increase the level of safety and ensure a timely response to various emergency situations. Audio event detection is the first step in recognizing audio signals in a continuous audio input stream. This article presents a number of problems associated with the development of sound event detection systems, such as variation across environments and sound categories, overlapping audio events, unreliable training data, etc. Both methods for detecting monophonic impulsive audio events and polyphonic sound event detection methods used in state-of-the-art sound event detection systems are presented. Such systems are presented in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges and workshops, which take place every year. Besides the majority of works focusing on improving overall performance in terms of accuracy, many other aspects have also been studied. Several systems presented at DCASE 2021 task 4 were considered, and based on their analysis, a conclusion was drawn about the possible future of sound event detection systems. Current directions in the development of modern audio analytics systems are also presented, including the study and use of various neural network architectures and the use of several data augmentation techniques, such as universal sound separation.
- Published
- 2022
- Full Text
- View/download PDF
3. Separating overlapping bat calls with a bi‐directional long short‐term memory network.
- Author
-
ZHANG, Kangkang, LIU, Tong, SONG, Shengjing, ZHAO, Xin, SUN, Shijun, METZNER, Walter, FENG, Jiang, and LIU, Ying
- Subjects
- *
ARTIFICIAL neural networks , *ANIMAL sounds , *ANIMAL sound production , *DIGITAL communications , *BAT sounds , *CLUSTER analysis (Statistics) , *BISTATIC radar - Abstract
Acquiring clear acoustic signals is critical for the analysis of animal vocalizations. Bioacoustics studies commonly face the problem of overlapping signals, which can impede the structural identification of vocal units, but there is currently no satisfactory solution. This study presents a bi-directional long short-term memory network to separate overlapping echolocation-communication calls of 6 different bat species and reconstruct waveforms. The separation quality was evaluated using 7 temporal-spectrum parameters. All the echolocation pulses and syllables of communication calls in the overlapping signals were separated, and parameter comparisons showed no significant difference and negligible deviation between the extracted and original calls. Clustering analysis was conducted with the separated echolocation calls from each bat species to provide an example of a practical application of the separated and reconstructed calls. The clustering analysis yielded a high corrected Rand index (82.79%), suggesting the reconstructed waveforms can be reliably used for species classification. These results demonstrate a convenient and automated approach for separating overlapping calls. The study extends the application of deep neural networks to the separation of overlapping animal sounds. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
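The record above describes mask-based separation of overlapping bat calls with a bi-directional LSTM, but not the exact architecture. Below is a minimal sketch of that general idea, with assumed dimensions and layer sizes rather than the authors' model: a BLSTM that predicts one soft time-frequency mask per source from a mixture magnitude spectrogram.

import torch
import torch.nn as nn

class MaskBLSTM(nn.Module):
    # Predicts one soft time-frequency mask per source from a mixture magnitude spectrogram.
    def __init__(self, n_freq=513, hidden=256, n_sources=2):
        super().__init__()
        self.n_freq, self.n_sources = n_freq, n_sources
        self.blstm = nn.LSTM(input_size=n_freq, hidden_size=hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.to_masks = nn.Linear(2 * hidden, n_freq * n_sources)

    def forward(self, mag):                              # mag: (batch, time, freq)
        h, _ = self.blstm(mag)
        masks = torch.sigmoid(self.to_masks(h))
        masks = masks.view(mag.size(0), mag.size(1), self.n_sources, self.n_freq)
        return masks * mag.unsqueeze(2)                  # masked magnitude per source

model = MaskBLSTM()
mixture = torch.rand(1, 100, 513)                        # dummy batch: 100 frames, 513 bins
estimates = model(mixture)                               # shape (1, 100, 2, 513)
loss = nn.functional.mse_loss(estimates, torch.rand(1, 100, 2, 513))  # vs. reference magnitudes

In practice such a model would be trained on pairs of mixture and isolated-call spectrograms, and the masked magnitudes recombined with the mixture phase to reconstruct the waveforms, as the paper describes.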
4. Monaural Instrument Sound Segregation by Stacked Recurrent Neural Network.
- Author
-
WEN-HSING LAI and SIOU-LIN WANG
- Subjects
RECURRENT neural networks ,ELECTRIC guitar ,INSTRUMENTAL music ,ELECTRIC testing ,CONVOLUTIONAL neural networks ,MUSICAL perception - Abstract
A stacked recurrent neural network (sRNN) with gated recurrent units (GRUs) and a jointly optimized soft time-frequency mask is proposed for extracting target musical instrument sounds from a mixture of instrumental sounds. The sRNN model stacks and links multiple simple recurrent neural networks (RNNs), which gives the sRNN temporal dynamic behavior and true depth. The GRU simplifies the gating mechanism of long short-term memory and reduces computation time. Experiments were conducted to test the proposed method. A musical dataset collected from real instrumental music was used for training and testing; electric guitar and drum sounds were the target sounds. Objective and subjective assessment scores obtained for the proposed method were compared with those obtained for two models, namely Wave-U-Net and SH-4stack, and a conventional RNN model. The results indicated that electric guitar and drum sounds can be successfully extracted through the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. In-Process Noise Inspection System for Product Fault Detection in a Loud Shop-Floor Environment.
- Author
-
Baek, Woonsang and Kim, Duck Young
- Subjects
- *
ACOUSTIC generators , *ANECHOIC chambers , *NOISE measurement , *NOISE control , *MANUFACTURING processes , *MATRIX decomposition , *NOISE - Abstract
Abnormal noise originating from within faulty products often irritates customers, which may lead to expensive warranty claims. Therefore, it is important to identify these faulty products proactively in the manufacturing process. However, noise detection in a loud shop-floor environment is not straightforward because inspection in an anechoic chamber is very costly, and some prerequisites for conventional noise reduction and source separation methods, such as stationary and independent signals and prior knowledge about the signal of interest, are sometimes not feasible in practice. Therefore, we developed an in-process noise inspection system that supports dual-channel acoustic data collection during the inspection process. By using two different groups of acoustic signals, abnormal sound separation and noise detection are made possible through three main steps: in-process background noise training, abnormal noise separation, and significance evaluation. The efficiency of the proposed procedure is demonstrated with two case studies: car door trim panels and a dual-channel sound generator and collector. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
6. Speech segregation under reverberant conditions
- Author
-
Shamsoddini, Ahmad
- Subjects
610.28 ,Sound separation ,Speech enhancement ,Pitch - Published
- 1997
7. Modeling Two-Stream Correspondence for Visual Sound Separation
- Author
-
Yixuan He, Fumin Shen, Yang Yang, Jingran Zhang, Heng Tao Shen, and Xing Xu
- Subjects
Computer science ,Acoustics ,Sound separation ,Media Technology ,Electrical and Electronic Engineering - Published
- 2022
- Full Text
- View/download PDF
8. Wheezing Sound Separation Based on Informed Inter-Segment Non-Negative Matrix Partial Co-Factorization
- Author
-
Juan De La Torre Cruz, Francisco Jesús Cañadas Quesada, Nicolás Ruiz Reyes, Pedro Vera Candeas, and Julio José Carabias Orti
- Subjects
sound separation ,non-negative matrix partial co-factorization ,bases ,repetitive ,sharing ,wheezing ,Chemical technology ,TP1-1185 - Abstract
Wheezing reveals important cues that can be useful in alerting about respiratory disorders, such as Chronic Obstructive Pulmonary Disease. Early detection of wheezing through auscultation will allow the physician to be aware of the existence of the respiratory disorder in its early stage, thus minimizing the damage the disorder can cause to the subject, especially in low-income and middle-income countries. The proposed method presents an extended version of Non-negative Matrix Partial Co-Factorization (NMPCF) that eliminates most of the acoustic interference caused by normal respiratory sounds while preserving the wheezing content needed by the physician to make a reliable diagnosis of the subject's airway status. This extension, called Informed Inter-Segment NMPCF (IIS-NMPCF), attempts to overcome the drawback of the conventional NMPCF that treats all segments of the spectrogram equally, adding greater importance for signal reconstruction of repetitive sound events to those segments where wheezing sounds have not been detected. Specifically, IIS-NMPCF is based on a bases-sharing process in which inter-segment information, informed by a wheezing detection system, is incorporated into the factorization to reconstruct a more accurate modelling of normal respiratory sounds. Results demonstrate the significant improvement in wheezing sound quality obtained by IIS-NMPCF compared to the conventional NMPCF for all the Signal-to-Noise Ratio (SNR) scenarios evaluated; specifically, SDR, SIR and SAR improvements of 5.8 dB, 4.9 dB and 7.5 dB in a noisy scenario with SNR = −5 dB.
- Published
- 2020
- Full Text
- View/download PDF
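The IIS-NMPCF method above builds on non-negative matrix factorization with bases shared across segments and informed by a wheeze detector. The abstract does not give implementation details, so the sketch below only illustrates the simpler underlying idea of semi-supervised NMF: learn bases for normal respiratory sounds on frames a detector flagged as wheeze-free, then keep them fixed while a few extra bases absorb the wheezing content. The file name, frame range, and matrix sizes are hypothetical; this is not the authors' IIS-NMPCF.

import numpy as np
import librosa

EPS = 1e-10

def nmf(V, k, n_iter=200, seed=0):
    # Basic multiplicative-update NMF: V is approximated by W @ H with non-negative factors.
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + EPS
    H = rng.random((k, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ (W @ H) + EPS)
        W *= (V @ H.T) / ((W @ H) @ H.T + EPS)
    return W, H

y, sr = librosa.load("auscultation.wav", sr=4000)        # hypothetical chest recording
S = librosa.stft(y, n_fft=256, hop_length=64)
V = np.abs(S)

# Stage 1: bases for normal respiratory sounds, learned on (hypothetical) wheeze-free frames.
W_normal, _ = nmf(V[:, :200], k=8)

# Stage 2: factorize the full spectrogram; W_normal stays fixed, 4 extra bases model wheezes.
rng = np.random.default_rng(1)
W_wheeze = rng.random((V.shape[0], 4)) + EPS
H = rng.random((12, V.shape[1])) + EPS
for _ in range(200):
    W = np.hstack([W_normal, W_wheeze])
    H *= (W.T @ V) / (W.T @ (W @ H) + EPS)
    W_wheeze *= (V @ H[8:].T) / ((W @ H) @ H[8:].T + EPS)

# Wiener-style soft mask built from the wheeze bases, applied to the complex STFT.
V_wheeze = W_wheeze @ H[8:]
V_total = np.hstack([W_normal, W_wheeze]) @ H + EPS
wheeze_estimate = librosa.istft((V_wheeze / V_total) * S, hop_length=64)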
9. Analysis of Audio Event Detection Methods and Systems (Аналіз методів та систем детектування аудіоподій)
- Subjects
sound event recognition ,monophonic sounds ,polyphonic sounds ,sound event detection ,dynamic threshold ,median filter ,sound separation ,standard deviation - Abstract
Detection and recognition of loud sounds and characteristic noises can significantly increase the level of safety and ensure a timely response to various emergency situations. Audio event detection is the first step in recognizing audio signals in a continuous audio input stream. This article presents a number of problems associated with the development of sound event detection systems, such as variation across environments and sound categories, overlapping audio events, unreliable training data, etc. Both methods for detecting monophonic impulsive audio events and polyphonic sound event detection methods used in state-of-the-art sound event detection systems are presented. Such systems are presented in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges and workshops, which take place every year. Besides the majority of works focusing on improving overall performance in terms of accuracy, many other aspects have also been studied. Several systems presented at DCASE 2021 task 4 were considered, and based on their analysis, a conclusion was drawn about the possible future of sound event detection systems. Current directions in the development of modern audio analytics systems are also presented, including the study and use of various neural network architectures and the use of several data augmentation techniques, such as universal sound separation.
- Published
- 2022
10. Analysis of Audio Event Detection Methods and Systems (Аналіз методів та систем детектування аудіоподій)
- Author
-
Andriy Kovalenko and Anton Poroshenko
- Subjects
sound event recognition ,monophonic sounds ,polyphonic sounds ,General Engineering ,sound event detection ,dynamic threshold ,median filter ,sound separation ,standard deviation - Abstract
Detection and recognition of loud sounds and characteristic noises can significantly increase the level of safety and ensure a timely response to various emergency situations. Audio event detection is the first step in recognizing audio signals in a continuous audio input stream. This article presents a number of problems associated with the development of sound event detection systems, such as variation across environments and sound categories, overlapping audio events, unreliable training data, etc. Both methods for detecting monophonic impulsive audio events and polyphonic sound event detection methods used in state-of-the-art sound event detection systems are presented. Such systems are presented in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges and workshops, which take place every year. Besides the majority of works focusing on improving overall performance in terms of accuracy, many other aspects have also been studied. Several systems presented at DCASE 2021 task 4 were considered, and based on their analysis, a conclusion was drawn about the possible future of sound event detection systems. Current directions in the development of modern audio analytics systems are also presented, including the study and use of various neural network architectures and the use of several data augmentation techniques, such as universal sound separation.
- Published
- 2022
11. Bringing the Scene Back to the Tele-operator: Auditory Scene Manipulation for Tele-presence Systems.
- Author
-
Chaoran Liu, Ishi, Carlos T., and Hiroshi Ishiguro
- Subjects
ROBOTICS ,REMOTE control ,MICROPHONE arrays ,USER interfaces ,ALGORITHMS ,VIRTUAL reality - Abstract
In a tele-operated robot system, the reproduction of auditory scenes, conveying 3D spatial information of sound sources in the remote robot environment, is important for the transmission of remote presence to the tele-operator. We proposed a tele-presence system which is able to reproduce and manipulate the auditory scenes of a remote robot environment, based on the spatial information of human voices around the robot, matched with the operator's head orientation. On the robot side, voice sources are localized and separated by using multiple microphone arrays and human tracking technologies, while on the operator side, the operator's head movement is tracked and used to relocate the spatial positions of the separated sources. Interaction experiments with humans in the robot environment indicated that the proposed system had significantly higher accuracy rates for perceived direction of sounds, and higher subjective scores for sense of presence and listenability, compared to a baseline system using stereo binaural sounds obtained by two microphones located at the humanoid robot's ears. We also proposed three different user interfaces for augmented auditory scene control. Evaluation results indicated higher subjective scores for sense of presence and usability for two of the interfaces (control of voice amplitudes based on virtual robot positioning, and amplification of voices in the frontal direction). [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
12. Separate Sound into STFT Frames to Eliminate Sound Noise Frames in Sound Classification
- Author
-
Huy Kien Bui, Thanh Tran, Nhat Truong Pham, Consolatina Liguori, Jan Lundgren, and Marco Carratù
- Subjects
Short Time Fourier Transform ,Sound classification ,Sound separation ,Transfer learning - Published
- 2021
- Full Text
- View/download PDF
13. Monaural Musical Octave Sound Separation Using Relaxed Extended Common Amplitude Modulation
- Author
-
Longquan Dai and Yukai Gong
- Subjects
Amplitude modulation ,Hardware and Architecture ,Acoustics ,Sound separation ,Octave ,General Medicine ,Musical ,Electrical and Electronic Engineering ,Monaural ,Mathematics - Abstract
Monaural music sound separation isolates individual instrument sources from a mono-channel polyphonic mixture. The primary challenge is to separate source partials that overlap in time-frequency regions, especially in fully overlapping cases where at least one source has no non-overlapping partial. Owing to the lack of effective methods for separating sources with fully overlapping partials, this paper puts forward a relaxed extended common amplitude modulation (RECAM) approach to deal with octave sound separation, one of the most difficult cases. Our strategy uses a multi-band co-processing scheme for each short-time partial wave segment. Extensive experiments are conducted on octave mixture samples drawn from the Iowa University Musical Instrument Database. Results confirm that RECAM achieves the best separation performance. For non-vibrato and vibrato mixtures, the average improvement of RECAM in each measure exceeds 3 dB and 2 dB, respectively.
- Published
- 2021
- Full Text
- View/download PDF
14. Separate Sound into STFT Frames to Eliminate Sound Noise Frames in Sound Classification
- Author
-
Tran, Thanh, Huy, Kien Bui, Pham, Nhat Truong, Carratù, Marco, Liguori, Consolatina, Thim, Jan, Tran, Thanh, Huy, Kien Bui, Pham, Nhat Truong, Carratù, Marco, Liguori, Consolatina, and Thim, Jan
- Abstract
Sounds always contain acoustic noise and background noise that affect the accuracy of the sound classification system. Hence, suppressing noise in the sound can improve the robustness of the sound classification model. This paper investigates a sound separation technique that separates the input sound into many overlapped-content Short-Time Fourier Transform (STFT) frames. Our approach differs from the traditional STFT conversion method, which converts each sound into a single STFT image. In contrast, separating the sound into many STFT frames improves model prediction accuracy by increasing the variability in the data and allowing the model to learn from that variability. These separated frames are saved as images and then labeled manually as clean or noisy frames, which are fed into transfer-learning convolutional neural networks (CNNs) for the classification task. The pre-trained CNN architectures that learn from these frames become robust against the noise. The experimental results show that the proposed approach is robust against noise and achieves 94.14% accuracy in classifying 21 classes, including 20 classes of sound events and a noisy class. An open-source repository of the proposed method and results is available at https://github.com/nhattruongpham/soundSepsound.
- Published
- 2021
- Full Text
- View/download PDF
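A sketch of the frame-splitting idea described above, under assumed parameters (file name, 1-second chunks, 50% overlap) rather than the authors' exact pipeline: cut a recording into overlapping chunks, turn each chunk into its own log-magnitude STFT image, and save the images so they can later be labeled as clean or noisy before transfer-learning CNN training.

import numpy as np
import librosa
import matplotlib.pyplot as plt

y, sr = librosa.load("example.wav", sr=22050)             # hypothetical recording
chunk, hop = sr, sr // 2                                   # 1-second chunks, 50% overlap

for i, start in enumerate(range(0, len(y) - chunk + 1, hop)):
    segment = y[start:start + chunk]
    S = np.abs(librosa.stft(segment, n_fft=1024, hop_length=256))
    S_db = librosa.amplitude_to_db(S, ref=np.max)          # log-magnitude spectrogram
    plt.imsave(f"frame_{i:04d}.png", S_db, origin="lower", cmap="magma")

Each saved image would then serve as one example for the manual clean/noisy labeling step.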
15. A Stereo Music Preprocessing Scheme for Cochlear Implant Users.
- Author
-
Buyens, Wim, van Dijk, Bas, Wouters, Jan, and Moonen, Marc
- Subjects
- *
COCHLEAR implants , *ARTIFICIAL implants , *BIOACOUSTICS , *HEARING , *MUSIC - Abstract
Objective: Listening to music is still one of the more challenging aspects of using a cochlear implant (CI) for most users. Simple musical structures, a clear rhythm/beat, and lyrics that are easy to follow are among the top factors contributing to music appreciation for CI users. Modifying the audio mix of complex music potentially improves music enjoyment in CI users. Methods: A stereo music preprocessing scheme is described in which vocals, drums, and bass are emphasized based on the representation of the harmonic and the percussive components in the input spectrogram, combined with the spatial allocation of instruments in typical stereo recordings. The scheme is assessed with postlingually deafened CI subjects (N = 7) using pop/rock music excerpts with different complexity levels. Results: The scheme is capable of modifying relative instrument level settings, with the aim of improving music appreciation in CI users, and allows individual preference adjustments. The assessment with CI subjects confirms the preference for more emphasis on vocals, drums, and bass as offered by the preprocessing scheme, especially for songs with higher complexity. Conclusion: The stereo music preprocessing scheme has the potential to improve music enjoyment in CI users by modifying the audio mix in widespread (stereo) music recordings. Significance: Since music enjoyment in CI users is generally poor, this scheme can assist the music listening experience of CI users as a training or rehabilitation tool. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
16. Improving Sound Event Detection In Domestic Environments Using Sound Separation
- Author
-
Turpault, Nicolas, Wisdom, Scott, Erdogan, Hakan, Hershey, John, Serizel, Romain, Fonseca, Eduardo, Seetharaman, Prem, and Salamon, Justin
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Sound (cs.SD) ,Computer Science - Sound ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Synthetic soundscapes ,Sound event detection ,Audio and Speech Processing (eess.AS) ,TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS ,[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD] ,FOS: Electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Signal Processing ,Sound separation ,Index Terms-Sound event detection ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Performing sound event detection on real-world recordings often implies dealing with overlapping target sound events and non-target sounds, also referred to as interference or noise. Until now these problems have mainly been tackled at the classifier level. We propose to use sound separation as a pre-processing step for sound event detection. In this paper we start from a sound separation model trained on the Free Universal Sound Separation dataset and the DCASE 2020 task 4 sound event detection baseline. We explore different methods to combine separated sound sources and the original mixture within the sound event detection. Furthermore, we investigate the impact of adapting the sound separation model to the sound event detection data on both sound separation and sound event detection performance.
- Published
- 2020
17. Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes
- Author
-
Romain Serizel, Hakan Erdogan, Justin Salamon, Nicolas Turpault, John R. Hershey, Scott Wisdom, Eduardo Fonseca, and Prem Seetharaman
- Subjects
FOS: Computer and information sciences ,Sound localization ,Sound (cs.SD) ,Reverberation ,Soundscape ,Computer science ,Speech recognition ,02 engineering and technology ,Computer Science - Sound ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Audio and Speech Processing (eess.AS) ,Robustness (computer science) ,FOS: Electrical engineering, electronic engineering, information engineering ,0202 electrical engineering, electronic engineering, information engineering ,Sound (geography) ,synthetic soundscapes ,geography ,Signal processing ,geography.geographical_feature_category ,Event (computing) ,Sound event detection ,[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD] ,Benchmark (computing) ,sound separation ,020201 artificial intelligence & image processing ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We propose a benchmark of state-of-the-art sound event detection (SED) systems. We designed synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2021 task 4 depending on time-related modifications (time position of an event and length of clips) and we study the impact of non-target sound events and reverberation. We show that the localization in time of sound events is still a problem for SED systems. We also show that reverberation and non-target sound events severely degrade the performance of the SED systems. In the latter case, sound separation seems like a promising solution.
- Published
- 2020
- Full Text
- View/download PDF
18. Harmonic/Percussive Sound Separation and Spectral Complexity Reduction of Music Signals for Cochlear Implant Listeners
- Author
-
Rainer Martin, Johannes Gauer, Benjamin Lentz, and Anil Nagathil
- Subjects
Computer science ,medicine.medical_treatment ,Speech recognition ,Sound separation ,020206 networking & telecommunications ,02 engineering and technology ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Popular music ,Rhythm ,Cochlear implant ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Rock music ,0305 other medical science ,Beat (music) - Abstract
Cochlear implant (CI) users suffer from limitations in music perception and thus prefer music which has a clear rhythm/beat and is played with only a few instruments. Therefore, existing music pre-processing methods aim to enhance music signals for CI users by either emphasizing preferred voices or reducing the spectral complexity of the signals. In this work, a music pre-processing scheme is described which combines these approaches and is applicable to a wider variety of music genres. The proposed method is evaluated and compared to other recently developed methods using instrumental measures and a listening test with vocoded pop/rock music excerpts and normal hearing listeners. Unprocessed popular music pieces as well as different processed versions were rated comparatively in terms of distinctness of drums, distinctness of melody, and the overall impression. The listening test showed significantly better ratings for the proposed method compared to unprocessed music and most of the other processing schemes. As the instrumental measures also indicate improvements, the proposed combined strategy is a promising candidate for music enhancement for CI listeners.
- Published
- 2020
- Full Text
- View/download PDF
19. Wheezing Sound Separation Based on Informed Inter-Segment Non-Negative Matrix Partial Co-Factorization
- Author
-
Julio José Carabias Orti, Juan De La Torre Cruz, Nicolas Ruiz Reyes, Francisco Jesús Cañadas Quesada, and Pedro Vera Candeas
- Subjects
Computer science ,Speech recognition ,Normal respiratory sounds ,Sound separation ,inter-segment ,informed ,02 engineering and technology ,lcsh:Chemical technology ,01 natural sciences ,Biochemistry ,Article ,Analytical Chemistry ,Matrix (mathematics) ,Pulmonary Disease, Chronic Obstructive ,Factorization ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Humans ,lcsh:TP1-1185 ,bases ,Electrical and Electronic Engineering ,Respiratory system ,010301 acoustics ,Instrumentation ,normal respiratory sounds ,Respiratory Sounds ,medicine.diagnostic_test ,wheezing ,020206 networking & telecommunications ,Auscultation ,non-negative matrix partial co-factorization ,Atomic and Molecular Physics, and Optics ,sharing ,repetitive ,sound separation ,Noise ,Algorithms - Abstract
Wheezing reveals important cues that can be useful in alerting about respiratory disorders, such as Chronic Obstructive Pulmonary Disease. Early detection of wheezing through auscultation will allow the physician to be aware of the existence of the respiratory disorder in its early stage, thus minimizing the damage the disorder can cause to the subject, especially in low-income and middle-income countries. The proposed method presents an extended version of Non-negative Matrix Partial Co-Factorization (NMPCF) that eliminates most of the acoustic interference caused by normal respiratory sounds while preserving the wheezing content needed by the physician to make a reliable diagnosis of the subject's airway status. This extension, called Informed Inter-Segment NMPCF (IIS-NMPCF), attempts to overcome the drawback of the conventional NMPCF that treats all segments of the spectrogram equally, adding greater importance for signal reconstruction of repetitive sound events to those segments where wheezing sounds have not been detected. Specifically, IIS-NMPCF is based on a bases-sharing process in which inter-segment information, informed by a wheezing detection system, is incorporated into the factorization to reconstruct a more accurate modelling of normal respiratory sounds. Results demonstrate the significant improvement in wheezing sound quality obtained by IIS-NMPCF compared to the conventional NMPCF for all the Signal-to-Noise Ratio (SNR) scenarios evaluated; specifically, SDR, SIR and SAR improvements of 5.8 dB, 4.9 dB and 7.5 dB in a noisy scenario with SNR = −5 dB.
- Published
- 2020
20. DMMAN: A two-stage audio-visual fusion framework for sound separation and event localization
- Author
-
Zhi Ri Tang, Qijun Huang, Sheng Chang, Hu Ruihan, Songbing Zhou, Wei Han, Edmond Q. Wu, and Yisen Liu
- Subjects
0209 industrial biotechnology ,Fusion ,Computer science ,business.industry ,Cognitive Neuroscience ,Sound separation ,Pattern recognition ,02 engineering and technology ,020901 industrial engineering & automation ,Modal ,Deep Learning ,Acoustic Stimulation ,Artificial Intelligence ,Audio visual ,0202 electrical engineering, electronic engineering, information engineering ,Auditory Perception ,Visual Perception ,Humans ,020201 artificial intelligence & image processing ,Attention ,Artificial intelligence ,Neural Networks, Computer ,business ,Classifier (UML) ,Photic Stimulation - Abstract
Videos are widely used as a medium through which people perceive physical changes in the world. However, what we receive is a mixture of sounds from multiple sound objects, and we cannot distinguish and localize those sounds as separate entities within a video. To solve this problem, a model named the Deep Multi-Modal Attention Network (DMMAN) is established in this paper to model unconstrained video datasets and carry out the sound source separation and event localization tasks. Based on the multi-modal separator and the multi-modal matching classifier module, our model addresses the sound separation and modal synchronization problems using a two-stage fusion of sound and visual features. To link the multi-modal separator and multi-modal matching classifier modules, regression and classification losses are employed to build the loss function of the DMMAN. The estimated spectrum masks and attention synchronization scores calculated by the DMMAN can be easily generalized to the sound source and event localization tasks. The quantitative experimental results show that the DMMAN not only separates sound sources with high quality, as evaluated by the Signal-to-Distortion Ratio and Signal-to-Interference Ratio metrics, but is also suitable for mixed sound scenes that were never heard jointly. Meanwhile, the DMMAN achieves better classification accuracy than other baselines on the event localization tasks.
- Published
- 2020
21. What's All the FUSS About Free Universal Sound Separation Data?
- Author
-
Romain Serizel, Prem Seetharaman, Justin Salamon, Daniel P. W. Ellis, John R. Hershey, Scott Wisdom, Eduardo Fonseca, Nicolas Turpault, and Hakan Erdogan
- Subjects
FOS: Computer and information sciences ,Reverberation ,Sound (cs.SD) ,open-source datasets ,Computer science ,Sound separation ,Separation (aeronautics) ,02 engineering and technology ,Impulse (physics) ,Computer Science - Sound ,Data modeling ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Audio and Speech Processing (eess.AS) ,0202 electrical engineering, electronic engineering, information engineering ,Open domain ,FOS: Electrical engineering, electronic engineering, information engineering ,business.industry ,Deep learning ,deep learning ,020206 networking & telecommunications ,Universal sound separation ,variable source sep- aration ,[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD] ,020201 artificial intelligence & image processing ,Artificial intelligence ,Variable number ,business ,Algorithm ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio data drawn from 357 classes, which are used to create mixtures of one to four sources. To simulate reverberation, an acoustic room simulator is used to generate impulse responses of box-shaped rooms with frequency-dependent reflective walls. Additional open-source data augmentation tools are also provided to produce new mixtures with different combinations of sources and room simulations. Finally, we introduce an open-source baseline separation model, based on an improved time-domain convolutional network (TDCN++), that can separate a variable number of sources in a mixture. This model achieves 9.8 dB of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources, while reconstructing single-source inputs with 35.5 dB absolute SI-SNR. We hope this dataset will lower the barrier to new research and allow for fast iteration and application of novel techniques from other machine learning domains to the sound separation challenge.
- Published
- 2020
- Full Text
- View/download PDF
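The FUSS baseline above is scored with scale-invariant signal-to-noise ratio improvement (SI-SNRi). A small numpy version of the metric, following the standard definition rather than code from the paper:

import numpy as np

def si_snr(est, ref, eps=1e-8):
    # Scale-invariant SNR in dB between an estimated signal and a reference signal.
    est = est - est.mean()
    ref = ref - ref.mean()
    proj = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref   # projection onto the reference
    noise = est - proj
    return 10 * np.log10((np.dot(proj, proj) + eps) / (np.dot(noise, noise) + eps))

def si_snri(est, ref, mix):
    # SI-SNRi: gain of the separated estimate over simply using the mixture.
    return si_snr(est, ref) - si_snr(mix, ref)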
22. A sound-selective hearing support system using environment sensor network
- Author
-
Norihiro Hagita, Carlos Toshinori Ishi, Jani Even, and Chaoran Liu
- Subjects
Selective auditory attention ,Acoustics and Ultrasonics ,Computer science ,Acoustics ,Sound separation ,Intelligibility (communication) ,01 natural sciences ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0103 physical sciences ,Support system ,0305 other medical science ,010301 acoustics ,Wireless sensor network - Published
- 2018
- Full Text
- View/download PDF
23. On the optimality of ideal binary time–frequency masks
- Author
-
Li, Yipeng and Wang, DeLiang
- Subjects
- *
MATHEMATICAL optimization , *SPEECH , *VOICE frequency , *SIGNAL-to-noise ratio , *DATABASES , *AUDIO frequency - Abstract
The concept of ideal binary time-frequency masks has received attention recently in monaural and binaural sound separation. Although often assumed, the optimality of ideal binary masks in terms of signal-to-noise ratio has not been rigorously addressed. In this paper we give a formal treatment on this issue and clarify the conditions for ideal binary masks to be optimal. We also experimentally compare the performance of ideal binary masks to that of ideal ratio masks on a speech mixture database and a music database. The results show that ideal binary masks are close in performance to ideal ratio masks which are closely related to the Wiener filter, the theoretically optimal linear filter. [Copyright © Elsevier]
- Published
- 2009
- Full Text
- View/download PDF
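The entry above compares ideal binary masks with ideal ratio masks; both are oracle masks built from known target and interference signals. A short sketch using the standard definitions (file names are hypothetical):

import numpy as np
import librosa

target, sr = librosa.load("target.wav", sr=16000)
noise, _ = librosa.load("interference.wav", sr=16000)
n = min(len(target), len(noise))
mix = target[:n] + noise[:n]

T = librosa.stft(target[:n])
N = librosa.stft(noise[:n])
X = librosa.stft(mix)

ibm = (np.abs(T) > np.abs(N)).astype(float)                        # keep a TF unit if the target dominates
irm = np.abs(T) ** 2 / (np.abs(T) ** 2 + np.abs(N) ** 2 + 1e-10)   # Wiener-like ratio mask

target_ibm = librosa.istft(ibm * X)
target_irm = librosa.istft(irm * X)

The binary mask keeps a time-frequency unit outright when the target is stronger than the interference, while the ratio mask applies the soft, Wiener-like weighting the abstract refers to.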
24. Separation of Singing Voice From Music Accompaniment for Monaural Recordings.
- Author
-
Yipeng Li and DeLiang Wang
- Subjects
SIGNAL separation ,HUMAN voice ,MUSICAL accompaniment ,LIVE sound recordings ,MUSICAL perception ,SPEECH perception - Abstract
Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little investigated. We propose a system to separate singing voice from music accompaniment for monaural recordings. Our system consists of three stages. The singing voice detection stage partitions and classifies an input into vocal and nonvocal portions. For vocal portions, the predominant pitch detection stage detects the pitch of the singing voice and then the separation stage uses the detected pitch to group the time-frequency segments of the singing voice. Quantitative results show that the system performs the separation task successfully. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
25. Distinction of Heart Sound and Respiratory Sound Using Body Conduction Sound Sensor Based on HPSS
- Author
-
Hiroyuki Yagami, Kanya Tanaka, Kaede Torii, Shota Nakashima, Hiroshi Nakamura, and Tetsuo Ooyagi
- Subjects
geography ,Audio signal ,geography.geographical_feature_category ,Computer science ,Acoustics ,Sound separation ,otorhinolaryngologic diseases ,Harmonic ,Human heart ,Thermal conduction ,Signal ,Sound (geography) - Abstract
This study presents a system for distinguishing heart sounds from respiratory sounds using a body conduction sound sensor. In order to separate heart sounds and respiratory sounds effectively, a new and fast method based on Harmonic/Percussive Sound Separation (HPSS) is proposed and verified. A body conduction sound sensor is used to collect the mixed heart sound and respiratory sound signal from the human body, and a simple and fast HPSS-based method is applied to separate the mixture. The aim of this research is to study the effectiveness of HPSS in separating human heart sounds and respiratory sounds. To evaluate the effectiveness of the proposed measuring method, an experiment was conducted on several subjects. The experimental results showed that the proposed method is effective and can be used to monitor heart sound and respiratory sound abnormalities in the future.
- Published
- 2019
- Full Text
- View/download PDF
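Median-filter harmonic/percussive separation of the kind referenced above is available in librosa. A sketch under the assumption that heart sounds behave as short percussive transients and respiratory sounds as more sustained energy; the file name and parameters are hypothetical, and this is not the authors' implementation:

import librosa

y, sr = librosa.load("body_conduction.wav", sr=4000)      # hypothetical chest recording
S = librosa.stft(y, n_fft=512, hop_length=128)
H, P = librosa.decompose.hpss(S, margin=2.0)              # harmonic vs. percussive spectrograms
breath_like = librosa.istft(H, hop_length=128)            # sustained (respiratory-like) component
heart_like = librosa.istft(P, hop_length=128)             # transient (heart-sound-like) component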
26. Biocoustic Sound Separation Based on FastICA and Infomax Algorithms
- Author
-
Norsalina Hassan and Dzati Athiar Ramli
- Subjects
Computer science ,Bioacoustics ,Sound separation ,FastICA ,Infomax ,Sound recognition ,Multiple species ,Blind signal separation ,Algorithm - Abstract
In bioacoustics technology, advances such as automated sound recognition based on animal vocalizations help in biological research and environmental monitoring. However, a noisy acoustic environment, with interference such as overlapping sounds made by multiple species, may greatly hamper the ability of an automated sound recognizer to identify a specific species. Hence, it is desirable to extract the sound made by the target species from the interference as a pre-processing step prior to recognition to obtain more accurate results. This paper exploits two Blind Source Separation (BSS) algorithms, namely Infomax and FastICA, to obtain the target frog sounds from the mixtures. The comparison of algorithm performance is expressed in terms of the Signal-to-Interference Ratio (SIR). The empirical simulation results show that FastICA outperforms Infomax in terms of separation quality.
- Published
- 2019
- Full Text
- View/download PDF
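FastICA, one of the two BSS algorithms compared above, is available in scikit-learn (Infomax is not, so only FastICA is sketched here). A minimal example that un-mixes two simultaneously recorded channels; the file names are hypothetical, and ICA needs at least as many mixture channels as sources:

import numpy as np
import librosa
from sklearn.decomposition import FastICA

mix1, sr = librosa.load("mic1.wav", sr=22050)
mix2, _ = librosa.load("mic2.wav", sr=22050)
n = min(len(mix1), len(mix2))
X = np.stack([mix1[:n], mix2[:n]], axis=1)                # shape (n_samples, n_channels)

ica = FastICA(n_components=2, random_state=0, max_iter=500)
sources = ica.fit_transform(X)                            # estimated sources, up to scale and order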
27. Rindik rod sound separation with spectral subtraction method
- Author
-
I D M B A Darmawan and Y Christian
- Subjects
Physics ,History ,Spectral subtraction ,Acoustics ,Sound separation ,ComputingMethodologies_COMPUTERGRAPHICS ,Computer Science Applications ,Education - Abstract
Rindik is a traditional musical instrument originating from Bali that consists of 11 bamboo rods and is played by hitting the rods with a rubber mallet held in each of the player's hands. Documenting Rindik songs through automatic music transcription is easier if the rod sounds are separated first. To address this challenge, the sounds of two Rindik rods were separated using the spectral subtraction method. The noise spectrum is the spectrum of the single rod sound that needs to be muffled; the resulting audio is the other single rod sound, and vice versa, so that both single rod sounds are obtained. The data consisted of single rod hit recordings of 11 single rod sounds and 55 combinations of two rods being hit at the same time. The performance of the spectral subtraction method in separating Rindik sounds was measured with MSE and SIR, as well as by listening for residual noise in the separated audio signal. The experiments demonstrated that spectral subtraction with the squared average noise magnitude performed best, with an average MSE of 0.0126 and an average SIR of 55.68 dB.
- Published
- 2021
- Full Text
- View/download PDF
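A small sketch of magnitude spectral subtraction in the spirit of the entry above: the average magnitude spectrum of one single-rod recording is treated as the "noise" template and subtracted from the two-rod mixture, leaving an estimate of the other rod. File names and STFT parameters are assumptions, not the paper's settings:

import numpy as np
import librosa

mix, sr = librosa.load("two_rods.wav", sr=22050)           # two rods hit together
rod_a, _ = librosa.load("rod_a_single.wav", sr=22050)      # single-rod reference to suppress

X = librosa.stft(mix, n_fft=2048, hop_length=512)
A = np.abs(librosa.stft(rod_a, n_fft=2048, hop_length=512))
noise_mag = A.mean(axis=1, keepdims=True)                   # average magnitude of the rod to remove

mag = np.maximum(np.abs(X) - noise_mag, 0.0)                # subtract and floor at zero
rod_b_estimate = librosa.istft(mag * np.exp(1j * np.angle(X)), hop_length=512)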
28. Acoustic Signal Classification from Monaural Recordings
- Author
-
Rupali Shete
- Subjects
Control and Optimization ,Computer Networks and Communications ,Computer science ,Speech recognition ,Sound separation ,Monaural ,Signal ,Computer Science Applications ,Domain (software engineering) ,Human-Computer Interaction ,Signal classification ,Artificial Intelligence ,Modeling and Simulation ,Frequency domain ,Signal Processing ,Mel-frequency cepstrum ,Singing - Abstract
The acoustic domain contains signals related to sound. Although speech and music are both included in this domain, the two kinds of signals differ in various features, and features used for speech separation do not provide sufficient cues for music separation. This paper covers musical sound separation for monaural recordings. A system is proposed to classify singing voice and music from monaural recordings. For classification, time and frequency domain features along with Mel Frequency Cepstral Coefficients (MFCC) are applied to the input signal. The information carried by these signals permits the results to be established; quantitative experimental results show that the system performs the separation task successfully in a monaural environment.
- Published
- 2014
- Full Text
- View/download PDF
29. Sound-Separation System using Spherical Microphone Array with Three-Dimensional Directivity—KIKIWAKE 3D: Language Game for Children
- Author
-
Takahiro Nakadai, Tomohiro Nakayama, Tomoki Taguchi, Ryohei Egusa, Miki Namatame, Masanori Sugimoto, Fusako Kusunoki, Etsuji Yamaguchi, Shigenori Inagaki, Yoshiaki Takeda, and Hiroshi Mizoguchi
- Subjects
and evaluation ,Microphone array ,Signal processing ,lcsh:T ,Computer science ,Acoustics ,Sound separation ,Language-game ,lcsh:Technology ,Directivity ,Supporting learning system ,signal processing ,frequency-band selection ,Control and Systems Engineering ,lcsh:Technology (General) ,lcsh:T1-995 ,Sound sources ,Electrical and Electronic Engineering ,implementation - Abstract
Mixed sounds can be separated into multiple sound sources using a microphone array sensor and signal processing. We believe that promoting interest in this technique can lead to significant future developments in science and technology. To investigate this, we designed a language game for children called "KIKIWAKE 3D" that uses a sound-source-separation system to arouse children's interest in this technology. However, the microphone array sensor used in previous research had a limited scope for separating sounds, so we developed a spherical microphone array sensor with three-dimensional directivity designed for this game. In this paper, we report the evaluation of this microphone array sensor's suitability for the game, based on sound separation levels and questionnaires.
- Published
- 2014
- Full Text
- View/download PDF
30. On an improvement of the sound separation ratio according to exciter attachment method of flat TV
- Author
-
Kwanho Park, Lee Sungtae, Hyung Woo Park, and Myung-Jin Bae
- Subjects
geography ,geography.geographical_feature_category ,Acoustics and Ultrasonics ,Arts and Humanities (miscellaneous) ,Computer science ,Position (vector) ,Acoustics ,Sound separation ,Exciter ,Sound (geography) - Abstract
In the case of a flat-screen TV, the position of the sound was properly configured by reproducing the sound directly from the left and right sides of the screen. However, focusing on the image qual...
- Published
- 2019
- Full Text
- View/download PDF
31. Wheezing Sound Separation Based on Informed Inter-Segment Non-Negative Matrix Partial Co-Factorization.
- Author
-
De La Torre Cruz, Juan, Cañadas Quesada, Francisco Jesús, Ruiz Reyes, Nicolás, Vera Candeas, Pedro, and Carabias Orti, Julio José
- Subjects
- *
NONNEGATIVE matrices , *OBSTRUCTIVE lung diseases , *SIGNAL reconstruction , *LOW-income countries , *MIDDLE-income countries , *SIGNAL-to-noise ratio - Abstract
Wheezing reveals important cues that can be useful in alerting about respiratory disorders, such as Chronic Obstructive Pulmonary Disease. Early detection of wheezing through auscultation will allow the physician to be aware of the existence of the respiratory disorder in its early stage, thus minimizing the damage the disorder can cause to the subject, especially in low-income and middle-income countries. The proposed method presents an extended version of Non-negative Matrix Partial Co-Factorization (NMPCF) that eliminates most of the acoustic interference caused by normal respiratory sounds while preserving the wheezing content needed by the physician to make a reliable diagnosis of the subject's airway status. This extension, called Informed Inter-Segment NMPCF (IIS-NMPCF), attempts to overcome the drawback of the conventional NMPCF that treats all segments of the spectrogram equally, adding greater importance for signal reconstruction of repetitive sound events to those segments where wheezing sounds have not been detected. Specifically, IIS-NMPCF is based on a bases-sharing process in which inter-segment information, informed by a wheezing detection system, is incorporated into the factorization to reconstruct a more accurate modelling of normal respiratory sounds. Results demonstrate the significant improvement in wheezing sound quality obtained by IIS-NMPCF compared to the conventional NMPCF for all the Signal-to-Noise Ratio (SNR) scenarios evaluated; specifically, SDR, SIR and SAR improvements of 5.8 dB, 4.9 dB and 7.5 dB in a noisy scenario with SNR = −5 dB. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
32. Hearing support system using environment sensor network
- Author
-
Norihiro Hagita, Jani Even, Carlos Toshinori Ishi, and Chaoran Liu
- Subjects
Hearing aid ,Engineering ,business.industry ,medicine.medical_treatment ,Speech recognition ,Sound separation ,020206 networking & telecommunications ,02 engineering and technology ,Intelligibility (communication) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Intelligent robots ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Sound sources ,Support system ,0305 other medical science ,business ,Spatial analysis ,Wireless sensor network - Abstract
In order to solve the problems of current hearing aid devices, we make use of an environment sensor network and propose a hearing support system in which individual target and anti-target sound sources in the environment can be selected and the spatial information of the target sound sources is reconstructed. The performance of the selective sound separation module was evaluated for different noise conditions. Results showed that signal-to-noise ratios of around 15 dB could be achieved by the proposed system for a 65 dB babble noise plus directional music noise condition. In the same noise condition, subjective intelligibility tests were conducted, and an improvement of 65 to 90% in word intelligibility rates could be achieved by using the proposed hearing support system.
- Published
- 2016
- Full Text
- View/download PDF
33. Music genre recognition using spectrograms with harmonic-percussive sound separation
- Author
-
Yandre M. G. Costa, Rafael de Lima Aguiar, and Loris Nanni
- Subjects
Computer science ,Speech recognition ,media_common.quotation_subject ,Sound separation ,020206 networking & telecommunications ,02 engineering and technology ,Support vector machine ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Perception ,0202 electrical engineering, electronic engineering, information engineering ,Spectrogram ,0305 other medical science ,Classifier (UML) ,media_common - Abstract
In this work we assess music genre classification using spectrograms taken from the original signal, the percussive content signal, and the harmonic content signal. The rationale is that classifiers obtained from these three different representations of the signal may complement each other; in this way, one can improve on the recognition rates obtained in previous works, which explored only the original signal content. LBP texture features were used to represent the spectrogram content, and the classification step was supported by SVM. The spectrogram images were zoned according to a perceptual scale, and a specific classifier was created for each zone, which led us to combine classifier outputs to obtain the final decision. Our approach reaches a recognition rate of about 88.56%, which, to the best of our knowledge, is the best rate ever obtained on the LMD dataset using the artist filter constraint.
- Published
- 2016
- Full Text
- View/download PDF
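A sketch of the feature pipeline described above: spectrograms of the original, harmonic, and percussive signals, LBP texture histograms, and an SVM. The LBP parameters and the omission of perceptual zoning are simplifications, and the file paths are hypothetical, so this is not the paper's exact setup:

import numpy as np
import librosa
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(spec_db, P=8, R=1):
    # Normalize the spectrogram image and summarize its texture with a uniform-LBP histogram.
    img = (spec_db - spec_db.min()) / (spec_db.max() - spec_db.min() + 1e-10)
    lbp = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def track_features(path):
    y, sr = librosa.load(path, sr=22050)
    y_harm, y_perc = librosa.effects.hpss(y)                # harmonic and percussive signals
    feats = []
    for sig in (y, y_harm, y_perc):                         # original + harmonic + percussive
        S_db = librosa.amplitude_to_db(np.abs(librosa.stft(sig)), ref=np.max)
        feats.append(lbp_histogram(S_db))
    return np.concatenate(feats)

# Hypothetical training lists; one SVM over the concatenated texture features.
# X_train = np.array([track_features(p) for p in train_paths])
# clf = SVC(kernel="linear").fit(X_train, train_labels)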
34. Wheezing Sound Separation Based on Informed Inter-Segment Non-Negative Matrix Partial Co-Factorization.
- Author
-
Cruz JT, Cañadas Quesada FJ, Reyes NR, Candeas PV, and Carabias Orti JJ
- Subjects
- Auscultation, Humans, Noise, Algorithms, Pulmonary Disease, Chronic Obstructive diagnosis, Respiratory Sounds diagnosis
- Abstract
Wheezing reveals important cues that can be useful in alerting about respiratory disorders, such as Chronic Obstructive Pulmonary Disease. Early detection of wheezing through auscultation will allow the physician to be aware of the existence of the respiratory disorder in its early stage, thus minimizing the damage the disorder can cause to the subject, especially in low-income and middle-income countries. The proposed method presents an extended version of Non-negative Matrix Partial Co-Factorization (NMPCF) that eliminates most of the acoustic interference caused by normal respiratory sounds while preserving the wheezing content needed by the physician to make a reliable diagnosis of the subject's airway status. This extension, called Informed Inter-Segment NMPCF (IIS-NMPCF), attempts to overcome the drawback of the conventional NMPCF that treats all segments of the spectrogram equally, adding greater importance for signal reconstruction of repetitive sound events to those segments where wheezing sounds have not been detected. Specifically, IIS-NMPCF is based on a bases-sharing process in which inter-segment information, informed by a wheezing detection system, is incorporated into the factorization to reconstruct a more accurate modelling of normal respiratory sounds. Results demonstrate the significant improvement in wheezing sound quality obtained by IIS-NMPCF compared to the conventional NMPCF for all the Signal-to-Noise Ratio (SNR) scenarios evaluated; specifically, SDR, SIR and SAR improvements of 5.8 dB, 4.9 dB and 7.5 dB in a noisy scenario with SNR = −5 dB.
- Published
- 2020
- Full Text
- View/download PDF
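As a rough point of reference for the factorization idea in the entry above, the sketch below applies plain NMF (scikit-learn) to the magnitude spectrogram of a lung-sound recording and rebuilds two sources by soft-masking. It is a much-simplified baseline, not the paper's IIS-NMPCF: the file name and the choice of which components are "wheeze-like" (wheeze_idx) are placeholder assumptions, whereas the paper shares bases across segments under the guidance of a wheezing detector.

import numpy as np
import librosa
from sklearn.decomposition import NMF

y, sr = librosa.load("auscultation.wav", sr=4000)      # placeholder file name
S = librosa.stft(y, n_fft=512, hop_length=128)
V = np.abs(S)

K = 8
model = NMF(n_components=K, init="nndsvda", beta_loss="kullback-leibler",
            solver="mu", max_iter=400)
W = model.fit_transform(V)                             # spectral bases   (freq x K)
H = model.components_                                  # activations      (K x time)

wheeze_idx = [0, 1]                                    # hypothetical component grouping
normal_idx = [k for k in range(K) if k not in wheeze_idx]

V_wheeze = W[:, wheeze_idx] @ H[wheeze_idx, :]
V_normal = W[:, normal_idx] @ H[normal_idx, :]
eps = 1e-9
wheeze = librosa.istft(S * (V_wheeze / (V_wheeze + V_normal + eps)), hop_length=128)
normal = librosa.istft(S * (V_normal / (V_wheeze + V_normal + eps)), hop_length=128)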
35. Musical Sound Separation Based on Binary Time-Frequency Masking
- Author
-
Yipeng Li and DeLiang Wang
- Subjects
Masking (art) ,Acoustics and Ultrasonics ,Computer science ,Binary decision diagram ,Speech recognition ,Emphasis (telecommunications) ,Sound separation ,lcsh:QC221-246 ,Binary number ,02 engineering and technology ,Monaural ,01 natural sciences ,lcsh:QA75.5-76.95 ,Computer Science::Sound ,Computational auditory scene analysis ,Harmonics ,lcsh:Acoustics. Sound ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,lcsh:Electronic computers. Computer science ,Electrical and Electronic Engineering ,010301 acoustics - Abstract
The problem of overlapping harmonics is particularly acute in musical sound separation and has not been addressed adequately. We propose a monaural system based on binary time-frequency masking with an emphasis on robust decisions in time-frequency regions, where harmonics from different sources overlap. Our computational auditory scene analysis system exploits the observation that sounds from the same source tend to have similar spectral envelopes. Quantitative results show that utilizing spectral similarity helps binary decision making in overlapped time-frequency regions and significantly improves separation performance.
- Published
- 2009
- Full Text
- View/download PDF
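The masking idea in the entry above can be illustrated with the ideal binary mask, computed here from the unmixed sources for clarity; the paper's contribution is making these binary decisions robustly where harmonics overlap, without such oracle access. A minimal sketch, assuming librosa and placeholder file names:

import numpy as np
import librosa

hop = 512
y1, sr = librosa.load("violin.wav", sr=16000)    # placeholder source recordings
y2, _  = librosa.load("flute.wav",  sr=16000)
n = min(len(y1), len(y2))
mix = y1[:n] + y2[:n]

S1   = librosa.stft(y1[:n], hop_length=hop)
S2   = librosa.stft(y2[:n], hop_length=hop)
Smix = librosa.stft(mix,    hop_length=hop)

mask = (np.abs(S1) > np.abs(S2)).astype(float)   # 1 where source 1 dominates the T-F unit
est1 = librosa.istft(Smix * mask,       hop_length=hop)
est2 = librosa.istft(Smix * (1 - mask), hop_length=hop)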
36. Heart Sound Separation Using Fast Independent Component Analysis
- Author
-
Ikhlas Abdel Qader, Zichun Tong, and Fadi Abu-Amara
- Subjects
Medical services ,Phonocardiogram ,Computer science ,Patient information ,Speech recognition ,Feature extraction ,Sound separation ,computer.software_genre ,Blind signal separation ,Independent component analysis ,computer ,Expert system - Abstract
In this paper, we propose a method to separate heart sound signals based on the assumption that heart sound signals originating from cardiac structures are statistically independent and non-Gaussian. This assumption casts the problem as blind signal separation, for which independent component analysis is used here. The objective of this work is to improve the detection of cardiac abnormalities and to support a murmur expert system that aids physicians in their diagnoses. Results indicate that four independent components are successfully separated from six mixed signals. The separated signals can be combined with other patient information for better diagnosis.
- Published
- 2015
- Full Text
- View/download PDF
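A small sketch of the FastICA step described in the entry above, assuming scikit-learn rather than the authors' own implementation; the synthetic sources and mixing matrix are stand-ins for synchronized multi-site phonocardiogram channels.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.arange(10000) / 1000.0
sources_true = np.stack([
    np.sign(np.sin(2 * np.pi * 1.2 * t)),            # non-Gaussian stand-in sources
    np.sin(2 * np.pi * 0.9 * t) ** 3,
    rng.laplace(size=t.size),
    np.sin(2 * np.pi * 2.5 * t + 1.0) ** 3,
], axis=1)                                            # (n_samples, 4)

A = rng.standard_normal((4, 6))                       # unknown mixing to 6 chest sensors
X = sources_true @ A                                  # observed mixtures (n_samples, 6)

ica = FastICA(n_components=4, random_state=0)
S_est = ica.fit_transform(X)                          # recovered independent components
mixing = ica.mixing_                                  # estimated mixing matrix (6 x 4)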
37. Adaptive Fourier decomposition approach for lung-heart sound separation
- Author
-
Ze Wang, Feng Wan, and Janir Nuno da Cruz
- Subjects
Noise ,Adaptive fourier decomposition ,Computer science ,Speech recognition ,Sound separation ,Lung heart ,Spectrogram ,Interference (wave propagation) ,Algorithm ,Signal ,Least squares ,Hilbert–Huang transform - Abstract
Interference often occurs between the lung sound (LS) and the heart sound (HS). Due to the overlap in their frequency spectra, it is difficult to separate them. This paper proposes a novel separation method based on the adaptive Fourier decomposition (AFD) to separate the HS and the LS with minimal energy loss. The AFD-based separation method is validated on real HS signals from the University of Michigan Heart Sound and Murmur Library as well as real LS signals from the 3M repository. Simulation results indicate that the proposed method outperforms other extraction methods based on recursive least squares (RLS), the standard empirical mode decomposition (EMD), and various extensions of the EMD, including the ensemble EMD (EEMD), the multivariate EMD (M-EMD) and the noise-assisted M-EMD (NAM-EMD).
- Published
- 2015
- Full Text
- View/download PDF
38. Improved sound separation using three loudspeakers
- Author
-
Jun Yang, Woon-Seng Gan, and See-Ee Tan
- Subjects
Crosstalk ,Sound recording and reproduction ,Computer science ,Robustness (computer science) ,Acoustics ,Sound separation ,General Physics and Astronomy ,Loudspeaker ,Robust control ,Directional sound ,Virtual reality - Abstract
In a virtual sound imaging system, crosstalk cancellation filters are used to create an effective sweet spot for 3D sound reproduction via multiple loudspeakers. A new 3-channel system is proposed to improve system performance on sound separation. Based on the robustness analysis of a crosstalk canceller, a modified-inverse filter technique is explored and demonstrated using two different examples of symmetric speaker positions. The simulation results indicate that the present system is robust over a wider bandwidth compared to a conventional 2-channel system.
- Published
- 2003
- Full Text
- View/download PDF
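A generic frequency-domain sketch of how a multi-loudspeaker crosstalk canceller can be designed, assuming numpy; it uses a Tikhonov-regularized least-squares inverse rather than the paper's modified-inverse filter, and the plant matrix H is random placeholder data standing in for measured loudspeaker-to-ear transfer functions.

import numpy as np

n_freqs, n_ears, n_spk = 257, 2, 3
rng = np.random.default_rng(1)
H = (rng.standard_normal((n_freqs, n_ears, n_spk))
     + 1j * rng.standard_normal((n_freqs, n_ears, n_spk)))

beta = 1e-2                                   # regularization trades cancellation depth for robustness
I2 = np.eye(n_ears)
C = np.empty((n_freqs, n_spk, n_ears), dtype=complex)
for f in range(n_freqs):
    Hf = H[f]                                 # 2 x 3 plant at this frequency bin
    C[f] = Hf.conj().T @ np.linalg.inv(Hf @ Hf.conj().T + beta * I2)   # 3 x 2 filter matrix

# With beta small, H @ C should be close to the identity at every frequency,
# i.e. each ear receives (approximately) only its intended binaural signal.
err = np.max(np.abs(H @ C - I2))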
39. Implementation and evaluation of listenability-centered sound separation system
- Author
-
Yoshiaki Takeda, Hiroshi Mizoguchi, Tomoki Taguchi, Shigenori Inagaki, Miki Namatame, Ryohei Egusa, Etsuji Yamaguchi, Fusako Kusunoki, Masanori Sugimoto, and Takahiro Nakadai
- Subjects
Background noise ,Engineering ,Focus (computing) ,Microphone array ,Signal-to-noise ratio ,business.industry ,Speech recognition ,Separation (aeronautics) ,Sound separation ,Systems design ,business ,Human voice - Abstract
We developed a sound separation system that uses a microphone array to separate children's voices from background noise in general living environments. Our focus in developing this system is on "a clear separation of the human voice", so that the system is easy for children to use. Evaluations of sound separation performance in previous research have not always measured how easily listeners can hear the separated sounds. In this paper, we examine the validity of our system design by evaluating this factor from both engineering and psychological perspectives.
- Published
- 2014
- Full Text
- View/download PDF
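The entry above describes the system and its listenability evaluation rather than a specific algorithm; as generic background on microphone-array separation, the sketch below shows a frequency-domain delay-and-sum beamformer for a uniform linear array, assuming librosa and numpy. The array geometry, source direction, and file names are placeholder assumptions and not the authors' design.

import numpy as np
import librosa

sr, hop, n_fft = 16000, 256, 1024
c = 343.0                                     # speed of sound (m/s)
d = 0.05                                      # microphone spacing (m)
theta = np.deg2rad(30)                        # assumed source direction

# One single-channel recording per microphone (placeholder file names).
mics = [librosa.load(f"mic{m}.wav", sr=sr)[0] for m in range(4)]
n = min(map(len, mics))
stfts = [librosa.stft(x[:n], n_fft=n_fft, hop_length=hop) for x in mics]

freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)      # one frequency per STFT row
out = np.zeros_like(stfts[0])
for m, S in enumerate(stfts):
    tau = m * d * np.sin(theta) / c           # relative delay at microphone m
    out += S * np.exp(2j * np.pi * freqs * tau)[:, None]   # compensate the delay
out /= len(stfts)

enhanced = librosa.istft(out, hop_length=hop)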
40. On the Performance and Robustness of Crosstalk Cancelation with Multiple Loudspeakers
- Author
-
Xing Yang, Zhong-Hua Fu, Yonghong Yan, Shuichi Sakamoto, Risheng Xia, Yôiti Suzuki, and Junfeng Li
- Subjects
Crosstalk ,Signal processing ,Robustness (computer science) ,Computer science ,Acoustics ,Crosstalk measurement ,Sound separation ,Active listening ,Loudspeaker - Abstract
Acoustic crosstalk cancelation is a signal processing technique that uses two or more loudspeakers to reconstruct the acoustic pressures at the listener's ears. Crosstalk arises because each loudspeaker sends sound to the ipsilateral ear as well as undesired sound to the contralateral ear. Crosstalk cancelation can deliver, exactly at the listener's ears, the signals that would result from the natural listening situation being simulated, so listeners can enjoy spatial audio without wearing headphones. Classical crosstalk cancelation systems (CCS) employ only two loudspeakers, and their performance is usually unsatisfactory in practice. While the idea of using more loudspeakers has been investigated before, we design a new 3-channel system to improve sound separation performance and robustness to head movement. The simulation results indicate that a multichannel loudspeaker configuration can improve the performance and robustness of a three-dimensional (3D) audio system.
- Published
- 2014
- Full Text
- View/download PDF
41. Nasalance in the Speech of Children With Normal Hearing and Children With Hearing Loss
- Author
-
Hendarto Hendarmin, Fuad Mahfuzh, and Samuel G. Fletcher
- Subjects
Linguistics and Language ,medicine.medical_specialty ,business.industry ,Hearing loss ,Some limitation ,Sound separation ,Audiology ,Normal group ,Speech and Hearing ,medicine.anatomical_structure ,Otorhinolaryngology ,otorhinolaryngologic diseases ,Developmental and Educational Psychology ,Sound emission ,medicine ,medicine.symptom ,Nasalance ,business ,Standard position ,Nose - Abstract
Three new tests were introduced in this study to compare nasal resonance and speaking time of 30 children 8 to 11 years old who were profoundly deaf with those of 30 children with normal hearing in a matched control group. The ANS-P plane was introduced to position the palatometer sound separation plate. This enabled the plate to be brought easily and repeatedly into the desired standard position despite widely varying facial contours of the subjects. The findings from this study showed that the group with hearing loss had significantly more nasalance than the normal-hearing group when nasal consonants were absent and significantly less when an utterance was loaded heavily with nasal consonants. These differences were interpreted as evidence of some limitation in the ability of the children with hearing loss to monitor and control nasal versus oral sound emission. Speaking time was longer in the group of children with hearing loss but was not related to the nasalance score.
- Published
- 1999
- Full Text
- View/download PDF
42. Effect of sound separation using pulse-compression on accuracy of localization with acoustic beacons
- Author
-
Satoki Ogiso, Naoto Wakatsuki, Takuji Kawagishi, Koichi Mizutani, and Keiichi Zempo
- Subjects
Physics ,Pulse compression ,Acoustics ,Sound separation ,Beacon - Published
- 2015
- Full Text
- View/download PDF
43. Investigation on Sound Source Separation using Depth Image Sensor
- Author
-
Takahiro Kigawa, Hiroshi Takemura, Hiroshi Mizoguchi, and Taisuke Sakano
- Subjects
Sound source separation ,Acoustics ,Sound separation ,Image sensor ,Geology - Published
- 2016
- Full Text
- View/download PDF
44. Fundamentals of Computational Auditory Scene Analysis
- Author
-
Guy J. Brown and DeLiang Wang
- Subjects
Auditory scene analysis ,Computational auditory scene analysis ,Computer science ,Speech recognition ,Sound separation - Abstract
This chapter contains sections titled: Human Auditory Scene Analysis; Computational Auditory Scene Analysis (CASA); Basics of CASA Systems; CASA Evaluation; Other Sound Separation Approaches; A Brief History of CASA (Prior to 2000); Conclusions; Acknowledgments; References.
- Published
- 2011
- Full Text
- View/download PDF
45. Sound separation in noise and competing voice with normal-hearing subjects
- Author
-
Peter J. Blamey, Pei-Chen Liu, and Christopher J. James
- Subjects
Speech and Hearing ,Noise ,medicine.medical_specialty ,Otorhinolaryngology ,Microphone ,Noise reduction ,Sound separation ,medicine ,Monaural ,Audiology ,Psychology - Abstract
- Published
- 2008
46. Generalized blind delayed source separation model for online non-invasive twin-fetal sound separation: a phantom study
- Author
-
Roland Priemer and Vivek Nigam
- Subjects
Sound separation ,Medicine (miscellaneous) ,Health Informatics ,Blind signal separation ,Imaging phantom ,Fetal Heart ,Health Information Management ,Pregnancy ,Source separation ,Medicine ,Humans ,Diagnosis, Computer-Assisted ,Twin Pregnancy ,Ultrasonography ,Fetus ,Phonocardiogram ,business.industry ,Non invasive ,Phonocardiography ,Signal Processing, Computer-Assisted ,Heart Sounds ,Anesthesia ,embryonic structures ,Female ,business ,Algorithms ,Information Systems ,Biomedical engineering ,Heart Auscultation - Abstract
The fetal phonocardiogram, which is the acoustic recording of mechanical activity of the fetal heart, facilitates the measurement of the instantaneous fetal heart rate, beat-to-beat differences and duration of systolic and diastolic phases. These measures are sensitive indicators of cardiac function, reflecting fetal well-being. This paper provides an algorithm to non-invasively estimate the phonocardiogram of an individual fetus in a multiple fetus pregnancy. A mixture of fetal phonocardiograms is modeled by a generalized pure delayed mixing model. Mutual independence of fetal phonocardiograms is assumed to apply blind source separation based techniques to extract the fetal phonocardiograms from their mixtures. The performance of the algorithm is verified through simulation results and on experimental data obtained from a phantom that is used to simulate a twin pregnancy.
- Published
- 2008
47. Extended Nonnegative Tensor Factorisation Models for Musical Sound Source Separation
- Author
-
Matt Cranitch, Derry Fitzgerald, Eugene Coyle, and Enterprise Ireland, IMMAS Project
- Subjects
General Computer Science ,Article Subject ,Computer science ,General Mathematics ,Speech recognition ,Sound source separation ,Basis function ,Extended Nonnegative Tensor Factorisation Models ,Musical Sound Source Separation ,lcsh:Computer applications to medicine. Medical informatics ,lcsh:RC321-571 ,Factorization ,music sound source separation ,Tensor (intrinsic definition) ,music ,Nonnegative tensor ,lcsh:Neurosciences. Biological psychiatry. Neuropsychiatry ,Other Engineering ,General Neuroscience ,pitched musical instruments ,Additive synthesis ,General Medicine ,Electrical and Computer Engineering ,extended nonnegative tensor factorisation models ,Computer Science::Sound ,Harmonic ,lcsh:R858-859.7 ,sound separation ,Spectrogram ,Algorithm ,Research Article - Abstract
Recently, shift-invariant tensor factorisation algorithms have been proposed for the purposes of sound source separation of pitched musical instruments. However, in practice, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency which causes problems when attempting to resynthesise the separated sources. Further, it is difficult to impose harmonicity constraints on the recovered basis functions. This paper proposes a new additive synthesis-based approach which allows the use of linear-frequency spectrograms as well as imposing strict harmonic constraints, resulting in an improved model. Further, these additional constraints allow the addition of a source filter model to the factorisation framework, and an extended model which is capable of separating mixtures of pitched and percussive instruments simultaneously.
- Published
- 2008
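A single-channel, linear-frequency illustration of the harmonic constraint discussed in the entry above, assuming librosa and numpy: each basis is fixed to a harmonic comb at a candidate pitch and only the activations are learned by multiplicative updates, V ~ W H. This is far simpler than the paper's shift-invariant tensor model with its source-filter extension; the pitch grid, comb shape, and file name are assumptions.

import numpy as np
import librosa

y, sr = librosa.load("duet.wav", sr=22050)            # placeholder file
n_fft, hop = 2048, 512
V = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) + 1e-9
freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)

pitches = librosa.midi_to_hz(np.arange(36, 84))       # C2..B5 candidate fundamentals
W = np.zeros((len(freqs), len(pitches)))
for k, f0 in enumerate(pitches):
    for h in range(1, 21):                            # up to 20 harmonics per comb
        if h * f0 > freqs[-1]:
            break
        W[:, k] += np.exp(-0.5 * ((freqs - h * f0) / 10.0) ** 2) / h
W /= W.sum(axis=0, keepdims=True)                     # fixed, strictly harmonic bases

H = np.full((len(pitches), V.shape[1]), 1e-3)
for _ in range(100):                                  # multiplicative updates, W held fixed
    H *= (W.T @ V) / (W.T @ (W @ H) + 1e-9)

# H now indicates which pitch templates are active in each frame; grouping pitch tracks
# per instrument (and adding a source-filter model, as in the paper) would be the next
# step toward actual separation.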
48. Estimation of the Ideal Binary Mask using Directional Systems
- Author
-
Boldt, Jesper, Kjems, Ulrik, Pedersen, Michael Syskind, Lunner, Thomas, and Wang, DeLiang
- Subjects
Directional systems ,Mathematics::Commutative Algebra ,Computer Science::Sound ,Speech Intelligibility ,Sound separation ,Time-Frequency Masking - Abstract
The ideal binary mask is often seen as a goal for time-frequency masking algorithms that aim to increase speech intelligibility, but the required availability of the unmixed signals makes it difficult to calculate the ideal binary mask in real-life applications. In this paper we derive the theory and the requirements for calculating the ideal binary mask using a directional system, without access to the unmixed signals. The proposed method has low complexity and is verified using computer simulations in both ideal and non-ideal setups, showing promising results.
- Published
- 2008
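For reference, the ideal binary mask referred to in the entry above is commonly defined per time-frequency unit from the local target-to-interference ratio (this is the usual textbook formulation, not a statement of this paper's directional estimator):

\mathrm{IBM}(t,f) =
\begin{cases}
1, & 10\log_{10}\dfrac{|S(t,f)|^{2}}{|N(t,f)|^{2}} > \mathrm{LC},\\
0, & \text{otherwise,}
\end{cases}

where S and N are the target and interference components of the mixture in unit (t, f) and LC is the local SNR criterion, often set to 0 dB.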
49. Probabilistic Decompositions of Spectra for Sound Separation
- Author
-
Paris Smaragdis
- Subjects
Computer science ,Sound separation ,Mathematical analysis ,Source separation ,Probabilistic logic ,Independent component analysis ,Spectral line ,Non-negative matrix factorization - Published
- 2007
- Full Text
- View/download PDF
50. Separation of Speech by Computational Auditory Scene Analysis
- Author
-
DeLiang Wang and Guy J. Brown
- Subjects
Auditory scene analysis ,Computer science ,Computational auditory scene analysis ,Perception ,media_common.quotation_subject ,Speech recognition ,Sound separation ,Cocktail party ,Functional approach ,Field (computer science) ,Term (time) ,media_common - Abstract
The term auditory scene analysis (ASA) refers to the ability of human listeners to form perceptual representations of the constituent sources in an acoustic mixture, as in the well-known ‘cocktail party’ effect. Accordingly, computational auditory scene analysis (CASA) is the field of study which attempts to replicate ASA in machines. Some CASA systems are closely modelled on the known stages of auditory processing, whereas others adopt a more functional approach. However, all are broadly based on the principles underlying the perception and organization of sound by human listeners, and in this respect they differ from ICA and other approaches to sound separation. In this chapter, we review the principles underlying ASA and show how they can be implemented in CASA systems. We also consider the link between CASA and automatic speech recognition, and draw distinctions between the CASA and ICA approaches.
- Published
- 2005
- Full Text
- View/download PDF