Author: "Bjorn W. Schuller" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Bjorn W. Schuller"' showing total 88 results

1. Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition

Author: Thejan Rajapakshe, Rajib Rana, Sara Khalifa, and Bjorn W. Schuller
Subjects: Reinforcement learning, speech emotion recognition, domain adaptation, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Speech-emotion recognition (SER) enables computers to engage with people in an emotionally intelligent way. The inability to adapt an existing model to a new domain is one of the significant limitations of SER methods. To overcome this challenge, domain adaptation techniques have been developed to transfer the knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved the performance of SER models across domains, there is a need to improve their ability to adapt to real-world situations where models can self-tune while deployed. This paper presents a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained SER model to a real-world setting by interacting with the environment and collecting continuous feedback. The proposed RL-DA technique is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation scenarios. Our evaluation results show that RL-DA achieves significant improvements of 11% and 14% in testing accuracy over a fully supervised baseline for cross-corpus and cross-language scenarios, respectively, in the real-world setting. This technique also outperforms the baseline model’s performance for both speaker independent and speaker dependent SER tasks.
Published: 2024
Full Text: View/download PDF

2. emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition

Author: Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Berrak Sisman, Bjorn W. Schuller, and Carlos Busso
Subjects: Speech emotion recognition, neural architecture search, deep learning, DARTS, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessments. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model. The Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports the selection of CNN and LSTM coupling to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations in conjunction using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN. Instead, we let DARTS choose the best layer order inside the DARTS cell. We demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM by evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets.
Published: 2024
Full Text: View/download PDF

3. A text-based conversational agent for asthma support: Mixed-methods feasibility study

Author: Darren Cook, Dorian Peters, Laura Moradbakhti, Ting Su, Marco Da Re, Bjorn W. Schuller, Jennifer Quint, Ernie Wong, and Rafael A. Calvo
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Objective: Millions of people in the UK have asthma, yet 70% do not access basic care, leading to the largest number of asthma-related deaths in Europe. Chatbots may extend the reach of asthma support and provide a bridge to traditional healthcare. This study evaluates ‘Brisa’, a chatbot designed to improve asthma patients’ self-assessment and self-management. Methods: We recruited 150 adults with an asthma diagnosis to test our chatbot. Participants were recruited over three waves through social media and a research recruitment platform. Eligible participants had access to ‘Brisa’ via a WhatsApp or website version for 28 days and completed entry and exit questionnaires to evaluate user experience and asthma control. Weekly symptom tracking, user interaction metrics, satisfaction measures, and qualitative feedback were utilised to evaluate the chatbot's usability and potential effectiveness, focusing on changes in asthma control and self-reported behavioural improvements. Results: 74% of participants engaged with ‘Brisa’ at least once. High task completion rates were observed: asthma attack risk assessment (86%), voice recording submission (83%) and asthma control tracking (95.5%). Post use, an 8% improvement in asthma control was reported. User satisfaction surveys indicated positive feedback on helpfulness (80%), privacy (87%), trustworthiness (80%) and functionality (84%) but highlighted a need for improved conversational depth and personalisation. Conclusions: The study indicates that chatbots are effective for asthma support, demonstrated by the high usage of features like risk assessment and control tracking, as well as a statistically significant improvement in asthma control. However, lower satisfaction in conversational flexibility highlights rising expectations for chatbot fluency, influenced by advanced models like ChatGPT. Future health-focused chatbots must balance conversational capability with accuracy and safety to maintain engagement and effectiveness.
Published: 2024
Full Text: View/download PDF

4. Toward Detecting and Addressing Corner Cases in Deep Learning Based Medical Image Segmentation

Author: Srividya Tirunellai Rajamani, Kumar Rajamani, Ashwin Venkateshvaran, Andreas Triantafyllopoulos, Alexander Kathan, and Bjorn W. Schuller
Subjects: Corner-case handling, medical image segmentation, research to clinical practice, cardiac MRI, chest X-ray, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Translating machine learning research into clinical practice has several challenges. In this paper, we identify some critical issues in translating research to clinical practice in the context of medical image segmentation and propose strategies to systematically address these challenges. Specifically, we focus on cases where the model yields erroneous segmentation, which we define as corner cases. One of the standard metrics used for reporting the performance of medical image segmentation algorithms is the average Dice score across all patients. We have discovered that this aggregate reporting has the inherent drawback that the corner cases where the algorithm or model has erroneous performance or very low metrics go unnoticed. Due to this reporting, models that report superior performance could end up producing completely erroneous results, or even anatomically impossible results in a few challenging cases, albeit without being noticed. We have demonstrated how corner cases go unnoticed using the Magnetic Resonance (MR) cardiac image segmentation task of the Automated Cardiac Diagnosis Challenge (ACDC). To counter this drawback, we propose a framework that helps to identify and report corner cases. Further, we propose a novel balanced checkpointing scheme capable of finding a solution that has superior performance even on these corner cases. Our proposed scheme leads to an improvement of 44.6% for LV, 46.1% for RV and 38.1% for the Myocardium on our identified corner case in the ACDC segmentation challenge. Further, we establish the generalisability of our proposed framework by also demonstrating its applicability in the context of chest X-ray lung segmentation. This framework has broader applications across multiple deep learning tasks even beyond medical image segmentation.
Published: 2023
Full Text: View/download PDF

5. Robot-Based Intervention for Children With Autism Spectrum Disorder: A Systematic Literature Review

Author: Katrin D. Bartl-Pokorny, Malgorzata Pykala, Pinar Uluer, Duygun Erol Barkana, Alice Baird, Hatice Kose, Tatjana Zorcec, Ben Robins, Bjorn W. Schuller, and Agnieszka Landowska
Subjects: Autism spectrum disorder, child-robot interaction, emotion expression, emotion recognition, intervention, socio-communicative abilities, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Children with autism spectrum disorder (ASD) have deficits in the socio-communicative domain and frequently face severe difficulties in the recognition and expression of emotions. Existing literature suggested that children with ASD benefit from robot-based interventions. However, studies varied considerably in participant characteristics, applied robots, and trained skills. Here, we reviewed robot-based interventions targeting emotion-related skills for children with ASD following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. We systematically searched for all relevant articles published in English until May 2021, using the databases Scopus, Web of Science, and PubMed. From a total of 609 identified papers, 60 publications, comprising 50 original articles and 10 non-empirical articles (review and theoretical articles), were eligible for the synthesis. A total of 892 participants were included in the robot-based intervention studies; 570 of them were children with ASD. Nao and ZECA were the most frequently used robots; recognition of basic emotions and getting into interaction were the most frequently trained skills, while happiness, sadness, fear, and anger were the most frequently trained emotions. The studies reported a wide range of challenges with respect to robot-based intervention, ranging from limitations for certain ASD subgroups and security aspects of the robots to efforts regarding the automatic recognition of the children’s emotional state by the robotic systems. Finally, we summarised and discussed recommendations regarding the application of robot-based interventions for children with ASD.
Published: 2021
Full Text: View/download PDF

6. High-Fidelity Audio Generation and Representation Learning With Guided Adversarial Autoencoder

Author: Kazi Nazmul Haque, Rajib Rana, and Bjorn W. Schuller
Subjects: Audio generation, representation learning, generative adversarial neural network, guided generative adversarial autoencoder, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Generating high-fidelity conditional audio samples and learning representations from unlabelled audio data are two challenging problems in machine learning research. Recent advances in Generative Adversarial Neural Network (GAN) architectures show great promise in addressing these challenges. To learn powerful representations with a GAN architecture, superior sample generation quality is required, which in turn requires an enormous amount of labelled data. In this paper, we address this issue by proposing the Guided Adversarial Autoencoder (GAAE), which can generate superior conditional audio samples from unlabelled audio data using a small percentage of labelled data as guidance. A representation learned from unlabelled data without any supervision does not guarantee its usability for any downstream task. On the other hand, if the model is highly biased towards the downstream task during representation learning, it loses its generalisation capability. This makes the learned representation hardly useful for any other tasks that are not related to that downstream task. The proposed GAAE model also addresses these issues. Using this superior conditional generation, GAAE can learn a representation specific to the downstream task. Furthermore, GAAE learns another type of representation capturing the general attributes of the data, which is independent of the downstream task at hand. Experimental results involving the S09 and the NSynth datasets attest to the superior performance of GAAE compared to the state-of-the-art alternatives.
Published: 2020
Full Text: View/download PDF

7. Intelligent Music Intervention for Mental Disorders: Insights and Perspectives

Author: Kun Qian, Bjorn W. Schuller, Xiaohong Guan, and Bin Hu
Subjects: Human-Computer Interaction, Modeling and Simulation, Social Sciences (miscellaneous)
Published: 2023
Full Text: View/download PDF

8. Digital Mental Health—Breaking a Lance for Prevention

Author: Bjorn W. Schuller, Johanna Lochner, Kun Qian, and Bin Hu
Subjects: Human-Computer Interaction, Modeling and Simulation, Social Sciences (miscellaneous)
Published: 2022
Full Text: View/download PDF

9. Psychological Field Versus Physiological Field: From Qualitative Analysis to Quantitative Modeling of the Mental Status

Author: Bin Hu, Kun Qian, Qunxi Dong, Yuejia Luo, Yoshiharu Yamamoto, and Bjorn W. Schuller
Subjects: Human-Computer Interaction, Modeling and Simulation, Social Sciences (miscellaneous)
Published: 2022
Full Text: View/download PDF

10. Rethinking Auditory Affective Descriptors Through Zero-Shot Emotion Recognition in Speech

Author: Xinzhou Xu, Jun Deng, Zixing Zhang, Xijian Fan, Li Zhao, Laurence Devillers, and Bjorn W. Schuller
Subjects: Human-Computer Interaction, Modeling and Simulation, ddc:004, Social Sciences (miscellaneous)
Published: 2022
Full Text: View/download PDF

11. Guest Editorial: Special Issue on Affective Speech and Language Synthesis, Generation, and Conversion

Author: Shahin Amiriparian, Bjorn W. Schuller, Nabiha Asghar, Heiga Zen, and Felix Burkhardt
Subjects: Human-Computer Interaction, ddc:004, Software
Abstract: The papers in this special section focus on affective speech and language synthesis, generation, and conversion. As an inseparable and crucial part of spoken language, emotions play a substantial role in human-human and human-technology conversation. They convey information about a person’s needs, how one feels about the objectives of a conversation, the trustworthiness of one’s verbal communication, and more. Accordingly, substantial efforts have been made to generate affective text and speech for conversational AI, artificial storytelling, and machine translation. Similarly, there is a push for converting the affect in text and speech, ideally, in real-time and fully preserving intelligibility, e. g., to hide one’s emotion, for creative applications and in entertainment, or even to augment training data for affect analyzing AI.
Published: 2023
Full Text: View/download PDF

12. An overview of affective speech synthesis and conversion in the deep learning era

Author: Andreas Triantafyllopoulos, Bjorn W. Schuller, Gokce Iymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth Andre, Ruibo Fu, and Jianhua Tao
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, Electrical and Electronic Engineering, ddc:004, Computer Science - Sound, Machine Learning (cs.LG)
Abstract: Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions -- aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field. Comment: Submitted to the Proceedings of IEEE
Published: 2023

13. Investigating Individual- and Group-Level Model Adaptation for Self-Reported Runner Exertion Prediction from Biomechanics

Author: Alexander Kathan, Andreas Triantafyllopoulos, Shahin Amiriparian, Alexander Gebhard, Sandra Ottl, Maurice Gerczuk, Mirko Jaumann, David Hildner, Valerie Dieter, Patrick Schneeweiss, Inka Rosel, Inga Krauss, and Bjorn W. Schuller
Published: 2022
Full Text: View/download PDF

14. Towards Heart Rate Categorisation from Speech in Outdoor Running Conditions

Author: Alexander Gebhard, Shahin Amiriparian, Andreas Triantafyllopoulos, Alexander Kathan, Maurice Gerczuk, Sandra Ottl, Valerie Dieter, Mirko Jaumann, David Hildner, Patrick Schneeweiss, Inka Rosel, Inga Krauss, and Bjorn W. Schuller
Published: 2022
Full Text: View/download PDF

15. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition

Author: Nicholas Cummins, Bjorn W. Schuller, Haishuai Wang, Qifei Li, Jianhua Tao, Zixing Zhang, and Ziping Zhao
Subjects: Male, 0209 industrial biotechnology, Computer science, Cognitive Neuroscience, Speech recognition, Emotions, 02 engineering and technology, Residual, Motion capture, Convolutional neural network, Field (computer science), 020901 industrial engineering & automation, Recurrent neural network, Connectionism, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Humans, Speech, Spectrogram, Female, 020201 artificial intelligence & image processing, Neural Networks, Computer, Child, Block (data storage)
Abstract: A challenging issue in the field of the automatic recognition of emotion from speech is the efficient modelling of long temporal contexts. Moreover, when incorporating long-term temporal dependencies between features, recurrent neural network (RNN) architectures are typically employed by default. In this work, we aim to present an efficient deep neural network architecture incorporating Connectionist Temporal Classification (CTC) loss for discrete speech emotion recognition (SER). Moreover, we also demonstrate the existence of further opportunities to improve SER performance by exploiting the properties of convolutional neural networks (CNNs) when modelling contextual information. Our proposed model uses parallel convolutional layers (PCN) integrated with Squeeze-and-Excitation Network (SEnet), a system herein denoted as PCNSE, to extract relationships from 3D spectrograms across timesteps and frequencies; here, we use the log-Mel spectrogram with deltas and delta-deltas as input. In addition, a self-attention Dilated Residual Network (SADRN) with CTC is employed as a classification block for SER. To the best of the authors' knowledge, this is the first time that such a hybrid architecture has been employed for discrete SER. We further demonstrate the effectiveness of our proposed approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and FAU-Aibo Emotion corpus (FAU-AEC). Our experimental results reveal that the proposed method is well-suited to the task of discrete SER, achieving a weighted accuracy (WA) of 73.1% and an unweighted accuracy (UA) of 66.3% on IEMOCAP, as well as a UA of 41.1% on the FAU-AEC dataset.
Published: 2021
Full Text: View/download PDF

16. Novel Insights on Induced Sparsity in Multi-Time Attention Networks

Author: Srividya Tirunellai Rajamani, Kumar Rajamani, Alexander Kathan, and Bjorn W. Schuller
Subjects: Electronic Health Records, Humans, Neural Networks, Computer, Algorithms
Abstract: Current deep learning approaches for dealing with sparse, irregularly sampled time-series data do not exploit the extent of sparsity of the input data. Our work is inspired by the sparse and irregularly sampled nature of physiological time-series data in electronic health records. We explore the effect of inducing varying degrees of sparsity on the predictive performance of Multi-Time Attention Networks (mTAN) [1]. Our methodology is to induce sparsity by first sub-sampling the time series before feeding it to the mTAN network. We conduct empirical experiments with sub-sampling ranging from 10 % to 90 %. We investigate the performance of our methodology on the Human Activity dataset and the Physionet 2012 mortality prediction task. Our results demonstrate that our proposed time-point sub-sampling coupled with mTAN improves the performance by 2 % on the Human Activity dataset with 80 % fewer time points for training. On the Physionet dataset, our approach achieves performance comparable to the baseline with 30 % fewer time points. Our experiments reveal that time-series data could be acquired more coarsely when used in tandem with state-of-the-art networks capable of handling sparse data (mTAN). This could be of immense help for various applications where data acquisition and labeling are a significant challenge.
Published: 2022

17. Novel no-reference multi-dimensional perceptual similarity metric

Author: Srividya Tirunellai Rajamani, Kumar Rajamani, Priya Rani, Rashmita Barick, Ramasubramanya M. S, Sridevi V Aithal, Rajkumar ElagiriRamalingam, Sahana D Gowda, and Bjorn W. Schuller
Subjects: Algorithms
Abstract: Enormous progress has been made in the domain of determining image quality. However, even the recently proposed deep learning based perceptual quality metrics and the classical structural similarity metric (SSIM) are not designed to operate in the absence of a good quality reference image. Many image acquisition processes, especially in medical imaging, would benefit immensely from a metric that can indicate whether the quality of an image is improving or worsening as the acquisition parameters are adapted. In this work, we propose a novel multi-dimensional no-reference perceptual similarity metric that can compute the quality of a given image without a pristine-quality reference image by combining a no-reference image quality metric (PIQUE) and perceptual similarity. The dimensions of quality currently explored are noise, blur, and contrast. Our experiments demonstrate that our proposed novel no-reference perceptual similarity metric correlates very well with the quality of an image in a multi-dimensional sense.
Published: 2022

18. Heart Sound Classification based on Residual Shrinkage Networks

Author: Lixian Zhu, Kun Qian, Zhihua Wang, Bin Hu, Yoshiharu Yamamoto, and Bjorn W. Schuller
Subjects: Heart Sounds, Hearing, Disease Progression, Humans, Neural Networks, Computer, Algorithms
Abstract: Heart sound classification is one of the non-invasive methods for early detection of cardiovascular diseases (CVDs), the leading cause of death. In recent years, Computer Audition (CA) technology has become increasingly sophisticated, and CA-based auxiliary diagnosis of heart disease has become a popular research area. This paper proposes a deep Convolutional Neural Network (CNN) model for heart sound classification. To improve the classification accuracy of heart sounds, we design a classification algorithm combining the classical Residual Network (ResNet) and Long Short-Term Memory (LSTM). The model performance is evaluated on the PhysioNet/CinC Challenge 2016 datasets using a 2D time-frequency feature. We extract the four features from different filter-bank coefficients, including Filterbank (Fbank), Mel-Frequency Spectral Coefficients (MFSCs), and Mel-Frequency Cepstral Coefficients (MFCCs). The experimental results show that the MFSCs feature outperforms the other features in the proposed CNN model. The proposed model performs well on the test set, achieving an F1 score of 84.3 %, an accuracy of 84.4 %, a sensitivity of 84.3 %, and a specificity of 85.6 %. Compared with the classical ResNet model, an accuracy improvement of 4.9 % is observed in the proposed model.
Published: 2022

19. Insights on Modelling Physiological, Appraisal, and Affective Indicators of Stress using Audio Features

Author: Andreas Triantafyllopoulos, Sandra Zankert, Alice Baird, Julian Konzok, Brigitte M. Kudielka, and Bjorn W. Schuller
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Speech, Computer Science - Sound, Problem Solving, Stress, Psychological, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)
Abstract: Stress is a major threat to well-being that manifests in a variety of physiological and mental symptoms. Utilising speech samples collected while the subject is undergoing an induced stress episode has recently shown promising results for the automatic characterisation of individual stress responses. In this work, we introduce new findings that shed light onto whether speech signals are suited to model physiological biomarkers, as obtained via cortisol measurements, or self-assessed appraisal and affect measurements. Our results show that different indicators impact acoustic features in a diverse way, but that their complementary information can nevertheless be effectively harnessed by a multi-tasking architecture to improve prediction performance for all of them. Comment: Paper accepted for publication at IEEE EMBC 2022. Rights remain with IEEE
Published: 2022

20. EEG Emotion Recognition Based on Self-attention Dynamic Graph Neural Networks

Author: Chao Li, Yong Sheng, Haishuai Wang, Mingyue Niu, Peiguang Jing, Ziping Zhao, and Bjorn W. Schuller
Subjects: Brain-Computer Interfaces, Emotions, Attention, Electroencephalography, Neural Networks, Computer
Abstract: In recent years, due to the fundamental role played by the central nervous system in emotion expression, electroencephalogram (EEG) signals have emerged as the most robust signals for use in emotion recognition and inference. Current emotion recognition methods mainly employ deep learning technology to learn the spatial or temporal representation of each channel, then obtain complementary information from different EEG channels by adopting a multi-modal fusion strategy. However, emotional expression is usually accompanied by the dynamic spatio-temporal evolution of functional connections in the brain. Therefore, the effective learning of more robust long-term dynamic representations for the brain's functional connection networks is a key to improving the EEG-based emotion recognition system. To address these issues, we propose a brain network representation learning method that employs self-attention dynamic graph neural networks to obtain the spatial structure information and temporal evolution characteristics of brain networks. Experimental results on the AMIGOS dataset show that the proposed method is superior to the state-of-the-art methods.
Published: 2022
Full Text: View/download PDF

21. A Federated Learning Paradigm for Heart Sound Classification

Author: Wanyong Qiu, Kun Qian, Zhihua Wang, Yi Chang, Zhihao Bao, Bin Hu, Bjorn W. Schuller, and Yoshiharu Yamamoto
Subjects: Heart Sounds, Auscultation, Privacy
Abstract: Cardiovascular diseases (CVDs) have been ranked as the leading cause of death. The early diagnosis of CVDs is a crucial task in medical practice. A plethora of efforts have been devoted to the automated auscultation of heart sounds, which leverages the power of computer audition to develop a cheap, non-invasive method that can be used at any time and anywhere for measuring the status of the heart. Nevertheless, previous works ignore an important factor, namely, the privacy of the user data. On the one hand, learnt models are always hungry for bigger data. On the other hand, it can be difficult to protect personal private information when collecting such a large amount of data. To resolve this dilemma, we propose a federated learning (FL) framework for the heart sound classification task. To the best of our knowledge, this is the first work to introduce FL to this field. We conducted multiple experiments, analysed the impact of data distribution across collaborative institutions on model quality and learning patterns, and verified the feasibility and effectiveness of FL based on real data. Non-independent identically distributed (Non-IID) data and model quality can be effectively improved by adding a strategy of globally sharing data.
Published: 2022
Full Text: View/download PDF

22. CoughLIME: Sonified Explanations for the Predictions of COVID-19 Cough Classifiers

Author: Anne Wullenweber, Alican Akman, and Bjorn W. Schuller
Subjects: Cough, COVID-19, Humans, Speech, Pandemics
Abstract: Since the emergence of the COVID-19 pandemic, various methods to detect the illness from cough and speech audio data have been proposed. While many of them deliver promising results, they lack transparency in the form of explanations, which is crucial for establishing trust in the classifiers. We propose CoughLIME, which extends LIME to explanations for audio data, specifically tailored towards cough data. We show that CoughLIME is capable of generating faithful sonified explanations for COVID-19 detection. To quantify the performance of the explanations generated for the CIdeR model, we adapt pixel flipping to audio and introduce a novel metric to assess the performance of the XAI classifier. CoughLIME achieves a ΔAUC of 19.48 % generating explanations for CIdeR's predictions.
Published: 2022
Full Text: View/download PDF

23. Triplet Loss-Based Models for COVID-19 Detection from Vocal Sounds

Author: Adria Mallol-Ragolta, Florian B. Pokorny, Katrin D. Bartl-Pokorny, Anastasia Semertzidou, and Bjorn W. Schuller
Subjects: Voice, COVID-19, Humans, Speech, Acoustics, Neural Networks, Computer
Abstract: This work focuses on the automatic detection of COVID-19 from the analysis of vocal sounds, including sustained vowels, coughs, and speech while reading a short text. Specifically, we use the Mel-spectrogram representations of these acoustic signals to train neural network-based models for the task at hand. The extraction of deep learnt representations from the Mel-spectrograms is performed with Convolutional Neural Networks (CNNs). In an attempt to guide the training of the embedded representations towards more separable and robust inter-class representations, we explore the use of a triplet loss function. The experiments performed are conducted using the Your Voice Counts dataset, a new dataset containing German speakers collected using smartphones. The results obtained support the suitability of using triplet loss-based models to detect COVID-19 from vocal sounds. The best Unweighted Average Recall (UAR) of 66.5 % is obtained using a triplet loss-based model exploiting vocal sounds recorded while reading.
Published: 2022
Full Text: View/download PDF

24. An Overview of the FIRST ICASSP Special Session on Computer Audition for Healthcare

Author: Kun Qian, Tanja Schultz, and Bjorn W. Schuller
Published: 2022
Full Text: View/download PDF

25. A Glance-and-Gaze Network for Respiratory Sound Classification

Author: Shuai Yu, Yiwei Ding, Kun Qian, Bin Hu, Wei Li, and Bjorn W. Schuller
Published: 2022
Full Text: View/download PDF

26. Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds

Author: Tianhao Yan, Hao Meng, Shuo Liu, Emilia Parada-Cabaleiro, Zhao Ren, and Bjorn W. Schuller
Published: 2022
Full Text: View/download PDF

27. Depression Diagnosis and Forecast based on Mobile Phone Sensor Data

Author: Xiangheng He, Andreas Triantafyllopoulos, Alexander Kathan, Manuel Milling, Tianhao Yan, Srividya Tirunellai Rajamani, Ludwig Kuster, Mathias Harrer, Elena Heber, Inga Grossmann, David D. Ebert, and Bjorn W. Schuller
Subjects: FOS: Computer and information sciences, Depressive Disorder, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Depression, Computer Science - Artificial Intelligence, Surveys and Questionnaires, Humans, Cell Phone, Machine Learning (cs.LG)
Abstract: Previous studies have shown the correlation between sensor data collected from mobile phones and human depression states. Compared to the traditional self-assessment questionnaires, the passive data collected from mobile phones is easier to access and less time-consuming. In particular, passive mobile phone data can be collected on a flexible time interval, thus detecting moment-by-moment psychological changes and helping achieve earlier interventions. Moreover, while previous studies mainly focused on depression diagnosis using mobile phone data, depression forecasting has not received sufficient attention. In this work, we extract four types of passive features from mobile phone data, including phone call, phone usage, user activity, and GPS features. We implement a long short-term memory (LSTM) network in a subject-independent 10-fold cross-validation setup to model both a diagnostic and a forecasting task. Experimental results show that the forecasting task achieves comparable results with the diagnostic task, which indicates the possibility of forecasting depression from mobile phone sensor data. Our model achieves an accuracy of 77.0 % for major depression forecasting (binary), an accuracy of 53.7 % for depression severity forecasting (5 classes), and a best RMSE score of 4.094 (PHQ-9, range from 0 to 27). Comment: Accepted by EMBC 2022
Published: 2022

28. Journaling Data for Daily PHQ-2 Depression Prediction and Forecasting

Author: Alexander Kathan, Andreas Triantafyllopoulos, Xiangheng He, Manuel Milling, Tianhao Yan, Srividya Tirunellai Rajamani, Ludwig Kuster, Mathias Harrer, Elena Heber, Inga Grossmann, David D. Ebert, and Bjorn W. Schuller
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Depression, Computer Science - Artificial Intelligence, Surveys and Questionnaires, Humans, Patient Health Questionnaire, Machine Learning (cs.LG)
Abstract: Digital health applications are becoming increasingly important for assessing and monitoring the wellbeing of people suffering from mental health conditions like depression. A common target of said applications is to predict the results of self-assessed Patient-Health-Questionnaires (PHQ), indicating current symptom severity of depressive individuals. Many of the currently available approaches to predict PHQ scores use passive data, e.g., from smartphones. However, there are several other scores and data besides PHQ, e.g., the Behavioral Activation for Depression Scale-Short Form (BADSSF), the Center for Epidemiologic Studies Depression Scale (CESD), or the Personality Dynamics Diary (PDD), all of which can be effortlessly collected on a daily basis. In this work, we explore the potential of using actively-collected data to predict and forecast daily PHQ-2 scores on a newly-collected longitudinal dataset. We obtain a best MAE of 1.417 for daily prediction of PHQ-2 scores, which specifically in the used dataset have a range of 0 to 12, using leave-one-subject-out cross-validation, as well as a best MAE of 1.914 for forecasting PHQ-2 scores using data from up to the last 7 days. This illustrates the additive value that can be obtained by incorporating actively-collected data in a depression monitoring application.
Published: 2022

29. A Temporal-oriented Broadcast ResNet for COVID-19 Detection

Author: Xin Jing, Shuo Liu, Emilia Parada-Cabaleiro, Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, and Bjorn W. Schuller
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Machine Learning (cs.LG), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce virus transmission. Due to the promising results of deep learning networks in modelling time sequences, and since applications to rapidly identify COVID in-the-wild should require low computational effort, we present a temporal-oriented broadcasting residual learning method that achieves efficient computation and high accuracy with a small model size. Based on the EfficientNet architecture, our novel network, named Temporal-oriented ResNet (TorNet), consists of a broadcasting learning block, i.e., the Alternating Broadcast (AB) Block, which contains several Broadcast Residual Blocks (BC ResBlocks) and a convolution layer. With the AB Block, the network obtains useful audio-temporal features and higher level embeddings effectively with much less computation than Recurrent Neural Networks (RNNs), typically used to model temporal information. TorNet achieves 72.2% Unweighted Average Recall (UAR) on the INTERSPEECH 2021 Computational Paralinguistics Challenge COVID-19 cough Sub-Challenge, thereby showing competitive results with a higher computational efficiency than other state-of-the-art alternatives. Comment: 5 pages, submitted to Interspeech 2022
Published: 2022

30. Heart Sound Classification based on Fractional Fourier Transformation Entropy

Author: Yang Tan, Zhihua Wang, Kun Qian, Bin Hu, Shiliang Zhao, Bjorn W. Schuller, and Yoshiharu Yamamoto
Published: 2022
Full Text: View/download PDF

31. Fatigue Prediction in Outdoor Running Conditions using Audio Data

Author: Andreas Triantafyllopoulos, Sandra Ottl, Alexander Gebhard, Esther Rituerto-Gonzalez, Mirko Jaumann, Steffen Huttner, Valerie Dieter, Patrick Schneeweiss, Inga Krauss, Maurice Gerczuk, Shahin Amiriparian, and Bjorn W. Schuller
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Audio and Speech Processing (eess.AS), Muscle Fatigue, FOS: Electrical engineering, electronic engineering, information engineering, Humans, Neural Networks, Computer, Computer Science - Sound, Fatigue, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)
Abstract: Although running is a common leisure activity and a core training regimen for several athletes, between 29% and 79% of runners sustain an overuse injury each year. These injuries are linked to excessive fatigue, which alters how someone runs. In this work, we explore the feasibility of modelling the Borg rating of perceived exertion (RPE) scale (range: 6 to 20), a well-validated subjective measure of fatigue, using audio data captured in realistic outdoor environments via smartphones attached to the runners' arms. Using convolutional neural networks (CNNs) on log-Mel spectrograms, we obtain a mean absolute error of 2.35 in subject-dependent experiments, demonstrating that audio can be effectively used to model fatigue, while being more easily and non-invasively acquired than signals from other sensors. Comment: Paper accepted at IEEE EMBC 2022. Rights remain with IEEE
Published: 2022
Full Text: View/download PDF

32. Guest Editorial: Introduction to the Special Section on Efficient Network Design for Convergence of Deep Learning and Edge Computing

Author: Shiping Wen, Tingwen Huang, Bjorn W. Schuller, and Ahmad Taher Azar
Subjects: Computer Networks and Communications, Control and Systems Engineering, Computer Science Applications
Published: 2022

33. Selective element and two orders vectorization networks for automatic depression severity diagnosis via facial changes

Author: Mingyue Niu, Ziping Zhao, Jianhua Tao, Ya Li, and Bjorn W. Schuller
Subjects: Media Technology, Electrical and Electronic Engineering, ddc:004
Published: 2022

34. Domain invariant feature learning for speaker-independent speech emotion recognition

Author: Cheng Lu, Yuan Zong, Wenming Zheng, Yang Li, Chuangao Tang, and Bjorn W. Schuller
Subjects: Computational Mathematics, Acoustics and Ultrasonics, Computer Science (miscellaneous), ddc:000, Electrical and Electronic Engineering
Published: 2022

35. Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

Author: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, and Bjorn W. Schuller
Subjects: Human-Computer Interaction, FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Software, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions with the requirement of meta labels for auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allow training SER systems on augmented data without requiring any meta labels for auxiliary tasks. The semi-supervised nature of MTL-AUG allows for the exploitation of the abundant unlabelled data to further boost the performance of SER. We comprehensively evaluate the proposed framework in the following settings: (1) within corpus, (2) cross-corpus and cross-language, (3) noisy speech, and (4) adversarial attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB datasets show improved results compared to existing state-of-the-art methods. Comment: Under review IEEE Transactions on Affective Computing
Published: 2022
Full Text: View/download PDF

36. The influence of pleasant and unpleasant odours on the acoustics of speech

Author: Maurice Gerczuk, Anton Batliner, Shahin Amiriparian, Andreas Triantafyllopoulos, Franziska Heyne, Marie Klockow, Thomas Hummel, and Bjorn W. Schuller
Subjects: ddc:004
Abstract: Olfaction, i.e., the sense of smell, is referred to as the ‘emotional sense’, as it has been shown to elicit affective responses. Yet, its influence on speech production has not been investigated. In this paper, we introduce a novel speech-based smell recognition approach, drawing from the fields of speech emotion recognition and personalised machine learning. In particular, we collected a corpus of 40 female speakers reading 2 short stories while either no scent, unpleasant odour (fish), or pleasant odour (peach) is applied through a nose clip. Further, we present a machine learning pipeline for the extraction of data representations, model training, and personalisation of the trained models. In a leave-one-speaker-out cross-validation, our best models trained on state-of-the-art wav2vec features achieve a classification rate of 68 % when distinguishing between speech produced under the influence of negative scent and no applied scent. In addition, we highlight the importance of personalisation approaches, showing that a speaker-based feature normalisation substantially improves performance across the evaluated experiments. In summary, the presented results indicate that odours have a weak, but measurable effect on the acoustics of speech.
Published: 2022

37. The Filtering Effect of Face Masks in their Detection from Speech

Author: Adria Mallol-Ragolta, Shuo Liu, and Bjorn W. Schuller
Subjects: Support Vector Machine, Support vector machines, Masks, Absorption, Transfer learning, Voice, Humans, Speech, Training, Feature extraction, Surgery, Convolutional neural networks, Neural Networks, Computer, ddc:004
Abstract: Face masks alter the speakers’ voice, as their intrinsic properties provide them with acoustic absorption capabilities. Hence, face masks act as filters to the human voice. This work focuses on the automatic detection of face masks from speech signals, with emphasis on previous work claiming that face masks attenuate frequencies above 1 kHz. We compare a paralinguistics-based and a spectrograms-based approach for the task at hand. While the former extracts paralinguistic features from filtered versions of the original speech samples, the latter exploits the spectrogram representations of the speech samples containing specific ranges of frequencies. The machine learning techniques investigated for the paralinguistics-based approach include Support Vector Machines (SVM) and a Multi-Layer Perceptron (MLP). For the spectrograms-based approach, we use a Convolutional Neural Network (CNN). Our experiments are conducted on the Mask Augsburg Speech Corpus (MASC), released for the Interspeech 2020 Computational Paralinguistics Challenge (COMPARE). The best performances on the test set from the paralinguistic analysis are obtained using the high-pass filtered versions of the original speech samples. Nonetheless, the highest Unweighted Average Recall (UAR) on the test set is obtained when exploiting the spectrograms with frequency content below 1 kHz.
Published: 2021
Full Text: View/download PDF

38. Automatic Recognition of Texture in Renaissance Music

Author: Emilia Parada-Cabaleiro, Maximilian Schmitt, Anton Batliner, Bjorn W. Schuller, and Markus Schedl
Subjects: ddc:004
Abstract: Renaissance music constitutes a resource of immense richness for Western culture, as shown by its central role in digital humanities. Yet, despite the advance of computational musicology in analysing other Western repertoires, the use of computer-based methods to automatically retrieve relevant information from Renaissance music, e.g., identifying word-painting strategies such as madrigalisms, is still underdeveloped. To this end, we propose a score-based machine learning approach for the classification of texture in Italian madrigals of the 16th century. Our outcomes indicate that Low Level Descriptors, such as intervals, can successfully convey differences in High Level features, such as texture. Furthermore, our baseline results, particularly the ones from a Convolutional Neural Network, show that machine learning can be successfully used to automatically identify sections in madrigals associated with specific textures from symbolic sources.
Published: 2021
Full Text: View/download PDF

39. Transferring Cross-Corpus Knowledge: An Investigation on Data Augmentation for Heart Sound Classification

Author: Tomoya Koike, Kun Qian, Bjorn W. Schuller, and Yoshiharu Yamamoto
Subjects: Machine Learning, Heart Sounds, Databases, Factual, Humans, Signal Processing, Computer-Assisted, ddc:004
Abstract: Human auscultation has been regarded as a cheap, convenient and efficient method for the diagnosis of cardiovascular diseases. Nevertheless, training professional auscultation skills needs tremendous efforts and is time-consuming. Computer audition (CA) that leverages the power of advanced machine learning and signal processing technologies has increasingly attracted contributions to the field of automatic heart sound classification. While previous studies have shown promising results in CA based heart sound classification with the 'shuffle split' method, machine learning for heart sound classification decreases in accuracy with a cross-corpus test dataset. We investigate this problem with a cross-corpus evaluation using the PhysioNet CinC Challenge 2016 Dataset and propose a new combination of data augmentation techniques that leads to a CNN robust for such cross-corpus evaluation. Compared with the baseline, which is given without augmentation, our combined data augmentation techniques improve the sensitivity by 20.0 % and the specificity by 7.9 % on average across 6 databases, which is a significant difference on 4 out of these (p < .05 by one-tailed z-test).
Published: 2021
Full Text: View/download PDF

40. COVID-19 Biomarkers in Speech: On Source and Filter Components

Author: Gauri Deshpande and Bjorn W. Schuller
Subjects: SARS-CoV-2, COVID-19, Humans, Reproducibility of Results, Speech, ddc:004, Biomarkers
Abstract: This paper analyses the source of excitation and vocal tract influenced filter components to identify the biomarkers of COVID-19 in the human speech signal. The source-filter separated components of cough and breathing sounds collected from healthy and COVID-19 positive subjects are also analyzed. The source-filter separation techniques using cepstral, and phase domain approaches are compared and validated by using them in a neural network for the detection of COVID-19 positive subjects. A comparative analysis of the performance exhibited by vowels, cough, and breathing sounds is also presented. We use the public Coswara database for the reproducibility of our findings.
Published: 2021
Full Text: View/download PDF

41. Towards an Efficient Deep Learning Model for Emotion and Theme Recognition in Music

Author: Srividya Tirunellai Rajamani, Kumar Rajamani, and Bjorn W. Schuller
Published: 2021
Full Text: View/download PDF

42. Focused review on artificial intelligence for disease detection in infants

Author: Katrin D. Bartl-Pokorny, Claudia Zitta, Markus Beirit, Gunter Vogrinec, Björn W. Schuller, and Florian B. Pokorny
Subjects: artificial intelligence, machine learning, deep learning, infancy, disease, detection, Medicine, Public aspects of medicine, RA1-1270, Electronic computers. Computer science, QA75.5-76.95
Abstract: Over the last years, studies using artificial intelligence (AI) for the detection and prediction of diseases have increased and also concentrated more and more on vulnerable groups of individuals, such as infants. The release of ChatGPT demonstrated the potential of large language models (LLMs) and heralded a new era of AI with manifold application possibilities. However, the impact of this new technology on medical research cannot be fully estimated yet. In this work, we therefore aimed to summarise the most recent pre-ChatGPT developments in the field of automated detection and prediction of diseases and disease status in infants, i.e., within the first 12 months of life. For this, we systematically searched the scientific databases PubMed and IEEE Xplore for original articles published within the last five years preceding the release of ChatGPT (2018–2022). The search revealed 927 articles; a final number of 154 articles was included for review. First of all, we examined research activity over time. Then, we analysed the articles from 2022 for medical conditions, data types, tasks, AI approaches, and reported model performance. A clear trend of increasing research activity over time could be observed. The most recently published articles focused on medical conditions of twelve different ICD-11 categories; “certain conditions originating in the perinatal period” was the most frequently addressed disease category. AI models were trained with a variety of data types, among which clinical and demographic information and laboratory data were most frequently exploited. The most frequently performed tasks aimed to detect present diseases, followed by the prediction of diseases and disease status at a later point in development. 
Deep neural networks turned out as the most popular AI approach, even though traditional methods, such as random forests and support vector machines, still play a role—presumably due to their explainability or better suitability when the amount of data is limited. Finally, the reported performances in many of the reviewed articles suggest that AI has the potential to assist in diagnostic procedures for infants in the near future. LLMs will boost developments in this field in the upcoming years.
Published: 2024
Full Text: View/download PDF

43. Hierarchical Attention-Based Temporal Convolutional Networks for Eeg-Based Emotion Recognition

Author: Chao Li, Ziping Zhao, Nicholas Cummins, Boyang Chen, and Bjorn W. Schuller
Subjects: Computer science, Speech recognition, Deep learning, Frame (networking), Electroencephalography, Field (computer science), Recurrent neural network, Discriminative model, Spectrogram, Artificial intelligence, ddc:004, Communication channel
Abstract: EEG-based emotion recognition is an effective way to infer the inner emotional state of human beings. Recently, deep learning methods, particularly long short-term memory recurrent neural networks (LSTM-RNNs), have made encouraging progress in the field of emotion recognition. However, LSTM-RNNs are time-consuming and have difficulty avoiding the problem of exploding/vanishing gradients during training. In addition, EEG-based emotion recognition often suffers due to the existence of silent and emotionally irrelevant frames within each channel. Not all channels carry the same emotional discriminative information. In order to tackle these problems, a hierarchical attention-based temporal convolutional network (HATCN) for efficient EEG-based emotion recognition is proposed. Firstly, a spectrogram representation is generated from raw EEG signals in each channel to capture their time and frequency information. Secondly, temporal convolutional networks (TCNs) are utilised to automatically learn more robust/intrinsic long-term dynamic characteristics in emotion response. Next, a hierarchical attention mechanism is investigated that aggregates the emotional information at both the frame and channel level. The experimental results on the DEAP dataset show that our method achieves an average recognition accuracy of 0.716 and an F1-score of 0.642 over four emotional dimensions and outperforms other state-of-the-art methods in a user-independent scenario.
Published: 2021
Full Text: View/download PDF

44. A large-scale and PCR-referenced vocal audio dataset for COVID-19

Author: Jobie Budd, Kieran Baker, Emma Karoune, Harry Coppock, Selina Patel, Richard Payne, Ana Tendero Cañadas, Alexander Titcomb, David Hurley, Sabrina Egglestone, Lorraine Butler, Jonathon Mellor, George Nicholson, Ivan Kiskin, Vasiliki Koutra, Radka Jersakova, Rachel A. McKendry, Peter Diggle, Sylvia Richardson, Björn W. Schuller, Steven Gilmour, Davide Pigoli, Stephen Roberts, Josef Packham, Tracey Thornley, and Chris Holmes
Subjects: Science
Abstract: The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the ‘Speak up and help beat coronavirus’ digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma, and 27.2% with linked influenza PCR test results.
Published: 2024
Full Text: View/download PDF

45. Detecting somatisation disorder via speech: introducing the Shenzhen Somatisation Speech Corpus

Author: Kun Qian, Ruolan Huang, Zhihao Bao, Yang Tan, Zhonghao Zhao, Mengkai Sun, Bin Hu, Björn W. Schuller, and Yoshiharu Yamamoto
Subjects: Somatisation disorder, Machine learning, Healthcare, Computer audition, Medical technology, R855-855.5
Abstract: Objective: Speech recognition technology is widely used as a mature technical approach in many fields. In the study of depression recognition, speech signals are commonly used due to their convenience and ease of acquisition. Though speech recognition is popular in the research field of depression recognition, it has been little studied in somatisation disorder recognition. The reason for this is the lack of a publicly accessible database of relevant speech and benchmark studies. To this end, we introduced our somatisation disorder speech database and gave benchmark results. Methods: By collecting speech samples of somatisation disorder patients, in cooperation with the Shenzhen University General Hospital, we introduced our somatisation disorder speech database, the Shenzhen Somatisation Speech Corpus (SSSC). Moreover, a benchmark for SSSC using classic acoustic features and a machine learning model was proposed in our work. Results: To obtain a more scientific benchmark, we compared and analysed the performance of different acoustic features, i.e., the full ComParE feature set, or only Mel frequency cepstral coefficients (MFCCs), fundamental frequency (F0), and frequency and bandwidth of the formants (F1–F3). By comparison, the best result of our benchmark was the 76.0% unweighted average recall achieved by a support vector machine with formants F1–F3. Conclusion: The proposal of SSSC may bridge a research gap in somatisation disorder, providing researchers with a publicly accessible speech database. In addition, the results of the benchmark could show the scientific validity and feasibility of computer audition for speech recognition in somatisation disorders.
Published: 2024
Full Text: View/download PDF

46. Sensing the sounds of silence: a pilot study on the detection of model mice of autism spectrum disorder from ultrasonic vocalisations

Author: Kun Qian, Tomoya Koike, Kota Tamada, Toru Takumi, Bjorn W. Schuller, and Yoshiharu Yamamoto
Subjects: Mice, Sound, Autism Spectrum Disorder, Animals, Humans, Pilot Projects, Ultrasonics, Vocalization, Animal, ddc:004
Abstract: Studying the animal models of human neuropsychiatric disorders can facilitate the understanding of mechanisms of symptoms both physiologically and genetically. Previous studies have shown that ultrasonic vocalisations (USVs) of mice might be efficient markers to distinguish the wild type group and the model of autism spectrum disorder (mASD). Nevertheless, in-depth analysis of these 'silence' sounds by leveraging the power of advanced computer audition technologies (e.g., deep learning) is limited. To this end, we propose a pilot study on using a large-scale pre-trained audio neural network to extract high-level representations from the USVs of mice for the task of detecting mASD. Experiments have shown a best result reaching an unweighted average recall of 79.2 % for the binary classification task in a rigorous subject-independent scenario. To the best of our knowledge, this is the first work to analyse sounds that cannot be heard by human beings for the detection of mASD mice. The novel findings can be significant to motivate future works with according means on studying animal models of human patients.
Published: 2021

47. Audio-Visual Gated-Sequenced Neural Networks for Affect Recognition

Author: Decky Aspandi, Federico Sukno, Bjorn W. Schuller, and Xavier Binefa
Subjects: Human-Computer Interaction, Software
Published: 2022
Full Text: View/download PDF

48. Dual Attention and Element Recalibration Networks for Automatic Depression Level Prediction

Author: Mingyue Niu, Ziping Zhao, Jianhua Tao, Ya Li, and Bjorn W. Schuller
Subjects: Human-Computer Interaction, Software
Published: 2022
Full Text: View/download PDF

49. Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy

Author: Jingtan Li, Mengkai Sun, Zhonghao Zhao, Xingcan Li, Gaigai Li, Chen Wu, Kun Qian, Bin Hu, Yoshiharu Yamamoto, and Björn W. Schuller
Subjects: Computer audition, Snore sound classification, Meta-learning, Low-resource, Obstructive sleep apnoea, Digital health, Acoustics. Sound, QC221-246, Electronic computers. Computer science, QA75.5-76.95
Abstract: Snoring affects 57 % of men, 40 % of women, and 27 % of children in the USA. Besides, snoring is highly correlated with obstructive sleep apnoea (OSA), which is characterised by loud and frequent snoring. OSA is also closely associated with various life-threatening diseases such as sudden cardiac arrest and is regarded as a grave medical ailment. Preliminary studies have shown that in the USA, OSA affects over 34 % of men and 14 % of women. In recent years, polysomnography has increasingly been used to diagnose OSA. However, due to its drawbacks such as being time-consuming and costly, intelligent audio analysis of snoring has emerged as an alternative method. Considering the higher demand for identifying the excitation location of snoring in clinical practice, we utilised the Munich-Passau Snore Sound Corpus (MPSSC) snoring database which classifies the snoring excitation location into four categories. Nonetheless, the problem of small samples remains in the MPSSC database due to factors such as privacy concerns and difficulties in accurate labelling. In fact, accurately labelled medical data that can be used for machine learning is often scarce, especially for rare diseases. In view of this, Model-Agnostic Meta-Learning (MAML), a small sample method based on meta-learning, is used to classify snore signals with fewer resources in this work. The experimental results indicate that even when using only the ESC-50 dataset (non-snoring sound signals) as the data for meta-training, we are able to achieve an unweighted average recall of 60.2 % on the test dataset after fine-tuning on just 36 instances of snoring from the development part of the MPSSC dataset. While our results only exceed the baseline by 4.4 %, they still demonstrate that even with fine-tuning on a few instances of snoring, our model can outperform the baseline. This implies that the MAML algorithm can effectively tackle the low-resource problem even with limited data resources.
Published: 2023
Full Text: View/download PDF

50. Evaluating the feasibility and exploring the efficacy of an emotion-based approach-avoidance modification training (eAAMT) in the context of perceived stress in an adult sample — protocol of a parallel randomized controlled pilot study

Author: Marie Keinert, Bjoern M. Eskofier, Björn W. Schuller, Stephanie Böhme, and Matthias Berking
Subjects: Perceived stress, Dysfunctional beliefs, Approach-avoidance modification, Emotion, Smartphone-based intervention, Parallel randomized controlled pilot trial, Medicine (General), R5-920
Abstract: Background: Stress levels and thus the risk of developing related physical and mental health conditions are rising worldwide. Dysfunctional beliefs contribute to the development of stress. Potentially, such beliefs can be modified with approach-avoidance modification trainings (AAMT). As previous research indicates that effects of AAMTs are small, there is a need for innovative ways of increasing the efficacy of these interventions. For this purpose, we aim to evaluate the feasibility of the intervention and study design and explore the efficacy of an innovative emotion-based AAMT version (eAAMT) that uses the display of emotions to move stress-inducing beliefs away from and draw stress-reducing beliefs towards oneself. Methods: We will conduct a parallel randomized controlled pilot study at the Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany. Individuals with elevated stress levels will be randomized to one of eight study conditions (n = 10 per condition): one of six variants of the eAAMT, an active control intervention (swipe-based AAMT), or an inactive control condition. Participants in the intervention groups will engage in four sessions of 20–30 min (e)AAMT training on consecutive days. Participants in the inactive control condition will complete the assessments via an online tool. Non-blinded assessments will be taken directly before and after the training and 1 week after training completion. The primary outcome will be perceived stress. Secondary outcomes will be dysfunctional beliefs, symptoms of depression, emotion regulation skills, and physiological stress measures. We will compute effect sizes and conduct mixed ANOVAs to explore differences in change in outcomes between the eAAMT and control conditions. Discussion: The study will provide valuable information to improve the intervention and study design. Moreover, if shown to be effective, the approach can be used as an automated smartphone-based intervention.
Future research needs to identify target groups benefitting from this intervention utilized either as stand-alone treatment or an add-on intervention that is combined with other evidence-based treatments. Trial registration: The trial has been registered in the German Clinical Trials Register (Deutsches Register Klinischer Studien; DRKS00023007; September 7, 2020).
Published: 2023
Full Text: View/download PDF
