Author: "Schultz, Tanja" / Publication Year Range: This year - Searchworks@Jio Institute Digital Library Search Results

Your search for "Schultz, Tanja" returned 20 results

1. Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition

Author: Tan, Chao, Li, Sheng, Cao, Yang, Ren, Zhao, and Schultz, Tanja
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Federated Learning (FL) is a privacy-preserving approach that allows servers to aggregate distributed models transmitted from local clients rather than training on user data. More recently, FL has been applied to Speech Emotion Recognition (SER) for secure human-computer interaction applications. Recent research has found that FL is still vulnerable to inference attacks. To this end, this paper focuses on investigating the security of FL for SER concerning property inference attacks. We propose a novel method to protect the property information in speech data by decomposing various properties in the sound and adding perturbations to these properties. Our experiments show that the proposed method offers better privacy-utility trade-offs than existing methods. The trade-offs enable more effective attack prevention while maintaining similar FL utility levels. This work can guide future work on privacy protection methods in speech processing.
Published: 2024

2. Speech as a Biomarker for Disease Detection

Author: Botelho, Catarina, Abad, Alberto, Schultz, Tanja, and Trancoso, Isabel
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Speech is a rich biomarker that encodes substantial information about the health of a speaker, and thus it has been proposed for the detection of numerous diseases, achieving promising results. However, questions remain about what the models trained for the automatic detection of these diseases are actually learning and the basis for their predictions, which can significantly impact patients' lives. This work advocates for an interpretable health model, suitable for detecting several diseases, motivated by the observation that speech-affecting disorders often have overlapping effects on speech signals. A framework is presented that first defines "reference speech" and then leverages this definition for disease detection. Reference speech is characterized through reference intervals, i.e., the typical values of clinically meaningful acoustic and linguistic features derived from a reference population. This novel approach in the field of speech as a biomarker is inspired by the use of reference intervals in clinical laboratory science. Deviations of new speakers from this reference model are quantified and used as input to detect Alzheimer's and Parkinson's disease. The classification strategy explored is based on Neural Additive Models, a type of glass-box neural network, which enables interpretability. The proposed framework for reference speech characterization and disease detection is designed to support the medical community by providing clinically meaningful explanations that can serve as a valuable second opinion.
Published: 2024

3. NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention

Author: De Silva, Dashanka, Cai, Siqi, Pahuja, Saurav, Schultz, Tanja, and Li, Haizhou
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In the study of auditory attention, it has been revealed that there exists a robust correlation between attended speech and elicited neural responses, measurable through electroencephalography (EEG). Therefore, it is possible to use the attention information available within EEG signals to guide the extraction of the target speaker in a cocktail party computationally. In this paper, we present a neuro-guided speaker extraction model, i.e. NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue to extract attended speech from monaural speech mixtures. We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations, generating a speaker extraction mask. Experimental results on a publicly available dataset demonstrate that our proposed model outperforms two baseline models across various evaluation metrics.
Published: 2024

4. On the Role of Visual Grounding in VQA

Author: Reich, Daniel and Schultz, Tanja
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual Grounding (VG) in VQA refers to a model's proclivity to infer answers based on question-relevant image regions. Conceptually, VG identifies as an axiomatic requirement of the VQA task. In practice, however, DNN-based VQA models are notorious for bypassing VG by way of shortcut (SC) learning without suffering obvious performance losses in standard benchmarks. To uncover the impact of SC learning, Out-of-Distribution (OOD) tests have been proposed that expose a lack of VG with low accuracy. These tests have since been at the center of VG research and served as basis for various investigations into VG's impact on accuracy. However, the role of VG in VQA still remains not fully understood and has not yet been properly formalized. In this work, we seek to clarify VG's role in VQA by formalizing it on a conceptual level. We propose a novel theoretical framework called "Visually Grounded Reasoning" (VGR) that uses the concepts of VG and Reasoning to describe VQA inference in ideal OOD testing. By consolidating fundamental insights into VG's role in VQA, VGR helps to reveal rampant VG-related SC exploitation in OOD testing, which explains why the relationship between VG and OOD accuracy has been difficult to define. Finally, we propose an approach to create OOD tests that properly emphasize a requirement for VG, and show how to improve performance on them.
Published: 2024

5. Speech Emotion Recognition under Resource Constraints with Data Distillation

Author: Chang, Yi, Ren, Zhao, Zhao, Zhonghao, Nguyen, Thanh Tam, Qian, Kun, Schultz, Tanja, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.
Published: 2024

6. Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion

Author: Ren, Zhao, Scheck, Kevin, Hou, Qinhan, van Gogh, Stefano, Wand, Michael, and Schultz, Tanja
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Electromyography-to-Speech (ETS) conversion has demonstrated its potential for silent speech interfaces by generating audible speech from Electromyography (EMG) signals during silent articulations. ETS models usually consist of an EMG encoder which converts EMG signals to acoustic speech features, and a vocoder which then synthesises the speech signals. Due to an inadequate amount of available data and noisy signals, the synthesised speech often exhibits a low level of naturalness. In this work, we propose Diff-ETS, an ETS model which uses a score-based diffusion probabilistic model to enhance the naturalness of synthesised speech. The diffusion model is applied to improve the quality of the acoustic features predicted by an EMG encoder. In our experiments, we evaluated fine-tuning the diffusion model on predictions of a pre-trained EMG encoder, and training both models in an end-to-end fashion. We compared Diff-ETS with a baseline ETS model without diffusion using objective metrics and a listening test. The results indicated the proposed Diff-ETS significantly improved speech naturalness over the baseline.
Comment: Accepted by EMBC 2024
Published: 2024

7. STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

Author: Chang, Yi, Ren, Zhao, Zhang, Zixing, Jing, Xin, Qian, Kun, Shao, Xi, Hu, Bin, Schultz, Tanja, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has become a popular area of research. However, prior works on adversarial attacks in the audio domain primarily rely on iterative gradient-based techniques, which are time-consuming and prone to overfitting the specific threat model. Furthermore, the exploration of sparse perturbations, which have the potential for better stealthiness, remains limited in the audio domain. To address these challenges, we propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models in an end-to-end and efficient manner. We evaluate our method on two widely-used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP), and demonstrate its ability to generate successful sparse adversarial examples in an efficient manner. Moreover, our generated adversarial examples exhibit model-agnostic transferability, enabling effective adversarial attacks on advanced victim models.
Published: 2024

8. Uncovering the Full Potential of Visual Grounding Methods in VQA

Author: Reich, Daniel and Schultz, Tanja
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual Grounding (VG) methods in Visual Question Answering (VQA) attempt to improve VQA performance by strengthening a model's reliance on question-relevant visual information. The presence of such relevant information in the visual input is typically assumed in training and testing. This assumption, however, is inherently flawed when dealing with imperfect image representations common in large-scale VQA, where the information carried by visual features frequently deviates from expected ground-truth contents. As a result, training and testing of VG-methods is performed with largely inaccurate data, which obstructs proper assessment of their potential benefits. In this study, we demonstrate that current evaluation schemes for VG-methods are problematic due to the flawed assumption of availability of relevant visual information. Our experiments show that these methods can be much more effective when evaluation conditions are corrected. Code is provided on GitHub.
Published: 2024

9. LabLinking: theory, framework, and solutions of connecting laboratories for distributed human experiments

Author: Schultz, Tanja, Putze, Felix, Reisenhofer, Rafael, Fehr, Thorsten, Meier, Moritz, Mason, Celeste, and Ahrens, Florian
Published: 2024

10. Hybrid Adaptive Systems

Author: Benke, Ivo, Knierim, Michael, Adam, Marc, Beigl, Michael, Dorner, Verena, Ebner-Priemer, Ulrich, Herrmann, Manfred, Klarmann, Martin, Maedche, Alexander, Nafziger, Julia, Nieken, Petra, Pfeiffer, Jella, Puppe, Clemens, Putze, Felix, Scheibehenne, Benjamin, Schultz, Tanja, and Weinhardt, Christof
Published: 2024

11. Entwicklungen in der Digitalisierung von Public Health seit 2020: Beispiele aus dem Leibniz-WissenschaftsCampus Digital Public Health Bremen

Author: Zeeb, Hajo, Schüz, Benjamin, Schultz, Tanja, and Pigeot, Iris
Published: 2024

12. LSTM-MorA: Melody-Accompaniment Classification of MIDI Tracks

Author: Liu, Hui, Flaack, Leon, Zhang, Shiyao, and Schultz, Tanja
Editor: Wand, Michael, Malinovská, Kristína, Schmidhuber, Jürgen, and Tetko, Igor V.
Series Editor: Goos, Gerhard
Founding Editor: Hartmanis, Juris
Editorial Board Member: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Published: 2024

13. Taxonomy and Real-Time Classification of Artifacts During Biosignal Acquisition: A Starter Study and Dataset of ECG

Author: Liu, Hui, Zhang, Shiyao, Gamboa, Hugo, Xue, Tingting, Zhou, Congcong, and Schultz, Tanja
Published: 2024

14. Human Activity Recognition, Monitoring, and Analysis Facilitated by Novel and Widespread Applications of Sensors.

Author: Liu, Hui, Gamboa, Hugo, and Schultz, Tanja
Subjects: LANGUAGE models, GRAPH neural networks, MACHINE learning, HUMAN mechanics, ARTIFICIAL intelligence, HUMAN activity recognition, DEEP learning
Abstract: This document is a summary of a special issue of the journal Sensors titled "Human Activity Recognition, Monitoring, and Analysis Facilitated by Novel and Widespread Applications of Sensors." The issue features 10 academic articles selected from 30 submissions. The articles cover a range of topics related to human activity recognition (HAR) and sensor applications, including video-based HAR using graph neural networks, wearable-based HAR with localization, human body movement characteristics for acrophobia study, human activity counting using deep learning, warship commander activities for mental analysis, complex HAR in urban environmental exposure research, accelerometer-based HAR using domain generalization, finger gesture-based user identification using radio frequency technology, associating human behavior, manufacture, and digital interaction with fabrication, and facial expression understanding using deep learning and multimodal large language models. The articles present state-of-the-art approaches and findings in HAR research and are authored by scientists from 14 countries across three continents. [Extracted from the article]
Published: 2024

15. MS2OD: Outlier Detection Using Minimum Spanning Tree and Medoid Selection

Author: Li, Jia, Li, Jiangwei, Wang, Chenxu, Verbeek, Fons J, Schultz, Tanja, and Liu, Hui
Published: 2024

16. Speech Features as Predictors of Momentary Depression Severity in Patients With Depressive Disorder Undergoing Sleep Deprivation Therapy: Ambulatory Assessment Pilot Study

Author: Wadle, Lisa-Marie, Ebner-Priemer, Ulrich W, Foo, Jerome C, Yamamoto, Yoshiharu, Streit, Fabian, Witt, Stephanie H, Frank, Josef, Zillich, Lea, Limberger, Matthias F, Ablimit, Ayimnisagul, Schultz, Tanja, Gilles, Maria, Rietschel, Marcella, and Sirignano, Lea
Published: 2024

17. NeuroHeed: Neuro-Steered Speaker Extraction Using EEG Signals

Author: Pan, Zexu, Borsdorf, Marvin, Cai, Siqi, Schultz, Tanja, and Li, Haizhou
Abstract: Humans possess the remarkable ability to selectively attend to a single speaker amidst competing voices and background noise, known as selective auditory attention. Recent studies in auditory neuroscience indicate a strong correlation between the attended speech signal and the corresponding brain's elicited neuronal activities. In this work, we study such brain activities measured using affordable and non-intrusive electroencephalography (EEG) devices. We present NeuroHeed, a speaker extraction model that leverages the listener's synchronized EEG signals to extract the attended speech signal in a cocktail party scenario, in which the extraction process is conditioned on a neuronal attractor encoded from the EEG signal. We propose both an offline and an online NeuroHeed, with the latter designed for real-time inference. In the online NeuroHeed, we additionally propose an autoregressive speaker encoder, which accumulates past extracted speech signals for self-enrollment of the attended speaker information into an auditory attractor, that retains the attentional momentum over time. Online NeuroHeed extracts the current window of the speech signals with guidance from both attractors. Experimental results on KUL dataset two-speaker scenario demonstrate that NeuroHeed effectively extracts brain-attended speech signals with an average scale-invariant signal-to-distortion ratio improvement (SI-SDRi) of 14.3 dB and extraction accuracy of 90.8% in offline settings, and SI-SDRi of 11.2 dB and extraction accuracy of 85.1% in online settings.
Published: 2024

18. Brain Topology Modeling With EEG-Graphs for Auditory Spatial Attention Detection

Author: Cai, Siqi, Schultz, Tanja, and Li, Haizhou
Abstract: Objective: Despite recent advances, the decoding of auditory attention from brain signals remains a challenge. A key solution is the extraction of discriminative features from high-dimensional data, such as multi-channel electroencephalography (EEG). However, to our knowledge, topological relationships between individual channels have not yet been considered in any study. In this work, we introduced a novel architecture that exploits the topology of the human brain to perform auditory spatial attention detection (ASAD) from EEG signals. Methods: We propose EEG-Graph Net, an EEG-graph convolutional network, which employs a neural attention mechanism. This mechanism models the topology of the human brain in terms of the spatial pattern of EEG signals as a graph. In the EEG-Graph, each EEG channel is represented by a node, while the relationship between two EEG channels is represented by an edge between the respective nodes. The convolutional network takes the multi-channel EEG signals as a time series of EEG-graphs and learns the node and edge weights from the contribution of the EEG signals to the ASAD task. The proposed architecture supports the interpretation of the experimental results by data visualization. Results: We conducted experiments on two publicly available databases. The experimental results showed that EEG-Graph Net significantly outperforms the state-of-the-art methods in terms of decoding performance. In addition, the analysis of the learned weight patterns provides insights into the processing of continuous speech in the brain and confirms findings from neuroscientific studies. Conclusion: We showed that modeling brain topology with EEG-graphs yields highly competitive results for auditory spatial attention detection. Significance: The proposed EEG-Graph Net is more lightweight and accurate than competing baselines and provides explanations for the results. Also, the architecture can be easily transferred to other brain-computer interface (BCI) tasks.
Published: 2024

19. Bilingual LSA-based adaptation for statistical machine translation

Author: Tam, Yik-Cheung, Lane, Ian, and Schultz, Tanja
Abstract: We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis. Bilingual LSA enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bilingual LSA framework, model adaptation can be performed by, first, inferring the topic posterior distribution of the source text and then applying the inferred distribution to an n-gram language model of the target language and translation lexicon via marginal adaptation. The background phrase table is enhanced with the additional phrase scores computed using the adapted translation lexicon. The proposed framework also features rapid bootstrapping of LSA models for new languages based on a source LSA model of another language. Our approach is evaluated on the Chinese–English MT06 test set using the medium-scale SMT system and the GALE SMT system measured in BLEU and NIST scores. Improvement in both scores is observed on both systems when the adapted language model and the adapted translation lexicon are applied individually. When the adapted language model and the adapted translation lexicon are applied simultaneously, the gain is additive. At the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically significant using the medium-scale SMT system, while the gain in the NIST score is statistically significant using the GALE SMT system.
Published: 2024

20. [Developments in the digitalization of public health since 2020 : Examples from the Leibniz ScienceCampus Digital Public Health Bremen].

Author: Zeeb H, Schüz B, Schultz T, and Pigeot I
Subjects: Humans, Artificial Intelligence, Pandemics prevention & control, Germany, Surveys and Questionnaires, Public Health, COVID-19 epidemiology, COVID-19 prevention & control
Abstract: Digital public health has received a significant boost in recent years, especially due to the demands associated with the COVID-19 pandemic. In this report, we provide an overview of the developments in digitalization in the field of public health in Germany since 2020 and illustrate these with examples from the Leibniz ScienceCampus Digital Public Health Bremen (LSC DiPH). The following topics are central: How do digital survey methods as well as digital biomarkers and artificial intelligence methods shape modern epidemiology and prevention research? What is the status of digitalization in public health offices? Which approaches to health economics evaluation of digital public health interventions have been utilized so far? What is the status of training and further education in digital public health? The first years of the Leibniz ScienceCampus Digital Public Health Bremen (LSC DiPH) were also strongly influenced by the COVID-19 pandemic. Repeated population-based digital surveys of the LSC indicated an increase in use of health apps in the population, for example, in applications to support physical activity. The COVID-19 pandemic has also shown that the digitalization of public health enhances the risk of misinformation and disinformation. (© 2024. The Author(s).)
Published: 2024
