Search

Your search for the descriptor "Electrical Engineering and Systems Science - Audio and Speech Processing" returned 41,389 results.


Search Results

251. Mitigating Unauthorized Speech Synthesis for Voice Protection

252. RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis

253. A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation

254. Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks

255. A Novel Score-CAM based Denoiser for Spectrographic Signature Extraction without Ground Truth

256. Enhancing TTS Stability in Hebrew using Discrete Semantic Units

257. Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications

258. Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models

259. OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

260. ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization

261. Multilingual Standalone Trustworthy Voice-Based Social Network for Disaster Situations

262. Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes

263. Using Confidence Scores to Improve Eyes-free Detection of Speech Recognition Errors

264. Automatic Estimation of Singing Voice Musical Dynamics

265. MidiTok Visualizer: a tool for visualization and analysis of tokenized MIDI symbolic music

266. Symbotunes: unified hub for symbolic music generative models

267. MusicFlow: Cascaded Flow Matching for Text Guided Music Generation

268. Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios

269. An approach to hummed-tune and song sequences matching

270. Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation

271. Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs

272. Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features

273. Analyzing long-term rhythm variations in Mising and Assamese using frequency domain correlates

274. emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography

275. Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation

276. Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?

277. GPT-4o System Card

278. Temporal Convolution-based Hybrid Model Approach with Representation Learning for Real-Time Acoustic Anomaly Detection

279. Arabic Music Classification and Generation using Deep Learning

280. Mask-Weighted Spatial Likelihood Coding for Speaker-Independent Joint Localization and Mask Estimation

281. CloserMusicDB: A Modern Multipurpose Dataset of High Quality Music

282. Beyond Correlation: Evaluating Multimedia Quality Models with the Constrained Concordance Index

283. We Augmented Whisper With kNN and You Won't Believe What Came Next

284. Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label

285. STTATTS: Unified Speech-To-Text And Text-To-Speech Model

286. Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis

287. MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

288. AlignCap: Aligning Speech Emotion Captioning to Human Preferences

289. A Survey on Speech Large Language Models

290. A contrastive-learning approach for auditory attention detection

291. Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining

292. Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model

293. Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation

294. Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches

295. Optimizing the role of human evaluation in LLM-based spoken document summarization systems

296. Vocal Melody Construction for Persian Lyrics Using LSTM Recurrent Neural Networks

297. ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams

298. Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech

299. OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

300. Regularized autoregressive modeling and its application to audio signal declipping
