Search for author "Li, Haizhou": 2,136 results in total

Search Results

1. NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention

2. Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing

3. Generative Expressive Conversational Speech Synthesis

4. Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization

5. Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

6. GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis

7. SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

8. Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset

9. DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models

10. RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging

11. Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition

12. Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models

13. SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

14. An Exploration of Length Generalization in Transformer-Based Speech Enhancement

15. Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

16. ED-sKWS: Early-Decision Spiking Neural Networks for Rapid and Energy-Efficient Keyword Spotting

17. Target Speech Diarization with Multimodal Prompts

18. Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

19. How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

20. TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models

21. Unsupervised Mutual Learning of Dialogue Discourse Parsing and Topic Segmentation

22. Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

23. Mamba in Speech: Towards an Alternative to Self-Attention

24. Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

25. Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

26. Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

27. An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

28. Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

29. Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

30. Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

31. CrossTune: Black-Box Few-Shot Classification with Label Enhancement

32. sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

33. Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

34. Fine-Grained Quantitative Emotion Editing for Speech Generation

35. Event-Driven Learning for Spiking Neural Networks

36. Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

37. Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition

38. LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization

39. CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

40. An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

41. Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio

42. The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

43. A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

44. Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

45. Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

46. Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

47. Human Demonstrations are Generalizable Knowledge for Robots

48. Sparsity-Driven EEG Channel Selection for Brain-Assisted Speech Enhancement

49. HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

50. How Well Do Text Embedding Models Understand Syntax?
