Search results for author "Liu, Shujie", limited to the topic "computer science - sound": 49 results

Search Results

1. Autoregressive Speech Synthesis without Vector Quantization

2. VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

3. VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

4. TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

5. CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

6. RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

7. WavLLM: Towards Robust and Adaptive Speech Large Language Model

8. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

9. Boosting Large Language Model for Speech Synthesis: An Empirical Study

10. Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

11. WavMark: Watermarking for Audio Generation

12. SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

13. On decoder-only architecture for speech-to-text and large language model integration

14. VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

15. ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

16. Code-Switching Text Generation and Injection in Mandarin-English ASR

17. Target Sound Extraction with Variable Cross-modality Clues

18. Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

19. Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

20. BEATs: Audio Pre-Training with Acoustic Tokenizers

21. VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

22. Exploring WavLM on Speech Enhancement

23. LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

24. LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

25. Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

26. Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

27. Ultra Fast Speech Separation Model with Teacher Student Learning

28. Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

29. Speech Pre-training with Acoustic Piece

30. Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

31. Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

32. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

33. Separating Long-Form Speech with Group-Wise Permutation Invariant Training

34. Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding

35. SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

36. Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

37. UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

38. Multi-View Self-Attention Based Transformer for Speaker Recognition

39. A Configurable Multilingual Model is All You Need to Recognize All Languages

40. Investigation of Practical Aspects of Single Channel Speech Separation for ASR

41. UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

42. Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

43. Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020

44. MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search

45. Curriculum Pre-training for End-to-End Speech Translation

46. Semantic Mask for Transformer based End-to-End Speech Recognition

47. Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

48. Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding

49. LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers
