Your search for "Li, Jinyu" returned 2,579 results.

Search Constraints

You searched for: Author: "Li, Jinyu"; Publication Year Range: Last 50 years
Search Results

1. Target word activity detector: An approach to obtain ASR word boundaries without lexicon

2. Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

3. Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

4. Autoregressive Speech Synthesis without Vector Quantization

5. VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

6. Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

7. An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

8. VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

9. Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

10. TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

11. CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

12. RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

13. WavLLM: Towards Robust and Adaptive Speech Large Language Model

14. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

15. NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

17. Boosting Large Language Model for Speech Synthesis: An Empirical Study

18. Future Intelligent Data link and Unit-Level Combat System Based on Global Combat Cloud

19. COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

22. Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

23. RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments

24. Enhanced Edge-Perceptual Guided Image Filtering

25. Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

26. ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers

27. t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

28. DiariST: Streaming Speech Translation with Speaker Diarization

29. SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

30. Deepsea: A Meta-ocean Prototype for Undersea Exploration

31. Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

32. On decoder-only architecture for speech-to-text and large language model integration

33. Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

34. Accelerating Transducers through Adjacent Token Merging

35. Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

36. Accurate and Structured Pruning for Efficient Automatic Speech Recognition

37. VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

38. PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds

41. Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

42. Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

43. Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

44. Speaker Change Detection for Transformer Transducer ASR

45. Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

46. Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

47. VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

48. LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

49. Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

50. Speech separation with large-scale self-supervised learning
