
Search for author "Ginsburg, Boris": 242 results

Search Results

1. NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

2. Anticipating Future with Large Language Model for Simultaneous Machine Translation

3. VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

4. Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR

5. nGPT: Normalized Transformer with Representation Learning on the Hypersphere

6. Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

7. EMMeTT: Efficient Multimodal Machine Translation Training

8. META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR

9. Chain-of-Thought Prompting for Speech Translation

10. Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

11. Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

12. Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation

13. Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR

14. NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

15. Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models

16. Schrödinger Bridge for Generative Speech Enhancement

17. Romanization Encoding For Multilingual ASR

18. Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

19. BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

20. Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

21. DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

22. Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

23. Instruction Data Generation and Unsupervised Adaptation for Speech Language Models

24. Nemotron-4 340B Technical Report

25. Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter

26. Label-Looping: Highly Efficient Decoding for Transducers

27. Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

28. Flexible Multichannel Speech Enhancement for Noise-Robust Frontend

29. RULER: What's the Real Context Size of Your Long-Context Language Models?

30. Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

31. Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer

32. Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

33. The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

34. Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

35. SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

36. SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

37. LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models

38. A Chat About Boring Problems: Studying GPT-based text normalization

39. Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

40. Investigating End-to-End ASR Architectures for Long Form Audio Transcription

41. Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio

42. Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

43. Confidence-based Ensembles of End-to-End Speech Recognition Models

44. Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer

45. SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

46. Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

47. Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

48. Powerful and Extensible WFST Framework for RNN-Transducer Losses

49. VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

50. Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
