Sequential Linefeed Insertion into Lecture Transcriptions for Real-Time Captioning
- Author
Masaki Murata, Shigeki Matsubara, and Tomohiro Ohno
- Subjects
Closed captioning, Phrase, Computer Networks and Communications, Computer science, Applied Mathematics, Speech recognition, General Physics and Astronomy, Speech corpus, Dependency structure, Signal Processing, Information support, Artificial intelligence, Electrical and Electronic Engineering, Natural language processing, Sentence, Delay time, Spoken language
- Abstract
To generate readable real-time captions for Japanese spoken monologues such as lectures, captions must be displayed sequentially with proper linefeeds inserted. This paper proposes a technique for sequentially inserting proper linefeeds into a lecture transcript each time a bunsetsu is identified. A bunsetsu is a Japanese linguistic unit shorter than a sentence that roughly corresponds to a basic phrase in English. Assuming that linefeeds are inserted only at bunsetsu boundaries, this technique reduces the captioning delay to the minimum possible. At each bunsetsu boundary, the technique statistically judges whether a linefeed should be inserted, using only the information available at that point. We conducted linefeed-insertion experiments on a Japanese lecture corpus. The results confirmed that our bunsetsu-based method was almost as accurate as the sentence-based linefeed insertion method. In addition, we compared our method against four baseline methods. The results confirmed that our method inserted linefeeds more accurately than simple methods considered to have the same delay time as ours.
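The sequential setting described in the abstract can be sketched as a loop that makes one linefeed decision per bunsetsu boundary at the moment each bunsetsu arrives, so a finished caption line never waits for the end of the sentence. This is an illustrative sketch only, not the authors' method: the paper's judgment is statistical, whereas here a simple character-count threshold is swapped in as a placeholder. The names `insert_linefeeds`, `threshold_rule`, and `min_chars` are hypothetical, and bunsetsu segmentation is assumed to be supplied by an upstream component.

```python
from typing import Callable, Iterable, Iterator, List

# decide(line) -> True means "insert a linefeed right after the last bunsetsu
# in `line`". It sees only the bunsetsu received so far on the current line,
# i.e. no look-ahead into bunsetsu that have not been recognized yet.
Decide = Callable[[List[str]], bool]


def threshold_rule(min_chars: int = 10) -> Decide:
    """Hypothetical placeholder for the statistical judgment: break as soon
    as the current line holds at least `min_chars` characters."""
    def decide(line: List[str]) -> bool:
        return sum(len(b) for b in line) >= min_chars
    return decide


def insert_linefeeds(bunsetsu_stream: Iterable[str], decide: Decide) -> Iterator[str]:
    """Consume bunsetsu one at a time and yield each caption line as soon as
    a linefeed is decided after its last bunsetsu."""
    line: List[str] = []
    for bunsetsu in bunsetsu_stream:
        line.append(bunsetsu)
        # One decision per bunsetsu boundary, made when the bunsetsu arrives.
        if decide(line):
            yield "".join(line)
            line = []
    if line:  # flush whatever remains when the stream ends
        yield "".join(line)


if __name__ == "__main__":
    stream = ["今日は", "音声認識の", "話を", "します"]
    print(list(insert_linefeeds(stream, threshold_rule(6))))
```

With `min_chars=6`, the example stream yields `今日は音声認識の` as soon as the second bunsetsu arrives, then `話をします` when the stream ends. Replacing `threshold_rule` with a trained classifier that scores the same partial line would, under these assumptions, keep the same delay property, since every decision still depends only on bunsetsu already recognized.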
- Published
2015