4 results for "Koller, Oscar"
Search Results
2. Read My Lips: Continuous Signer Independent Weakly Supervised Viseme Recognition
- Author
-
Koller, Oscar, Ney, Hermann, Bowden, Richard, Fleet, David, editor, Pajdla, Tomas, editor, Schiele, Bernt, editor, and Tuytelaars, Tinne, editor
- Published
- 2014
3. Towards large vocabulary continuous sign language recognition: from artificial to real-life tasks
- Author
-
Koller, Oscar Tobias Anatol, Ney, Hermann, and Bowden, Richard
- Subjects
hidden Markov model, sign language recognition, ddc:004, video processing, computer vision
- Abstract
Dissertation, RWTH Aachen University, 2020. 1 online resource (xi, 180 pages): illustrations, diagrams.

Deaf people represent a minority that faces strong accessibility challenges due to a world focused on oral-auditory communication. This thesis deals with large vocabulary continuous sign language recognition, which has the potential to overcome accessibility issues and communication barriers between Deaf and hearing people. The full communication pipeline is bidirectional and composed of recognition, translation and generation sub-tasks going from sign to spoken and from spoken to sign language. Sign language recognition targets one complex sub-problem in the communication direction from sign to spoken language: recognising the sequence of signs in a signed video utterance. In the scope of this thesis, signs are represented by semantic gloss descriptors which are used to transcribe a signed utterance. It is assumed that sign language video and gloss transcriptions share the same temporal order. The translation problem, which is not addressed in this work, focuses on reordering and translating the recognition output into spoken language, which could then be written or spoken aloud by the generation part.

Automatic sign language recognition is a multi-disciplinary task that draws on techniques from numerous neighbouring fields, such as speech recognition, computer vision and linguistics. Historically, research on sign language recognition has been relatively scattered, and researchers often independently captured their own small-scale data sets for experimentation. This has several disadvantages. Most importantly, such data sets do not cover the full complexity that sign language encompasses. Moreover, most previous work does not tackle continuous sign language but only isolated single signs. Besides containing only a small and very limited vocabulary (fewer than 100 different signs), no prior work has targeted real-life sign language. Until now, the employed data sets comprised only artificial and staged sign language footage, planned and recorded with the aim of enabling automatic recognition. The kinds of signs to be encountered, the structure of sentences, the signing speed and the choice of expression and dialect were usually controlled and determined beforehand.

This work aims at moving sign language recognition to more realistic scenarios. For this purpose we create the first real-life large vocabulary continuous sign language corpora, based on broadcast recordings featuring natural sign language from professional interpreters. This kind of data provides unprecedented complexity for recognition. In the scope of this thesis, we made it publicly available free of charge. A conventional GMM-HMM statistical sign language recognition system with distinct, manually engineered features is created and evaluated on this challenging task. We then leverage recent advances in deep learning and propose modern hybrid CNN-LSTM-HMM models, which are shown to halve the recognition error. We analyse the impact of various architectural design decisions with the aim of giving guidance to researchers in the field. Finally, we develop a weakly supervised learning scheme based on hybrid multi-stream CNN-LSTM-HMMs that allows the efficient spotting of subunits such as articulated handshapes and mouth patterns in sign language footage.
- Published
- 2020
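The dissertation abstract above builds on HMM-based recognition (first GMM-HMM, then hybrid CNN-LSTM-HMM models). As an illustrative sketch only — not code from the thesis — the core HMM decoding step, finding the most likely hidden-state sequence with the Viterbi algorithm, can be written as:

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for an observation sequence.

    Toy decoder for illustration; real sign language recognisers score
    emissions with GMMs or neural networks over video features instead
    of lookup tables.
    """
    # Work in log-space to avoid numerical underflow on long sequences.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
          for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Best predecessor state for reaching s at this time step.
            best_prev, best_score = max(
                ((p, V[-2][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][obs]))
                 for p in states),
                key=lambda pair: pair[1])
            V[-1][s] = best_score
            new_path[s] = path[best_prev] + [s]
        path = new_path
    best_final = max(states, key=lambda s: V[-1][s])
    return path[best_final]
```

The state names, transition and emission tables here are hypothetical toy values; in the thesis setting, states would correspond to sub-sign HMM states and emissions to per-frame feature scores.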
4. Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers.
- Author
-
Koller, Oscar, Forster, Jens, and Ney, Hermann
- Subjects
SIGN language, IMAGE recognition (Computer vision), HUMAN facial recognition software, HUMAN-computer interaction, IMAGE analysis
- Abstract
This work presents a statistical recognition approach performing large vocabulary continuous sign language recognition across different signers. Automatic sign language recognition is currently evolving from artificial lab-generated data to ‘real-life’ data. To the best of our knowledge, this is the first time system design on a large data set with true focus on real-life applicability is thoroughly presented.

Our contributions are in five areas, namely tracking, features, signer dependency, visual modelling and language modelling. We experimentally show the importance of tracking for sign language recognition with respect to the hands and facial landmarks. We further contribute by explicitly enumerating the impact of multimodal sign language features describing hand shape, hand position and movement, inter-hand-relation and detailed facial parameters, as well as temporal derivatives. In terms of visual modelling we evaluate non-gesture-models, length modelling and universal transition models. Signer dependency is tackled with CMLLR adaptation, and we further improve the recognition by employing class language models.

We evaluate on two publicly available large vocabulary databases representing lab-data (SIGNUM database: 25 signers, 455 sign vocabulary, 19k sentences) and unconstrained ‘real-life’ sign language (RWTH-PHOENIX-Weather database: 9 signers, 1081 sign vocabulary, 7k sentences) and achieve up to 10.0%/16.4% and respectively up to 34.3%/53.0% word error rate for single signer/multi-signer setups. Finally, this work aims at providing a starting point to newcomers into the field.
- Published
- 2015
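The word error rates quoted in the abstract above (e.g. 10.0%/16.4% on SIGNUM) follow the standard edit-distance definition. A minimal sketch of how WER is computed between a reference and a hypothesis gloss sequence (the standard metric, not code from the paper):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Computed via Levenshtein distance over whole words (here: glosses),
    using a dynamic-programming table.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    return dp[-1][-1] / len(ref)
```

The gloss strings in any usage are hypothetical examples; note that WER can exceed 100% when the hypothesis contains many insertions, which is why error rates like 53.0% on hard multi-signer data remain meaningful.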
Discovery Service for Jio Institute Digital Library