76 results for '"semi-supervised training"'
Search Results
2. Comparison of Well and Lower-Resourced Self-training in ASR
- Author
-
Luo, Yue, Mihajlik, Péter, Karpov, Alexey, editor, and Delić, Vlado, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Power Quality Assessment Based on Graph Convolutional Networks (基于图卷积网络的电能质量评估).
- Author
-
黄宏清, 倪道宏, and 刘雪松
- Subjects
- *
GRAPH neural networks , *ARTIFICIAL neural networks , *RATE setting , *EVALUATION methodology - Abstract
The increasingly widespread use of new power equipment has brought new disturbances to the power system and has placed increasing demands on power quality. In order to make full use of the power quality indicators in the national standards and to make a more comprehensive and integrated evaluation of power quality, this study proposes a power quality evaluation method based on graph convolutional network. A power quality assessment system with graded indicators is proposed according to the current national standards. The correlation between the various power quality assessment indicators is initially determined, and on this basis the indicator relationship diagram is determined, a graph neural network model is built and trained, and the error rate of the test set is 9.02%. A comparison and analysis with other assessment methods using actual measurement data of a power system proves that the proposed method is more effective in assessing power quality over a long time span. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
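The abstract in record 3 above describes building an indicator relationship graph and training a graph convolutional network over it. As a rough illustration of the core operation only (the paper's actual architecture and data are not reproduced here), a single graph-convolution layer with symmetric normalisation can be written in a few lines of NumPy; the adjacency matrix, features, and weights below are made-up placeholders.

    import numpy as np

    def gcn_layer(A, X, W):
        """One graph-convolution layer: H = ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
        A_hat = A + np.eye(A.shape[0])          # add self-loops
        d = A_hat.sum(axis=1)                   # node degrees
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^-1/2
        return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

    # toy example: 4 power-quality indicators, 3 input features, 2 hidden units
    rng = np.random.default_rng(0)
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 0]], dtype=float)   # hypothetical indicator relationship graph
    X = rng.normal(size=(4, 3))
    W = rng.normal(size=(3, 2))
    print(gcn_layer(A, X, W).shape)             # (4, 2)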
4. Improving Object Detection Accuracy with Self-Training Based on Bi-Directional Pseudo Label Recovery.
- Author
-
Sajid, Shoaib, Aziz, Zafar, Urmonov, Odilbek, and Kim, HyungWon
- Subjects
TRAINING needs ,DETECTORS - Abstract
Semi-supervised training methods need reliable pseudo labels for unlabeled data. The current state-of-the-art methods based on pseudo labeling utilize only high-confidence predictions, whereas low-confidence predictions are discarded. This paper presents a novel approach to generate high-quality pseudo labels for unlabeled data. It utilizes predictions with high and low confidence levels to generate refined labels and then validates the accuracy of those predictions through bi-directional object tracking. The bi-directional object tracker leverages both past and future information to recover missing labels and increase the accuracy of the generated pseudo labels. This method can also substantially reduce the effort and time needed for label creation compared to conventional manual labeling. The proposed method utilizes a buffer to accumulate detection labels (bounding boxes) predicted by the object detector. These labels are refined for accuracy through forward and backward tracking, ultimately constructing the final set of pseudo labels. The method is integrated into the YOLOv5 object detector and tested on the BDD100K dataset. Through the experiments, we demonstrate the effectiveness of the proposed scheme in automating the process of pseudo label generation with notably higher accuracy than recent state-of-the-art pseudo label generation schemes. The results show that the proposed method outperforms previous methods in terms of mean average precision (mAP), label generation accuracy, and speed. Using the bi-directional recovery method, mAP@50 on the BDD100K dataset increases by 0.52%, and on the Waymo dataset it improves by 8.7% to 9.9%, compared to 8.1% for the existing method, when pre-training with 10% of the dataset. An improvement of 2.1% to 2.9% is achieved, compared to 1.7% for the existing method, when pre-training with 20% of the dataset. Overall, the improved method leads to a significant enhancement in detection accuracy, achieving higher mAP scores across various datasets, thus demonstrating its robustness and effectiveness in diverse conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
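Record 4 above validates pseudo labels by tracking objects forwards and backwards through a buffer of detections. The sketch below only illustrates the recovery idea: where the original work uses a bi-directional object tracker, this toy version fills a missed detection by interpolating between the nearest past and future boxes of the same track; the track layout is a hypothetical stand-in.

    def recover_missing_boxes(track):
        """track: dict frame_index -> box (x, y, w, h) or None when the detector missed.
        Fills gaps by interpolating between the nearest past and future detections,
        mimicking (very roughly) forward/backward recovery of pseudo labels."""
        frames = sorted(track)
        detected = [f for f in frames if track[f] is not None]
        recovered = dict(track)
        for f in frames:
            if track[f] is not None:
                continue
            past = [d for d in detected if d < f]
            future = [d for d in detected if d > f]
            if not past or not future:
                continue                      # cannot recover at sequence borders
            p, q = past[-1], future[0]
            t = (f - p) / (q - p)
            recovered[f] = tuple((1 - t) * a + t * b
                                 for a, b in zip(track[p], track[q]))
        return recovered

    # hypothetical track with a missed detection at frame 2
    track = {0: (10, 10, 40, 40), 1: (12, 11, 40, 40), 2: None, 3: (16, 13, 40, 40)}
    print(recover_missing_boxes(track)[2])    # (14.0, 12.0, 40.0, 40.0)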
5. Dealing with Training Deficiencies
- Author
-
Toennies, Klaus D.
- Published
- 2024
- Full Text
- View/download PDF
6. A home-based hand rehabilitation platform for hemiplegic patients after stroke: A feasibility study
- Author
-
Jasem Banihani and Mohamed-Amine Choukou
- Subjects
Telerehabilitation ,Stroke ,Hemiplegia ,Semi-supervised training ,Science (General) ,Q1-390 ,Social sciences (General) ,H1-99 - Abstract
Background: Patients with stroke often experience weakened upper limbs, making daily tasks difficult to perform. Although rehabilitation devices are available, patients often relapse post-discharge due to insufficient practice. We present a home-based hand telerehabilitation intervention using the iManus™ platform comprising a sensorized glove, a mobile app for the patients, and a therapist portal for monitoring patient progress. Objectives: This research aimed to examine the feasibility, safety, and effectiveness of a home-based telerehabilitation intervention in improving hand function for individuals with mild stroke. A qualitative approach was also used to explore users' experiences, perceived benefits, and challenges associated with using the platform in a home setting. Methods: In this single-case study, we delivered a hand telerehabilitation intervention to a chronic stroke patient with impaired hand function using the iManus™ platform. The intervention consisted of 40 home sessions over eight weeks. We assessed feasibility through user adherence and feedback obtained using a System Usability Scale (SUS) and a semi-structured interview with the participant and their informal caregiver. Safety was evaluated by monitoring pain levels using the Visual Analog Scale (VAS), and efficacy was determined by observing the changes in the fingers’ range of motion using the iManus™ platform and clinical outcomes measures, namely the Fugl-Meyer Assessment (FMA) and Jebsen Taylor Hand Function Test (JTHFT). Results: Our participant completed all the assigned sessions, with each averaging 20 min. Usability scored 77.5 out of 100 on the SUS. User feedback from the interviews revealed improved mobility and control over therapy as benefits, indicating room for improvement in the intervention's adaptability and functionality. During the intervention, the participant noted no pain increase, and the telerehabilitation platform recorded range of motion improvements for all finger and wrist joints, excluding wrist extension. The FMA scores were 43 at T0, 53 at T1, and 56 at T2, while the JTHFT scores were 223 at T0, 188 at T1, and 240 at T2. Conclusions: This single case study demonstrated the preliminary feasibility, safety, and efficacy of a novel home-based hand intervention for stroke survivors. The participant showed improved hand functions, good adherence to the program, and reported satisfaction with the intervention. However, these results are based on a single-case study, and further large-scale studies are needed before any generalization is recommended.
- Published
- 2024
- Full Text
- View/download PDF
7. Detection of traffic congestion in road-occupied electric power construction based on video recognition
- Author
-
ZHANG Ke, WU Jiaqi, CHEN Weicheng, YAN Yunfeng, and QI Donglian
- Subjects
road-occupied power construction ,congestion detection ,video recognition ,domain adaptation ,semi-supervised training ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Traffic congestion detection currently relies mostly on human monitoring and sensor monitoring. However, such detection devices are often lacking at road-occupied electric power construction sites. To meet the need for congestion detection with low equipment dependency and high accuracy at road-occupied electric power construction sites, a detection method based on video data is proposed, which uses neural networks to extract features from video data and determine whether traffic congestion is present. To address the scarcity of data from road-occupied electric power construction, the generalization of the network is improved by making full use of a generic traffic scene dataset, and an adaptive learning method based on domain adversarial neural networks (DANN) is used to reduce the performance gap between the two data domains in the feature extraction network. Semi-supervised learning (SSL) is used to reduce the manual labeling workload. The experimental results show that the proposed method achieves an accuracy of 93.2% in traffic congestion detection and recognition at road-occupied electric power construction sites and has high application value.
- Published
- 2023
- Full Text
- View/download PDF
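Record 7 above reduces the gap between generic traffic-scene data and road-occupied construction data with a domain adversarial neural network (DANN). A minimal sketch of the standard gradient-reversal trick behind DANN is given below (PyTorch); it is not the authors' network, and the feature dimension and classifier head are assumptions.

    import torch
    from torch import nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; multiplies gradients by -lambda in the backward
        pass, so the feature extractor is trained to confuse the domain classifier."""
        @staticmethod
        def forward(ctx, x, lamb):
            ctx.lamb = lamb
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lamb * grad_output, None

    class DomainClassifier(nn.Module):
        def __init__(self, feat_dim, lamb=1.0):
            super().__init__()
            self.lamb = lamb
            self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))

        def forward(self, features):
            reversed_feats = GradReverse.apply(features, self.lamb)
            return self.net(reversed_feats)   # logits: generic traffic data vs. construction-site data

    # usage (hypothetical): domain_logits = DomainClassifier(256)(feature_extractor(video_clip))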
8. Anomaly Detection for Wind Turbines Using Long Short-Term Memory-Based Variational Autoencoder Wasserstein Generation Adversarial Network under Semi-Supervised Training.
- Author
-
Zhang, Chen and Yang, Tao
- Subjects
- *
GENERATIVE adversarial networks , *SUPERVISED learning , *ANOMALY detection (Computer security) , *WIND turbines , *PROBABILITY density function , *DATA distribution - Abstract
Intelligent anomaly detection for wind turbines using deep-learning methods has been extensively researched and yielded significant results. However, supervised learning necessitates sufficient labeled data to establish the discriminant boundary, while unsupervised learning lacks prior knowledge and heavily relies on assumptions about the distribution of anomalies. A long short-term memory-based variational autoencoder Wasserstein generation adversarial network (LSTM-based VAE-WGAN) was established in this paper to address the challenge of small and noisy wind turbine datasets. The VAE was utilized as the generator, with LSTM units replacing hidden layer neurons to effectively extract spatiotemporal factors. The similarity between the model-fit distribution and the true distribution was quantified using the Wasserstein distance, enabling complex high-dimensional data distributions to be learned. To enhance the performance and robustness of the proposed model, a two-stage adversarial semi-supervised training approach was implemented. Subsequently, a monitoring indicator based on reconstruction error was defined, with the threshold set at the 99.7% confidence level of the distribution curve fitted by kernel density estimation (KDE). Real cases from a wind farm in northeast China confirm the feasibility and advancement of the proposed model, and the effects of various applied parameters are also discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
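Record 8 above sets its anomaly threshold at a 99.7% confidence level of a kernel-density estimate fitted to reconstruction errors. A minimal sketch of that thresholding step, assuming SciPy's gaussian_kde and a quantile over samples drawn from the fitted density (the paper's exact procedure may differ), could look like this:

    import numpy as np
    from scipy.stats import gaussian_kde

    def kde_threshold(reconstruction_errors, confidence=0.997, n_samples=100_000):
        """Fit a kernel density estimate to healthy-data reconstruction errors and
        return the value below which `confidence` of the fitted distribution lies."""
        kde = gaussian_kde(reconstruction_errors)
        samples = kde.resample(n_samples).ravel()
        return np.quantile(samples, confidence)

    # hypothetical reconstruction errors collected under normal operating conditions
    errors = np.abs(np.random.default_rng(1).normal(0.0, 0.05, size=2000))
    threshold = kde_threshold(errors)
    is_anomalous = lambda e: e > threshold
    print(threshold, is_anomalous(0.3))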
9. Semi-supervised Monocular 3D Object Detection by Multi-view Consistency
- Author
-
Lian, Qing, Xu, Yanbo, Yao, Weilong, Chen, Yingcong, Zhang, Tong, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
- Published
- 2022
- Full Text
- View/download PDF
10. Using Extracted Emotion Cause to Improve Content-Relevance for Empathetic Conversation Generation
- Author
-
Zou, Minghui, Pan, Rui, Zhang, Sai, Zhang, Xiaowang, Sun, Maosong, editor, Liu, Yang, editor, Che, Wanxiang, editor, Feng, Yang, editor, Qiu, Xipeng, editor, Rao, Gaoqi, editor, and Chen, Yubo, editor
- Published
- 2022
- Full Text
- View/download PDF
11. SCAF: Skip-Connections in Auto-encoder for Face Alignment with Few Annotated Data
- Author
-
Dornier, Martin, Gosselin, Philippe-Henri, Raymond, Christian, Ricquebourg, Yann, Coüasnon, Bertrand, Sclaroff, Stan, editor, Distante, Cosimo, editor, Leo, Marco, editor, Farinella, Giovanni M., editor, and Tombari, Federico, editor
- Published
- 2022
- Full Text
- View/download PDF
12. Adversarial Semi-Supervised Semantic Segmentation Incorporating an Attention Mechanism (融合注意力机制的对抗式半监督语义分割).
- Author
-
云飞, 殷雁君, 张文轩, and 智敏
- Subjects
CONVOLUTIONAL neural networks ,GENERATIVE adversarial networks ,COMPUTER vision ,IMAGE segmentation ,PROBLEM solving ,MATHEMATICAL convolutions ,DIGITAL image correlation ,SUPERVISED learning - Abstract
Copyright of Journal of Computer Engineering & Applications is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
13. Character region extraction of wheel water meter based on object detection.
- Author
-
Zhu, Guanhua, Zhao, Qianhui, Zhang, Zeyu, Huang, Quansi, and Cheng, Ming
- Subjects
- *
OBJECT recognition (Computer vision) , *WATER meters , *LABOR costs , *PROBLEM solving , *LABOR time - Abstract
Currently, research on automatic meter reading mainly focuses on meter reading recognition, while neglecting the fundamental role of counter detection in the entire automatic meter reading system. In fact, only by accurately locating the counter area can the influence of dial factors be completely eliminated, thus ensuring the accuracy and reliability of subsequent water meter reading recognition. In view of this, the focus of this study is on the counter detection stage. Firstly, a target detection-based image skew correction method is proposed to solve the problem of image skew caused by shooting angle and other factors. This method ensures the accuracy of subsequent counter area positioning and a clean cropping result. Secondly, a semi-supervised target detection training method is proposed to reduce the time and manpower costs required in large-scale data scenarios. In addition, we have made publicly available a dataset containing 1070 water meter images for non-commercial purposes, which can be obtained from GitHub (https://github.com/QuanhuiZhao/water-datasets). Finally, we evaluated our model on three completely different datasets and compared it with the best positioning results of other models. The experimental results show that, compared with other models, the proposed model improves the positioning accuracy by 5.82%, 5.96%, and 9.20% on the three datasets respectively. Furthermore, in the final visualization comparison, the model accurately identifies the counter region even when faced with complex real-world environments. • Innovative Approach: The manuscript introduces a novel image skew correction method based on object detection, specifically tailored for character region extraction in wheel water meters, enhancing the accuracy of automatic meter reading systems. • Target Detection: A semi-supervised target detection training approach is proposed to reduce the high time and labor costs associated with large-scale data annotation, improving efficiency in the field of automatic meter reading. • Dataset Contribution: The authors have created and made publicly available a dataset comprising 1070 water meter images for non-commercial use, facilitating further research and development in the area. • Performance Improvement: Experimental results demonstrate that the proposed model achieves significant improvements in positioning accuracy over existing models, with increases of 5.82%, 5.96%, and 9.20% on three different datasets, respectively. • Complex Environments Application: The method shows superior performance in complex environments, accurately visualizing and identifying counter regions, which is crucial for practical applications in automatic meter reading. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
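Record 13 above proposes an image skew correction step driven by object detection. The exact method is not detailed in the abstract, so the sketch below only shows a generic version of the idea: given the detected corners of the counter region, the image is rotated so the counter becomes horizontal, using standard OpenCV calls; the detector output format is an assumption.

    import cv2
    import numpy as np

    def deskew_by_detected_box(image, box_points):
        """Rotate the image so that the detected counter box becomes horizontal.
        box_points: Nx2 array of the detected (possibly skewed) counter corners."""
        rect = cv2.minAreaRect(np.asarray(box_points, dtype=np.float32))
        (cx, cy), (w, h), angle = rect
        if w < h:                       # normalise OpenCV's angle convention to the long side
            angle -= 90
        M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        H, W = image.shape[:2]
        return cv2.warpAffine(image, M, (W, H), flags=cv2.INTER_LINEAR)

    # usage (hypothetical): corrected = deskew_by_detected_box(img, detector(img)["counter_corners"])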
14. Anomaly Detection for Wind Turbines Using Long Short-Term Memory-Based Variational Autoencoder Wasserstein Generation Adversarial Network under Semi-Supervised Training
- Author
-
Chen Zhang and Tao Yang
- Subjects
wind turbine ,anomaly detection ,long short-term memory-based (LSTM-based) ,variational autoencoder Wasserstein generation adversarial network (VAE-WGAN) ,semi-supervised training ,Technology - Abstract
Intelligent anomaly detection for wind turbines using deep-learning methods has been extensively researched and yielded significant results. However, supervised learning necessitates sufficient labeled data to establish the discriminant boundary, while unsupervised learning lacks prior knowledge and heavily relies on assumptions about the distribution of anomalies. A long short-term memory-based variational autoencoder Wasserstein generation adversarial network (LSTM-based VAE-WGAN) was established in this paper to address the challenge of small and noisy wind turbine datasets. The VAE was utilized as the generator, with LSTM units replacing hidden layer neurons to effectively extract spatiotemporal factors. The similarity between the model-fit distribution and the true distribution was quantified using the Wasserstein distance, enabling complex high-dimensional data distributions to be learned. To enhance the performance and robustness of the proposed model, a two-stage adversarial semi-supervised training approach was implemented. Subsequently, a monitoring indicator based on reconstruction error was defined, with the threshold set at the 99.7% confidence level of the distribution curve fitted by kernel density estimation (KDE). Real cases from a wind farm in northeast China confirm the feasibility and advancement of the proposed model, and the effects of various applied parameters are also discussed.
- Published
- 2023
- Full Text
- View/download PDF
15. Advancing Brain Metastases Detection in T1-Weighted Contrast-Enhanced 3D MRI Using Noisy Student-Based Training.
- Author
-
Dikici, Engin, Nguyen, Xuan V., Bigelow, Matthew, Ryu, John L., and Prevedello, Luciano M.
- Subjects
- *
CONTRAST-enhanced magnetic resonance imaging , *CONVOLUTIONAL neural networks , *SIGNAL convolution , *CANCER prognosis - Abstract
The detection of brain metastases (BM) in their early stages could have a positive impact on the outcome of cancer patients. The authors previously developed a framework for detecting small BM (with diameters of <15 mm) in T1-weighted contrast-enhanced 3D magnetic resonance images (T1c). This study aimed to advance the framework with a noisy-student-based self-training strategy to use a large corpus of unlabeled T1c data. Accordingly, a sensitivity-based noisy-student learning approach was formulated to provide high BM detection sensitivity with a reduced count of false positives. This paper (1) proposes student/teacher convolutional neural network architectures, (2) presents data and model noising mechanisms, and (3) introduces a novel pseudo-labeling strategy factoring in the sensitivity constraint. The evaluation was performed using 217 labeled and 1247 unlabeled exams via two-fold cross-validation. The framework utilizing only the labeled exams produced 9.23 false positives for 90% BM detection sensitivity, whereas the one using the introduced learning strategy led to ~9% reduction in false detections (i.e., 8.44). Significant reductions in false positives (>10%) were also observed in reduced labeled data scenarios (using 50% and 75% of labeled data). The results suggest that the introduced strategy could be utilized in existing medical detection applications with access to unlabeled datasets to elevate their performances. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
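Record 15 above introduces a pseudo-labeling strategy that factors in a sensitivity constraint. A much-simplified sketch of one way such a constraint can be enforced is shown below: pick the detection-score threshold on labeled validation data that keeps the required sensitivity, then pseudo-label unlabeled exams with the teacher at that threshold. The numbers and helper names are illustrative, not the authors' implementation.

    import numpy as np

    def threshold_for_sensitivity(scores_pos, target_sensitivity=0.90):
        """Choose the highest detection-score threshold that still keeps at least
        `target_sensitivity` of the true lesions on labeled validation data."""
        scores_pos = np.sort(np.asarray(scores_pos))
        k = int(np.floor((1.0 - target_sensitivity) * len(scores_pos)))
        return scores_pos[k]            # all but the k lowest-scoring lesions stay above it

    def pseudo_label(teacher_scores_unlabeled, threshold):
        """Keep teacher detections above the sensitivity-calibrated threshold as pseudo labels."""
        return [i for i, s in enumerate(teacher_scores_unlabeled) if s >= threshold]

    val_lesion_scores = [0.42, 0.55, 0.61, 0.70, 0.77, 0.81, 0.88, 0.90, 0.93, 0.97]
    thr = threshold_for_sensitivity(val_lesion_scores, 0.90)    # keeps 9 of 10 lesions
    print(thr, pseudo_label([0.30, 0.52, 0.66, 0.91], thr))     # 0.55 [2, 3]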
16. Advancing Brain Metastases Detection in T1-Weighted Contrast-Enhanced 3D MRI Using Noisy Student-Based Training
- Author
-
Engin Dikici, Xuan V. Nguyen, Matthew Bigelow, John L. Ryu, and Luciano M. Prevedello
- Subjects
brain metastases ,noisy student ,semi-supervised training ,Medicine (General) ,R5-920 - Abstract
The detection of brain metastases (BM) in their early stages could have a positive impact on the outcome of cancer patients. The authors previously developed a framework for detecting small BM (with diameters of <15 mm) in T1-weighted contrast-enhanced 3D magnetic resonance images (T1c). This study aimed to advance the framework with a noisy-student-based self-training strategy to use a large corpus of unlabeled T1c data. Accordingly, a sensitivity-based noisy-student learning approach was formulated to provide high BM detection sensitivity with a reduced count of false positives. This paper (1) proposes student/teacher convolutional neural network architectures, (2) presents data and model noising mechanisms, and (3) introduces a novel pseudo-labeling strategy factoring in the sensitivity constraint. The evaluation was performed using 217 labeled and 1247 unlabeled exams via two-fold cross-validation. The framework utilizing only the labeled exams produced 9.23 false positives for 90% BM detection sensitivity, whereas the one using the introduced learning strategy led to ~9% reduction in false detections (i.e., 8.44). Significant reductions in false positives (>10%) were also observed in reduced labeled data scenarios (using 50% and 75% of labeled data). The results suggest that the introduced strategy could be utilized in existing medical detection applications with access to unlabeled datasets to elevate their performances.
- Published
- 2022
- Full Text
- View/download PDF
17. Semi-Supervised Training of DNN-Based Acoustic Model for ATC Speech Recognition
- Author
-
Šmídl, Luboš, Švec, Jan, Pražák, Aleš, Trmal, Jan, Karpov, Alexey, editor, Jokisch, Oliver, editor, and Potapova, Rodmonga, editor
- Published
- 2018
- Full Text
- View/download PDF
18. Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation
- Author
-
Rhodin, Helge, Salzmann, Mathieu, Fua, Pascal, Ferrari, Vittorio, editor, Hebert, Martial, editor, Sminchisescu, Cristian, editor, and Weiss, Yair, editor
- Published
- 2018
- Full Text
- View/download PDF
19. Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification.
- Author
-
Zeng, Jinxiang, Zhang, Du, Li, Zhiyi, and Li, Xiaolin
- Subjects
AUTOMATIC speech recognition ,SPEECH perception ,CLASSIFICATION - Abstract
Aiming at the audio event recognition problem in speech recognition, a decision fusion method based on the Transformer and Causal Dilated Convolutional Network (TCDCN) framework is proposed. This method can model sound events over long time spans, capture temporal correlations, and effectively deal with the sparsity of audio data. Our dataset consists of audio clips cropped from YouTube. In order to identify audio topics reliably and stably, we experiment with different features and different loss function formulations to find the best model configuration. The experimental results from different test models show that the TCDCN model proposed in this paper achieves better recognition results than classification using plain neural networks and other fusion methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
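Records 19, 24 and 34 in this list all describe the same TCDCN framework. The abstract does not give the architecture in detail, but the "causal dilated convolution" it names is a standard building block; a minimal PyTorch sketch of such a block (with assumed kernel size and dilation, not the paper's settings) follows.

    import torch
    from torch import nn
    import torch.nn.functional as F

    class CausalDilatedConv1d(nn.Module):
        """1-D convolution that never looks into the future: the input is padded
        on the left by (kernel_size - 1) * dilation frames."""
        def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation
            self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

        def forward(self, x):                      # x: (batch, channels, time)
            x = F.pad(x, (self.pad, 0))            # pad only on the left (the past)
            return self.conv(x)

    # stacking blocks with dilations 1, 2, 4, ... grows the receptive field exponentially
    x = torch.randn(8, 40, 200)                    # e.g. 40-dim audio features, 200 frames
    block = CausalDilatedConv1d(40, 64, kernel_size=3, dilation=4)
    print(block(x).shape)                          # torch.Size([8, 64, 200])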
20. A semi-supervised recurrent neural network for video salient object detection.
- Author
-
Kompella, Aditya and Kulkarni, Raghavendra V.
- Subjects
- *
RECURRENT neural networks , *SUPERVISED learning , *MACHINE learning , *DEEP learning - Abstract
A semi-supervised, one-dimensional recurrent neural network (RNN) approach called RVS has been proposed in this paper for video salient object detection. The proposed RVS approach involves the processing of each frame independently without explicitly considering temporal information. The RNN is trained using one-dimensional superpixel features to classify the salient object regions into salient foreground and non-salient background superpixels. Deep learning algorithms generally exhibit heavy dependence on training data size and often take an extremely long time for training. On the contrary, the proposed RVS approach involves the training of an RNN using a small dataset, which results in a significant reduction in training time. The RVS approach has been extensively evaluated and its results are compared with those of several state-of-the-art methods using the public-domain VideoSeg, SegTrack v1 and SegTrack v2 benchmark video datasets. Further, the RVS approach has been tested using the authors' own video dataset and the complex DAVIS and video object segmentation datasets to evaluate the impact of motion and blur on its performance. The RVS approach delivers results superior to those of several approaches that strongly rely upon spatio-temporal features in detecting the salient objects from the video sequences. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
21. Semi-Supervised Speech Recognition Acoustic Model Training Using Policy Gradient.
- Author
-
Chung, Hoon, Lee, Sung Joo, Jeon, Hyeong Bae, and Park, Jeon Gue
- Subjects
SPEECH perception ,AUTOMATIC speech recognition ,REINFORCEMENT learning ,ACOUSTIC models ,MACHINE learning ,CYCLING training - Abstract
In this paper, we propose policy gradient-based semi-supervised speech recognition acoustic model training. In practice, self-training and teacher/student learning are among the most widely used semi-supervised training methods due to their scalability and effectiveness. These methods are based on generating pseudo labels for unlabeled samples using a pre-trained model and selecting reliable samples using a confidence measure. However, there are some considerations in this approach. The generated pseudo labels can be biased depending on which pre-trained model is used, and the training process can be complicated because the confidence measure is usually carried out in post-processing using external knowledge. Therefore, to address these issues, we propose a policy gradient method-based approach. Policy gradient is a reinforcement learning algorithm for finding an optimal behavior strategy for an agent to obtain optimal rewards. The policy gradient-based approach provides a framework for exploring unlabeled data as well as exploiting labeled data, and it also provides a way to incorporate external knowledge in the same training cycle. The proposed approach was evaluated on an in-house non-native Korean recognition domain. The experimental results show that the method is effective in semi-supervised acoustic model training. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
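Record 21 above trains the acoustic model with a policy gradient objective over pseudo-labeled data. The sketch below shows only the generic REINFORCE-style loss that policy-gradient training of this kind relies on, with a mean-reward baseline; the reward definition and sampling scheme in the paper are not reproduced, and the toy numbers are invented.

    import torch

    def policy_gradient_loss(log_probs, rewards, baseline=None):
        """REINFORCE-style loss: maximise the expected reward by weighting the
        log-probability of each sampled hypothesis (pseudo label) with its reward,
        e.g. a confidence or external-knowledge score.
        log_probs, rewards: 1-D tensors over sampled hypotheses."""
        if baseline is None:
            baseline = rewards.mean()              # simple variance-reduction baseline
        return -((rewards - baseline).detach() * log_probs).mean()

    # toy example: three sampled transcription hypotheses for an unlabeled utterance
    log_probs = torch.tensor([-2.1, -1.3, -3.0], requires_grad=True)
    rewards = torch.tensor([0.6, 0.9, 0.2])
    loss = policy_gradient_loss(log_probs, rewards)
    loss.backward()
    print(loss.item(), log_probs.grad)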
22. Information Extraction
- Author
-
Grishman, Ralph and Mitkov, Ruslan, book editor
- Published
- 2022
- Full Text
- View/download PDF
23. An exploration of semi-supervised and language-adversarial transfer learning using hybrid acoustic model for hindi speech recognition
- Author
-
Kumar, Ankit and Aggarwal, Rajesh Kumar
- Published
- 2022
- Full Text
- View/download PDF
24. Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification
- Author
-
Jinxiang Zeng, Du Zhang, Zhiyi Li, and Xiaolin Li
- Subjects
topic classification ,automatic speech recognition ,semi-supervised learning ,semi-supervised training ,Transformer and Causal Dilated Convolution Network ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Aiming at the audio event recognition problem in speech recognition, a decision fusion method based on the Transformer and Causal Dilated Convolutional Network (TCDCN) framework is proposed. This method can model sound events over long time spans, capture temporal correlations, and effectively deal with the sparsity of audio data. Our dataset consists of audio clips cropped from YouTube. In order to identify audio topics reliably and stably, we experiment with different features and different loss function formulations to find the best model configuration. The experimental results from different test models show that the TCDCN model proposed in this paper achieves better recognition results than classification using plain neural networks and other fusion methods.
- Published
- 2021
- Full Text
- View/download PDF
25. A comprehensive review of extreme learning machine on medical imaging.
- Author
-
Huérfano-Maldonado, Yoleidy, Mora, Marco, Vilches, Karina, Hernández-García, Ruber, Gutiérrez, Rodrigo, and Vera, Miguel
- Subjects
- *
COMPUTER-assisted image analysis (Medicine) , *MACHINE learning , *FEEDFORWARD neural networks , *DIAGNOSTIC imaging , *SUPERVISED learning , *IMAGE processing , *BREAST - Abstract
The feedforward neural network based on randomization has been of great interest in the scientific community, particularly extreme learning machines, due to its simplicity, training speed, and levels of accuracy comparable to traditional learning algorithms. Extreme learning machines (ELMs) are a type of artificial neural network (ANN) with one or more hidden layers that are trained under supervised, unsupervised, or semi-supervised learning approaches. These networks are widely used in various research areas, such as medical image processing (MI). This research work presents an exhaustive review of extreme learning machines (ELM) and medical image processing (MI), due to the high impact that these networks have had on the scientific community and the importance of MI for physicians who use them to diagnose different injuries and diseases. First, the theoretical construct of ELMs is developed based on the types of supervised, unsupervised, and semi-supervised learning. Then, the importance of MI for the diagnosis of a disease or classification of the most commonly used imaging modalities is analyzed for articles concerning radiography, computed tomography (CT), magnetic resonance (MR), ultrasound (US), and mammography (MG). Next, the reference data sets linked to various human body organs, such as the brain, lungs, skin, eyes, breasts, and cervix are described. Then, a review, analysis, and classification of the development of the last 6 years (2017–2022) of ELMs, based on learning types and MI, is performed. With the information obtained above, a construction of summary tables of the articles, classified according to the type of learning, is performed, highlighting the organ, reference, year, methodology, database, modality, and results. Finally, the discussion, conclusions and challenges related to this topic are presented. The findings indicate that the review articles reported in the literature have not addressed the relationship between ELMs and medical imaging in depth and have excluded key aspects, which are developed in this article. These aspects include a comprehensive analysis of the most popular imaging modalities, a detailed description of both the most popular databases and the most relevant databases for the machine learning community and, finally, the incorporation of schemes that explain the fundamentals of the main learnings considered when generating ELM-based trained smart models, which can be useful for medical image processing. • Papers on Extreme Learning Machine medical imaging applications are discussed. • Supervised, unsupervised and semi-supervised learning are considering. • The evolution of pseudo-inverse idea is examined. • Image repositories using in medical imaging are reviewed • Research trends on Extreme Learning Machine medical imaging are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
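Record 25 above reviews extreme learning machines, which train a single-hidden-layer network by fixing random input weights and solving the output weights in closed form with a pseudo-inverse. A minimal NumPy version of that idea (a toy regression, not any of the reviewed medical-imaging models) is:

    import numpy as np

    class ELM:
        """Single-hidden-layer feedforward network with random, fixed hidden weights;
        only the output weights are solved in closed form via the pseudo-inverse."""
        def __init__(self, n_hidden=100, seed=0):
            self.n_hidden = n_hidden
            self.rng = np.random.default_rng(seed)

        def fit(self, X, Y):
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))   # random input weights
            self.b = self.rng.normal(size=self.n_hidden)                 # random biases
            H = np.tanh(X @ self.W + self.b)                             # hidden activations
            self.beta = np.linalg.pinv(H) @ Y                            # output weights, closed form
            return self

        def predict(self, X):
            return np.tanh(X @ self.W + self.b) @ self.beta

    # toy regression: learn y = sin(x)
    X = np.linspace(-3, 3, 200).reshape(-1, 1)
    Y = np.sin(X)
    model = ELM(n_hidden=50).fit(X, Y)
    print(np.abs(model.predict(X) - Y).mean())   # small training error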
26. Semi-Supervised Speech Recognition Acoustic Model Training Using Policy Gradient
- Author
-
Hoon Chung, Sung Joo Lee, Hyeong Bae Jeon, and Jeon Gue Park
- Subjects
speech recognition ,semi-supervised training ,reinforcement learning ,policy gradient ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
In this paper, we propose policy gradient-based semi-supervised speech recognition acoustic model training. In practice, self-training and teacher/student learning are among the most widely used semi-supervised training methods due to their scalability and effectiveness. These methods are based on generating pseudo labels for unlabeled samples using a pre-trained model and selecting reliable samples using a confidence measure. However, there are some considerations in this approach. The generated pseudo labels can be biased depending on which pre-trained model is used, and the training process can be complicated because the confidence measure is usually carried out in post-processing using external knowledge. Therefore, to address these issues, we propose a policy gradient method-based approach. Policy gradient is a reinforcement learning algorithm for finding an optimal behavior strategy for an agent to obtain optimal rewards. The policy gradient-based approach provides a framework for exploring unlabeled data as well as exploiting labeled data, and it also provides a way to incorporate external knowledge in the same training cycle. The proposed approach was evaluated on an in-house non-native Korean recognition domain. The experimental results show that the method is effective in semi-supervised acoustic model training.
- Published
- 2020
- Full Text
- View/download PDF
27. Semi-supervised acoustic model training for speech with code-switching.
- Author
-
Yılmaz, Emre, McLaren, Mitchell, Heuvel, Henk van den, and Leeuwen, David A. van
- Subjects
- *
AUTOMATIC speech recognition , *CODE switching (Linguistics) , *FRISIAN language , *BILINGUALISM , *LANGUAGE & languages , *PROGRAMMED instruction - Abstract
Abstract In the FAME! project, we aim to develop an automatic speech recognition (ASR) system for Frisian-Dutch code-switching (CS) speech extracted from the archives of a local broadcaster with the ultimate goal of building a spoken document retrieval system. Unlike Dutch, Frisian is a low-resourced language with a very limited amount of manually annotated speech data. In this paper, we describe several automatic annotation approaches to enable using of a large amount of raw bilingual broadcast data for acoustic model training in a semi-supervised setting. Previously, it has been shown that the best-performing ASR system is obtained by two-stage multilingual deep neural network (DNN) training using 11 hours of manually annotated CS speech (reference) data together with speech data from other high-resourced languages. We compare the quality of transcriptions provided by this bilingual ASR system with several other approaches that use a language recognition system for assigning language labels to raw speech segments at the front-end and using monolingual ASR resources for transcription. We further investigate automatic annotation of the speakers appearing in the raw broadcast data by first labeling with (pseudo) speaker tags using a speaker diarization system and then linking to the known speakers appearing in the reference data using a speaker recognition system. These speaker labels are essential for speaker-adaptive training in the proposed setting. We train acoustic models using the manually and automatically annotated data and run recognition experiments on the development and test data of the FAME! speech corpus to quantify the quality of the automatic annotations. The ASR and CS detection results demonstrate the potential of using automatic language and speaker tagging in semi-supervised bilingual acoustic model training. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
28. Stacked graph bone region U-net with bone representation for hand pose estimation and semi-supervised training.
- Author
-
Zheng, Zhiwei, Hu, Zhongxu, Qin, Hui, and Liu, Jie
- Subjects
- *
DEEP learning - Abstract
3D hand estimation from 2D joint information is an essential task in human-machine interaction, which has achieved great progress as an application of deep learning. However, regression-based methods do not perform well because the structural information is not effectively exploited, and the joint coordinates are variable. To address these issues, the hand pose is represented with bone vectors instead of joint coordinates in this study, which are stabler to learn and allow for easier encoding of the hand geometric structure and joint dependency. A novel graph bone region U-Net is specifically designed for bone representation to learn multiscale structural features, where the proposed novel elements (graph convolution, pooling and unpooling) incorporate hand structural knowledge. Under the introduced "finger-to-hand" framework, the network gradually decreases the scale from bone to finger to hand for learning more meaningful multiscale features. Moreover, the unit network is stacked repeatedly to extract multilevel features. Based on the above network, a simple but effective semi-supervised approach is introduced to address the lack of 3D hand pose labels. Many experiments are conducted to evaluate the proposed approach on two challenging datasets. The experimental results show that the proposed supervised approach outperforms the state-of-the-art methods, and the proposed semi-supervised approach can still achieve favorable performance when the labeled data are scarce. • A new hand pose representation is proposed for 3D hand pose estimation. • A Stacked Graph Bone Region U-Net is proposed for 3D hand pose estimation. • A semi-supervised bone training approach is adopted to relieve the label lack. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
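Record 28 above represents the hand with bone vectors rather than joint coordinates. The conversion itself is simple: each bone is the child joint minus its parent joint along the kinematic tree. The sketch below uses a hypothetical 21-joint parent layout; the paper's exact joint ordering may differ.

    import numpy as np

    # hypothetical parent index for a 21-joint hand (0 = wrist); -1 marks the root
    PARENT = np.array([-1, 0, 1, 2, 3,  0, 5, 6, 7,  0, 9, 10, 11,
                       0, 13, 14, 15,  0, 17, 18, 19])

    def joints_to_bones(joints, parent=PARENT):
        """Convert per-joint 2D/3D coordinates into bone vectors (child minus parent),
        a representation that encodes hand structure and joint dependency."""
        bones = np.zeros_like(joints)
        for j, p in enumerate(parent):
            bones[j] = joints[j] - (joints[p] if p >= 0 else 0.0)
        return bones

    def bones_to_joints(bones, parent=PARENT):
        """Inverse mapping: accumulate bone vectors along the kinematic chain."""
        joints = np.zeros_like(bones)
        for j, p in enumerate(parent):
            joints[j] = bones[j] + (joints[p] if p >= 0 else 0.0)
        return joints

    joints = np.random.default_rng(0).normal(size=(21, 3))
    assert np.allclose(bones_to_joints(joints_to_bones(joints)), joints)   # exact round trip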
29. FedEntropy: Information-entropy-aided training optimization of semi-supervised federated learning.
- Author
-
Qian, Dongwei, Cui, Yangguang, Fu, Yufei, Liu, Feng, and Wei, Tongquan
- Subjects
- *
SUPERVISED learning , *DISTRIBUTION (Probability theory) , *DATA privacy , *ENTROPY (Information theory) , *MACHINE learning , *GLOBAL method of teaching - Abstract
Emerging federated learning (FL) is able to train a global machine learning (ML) model using decentralized data from various clients, without exposing the clients' private data. Traditional FL assumes that the training data are labeled, but in reality the data captured by the clients are usually unlabeled. Manual data labeling, the common approach, is very expensive in practice. To solve the above problems, in this paper we propose a semi-supervised federated learning scheme (FedEntropy) to improve model performance in the case where unlabeled data dominates the datasets. Specifically, our proposed FedEntropy first utilizes information entropy to jointly compute the loss of labeled and unlabeled data. Subsequently, assisted by an inverse-trigonometric-based adaptive proportional adjustment algorithm, FedEntropy is able to dynamically set the ratio between the loss of labeled and unlabeled data. In particular, we prove the effectiveness of the information entropy function in training on unlabeled data and in reducing the probability distribution gap of the datasets. Extensive experimental results demonstrate that, compared with state-of-the-art methods, our FedEntropy not only achieves an accuracy improvement of up to 6.42% on two common datasets, but also reduces the computation overhead of semi-supervised FL training by approximately half. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
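Record 29 above combines a supervised loss on labeled data with an information-entropy loss on unlabeled data, weighted by an adaptively adjusted ratio. A minimal sketch is below; the arctan schedule stands in for the paper's "inverse-trigonometric-based" adjustment, whose exact formula is not given in the abstract, and all dimensions are toy values.

    import math
    import torch
    import torch.nn.functional as F

    def prediction_entropy(logits):
        """Mean Shannon entropy of the softmax predictions (lower means more confident)."""
        p = F.softmax(logits, dim=1)
        return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

    def fed_entropy_loss(labeled_logits, labels, unlabeled_logits, round_idx, total_rounds):
        """Joint loss on labeled and unlabeled client data. The weighting below uses an
        arctan schedule as an assumed stand-in for the paper's adaptive adjustment."""
        sup = F.cross_entropy(labeled_logits, labels)
        unsup = prediction_entropy(unlabeled_logits)
        ratio = (2.0 / math.pi) * math.atan(round_idx / max(total_rounds * 0.25, 1))
        return sup + ratio * unsup

    # toy usage
    ll = torch.randn(16, 10); y = torch.randint(0, 10, (16,)); ul = torch.randn(64, 10)
    print(fed_entropy_loss(ll, y, ul, round_idx=10, total_rounds=100).item())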
30. Semi-supervised and unsupervised discriminative language model training for automatic speech recognition.
- Author
-
Dikici, Erinç and Saraçlar, Murat
- Subjects
- *
AUTOMATIC speech recognition , *SUPERVISED learning , *LANGUAGE & languages , *HYPOTHESIS , *TRANSCRIPTION (Linguistics) , *MATHEMATICAL models - Abstract
Discriminative language modeling aims to reduce the error rates by rescoring the output of an automatic speech recognition (ASR) system. Discriminative language model (DLM) training conventionally follows a supervised approach, using acoustic recordings together with their manual transcriptions (reference) as training data, and the recognition performance is improved with increasing amount of such matched data. In this study we investigate the case where matched data for DLM training is limited or is not available at all, and explore methods to improve ASR accuracy by incorporating acoustic and text data that come from separate sources. For semi-supervised training, we utilize a confusion model to generate artificial hypotheses instead of the real ASR N-bests. For unsupervised training, we propose three target output selection methods to take over the missing reference. We handle this task both as a structured prediction and a reranking problem and employ two different variants of the WER-sensitive perceptron algorithm. We show that significant improvement over baseline ASR accuracy is obtained even when there is no transcribed acoustic data available to train the DLM. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
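Record 30 above reranks ASR hypotheses with WER-sensitive perceptron variants. The sketch below shows only a plain structured-perceptron reranking update over n-best features (move the weights toward the lowest-WER hypothesis and away from the current model-best); the WER-sensitive weighting and the confusion-model hypotheses of the paper are not modelled.

    def perceptron_rerank_update(weights, nbest, lr=1.0):
        """One structured-perceptron step over an ASR n-best list.
        nbest: list of (features: dict[str, float], wer: float) hypotheses."""
        def score(feats):
            return sum(weights.get(k, 0.0) * v for k, v in feats.items())

        oracle = min(nbest, key=lambda h: h[1])[0]            # lowest-WER hypothesis
        model_best = max(nbest, key=lambda h: score(h[0]))[0] # current top-scoring hypothesis
        for k in set(oracle) | set(model_best):
            weights[k] = weights.get(k, 0.0) + lr * (oracle.get(k, 0.0) - model_best.get(k, 0.0))
        return weights

    # toy usage with hypothetical bigram-count features
    w = {}
    nbest = [({"the cat": 1, "cat sat": 1}, 0.0), ({"the cap": 1, "cap sat": 1}, 0.5)]
    print(perceptron_rerank_update(w, nbest))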
31. Lithuanian Broadcast Speech Transcription Using Semi-supervised Acoustic Model Training.
- Author
-
Lileikytė, Rasa, Gorin, Arseniy, Lamel, Lori, Gauvain, Jean-Luc, and Fraga-Silva, Thiago
- Subjects
ACOUSTIC models ,GRAPHEMICS ,ERROR rates ,MATHEMATICAL combinations ,COMPUTER users - Abstract
This paper reports on experimental work to build a speech transcription system for Lithuanian broadcast data, relying on unsupervised and semi-supervised training methods as well as on other low-knowledge methods to compensate for missing resources. Unsupervised acoustic model training is investigated using 360 hours of untranscribed speech data. A graphemic pronunciation approach is used to simplify the pronunciation model generation and therefore ease the language model adaptation for the system users. Discriminative training on top of semi-supervised training is also investigated, as well as various types of acoustic features and their combinations. Experimental results are provided for each of our development steps as well as contrastive results comparing various options. Using the best system configuration, a word error rate of 18.3% is obtained on a set of development data from the Quaero program. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
32. On the Learning Dynamics of Semi-Supervised Training for ASR
- Author
-
Benji Kershenbaum, Electra Wallington, Ondřej Klejch, and Peter Bell
- Subjects
semi-supervised training ,Computer science ,Learning dynamics ,Speech recognition ,speech recognition ,Supervised training - Abstract
The use of semi-supervised training (SST) has become an increasingly popular way of increasing the performance of ASR acoustic models without the need for further transcribed speech data. However, the performance of the technique can be very sensitive to the quality of the initial ASR system. This paper undertakes a comprehensive study of the improvements gained with respect to variation in the initial systems, the quantity of untranscribed data used, and the learning schedules. We postulate that the reason SST can be effective even when the initial model is poor is because it enables utterance-level information to be propagated to the frame level, and hence hypothesise that the quality of the language model plays a much larger role than the quality of the acoustic model. In experiments on Tagalog data from the IARPA MATERIAL programme, we find that indeed this is the case, and show that with an appropriately chosen recipe it is possible to achieve over 50% relative WER reductions from SST, even when the WER of the initial system is more than 80%.
- Published
- 2021
- Full Text
- View/download PDF
33. Artificial Intelligence for knowledge discovery and generation
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Ruiz Costa-Jussà, Marta, Escolano Peinado, Carlos, and Domingo Roig, Oriol
- Abstract
In recent years, the Artificial Intelligence industry has leveraged the power of computation, along with deep learning models, to build cutting-edge applications. Some of these applications, such as personal assistants or chat-bots, heavily rely on Knowledge Bases, data repositories about specific domains. However, not only do these databases need to constantly ingest new facts in order to stay updated with the latest information about their domain, but in most cases they also need to retrieve the ingested knowledge in a human-friendly manner. Focusing on the latter, making knowledge easily accessible to humans, this work gives automatic access to this knowledge through natural language. We do this by building a single model capable of extracting knowledge from natural language utterances, as well as generating utterances from given knowledge. The proposed solution, an efficient Transformer architecture, is trained in a multi-task semi-supervised environment, following a cycle training regime. We surpass state-of-the-art results in knowledge extraction for unsupervised models, and reach satisfactory results for the text generation task. The resulting model can be easily trained on any new domain with non-parallel data, simply by adding text and knowledge about it, within our cycle framework. More relevantly, this semi-supervised environment is useful for lifelong learning.
- Published
- 2021
34. Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification
- Author
-
Zhiyi Li, Jinxiang Zeng, Xiaolin Li, and Du Zhang
- Subjects
semi-supervised learning ,semi-supervised training ,Technology ,Transformer and Causal Dilated Convolution Network ,Computer science ,QH301-705.5 ,QC1-999 ,02 engineering and technology ,Semi-supervised learning ,Convolution ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Biology (General) ,Instrumentation ,QD1-999 ,Supervised training ,Transformer (machine learning model) ,Fluid Flow and Transfer Processes ,Artificial neural network ,business.industry ,Process Chemistry and Technology ,Event recognition ,Physics ,automatic speech recognition ,General Engineering ,020206 networking & telecommunications ,Pattern recognition ,Function (mathematics) ,Engineering (General). Civil engineering (General) ,Calculation methods ,Computer Science Applications ,topic classification ,Chemistry ,020201 artificial intelligence & image processing ,Artificial intelligence ,TA1-2040 ,business - Abstract
Aiming at the audio event recognition problem in speech recognition, a decision fusion method based on the Transformer and Causal Dilated Convolutional Network (TCDCN) framework is proposed. This method can model sound events over long time spans, capture temporal correlations, and effectively deal with the sparsity of audio data. Our dataset consists of audio clips cropped from YouTube. In order to identify audio topics reliably and stably, we experiment with different features and different loss function formulations to find the best model configuration. The experimental results from different test models show that the TCDCN model proposed in this paper achieves better recognition results than classification using plain neural networks and other fusion methods.
- Published
- 2021
- Full Text
- View/download PDF
35. Semi-supervised bootstrapping approach for neural network feature extractor training.
- Author
-
Grezl, Frantisek and Karafiat, Martin
- Abstract
This paper presents bootstrapping approach for neural network training. The neural networks serve as bottle-neck feature extractor for subsequent GMM-HMM recognizer. The recognizer is also used for transcription and confidence assignment of untranscribed data. Based on the confidence, segments are selected and mixed with supervised data and new NNs are trained. With this approach, it is possible to recover 40–55% of the difference between partially and fully transcribed data (3 to 5% absolute improvement over NN trained on supervised data only). Using 70–85% of automatically transcribed segments with the highest confidence was found optimal to achieve this result. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
36. Discriminative semi-supervised training for keyword search in low resource languages.
- Author
-
Hsiao, Roger, Ng, Tim, Grezl, Frantisek, Karakos, Damianos, Tsakalidis, Stavros, Nguyen, Long, and Schwartz, Richard
- Abstract
In this paper, we investigate semi-supervised training for low resource languages where the initial systems may have a high error rate (≥ 70.0% word error rate). To handle the lack of data, we study semi-supervised techniques including data selection, data weighting, discriminative training and multilayer perceptron learning to improve system performance. The entire suite of semi-supervised methods presented in this paper was evaluated under the IARPA Babel program for the keyword spotting tasks. Our semi-supervised system had the best performance in the OpenKWS13 surprise language evaluation for the limited condition. In this paper, we describe our work on the Turkish and Vietnamese systems. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
37. Semi-supervised training of Deep Neural Networks.
- Author
-
Vesely, Karel, Hannemann, Mirko, and Burget, Lukas
- Abstract
In this paper we search for an optimal strategy for semi-supervised Deep Neural Network (DNN) training. We assume that a small part of the data is transcribed, while the majority of the data is untranscribed. We explore self-training strategies with data selection based on both utterance-level and frame-level confidences. Further on, we study the interactions between semi-supervised frame-discriminative training and sequence-discriminative sMBR training. We found it beneficial to reduce the disproportion in the amounts of transcribed and untranscribed data by including the transcribed data several times, as well as to do frame selection based on per-frame confidences derived from confusion in a lattice. For the experiments, we used the Limited language pack condition for the Surprise language task (Vietnamese) from the IARPA Babel program. The absolute Word Error Rate (WER) improvement for frame cross-entropy training is 2.2%, which corresponds to a WER recovery of 36% when compared to the identical system where the DNN is built on the fully transcribed data. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
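Record 37 above reports two recipe choices that helped: repeating the transcribed data several times to offset the data imbalance, and selecting frames by per-frame confidence derived from lattice confusion. A schematic sketch of assembling such a training set is given below; the data structures and threshold are illustrative assumptions.

    def build_semi_supervised_frame_set(transcribed, untranscribed, repeat_transcribed=3,
                                        frame_conf_threshold=0.7):
        """transcribed: list of (features, label) frames with manual labels.
        untranscribed: list of (features, hypothesised_label, confidence) frames whose
        labels come from decoding, with confidences derived from lattice posteriors.
        Repeats the transcribed frames to offset the transcribed/untranscribed
        imbalance and drops low-confidence automatically labeled frames."""
        selected_auto = [(f, lab) for f, lab, c in untranscribed if c >= frame_conf_threshold]
        return transcribed * repeat_transcribed + selected_auto

    # toy usage with scalar "features"
    manual = [(0.1, "a"), (0.2, "b")]
    auto = [(0.3, "a", 0.9), (0.4, "b", 0.4), (0.5, "a", 0.8)]
    train = build_semi_supervised_frame_set(manual, auto)
    print(len(train))   # 2*3 + 2 = 8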
38. Multi-objective optimization for semi-supervised discriminative language modeling.
- Author
-
Kobayashi, Akio, Oku, Takahiro, Imai, Toru, and Nakagawa, Seiichi
- Abstract
A method for semi-supervised language modeling, which was designed to improve the robustness of a language model (LM) obtained from manually transcribed (labeled) data, is proposed. The LM is implemented as a log-linear model, which employs a set of linguistic features derived from word or phoneme n-grams. The proposed method is formulated as a multi-objective optimization programming problem (MOP), which consists of two objective functions based on expected risks for labeled lattices and automatic speech recognition (ASR) lattices as unlabeled training data. The model is trained in a discriminative manner and acquired as a solution to the problem. In transcribing Japanese broadcast programs, the proposed method reduced word error rate by 6.3% compared with that achieved by a conventional trigram LM. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
39. Artificial Intelligence for knowledge discovery and generation
- Author
-
Domingo Roig, Oriol, Ruiz Costa-Jussà, Marta, Escolano Peinado, Carlos, and Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
- Subjects
lifelong learning, semi-supervised training, Artificial intelligence, deep learning, knowledge bases, cycle training, multi-task model, Computer Science::Artificial Intelligence [UPC subject areas] - Abstract
In recent years, the Artificial Intelligence industry has leveraged the power of computation along with deep learning models to build cutting-edge applications. Some of these applications, such as personal assistants or chat-bots, heavily rely on Knowledge Bases, data repositories about specific domains. However, not only do these databases need to constantly ingest new facts to stay updated with the latest information about their domain, but in most cases they also need to retrieve the ingested knowledge in a human-friendly manner. Focusing on the latter, making knowledge easily accessible to humans, this work gives access to this knowledge automatically through natural language. We do this by building a single model capable of extracting knowledge from natural language utterances, as well as generating such utterances from given knowledge. The proposed solution, an efficient Transformer architecture, is trained in a multi-task semi-supervised setting, following a cycle training regime. We surpass state-of-the-art results in knowledge extraction for unsupervised models, and reach satisfactory results on the text generation task. The resulting model can easily be trained on any new domain with non-parallel data, simply by adding text and knowledge about it, thanks to our cycle framework. More relevantly, this semi-supervised setting is useful for lifelong learning.
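As a rough illustration of the cycle-training regime described in the abstract above, the sketch below alternates between the two task directions on non-parallel data: each unpaired sample is converted by one direction and reconstructed by the other, and the reconstruction is used as supervision. The functions `text_to_graph`, `graph_to_text`, and `train_step` are hypothetical placeholders standing in for the Transformer model, not the thesis code.

```python
# Minimal sketch of cycle training on non-parallel text and knowledge triples.
# All three helpers are hypothetical placeholders for a real seq2seq model.

def text_to_graph(model, text):
    """Extract knowledge triples from a sentence (placeholder)."""
    return model["t2g"](text)

def graph_to_text(model, triples):
    """Generate a sentence verbalising the triples (placeholder)."""
    return model["g2t"](triples)

def train_step(model, prediction, target):
    """Update the model on one (prediction, target) pair; returns a loss (placeholder)."""
    return 0.0

def cycle_training(model, unpaired_texts, unpaired_graphs, epochs=1):
    for _ in range(epochs):
        # Text cycle: text -> pseudo graph -> reconstructed text (original text is the target).
        for text in unpaired_texts:
            pseudo_graph = text_to_graph(model, text)
            train_step(model, graph_to_text(model, pseudo_graph), target=text)
        # Graph cycle: graph -> pseudo text -> reconstructed graph (original graph is the target).
        for graph in unpaired_graphs:
            pseudo_text = graph_to_text(model, graph)
            train_step(model, text_to_graph(model, pseudo_text), target=graph)
```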
- Published
- 2021
40. Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
- Author
-
Ondrej Klejch, Electra Wallington, Peter Bell, Ko, Hanseok, and Hansen, John H. L.
- Subjects
semi-supervised training ,FOS: Computer and information sciences ,Computer Science - Computation and Language ,Audio and Speech Processing (eess.AS) ,automatic speech recognition ,FOS: Electrical engineering, electronic engineering, information engineering ,decipherment ,cross-lingual transfer ,Computation and Language (cs.CL) ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language. We apply this decipherment to phone sequences generated by a universal phone recogniser trained on out-of-language speech corpora, which we follow with flat-start semi-supervised training to obtain an acoustic model for the new language. To the best of our knowledge, this is the first practical approach to zero-resource cross-lingual ASR which does not rely on any hand-crafted phonetic information. We carry out experiments on read speech from the GlobalPhone corpus, and show that it is possible to learn a decipherment model on just 20 minutes of data from the target language. When used to generate pseudo-labels for semi-supervised training, we obtain WERs that range from 32.5% to just 1.9% absolute worse than the equivalent fully supervised models trained on the same data. (Comment: Submitted to Interspeech 2022.)
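The decipherment algorithm itself is beyond a short snippet, but the basic intuition, mapping symbols produced by the universal phone recogniser onto target-language units using only unpaired statistics, can be pictured with a toy frequency-rank matching. This is a drastic simplification for intuition only, not the method used in the paper.

```python
from collections import Counter

def rank_map(source_seqs, target_seqs):
    """Toy 'decipherment': align source symbols to target symbols by frequency rank."""
    src_ranked = [s for s, _ in Counter(x for seq in source_seqs for x in seq).most_common()]
    tgt_ranked = [t for t, _ in Counter(x for seq in target_seqs for x in seq).most_common()]
    return {s: (tgt_ranked[i] if i < len(tgt_ranked) else None)
            for i, s in enumerate(src_ranked)}

# Phone sequences from a universal phone recogniser (unpaired with the text below).
phone_seqs = [["a", "b", "a", "c"], ["b", "a", "a"]]
# Unpaired target-language text, as grapheme sequences.
text_seqs = [["x", "y", "x", "z"], ["y", "x", "x"]]

mapping = rank_map(phone_seqs, text_seqs)
pseudo_labels = [[mapping[p] for p in seq] for seq in phone_seqs]
print(pseudo_labels)  # pseudo-transcriptions usable for semi-supervised training
```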
- Published
- 2021
- Full Text
- View/download PDF
41. On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data
- Author
-
Irina Illina, Imran Sheikh, Emmanuel Vincent, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), COMPRISE, Grid'5000, European Project: 825081,H2020,COMPRISE(2018), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
semi-supervised training ,Computer science ,business.industry ,Detector ,speech recognition ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Mutual information ,01 natural sciences ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,lattice-free MMI ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Transcription (linguistics) ,error detection ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Error detection and correction ,business ,010301 acoustics - Abstract
This work investigates semi-supervised training of acoustic models (AM) with the lattice-free maximum mutual information (LF-MMI) objective in practically relevant scenarios with a limited amount of labeled in-domain data. An error detection driven semi-supervised AM training approach is proposed, in which an error detector controls the hypothesized transcriptions or lattices used as LF-MMI training targets on additional unlabeled data. Under this approach, our first method uses a single error-tagged hypothesis whereas our second method uses a modified supervision lattice. These methods are evaluated and compared with existing semi-supervised AM training methods in three different matched or mismatched, limited data setups. Word error recovery rates of 28% to 89% are reported.
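As a rough illustration of the first variant (error-tagged single hypotheses), one could mask the words an error detector flags before using the hypothesis as a training target, so that unreliable words contribute no supervision. The detector output and masking interface below are hypothetical stand-ins, not the authors' implementation.

```python
def mask_errorful_words(hypothesis, error_flags, mask="<unk>"):
    """Replace words flagged as likely errors so they contribute no supervision.
    error_flags[i] is True if the (hypothetical) error detector marks word i as unreliable."""
    return [mask if flag else word for word, flag in zip(hypothesis, error_flags)]

# Hypothetical detector output for one unlabeled utterance's ASR hypothesis.
hyp = ["the", "cat", "sad", "on", "the", "mat"]
flags = [False, False, True, False, False, False]

training_target = mask_errorful_words(hyp, flags)
print(training_target)  # ['the', 'cat', '<unk>', 'on', 'the', 'mat']
```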
- Published
- 2020
- Full Text
- View/download PDF
42. Semi-Supervised Speech Recognition Acoustic Model Training Using Policy Gradient
- Author
-
Sung Joo Lee, Hoon Chung, Jeon Gue Park, and Hyeong Bae Jeon
- Subjects
semi-supervised training ,reinforcement learning ,Computer science ,Process (engineering) ,Speech recognition ,02 engineering and technology ,lcsh:Technology ,Domain (software engineering) ,lcsh:Chemistry ,0202 electrical engineering, electronic engineering, information engineering ,Reinforcement learning ,General Materials Science ,lcsh:QH301-705.5 ,Instrumentation ,Fluid Flow and Transfer Processes ,Measure (data warehouse) ,lcsh:T ,Process Chemistry and Technology ,General Engineering ,Training (meteorology) ,speech recognition ,Acoustic model ,021001 nanoscience & nanotechnology ,lcsh:QC1-999 ,Computer Science Applications ,lcsh:Biology (General) ,lcsh:QD1-999 ,lcsh:TA1-2040 ,Scalability ,020201 artificial intelligence & image processing ,lcsh:Engineering (General). Civil engineering (General) ,0210 nano-technology ,Gradient method ,lcsh:Physics ,policy gradient - Abstract
In this paper, we propose policy gradient-based semi-supervised speech recognition acoustic model training. In practice, self-training and teacher/student learning are among the most widely used semi-supervised training methods due to their scalability and effectiveness. These methods are based on generating pseudo labels for unlabeled samples using a pre-trained model and selecting reliable samples using a confidence measure. However, this approach has some drawbacks. The generated pseudo labels can be biased depending on which pre-trained model is used, and the training process can be complicated because confidence measurement is usually carried out in post-processing using external knowledge. Therefore, to address these issues, we propose a policy gradient method-based approach. Policy gradient is a reinforcement learning algorithm that finds an optimal behavior strategy for an agent so as to maximize its rewards. The policy gradient-based approach provides a framework for exploring unlabeled data as well as exploiting labeled data, and it also provides a way to incorporate external knowledge in the same training cycle. The proposed approach was evaluated on an in-house non-native Korean recognition domain. The experimental results show that the method is effective in semi-supervised acoustic model training.
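A minimal REINFORCE-style sketch of the idea in PyTorch: the log-likelihood of each pseudo-labelled utterance is weighted by a reward (for instance, an external confidence or agreement score), so reliable pseudo labels contribute more to the gradient. The reward values and the baseline used here are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def policy_gradient_loss(log_probs, rewards, baseline=None):
    """REINFORCE-style loss: -mean[(reward - baseline) * log p(pseudo_label | x)].
    log_probs: per-utterance log-probabilities of the chosen pseudo labels.
    rewards:   per-utterance scalar rewards (e.g., confidence/agreement scores)."""
    if baseline is None:
        baseline = rewards.mean()
    advantage = (rewards - baseline).detach()
    return -(advantage * log_probs).mean()

# Toy example: log-probs produced by the acoustic model for 4 pseudo-labelled utterances.
log_probs = torch.tensor([-2.1, -0.8, -3.5, -1.2], requires_grad=True)
rewards = torch.tensor([0.9, 0.95, 0.2, 0.7])  # hypothetical external reward signal

loss = policy_gradient_loss(log_probs, rewards)
loss.backward()
print(log_probs.grad)  # gradient scaled by each utterance's advantage
```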
- Published
- 2020
- Full Text
- View/download PDF
43. Untranscribed web audio for low resource speech recognition
- Author
-
Steve Renals, Andrea Carmantini, and Peter Bell
- Subjects
semi-supervised training ,web data ,Domain adaptation ,Low resource ,Computer science ,domain adaptation ,Speech recognition ,speech recognition - Abstract
Speech recognition models are highly susceptible to mismatch in the acoustic and language domains between the training and the evaluation data. For low resource languages, it is difficult to obtain transcribed speech for target domains, while untranscribed data can be collected with minimal effort. Recently, a method applying lattice-free maximum mutual information (LF-MMI) to untranscribed data has been found to be effective for semi-supervised training. However, weaker initial models and domain mismatch can result in high deletion rates for the semi-supervised model. Therefore, we propose a method to force the base model to overgenerate possible transcriptions, relying on the ability of LF-MMI to deal with uncertainty. On data from the IARPA MATERIAL programme, our new semi-supervised method outperforms the standard semi-supervised method, yielding significant gains when adapting for mismatched bandwidth and domain.
- Published
- 2019
- Full Text
- View/download PDF
44. Endoscopy image enhancement method by generalized imaging defect models based adversarial training.
- Author
-
Li W, Fan J, Li Y, Hao P, Lin Y, Fu T, Ai D, Song H, and Yang J
- Subjects
- Endoscopy, Image Enhancement, Smoke, Image Processing, Computer-Assisted methods, Supervised Machine Learning
- Abstract
Objective. Smoke, uneven lighting, and color deviation are common issues in endoscopic surgery, which increase the risk of surgery and can even lead to failure. Approach. In this study, we present a new physics model driven semi-supervised learning framework for high-quality pixel-wise endoscopic image enhancement, which is generalizable for smoke removal, light adjustment, and color correction. To improve the authenticity of the generated images, and thereby improve the network performance, we integrated specific physical imaging defect models with the CycleGAN framework. No paired ground-truth data are required. In addition, we propose a transfer learning framework to address the data scarcity in several endoscope enhancement tasks and improve the network performance. Main results. Qualitative and quantitative studies reveal that the proposed network outperforms the state-of-the-art image enhancement methods. In particular, the proposed method performs much better than the original CycleGAN; for example, the structural similarity improved from 0.7925 to 0.8648, feature similarity for color images from 0.8917 to 0.9283, and quaternion structural similarity from 0.8097 to 0.8800 in the smoke removal task. Experimental results of the proposed transfer learning method also reveal its superior performance when trained with small datasets of target tasks. Significance. Experimental results on endoscopic images prove the effectiveness of the proposed network in smoke removal, light adjustment, and color correction, showing excellent clinical usefulness.
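One way to picture a "physical imaging defect model" is a simple atmospheric-scattering style smoke synthesizer applied to clean frames before they enter the adversarial training loop. The following is a generic sketch of such a degradation under that assumption, not the authors' actual defect model or parameters.

```python
import numpy as np

def add_synthetic_smoke(image, transmission=0.6, airlight=0.9):
    """Degrade a clean frame with a haze-like veil: I = J * t + A * (1 - t).
    image: float array in [0, 1]; transmission t in (0, 1]; airlight A in [0, 1]."""
    image = np.clip(image.astype(np.float32), 0.0, 1.0)
    return image * transmission + airlight * (1.0 - transmission)

# Clean frame (random stand-in) and its synthetically smoked counterpart,
# usable as an extra "defective" sample during adversarial training.
clean = np.random.rand(256, 256, 3).astype(np.float32)
smoked = add_synthetic_smoke(clean, transmission=0.5)
print(smoked.min(), smoked.max())
```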
- Published
- 2022
- Full Text
- View/download PDF
45. Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems’ Hypotheses
- Author
-
Yuya Akita, Tatsuya Kawahara, and Sheng Li
- Subjects
semi-supervised training ,Acoustics and Ultrasonics ,Computer science ,Machine learning ,computer.software_genre ,USable ,01 natural sciences ,Data modeling ,Set (abstract data type) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,acoustic model ,Discriminative model ,0103 physical sciences ,Computer Science (miscellaneous) ,Electrical and Electronic Engineering ,Hidden Markov model ,010301 acoustics ,Measure (data warehouse) ,Artificial neural network ,business.industry ,speech recognition ,Acoustic model ,Pattern recognition ,Computational Mathematics ,lecture transcription ,Artificial intelligence ,0305 other medical science ,business ,computer - Abstract
While the performance of ASR systems depends on the size of the training data, it is very costly to prepare accurate and faithful transcripts. In this paper, we investigate a semi-supervised training scheme that takes advantage of huge quantities of unlabeled video lecture archives, particularly for the deep neural network (DNN) acoustic model. In the proposed method, we obtain ASR hypotheses from complementary GMM- and DNN-based ASR systems. Then, a set of CRF-based classifiers is trained to select the correct hypotheses and verify the selected data. The proposed hypothesis combination shows higher quality than the conventional system combination method (ROVER). Moreover, compared with conventional data selection based on confidence scores, our method proves more effective at filtering usable data. Significant improvements in ASR accuracy are achieved over the baseline system and over models trained with the conventional system combination and data selection methods.
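A much simplified stand-in for the data-selection idea: keep only utterances whose hypotheses from the two complementary systems agree closely, and use the agreed hypothesis as the pseudo label. This crude agreement filter is a proxy for intuition only; the paper uses trained CRF-based classifiers for selection and verification.

```python
import difflib

def select_reliable(hyps_a, hyps_b, min_similarity=0.9):
    """Keep utterances whose 1-best hypotheses from two complementary ASR systems agree.
    A crude proxy for the paper's CRF-based selection and verification."""
    selected = []
    for utt_id, (a, b) in enumerate(zip(hyps_a, hyps_b)):
        similarity = difflib.SequenceMatcher(None, a.split(), b.split()).ratio()
        if similarity >= min_similarity:
            selected.append((utt_id, a))  # use the agreed hypothesis as the pseudo label
    return selected

gmm_hyps = ["deep learning for lectures", "the model is trained on video"]
dnn_hyps = ["deep learning for lectures", "a model is trained in video"]
print(select_reliable(gmm_hyps, dnn_hyps, min_similarity=0.9))
```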
- Published
- 2016
- Full Text
- View/download PDF
46. Building an ASR System for Mboshi Using A Cross-Language Definition of Acoustic Units Approach
- Author
-
Scharenborg, O.E., Ebel, Patrick, Ciannella, Francesco, Hasegawa-Johnson, Mark, and Dehak, Najim
- Subjects
Artificial neural network ,Computer science ,Speech recognition ,Low-resource automatic speech recognition ,Retraining ,Initialization ,Training methods ,Semi-supervised training ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Language definition ,Layer (object-oriented design) ,0305 other medical science ,Representation (mathematics) ,Cross-language adaptation
For many languages in the world, not enough (annotated) speech data is available to train an ASR system. Recently, we proposed a cross-language method for training an ASR system using linguistic knowledge and semi-supervised training. Here, we apply this approach to the low-resource language Mboshi. Using an ASR system trained on Dutch, Mboshi acoustic units were first created using cross-language initialization of the phoneme vectors in the output layer. Subsequently, this adapted system was retrained using Mboshi self-labels. Two training methods were investigated: retraining of only the output layer and retraining the full deep neural network (DNN). The resulting Mboshi system was analyzed by investigating per phoneme accuracies, phoneme confusions, and by visualizing the hidden layers of the DNNs prior to and following retraining with the self-labels. Results showed a fairly similar performance for the two training methods but a better phoneme representation for the fully retrained DNN.
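The two retraining regimes compared above (output layer only versus the full DNN) can be set up in PyTorch by toggling which parameters receive gradients. The layer sizes and names below are illustrative assumptions, not those of the actual Dutch-initialised acoustic model.

```python
import torch.nn as nn
import torch.optim as optim

# Illustrative acoustic-model stack; the real system's architecture differs.
model = nn.Sequential(
    nn.Linear(440, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 42),  # output layer over target-language acoustic units
)

def make_optimizer(model, output_only=True, lr=1e-3):
    """Retrain only the output layer, or the full DNN, with self-labels."""
    if output_only:
        for p in model.parameters():
            p.requires_grad = False
        output_layer = model[-1]
        for p in output_layer.parameters():
            p.requires_grad = True
        params = output_layer.parameters()
    else:
        for p in model.parameters():
            p.requires_grad = True
        params = model.parameters()
    return optim.SGD(params, lr=lr)

opt_output_only = make_optimizer(model, output_only=True)   # retrain output layer only
opt_full = make_optimizer(model, output_only=False)          # retrain the full DNN
```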
- Published
- 2018
- Full Text
- View/download PDF
47. Exploiting foreign resources for DNN-based ASR
- Author
-
Motlicek, Petr, Imseng, David, Potard, Blaise, Garner, Philip N., and Himawan, Ivan
- Published
- 2015
- Full Text
- View/download PDF
48. Lithuanian Broadcast Speech Transcription Using Semi-Supervised Acoustic Model Training
- Author
-
Thiago Fraga-Silva, Rasa Lileikytė, Lori Lamel, Arseniy Gorin, Jean-Luc Gauvain, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Elsevier, and Publications, Limsi
- Subjects
semi-supervised training ,Computer science ,Speech recognition ,Word error rate ,low-resourced languages ,02 engineering and technology ,Pronunciation ,[INFO] Computer Science [cs] ,computer.software_genre ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Set (abstract data type) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Discriminative model ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Adaptation (computer science) ,General Environmental Science ,Artificial neural network ,business.industry ,automatic speech recognition ,Acoustic model ,Lithuanian ,neural networks ,language.human_language ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,language ,General Earth and Planetary Sciences ,020201 artificial intelligence & image processing ,Language model ,Artificial intelligence ,0305 other medical science ,business ,Lithuanian language ,computer ,Natural language processing - Abstract
This paper reports on experimental work to build a speech transcription system for Lithuanian broadcast data, relying on unsupervised and semi-supervised training methods as well as on other low-knowledge methods to compensate for missing resources. Unsupervised acoustic model training is investigated using 360 hours of untranscribed speech data. A graphemic pronunciation approach is used to simplify the pronunciation model generation and therefore ease the language model adaptation for the system users. Discriminative training on top of semi-supervised training is also investigated, as well as various types of acoustic features and their combinations. Experimental results are provided for each of our development steps as well as contrastive results comparing various options. Using the best system configuration, a word error rate of 18.3% is obtained on a set of development data from the Quaero program.
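The graphemic pronunciation approach amounts to using a word's letters as its "pronunciation", which makes lexicon generation trivial and avoids hand-built phonetic lexicons. A minimal sketch of such a lexicon builder is shown below; the example words are arbitrary and the snippet is illustrative, not the project's actual tooling.

```python
def graphemic_lexicon(words):
    """Map each word to its grapheme sequence, avoiding any hand-built phonetic lexicon."""
    return {w: list(w.lower()) for w in sorted(set(words))}

vocab = ["labas", "rytas", "Lietuva"]
for word, graphemes in graphemic_lexicon(vocab).items():
    print(word, " ".join(graphemes))
# Lietuva l i e t u v a
# labas   l a b a s
# rytas   r y t a s
```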
- Published
- 2016
49. Speech Recognition with Lightly-Supervised and Semi-Supervised Training of Acoustic Models
- Author
-
Li, Sheng, 河原, 達也, 黒橋, 禎夫, and 鹿島, 久嗣
- Subjects
semi-supervised training ,lightly-supervised training ,speech recognition ,acoustic model training - Published
- 2016
50. Removing segmentation inconsistencies with semi-supervised non-adjacency constraint.
- Author
-
Ganaye, Pierre-Antoine, Sdika, Michaël, Triggs, Bill, and Benoit-Cattin, Hugues
- Subjects
- *DEEP learning , *COMPUTER vision , *IMAGE analysis , *MACHINE learning - Abstract
• A segmentation deep neural network can be trained to satisfy anatomical constraints. • No modification of the architecture is required: no computational cost at inference. • The training can be done in a semi-supervised way to benefit from non-annotated data. • Training with constraints yields a large reduction of errors in Hausdorff distance. The advent of deep learning has pushed medical image analysis to new levels, rapidly replacing more traditional machine learning and computer vision pipelines. However, segmenting and labelling anatomical regions remains challenging owing to appearance variations, imaging artifacts, the paucity and variability of annotated data, and the difficulty of fully exploiting domain constraints such as anatomical knowledge about inter-region relationships. We address the last point, improving the network's region-labeling consistency by introducing NonAdjLoss, an adjacency-graph based auxiliary training loss that penalizes outputs containing regions with anatomically incorrect adjacency relationships. NonAdjLoss supports both fully-supervised training and a semi-supervised extension in which it is applied to unlabeled supplementary training data. The approach substantially reduces segmentation anomalies on the MICCAI-2012 and IBSRv2 brain MRI datasets and the Anatomy3 whole-body CT dataset, especially when semi-supervised training is included. [ABSTRACT FROM AUTHOR]
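The adjacency penalty can be pictured as follows: estimate a soft label-adjacency matrix from the network's softmax output and penalize probability mass on label pairs the anatomical prior forbids. This is a simplified 2D sketch of that idea with an arbitrary forbidden-pair matrix, not the paper's NonAdjLoss implementation.

```python
import torch
import torch.nn.functional as F

def soft_adjacency(probs):
    """probs: (B, C, H, W) softmax maps. Returns a (C, C) soft adjacency estimate
    accumulated over horizontally and vertically neighbouring pixels."""
    horiz = torch.einsum("bchw,bdhw->cd", probs[:, :, :, :-1], probs[:, :, :, 1:])
    vert = torch.einsum("bchw,bdhw->cd", probs[:, :, :-1, :], probs[:, :, 1:, :])
    adj = horiz + vert
    return adj / adj.sum().clamp_min(1e-8)

def non_adjacency_penalty(logits, forbidden):
    """Penalize predicted adjacency between label pairs marked 1 in `forbidden` (C, C)."""
    probs = F.softmax(logits, dim=1)
    return (soft_adjacency(probs) * forbidden).sum()

logits = torch.randn(2, 4, 32, 32, requires_grad=True)  # toy segmentation network output
forbidden = torch.zeros(4, 4)                            # hypothetical anatomical prior
forbidden[1, 3] = forbidden[3, 1] = 1.0                  # labels 1 and 3 may not touch
loss = non_adjacency_penalty(logits, forbidden)
loss.backward()
```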
- Published
- 2019
- Full Text
- View/download PDF