1. Practical deep learning
- Author
- Dong, Hao and Guo, Yike
- Subjects
- 004
- Abstract
Deep learning is progressing rapidly thanks to the availability of large datasets and computing resources. Deeper and larger neural network models have recently boosted accuracy in many applications, such as image classification, image captioning, object detection, and language translation. However, despite the opportunities they offer, existing deep learning approaches are impractical for many applications because of two challenges. First, many applications have only limited amounts of annotated training data, or labelled training data is too expensive to collect. Such scenarios pose significant difficulties for deep learning methods, which are not designed for limited data and suffer from degraded performance. This is especially true for generative tasks, because data for many of them is difficult to obtain from the real world and the results they generate are difficult to control. Second, as deep learning algorithms become more complicated, researchers face a growing workload in training neural network models and managing the life cycle of deep learning workflows, including the model, dataset, and training pipeline, so the demand for efficient deep learning development is rising. Practical deep learning should therefore achieve adequate performance from limited training data and be based on efficient development processes.
In this thesis, we propose several novel methods to improve the practicability of deep generative models and development processes, leading to four contributions. First, we improve the visual quality of images synthesised from text descriptions without requiring more manually labelled data, which provides controllable generated results using object attribute information from the text descriptions. Second, we achieve unsupervised image-to-image translation that synthesises images conditioned on input images without requiring paired images to supervise the training, which provides controllable generated results using semantic visual information from the input images. Third, we deliver semantic image synthesis that synthesises images conditioned on both images and text descriptions without requiring ground truth images to supervise the training, which provides controllable generated results using both semantic visual and object attribute information. Fourth, we develop a research-oriented deep learning library called TensorLayer to reduce the workload of researchers in defining models, implementing new layers, and managing the deep learning workflow, which comprises the dataset, model, and training pipeline. In 2017, this library won the best open source software award issued by ACM Multimedia (MM).
- Published
- 2019