
Generating Cooking Recipes from Cooking Videos Using Deep Learning Considering Previous Process with Video Encoding

Authors:
Akihiko Ohsuga
Ryohei Orihara
Tatsuki Fujii
Yuichi Sei
Yasuyuki Tahara
Source:
APPIS
Publication Year:
2020
Publisher:
ACM, 2020.

Abstract

Research on generating natural language captions for visual data such as images and videos has produced considerable results with deep learning methods and has attracted attention in recent years. In this research, we aim to generate recipe sentences from cooking videos acquired from YouTube, treating the task as image captioning. Two aspects must be considered to do so. First, we believe the semantics of each cooking process should be taken into account to improve captioning accuracy. Second, data processing, that is, obtaining images from each process using visual processing methods such as object detection, is important. We propose a captioning model in which a sentence vector is embedded to maintain the consistency of the recipe. From the differences between the generated recipes and the reference recipes, we calculate recipe scores, using three metrics from previous studies to evaluate the image captioning model. We compare these scores with those from baseline models.
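The core idea in the abstract, conditioning the caption decoder on a sentence vector from the previous cooking step as well as the current video frame, can be sketched as follows. This is a minimal illustration with toy random weights, not the authors' architecture: the fusion layer, the mean-of-embeddings sentence vector, and all dimension sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<bos>", "<eos>", "chop", "the", "onion", "fry", "it"]
EMB_DIM, IMG_DIM, HID = 8, 16, 8

# Toy parameters; a real model would learn these from video/recipe pairs.
word_emb = rng.normal(size=(len(VOCAB), EMB_DIM))
W_img = rng.normal(size=(IMG_DIM, HID))    # projects the frame feature
W_prev = rng.normal(size=(EMB_DIM, HID))   # projects the previous-step sentence vector
W_out = rng.normal(size=(HID, len(VOCAB)))

def sentence_vector(tokens):
    """One simple way to get a sentence vector: mean of word embeddings."""
    idx = [VOCAB.index(t) for t in tokens]
    return word_emb[idx].mean(axis=0)

def decode_step(img_feat, prev_sent_vec):
    """Fuse the image feature with the previous step's sentence vector,
    then greedily pick the highest-scoring word (single step for brevity)."""
    h = np.tanh(img_feat @ W_img + prev_sent_vec @ W_prev)
    logits = h @ W_out
    return VOCAB[int(np.argmax(logits))]

# A frame feature for the current process and the previous step's sentence.
img = rng.normal(size=IMG_DIM)
prev = sentence_vector(["chop", "the", "onion"])
word = decode_step(img, prev)
```

Because `prev_sent_vec` enters the decoder alongside the image feature, the word chosen for the current step can depend on what the previous step said, which is how the model keeps consecutive recipe sentences consistent.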

Details

Database:
OpenAIRE
Journal:
Proceedings of the 3rd International Conference on Applications of Intelligent Systems
Accession number:
edsair.doi...........61aad8c3b8ed04f1d89f29644293ffa7
Full Text:
https://doi.org/10.1145/3378184.3378217