
Semi-Supervised Temporal Segmentation of Manufacturing Work Video by Automatically Building a Hierarchical Tree of Category Labels

Authors :
Kazuaki Nakamura
Naoko Nitta
Eiji Nabata
Noboru Babaguchi
Kensuke Fujii
Satoki Matsumura
Source :
IEEE Access, Vol. 9, pp. 68017-68027 (2021)
Publication Year :
2021
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2021.

Abstract

Nowadays, many industrial companies visually record workers' activities to streamline their work processes. However, since untrimmed raw videos are hard to use, it is desirable to automatically divide the videos into segments and to recognize which kind of operation is performed in each segment. This task is called temporal video segmentation. We propose a method for achieving it, particularly targeting videos of manufacturing work with a specialized vehicle such as a hydraulic excavator. Extracting good visual features from the input videos is essential for high segmentation performance. This can hardly be achieved by unsupervised methods, whereas supervised methods have a different drawback: collecting a sufficient amount of training data is difficult because labeling is labor-intensive. To overcome these drawbacks, the proposed method employs a semi-supervised approach. We assume that a set of weakly labeled videos, in which only a sparse subset of frames carries a category label, is given as input; the labeled frames are used as training data for a desirable feature extractor. Under this assumption, the proposed method first divides the input videos into fixed-length short segments called primitive segments and then clusters them using visual features extracted by the trained feature extractor. To achieve higher performance, we also use a hierarchical tree of the category labels and recursively perform the above process at each branch of the tree, where the tree is built automatically by the proposed method. In our experiments, we achieved a segmentation performance of 0.947 in F-measure, even when only 1.25% of all the frames in the input videos were labeled.
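The pipeline the abstract describes — split the video into fixed-length primitive segments, summarize each by its visual features, then cluster the segments — can be sketched minimally as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the use of mean-pooled per-frame features, and the nearest-centroid assignment (a stand-in for the paper's clustering step) are all assumptions.

```python
# Hedged sketch of the segmentation pipeline from the abstract:
# (1) group per-frame feature vectors into fixed-length "primitive
# segments", (2) represent each segment by its mean feature vector,
# (3) assign each segment to a cluster. All names are hypothetical.

def to_primitive_segments(frame_features, seg_len):
    """Group per-frame feature vectors into non-overlapping
    fixed-length primitive segments, each represented by the
    mean of its frames' feature vectors."""
    segments = []
    for start in range(0, len(frame_features) - seg_len + 1, seg_len):
        chunk = frame_features[start:start + seg_len]
        dim = len(chunk[0])
        mean = [sum(f[d] for f in chunk) / seg_len for d in range(dim)]
        segments.append(mean)
    return segments

def assign_to_nearest(segments, centroids):
    """Label each segment with the index of its nearest centroid
    (squared Euclidean distance) -- a minimal stand-in for the
    clustering step described in the abstract."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: dist2(s, centroids[k]))
            for s in segments]

# Toy example: 8 frames of 1-D features, primitive segments of length 2,
# two hypothetical operation categories with centroids at 0.0 and 1.0.
feats = [[0.0], [0.1], [0.9], [1.0], [1.1], [0.95], [0.05], [0.0]]
prims = to_primitive_segments(feats, seg_len=2)
labels = assign_to_nearest(prims, centroids=[[0.0], [1.0]])
print(labels)  # → [0, 1, 1, 0]
```

In the paper's semi-supervised setting, the sparse labeled frames would train the feature extractor that produces `frame_features`, and the clustering would be repeated recursively at each branch of the automatically built label tree; both steps are omitted here for brevity.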

Details

ISSN :
2169-3536
Volume :
9
Database :
OpenAIRE
Journal :
IEEE Access
Accession number :
edsair.doi.dedup.....d3d1349cb960f3ebb7a217428d1579c8