Spatiotemporal Video Highlight by Neural Network Considering Gaze and Hands of Surgeon in Egocentric Surgical Videos
- Source: Journal of Medical Robotics Research
- Publication Year: 2021
- Publisher: World Scientific Pub Co Pte Ltd, 2021
Abstract
- In the medical field, surgical videos can be used to teach surgical skills. Medical students and residents watch them to study those skills and to learn faster, compensating for limited opportunities to join surgeries in the operating room. Recording egocentric surgical videos with a wearable camera is one way to capture a surgeon's skills in detail. However, most egocentric surgical videos are quite long; in the case of tumor removal in breast surgery, for example, the recording often reaches 2 h. At that length, finding the important scenes is time consuming, particularly because many surgical videos include nonessential scenes such as sterilization and tool preparation. Specific scenes can be extracted from a long video by machine-learning-based scene estimation. Furthermore, knowing where the surgeon is looking is important for observing the incision area in detail. In particular, it is vital to be able to zoom in on key elements so that viewers can see the incision area and the fine details of the necessary surgical skills. In this study, we aimed to highlight incision scenes from egocentric surgical videos in the spatiotemporal domain by using two neural networks, one for temporal and one for spatial highlights. For the temporal highlights, we designed a neural network that estimates incision scenes by learning gaze speed, hand movements, the number of hands, and background movements in egocentric surgical videos. For the spatial highlights, to estimate the important area to zoom in on, we designed a neural network that learns the surgeon's gaze against natural features of surgical scenes and forms a probability map representing the estimated gaze area. The estimated gaze area was also used to calculate an appropriate zoom-in position and zoom-in ratio. To adapt the highlight parameters to user preferences, we also built a user interface for selecting the playback-speed gain and the zoom-ratio gain. For evaluation, we verified the networks' performance with a quantitative assessment and conducted a user study in which medical doctors watched an actual surgical video, yielding a qualitative assessment of the proposed system.
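As a concrete illustration of the temporal-highlight idea in the abstract, here is a minimal sketch, not the authors' implementation: a small recurrent classifier that maps per-frame features (gaze speed, hand-motion magnitude, number of visible hands, and background motion) to a per-frame incision-scene probability. The class name, architecture, and layer sizes are all illustrative assumptions.

```python
# Hedged sketch of the temporal-highlight idea (not the paper's exact model):
# per-frame features are aggregated by an LSTM, and a linear head outputs an
# incision-scene probability for each frame.
import torch
import torch.nn as nn

class TemporalHighlightNet(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 64):
        super().__init__()
        # The LSTM accumulates short-term temporal context over the stream.
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # per-frame incision-scene logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features), with features such as
        # [gaze_speed, hand_motion, num_hands, background_motion]
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out)).squeeze(-1)  # (batch, time)

# Example: score a 30-frame window of 4-dimensional features.
net = TemporalHighlightNet()
scores = net(torch.randn(1, 30, 4))  # incision probability per frame
```

Thresholding these per-frame scores would yield the temporal segments to highlight, with the playback-speed gain mentioned in the abstract applied outside the highlighted segments.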
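The abstract also describes converting the estimated gaze area into a zoom-in position and zoom-in ratio, tunable through a user-selected zoom-ratio gain. The sketch below shows one plausible way to do this from a gaze probability map: a probability-weighted centroid for the position and the spatial spread of the map for the ratio. This centroid/spread heuristic is an assumption for illustration, not the paper's exact formulation.

```python
# Hedged sketch: derive a zoom-in position and ratio from a gaze probability
# map. A tighter (more peaked) gaze area permits a larger zoom ratio.
import numpy as np

def zoom_from_gaze_map(prob_map: np.ndarray, zoom_gain: float = 1.0):
    """prob_map: (H, W) non-negative gaze probabilities."""
    h, w = prob_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = prob_map / prob_map.sum()
    # Zoom-in position: probability-weighted centroid of the gaze map.
    cy, cx = (p * ys).sum(), (p * xs).sum()
    # Spread (RMS distance from the centroid) of the estimated gaze area.
    spread = np.sqrt((p * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum())
    # Smaller spread -> larger ratio; zoom_gain is the user-selected gain.
    ratio = 1.0 + zoom_gain * (min(h, w) / (4.0 * spread + 1e-6))
    return (cx, cy), min(ratio, 4.0)  # cap the ratio for stability

center, ratio = zoom_from_gaze_map(np.random.rand(270, 480))
```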
Details
- ISSN: 2424-9068 and 2424-905X
- Database: OpenAIRE
- Journal: Journal of Medical Robotics Research
- Accession number: edsair.doi...........ae5d71357a640978472a7c3aa552381e
- Full Text: https://doi.org/10.1142/s2424905x21410014