1. Trapezoid-structured LSTM with segregated gates and bridge joints for video frame inpainting.
- Author
Chiang, Ting-Hui; Lin, Yun-Tang; Lin, Jaden Chao-Ho; and Tseng, Yu-Chee
- Subjects
*INPAINTING, *TRAPEZOIDS, *SIGNAL-to-noise ratio, *VIDEOS
- Abstract
This work considers the video frame inpainting problem, where several preceding and following frames are given and the goal is to predict the middle frames. The state-of-the-art solution applies bidirectional long short-term memory (LSTM) networks, which suffer from a spatial-temporal mismatch problem. In this paper, we propose a trapezoid-structured LSTM architecture called T-LSTM-sbm for video frame inpainting with three designs: (i) segregated spatial-temporal gates, (ii) bridge joints, and (iii) multi-kernel LSTM. To prevent the spatial-temporal mismatch problem, the trapezoid structure reduces the number of LSTM nodes by two after each layer as features pass through the multi-layered LSTM nodes. This makes the model converge to the inpainted results more effectively. The segregated temporal and spatial gates learn better spatial and temporal features by using individual gates. To relieve the information loss that occurs as the trapezoidal layers converge, we use bridge joints among layers to better preserve useful information. The multiple kernels in the LSTM enable the extraction of multi-scale information flows. T-LSTM-sbm is shown to outperform the state-of-the-art solutions in peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) on three common datasets: KTH Action, HMDB-51, and UCF-101. [ABSTRACT FROM AUTHOR]
- Published
2024
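The abstract states that the trapezoid structure drops two LSTM nodes per layer and that results are scored with PSNR. The following is a minimal sketch of those two ideas only, not the authors' implementation. The function names, the stopping rule (stop once a target node count is reached), and the reading of "nodes" as one node per frame position are assumptions made for illustration; the PSNR function uses the standard definition, 10 * log10(MAX^2 / MSE), on flat lists of pixel values.

```python
import math


def trapezoid_layer_widths(num_input_nodes, num_output_nodes):
    """Return the node count of each layer, narrowing by two per layer.

    Hypothetical helper: the abstract only says each layer has two fewer
    LSTM nodes than the previous one. Stopping at `num_output_nodes`
    is an assumption of this sketch.
    """
    gap = num_input_nodes - num_output_nodes
    if num_output_nodes < 1 or gap < 0 or gap % 2 != 0:
        raise ValueError("need input >= output >= 1 and an even difference")
    # range() walks downward in steps of two, including the final width.
    return list(range(num_input_nodes, num_output_nodes - 1, -2))


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Example: 9 nodes at the bottom layer narrowing to 5 at the top.
    print(trapezoid_layer_widths(9, 5))
    # Example: a predicted frame that is off by one grey level everywhere.
    print(round(psnr([100, 120, 140, 160], [101, 121, 141, 161]), 2))
```

Under these assumptions, a stack that starts with 9 nodes and ends with 5 has three layers, and every extra layer trims one node from each end of the sequence, which is what gives the structure its trapezoid shape.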