Caption Generation From Road Images for Traffic Scene Modeling
- Author
- Yuehu Liu, Chuan Wu, Yaochen Li, Jihua Zhu, and Ling Li
- Subjects
Computer science, Mechanical Engineering, Automotive Engineering, Computer Science Applications, Computer vision, Image processing, Artificial intelligence, Traffic scene, 3D model
- Abstract
In this traffic-scene-modeling study, we propose an image-captioning network that incorporates element attention into an encoder-decoder mechanism to generate more reasonable scene captions. A visual-relationship-detection network is also developed to detect the relative positions of object pairs. First, the traffic scene elements are detected and segmented according to their clustered locations. Then, the image-captioning network is applied to generate a description of each traffic scene element, and the visual-relationship-detection network is used to detect the positional relations of all object pairs in each subregion. The static and dynamic traffic elements are selected and organized into a 3D model according to the captions and positional relations. The reconstructed 3D traffic scenes can be used for the offline testing of unmanned vehicles. Evaluations and comparisons on the TSD-max, KITTI, and Microsoft COCO datasets demonstrate the effectiveness of the proposed framework.
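The element-attention step mentioned in the abstract can be illustrated with a minimal sketch: at each decoding step, the decoder state is compared against the features of the detected scene elements, and a softmax over the alignment scores yields a weighted context vector. This is a toy dot-product-attention example under assumed shapes, not the authors' actual network; the function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def element_attention(element_feats, decoder_state):
    # element_feats: (n_elements, d) visual features of detected scene
    # elements (e.g. road, vehicle, sign); decoder_state: (d,) current
    # decoder hidden state. Both are illustrative assumptions.
    scores = element_feats @ decoder_state   # alignment score per element
    weights = softmax(scores)                # attention distribution
    context = weights @ element_feats        # weighted context vector, shape (d,)
    return context, weights

# toy usage: three scene elements with 4-dimensional features
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4))
state = rng.standard_normal(4)
ctx, w = element_attention(feats, state)
```

The attention weights form a probability distribution over the scene elements, so the decoder can emphasize the element most relevant to the word it is about to generate.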
- Published
- 2022