A pyramid transformer with cross-shaped windows for low-light image enhancement.
- Authors
- Li, Canlin; Gao, Pengcheng; Song, Shun; Liu, Jinhua; Bi, Lihua
- Subjects
- *TRANSFORMER models, *CONVOLUTIONAL neural networks, *IMAGE intensifiers, *NATURAL language processing, *DEEP learning, *PYRAMIDS, *COMPUTER vision
- Abstract
Low-light image enhancement is a low-level vision task. Most existing methods are based on convolutional neural networks (CNNs). The transformer is a prominent deep learning model that has been widely adopted in fields such as natural language processing and computer vision. Compared with CNNs, transformers can capture long-range dependencies and thus make full use of global contextual information. For low-light enhancement, this capability helps the model learn the correct luminance, color, and texture. We introduce the transformer into low-light image enhancement by designing a pyramid transformer with cross-shaped windows (CSwin-P). CSwin-P consists of an encoder and a decoder, each comprising several stages, and each stage contains several enhanced CSwin transformer blocks (ECTB). The ECTB uses cross-shaped window self-attention and a feed-forward layer with a spatial interaction unit; the spatial interaction unit further captures local contextual information through a gating mechanism. CSwin-P uses implicit positional encoding, so the model is not restricted by the image size at inference time. Extensive experiments show that our method outperforms current state-of-the-art methods. [ABSTRACT FROM AUTHOR]
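To illustrate the kind of component the abstract describes, the sketch below shows a feed-forward layer whose hidden features are gated by a depthwise-convolution branch that supplies local spatial context. This is a minimal illustration assuming PyTorch; the class name `SpatialInteractionFFN`, the channel split, and the depthwise-conv gating are assumptions about the described "feed-forward layer with spatial interaction unit", not the authors' implementation.

```python
# Illustrative sketch only; not the CSwin-P authors' code.
import torch
import torch.nn as nn

class SpatialInteractionFFN(nn.Module):
    """Feed-forward layer whose hidden features are gated by a
    depthwise-convolution branch that captures local spatial context."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)
        self.act = nn.GELU()
        # Depthwise conv provides the local "spatial interaction" signal.
        self.dwconv = nn.Conv2d(hidden // 2, hidden // 2, kernel_size=3,
                                padding=1, groups=hidden // 2)
        self.project = nn.Conv2d(hidden // 2, dim, kernel_size=1)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.act(self.expand(x))
        value, gate = x.chunk(2, dim=1)      # split channels into two branches
        x = value * self.dwconv(gate)        # gating: elementwise product
        return self.project(x)

# Usage: enhance a feature map without changing its spatial shape.
feat = torch.randn(1, 64, 128, 128)
out = SpatialInteractionFFN(dim=64)(feat)    # -> (1, 64, 128, 128)
```

Because every operation here is convolutional, the layer is independent of the input resolution, which is consistent with the abstract's point that implicit positional encoding leaves the model unrestricted by image size at inference.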
- Published
- 2024