177 results for "Bovik AC"
Search Results
2. 3D-PSSIM: Projective Structural Similarity for 3D Mesh Quality Assessment Robust to Topological Irregularities.
- Author
-
Lee S, Kang J, Lee S, Lin W, and Bovik AC
- Abstract
Despite acceleration in the use of 3D meshes, it is difficult to find effective mesh quality assessment algorithms that can produce predictions highly correlated with human subjective opinions. Defining mesh quality features is challenging due to the irregular topology of meshes, which are defined on vertices and triangles. To address this, we propose a novel 3D projective structural similarity index (3D-PSSIM) for meshes that is robust to differences in mesh topology. We address topological differences between meshes by introducing multi-view and multi-layer projections that can densely represent the mesh textures and geometrical shapes irrespective of mesh topology. It also addresses occlusion problems that occur during projection. We propose visual sensitivity weights that capture the perceptual sensitivity to the degree of mesh surface curvature. 3D-PSSIM computes perceptual quality predictions by aggregating quality-aware features that are computed in multiple projective spaces onto the mesh domain, rather than on 2D spaces. This allows 3D-PSSIM to determine which parts of a mesh surface are distorted by geometric or color impairments. Experimental results show that 3D-PSSIM can predict mesh quality with high correlation against human subjective judgments, even in the presence of noise and large topological differences, outperforming existing mesh quality assessment models.
- Published
- 2024
- Full Text
- View/download PDF
3. HDR or SDR? A Subjective and Objective Study of Scaled and Compressed Videos.
- Author
-
Ebenezer JP, Shang Z, Chen Y, Wu Y, Wei H, Sethuraman S, and Bovik AC
- Abstract
We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to various levels of scaling and compression and viewed on three different display devices. While conventional expectations are that HDR quality is better than SDR quality, we found that subject preference for HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions, and among many other findings, observed that HDR videos were often rated as lower quality than SDR videos at lower bitrates, particularly when viewed on LCD and QLED displays. Since it is of interest to be able to measure the quality of videos under these scenarios, e.g., to inform decisions regarding scaling, compression, and SDR vs. HDR, we tested several well-known full-reference and no-reference video quality models on the new database. Towards advancing progress on this problem, we also developed a novel no-reference model called HDRPatchMAX, which uses a contrast-based analysis of classical and bit-depth features to predict quality more accurately than existing metrics.
- Published
- 2024
- Full Text
- View/download PDF
4. Convex Hull Prediction for Adaptive Video Streaming by Recurrent Learning.
- Author
-
Paul S, Norkin A, and Bovik AC
- Abstract
Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content-dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search process over the space of possible encoding parameters, which causes significant overhead in terms of both computation and time expenditure. To reduce this overhead, we propose a deep learning-based method of content-aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity to analyze scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings as compared to existing approaches. On average, the pre-encoding time was reduced by 53.8% by our method, while the average Bjøntegaard delta bitrate (BD-rate) of the predicted convex hulls against ground truth was 0.26%, and the mean absolute deviation of the BD-rate distribution was 0.57%.
- Published
- 2024
- Full Text
- View/download PDF
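To make the pre-encoding step described above concrete, here is a minimal sketch (not the authors' RCN-Hull model) of the baseline convex-hull computation it seeks to avoid: given exhaustively measured (bitrate, quality) points for one shot, retain only the Pareto-optimal points on the upper convex hull of the rate-quality plane. The bitrates and VMAF-like scores below are invented for illustration.

```python
import numpy as np

def upper_convex_hull(bitrates_kbps, quality_scores):
    """Return indices of the rate-quality points lying on the upper convex hull."""
    order = np.argsort(bitrates_kbps)
    r = np.log10(np.asarray(bitrates_kbps, dtype=float)[order])   # log-rate, as usual in RD analysis
    q = np.asarray(quality_scores, dtype=float)[order]

    hull = []   # indices (into the sorted arrays) of hull points
    for i in range(len(r)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            cross = (r[i2] - r[i1]) * (q[i] - q[i1]) - (q[i2] - q[i1]) * (r[i] - r[i1])
            if cross >= 0:   # middle point lies below the chord -> dominated, drop it
                hull.pop()
            else:
                break
        hull.append(i)
    # Keep only points whose quality strictly increases with rate.
    keep = [hull[0]]
    for idx in hull[1:]:
        if q[idx] > q[keep[-1]]:
            keep.append(idx)
    return order[keep]

# Hypothetical measurements for one shot (e.g. VMAF at several resolution/QP encodes).
rates = [300, 500, 750, 1200, 1800, 2500, 4000]
vmaf  = [42.0, 55.3, 61.1, 70.4, 69.8, 78.2, 83.5]
print(upper_convex_hull(rates, vmaf))   # indices of the Pareto-optimal encodes
```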
5. Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos.
- Author
-
Venkataramanan AK and Bovik AC
- Abstract
High Dynamic Range (HDR) videos are able to represent wider ranges of contrasts and colors than Standard Dynamic Range (SDR) videos, giving more vivid experiences. Due to this, HDR videos are expected to grow into the dominant video modality of the future. However, HDR videos are incompatible with existing SDR displays, which form the majority of affordable consumer displays on the market. Because of this, HDR videos must be processed by tone-mapping them to reduced bit-depths to service a broad swath of SDR-limited video consumers. Here, we analyze the impact of tone-mapping operators on the visual quality of streaming HDR videos. To this end, we built the first large-scale subjectively annotated open-source database of compressed tone-mapped HDR videos, containing 15,000 tone-mapped sequences derived from 40 unique HDR source contents. The videos in the database were labeled with more than 750,000 subjective quality annotations, collected from more than 1,600 unique human observers. We demonstrate the usefulness of the new subjective database by benchmarking objective models of visual quality on it. We envision that the new LIVE Tone-Mapped HDR (LIVE-TMHDR) database will enable significant progress on HDR video tone mapping and quality assessment in the future. To this end, we make the database freely available to the community at https://live.ece.utexas.edu/research/LIVE_TMHDR/index.html.
- Published
- 2024
- Full Text
- View/download PDF
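As a point of reference for the tone-mapping operators whose perceptual effects the LIVE-TMHDR study above examines, here is a small sketch of one standard global operator (a Reinhard-style curve). It is not one of the specific TMOs or compression settings used to build the database; the key value, white point, and synthetic HDR frame are illustrative choices.

```python
import numpy as np

def reinhard_tonemap(hdr_rgb, key=0.18, l_white=4.0, eps=1e-6):
    """hdr_rgb: float array of linear HDR values (H, W, 3). Returns SDR values in [0, 1]."""
    lum = 0.2126 * hdr_rgb[..., 0] + 0.7152 * hdr_rgb[..., 1] + 0.0722 * hdr_rgb[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + eps)))              # scene "key" (log-average luminance)
    l_scaled = key * lum / (log_avg + eps)
    l_display = l_scaled * (1.0 + l_scaled / l_white ** 2) / (1.0 + l_scaled)
    ratio = l_display / (lum + eps)                           # per-pixel luminance rescaling
    return np.clip(hdr_rgb * ratio[..., None], 0.0, 1.0)

rng = np.random.default_rng(0)
hdr = rng.gamma(shape=1.5, scale=2.0, size=(270, 480, 3))     # toy linear HDR frame
sdr = reinhard_tonemap(hdr)
print(sdr.min(), sdr.max())
```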
6. A Study of Subjective and Objective Quality Assessment of HDR Videos.
- Author
-
Shang Z, Ebenezer JP, Venkataramanan AK, Wu Y, Wei H, Sethuraman S, and Bovik AC
- Abstract
As compared to standard dynamic range (SDR) videos, high dynamic range (HDR) content is able to represent and display much wider and more accurate ranges of brightness and color, leading to more engaging and enjoyable visual experiences. HDR also implies increases in data volume, further challenging existing limits on bandwidth consumption and on the quality of delivered content. Perceptual quality models are used to monitor and control the compression of streamed SDR content. A similar strategy should be useful for HDR content, yet there has been limited work on building HDR video quality assessment (VQA) algorithms. One reason for this is a scarcity of high-quality HDR VQA databases representative of contemporary HDR standards. Towards filling this gap, we created the first publicly available HDR VQA database dedicated to HDR10 videos, called the Laboratory for Image and Video Engineering (LIVE) HDR Database. It comprises 310 videos from 31 distinct source sequences processed by ten different compression and resolution combinations, simulating bitrate ladders used by the streaming industry. We used this data to conduct a subjective quality study, gathering more than 20,000 human quality judgments under two different illumination conditions. To demonstrate the usefulness of this new psychometric data resource, we also designed a new framework for creating HDR quality sensitive features, using a nonlinear transform to emphasize distortions occurring in spatial portions of videos that are enhanced by HDR, e.g., having darker blacks and brighter whites. We apply this new method, which we call HDRMAX, to modify the widely-deployed Video Multimethod Assessment Fusion (VMAF) model. We show that VMAF+HDRMAX provides significantly elevated performance on both HDR and SDR videos, exceeding prior state-of-the-art model performance. The database is now accessible at: https://live.ece.utexas.edu/research/LIVEHDR/LIVEHDR_index.html. The model will be made available at a later date at: https://live.ece.utexas.edu//research/Quality/index_algorithms.htm.
- Published
- 2024
- Full Text
- View/download PDF
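A hedged sketch of the general idea behind the HDRMAX framework described above: luma is normalized within local windows and passed through an expansive nonlinearity so that the darkest and brightest regions, where HDR differs most from SDR, dominate subsequent quality features. The window size, the exponential form of the nonlinearity, and its parameters are assumptions for illustration; the published transform may differ.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def expansive_nonlinearity(luma, patch=17, delta=4.0, eps=1e-3):
    """Rescale each pixel to [-1, 1] within a local window, then amplify the extremes."""
    lo = minimum_filter(luma, size=patch)
    hi = maximum_filter(luma, size=patch)
    x = 2.0 * (luma - lo) / (hi - lo + eps) - 1.0             # local normalization to [-1, 1]
    # Expansive nonlinearity: values near +/-1 (deep blacks / bright highlights) are stretched.
    return np.sign(x) * (np.exp(delta * np.abs(x)) - 1.0) / (np.exp(delta) - 1.0)

rng = np.random.default_rng(0)
frame = rng.random((540, 960)).astype(np.float32)             # stand-in for an HDR luma plane
transformed = expansive_nonlinearity(frame)
print(transformed.shape, float(transformed.min()), float(transformed.max()))
```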
7. Dual-Stream Complex-Valued Convolutional Network for Authentic Dehazed Image Quality Assessment.
- Author
-
Guan T, Li C, Zheng Y, Wu X, and Bovik AC
- Abstract
Effectively evaluating the perceptual quality of dehazed images remains an under-explored research issue. In this paper, we propose a no-reference complex-valued convolutional neural network (CV-CNN) model to conduct automatic dehazed image quality evaluation. Specifically, a novel CV-CNN is employed that exploits the advantages of complex-valued representations, achieving better generalization capability on perceptual feature learning than real-valued ones. To learn more discriminative features to analyze the perceptual quality of dehazed images, we design a dual-stream CV-CNN architecture. The dual-stream model comprises a distortion-sensitive stream that operates on the dehazed RGB image, and a haze-aware stream on a novel dark channel difference image. The distortion-sensitive stream accounts for perceptual distortion artifacts, while the haze-aware stream addresses the possible presence of residual haze. Experimental results on three publicly available dehazed image quality assessment (DQA) databases demonstrate the effectiveness and generalization of our proposed CV-CNN DQA model as compared to state-of-the-art no-reference image quality assessment algorithms.
- Published
- 2024
- Full Text
- View/download PDF
8. Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality.
- Author
-
Chen YC, Saha A, Chapiro A, Hane C, Bazin JC, Qiu B, Zanetti S, Katsavounidis I, and Bovik AC
- Subjects
- Humans, Avatar, Video Recording methods, Virtual Reality, Algorithms, Image Processing, Computer-Assisted methods
- Abstract
We study the visual quality judgments of human subjects on digital human avatars (sometimes referred to as "holograms" in the parlance of virtual reality [VR] and augmented reality [AR] systems) that have been subjected to distortions. We also study the ability of video quality models to predict human judgments. As streaming of human avatar videos in VR or AR becomes increasingly common, more advanced human avatar video compression protocols will be required to address the tradeoffs between faithfully transmitting high-quality visual representations and adjusting to changeable bandwidth scenarios. During transmission over the internet, the perceived quality of compressed human avatar videos can be severely impaired by visual artifacts. To optimize trade-offs between perceptual quality and data volume in practical workflows, video quality assessment (VQA) models are essential tools. However, there are very few VQA algorithms developed specifically to analyze human body avatar videos, due, at least in part, to the dearth of appropriate and comprehensive datasets of adequate size. Towards filling this gap, we introduce the LIVE-Meta Rendered Human Avatar VQA Database, which contains 720 human avatar videos processed using 20 different combinations of encoding parameters, labeled by corresponding human perceptual quality judgments that were collected in six-degrees-of-freedom VR headsets. To demonstrate the usefulness of this new and unique video resource, we use it to study and compare the performances of a variety of state-of-the-art Full Reference and No Reference video quality prediction models, including a new model called HoloQA. As a service to the research community, we publicly release the metadata of the new database at https://live.ece.utexas.edu/research/LIVE-Meta-rendered-human-avatar/index.html.
- Published
- 2024
- Full Text
- View/download PDF
9. One Transform To Compute Them All: Efficient Fusion-Based Full-Reference Video Quality Assessment.
- Author
-
Venkataramanan AK, Stejerean C, Katsavounidis I, and Bovik AC
- Abstract
The Video Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a state-of-the-art approach to video quality prediction that now pervades the streaming and social media industries. However, since VMAF requires the evaluation of a heterogeneous set of quality models, it is computationally expensive. Given other advances in hardware-accelerated encoding, quality assessment is emerging as a significant bottleneck in video compression pipelines. Towards alleviating this burden, we propose a novel Fusion of Unified Quality Evaluators (FUNQUE) framework, by enabling computation sharing and by using a transform that is sensitive to visual perception to boost accuracy. Further, we expand the FUNQUE framework to define a collection of improved low-complexity fused-feature models that advance the state of the art of video quality prediction with respect to both accuracy, by 4.2% to 5.3%, and computational efficiency, by factors of 3.8 to 11.
- Published
- 2023
- Full Text
- View/download PDF
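A minimal sketch of the computation-sharing idea behind FUNQUE as described above: a single wavelet transform is computed once per reference/distorted frame pair, and several fused quality features are then derived from those same subbands. The Haar wavelet, the two toy features, and the synthetic frames are illustrative assumptions, not the published model.

```python
import numpy as np
import pywt

def shared_transform_features(ref, dis, C=1e-3):
    # One Haar DWT per frame, shared by all subsequent feature computations.
    ref_a, (ref_h, ref_v, ref_d) = pywt.dwt2(ref, "haar")
    dis_a, (dis_h, dis_v, dis_d) = pywt.dwt2(dis, "haar")

    # Feature 1: SSIM-like similarity computed on the shared approximation band.
    mu_r, mu_d = ref_a.mean(), dis_a.mean()
    cov = np.mean((ref_a - mu_r) * (dis_a - mu_d))
    ssim_like = ((2 * mu_r * mu_d + C) * (2 * cov + C)) / \
                ((mu_r ** 2 + mu_d ** 2 + C) * (ref_a.var() + dis_a.var() + C))

    # Feature 2: detail-band energy ratio from the same transform (a crude structure-loss cue).
    e_ref = sum(np.mean(b ** 2) for b in (ref_h, ref_v, ref_d))
    e_dis = sum(np.mean(b ** 2) for b in (dis_h, dis_v, dis_d))
    detail_ratio = (e_dis + C) / (e_ref + C)

    return np.array([ssim_like, detail_ratio])   # would feed a learned fusion/regression stage

rng = np.random.default_rng(1)
ref = rng.random((360, 640))
dis = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0, 1)
print(shared_transform_features(ref, dis))
```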
10. Machine Learning of Physiologic Waveforms and Electronic Health Record Data: A Large Perioperative Data Set of High-Fidelity Physiologic Waveforms.
- Author
-
Kim S, Kwon S, Rudas A, Pal R, Markey MK, Bovik AC, and Cannesson M
- Subjects
- Humans, Clinical Relevance, Electronic Health Records, Machine Learning
- Abstract
Perioperative morbidity and mortality are significantly associated with both static and dynamic perioperative factors. Studies investigating static perioperative factors have been reported; however, there are few previous studies and data sets analyzing dynamic perioperative factors, including physiologic waveforms, despite their clinical importance. To fill this gap, the authors introduce a novel, large perioperative data set: the Machine Learning Of physiologic waveforms and electronic health Record Data (MLORD) data set. They also provide a concise tutorial on machine learning to illustrate predictive models trained on the complex and diverse structures in the MLORD data set.
- Published
- 2023
- Full Text
- View/download PDF
11. CONVIQT: Contrastive Video Quality Estimator.
- Author
-
Madhusudana PC, Birkbeck N, Wang Y, Adsumilli B, and Bovik AC
- Abstract
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination are employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural Network (CNN) that extracts spatial features, as well as a recurrent unit that captures temporal information. The model is trained using a contrastive loss and we therefore refer to this training framework and resulting model as CONtrastive VIdeo Quality EstimaTor (CONVIQT). During testing, the weights of the trained model are frozen, and a linear regressor maps the learned features to quality scores in a no-reference (NR) setting. We conduct comprehensive evaluations of the proposed model against leading algorithms on multiple VQA databases containing wide ranges of spatial and temporal distortions. We analyze the correlations between model predictions and ground-truth quality ratings, and show that CONVIQT achieves competitive performance when compared to state-of-the-art NR-VQA models, even though it is not trained on those databases. Our ablation experiments demonstrate that the learned representations are highly robust and generalize well across synthetic and realistic distortions. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
- Published
- 2023
- Full Text
- View/download PDF
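A hedged sketch of the evaluation protocol described in the abstract above: the self-supervised encoder is frozen and only a linear regressor maps its features to quality scores. The 1024-dimensional features, ridge regression, and synthetic mean opinion scores (MOS) below are stand-ins, not the paper's exact setup.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Stand-ins for frozen encoder features and mean opinion scores (MOS).
train_feats, test_feats = rng.random((800, 1024)), rng.random((200, 1024))
w_true = rng.standard_normal(1024)
train_mos = train_feats @ w_true + 0.1 * rng.standard_normal(800)
test_mos = test_feats @ w_true + 0.1 * rng.standard_normal(200)

reg = Ridge(alpha=1.0).fit(train_feats, train_mos)    # only the linear head is trained
pred = reg.predict(test_feats)                        # the encoder itself stays frozen
print("SROCC:", spearmanr(pred, test_mos).correlation)
```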
12. Helping Visually Impaired People Take Better Quality Pictures.
- Author
-
Mandal M, Ghadiyaram D, Gurari D, and Bovik AC
- Subjects
- Humans, Color Perception, Visual Acuity, Image Processing, Computer-Assisted methods, Semantics, Persons with Visual Disabilities
- Abstract
Perception-based image analysis technologies can be used to help visually impaired people take better quality pictures by providing automated guidance, thereby empowering them to interact more confidently on social media. The photographs taken by visually impaired users often suffer from one or both of two kinds of quality issues: technical quality (distortions), and semantic quality, such as framing and aesthetic composition. Here we develop tools to help them minimize occurrences of common technical distortions, such as blur, poor exposure, and noise. We do not address the complementary problems of semantic quality, leaving that aspect for future work. The problem of assessing, and providing actionable feedback on the technical quality of pictures captured by visually impaired users is hard enough, owing to the severe, commingled distortions that often occur. To advance progress on the problem of analyzing and measuring the technical quality of visually impaired user-generated content (VI-UGC), we built a very large and unique subjective image quality and distortion dataset. This new perceptual resource, which we call the LIVE-Meta VI-UGC Database, contains 40K real-world distorted VI-UGC images and 40K patches, on which we recorded 2.7M human perceptual quality judgments and 2.7M distortion labels. Using this psychometric resource we also created an automatic limited vision picture quality and distortion predictor that learns local-to-global spatial quality relationships, achieving state-of-the-art prediction performance on VI-UGC pictures, significantly outperforming existing picture quality models on this unique class of distorted picture data. We also created a prototype feedback system that helps to guide users to mitigate quality issues and take better quality pictures, by creating a multi-task learning framework. The dataset and models can be accessed at: https://github.com/mandal-cv/visimpaired.
- Published
- 2023
- Full Text
- View/download PDF
13. Study of Subjective and Objective Quality Assessment of Mobile Cloud Gaming Videos.
- Author
-
Saha A, Chen YC, Davis C, Qiu B, Wang X, Gowda R, Katsavounidis I, and Bovik AC
- Abstract
We present the outcomes of a recent large-scale subjective study of Mobile Cloud Gaming Video Quality Assessment (MCG-VQA) on a diverse set of gaming videos. Rapid advancements in cloud services, faster video encoding technologies, and increased access to high-speed, low-latency wireless internet have all contributed to the exponential growth of the Mobile Cloud Gaming industry. Consequently, the development of methods to assess the quality of real-time video feeds to end-users of cloud gaming platforms has become increasingly important. However, due to the lack of a large-scale public Mobile Cloud Gaming Video dataset containing a diverse set of distorted videos with corresponding subjective scores, there has been limited work on the development of MCG-VQA models. To accelerate progress toward these goals, we created a new dataset, named the LIVE-Meta Mobile Cloud Gaming (LIVE-Meta-MCG) video quality database, composed of 600 landscape and portrait gaming videos, on which we collected 14,400 subjective quality ratings from an in-lab subjective study. Additionally, to demonstrate the usefulness of the new resource, we benchmarked multiple state-of-the-art VQA algorithms on the database. The new database will be made publicly available on our website: https://live.ece.utexas.edu/research/LIVE-Meta-Mobile-Cloud-Gaming/index.html.
- Published
- 2023
- Full Text
- View/download PDF
14. Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression.
- Author
-
Paul S, Norkin A, and Bovik AC
- Abstract
Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which tends to worsen as the block size is reduced. Moreover, the block matching criteria used in typical codecs do not account for the resulting levels of perceptual quality of the motion compensated pictures that are created upon decoding. Towards achieving the elusive goal of perceptually optimized motion estimation, we propose a search-free block motion estimation framework using a multi-stage convolutional neural network, which is able to conduct motion estimation on multiple block sizes simultaneously, using a triplet of frames as input. This composite block translation network (CBT-Net) is trained in a self-supervised manner on a large database that we created from publicly available uncompressed video content. We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames. Our experimental results highlight the computational efficiency of our proposed model relative to conventional block matching based motion estimation algorithms, for comparable prediction errors. Further, when used to perform inter prediction in AV1, the MV predictions of the perceptually optimized model result in average Bjontegaard-delta rate (BD-rate) improvements of -1.73% and -1.31% with respect to the MS-SSIM and Video Multi-Method Assessment Fusion (VMAF) quality metrics, respectively, as compared to the block matching based motion estimation system employed in the SVT-AV1 encoder.
- Published
- 2022
- Full Text
- View/download PDF
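For context on the search procedures that the abstract above says CBT-Net is designed to replace, here is a minimal full-search block-matching baseline: each block exhaustively searches a window in the previous frame for the SAD-minimizing motion vector. Block size, search range, and the toy frames are illustrative.

```python
import numpy as np

def full_search_mvs(prev, curr, block=16, search=8):
    """Exhaustive SAD-minimizing motion search for every block of `curr` against `prev`."""
    H, W = curr.shape
    mvs = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            tgt = curr[by:by + block, bx:bx + block]
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > H or x + block > W:
                        continue
                    sad = np.abs(prev[y:y + block, x:x + block] - tgt).sum()
                    if sad < best:
                        best, best_mv = sad, (dy, dx)
            # MV points from the current block to its best match in the previous frame.
            mvs[by // block, bx // block] = best_mv
    return mvs

rng = np.random.default_rng(0)
prev = rng.random((64, 64))
curr = np.roll(prev, shift=(2, -3), axis=(0, 1))   # current frame = previous shifted by (2, -3)
print(full_search_mvs(prev, curr)[2, 2])           # interior block: expect [-2  3]
```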
15. Video Quality Model of Compression, Resolution and Frame Rate Adaptation Based on Space-Time Regularities.
- Author
-
Lee DY, Kim J, Ko H, and Bovik AC
- Abstract
Being able to accurately predict the visual quality of videos subjected to various combinations of dimension reduction protocols is of high interest to the streaming video industry, given rapid increases in frame resolutions and frame rates. In this direction, we have developed a video quality predictor that is sensitive to spatial, temporal, or space-time subsampling combined with compression. Our predictor is based on new models of space-time natural video statistics (NVS). Specifically, we model the statistics of divisively normalized differences between neighboring frames that are relatively displaced. In an extensive empirical study, we found that those paths of space-time displaced frame differences that provide maximal regularity against our NVS model generally align best with motion trajectories. Motivated by this, we built a new video quality prediction engine that extracts NVS features that represent how space-time directional regularities are disturbed by space-time distortions. Based on parametric models of these regularities, we compute features that are used to train a regressor that can accurately predict perceptual quality. As a stringent test of the new model, we apply it to the difficult problem of predicting the quality of videos subjected not only to compression, but also to downsampling in space and/or time. We show that the new quality model achieves state-of-the-art (SOTA) prediction performance on the new ETRI-LIVE Space-Time Subsampled Video Quality (STSVQ) database and also on the AVT-VQDB-UHD-1 database.
- Published
- 2022
- Full Text
- View/download PDF
16. FOVQA: Blind Foveated Video Quality Assessment.
- Author
-
Jin Y, Patney A, Webb R, and Bovik AC
- Subjects
- Attention, Normal Distribution, Video Recording methods, Algorithms, Data Compression
- Abstract
Previous blind or No-Reference (NR) image/video quality assessment (IQA/VQA) models largely rely on features drawn from natural scene statistics (NSS), but under the assumption that the image statistics are stationary in the spatial domain. Several of these models are quite successful on standard pictures. However, in Virtual Reality (VR) applications, foveated video compression is regaining attention, and the concept of space-variant quality assessment is of interest, given the availability of increasingly high spatial and temporal resolution contents and practical ways of measuring gaze direction. Distortions from foveated video compression increase with increased eccentricity, implying that the natural scene statistics are space-variant. Towards advancing the development of foveated compression/streaming algorithms, we have devised a no-reference (NR) foveated video quality assessment model, called FOVQA, which is based on new models of space-variant natural scene statistics (NSS) and natural video statistics (NVS). Specifically, we deploy a space-variant generalized Gaussian distribution (SV-GGD) model and a space-variant asynchronous generalized Gaussian distribution (SV-AGGD) model of mean-subtracted contrast-normalized (MSCN) coefficients and products of neighboring MSCN coefficients, respectively. We devise a foveated video quality predictor that extracts radial basis features, and other features that capture perceptually annoying rapid quality fall-offs. We find that FOVQA achieves state-of-the-art (SOTA) performance on the new 2D LIVE-FBT-FCVR database, as compared with other leading foveated IQA/VQA models. We have made our implementation of FOVQA available at: https://live.ece.utexas.edu/research/Quality/FOVQA.zip.
- Published
- 2022
- Full Text
- View/download PDF
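A minimal sketch of two standard ingredients that FOVQA builds on, per the abstract above: mean-subtracted contrast-normalized (MSCN) coefficients and a moment-matched generalized Gaussian distribution (GGD) shape estimate. The space-variant, eccentricity-dependent modeling that distinguishes the paper is not reproduced here; the filter scale and shape search grid are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma

def mscn(image, sigma=7.0 / 6.0, C=1e-3):
    """Mean-subtracted, contrast-normalized coefficients via Gaussian-weighted local statistics."""
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image * image, sigma) - mu * mu
    return (image - mu) / (np.sqrt(np.maximum(var, 0.0)) + C)

def ggd_shape(x, shapes=np.arange(0.2, 10.0, 0.001)):
    """Moment-matching (BRISQUE-style) estimate of the GGD shape parameter."""
    rho = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    r = gamma(2.0 / shapes) ** 2 / (gamma(1.0 / shapes) * gamma(3.0 / shapes))
    return shapes[np.argmin((r - rho) ** 2)]

rng = np.random.default_rng(0)
img = rng.random((256, 256))
coeffs = mscn(img)
print("estimated GGD shape of MSCN coefficients:", ggd_shape(coeffs.ravel()))
```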
17. A Subjective and Objective Study of Space-Time Subsampled Video Quality.
- Author
-
Lee DY, Paul S, Bampis CG, Ko H, Kim J, Jeong SY, Homan B, and Bovik AC
- Abstract
Video dimensions are continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution and frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited bandwidth networks in a perceptually optimal way is a current challenge affecting billions of viewers. One recent practice adopted by video service providers is space-time resolution adaptation in conjunction with video compression. Consequently, it is important to understand how different levels of space-time subsampling and compression affect the perceptual quality of videos. Towards making progress in this direction, we constructed a large new resource, called the ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE STSVQ) database, containing 437 videos generated by applying various levels of combined space-time subsampling and video compression on 15 diverse video contents. We also conducted a large-scale human study on the new dataset, collecting about 15,000 subjective judgments of video quality. We provide a rate-distortion analysis of the collected subjective scores, enabling us to investigate the perceptual impact of space-time subsampling at different bit rates. We also evaluated and compared the performance of leading video quality models on the new database. The new ETRI-LIVE STSVQ database is being made freely available at (https://live.ece.utexas.edu/research/ETRI-LIVE_STSVQ/index.html).
- Published
- 2022
- Full Text
- View/download PDF
18. Study of the Subjective and Objective Quality of High Motion Live Streaming Videos.
- Author
-
Shang Z, Ebenezer JP, Wu Y, Wei H, Sethuraman S, and Bovik AC
- Subjects
- Databases, Factual, Humans, Motion, Video Recording, Algorithms, Artifacts
- Abstract
Video livestreaming is gaining prevalence among video streaming services, especially for the delivery of live, high motion content such as sporting events. The quality of these livestreaming videos can be adversely affected by any of a wide variety of events, including capture artifacts, and distortions incurred during coding and transmission. High motion content can cause or exacerbate many kinds of distortion, such as motion blur and stutter. Because of this, the development of objective Video Quality Assessment (VQA) algorithms that can predict the perceptual quality of high motion, live streamed videos is greatly desired. Important resources for developing these algorithms are appropriate databases that exemplify the kinds of live streaming video distortions encountered in practice. Towards making progress in this direction, we built a video quality database specifically designed for live streaming VQA research. The new video database is called the Laboratory for Image and Video Engineering (LIVE) Livestream Database. The LIVE Livestream Database includes 315 videos of 45 source sequences from 33 original contents impaired by 6 types of distortions. We also performed a subjective quality study using the new database, whereby more than 12,000 human opinions were gathered from 40 subjects. We demonstrate the usefulness of the new resource by performing a holistic evaluation of the performance of current state-of-the-art (SOTA) VQA models. We envision that researchers will find the dataset to be useful for the development, testing, and comparison of future VQA models. The LIVE Livestream database is being made publicly available for these purposes at https://live.ece.utexas.edu/research/LIVE_APV_Study/apv_index.html.
- Published
- 2022
- Full Text
- View/download PDF
19. Image Quality Assessment Using Contrastive Learning.
- Author
-
Madhusudana PC, Birkbeck N, Wang Y, Adsumilli B, and Bovik AC
- Abstract
We consider the problem of obtaining image quality representations in a self-supervised manner. We use prediction of distortion type and degree as an auxiliary task to learn features from an unlabeled image dataset containing a mixture of synthetic and realistic distortions. We then train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem. We refer to the proposed training framework and resulting deep IQA model as the CONTRastive Image QUality Evaluator (CONTRIQUE). During evaluation, the CNN weights are frozen and a linear regressor maps the learned representations to quality scores in a No-Reference (NR) setting. We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models, even without any additional fine-tuning of the CNN backbone. The learned representations are highly robust and generalize well across images afflicted by either synthetic or authentic distortions. Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets. The implementations used in this paper are available at https://github.com/pavancm/CONTRIQUE.
- Published
- 2022
- Full Text
- View/download PDF
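A hedged sketch of a contrastive pairwise objective of the kind used to train models like CONTRIQUE (an NT-Xent-style loss over two "views" of each image). The batch construction, augmentations, temperature, and backbone in the published model may differ; the embeddings below are random stand-ins.

```python
import torch
import torch.nn.functional as F

def ntxent_loss(z1, z2, tau=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # 2N x D, unit-norm
    sim = z @ z.t() / tau                                  # scaled cosine similarities
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # exclude self-pairs
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])          # positive = the other view
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(ntxent_loss(z1, z2).item())
```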
20. On the space-time statistics of motion pictures.
- Author
-
Lee DY, Ko H, Kim J, and Bovik AC
- Subjects
- Humans, Motion Perception physiology, Time Factors, Motion Pictures
- Abstract
It is well known that natural images possess statistical regularities that can be captured by bandpass decomposition and divisive normalization processes that approximate early neural processing in the human visual system. We expand on these studies and present new findings on the properties of space-time natural statistics that are inherent in motion pictures. Our model relies on the concept of temporal bandpass (e.g., lag) filtering in lateral geniculate nucleus (LGN) and area V1, which is similar to smoothed frame differencing of video frames. Specifically, we model the statistics of the differences between adjacent or neighboring video frames that have been slightly spatially displaced relative to one another. We find that when these space-time differences are further subjected to locally pooled divisive normalization, statistical regularities (or lack thereof) arise that depend on the local motion trajectory. We find that bandpass and divisively normalized frame differences that are displaced along the motion direction exhibit stronger statistical regularities than for other displacements. Conversely, the direction-dependent regularities of displaced frame differences can be used to estimate the image motion (optical flow) by finding the space-time displacement paths that best preserve statistical regularity.
- Published
- 2021
- Full Text
- View/download PDF
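A toy sketch of the core computation described above: spatially displaced frame differences followed by divisive normalization, compared for a displacement along the true motion versus a misaligned one. Sample kurtosis is used here only as a crude stand-in for the paper's GGD-based regularity analysis, and the smooth synthetic frames are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import kurtosis

def normalized_displaced_difference(f0, f1, dy, dx, sigma=2.0, eps=1e-3):
    """Difference of f0 and f1 shifted back by hypothesized motion (dy, dx), divisively normalized."""
    diff = np.roll(f1, shift=(-dy, -dx), axis=(0, 1)) - f0
    energy = np.sqrt(gaussian_filter(diff * diff, sigma)) + eps   # local energy estimate
    return diff / energy

rng = np.random.default_rng(0)
frame0 = gaussian_filter(rng.random((128, 128)), 3)                # smooth toy frame
frame1 = np.roll(frame0, shift=(1, 2), axis=(0, 1)) + 0.01 * rng.standard_normal((128, 128))

aligned = normalized_displaced_difference(frame0, frame1, 1, 2)    # along the true motion
misaligned = normalized_displaced_difference(frame0, frame1, 0, 0)
print("kurtosis along motion:", kurtosis(aligned.ravel(), fisher=False))
print("kurtosis off motion:  ", kurtosis(misaligned.ravel(), fisher=False))
```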
21. Towards Perceptually Optimized Adaptive Video Streaming-A Realistic Quality of Experience Database.
- Author
-
Bampis CG, Li Z, Katsavounidis I, Huang TY, Ekanadham C, and Bovik AC
- Abstract
Measuring Quality of Experience (QoE) and integrating these measurements into video streaming algorithms is a multi-faceted problem that fundamentally requires the design of comprehensive subjective QoE databases and objective QoE prediction models. To achieve this goal, we have recently designed the LIVE-NFLX-II database, a highly-realistic database which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content. Our database builds on recent advancements in content-adaptive encoding and incorporates actual network traces to capture realistic network variations on the client device. The new database focuses on low bandwidth conditions which are more challenging for bitrate adaptation algorithms, which often must navigate tradeoffs between rebuffering and video quality. Using our database, we study the effects of multiple streaming dimensions on user experience and evaluate video quality and quality of experience models and analyze their strengths and weaknesses. We believe that the tools introduced here will help inspire further progress on the development of perceptually-optimized client adaptation and video streaming strategies. The database is publicly available at http://live.ece.utexas.edu/research/LIVE_NFLX_II/live_nflx_plus.html.
- Published
- 2021
- Full Text
- View/download PDF
22. Predicting the Quality of Compressed Videos With Pre-Existing Distortions.
- Author
-
Yu X, Birkbeck N, Wang Y, Bampis CG, Adsumilli B, and Bovik AC
- Abstract
Because of the increasing ease of video capture, many millions of consumers create and upload large volumes of User-Generated-Content (UGC) videos to social and streaming media sites over the Internet. UGC videos are commonly captured by naive users having limited skills and imperfect techniques, and tend to be afflicted by mixtures of highly diverse in-capture distortions. These UGC videos are then often uploaded for sharing onto cloud servers, where they are further compressed for storage and transmission. Our paper tackles the highly practical problem of predicting the quality of compressed videos (perhaps during the process of compression, to help guide it), with only (possibly severely) distorted UGC videos as references. To address this problem, we have developed a novel Video Quality Assessment (VQA) framework that we call 1stepVQA (to distinguish it from two-step methods that we discuss). 1stepVQA overcomes limitations of Full-Reference, Reduced-Reference and No-Reference VQA models by exploiting the statistical regularities of both natural videos and distorted videos. We also describe a new dedicated video database, which was created by applying a realistic VMAF-Guided perceptual rate distortion optimization (RDO) criterion to create realistically compressed versions of UGC source videos, which typically have pre-existing distortions. We show that 1stepVQA is able to more accurately predict the quality of compressed videos, given imperfect reference videos, and outperforms other VQA models in this scenario.
- Published
- 2021
- Full Text
- View/download PDF
23. Subjective and Objective Quality Assessment of 2D and 3D Foveated Video Compression in Virtual Reality.
- Author
-
Jin Y, Chen M, Goodall T, Patney A, and Bovik AC
- Abstract
In Virtual Reality (VR), the requirements of much higher resolution and smooth viewing experiences under rapid and often real-time changes in viewing direction lead to significant challenges in compression and communication. To reduce the stresses of very high bandwidth consumption, the concept of foveated video compression is being accorded renewed interest. By exploiting the space-variant property of retinal visual acuity, foveation has the potential to substantially reduce video resolution in the visual periphery, with hardly noticeable perceptual quality degradations. Accordingly, foveated image/video quality predictors are also becoming increasingly important, as a practical way to monitor and control future foveated compression algorithms. Towards advancing the development of foveated image/video quality assessment (FIQA/FVQA) algorithms, we have constructed 2D and (stereoscopic) 3D VR databases of foveated/compressed videos, and conducted a human study of perceptual quality on each database. Each database includes 10 reference videos and 180 foveated videos, which were processed by 3 levels of foveation on the reference videos. Foveation was applied by increasing compression with increased eccentricity. In the 2D study, each video was of resolution 7680×3840 and was viewed and quality-rated by 36 subjects, while in the 3D study, each video was of resolution 5376×5376 and rated by 34 subjects. Both studies were conducted on top of a foveated video player having low motion-to-photon latency (~50ms). We evaluated different objective image and video quality assessment algorithms, including both FIQA/FVQA algorithms and non-foveated algorithms, on our so-called LIVE-Facebook Technologies Foveation-Compressed Virtual Reality (LIVE-FBT-FCVR) databases. We also present a statistical evaluation of the relative performances of these algorithms. The LIVE-FBT-FCVR databases have been made publicly available and can be accessed at https://live.ece.utexas.edu/research/LIVEFBTFCVR/index.html.
- Published
- 2021
- Full Text
- View/download PDF
24. UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content.
- Author
-
Tu Z, Wang Y, Birkbeck N, Adsumilli B, and Bovik AC
- Abstract
Recent years have witnessed an explosion of user-generated content (UGC) videos shared and streamed over the Internet, thanks to the evolution of affordable and reliable consumer capture devices, and the tremendous popularity of social media platforms. Accordingly, there is a great need for accurate video quality assessment (VQA) models for UGC/consumer videos to monitor, control, and optimize this vast content. Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of UGC videos are unpredictable, complicated, and often commingled. Here we contribute to advancing the UGC-VQA problem by conducting a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights on both subjective video quality studies and objective VQA model design. By employing a feature selection strategy on top of efficient BVQA models, we are able to extract 60 out of 763 statistical features used in existing methods to create a new fusion-based model, which we dub the VIDeo quality EVALuator (VIDEVAL), that effectively balances the trade-off between VQA performance and efficiency. Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models. Our study protocol also defines a reliable benchmark for the UGC-VQA problem, which we believe will facilitate further research on deep learning-based VQA modeling, as well as perceptually-optimized efficient UGC video processing, transcoding, and streaming. To promote reproducible research and public evaluation, an implementation of VIDEVAL has been made available online: https://github.com/vztu/VIDEVAL.
- Published
- 2021
- Full Text
- View/download PDF
25. VR Sickness Versus VR Presence: A Statistical Prediction Model.
- Author
-
Kim W, Lee S, and Bovik AC
- Abstract
Although it is well known that the negative effects of VR sickness and the desirable sense of presence are important determinants of a user's immersive VR experience, there remains a lack of definitive research outcomes to enable the creation of methods to predict and/or optimize the trade-offs between them. Most VR sickness assessment (VRSA) and VR presence assessment (VRPA) studies reported to date have utilized simple image patterns as probes, hence their results are difficult to apply to the highly diverse contents encountered in general, real-world VR environments. To help fill this void, we have constructed a large, dedicated VR sickness/presence (VR-SP) database, which contains 100 VR videos with associated human subjective ratings. Using this new resource, we developed a statistical model of spatio-temporal and rotational frame difference maps to predict VR sickness. We also designed an exceptional motion feature, which is expressed as the correlation between an instantaneous change feature and averaged temporal features. By adding additional features (visual activity, content features) to capture the sense of presence, we use the new data resource to explore the relationship between VRSA and VRPA. We also show that the aggregate VR-SP model is able to predict VR sickness with an accuracy of 90% and VR presence with an accuracy of 75% using the new VR-SP dataset.
- Published
- 2021
- Full Text
- View/download PDF
26. Perceptual Video Quality Prediction Emphasizing Chroma Distortions.
- Author
-
Chen LH, Bampis CG, Li Z, Sole J, and Bovik AC
- Abstract
Measuring the quality of digital videos viewed by human observers has become a common practice in numerous multimedia applications, such as adaptive video streaming, quality monitoring, and other digital TV applications. Here we explore a significant, yet relatively unexplored problem: measuring perceptual quality on videos arising from both luma and chroma distortions from compression. Toward investigating this problem, it is important to understand the kinds of chroma distortions that arise, how they relate to luma compression distortions, and how they can affect perceived quality. We designed and carried out a subjective experiment to measure subjective video quality on both luma and chroma distortions, introduced both in isolation as well as together. Specifically, the new subjective dataset comprises a total of 210 videos afflicted by distortions caused by varying levels of luma quantization commingled with different amounts of chroma quantization. The subjective scores were evaluated by 34 subjects in a controlled environmental setting. Using the newly collected subjective data, we were able to demonstrate important shortcomings of existing video quality models, especially in regards to chroma distortions. Further, we designed an objective video quality model which builds on existing video quality algorithms, by considering the fidelity of chroma channels in a principled way. We also found that this quality analysis implies that there is room for reducing bitrate consumption in modern video codecs by creatively increasing the compression factor on chroma channels. We believe that this work will both encourage further research in this direction, as well as advance progress on the ultimate goal of jointly optimizing luma and chroma compression in modern video encoders.
- Published
- 2021
- Full Text
- View/download PDF
27. ProxIQA: A Proxy Approach to Perceptual Optimization of Learned Image Compression.
- Author
-
Chen LH, Bampis CG, Li Z, Norkin A, and Bovik AC
- Subjects
- Algorithms, Humans, Video Recording, Data Compression methods, Image Processing, Computer-Assisted methods, Neural Networks, Computer
- Abstract
The use of ℓp (p = 1, 2) norms has largely dominated the measurement of loss in neural networks due to their simplicity and analytical properties. However, when used to assess the loss of visual information, these simple norms are not very consistent with human perception. Here, we describe a different "proximal" approach to optimize image analysis networks against quantitative perceptual models. Specifically, we construct a proxy network, broadly termed ProxIQA, which mimics the perceptual model while serving as a loss layer of the network. We experimentally demonstrate how this optimization framework can be applied to train an end-to-end optimized image compression network. By building on top of an existing deep image compression model, we are able to demonstrate a bitrate reduction of as much as 31% over MSE optimization, given a specified perceptual quality (VMAF) level.
- Published
- 2021
- Full Text
- View/download PDF
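A hedged sketch of the proxy-loss idea described above, not the authors' exact ProxIQA architecture: a small network is trained to mimic a perceptual score for (reference, distorted) pairs, then frozen and used as a differentiable loss term when optimizing a compression network. The tiny CNN, the loss weighting, and the random tensors are illustrative.

```python
import torch
import torch.nn as nn

class ProxyQualityNet(nn.Module):
    """Tiny stand-in for a proxy network trained to regress a perceptual score."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )

    def forward(self, ref, dis):               # grayscale frames in [0, 1]
        return self.net(torch.cat([ref, dis], dim=1)).squeeze(1)

proxy = ProxyQualityNet()
# ... the proxy would first be trained to match the perceptual metric, then frozen ...
for p in proxy.parameters():
    p.requires_grad_(False)

ref = torch.rand(4, 1, 64, 64)
recon = torch.rand(4, 1, 64, 64, requires_grad=True)          # toy codec output
loss = (1.0 - proxy(ref, recon)).mean() + 0.01 * (ref - recon).abs().mean()
loss.backward()                                                # gradients reach the codec output
print(recon.grad.shape)
```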
28. ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction.
- Author
-
Madhusudana PC, Birkbeck N, Wang Y, Adsumilli B, and Bovik AC
- Abstract
We consider the problem of conducting frame rate dependent video quality assessment (VQA) on videos of diverse frame rates, including high frame rate (HFR) videos. More generally, we study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality. We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients. A generalized Gaussian distribution (GGD) is used to model band-pass responses, while entropy variations between reference and distorted videos under the GGD model are used to capture video quality variations arising from frame rate changes. The entropic differences are calculated across multiple temporal and spatial subbands, and merged using a learned regressor. We show through extensive experiments that GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models. The features used in GREED are highly generalizable and obtain competitive performance even on standard, non-HFR VQA databases. The implementation of GREED has been made available online: https://github.com/pavancm/GREED.
- Published
- 2021
- Full Text
- View/download PDF
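A minimal sketch of the central quantity in GREED as described above: closed-form differential entropies of temporal band-pass coefficients under a moment-matched GGD, compared between reference and distorted videos. The crude frame-difference "band-pass" filter, the single subband, and the synthetic clips are illustrative; the full model uses multiple spatial and temporal subbands and a learned regressor.

```python
import numpy as np
from scipy.special import gamma

def ggd_entropy(x, shapes=np.arange(0.2, 6.0, 0.001)):
    """Fit a GGD by moment matching and return its differential entropy (nats)."""
    rho = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    r = gamma(2.0 / shapes) ** 2 / (gamma(1.0 / shapes) * gamma(3.0 / shapes))
    beta = shapes[np.argmin((r - rho) ** 2)]
    alpha = np.sqrt(np.var(x) * gamma(1.0 / beta) / gamma(3.0 / beta))
    return 1.0 / beta - np.log(beta / (2.0 * alpha * gamma(1.0 / beta)))

def temporal_bandpass(frames):
    return np.diff(frames, axis=0)       # crude stand-in for a temporal band-pass filter

rng = np.random.default_rng(0)
ref = rng.random((30, 144, 176))                                   # toy reference clip
dis = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0, 1)   # toy distorted clip
h_ref = ggd_entropy(temporal_bandpass(ref).ravel())
h_dis = ggd_entropy(temporal_bandpass(dis).ravel())
print("entropic difference:", abs(h_ref - h_dis))
```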
29. ChipQA: No-Reference Video Quality Prediction via Space-Time Chips.
- Author
-
Ebenezer JP, Shang Z, Wu Y, Wei H, Sethuraman S, and Bovik AC
- Abstract
We propose a new model for no-reference video quality assessment (VQA). Our approach uses a new idea of highly-localized space-time (ST) slices called Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along directions that implicitly capture motion. We use perceptually-motivated bandpass and normalization models to first process the video data, and then select oriented ST Chips based on how closely they fit parametric models of natural video statistics. We show that the parameters that describe these statistics can be used to reliably predict the quality of videos, without the need for a reference video. The proposed method implicitly models ST video naturalness, and deviations from naturalness. We train and test our model on several large VQA databases, and show that our model achieves state-of-the-art performance at reduced cost, without requiring motion computation.
- Published
- 2021
- Full Text
- View/download PDF
30. Lewis Antigen Phenotype and Survival of Patients With Pancreatic Cancer.
- Author
-
Kwon S, Kim S, Giovannucci EL, Hidalgo M, Markey MK, Bovik AC, Kwon MJ, Kim KJ, Im H, Park JY, Bang S, Park SW, Song SY, and Chung MJ
- Subjects
- Aged, Antigens, Tumor-Associated, Carbohydrate analysis, Carcinoma, Pancreatic Ductal mortality, Carcinoma, Pancreatic Ductal pathology, Female, Humans, Male, Middle Aged, Neoplasm Staging, Pancreatic Neoplasms mortality, Pancreatic Neoplasms pathology, Phenotype, Prospective Studies, Registries, Risk Assessment, Risk Factors, Time Factors, Carcinoma, Pancreatic Ductal immunology, Lewis Blood Group Antigens analysis, Pancreatic Neoplasms immunology
- Abstract
Objectives: The association of Lewis antigen phenotype with survival of patients with pancreatic ductal adenocarcinoma was investigated. Methods: A total of 1187 patients diagnosed with pancreatic ductal adenocarcinoma were evaluated in a prospective cohort. Patients were classified into 3 different groups according to Lewis antigen phenotype: Lewis antigen (1) A positive [Le(a+b-)], (2) B positive [Le(a-b+)], and (3) negative [Le(a-b-)]. Risk of mortality was analyzed with Cox regression after adjusting for other predictors. Results: The risk of mortality increased in the order of Le(a+b-), Le(a-b+), and Le(a-b-) [reference; hazard ratio (HR), 1.27; 95% confidence interval (CI), 1.03-1.57; P = 0.02; and HR, 1.65; 95% CI, 1.31-2.09; P < 0.001] after adjusting for other predictors. Among patients with serum carbohydrate antigen (CA) 19-9 lower than 37 U/mL, the association seemed more apparent (reference; HR, 1.50; 95% CI, 0.77-2.29; P = 0.22; and HR, 2.10; 95% CI, 1.10-4.02; P < 0.02). Conclusions: The risk of mortality increased in the order of Le(a+b-), Le(a-b+), and Le(a-b-). The difference in prognosis according to the Lewis antigen phenotype was more pronounced in the low CA 19-9 group, which suggests that the Lewis antigen phenotype works as a biomarker predicting the prognosis of patients with pancreatic cancer with undetectable CA 19-9 level.
- Published
- 2020
- Full Text
- View/download PDF
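An illustrative sketch, on synthetic data rather than the study's cohort, of the kind of analysis the abstract describes: a Cox proportional-hazards regression of survival on Lewis antigen phenotype with Le(a+b-) as the reference group, adjusted for one covariate. All variable names, effect sizes, and the censoring scheme below are invented for illustration.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 300
phenotype = rng.choice(["Le(a+b-)", "Le(a-b+)", "Le(a-b-)"], size=n)
age = rng.normal(65, 8, size=n)
true_hr = {"Le(a+b-)": 1.0, "Le(a-b+)": 1.3, "Le(a-b-)": 1.7}        # invented effect sizes
hazard = np.array([true_hr[p] for p in phenotype]) * np.exp(0.02 * (age - 65))
time = rng.exponential(24.0 / hazard)                                # synthetic months to event
df = pd.DataFrame({
    "time": np.minimum(time, 60.0),                                  # administrative censoring at 60 months
    "event": (time < 60.0).astype(int),
    "age": age,
})
dummies = pd.get_dummies(phenotype, prefix="lewis", drop_first=True, dtype=float)  # Le(a+b-) = reference
df = pd.concat([df, dummies], axis=1)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["exp(coef)", "p"]])                               # adjusted hazard ratios
```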
31. Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction.
- Author
-
Paul S, Norkin A, and Bovik AC
- Abstract
In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning-based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjøntegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
- Published
- 2020
- Full Text
- View/download PDF
32. Day and Night-Time Dehazing by Local Airlight Estimation.
- Author
-
Ancuti C, Ancuti CO, De Vleeschouwer C, and Bovik AC
- Abstract
We introduce an effective fusion-based technique to enhance both day-time and night-time hazy scenes. When inverting the Koschmieder light transmission model, and by contrast with the common implementation of the popular dark-channel prior dehazing method (He et al., CVPR 2009), we estimate the airlight on image patches and not on the entire image. Local airlight estimation is adopted because, under night-time conditions, the lighting generally arises from multiple localized artificial sources, and is thus intrinsically non-uniform. Selecting the sizes of the patches is, however, non-trivial. Small patches are desirable to achieve fine spatial adaptation to the atmospheric light, but large patches help improve the airlight estimation accuracy by increasing the possibility of capturing pixels with airlight appearance (due to severe haze). For this reason, multiple patch sizes are considered to generate several images, which are then merged together. The discrete Laplacian of the original image is provided as an additional input to the fusion process to reduce the glowing effect and to emphasize the finest image details. Similarly, for day-time scenes we apply the same principle but use a larger patch size. For each input, a set of weight maps are derived so as to assign higher weights to regions of high contrast, high saliency and small saturation. Finally, the derived inputs and the normalized weight maps are blended in a multi-scale fashion using a Laplacian pyramid decomposition. Extensive experimental results demonstrate the effectiveness of our approach as compared with recent techniques, both in terms of computational efficiency and the quality of the outputs.
- Published
- 2020
- Full Text
- View/download PDF
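A compact sketch in the spirit of the patch-wise airlight estimation and Koschmieder-model inversion described above. The single patch size, the dark-channel transmission estimate, and the synthetic hazy image are illustrative; the paper's multi-scale fusion, Laplacian-pyramid blending, and weight maps are omitted.

```python
import numpy as np

def dehaze_local_airlight(img, patch=31, omega=0.9, t_min=0.1):
    """img: float RGB in [0, 1]. Patch-wise airlight + dark-channel transmission, then inversion."""
    H, W, _ = img.shape
    out = np.empty_like(img)
    for y0 in range(0, H, patch):
        for x0 in range(0, W, patch):
            block = img[y0:y0 + patch, x0:x0 + patch]
            # Local airlight: color of the most haze-opaque pixel within the patch.
            dark = block.min(axis=2)
            iy, ix = np.unravel_index(np.argmax(dark), dark.shape)
            A = block[iy, ix]
            # Dark-channel transmission estimate, then invert I = J * t + A * (1 - t).
            t = 1.0 - omega * (block / np.maximum(A, 1e-6)).min(axis=2)
            t = np.maximum(t, t_min)[..., None]
            out[y0:y0 + patch, x0:x0 + patch] = np.clip((block - A) / t + A, 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
hazy = np.clip(0.6 * rng.random((240, 320, 3)) + 0.35, 0.0, 1.0)    # toy hazy image
print(dehaze_local_airlight(hazy).shape)
```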
33. Study of Subjective and Objective Quality Assessment of Audio-Visual Signals.
- Author
-
Min X, Zhai G, Zhou J, Farias MCQ, and Bovik AC
- Abstract
The topics of visual and audio quality assessment (QA) have been widely researched for decades, yet nearly all of this prior work has focused only on single-mode visual or audio signals. However, visual signals rarely are presented without accompanying audio, including heavy-bandwidth video streaming applications. Moreover, the distortions that may separately (or conjointly) afflict the visual and audio signals collectively shape user-perceived quality of experience (QoE). This motivated us to conduct a subjective study of audio and video (A/V) quality, which we then used to compare and develop A/V quality measurement models and algorithms. The new LIVE-SJTU Audio and Video Quality Assessment (A/V-QA) Database includes 336 A/V sequences that were generated from 14 original source contents by applying 24 different A/V distortion combinations on them. We then conducted a subjective A/V quality perception study on the database towards attaining a better understanding of how humans perceive the overall combined quality of A/V signals. We also designed four different families of objective A/V quality prediction models, using a multimodal fusion strategy. The different types of A/V quality models differ in both the unimodal audio and video quality prediction models comprising the direct signal measurements and in the way that the two perceptual signal modes are combined. The objective models are built using both existing state-of-the-art audio and video quality prediction models and some new prediction models, as well as quality-predictive features delivered by a deep neural network. The methods of fusing audio and video quality predictions that are considered include simple product combinations as well as learned mappings. Using the new subjective A/V database as a tool, we validated and tested all of the objective A/V quality prediction models. We will make the database publicly available to facilitate further research.
- Published
- 2020
- Full Text
- View/download PDF
34. Quality Prediction on Deep Generative Images.
- Author
-
Ko H, Lee DY, Cho S, and Bovik AC
- Abstract
In recent years, deep neural networks have been utilized in a wide variety of applications including image generation. In particular, generative adversarial networks (GANs) are able to produce highly realistic pictures as part of tasks such as image compression. As with standard compression, it is desirable to be able to automatically assess the perceptual quality of generative images to monitor and control the encoding process. However, existing image quality algorithms are ineffective on GAN-generated content, especially on textured regions and at high compressions. Here we propose a new "naturalness"-based image quality predictor for generative images. Our new GAN picture quality predictor is built using a multi-stage parallel boosting system based on structural similarity features and measurements of statistical similarity. To enable model development and testing, we also constructed a subjective GAN image quality database containing (distorted) GAN images and collected human opinions of them. Our experimental results indicate that our proposed GAN IQA model delivers superior quality predictions on the generative image datasets, as well as on traditional image quality datasets.
- Published
- 2020
- Full Text
- View/download PDF
35. Blind Noisy Image Quality Assessment Using Sub-Band Kurtosis.
- Author
-
Deng C, Wang S, Bovik AC, Huang GB, and Zhao B
- Subjects
- Image Processing, Computer-Assisted methods, Machine Learning, Models, Statistical, Wavelet Analysis
- Abstract
Noise that afflicts natural images, regardless of the source, generally disturbs the perception of image quality by introducing a high-frequency random element that, when severe, can mask image content. Except at very low levels, where it may serve a purpose, it is annoying. There exist significant statistical differences between distortion-free natural images and noisy images that become evident upon comparing the empirical probability distribution histograms of their discrete wavelet transform (DWT) coefficients. The DWT coefficients of low- or no-noise natural images have leptokurtic, peaky distributions with heavy tails, while those of noisy images tend to be platykurtic, with less peaky distributions and shallower tails. The sample kurtosis is a natural measure of the peakedness and tail weight of the distributions of random variables. Here, we study the efficacy of the sample kurtosis of image wavelet coefficients as a feature driving an extreme learning machine, which learns to map kurtosis values into perceptual quality scores. The model is trained and tested on five types of noisy images, including additive white Gaussian noise, additive Gaussian color noise, impulse noise, masked noise, and high-frequency noise from the LIVE, CSIQ, TID2008, and TID2013 image quality databases. The experimental results show that the trained model has better quality evaluation performance on noisy images than existing blind noise assessment models, while also outperforming general-purpose blind and full-reference image quality assessment methods.
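The sketch below illustrates the feature extraction stage: sample kurtosis is computed for each DWT detail sub-band, and the resulting vector can be regressed onto quality scores. PyWavelets and SciPy are assumed, the wavelet and level count are arbitrary choices, and ridge regression stands in for the paper's extreme learning machine.

```python
# Sub-band kurtosis features from a 2D discrete wavelet transform.
import numpy as np
import pywt
from scipy.stats import kurtosis
from sklearn.linear_model import Ridge

def subband_kurtosis_features(gray_image, wavelet="db2", levels=3):
    coeffs = pywt.wavedec2(gray_image.astype(np.float64), wavelet, level=levels)
    feats = []
    for detail in coeffs[1:]:                 # skip the approximation band
        for band in detail:                   # (cH, cV, cD) at each level
            feats.append(kurtosis(band.ravel(), fisher=False))
    return np.array(feats)

# Usage: map kurtosis vectors of noisy training images to their quality scores.
# X = np.vstack([subband_kurtosis_features(img) for img in train_images])
# model = Ridge(alpha=1.0).fit(X, train_scores)
```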
- Published
- 2020
- Full Text
- View/download PDF
36. Dynamic Receptive Field Generation for Full-Reference Image Quality Assessment.
- Author
-
Kim W, Nguyen AD, Lee S, and Bovik AC
- Abstract
Most full-reference image quality assessment (FR-IQA) methods advanced to date have been holistically designed without regard to the type of distortion impairing the image. However, the perception of distortion depends nonlinearly on the distortion type. Here we propose a novel FR-IQA framework that dynamically generates receptive fields responsive to the distortion type. Our proposed method, the dynamic receptive field generation based image quality assessor (DRF-IQA), separates the process of FR-IQA into two streams: 1) dynamic error representation and 2) visual sensitivity-based quality pooling. The first stream generates dynamic receptive fields on the input distorted image, implemented by a trained convolutional neural network (CNN); the generated receptive field profiles are then convolved with the distorted and reference images, and the responses are differenced to produce spatial error maps. In the second stream, a visual sensitivity map is generated and used to weight the spatial error maps. The experimental results show that the proposed model achieves state-of-the-art prediction accuracy on various open IQA databases.
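A minimal numerical sketch of the two-stream idea follows: a receptive-field kernel (CNN-generated in the paper, fixed here) is convolved with both images, the responses are differenced into an error map, and a sensitivity map weights the pooling; every array below is an illustrative placeholder.

```python
# Error-map computation and sensitivity-weighted pooling.
import numpy as np
from scipy.signal import convolve2d

def drf_style_score(reference, distorted, kernel, sensitivity):
    r = convolve2d(reference, kernel, mode="same", boundary="symm")
    d = convolve2d(distorted, kernel, mode="same", boundary="symm")
    error_map = np.abs(r - d)
    # Pool the spatial error map, weighted by the visual sensitivity map.
    return np.sum(error_map * sensitivity) / (np.sum(sensitivity) + 1e-12)

H, W = 128, 128
ref = np.random.rand(H, W)
dst = ref + 0.05 * np.random.randn(H, W)
kernel = np.ones((7, 7)) / 49.0              # stand-in for a generated receptive field
sens = np.ones((H, W))                       # stand-in for the visual sensitivity map
print(drf_style_score(ref, dst, kernel, sens))
```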
- Published
- 2020
- Full Text
- View/download PDF
37. A Unified Probabilistic Formulation of Image Aesthetic Assessment.
- Author
-
Zeng H, Cao Z, Zhang L, and Bovik AC
- Abstract
Image aesthetic assessment (IAA) has been attracting considerable attention in recent years due to the explosive growth of digital photography on the Internet and in social networks. The IAA problem is inherently challenging, owing to the ineffable nature of the human sense of aesthetics and beauty, and its close relationship to understanding pictorial content. Three different approaches to framing and solving the problem have been posed: binary classification, average score regression, and score distribution prediction. Solutions that have been proposed have utilized different types of aesthetic labels and loss functions to train deep IAA models. However, these studies ignore the fact that the three different IAA tasks are inherently related. Here, we show that the different types of aesthetic labels can be treated within the same statistical framework, which we use to create a unified probabilistic formulation of all three IAA tasks. This unified formulation motivates the use of an efficient and effective loss function for training deep IAA models to conduct the different tasks. We also discuss the problem of learning from a noisy raw score distribution, which hinders network performance. We then show that by fitting the raw score distribution to a more stable and discriminative score distribution, we are able to train a single model that obtains highly competitive performance on all three IAA tasks. Extensive qualitative analysis and experimental results on image aesthetic benchmarks validate the superior performance afforded by the proposed formulation. The source code is available at.
- Published
- 2019
- Full Text
- View/download PDF
38. Quality Measurement of Images on Mobile Streaming Interfaces Deployed at Scale.
- Author
-
Sinno Z, Moorthy A, De Cock J, Li Z, and Bovik AC
- Abstract
With the growing use of smart cellular devices for entertainment purposes, audio and video streaming services now provide an increasingly wide variety of popular mobile applications that offer portable and accessible ways to consume content. The user interfaces of these applications have become increasingly visual in nature, and are commonly loaded with dense multimedia content such as thumbnail images, animated GIFs, and short videos. To efficiently render these and to aid rapid download to the client display, it is necessary to compress, scale, and color subsample them. These operations introduce distortions, reducing the appeal of the application. It is desirable to be able to automatically monitor and govern the visual qualities of these images, which are usually quite small. However, while there exists a variety of high-performing image quality assessment (IQA) algorithms, none have been designed for this particular use case. This kind of content often has unique characteristics, such as overlaid graphics, intentional brightness, gradients, text, and warping. We describe a study we conducted on the subjective and objective quality of images embedded in the displayed user interfaces of mobile streaming applications. We created a database of typical "billboard" and "thumbnail" images viewed on such services. Using the collected data, we studied the effects of compression, scaling, and chroma subsampling on perceived quality by conducting a subjective study. We also evaluated the performance of leading picture quality prediction models on the new database. We report some surprising results regarding algorithm performance, and find that there remains ample scope for future model development.
- Published
- 2019
- Full Text
- View/download PDF
39. Predicting Detection Performance on Security X-Ray Images as a Function of Image Quality.
- Author
-
Gupta P, Sinno Z, Glover JL, Paulter NG, and Bovik AC
- Abstract
Developing methods to predict how image quality affects task performance is a topic of great interest in many applications. While such studies have been performed in the medical imaging community, little work has been reported in the security X-ray imaging literature. In this paper, we develop models that predict the effect of image quality on the detection of improvised explosive device components by bomb technicians in images taken using portable X-ray systems. Using the newly developed NIST-LIVE X-Ray Task Performance Database, we created a set of objective algorithms that predict bomb technician detection performance based on measures of image quality. Our basic measures are traditional image quality indicators (IQIs) and perceptually relevant natural scene statistics (NSS)-based measures that have been extensively used in visible light image quality prediction algorithms. We show that these measures are able to quantify the perceptual severity of degradations and can predict the performance of expert bomb technicians in identifying threats. Combining NSS- and IQI-based measures yields even better task performance prediction than either of these methods independently. We also developed a new suite of statistical task prediction models that we refer to as quality inspectors of X-ray images (QUIX); we believe this is the first NSS-based model for security X-ray images. We also show that QUIX can be used to reliably predict conventional IQI metric values on distorted X-ray images.
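The sketch below shows the kind of NSS measurement referred to above: mean-subtracted contrast-normalized (MSCN) coefficients are formed and a generalized Gaussian shape parameter is fit to them. The filter parameters are conventional NSS choices and the loader is hypothetical, so this should be read as an illustration rather than the paper's exact feature set.

```python
# MSCN coefficients and a generalized Gaussian shape feature.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import gennorm

def mscn(image, sigma=7.0 / 6.0):
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image * image, sigma) - mu * mu
    return (image - mu) / (np.sqrt(np.abs(var)) + 1.0)

def ggd_shape(coefficients):
    beta, loc, scale = gennorm.fit(coefficients.ravel(), floc=0.0)
    return beta, scale

# img = load_xray_as_float(...)        # hypothetical loader returning a 2D float array
# beta, scale = ggd_shape(mscn(img))   # NSS-style features for task prediction
```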
- Published
- 2019
- Full Text
- View/download PDF
40. Predicting the Quality of Images Compressed after Distortion in Two Steps.
- Author
-
Yu X, Bampis CG, Gupta P, and Bovik AC
- Abstract
In a typical communication pipeline, images undergo a series of processing steps that can cause visual distortions before being viewed. Given a high quality reference image, a reference (R) image quality assessment (IQA) algorithm can be applied after compression or transmission. However, the assumption of a high quality reference image is often not fulfilled in practice, thus contributing to less accurate quality predictions when using stand-alone R IQA models. This is particularly common on social media, where hundreds of billions of user-generated photos and videos containing diverse, mixed distortions are uploaded, compressed, and shared annually on sites like Facebook, YouTube, and Snapchat. The qualities of the pictures that are uploaded to these sites vary over a very wide range. While this is an extremely common situation, the problem of assessing the qualities of compressed images against their precompressed, but often severely distorted (reference) pictures has been little studied. Towards ameliorating this problem, we propose a novel two-step image quality prediction concept that combines no-reference (NR) with R quality measurements. Applying a first stage of NR IQA to determine the possibly degraded quality of the source image yields information that can be used to quality-modulate the R prediction to improve its accuracy. We devise a simple and efficient weighted product model of R and NR stages, which combines a pre-compression NR measurement with a post-compression R measurement. This first-of-a-kind two-step approach produces more reliable objective prediction scores. We also constructed a new, first-of-a-kind dedicated database specialized for the design and testing of two-step IQA models. Using this new resource, we show that two-step approaches yield outstanding performance when applied to compressed images whose original, pre-compression quality covers a wide range of realistic distortion types and severities. The two-step concept is versatile as it can use any desired R and NR components. We are making the source code of a particularly efficient model that we call 2stepQA publicly available at https://github.com/xiangxuyu/2stepQA. We are also providing the dedicated new two-step database free of charge at http://live.ece.utexas.edu/research/twostep/index.html.
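One plausible form of the weighted-product combination described above is sketched below; the normalization of the NR score and the exponent are illustrative assumptions, not necessarily the published 2stepQA formula.

```python
# Weighted product of a reference-based (R) and a no-reference (NR) measurement.
import numpy as np

def two_step_quality(r_score, nr_score, nr_worst=100.0, w=0.5):
    # r_score: reference-based quality in [0, 1] (higher is better), e.g. MS-SSIM.
    # nr_score: no-reference distortion score where larger means worse, e.g. NIQE.
    nr_quality = np.clip(1.0 - nr_score / nr_worst, 0.0, 1.0)
    return r_score * (nr_quality ** w)

print(two_step_quality(r_score=0.95, nr_score=20.0))
```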
- Published
- 2019
- Full Text
- View/download PDF
41. Study of Subjective Quality and Objective Blind Quality Prediction of Stereoscopic Videos.
- Author
-
Appina B, Dendi SVR, Manasa K, Channappayya SS, and Bovik AC
- Abstract
We present a new subjective and objective study on full high-definition (HD) stereoscopic (3D or S3D) video quality. In the subjective study, we constructed an S3D video dataset with 12 pristine and 288 test videos, where the test videos were generated by applying H.264 and H.265 compression, blur, and frame freeze artifacts. We also propose a no-reference (NR) objective video quality assessment (QA) algorithm that relies on measurements of the statistical dependencies between the motion and disparity subband coefficients of S3D videos. Inspired by the Generalized Gaussian Distribution (GGD) approach of Liu et al. (2011), we model the joint statistical dependencies between the motion and disparity components as following a Bivariate Generalized Gaussian Distribution (BGGD). We estimate the BGGD model parameters (α, β) and the coherence measure (Ψ) from the eigenvalues of the sample covariance matrix (M) of the BGGD. In turn, we model the BGGD parameters of pristine S3D videos using a Multivariate Gaussian (MVG) distribution. The likelihood of a test video's BGGD model parameters arising from the pristine MVG model is computed and shown to play a key role in the overall quality estimation. We also estimate the global motion content of each video by averaging the SSIM scores between pairs of successive video frames. To estimate the test S3D video's spatial quality, we apply the popular 2D NR unsupervised NIQE image QA model on a frame-by-frame basis on both views. The overall quality of a test S3D video is finally computed by pooling the test video's likelihood estimates, global motion strength, and spatial quality scores. The proposed algorithm, which is 'completely blind' (requiring no reference videos or training on subjective scores), is called the Motion and Disparity based 3D video quality evaluator (MoDi3D). We show that MoDi3D delivers competitive performance over a wide variety of datasets, including the IRCCYN dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and our proposed LFOVIAS3DPh2 S3D video dataset.
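The sketch below shows the pristine-versus-test likelihood step in isolation: parameters extracted from pristine videos define an MVG model against which a test video's parameters are scored. All feature values are placeholders, and the actual BGGD fitting is assumed to happen elsewhere.

```python
# Scoring a test video's model parameters under a pristine multivariate Gaussian.
import numpy as np
from scipy.stats import multivariate_normal

pristine_params = np.random.randn(50, 3)     # e.g. (alpha, beta, coherence) per pristine video
test_params = np.random.randn(3)             # parameters extracted from the test video

mu = pristine_params.mean(axis=0)
cov = np.cov(pristine_params, rowvar=False)
likelihood = multivariate_normal(mean=mu, cov=cov).pdf(test_params)

# The paper pools this likelihood with global motion strength (frame-to-frame SSIM)
# and NIQE-based spatial quality to form the final MoDi3D score.
print(likelihood)
```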
- Published
- 2019
- Full Text
- View/download PDF
42. Detecting and Mapping Video Impairments.
- Author
-
Goodall TR and Bovik AC
- Abstract
Automatically identifying the locations and severities of video artifacts without the advantage of an original reference video is a difficult task. We present a novel approach to conducting no-reference artifact detection in digital videos, implemented as an efficient and unique dual-path (parallel) excitatory/inhibitory neural network that uses a simple discrimination rule to define a bank of accurate distortion detectors. The learning engine is distortion-sensitized by pre-processing each video using a statistical image model. The overall system is able to produce full-resolution space-time distortion maps for visualization, as well as providing global distortion detection decisions that represent the state of the art in performance. Our model, which we call the Video Impairment Mapper (VIDMAP), produces a first-of-a-kind full resolution map of artifact detection probabilities. The current realization of this system is able to accurately detect and map eight of the most important artifact categories encountered during streaming video source inspection: aliasing, video encoding corruptions, quantization, contours/banding, combing, compression, dropped frames, and upscaling artifacts. We show that it is either competitive with or significantly outperforms the previous state-of-the-art on the whole-image artifact detection task. A software release of VIDMAP that has been trained to detect and map these artifacts is available online: http://live.ece.utexas.edu/research/quality/VIDMAP release.zip for public use and evaluation.
- Published
- 2018
- Full Text
- View/download PDF
43. Automatic segmentation of inorganic nanoparticles in BF TEM micrographs.
- Author
-
Groom DJ, Yu K, Rasouli S, Polarinakis J, Bovik AC, and Ferreira PJ
- Abstract
Transmission electron microscopy (TEM) represents a unique and powerful modality for capturing spatial features of nanoparticles, such as size and shape. However, poor statistics are a key obstacle, due to the challenge of accurately and automatically segmenting nanoparticles in TEM micrographs. Towards remedying this deficit, we introduce an automatic particle picking method based on the concept of variance hybridized mean local thresholding. Validation of this new segmentation model is accomplished by applying a program written in Matlab to a database of 150 bright field TEM micrographs containing approximately 2,000 nanoparticles. We compare the results to global thresholding, local thresholding, and manual segmentation. It is found that this novel automatic particle picking method reduces false positives and false negatives significantly, while increasing the number of individual particles picked in regions of particle overlap.
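A simplified local-thresholding sketch in the spirit of the approach named above follows: each pixel is labeled as particle when it falls sufficiently below the local mean, by a multiple of the local standard deviation (particles appear dark in bright-field TEM). The window size, the constant k, and the exact hybridization of variance and mean are illustrative assumptions, not the paper's rule.

```python
# Local mean/standard-deviation thresholding of a bright-field TEM micrograph.
import numpy as np
from scipy.ndimage import uniform_filter

def local_threshold(image, window=31, k=0.5):
    img = image.astype(np.float64)
    mean = uniform_filter(img, size=window)
    sq_mean = uniform_filter(img ** 2, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    # Dark pixels relative to their local neighborhood are labeled as particle.
    return img < (mean - k * std)

# mask = local_threshold(tem_micrograph)   # boolean particle mask (hypothetical input)
```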
- Published
- 2018
- Full Text
- View/download PDF
44. Large-Scale Study of Perceptual Video Quality.
- Author
-
Sinno Z and Bovik AC
- Abstract
The great variation in videographic skill, camera design, compression and processing protocols, communication and bandwidth environments, and displays leads to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content at fixed resolutions, were captured using a small number of camera devices by a few videographers, and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real-world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, complex, and often commingled distortions that are difficult or impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Towards advancing NR video quality prediction, we have constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4,776 unique participants took part in the study, yielding more than 205,000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge Database (LIVE-VQC for short), by conducting a comparison of leading NR video quality predictors on it. This study is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is available for download at this link: http://live.ece.utexas.edu/research/LIVEVQC/index.html.
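A small sketch of the benchmarking step mentioned above follows: each model's per-video predictions are correlated against the crowdsourced mean opinion scores using Spearman and Pearson correlations; the score arrays are synthetic placeholders.

```python
# Correlating objective predictions with subjective mean opinion scores (MOS).
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.random.uniform(20, 90, 585)
predictions = mos + 10 * np.random.randn(585)   # stand-in for one NR model's outputs

srocc, _ = spearmanr(predictions, mos)
plcc, _ = pearsonr(predictions, mos)
print(f"SROCC={srocc:.3f}, PLCC={plcc:.3f}")
```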
- Published
- 2018
- Full Text
- View/download PDF
45. Modeling the Perceptual Quality of Immersive Images Rendered on Head Mounted Displays: Resolution and Compression.
- Author
-
Huang M, Shen Q, Ma Z, Bovik AC, Gupta P, Zhou R, and Cao X
- Abstract
We develop a model that expresses the joint impact of spatial resolution s and JPEG compression quality factor qf on immersive image quality. The model is expressed as the product of optimized exponential functions of these two factors. The model is tested on a subjective database of immersive image contents rendered on a head mounted display (HMD). High Pearson and Spearman correlations (> 0.95) and a small relative root mean squared error (< 5.6%) are achieved between the model predictions and the subjective quality judgments. The immersive ground-truth images, along with the rest of the database, are made available for future research and comparisons.
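The sketch below fits a product-of-exponentials surface of the kind described above to (resolution, quality factor, MOS) triplets using least squares; the specific functional form, parameter names, and synthetic data are illustrative assumptions rather than the paper's exact model.

```python
# Fitting a product-of-exponentials quality surface Q(s, qf).
import numpy as np
from scipy.optimize import curve_fit

def quality_model(X, a, b, c):
    s, qf = X
    return c * (1.0 - np.exp(-a * s)) * (1.0 - np.exp(-b * qf))

# s_norm, qf_norm: per-condition resolution and JPEG quality factor scaled to [0, 1];
# mos: corresponding mean opinion scores (all hypothetical here).
s_norm = np.random.uniform(0.2, 1.0, 40)
qf_norm = np.random.uniform(0.2, 1.0, 40)
mos = quality_model((s_norm, qf_norm), 3.0, 4.0, 5.0) + 0.1 * np.random.randn(40)

params, _ = curve_fit(quality_model, (s_norm, qf_norm), mos, p0=[1.0, 1.0, 5.0])
print(params)
```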
- Published
- 2018
- Full Text
- View/download PDF
46. Recurrent and Dynamic Models for Predicting Streaming Video Quality of Experience.
- Author
-
Bampis CG, Li Z, Katsavounidis I, and Bovik AC
- Abstract
Streaming video services represent a very large fraction of global bandwidth consumption. Due to the exploding demands of mobile video streaming services, coupled with limited bandwidth availability, video streams are often transmitted through unreliable, low-bandwidth networks. This unavoidably leads to two types of major streaming-related impairments: compression artifacts and/or rebuffering events. In streaming video applications, the end-user is a human observer; hence being able to predict the subjective Quality of Experience (QoE) associated with streamed videos could lead to the creation of perceptually optimized resource allocation strategies driving higher quality video streaming services. We propose a variety of recurrent dynamic neural networks that conduct continuous-time subjective QoE prediction. By formulating the problem as one of time-series forecasting, we train a variety of recurrent neural networks and non-linear autoregressive models to predict QoE using several recently developed subjective QoE databases. These models combine multiple, diverse neural network inputs, such as predicted video quality scores, rebuffering measurements, and data related to memory and its effects on human behavioral responses, using them to predict QoE on video streams impaired by both compression artifacts and rebuffering events. Instead of finding a single time-series prediction model, we propose and evaluate ways of aggregating different models into a forecasting ensemble that delivers improved results with reduced forecasting variance. We also deploy appropriate new evaluation metrics for comparing time-series predictions in streaming applications. Our experimental results demonstrate improved prediction performance that approaches human performance. An implementation of this work can be found at https://github.com/christosbampis/NARX_QoE_release.
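As a rough illustration of the time-series formulation described above, the sketch below builds a NARX-style regression dataset from lagged QoE values and lagged exogenous inputs (predicted quality and a rebuffering indicator) and fits a small neural network; the data, lag depth, and network size are placeholders, not the paper's configuration.

```python
# NARX-style continuous QoE forecasting from lagged endogenous/exogenous inputs.
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_narx_dataset(qoe, inputs, lags=3):
    X, y = [], []
    for t in range(lags, len(qoe)):
        past_qoe = qoe[t - lags:t]
        past_inputs = inputs[t - lags:t].ravel()
        X.append(np.concatenate([past_qoe, past_inputs]))
        y.append(qoe[t])
    return np.array(X), np.array(y)

T = 500
quality = np.random.uniform(30, 90, (T, 1))               # per-frame quality predictions
rebuffer = (np.random.rand(T, 1) < 0.05).astype(float)    # rebuffering indicator
exog = np.hstack([quality, rebuffer])
qoe = 0.8 * quality[:, 0] - 20 * rebuffer[:, 0] + 5 * np.random.randn(T)

X, y = make_narx_dataset(qoe, exog, lags=3)
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
```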
- Published
- 2018
- Full Text
- View/download PDF
47. Towards a Closed Form Second-Order Natural Scene Statistics Model.
- Author
-
Sinno Z, Caramanis C, and Bovik AC
- Abstract
Previous work on natural scene statistics (NSS)-based image models has focused primarily on characterizing the univariate bandpass statistics of single pixels. These models have proven to be powerful tools driving a variety of computer vision and image/video processing applications, including depth estimation, image quality assessment, and image denoising, among others. Multivariate NSS models descriptive of the joint distributions of spatially separated bandpass image samples have, however, received relatively little attention. Here, we develop a closed form bivariate spatial correlation model of bandpass and normalized image samples that completes an existing 2D joint generalized Gaussian distribution model of adjacent bandpass pixels. Our model is built using a set of diverse, high-quality naturalistic photographs, and as a control, we study the model properties on white noise. We also study the way the model fits are affected when the images are modified by common distortions.
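The sketch below computes the empirical quantity that such a model characterizes: the correlation between divisively normalized bandpass (MSCN-like) samples as a function of horizontal spatial separation; the filter parameters are conventional NSS choices and not necessarily those of the paper.

```python
# Empirical spatial correlation of normalized bandpass image samples.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7.0 / 6.0):
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image * image, sigma) - mu * mu
    return (image - mu) / (np.sqrt(np.abs(var)) + 1.0)

def spatial_correlation(image, max_offset=10):
    m = mscn(image.astype(np.float64))
    corr = []
    for d in range(1, max_offset + 1):
        a, b = m[:, :-d].ravel(), m[:, d:].ravel()
        corr.append(np.corrcoef(a, b)[0, 1])
    return np.array(corr)        # correlation vs. horizontal pixel separation
```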
- Published
- 2018
- Full Text
- View/download PDF
48. Deep Visual Discomfort Predictor for Stereoscopic 3D Images.
- Author
-
Oh H, Ahn S, Lee S, and Bovik AC
- Abstract
Most prior approaches to the problem of stereoscopic 3D (S3D) visual discomfort prediction (VDP) have focused on the extraction of perceptually meaningful handcrafted features based on models of visual perception and of natural depth statistics. Towards advancing performance on this problem, we have developed a deep learning based VDP model named the Deep Visual Discomfort Predictor (DeepVDP). DeepVDP uses a convolutional neural network (CNN) to learn features that are highly predictive of experienced visual discomfort. Since a large amount of labeled data is needed to train a CNN, we develop a systematic way of dividing each S3D image into local regions defined as patches, and model a patch-based CNN using two sequential training steps. Since it is very difficult to obtain human opinions on each patch, a proxy ground-truth label generated by an existing S3D visual discomfort prediction algorithm called 3D-VDP is instead assigned to each patch. These proxy ground-truth labels are used to conduct the first stage of training the CNN. In the second stage, the automatically learned local abstractions are aggregated into global features via a feature aggregation layer. The learned features are iteratively updated via supervised learning on subjective 3D discomfort scores, which serve as ground-truth labels on each S3D image. The patch-based CNN model that has been pretrained on proxy ground-truth labels is subsequently retrained on true global subjective scores. The global S3D visual discomfort scores predicted by the trained DeepVDP model achieve state-of-the-art performance as compared to previous VDP algorithms.
- Published
- 2018
- Full Text
- View/download PDF
49. Learning a Continuous-Time Streaming Video QoE Model.
- Author
-
Ghadiyaram D, Pan J, and Bovik AC
- Abstract
Over-the-top adaptive video streaming services are frequently impacted by fluctuating network conditions that can lead to rebuffering events (stalling events) and sudden bitrate changes. These events visually impact video consumers' quality of experience (QoE) and can lead to consumer churn. The development of models that can accurately predict viewers' instantaneous subjective QoE under such volatile network conditions could potentially enable the more efficient design of quality-control protocols for media-driven services, such as YouTube, Amazon, Netflix, and so on. However, most existing models only predict a single overall QoE score on a given video and are based on simple global video features, without accounting for relevant aspects of human perception and behavior. We have created a QoE evaluator, called the time-varying QoE Indexer, that accounts for interactions between stalling events, analyzes the spatial and temporal content of a video, predicts the perceptual video quality, models the state of the client-side data buffer, and consequently predicts continuous-time quality scores that agree quite well with human opinion scores. The new QoE predictor also embeds the impact of relevant human cognitive factors, such as memory and recency, and their complex interactions with the video content being viewed. We evaluated the proposed model on three different video databases and attained standout QoE prediction performance.
- Published
- 2018
- Full Text
- View/download PDF
50. Multivariate Statistical Approach to Image Quality Tasks.
- Author
-
Gupta P, Bampis CG, Glover JL, Paulter NG Jr, and Bovik AC
- Abstract
Many existing Natural Scene Statistics (NSS)-based no-reference image quality assessment (NR IQA) algorithms employ univariate parametric distributions to capture the statistical inconsistencies of the bandpass coefficients of distorted images. Here we propose a multivariate model of natural image coefficients expressed in the bandpass spatial domain that has the potential to capture higher-order correlations that may be induced by the presence of distortions. We analyze how the parameters of the multivariate model are affected by different distortion types, and we show their ability to capture distortion-sensitive image quality information. We also demonstrate the violation of Gaussianity assumptions that occurs when locally estimating the energies of distorted image coefficients. We therefore propose a generalized Gaussian-based local contrast estimator as a way to implement non-linear local gain control, which facilitates the accurate modeling of both pristine and distorted images. We integrate this novel approach of generalized contrast normalization with multivariate modeling of bandpass image coefficients into a holistic NR IQA model, which we refer to as multivariate generalized contrast normalization (MVGCN). We demonstrate the improved performance of MVGCN on quality-relevant tasks across multiple imaging modalities, including visible light image quality prediction and task success prediction on distorted X-ray images.
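A simplified sketch of a generalized Gaussian-based local gain control of the sort described above follows: a global GGD shape is fit to the mean-removed coefficients, and a GGD-consistent local scale, derived from the local mean absolute deviation, divides them. The moment-matching estimator and window size are illustrative assumptions, not the paper's exact estimator.

```python
# Divisive normalization with a generalized Gaussian (GGD) local contrast estimate.
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.special import gamma
from scipy.stats import gennorm

def generalized_contrast_normalize(image, window=7, eps=1.0):
    img = image.astype(np.float64)
    mu = uniform_filter(img, size=window)
    centered = img - mu
    # Global GGD shape fit to the mean-removed coefficients.
    beta, _, _ = gennorm.fit(centered.ravel(), floc=0.0)
    # Local GGD-consistent scale from the local mean absolute deviation:
    # for a GGD with shape beta, E|x| = scale * Gamma(2/beta) / Gamma(1/beta).
    mad = uniform_filter(np.abs(centered), size=window)
    scale = mad * gamma(1.0 / beta) / gamma(2.0 / beta)
    return centered / (scale + eps), beta
```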
- Published
- 2018