Author: "Thomas S. Huang" / Topic: computer vision - Searchworks@Jio Institute Digital Library Search Results

Your search for "Thomas S. Huang" returned 392 results

1. FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis

Author: Kuangxiao Gu, Yuqian Zhou, and Thomas S. Huang
Subjects: FOS: Computer and information sciences, Landmark, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, General Medicine, Face space, 0202 electrical engineering, electronic engineering, information engineering, Learning network, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Image warping, business, Computer facial animation
Abstract: Talking face synthesis has been widely studied in either appearance-based or warping-based methods. Previous works mostly utilize a single face image as a source, and generate novel facial animations by merging other persons' facial features. However, some facial regions like eyes or teeth, which may be hidden in the source image, cannot be synthesized faithfully and stably. In this paper, we present a landmark driven two-stream network to generate faithful talking facial animation, in which more facial details are created, preserved and transferred from multiple source images instead of a single one. Specifically, we propose a network consisting of a learning stream and a fetching stream. The fetching sub-net directly learns to attentively warp and merge facial regions from five source images of distinctive landmarks, while the learning pipeline renders facial organs from the training face space to compensate. Compared to baseline algorithms, extensive experiments demonstrate that the proposed method achieves higher performance both quantitatively and qualitatively. Code is available at https://github.com/kgu3/FLNet_AAAI2020. Accepted by AAAI 2020.
Published: 2020
Full Text: View/download PDF

2. Semi-online Multi-people Tracking by Re-identification

Author: Long Lan, Dacheng Tao, Xinchao Wang, Gang Hua, and Thomas S. Huang
Subjects: Markov random field, business.industry, Computer science, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Contrast (statistics), Space (commercial competition), Tracking (particle physics), Task (project management), Alpha (programming language), Artificial Intelligence, Pattern recognition (psychology), Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: In this paper, we propose a novel semi-online approach to tracking multiple people. In contrast to conventional offline approaches that take the whole image sequence as input, our semi-online approach tracks people in a frame-by-frame manner by exploring the time, space and multi-camera relationship of detection hypotheses in the near future frames. We cast the multi-people tracking task as a re-identification problem, and explicitly account for objects’ appearance changes and longer-term associations. We model our approach using a Multi-Label Markov Random Field, and introduce a fast $\alpha$-expansion algorithm to solve it efficiently. To the best of our knowledge, this is the first semi-online approach achieved by re-identification. It yields very promising tracking results, especially in challenging cases such as crowded streets where pedestrians frequently occlude each other, scenes captured with moving cameras where objects may disappear and reappear randomly, and videos under changing illumination that affects the appearance of objects.
Published: 2020
Full Text: View/download PDF

3. NTIRE 2020 Challenge on Image and Video Deblurring

Author: Seungjun Nah, Sanghyun Son, Radu Timofte, Kyoung Mu Lee, Yu Tseng, Yu-Syuan Xu, Cheng-Ming Chiang, Yi-Min Tsai, Stephan Brehm, Sebastian Scherer, Dejia Xu, Yihao Chu, Qingyan Sun, Jiaqin Jiang, Lunhao Duan, Jian Yao, Kuldeep Purpohit, Maitreya Suin, A.N. Rajagopalan, Yuichi Ito, P.S. Hrishikesh, Densen Puthussery, K.A. Akhil, C.V. Jiji, Guisik Kim, P.L. Deepa, Zhiwei Xiong, Jie Huang, Dong Liu, Sangmin Kim, Hyungjoon Nam, Jisu Kim, Jechang Jeong, Shihua Huang, Yuchen Fan, Jiahui Yu, Haichao Yu, Thomas S. Huang, Ya Zhou, Xin Li, Sen Liu, Zhibo Chen, Saikat Dutta, Sourya Dipta Das, Shivam Garg, Daniel Sprague, Bhrij Patel, and Thomas Huck
Subjects: Deblurring, Relation (database), business.industry, Computer science, Motion blur, Photography, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer vision, Artificial intelligence, business, Image (mathematics)
Abstract: Motion blur is one of the most common degradation artifacts in dynamic scene photography. This paper reviews the NTIRE 2020 Challenge on Image and Video Deblurring. In this challenge, we present the evaluation results from 3 competition tracks as well as the proposed solutions. Track 1 aims to develop single-image deblurring methods focusing on restoration quality. On Track 2, the image deblurring methods are executed on a mobile platform to find the balance of the running speed and the restoration accuracy. Track 3 targets developing video deblurring methods that exploit the temporal relation between input frames. In each competition, there were 163, 135, and 102 registered participants and in the final testing phase, 9, 4, and 7 teams competed. The winning methods demonstrate the state-of-the-art performance on image and video deblurring tasks.
Published: 2020
Full Text: View/download PDF

4. Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

Author: Naira Hovakimyan, Thomas S. Huang, Honghui Shi, Xingqian Xu, Zilong Huang, Adrian Tudor, David Wilson, Hrant Khachatrian, Greg Rose, Robert J. Brunner, Ivan Dozier, Yunchao Wei, Alexander G. Schwing, Mang Tik Chiu, and Hovnatan Karapetyan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 0211 other engineering and technologies, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Machine Learning (cs.LG), Computer Science - Computers and Society, Computers and Society (cs.CY), 0202 electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, Segmentation, Computer vision, Image resolution, Aerial image, 021101 geological & geomatics engineering, Pixel, Contextual image classification, business.industry, Image and Video Processing (eess.IV), Cognitive neuroscience of visual object recognition, Image segmentation, Electrical Engineering and Systems Science - Image and Video Processing, Visualization, RGB color model, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: The success of deep learning in visual recognition tasks has driven advancements in multiple fields of research. Particularly, increasing attention has been drawn towards its application in agriculture. Nevertheless, while visual pattern recognition on farmlands carries enormous economic value, little progress has been made to merge computer vision and crop sciences due to the lack of suitable agricultural image datasets. Meanwhile, problems in agriculture also pose new challenges in computer vision. For example, semantic segmentation of aerial farmland images requires inference over extremely large images with extreme annotation sparsity. These challenges are not present in most of the common object datasets, and we show that they are more challenging than many other aerial image datasets. To encourage research in computer vision for agriculture, we present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns. We collected 94,986 high-quality aerial images from 3,432 farmlands across the US, where each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel. We annotate nine types of field anomaly patterns that are most important to farmers. As a pilot study of aerial agricultural semantic segmentation, we perform comprehensive experiments using popular semantic segmentation models; we also propose an effective model designed for aerial agricultural pattern recognition. Our experiments demonstrate several challenges Agriculture-Vision poses to both the computer vision and agriculture communities. Future versions of this dataset will include even more aerial images, anomaly patterns and image channels. More information at https://www.agriculture-vision.com. Comment: CVPR 2020
Published: 2020
Full Text: View/download PDF

5. NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

Author: Jing Liu, Yukai Shi, C. V. Jiji, Tong Yang, Mykola Mykhailych, Junyeop Lee, Gwantae Kim, Zhipeng Luo, Yandong Guo, Jiahao Wu, Liang Lin, Shengchen Zhu, Haoyu Zhong, Zhenyu Xu, Fu Li, Fuzhi Yang, JaeHyun Baek, Jong Chul Ye, Jinjia Peng, Densen Puthussery, Thomas S. Huang, Taizhang Shang, Wenhao Wu, Kai Zhang, Zhi Jin, Dongliang He, Jaihyun Park, Jeongki Min, Radu Timofte, Jungki Min, Chao Li, Kanghyu Lee, Sejong Yang, Chenming Shang, P. S. Hrishikesh, Huibing Wang, Qiuju Dai, Norimichi Ukita, Yifu Chen, Takeru Ooba, Zhijing Yang, Yuehan Yao, Jiande Jiang, Seon Joo Kim, Kazutoshi Akita, Xinbo Gao, Shuhang Gu, Xiaojun Yang, Huanrong Zhang, Kwangjin Yoon, Taegyun Jeon, Jianwei Li, Jianlong Fu, Huan Yang, Younghyun Jo, Wen Lu, Lin Zha, Byung-Hoon Kim, Bokyeung Lee, Tongtong Zhao, Shilei Wen, Ding Yukang, and Yuchen Fan
Subjects: FOS: Computer and information sciences, Ground truth, Computer science, business.industry, media_common.quotation_subject, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, 02 engineering and technology, Electrical Engineering and Systems Science - Image and Video Processing, Superresolution, Image (mathematics), Task (project management), 0202 electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, Computer vision, Quality (business), Artificial intelligence, Set (psychology), business, Focus (optics), Image resolution, media_common
Abstract: This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with a focus on proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of 16 based on a set of prior examples of low and corresponding high resolution images. The goal is to obtain a network design capable of producing high resolution results with the best perceptual quality and similar to the ground truth. The track had 280 registered participants, and 19 teams submitted the final results. They gauge the state-of-the-art in single image super-resolution. Comment: CVPRW 2020
Published: 2020
Full Text: View/download PDF

6. Motion Pyramid Networks for Accurate and Efficient Cardiac Motion Estimation

Author: Hanchao Yu, Xiao Chen, Shanhui Sun, Humphrey Shi, Thomas S. Huang, and Terrence Chen
Subjects: Motion compensation, business.industry, Computer science, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Physics::Physics Education, Motion (physics), 030218 nuclear medicine & medical imaging, 03 medical and health sciences, 0302 clinical medicine, Motion field, Feature (computer vision), Motion estimation, Pyramid, Computer vision, Artificial intelligence, Pyramid (image processing), business, 030217 neurology & neurosurgery
Abstract: Cardiac motion estimation plays a key role in MRI cardiac feature tracking and function assessment such as myocardium strain. In this paper, we propose Motion Pyramid Networks, a novel deep learning-based approach for accurate and efficient cardiac motion estimation. We predict and fuse a pyramid of motion fields from multiple scales of feature representations to generate a more refined motion field. We then use a novel cyclic teacher-student training strategy to make the inference end-to-end and further improve the tracking performance. Our teacher model provides more accurate motion estimation as supervision through progressive motion compensations. Our student model learns from the teacher model to estimate motion in a single step while maintaining accuracy. The teacher-student knowledge distillation is performed in a cyclic way for a further performance boost. Our proposed method significantly outperforms a strong baseline model on two publicly available clinical datasets, as evaluated by a variety of metrics and by inference time. New evaluation metrics are also proposed to represent errors in a clinically meaningful manner.
Published: 2020
Full Text: View/download PDF

7. Free-Form Image Inpainting With Gated Convolution

Author: Xin Lu, Jimei Yang, Jiahui Yu, Zhe Lin, Thomas S. Huang, and Xiaohui Shen
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Pixel, Channel (digital image), business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Inpainting, Graphics (cs.GR), Machine Learning (cs.LG), Convolution, Image (mathematics), Computer Science - Graphics, Simple (abstract algebra), Code (cryptography), Computer vision, Artificial intelligence, business
Abstract: We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, by applying a spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, and fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps users quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting. Accepted in ICCV 2019 Oral; open sourced; interactive demo available: http://jiahuiyu.com/deepfill/
Published: 2019
Full Text: View/download PDF
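
The gating idea described in the abstract above can be illustrated with a minimal NumPy sketch. It assumes the commonly cited formulation in which one convolution produces features and a second produces a soft gate in (0, 1) that is multiplied element-wise with the activated features. The helper names (`conv2d_same`, `gated_conv2d`), the single-channel setting, the odd kernel sizes and the `tanh` activation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive single-channel 2D cross-correlation with zero padding.

    Assumes an odd-sized kernel so the output has the same shape as x.
    """
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

def gated_conv2d(x, w_feature, w_gate):
    """Hypothetical single-channel gated convolution.

    The feature branch is passed through tanh (an assumed activation);
    the gate branch is passed through a sigmoid, giving a per-pixel
    weight in (0, 1) that scales the activated features.
    """
    feature = np.tanh(conv2d_same(x, w_feature))
    gate = 1.0 / (1.0 + np.exp(-conv2d_same(x, w_gate)))
    return feature * gate
```

Because the gate is computed from the input at every spatial location, it can learn to suppress responses around masked pixels rather than treating every pixel as valid, which is the behaviour the abstract attributes to gated convolution.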

8. NTIRE 2019 Challenge on Video Deblurring: Methods and Results

Author: Seungjun Nah, Ke Yu, Thomas S. Huang, Kelvin C.K. Chan, Fan Hongfei, Mohammad Tofighi, Ji Soo Kim, Muhammad Haris, Chen Change Loy, Chao Dong, Aditya Arora, Zhang Wenjie, Jeonghun Kim, Yuchen Fan, Zhang Yumei, Vishal Chudasama, Li Guo, Fahad Shahbaz Khan, Munchurl Kim, Ding Liu, Radu Timofte, Qingwen He, Se Young Chun, Tiantong Guo, Sanghyun Son, Kuldeep Purohit, Kishor Upla, Rahul Kumar Gupta, Dong-won Park, Vishal Monga, Xiang Li, Ling Shao, Syed Waqas Zamir, Heena Patel, Wang Xintao, Norimichi Ukita, Hyeonjun Sim, Sungyong Baik, Salman Khan, Jiahui Yu, A.N. Rajagopalan, Gyeongsik Moon, Greg Shakhnarovich, Kyoung Mu Lee, and Seokil Hong
Subjects: Deblurring, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Focus (optics), Image resolution, Image restoration, Data compression
Abstract: This paper reviews the first NTIRE challenge on video deblurring (restoration of rich details and high frequency components from blurred video frames) with a focus on the proposed solutions and results. A new REalistic and Diverse Scenes dataset (REDS) was employed. The challenge was divided into 2 tracks. Track 1 employed dynamic motion blurs while Track 2 had additional MPEG video compression artifacts. The two tracks had 109 and 93 registered participants, respectively. A total of 13 teams competed in the final testing phase. They gauge the state-of-the-art in video deblurring.
Published: 2019
Full Text: View/download PDF

9. NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results

Author: Shuhang Gu, Greg Shakhnarovich, Fatih Porikli, Kyoung Mu Lee, Zhongyuan Wang, Yulun Zhang, Seokil Hong, M. Akin Yilmaz, Kuldeep Purohit, Si Miao, A. S. Mandal, Yapeng Tian, A. Murat Tekalp (ORCID 0000-0003-1465-8121, YÖK ID 26207), Norimichi Ukita, Sanghyun Son, Junjun Jiang, Yun Fu, A.N. Rajagopalan, Chenliang Xu, Gyeongsik Moon, Sungyong Baik, Ankit Shukla, Zhe Hu, Chen Change Loy, Manoj Sharma, Chao Li, Ding Yukang, Dong Un Kang, Yongxin Zhu, Santanu Chaudhury, Ratheesh Kalarot, Hang Dong, Avinash Upadhyay, Muhammad Haris, Ke Yu, Thomas S. Huang, Megh Makwana, Wang Xintao, Xinyi Zhang, Peng Yi, Jiahui Yu, Kwanyoung Kim, Kui Jiang, Chao Dong, Xiao Huo, Rudrabha Mukhopadhyay, Yuchen Fan, Dongliang He, Ding Liu, Se Young Chun, Shilei Wen, Ajay Pratap Singh, Xiao Liu, Kelvin C.K. Chan, Cansu Korkmaz, Jiayi Ma, Radu Timofte, Anuj Badhwar, Seungjun Nah, and Dheeraj Khanna
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Image resolution, Image restoration, Video signal processing, Track (rail transport), Superresolution, Optical resolving power, Image super, 0202 electrical engineering, electronic engineering, information engineering, Bicubic interpolation, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Focus (optics), business
Abstract: This paper reviews the first NTIRE challenge on video super-resolution (restoration of rich details in low-resolution video frames) with a focus on proposed solutions and results. A new REalistic and Diverse Scenes dataset (REDS) was employed. The challenge was divided into 2 tracks. Track 1 employed the standard bicubic downscaling setup while Track 2 had realistic dynamic motion blurs. The two tracks had 124 and 104 registered participants, respectively. A total of 14 teams competed in the final testing phase. They gauge the state-of-the-art in video super-resolution.
Published: 2019
Full Text: View/download PDF

10. Self-Reproducing Video Frame Interpolation

Author: Haichao Yu, Thomas S. Huang, Zhangyang Wang, Jiajun Deng, and Xinchao Wang
Subjects: business.industry, Computer science, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Convolutional neural network, Constraint (information theory), Consistency (database systems), End-to-end principle, Computer vision, Artificial intelligence, Motion interpolation, Symmetry (geometry), business, Interpolation
Abstract: Frame interpolation has recently witnessed success with convolutional neural networks, which are trained end to end to minimize the reconstruction loss of dropped frames. This paper introduces a novel self-reproducing mechanism, in which the real (given) frames can in turn be interpolated from the interpolated ones, to further substantially improve the consistency and performance of video frame interpolation. Such a consistency constraint accounts for the inherent symmetry between existing and interpolated frames in a video sequence, providing a strong form of self-supervision. We then build a pyramid-like architecture, under which existing interpolation models can plug-and-play as building blocks. Extensive experiments validate its state-of-the-art performance, on both high resolution videos in the wild and public benchmarks.
Published: 2019
Full Text: View/download PDF
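
One possible reading of the self-reproducing constraint in the abstract above is sketched below: interpolate between consecutive real frames, then interpolate between neighbouring interpolated frames, and penalise the difference from the real frame that lies between them. The function names and the simple averaging interpolator are hypothetical stand-ins for a learned model and are not taken from the paper.

```python
import numpy as np

def midpoint_interpolate(a, b):
    """Stand-in interpolator (an assumption): the pixel-wise average of two frames."""
    return 0.5 * (a + b)

def self_reproducing_loss(frames, interpolate=midpoint_interpolate):
    """Mean squared error between real frames and their re-interpolated versions.

    frames: sequence of at least three equally shaped arrays.
    Step 1: interpolate a frame between each consecutive pair of real frames.
    Step 2: interpolate between neighbouring interpolated frames, which should
            reproduce the real frame lying between them.
    """
    inter = [interpolate(frames[k], frames[k + 1]) for k in range(len(frames) - 1)]
    errors = []
    for k in range(1, len(frames) - 1):
        reproduced = interpolate(inter[k - 1], inter[k])
        errors.append(np.mean((reproduced - frames[k]) ** 2))
    return float(np.mean(errors))
```

With the averaging stand-in, the loss is zero for frames that change linearly over time and grows when the middle frame departs from that linear path. With a learned interpolator, the same quantity could serve as an additional self-supervised training term.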

11. A Novel Framework for 3D-2D Vertebra Matching

Author: Thenkurussi Kesavadas, Jianbo Jiao, Bihan Wen, Xinchao Wang, Honghui Shi, Yunchao Wei, Zhangyang Wang, Haichao Yu, Hanchao Yu, Yang Fu, Thomas S. Huang, and Matthew Bramlet
Subjects: Matching (statistics), Artificial neural network, business.industry, Computer science, 3D projection, Object detection, Hough transform, law.invention, Image (mathematics), law, Minimum bounding box, Computer vision, Artificial intelligence, business, Projection (set theory)
Abstract: 3D-2D medical image matching is a crucial task in image-guided surgery, image-guided radiation therapy and minimally invasive surgery. The task relies on identifying the correspondence between a 2D reference image and the 2D projection of a 3D target image. In this paper, we propose a novel image matching framework between 3D CT projection and 2D X-ray image, tailored for vertebra images. The main idea is to learn a vertebra detector by means of a deep neural network. The detected vertebra is represented by a bounding box in the 3D CT projection. Next, the bounding box annotated by the doctor on the X-ray image is matched to the corresponding box in the 3D projection. We evaluate our proposed method on a self-collected 3D-2D registration dataset. The experimental results show that our framework outperforms the state-of-the-art neural network-based keypoint matching methods.
Published: 2019
Full Text: View/download PDF

12. Improving 3D Human Pose Estimation Via 3D Part Affinity Fields

Author: Zixu Zhao, Yuxiao Hu, Lei Zhang, Xinchao Wang, Thomas S. Huang, and Ding Liu
Subjects: Monocular, Pixel, Artificial neural network, Computer science, business.industry, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Image (mathematics), 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), Task analysis, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Simple linear regression, business, Pose, 0105 earth and related environmental sciences
Abstract: 3D human pose estimation from monocular images has recently become an active area in computer vision. For years, most deep neural network based practices have adopted either an end-to-end approach or a two-stage approach. An end-to-end network typically estimates 3D human poses directly from 2D input images, but it suffers from the shortage of 3D human pose data. It is also unclear whether the inaccuracy stems from limited visual understanding or from the 2D-to-3D mapping. A two-stage approach, by contrast, first uses an existing network for 2D keypoint detection and then lifts the 2D keypoints to 3D space. However, such approaches tend to ignore useful contextual hints from the raw 2D image pixels. In this paper, we introduce a two-stage architecture that can eliminate the main disadvantages of both these approaches. During the first stage we use an existing state-of-the-art detector to estimate 2D poses. To add more contextual information to help lift 2D poses to 3D poses, we propose 3D Part Affinity Fields (3D-PAFs). We use 3D-PAFs to infer 3D limb vectors, and combine them with 2D poses to regress the 3D coordinates. We trained and tested our proposed framework on Human3.6M, the most popular 3D human pose benchmark dataset. Our approach achieves state-of-the-art performance, which shows that with the right selection of contextual information, a simple regression model can be very powerful in estimating 3D poses.
Published: 2019
Full Text: View/download PDF

13. NTIRE 2019 challenge on real image denoising: Methods and results

Author: Kazutoshi Akita, Thomas S. Huang, Simone Zini, Raimondo Schettini, Jae-Ryun Chung, Bumjun Park, Chuan Wang, Sang-Won Lee, Seung-Won Jung, Simone Bianco, Lei Zhang, Yiyun Zhao, Yuchen Fan, Yifan Ding, Greg Shakhnarovich, Se Young Chun, Hongwei Yong, Ling Shao, Deyu Meng, Wangmeng Zuo, Chi Li, Salman Khan, Tomoki Yoshida, Chang Chen, Ding Liu, Dongwon Park, Wenyi Tang, Zhiwei Xiong, Syed Waqas Zamir, Yuqian Zhou, Norimichi Ukita, Haoqiang Fan, Seung-Wook Kim, Jue Wang, Zhiguo Cao, Yuzhi Wang, Radu Timofte, Dong-Wook Kim, Sung-Jea Ko, Fahad Shahbaz Khan, Magauiya Zhussip, Dong-Pan Lim, Seo-Won Ji, Yang Wang, Muhammad Haris, Aditya Arora, Michael S. Brown, Shakarim Soltanayev, Jiaming Liu, Qin Xu, Abdelrahman Abdelhamed, Shaofan Cai, Kai Zhang, Jechang Jeong, Chi-Hao Wu, Songhyun Yu, Yue Lu, and Pengliang Tang
Subjects: Noise measurement, Computer science, business.industry, Noise reduction, sRGB, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 0211 other engineering and technologies, INF/01 - INFORMATICA, 02 engineering and technology, Color space, Real image, Image denoising, 0202 electrical engineering, electronic engineering, information engineering, RGB color model, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Focus (optics), business, 021101 geological & geomatics engineering
Abstract: This paper reviews the NTIRE 2019 challenge on real image denoising with focus on the proposed methods and their results. The challenge has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern raw-RGB and (2) the standard RGB (sRGB) color spaces. The tracks had 216 and 220 registered participants, respectively. A total of 15 teams, proposing 17 methods, competed in the final phase of the challenge. The proposed methods by the 15 teams represent the current state-of-the-art performance in image denoising targeting real noisy images.
Published: 2019

14. When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach

Author: Xianming Liu, Zhangyang Wang, Bihan Wen, Ding Liu, and Thomas S. Huang
Subjects: FOS: Computer and information sciences, Artificial neural network, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Deep learning, Noise reduction, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Convolutional neural network, Image (mathematics), Computer Science::Computer Vision and Pattern Recognition, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), 020201 artificial intelligence & image processing, Video denoising, Computer vision, Artificial intelligence, business, computer
Abstract: Conventionally, image denoising and high-level vision tasks are handled separately in computer vision. In this paper, we cope with the two jointly and explore the mutual influence between them. First, we propose a convolutional neural network for image denoising which achieves state-of-the-art performance. Second, we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and use the joint loss for updating only the denoising network via back-propagation. We demonstrate that on one hand, the proposed denoiser has the generality to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network can generate more visually appealing results. To the best of our knowledge, this is the first work investigating the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online at https://github.com/Ding-Liu/DeepDenoising. Comment: the 27th International Joint Conference on Artificial Intelligence (2018)
Published: 2018
Full Text: View/download PDF

15. NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results

Author: Radu Timofte, Shuhang Gu, Jiqing Wu, Luc Van Gool, Lei Zhang, Ming-Hsuan Yang, Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita, Shijia Hu, Yijie Bei, Zheng Hui, Xiao Jiang, Yanan Gu, Jie Liu, Yifan Wang, Federico Perazzi, Brian McWilliams, Alexander Sorkine-Hornung, Olga Sorkine-Hornung, Christopher Schroers, Jiahui Yu, Yuchen Fan, Jianchao Yang, Ning Xu, Zhaowen Wang, Xinchao Wang, Thomas S. Huang, Xintao Wang, Ke Yu, Tak-Wai Hui, Chao Dong, Liang Lin, Chen Change Loy, Dongwon Park, Kwanyoung Kim, Se Young Chun, Kai Zhang, Pengjv Liu, Wangmeng Zuo, Shi Guo, Jiye Liu, Jinchang Xu, Yijiao Liu, Fengye Xiong, Yuan Dong, Hongliang Bai, Alexandru Damian, Nikhil Ravi, Sachit Menon, Cynthia Rudin, Junghoon Seo, Taegyun Jeon, Jamyoung Koo, Seunghyun Jeon, Soo Ye Kim, Jae-Seok Choi, Sehwan Ki, Soomin Seo, Hyeonjun Sim, Saehun Kim, Munchurl Kim, Rong Chen, Kun Zeng, Jinkang Guo, Yanyun Qu, Cuihua Li, Namhyuk Ahn, Byungkon Kang, Kyung-Ah Sohn, Yuan Yuan, Jiawei Zhang, Jiahao Pang, Xiangyu Xu, Yan Zhao, Wei Deng, Sibt Ul Hussain, Muneeb Aadil, Rafia Rahim, Xiaowang Cai, Fang Huang, Yueshu Xu, Pablo Navarrete Michelini, Dan Zhu, Hanwen Liu, Jun-Hyuk Kim, Jong-Seok Lee, Yiwen Huang, Ming Qiu, Liting Jing, Jiehang Zeng, Ying Wang, Manoj Sharma, Rudrabha Mukhopadhyay, Avinash Upadhyay, Sriharsha Koundinya, Ankit Shukla, Santanu Chaudhury, Zhe Zhang, Yu Hen Hu, and Lingzhi Fu
Subjects: 0209 industrial biotechnology, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Tracking (particle physics), Superresolution, Pipeline (software), Image (mathematics), 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Bicubic interpolation, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Single image, Focus (optics), business, Image resolution
Abstract: This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low resolution image) with a focus on proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera image acquisition pipeline. The operators were learnable through provided pairs of low and high resolution training images. The tracks had 145, 114, 101, and 113 registered participants, respectively, and 31 teams competed in the final testing phase. They gauge the state-of-the-art in single image super-resolution.
Published: 2018
Full Text: View/download PDF

16. Survey of Face Detection on Low-Quality Images

Author: Yuqian Zhou, Thomas S. Huang, and Ding Liu
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), media_common.quotation_subject, Detector, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Contrast (statistics), 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Robust design, Face (geometry), 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Computer vision, Quality (business), Artificial intelligence, Noise (video), business, Face detection, 0105 earth and related environmental sciences, media_common
Abstract: Face detection is a well-explored problem. Many challenges for face detectors, such as extreme pose, illumination, low resolution and small scales, have been studied in previous work. However, previously proposed models are mostly trained and tested on good-quality images, which is not always the case in practical applications like surveillance systems. In this paper, we first review the current state-of-the-art face detectors and their performance on the benchmark dataset FDDB, and compare the design protocols of the algorithms. Secondly, we investigate their performance degradation when tested on low-quality images with different levels of blur, noise, and contrast. Our results demonstrate that both hand-crafted and deep-learning based face detectors are not robust enough for low-quality images. This should inspire researchers to produce more robust designs for face detection in the wild.
Published: 2018
Full Text: View/download PDF

17. YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

Author: Thomas S. Huang, Brian Price, Scott Cohen, Yuchen Fan, Yuchen Liang, Ning Xu, Dingcheng Yue, Jianchao Yang, and Linjie Yang
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Optical flow, 02 engineering and technology, 010501 environmental sciences, Object (computer science), 01 natural sciences, Factor (programming language), Test set, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), 020201 artificial intelligence & image processing, Computer vision, Segmentation, Artificial intelligence, Scale (map), business, computer, 0105 earth and related environmental sciences, computer.programming_language
Abstract: Learning long-term spatial-temporal features is critical for many video analysis tasks. However, existing video segmentation methods predominantly rely on static image segmentation techniques, and methods capturing temporal dependency for segmentation have to depend on pretrained optical flow models, leading to suboptimal solutions for the problem. End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips. To solve this problem, we build a new large-scale video object segmentation dataset called YouTube Video Object Segmentation dataset (YouTube-VOS). Our dataset contains 3,252 YouTube video clips and 78 categories including common objects and human activities (these were the statistics when we submitted this paper; see updated statistics on our website). This is by far the largest video object segmentation dataset to our knowledge and we have released it at https://youtube-vos.org. Based on this dataset, we propose a novel sequence-to-sequence network to fully exploit long-term spatial-temporal information in videos for segmentation. We demonstrate that our method is able to achieve the best results on our YouTube-VOS test set and comparable results on DAVIS 2016 compared to the current state-of-the-art methods. Experiments show that the large-scale dataset is indeed a key factor in the success of our model.
Published: 2018
Full Text: View/download PDF

18. Connecting Image Denoising and High-Level Vision Tasks via Deep Learning

Author: Bihan Wen, Zhangyang Wang, Xianming Liu, Jianbo Jiao, Ding Liu, and Thomas S. Huang
Subjects: FOS: Computer and information sciences, Artificial neural network, Computer science, business.industry, Deep learning, Noise reduction, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Computer Graphics and Computer-Aided Design, Convolutional neural network, Convolution, Upsampling, Computer Science::Computer Vision and Pattern Recognition, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Focus (optics), Software
Abstract: Image denoising and high-level vision tasks are usually handled independently in the conventional practice of computer vision, and their connection is fragile. In this paper, we cope with the two jointly and explore the mutual influence between them with the focus on two questions, namely (1) how image denoising can help improve high-level vision tasks, and (2) how the semantic information from high-level vision tasks can be used to guide image denoising. First, for image denoising, we propose a convolutional neural network in which convolutions are conducted in various spatial resolutions via downsampling and upsampling operations in order to fuse and exploit contextual information on different scales. Second, we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and uses the joint loss for updating only the denoising network via back-propagation. We experimentally show that, on the one hand, the proposed denoiser has the generality to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network produces more visually appealing results. Extensive experiments demonstrate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online: https://github.com/Ding-Liu/DeepDenoising. Comment: arXiv admin note: text overlap with arXiv:1706.04284
Published: 2018
Full Text: View/download PDF

19. Subcategory-Aware Object Detection

Author: Thomas S. Huang, Jianchao Yang, Tianjiang Wang, Xiaoyuan Yu, Zhe Lin, and Jiangping Wang
Subjects: Subcategory, Training set, Computer science, business.industry, Applied Mathematics, Feature extraction, ComputingMilieux_PERSONALCOMPUTING, Cognitive neuroscience of visual object recognition, Pattern recognition, Object detection, Spectral clustering, Object-class detection, Feature (computer vision), Mathematics::Category Theory, Signal Processing, Viola–Jones object detection framework, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, Cluster analysis, business
Abstract: In this letter, we introduce a subcategory-aware object detection framework to detect generic object classes with high intra-class variance. Motivated by the observation that the object appearance demonstrates some clustering property, we split the training data into subcategories and train a detector for each subcategory. Since the proposed ensemble of detectors relies heavily on subcategory clustering, we propose an effective subcategory generation method that is tuned for the detection task. More specifically, we first initialize subcategories by constrained spectral clustering based on mid-level image features used in object recognition. Then we jointly learn the ensemble detectors and the latent subcategories in an alternating manner. Our performance on the PASCAL VOC 2007 detection challenge and the INRIA Person dataset is comparable with the state-of-the-art, even with much less computational cost.
Published: 2015
Full Text: View/download PDF

20. Key Point Detection by Max Pooling for Tracking

Author: Tianjiang Wang, Thomas S. Huang, Jianchao Yang, and Xiaoyuan Yu
Subjects: Computer science, business.industry, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Tracking system, Pattern recognition, Computer Science Applications, Visualization, Human-Computer Interaction, Control and Systems Engineering, Robustness (computer science), Video tracking, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Feature learning, Software, Linear filter, Information Systems
Abstract: Inspired by recent work on image feature learning, we propose a novel key point detection approach for object tracking. Our approach can select mid-level interest key points by max pooling over the local descriptor responses from a set of filters. Linear filters are first learned from targets in the first frames. Then max pooling is performed over a data-driven spatial supporting field to detect discriminant key points, so that the detected key points bear higher-level semantic meanings, which we apply in tracking by structured key point matching. We show that our tracking system is robust to occlusions and cluttered backgrounds. Testing on several challenging tracking sequences, we demonstrate that our proposed tracking system can achieve competitive or better performance than the state-of-the-art trackers.
Published: 2015
Full Text: View/download PDF

21. Image Super-Resolution: Historical Overview and Future Challenges

Author: Thomas S. Huang and Jianchao Yang
Subjects: Computer science, business.industry, Computer vision, Artificial intelligence, business, Superresolution, Image (mathematics)
Published: 2017
Full Text: View/download PDF

22. NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results

Author: Ch V. Sai Praveen, Dacheng Tao, Deqing Sun, Jae-Seok Choi, Giang Bui, Luc Van Gool, Lei Zhang, Qi Guo, Yunjin Chen, Karen Egiazarian, Xintao Wang, Ke Yu, Jiahui Yu, Bee Oh Lim, Hojjat Seyed Mousavi, Seungjun Nah, Yulun Zhang, Ming-Hsuan Yang, Yapeng Tian, Tiep H. Vu, Zhimin Tang, Heewon Kim, Cristóvão Cruz, Vishal Monga, Yuchen Fan, Chen Change Loy, Rakesh Mehta, Jinshan Pan, Yoseob Han, Ye Duan, Truc Le, Yu Qiao, Ruxin Wang, Xiangyu Xu, Xuan-Phung Huynh, Chao Dong, Xu Jinchang, Jaejun Yoo, Thomas S. Huang, Radu Timofte, Wei Han, Xueying Qin, Zhiqiang Xia, Shaohui Li, Xu Lin, Haichao Yu, Yujin Zhang, Vladimir Katkovnik, Honghui Shi, Yu Zhao, Woong Bae, Zhengtao Wang, Abhinav Agarwalla, Arnav Kumar Jain, Ding Liu, Liang Lin, Xibin Song, Che Zhu, Wangmeng Zuo, Wen Heng, Xinchao Wang, Shixiang Wu, Zhangyang Wang, Sanghyun Son, Hongdiao Wen, Jianxin Pang, Kyoung Mu Lee, Linkai Luo, Eirikur Agustsson, Ruofan Zhou, Yuchao Dai, Min Fu, Tiantong Guo, Munchurl Kim, Jong Chul Ye, Lei Cao, and Kai Zhang
Subjects: Standard test image, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, Image processing, 02 engineering and technology, Superresolution, Kernel (image processing), 0202 electrical engineering, electronic engineering, information engineering, Bicubic interpolation, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Image resolution, Image restoration, Sub-pixel resolution, Feature detection (computer vision)
Abstract: This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) that were learnable through low- and high-resolution training images. Each competition had ~100 registered participants and 20 teams competed in the final testing phase. They gauge the state-of-the-art in single image super-resolution.
Published: 2017
Full Text: View/download PDF

23. Deep Image Matting

Author: Thomas S. Huang, Scott Cohen, Brian Price, and Ning Xu
Subjects: FOS: Computer and information sciences, Artificial neural network, Computer science, business.industry, Deep learning, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Pattern recognition, Context (language use), 02 engineering and technology, Image segmentation, Real image, Image (mathematics), Image texture, Computer Science::Computer Vision and Pattern Recognition, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business
Abstract: Image matting is a fundamental computer vision problem and has many applications. Previous algorithms have poor performance when an image has similar foreground and background colors or complicated textures. The main reasons are that prior methods 1) only use low-level features and 2) lack high-level context. In this paper, we propose a novel deep learning based algorithm that can tackle both these problems. Our deep model has two parts. The first part is a deep convolutional encoder-decoder network that takes an image and the corresponding trimap as inputs and predicts the alpha matte of the image. The second part is a small convolutional network that refines the alpha matte predictions of the first network to have more accurate alpha values and sharper edges. In addition, we also create a large-scale image matting dataset including 49300 training images and 1000 testing images. We evaluate our algorithm on the image matting benchmark, our testing set, and a wide variety of real images. Experimental results clearly demonstrate the superiority of our algorithm over previous methods. Comment: Computer Vision and Pattern Recognition 2017
Published: 2017
Full Text: View/download PDF

24. Spatial–Spectral Classification of Hyperspectral Images Using Discriminative Dictionary Designed by Learning Vector Quantization

Author: Nasser M. Nasrabadi, Zhaowen Wang, and Thomas S. Huang
Subjects: Learning vector quantization, Training set, K-SVD, Contextual image classification, Pixel, Computer science, business.industry, Hyperspectral imaging, Pattern recognition, Sparse approximation, Discriminative model, Computer Science::Computer Vision and Pattern Recognition, Hinge loss, General Earth and Planetary Sciences, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business
Abstract: In this paper, a novel discriminative dictionary learning method is proposed for sparse-representation-based classification (SRC) to label highly dimensional hyperspectral imagery (HSI). In SRC, a dictionary is conventionally constructed using all of the training pixels, which is not only inefficient due to the large size of typical HSI images but also ineffective in capturing class-discriminative information crucial for classification. We address the dictionary design problem with the inspiration from the learning vector quantization technique and propose a hinge loss function that is directly related to the classification task as the objective function for dictionary learning. The resulting online learning procedure systematically “pulls” and “pushes” dictionary atoms so that they become better adapted to distinguish between different classes. In addition, the spatial context for a test pixel within its local neighborhood is modeled using a Bayesian graph model and is incorporated with the sparse representation of a single test pixel in a unified probabilistic framework, which enables further refinement of our dictionary to capture the spatial class dependence that complements the spectral information. Experiments on different HSI images demonstrate that the dictionaries optimized using our method can achieve higher classification accuracy with substantially reduced dictionary size than using the whole training set. The proposed method also outperforms existing dictionary learning methods and attains the state-of-the-art results in both the spectral-only and spatial-spectral settings.
Published: 2014
Full Text: View/download PDF

25. Sparse Coding And Its Applications In Computer Vision

Author: Zhaowen Wang, Jianchao Yang, Haichao Zhang, Zhangyang Wang, Thomas S Huang, Ding Liu, and Yingzhen Yang
Subjects: Computer graphics, Computer vision, Image processing--Digital techniques
Abstract: This book provides a broad introduction to the theories and applications of sparse coding techniques in computer vision research. It introduces sparse coding in the context of representation learning, illustrates the fundamental concepts, and summarizes the most active research directions. A variety of applications of sparse coding are discussed, ranging from low-level image processing tasks such as super-resolution and de-blurring to high-level semantic understanding tasks such as image recognition, clustering and fusion. The book is suitable to be used as an introductory overview of this field, with its theoretical part being both easy and precise enough for quick understanding. It is also of great value to experienced researchers as it offers a new perspective on the underlying mechanism of sparse coding, and points out potential future directions for different applications.
Published: 2016

26. Saliency-maximized audio visualization and efficient audio-visual browsing for faster-than-real-time human acoustic event detection

Author: Mark Hasegawa-Johnson, Xiaodan Zhuang, Camille Goudeseune, Thomas S. Huang, Sarah King, and Kai-Hsiang Lin
Subjects: Audio signal, General Computer Science, Computer science, business.industry, Event (computing), Speech recognition, Experimental and Cognitive Psychology, Context (language use), Mutual information, Theoretical Computer Science, Visualization, Salient, Spectrogram, Computer vision, Artificial intelligence, Zoom, business
Abstract: Browsing large audio archives is challenging because of the limitations of human audition and attention. However, this task becomes easier with a suitable visualization of the audio signal, such as a spectrogram transformed to make unusual audio events salient. This transformation maximizes the mutual information between an isolated event's spectrogram and an estimate of how salient the event appears in its surrounding context. When such spectrograms are computed and displayed with fluid zooming over many temporal orders of magnitude, sparse events in long audio recordings can be detected more quickly and more easily. In particular, in a 1/10-real-time acoustic event detection task, subjects who were shown saliency-maximized rather than conventional spectrograms performed significantly better. Saliency maximization also improves the mutual information between the ground truth of nonbackground sounds and visual saliency, more than other common enhancements to visualization.
Published: 2013
Full Text: View/download PDF

27. Image and Video Restorations via Nonlocal Kernel Regression

Author: Jianchao Yang, Haichao Zhang, Yanning Zhang, and Thomas S. Huang
Subjects: Video post-processing, Video Recording, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Image processing, Pattern Recognition, Automated, Image texture, Artificial Intelligence, Image Interpretation, Computer-Assisted, Computer Science::Multimedia, Photography, Computer vision, Electrical and Electronic Engineering, Image restoration, Mathematics, Feature detection (computer vision), business.industry, Binary image, Pattern recognition, Non-local means, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, Data Interpretation, Statistical, Subtraction Technique, Computer Science::Computer Vision and Pattern Recognition, Regression Analysis, Video denoising, Artificial intelligence, business, Algorithms, Software, Information Systems
Abstract: A nonlocal kernel regression (NL-KR) model is presented in this paper for various image and video restoration tasks. The proposed method exploits both the nonlocal self-similarity and local structural regularity properties in natural images. The nonlocal self-similarity is based on the observation that image patches tend to repeat themselves in natural images and videos, and the local structural regularity observes that image patches have regular structures where accurate estimation of pixel values via regression is possible. By unifying both properties explicitly, the proposed NL-KR framework is more robust in image estimation, and the algorithm is applicable to various image and video restoration tasks. In this paper, we apply the proposed model to image and video denoising, deblurring, and superresolution reconstruction. Extensive experimental results on both single images and realistic video sequences demonstrate that the proposed framework compares favorably with previous works both qualitatively and quantitatively.
Published: 2013
Full Text: View/download PDF

28. Pose-robust face recognition via sparse representation

Author: Yanning Zhang, Haichao Zhang, and Thomas S. Huang
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Sparse approximation, Facial recognition system, Artificial Intelligence, Signal Processing, Three-dimensional face recognition, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Invariant (mathematics), Face detection, business, Software
Abstract: We propose a pose-robust face recognition method to handle the challenging task of face recognition in the presence of a large pose difference between gallery and probe faces. The proposed method exploits the sparse property of the representation coefficients of a face image over its corresponding view-dictionary. By assuming the representation coefficients are invariant to pose, we can synthesize for the probe image a novel face image which has a smaller pose difference with the gallery faces. Furthermore, face recognition in the presence of pose variations is achieved based on the synthesized face image, again via sparse representation. Extensive experiments on the CMU Multi-PIE face database are conducted to verify the efficacy of the proposed method.
Published: 2013
Full Text: View/download PDF

29. D3: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images

Author: Yingzhen Yang, Shiyu Chang, Zhangyang Wang, Ding Liu, Thomas S. Huang, and Qing Ling
Subjects: Scheme (programming language), Computer science, business.industry, 020206 networking & telecommunications, 02 engineering and technology, computer.file_format, JPEG, Encoding (memory), 0202 electrical engineering, electronic engineering, information engineering, Discrete cosine transform, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, computer, Image restoration, Transform coding, computer.programming_language
Abstract: In this paper, we design a Deep Dual-Domain (D3) based fast restoration model to remove artifacts of JPEG compressed images. It leverages the large learning capacity of deep networks, as well as the problem-specific expertise that was hardly incorporated in the past design of deep architectures. For the latter, we take into consideration both the prior knowledge of the JPEG compression scheme, and the successful practice of the sparsity-based dual-domain approach. We further design the One-Step Sparse Inference (1-SI) module, as an efficient and lightweight feed-forward approximation of sparse coding. Extensive experiments verify the superiority of the proposed D3 model over several state-of-the-art methods. Specifically, our best model is capable of outperforming the latest deep model by around 1 dB in PSNR, and is 30 times faster.
Published: 2016
Full Text: View/download PDF

30. 3D Face Modeling

Author: Pooya Khorrami, Vuong Le, Usman Tariq, Thomas S. Huang, and Hao Tang
Subjects: business.industry, Computer science, Face (sociological concept), Computer vision, Artificial intelligence, business
Published: 2016
Full Text: View/download PDF

31. Eye Gaze Estimation

Author: Pooya Khorrami, Hao Tang, Usman Tariq, Vuong Le, and Thomas S. Huang
Subjects: Estimation, business.industry, Computer science, Eye tracking, Computer vision, Artificial intelligence, business
Published: 2016
Full Text: View/download PDF

32. Model Based Video Encoding

Author: Thomas S. Huang, Usman Tariq, Vuong Le, Pooya Khorrami, and Hao Tang
Subjects: business.industry, Computer science, Video encoding, Computer vision, Artificial intelligence, computer.file_format, Multiview Video Coding, business, Smacker video, computer
Published: 2016
Full Text: View/download PDF

33. Photo Stream Alignment and Summarization for Collaborative Photo Collection and Sharing

Author: Jie Yu, Thomas S. Huang, Jiebo Luo, and Jianchao Yang
Subjects: Matching (graph theory), Event (computing), Digital photo frame, business.industry, Computer science, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Automatic summarization, Computer Science Applications, Camera phone, Signal Processing, Media Technology, Collaborative filtering, Graph (abstract data type), Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Image retrieval
Abstract: With the popularity of digital cameras and camera phones, it is common for different people, who may or may not know each other, to attend the same event and take pictures and videos from different spatial or personal perspectives. Within the realm of social media, it is desirable to enable these people to select and share their pictures and videos in order to enrich memories and facilitate social networking. However, it is cumbersome to manually manage these photos from different cameras, whose clock settings are often not calibrated. In this paper, we propose automatic algorithms to address the above problems. First, we accurately align different photo streams or sequences from different photographers for the same event in chronological order on a common timeline, while respecting the time constraints within each photo stream. Given the preferred similarity measures (e.g., visual and spatial similarities), our algorithm performs photo stream alignment via matching on a bipartite kernel sparse representation graph that forces the data connections to be sparse in an explicit fashion. Furthermore, we can produce a summary master stream from the aligned super stream of photos for efficient sharing by removing redundant photos in the super stream while accounting for the temporal integrity. Based on a similar kernel sparse representation graph, our master stream summarization algorithm performs greedy backward selection to drop redundant photos without affecting the integrity of the remaining photos for the entire event. We evaluate our algorithms on real-world personal online albums for 36 events and demonstrate their efficacy in automatically facilitating collaborative photo collection and sharing.
Published: 2012
Full Text: View/download PDF

34. Multi-View Automatic Target Recognition using Joint Sparse Representation

Author: Y. Zhang, Nasser M. Nasrabadi, Thomas S. Huang, and Haichao Zhang
Subjects: Synthetic aperture radar, business.industry, Aerospace Engineering, Pattern recognition, Sparse approximation, Target acquisition, Support vector machine, Azimuth, Automatic target recognition, Radar imaging, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Classifier (UML), Mathematics
Abstract: We introduce a novel joint sparse representation based multi-view automatic target recognition (ATR) method, which can not only handle multi-view ATR without knowing the pose but also has the advantage of exploiting the correlations among the multiple views of the same physical target for a single joint recognition decision. Extensive experiments have been carried out on the Moving and Stationary Target Acquisition and Recognition (MSTAR) public database to evaluate the proposed method against several state-of-the-art methods such as linear support vector machine (SVM), kernel SVM, and a sparse representation based classifier (SRC). Experimental results demonstrate that the proposed joint sparse representation ATR method is very effective and performs robustly under variations such as multiple joint views, depression, azimuth angles, target articulations, as well as configurations.
Published: 2012
Full Text: View/download PDF

35. Joint dynamic sparse representation for multi-view face recognition

Author: Nasser M. Nasrabadi, Thomas S. Huang, Haichao Zhang, and Yanning Zhang
Subjects: Computer science, business.industry, Representation (systemics), Pattern recognition, Sparse approximation, Facial recognition system, Class (biology), Task (project management), Artificial Intelligence, Face (geometry), Signal Processing, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Joint (audio engineering), business, Software
Abstract: We consider the problem of automatically recognizing a human face from its multi-view images with unconstrained poses. We formulate the multi-view face recognition task as a joint sparse representation model and take advantage of the correlations among the multiple views for face recognition using a novel joint dynamic sparsity prior. The proposed joint dynamic sparsity prior promotes shared joint sparsity patterns among the multiple sparse representation vectors at class-level, while allowing distinct sparsity patterns at atom-level within each class to facilitate a flexible representation. Extensive experiments on the CMU Multi-PIE face database are conducted to verify the efficacy of the proposed method.
Published: 2012
Full Text: View/download PDF

36. Text From Corners: A Novel Approach to Detect Text and Caption in Videos

Author: Yuncai Liu, Xu Zhao, Yuxiao Hu, Yun Fu, Thomas S. Huang, and Kai-Hsiang Lin
Subjects: Computer science, business.industry, Search engine indexing, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Optical flow, Corner detection, Computer Graphics and Computer-Aided Design, Discriminative model, Motion estimation, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Computer vision, Artificial intelligence, business, Software
Abstract: Detecting text and captions in videos is important and in great demand for video retrieval, annotation, indexing, and content analysis. In this paper, we present a corner-based approach to detect text and captions in videos. This approach is inspired by the observation that corner points occur densely and in an orderly fashion in characters, especially in text and captions. We use several discriminative features to describe the text regions formed by the corner points. These features can be used flexibly and thus adapted to different applications. Language independence is an important advantage of the proposed method. Moreover, based upon the text features, we further develop a novel algorithm to detect moving captions in videos. In the algorithm, the motion features, extracted by optical flow, are combined with text features to detect the moving caption patterns. A decision tree is adopted to learn the classification criteria. Experiments conducted on a large volume of real video shots demonstrate the efficiency and robustness of our proposed approaches and the real-world system. Our text and caption detection system was recently highlighted in a worldwide multimedia retrieval competition, Star Challenge, by achieving superior performance with the top ranking.
Published: 2011
Full Text: View/download PDF

37. Age Synthesis and Estimation via Faces: A Survey

Author: Guodong Guo, Thomas S. Huang, and Yun Fu
Subjects: Aging, Biometrics, Computer science, Machine vision, Image processing, Models, Biological, Facial recognition system, Pattern Recognition, Automated, Computer graphics, Artificial Intelligence, Image Interpretation, Computer-Assisted, Computer Graphics, Humans, Computer Simulation, Computer vision, Computer facial animation, Anthropometry, business.industry, Applied Mathematics, Age progression, Data science, Computational Theory and Mathematics, Face, Pattern recognition (psychology), Regression Analysis, Computer Vision and Pattern Recognition, Artificial intelligence, business, Algorithms, Software
Abstract: Human age, as an important personal trait, can be directly inferred by distinct patterns emerging from the facial appearance. Derived from rapid advances in computer graphics and machine vision, computer-based age synthesis and estimation via faces have become particularly prevalent topics recently because of their explosively emerging real-world applications, such as forensic art, electronic customer relationship management, security control and surveillance monitoring, biometrics, entertainment, and cosmetology. Age synthesis is defined as rerendering a face image aesthetically with natural aging and rejuvenating effects on the individual face. Age estimation is defined as labeling a face image automatically with the exact age (year) or the age group (year range) of the individual face. Because of their particularity and complexity, both problems are attractive yet challenging to computer-based application system designers. Large efforts from both academia and industry have been devoted to them in the last few decades. In this paper, we survey the complete state-of-the-art techniques in the face image-based age synthesis and estimation topics. Existing models, popular algorithms, system performances, technical difficulties, popular face aging databases, evaluation protocols, and promising future directions are also provided with systematic discussions.
Published: 2010
Full Text: View/download PDF

38. Human Pose Regression Through Multiview Visual Fusion

Author: Yun Fu, Xu Zhao, Huazhong Ning, Yuncai Liu, and Thomas S. Huang
Subjects: Image fusion, business.industry, Computer science, Feature extraction, Scale-invariant feature transform, Pattern recognition, Image processing, 3D pose estimation, Sensor fusion, Linear discriminant analysis, Articulated body pose estimation, Discriminative model, Salient, Media Technology, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Pose
Abstract: We consider the problem of estimating 3-D human body pose from visual signals within a discriminative framework. It is challenging because there is a wide gap between complex 3-D human motion and planar visual observation, which makes this a severely ill-conditioned problem. In this paper, we focus on three critical factors to tackle human body pose estimation, namely, feature extraction, learning algorithm, and camera utilization. On the feature level, we describe images using the salient interest points represented by scale-invariant feature transform (SIFT)-like descriptors, in which the position, appearance, and local structural information are encoded simultaneously. On the learning algorithm level, we propose to use Gaussian processes and multiple linear (ML) regression to model the mapping between poses and features. Fusing image information from multiple cameras in different views is of great interest to us on the camera level. We make a comprehensive evaluation on the HumanEva database and get two meaningful insights into the three crucial aspects for human pose estimation: 1) although the choice of feature is very important to the problem, once the learning algorithm becomes efficient, the choice of feature is no longer critical, and 2) the impact of information combination from multiple cameras on pose estimation is closely related to not only the quantity of image information, but also its quality. In most cases, it is true that the more information is involved, the better results can be achieved. But when the information quantity is the same, the differences in quality will lead to totally different performance. Furthermore, dense evaluations demonstrate that our approach is an accurate and robust solution to the human body pose estimation problem.
Published: 2010
Full Text: View/download PDF

39. Sparse Representation for Computer Vision and Pattern Recognition

Author: John Wright, Guillermo Sapiro, Shuicheng Yan, Yi Ma, Julien Mairal, and Thomas S. Huang
Subjects: Signal processing, Computer science, business.industry, Information processing, Sparse approximation, Application software, computer.software_genre, Facial recognition system, Information extraction, Compressed sensing, Computer vision, Algorithm design, Artificial intelligence, Electrical and Electronic Engineering, business, computer
Abstract: Techniques from sparse signal representation are beginning to see significant impact in computer vision, often on nontraditional applications where the goal is not just to obtain a compact high-fidelity representation of the observed signal, but also to extract semantic information. The choice of dictionary plays a key role in bridging this gap: unconventional dictionaries consisting of, or learned from, the training samples themselves provide the key to obtaining state-of-the-art results and to attaching semantic meaning to sparse signal representations. Understanding the good performance of such unconventional dictionaries in turn demands new algorithmic and analytical techniques. This review paper highlights a few representative examples of how the interaction between sparse signal representation and computer vision can enrich both fields, and raises a number of open questions for further study.
Published: 2010
Full Text: View/download PDF

40. Hierarchical Space-Time Model Enabling Efficient Search for Human Actions

Author: Thomas S. Huang, Tony X. Han, Ming Liu, Dirk B. Walther, and Huazhong Ning
Subjects: Matching (graph theory), Hierarchy (mathematics), business.industry, Cognitive neuroscience of visual object recognition, Pattern recognition, Gabor filter, Robustness (computer science), Histogram, Media Technology, Feature (machine learning), Hierarchical control system, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Mathematics
Abstract: We propose a five-layer hierarchical space-time model (HSTM) for representing and searching human actions in videos. From a features point of view, both invariance and selectivity are desirable characteristics, which seem to contradict each other. To make these characteristics coexist, we introduce a coarse-to-fine search and verification scheme for action searching, based on the HSTM model. Because going through layers of the hierarchy corresponds to progressively turning the knob between invariance and selectivity, this strategy enables search for human actions ranging from rapid movements of sports to subtle motions of facial expressions. The introduction of the Histogram of Gabor Orientations feature makes the searching for actions go smoothly across the hierarchical layers of the HSTM model. Efficient matching is achieved by applying integral histograms to compute the features in the top two layers. The HSTM model was tested on three selected challenging video sequences and on the KTH human action database, where it achieved improvement over other state-of-the-art algorithms. These promising results validate that the HSTM model is both selective and robust for searching human actions.
Published: 2009
Full Text: View/download PDF

41. Locating Nose-Tips and Estimating Head Poses in Images by Tensorposes

Author: Jilin Tu, Yun Fu, and Thomas S. Huang
Subjects: business.industry, Computer science, Color image, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Image processing, Image segmentation, 3D pose estimation, Facial recognition system, Principal component analysis, Media Technology, Segmentation, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Face detection, Pose
Abstract: This paper introduces a head pose estimation system that automatically localizes the nose-tips of the faces and estimates head poses in images simultaneously. In the training stage, the nose-tips of the faces are first manually labeled. The appearance variations caused by head pose changes are then characterized by a tensorposes model. Given an image with unknown head pose and nose-tip location, the nose-tip of the face is automatically localized in a coarse-to-fine fashion after the skin color segmentation. The head pose is also estimated simultaneously. The performance of our system is evaluated on the Pointing'04 head pose image data set. We first evaluate the classification performance of the tensorposes models with image patches of the faces cropped according to the manually labeled nose-tip locations of the faces in the Pointing'04 data set. Using a leave-one-person-out evaluation strategy, we obtain the optimal parameters of the tensorposes model, and evaluate the discriminative power of the tensorposes models built on high order singular value decomposition (HOSVD) and multilinear independent component analysis (MICA), and of naive principal component analysis (PCA) subspace models. It is shown that the tensorposes models built by HOSVD and MICA decomposition perform similarly well, and much better than naive PCA subspace models. The tensorposes model is then utilized to automatically localize the nose-tip location in the testing image and to simultaneously estimate the head pose. The nose-tip localization and pose estimation accuracy of the proposed system are evaluated against the ground truth. Finally, cross-database evaluation of the performance of our system is carried out on the Pointing'04 database, a selected subset of the CMU PIE database, and some pictures from the CLEAR'07 head pose evaluation database. The experiments show that our system generalizes reasonably well to real-world scenarios.
Published: 2009
Full Text: View/download PDF

42. Online updating appearance generative mixture model for meanshift tracking

Author: Hai Tao, Thomas S. Huang, and Jilin Tu
Subjects: Computer science, business.industry, Histogram matching, Pattern recognition, Tracking (particle physics), Mixture model, Computer Science Applications, Generative model, Hardware and Architecture, Histogram, Expectation–maximization algorithm, Eye tracking, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Graphical model, business, Software
Abstract: This paper proposes an appearance generative mixture model based on key frames for meanshift tracking. The meanshift tracking algorithm tracks an object by maximizing the similarity between the histogram in the tracking window and a static histogram acquired at the beginning of tracking. The tracking therefore could fail if the appearance of the object varies substantially. In this paper, we assume the key appearances of the object can be acquired before tracking and the manifold of the object appearance can be approximated by piece-wise linear combination of these key appearances in histogram space. The generative process is described by a Bayesian graphical model. An online EM algorithm is proposed to estimate the model parameters from the observed histogram in the tracking window and to update the appearance histogram. We applied this approach to track human head motion and to infer the head pose simultaneously in videos. Experiments verify that our online histogram generative model constrained by key appearance histograms alleviates the drifting problem often encountered in tracking with online updating, that the enhanced meanshift algorithm is capable of tracking objects of varying appearance more robustly and accurately, and that our tracking algorithm can infer additional information such as the object poses.
Published: 2008
Full Text: View/download PDF

43. Face as mouse through visual face tracking

Author: Thomas S. Huang, Hai Tao, and Jilin Tu
Subjects: Orientation (computer vision), Facial motion capture, Computer science, business.industry, Interface (computing), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Input device, Image processing, Tracking (particle physics), Facial recognition system, Gesture recognition, Motion estimation, Computer graphics (images), Face (geometry), Signal Processing, Personal computer, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Face detection, Closing (morphology), business, Rotation (mathematics), Software, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: This paper introduces a novel camera mouse driven by visual face tracking based on a 3D model. As the camera becomes standard configuration for personal computers (PCs) and computation speed increases, achieving human-machine interaction through visual face tracking becomes a feasible solution to hands-free control. Human facial movements can be broken down into rigid motions, such as rotation and translation, and non-rigid motions such as opening, closing, and stretching of the mouth. First, we describe our face tracking system which can robustly and accurately retrieve these motion parameters from videos in real time [H. Tao, T. Huang, Explanation-based facial motion tracking using a piecewise Bezier volume deformation model, in: Proceedings of IEEE Computer Vision and Pattern Recognition, vol. 1, 1999, pp. 611-617]. The retrieved (rigid) motion parameters can be employed to navigate the mouse cursor; the detection of mouth (non-rigid) motions triggers mouse events in the operating system. Three mouse control modes are investigated and their usability is compared. Experiments in the Windows XP environment verify the convenience of our camera mouse in hands-free control. This technology can be an alternative input option for people with hand and speech disability, as well as for futuristic vision-based games and interfaces.
Published: 2007
Full Text: View/download PDF

44. Joint face and head tracking inside multi-camera smart rooms

Author: Gerasimos Potamianos, Thomas S. Huang, Andrew W. Senior, and Zhenqiu Zhang
Subjects: Computer science, business.industry, Initialization, Centroid, Tracking system, Active appearance model, Tracking error, Robustness (computer science), Signal Processing, Computer vision, AdaBoost, Artificial intelligence, Electrical and Electronic Engineering, business, Face detection
Abstract: The paper introduces a novel detection and tracking system that provides both frame-view and world-coordinate human location information, based on video from multiple synchronized and calibrated cameras with overlapping fields of view. The system is developed and evaluated for the specific scenario of a seminar lecturer presenting in front of an audience inside a “smart room”, its aim being to track the lecturer’s head centroid in the three-dimensional (3D) space and also yield two-dimensional (2D) face information in the available camera views. The proposed approach is primarily based on a statistical appearance model of human faces by means of well-known AdaBoost-like face detectors, extended to address the head pose variation observed in the smart room scenario of interest. The appearance module is complemented by two novel components and assisted by a simple tracking drift detection mechanism. The first component of interest is the initialization module, which employs a spatio-temporal dynamic programming approach with appropriate penalty functions to obtain optimal 3D location hypotheses. The second is an adaptive subspace learning based 2D tracking scheme with a novel forgetting mechanism, introduced to reduce tracking drift and increase robustness. System performance is benchmarked on an extensive database of realistic human interaction in the lecture smart room scenario, collected as part of the European integrated project “CHIL”. The system consistently achieves excellent tracking precision, with a 3D mean tracking error of less than 16 cm, and is demonstrated to outperform four alternative tracking schemes. Furthermore, the proposed system performs relatively well in detecting frontal and near-frontal faces in the available frame views.
Published: 2007
Full Text: View/download PDF

45. 3D human model and joint parameter estimation from monocular image

Author: Minglei Tong, Yuncai Liu, and Thomas S. Huang
Subjects: Surface (mathematics), Polynomial, Monocular, business.industry, Computer science, Image processing, Kinematics, Convolution, Silhouette, Artificial Intelligence, Computer Science::Computer Vision and Pattern Recognition, Motion estimation, Signal Processing, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: In this paper we present a novel class of human model described by convolution surfaces attached to articulated kinematic skeletons. The human pose can be estimated from the silhouette in monocular images. The contribution of this paper consists of three points: First, a human model based on convolution surfaces is presented, whose shape is deformable by changing polynomial parameters and radius parameters. Second, a convolution surface and curve correspondence theorem is presented to give a map between 3D pose and 2D contour. Third, we model the human silhouette with a convolution curve in order to estimate joint parameters from monocular images, and we also give an effective constraint function. Evaluation of this approach is performed on some video frames of a walking man. The experimental results show that our method works well when there is no self-occlusion.
Published: 2007
Full Text: View/download PDF

46. BodyPrint: Pose Invariant 3D Shape Matching of Human Bodies

Author: Terrence Chen, Jiangping Wang, Vivek Kumar Singh, Thomas S. Huang, and Kai Ma
Subjects: business.industry, Computer science, Pattern recognition, Computer vision, Shape matching, Artificial intelligence, Invariant (mathematics), business, Subspace topology
Abstract: 3D human body shape matching has great potential in many real-world applications, especially with the recent advances in 3D range sensing technology. We address this problem by proposing a novel holistic human body shape descriptor called BodyPrint. To compute the BodyPrint for a given body scan, we fit a deformable human body mesh and project the mesh parameters to a low-dimensional subspace which improves discriminability across different persons. Experiments are carried out on three real-world human body datasets to demonstrate that BodyPrint is robust to pose variation as well as missing information and sensor noise. It improves the matching accuracy significantly compared to conventional 3D shape matching techniques using local features. To facilitate practical applications where the shape database may grow over time, we also extend our learning framework to handle online updates.
Published: 2015
Full Text: View/download PDF

47. Hyper-Spectral Image Modeling

Author: Haichao Zhang, Zhaowen Wang, Thomas S. Huang, Ding Liu, Yingzhen Yang, Zhangyang Wang, and Jianchao Yang
Subjects: Image texture, Computer science, business.industry, Spectral image, Computer vision, Artificial intelligence, Geometric modeling, business, Image-based modeling and rendering
Published: 2015
Full Text: View/download PDF

48. Image Super-Resolution

Author: Thomas S. Huang, Zhangyang Wang, Haichao Zhang, Yingzhen Yang, Zhaowen Wang, Jianchao Yang, and Ding Liu
Subjects: Computer science, business.industry, Computer vision, Artificial intelligence, business, Superresolution, Image (mathematics)
Published: 2015
Full Text: View/download PDF

49. Self-Tuned Deep Super Resolution

Author: Jianchao Yang, Wei Han, Thomas S. Huang, Zhaowen Wang, Yingzhen Yang, Zhangyang Wang, and Shiyu Chang
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Deep learning, Noise reduction, Reliability (computer networking), Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Superresolution, Autoencoder, Machine Learning (cs.LG), Computer Science - Learning, Range (mathematics), Convolutional code, Computer vision, Artificial intelligence, business, Joint (audio engineering), Algorithm, Image resolution
Abstract: Deep learning has been successfully applied to image super resolution (SR). In this paper, we propose a deep joint super resolution (DJSR) model to exploit both external and self similarities for SR. A Stacked Denoising Convolutional Auto Encoder (SDCAE) is first pre-trained on external examples with proper data augmentations. It is then fine-tuned with multi-scale self examples from each input, where the reliability of self examples is explicitly taken into account. We also enhance the model performance by sub-model training and selection. The DJSR model is extensively evaluated and compared with state-of-the-art methods, and shows noticeable performance improvements both quantitatively and perceptually on a wide range of images.
Published: 2015
Full Text: View/download PDF

50. DeepFont: Identify Your Font from An Image

Author: Aseem Agarwala, Thomas S. Huang, Zhangyang Wang, Eli Shechtman, Jonathan Brandt, Hailin Jin, and Jianchao Yang
Subjects: FOS: Computer and information sciences, Identification (information), Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Font, Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, Computer vision, Artificial intelligence, Similarity measure, business, Convolutional neural network
Abstract: As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo has been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem, and advance the state-of-the-art remarkably by developing the DeepFont system. First of all, we build up the first available large-scale VFR dataset, named AdobeVFR, consisting of both labeled synthetic data and partially labeled real-world data. Next, to combat the domain mismatch between available training and testing data, we introduce a Convolutional Neural Network (CNN) decomposition approach, using a domain adaptation technique based on a Stacked Convolutional Auto-Encoder (SCAE) that exploits a large corpus of unlabeled real-world text images combined with synthetic data preprocessed in a specific way. Moreover, we study a novel learning-based model compression approach, in order to reduce the DeepFont model size without sacrificing its performance. The DeepFont system achieves an accuracy of higher than 80% (top-5) on our collected dataset, and also produces a good font similarity measure for font selection and suggestion. We also achieve around 6 times compression of the model without any visible loss of recognition accuracy. Comment: To appear in ACM Multimedia as a full paper.
Published: 2015
Full Text: View/download PDF
