21 results on '"Ming-Sui Lee"'
Search Results
2. VR Sickness Assessment with Perception Prior and Hybrid Temporal Features
- Author
-
Ming-Sui Lee, Li-Chung Chuang, Po-Chen Kuo, and Dong-Yi Lin
- Subjects
Computer science; business.industry; media_common.quotation_subject; Feature extraction; Optical flow; 020207 software engineering; 02 engineering and technology; Virtual reality; Motion (physics); Random forest; Perception; Pattern recognition (psychology); 0202 electrical engineering, electronic engineering, information engineering; Simulator sickness; 020201 artificial intelligence & image processing; Computer vision; Artificial intelligence; business; media_common
- Abstract
Virtual reality (VR) sickness is one of the obstacles hindering the growth of the VR market. Different VR content may cause varying degrees of sickness. If the degree of sickness can be estimated objectively, the estimate adds great value to the design of VR content. To address this problem, a novel content-based VR sickness assessment method that considers both a perception prior and hybrid temporal features is proposed. Based on the perception prior, which assumes that the user's field of view becomes narrower while watching videos, a Gaussian-weighted optical flow is calculated with a specified aspect ratio. To capture the dynamic characteristics, hybrid temporal features including horizontal motion, vertical motion, and the proposed motion anisotropy are adopted. In addition, a new dataset is compiled with one hundred VR sickness test samples, each of which comes with the Discomfort Scores (DS) reported by the user and a Simulator Sickness Questionnaire (SSQ) collected at the end of the test. A random forest regressor is then trained on this dataset by feeding it the hybrid temporal features of both the present and the previous minute. Extensive experiments are conducted on the VRSA dataset, and the results demonstrate that the proposed method is comparable to the state-of-the-art method in terms of effectiveness and efficiency.
- Published
- 2021
- Full Text
- View/download PDF
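The Gaussian weighting described in this abstract can be illustrated with a minimal sketch. The abstract gives no parameter values, so `sigma_scale`, the way `aspect_ratio` stretches the horizontal spread, and both function names are assumptions made for illustration. The proposed motion anisotropy feature is not defined in the abstract and is not reproduced here.

```python
import math

def gaussian_weights(height, width, sigma_scale=0.25, aspect_ratio=None):
    """Return a height x width grid of weights that peak at the frame centre.

    sigma_scale and the use of aspect_ratio to stretch the horizontal spread
    are assumptions for this sketch, not values taken from the paper.
    """
    if aspect_ratio is None:
        aspect_ratio = width / height
    sigma_y = sigma_scale * height
    sigma_x = sigma_y * aspect_ratio
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    weights = [
        [math.exp(-0.5 * (((x - cx) / sigma_x) ** 2 + ((y - cy) / sigma_y) ** 2))
         for x in range(width)]
        for y in range(height)
    ]
    # Normalise so the weights sum to 1 and the result is a weighted mean.
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def weighted_motion(flow_u, flow_v, weights):
    """Weighted mean magnitude of horizontal (u) and vertical (v) flow."""
    horiz = sum(w * abs(u) for w_row, u_row in zip(weights, flow_u)
                for w, u in zip(w_row, u_row))
    vert = sum(w * abs(v) for w_row, v_row in zip(weights, flow_v)
               for w, v in zip(w_row, v_row))
    return horiz, vert
```

Because the weights decay away from the centre, motion near the middle of the frame dominates the two features, which is one way to model a narrowed field of view.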
3. Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows
- Author
-
Chu-Song Chen, Chia-Hao Chang, Yan-Jing Lei, Yi-Ping Hung, Ming-Sui Lee, and Peng Yua Kao
- Subjects
Computer science; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Convolutional neural network; Activity recognition; First person; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; 0105 earth and related environmental sciences
- Abstract
First-person-view (FPV) cameras are finding wide use in daily life to record activities and sports. In this paper, we propose a succinct and robust 3D convolutional neural network (CNN) architecture accompanied by an ensemble-learning network for activity recognition with FPV videos. The proposed 3D CNN is trained on low-resolution (32 × 32) sparse optical flows using FPV video datasets consisting of daily activities. According to the experimental results, our network achieves an average accuracy of 90%.
- Published
- 2021
- Full Text
- View/download PDF
4. Intensity-aware GAN for Single Image Reflection Removal
- Author
-
Li-Chung Chuang, Nien-Hsin Chou, and Ming-Sui Lee
- Subjects
Computer science; business.industry; media_common.quotation_subject; 020206 networking & telecommunications; Gallium nitride; 02 engineering and technology; Function (mathematics); Image (mathematics); Power (physics); chemistry.chemical_compound; Reflection (mathematics); chemistry; Prior probability; 0202 electrical engineering, electronic engineering, information engineering; Contrast (vision); 020201 artificial intelligence & image processing; Computer vision; Artificial intelligence; business; Intensity (heat transfer); media_common
- Abstract
Single image reflection removal is a challenging task in computer vision. Most existing approaches rely on carefully handcrafted priors to solve the problem. In contrast to the optimization-based methods, an intensity-aware GAN with dual generators is proposed to directly estimate the function that transforms the mixture image into the reflection image itself. Based on the observation that the reflection layer has more discriminating power in low-intensity regions than in high-intensity regions, the proposed architecture better describes the characteristics of the model. Moreover, a reflection image synthesis method based on the screen blending model is also presented. Experimental results demonstrate that the reflection removal results are satisfactory in real cases when compared with state-of-the-art methods.
- Published
- 2019
- Full Text
- View/download PDF
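The screen blending model mentioned in this abstract is commonly written as 1 - (1 - B)(1 - R) for intensities in [0, 1]. A minimal sketch follows, assuming images are nested lists of floats. The function name is hypothetical, and the paper's full synthesis pipeline may include steps the abstract does not describe.

```python
def screen_blend(background, reflection):
    """Screen-blend two images given as nested lists of intensities in [0, 1]."""
    return [
        [1.0 - (1.0 - b) * (1.0 - r) for b, r in zip(b_row, r_row)]
        for b_row, r_row in zip(background, reflection)
    ]

# A reflection of 0.3 over a dark pixel (0.0) and over a bright pixel (0.8).
mixture = screen_blend([[0.0, 0.8]], [[0.3, 0.3]])
```

Under this formula the reflection adds R(1 - B) to the background, so it contributes more where the background is dark, which is consistent with the intensity observation in the abstract.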
5. A Learning-Based Prediction Model for Baby Accidents
- Author
-
Shao-Fu Lien, Ming-Sui Lee, and Peng-Jie Wang
- Subjects
030507 speech-language pathology & audiology; 03 medical and health sciences; Computer science; Statistics; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Learning based; 02 engineering and technology; 0305 other medical science
- Abstract
According to statistics from the United Kingdom, more than two million babies and toddlers experience accidents every year. Regardless of where the accidents happen, most of them could have been predicted and prevented. To avoid accidental injuries, a temporal-pyramid long short-term memory (TP-LSTM) network with a temporal attention mechanism is proposed to predict whether an accident will happen in the future. The proposed network is capable of capturing important information in the video at different temporal resolutions and selecting the crucial frames that contribute most to the accident. Moreover, the proposed early exponential loss (EEL) function is incorporated to achieve better prediction. The baby video dataset (BVD), containing 670 videos, is collected from several video-sharing websites; 320 of the videos contain accidents and the others do not. The experimental results show that the proposed network attains an average precision of 61.13% and that accidents are foreseen 4.196 seconds before they occur at 80% recall.
- Published
- 2019
- Full Text
- View/download PDF
6. A multilevel technique for automatic foreground extraction
- Author
-
Yi-Min Yang and Ming-Sui Lee
- Subjects
business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Probabilistic logic; Pattern recognition; Image segmentation; Mixture model; Latent Dirichlet allocation; Generative model; symbols.namesake; Robustness (computer science); symbols; Segmentation; Computer vision; Chinese restaurant process; Artificial intelligence; business; Mathematics
- Abstract
Foreground extraction is an important and challenging problem in many applications of computer vision. Most existing algorithms either require user intervention as hard constraints or demand special inputs for extra information. Thus, an automatic foreground extraction algorithm from a single image is proposed in this paper. A Gaussian image pyramid is constructed and the gradient vector flow (GVF) snake is adopted at the coarsest level to generate a rough contour of the object which is then upsampled and serves as the initial input to GVF snake in the next level. This process repeats until the estimated contour is propagated to the finest level. Based on the result in the finest level, a binary mask can be generated accordingly and becomes the initial constraint in the segmentation. The proposed segmentation step includes two novel schemes, which simulate the Latent Dirichlet Allocation (LDA) generative model and a probabilistic stochastic process called Chinese restaurant process. With these mechanisms, the Gaussian Mixture Models adaptively determine the number of components for foreground and background individually. As a result, the proposed method is expected to not only produce a satisfactory result of foreground extraction automatically with more robustness and adaptation but also serve as a good preprocessing to improve the performance and accuracy for tasks in computer vision.
- Published
- 2017
- Full Text
- View/download PDF
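The Chinese restaurant process named in this abstract can be sketched in its standard form: each new item joins an existing group with probability proportional to the group's size, or opens a new group with probability proportional to a concentration parameter. How the paper couples this process to its Gaussian Mixture Models is not stated in the abstract, so the code below shows only the generic process; the function name and the `alpha` and `seed` parameters are illustrative.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=None):
    """Seat customers one by one; return (assignments, table_sizes).

    With n customers already seated, the next one joins table k with
    probability size_k / (n + alpha) and opens a new table with
    probability alpha / (n + alpha). alpha must be positive.
    """
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for n in range(n_customers):
        r = rng.random() * (n + alpha)
        cumulative = 0.0
        table = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                table = k
                break
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes
```

The number of occupied tables is not fixed in advance but grows with the data, which is why a process of this kind suits a model that must choose its own number of mixture components.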
7. Facial expression synthesis from a single image
- Author
-
Man-Chia Chang and Ming-Sui Lee
- Subjects
Facial expression; Face hallucination; Facial motion capture; Computer science; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Feature (computer vision); Face (geometry); Three-dimensional face recognition; Computer vision; Artificial intelligence; business; Computer facial animation; ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
Facial expression synthesis has drawn a lot of attention in many applications, such as facial animation and human-computer interaction. Some expression synthesis methods operate in the 2D domain, where only images are taken as input, but muscle deformation is usually neglected regardless of the expression. Methods performed in the 3D domain generate more natural synthesized images but require a 3D model of the input and suffer from high computational complexity, which makes them inapplicable in certain situations. A facial expression synthesis method that combines the advantages of 2D and 3D methods is proposed in this paper to synthesize expressions on an input neutral facial image. More accurate geometry information is extracted from 3D models by applying a time-saving face model reconstruction method. The expression is then synthesized in 2D using the information from 3D to produce a natural synthesized facial image with the desired expression. To obtain the expressive image, the displacements of 48 facial feature points are used to approximate the displacements over the whole face. Experimental results demonstrate that the proposed system can generate facial images of various expressions with satisfactory quality.
- Published
- 2014
- Full Text
- View/download PDF
8. Automatic trimap generation for digital image matting
- Author
-
Chang-Lin Hsieh and Ming-Sui Lee
- Subjects
business.industry; Computer science; Process (computing); Pattern recognition; Image processing; Image segmentation; Image (mathematics); Upsampling; Reduction (complexity); Digital image; Computer vision; Segmentation; Artificial intelligence; business
- Abstract
Digital image matting has been one of the most popular topics in image processing in recent years. For most matting methods, the trimap serves as one of the key inputs, and its accuracy strongly affects the matting result. Most existing works did not pay much attention to acquiring a trimap; instead, they assumed that the trimap was given, meaning the matting process usually involved user input. In this paper, an automatic trimap generation technique is proposed. First, the contour of the segmentation result is dilated to get an initial guess of the trimap, followed by alpha estimation. Then, a smart brush with dynamic width is applied by analyzing the structure of the foreground object to generate another trimap. In other words, the brush size is enlarged if the object boundary contains fine details such as hair or fur. Conversely, the brush size gets smaller if the contour of the object is just a simple curve or straight line. Moreover, by combining the trimap obtained in step one and downsampling the image, the uncertainty is defined as the blurred region, and the third trimap is formed. The final step is to combine these three trimaps by voting. The experimental results show that the trimap generated by the proposed method effectively improves the matting result. Moreover, the improved accuracy of the trimap reduces the regions to be processed, so that the matting procedure is accelerated.
- Published
- 2013
- Full Text
- View/download PDF
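The final voting step in this abstract can be sketched as a per-pixel majority vote over three trimaps. The abstract does not specify the voting rule, so the majority rule, the fallback to "unknown" when all three labels differ, and the label encoding are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical label encoding: foreground, background, unknown.
FG, BG, UNKNOWN = "F", "B", "U"

def vote_trimaps(trimap_a, trimap_b, trimap_c):
    """Per-pixel majority vote; falls back to UNKNOWN when all three differ."""
    result = []
    for rows in zip(trimap_a, trimap_b, trimap_c):
        out_row = []
        for labels in zip(*rows):
            label, count = Counter(labels).most_common(1)[0]
            out_row.append(label if count >= 2 else UNKNOWN)
        result.append(out_row)
    return result
```

Falling back to the unknown label is the conservative choice here, since a matting algorithm then estimates alpha for that pixel instead of trusting a contested label.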
9. A low-complexity upsampling technique for H.264
- Author
-
Ming-Sui Lee and Wei-Chi Chen
- Subjects
Motion compensation; business.industry; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Data_CODINGANDINFORMATIONTHEORY; Upsampling; Motion estimation; Maximum a posteriori estimation; Codec; Computer vision; Artificial intelligence; business; Algorithm; Image resolution; Block (data storage); Reference frame; Block-matching algorithm; Interpolation
- Abstract
A hybrid up-sampling algorithm based on the predicted modes of H.264/AVC is proposed in this paper. Unlike video codecs such as those of the MPEG group, H.264/AVC uses variable block sizes for motion estimation and motion compensation, which results in better precision and compression efficiency. According to the mode decision built into H.264/AVC, the macroblocks of each frame are divided into intra mode, skip mode, and others. Intra-mode macroblocks, which contain more details, are up-sampled by MAP (maximum a posteriori) estimation, since this method has the best performance among existing super-resolution algorithms. Macroblocks coded in skip mode are assumed to be highly correlated with macroblocks in the reference frame, so they are duplicated from the referenced blocks. The remaining macroblocks not only correspond to blocks in other frames but also contain relatively complicated content, so they are further analyzed into variable block sizes: 16×16, 8×16, 16×8, 8×8, 8×4, 4×8, and 4×4. By adaptively applying different up-sampling methods with variable block sizes, the proposed method saves computational effort on smoother blocks and spends it on complicated blocks, so that the overall complexity is reduced. Compared with traditional frame-based up-sampling methods, the experimental results demonstrate that the proposed algorithm provides a more efficient way to up-sample videos and preserves satisfactory visual quality.
- Published
- 2011
- Full Text
- View/download PDF
10. Image recovery of geometric distortion with multi-bit data embedding
- Author
-
Ming-Sui Lee and Yu-Hsiang Chiu
- Subjects
Discrete wavelet transform; Transmission (telecommunications); Robustness (computer science); business.industry; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Bit error rate; Computer vision; Artificial intelligence; business; Digital watermarking; Object detection; Image (mathematics)
- Abstract
Image transmission is sometimes accompanied by geometric distortions. A novel image recovery scheme with a multi-bit binary message is proposed in this paper. In the proposed scheme, several predefined templates are introduced into an image in the discrete wavelet transform domain. A blind template detection algorithm is performed on the geometrically distorted image to extract the locations of the templates and the hidden message, which are modeled probabilistically in a Bayesian network. Once the locations of the templates are successfully detected, they serve as the registration references in the recovery process. As a result, the image attacked by geometric distortions can be recovered according to the estimated displacements. The goal of this project is to develop a scheme to correct various geometric distortions with a relatively low bit error rate.
- Published
- 2010
- Full Text
- View/download PDF
11. QPalm: A gesture recognition system for remote control with list menu
- Author
-
Yi-Ping Hung, Ming-Sui Lee, Ju-Chun Ko, Yu-Hsin Chang, Jane Yung-jen Hsu, and Liwei Chan
- Subjects
Computer science; Machine vision; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Interaction technique; law.invention; Digital media; Circular motion; law; Human–computer interaction; Gesture recognition; Scrolling; Computer vision; Artificial intelligence; business; Remote control; Stereo camera
- Abstract
The coming ubiquity of digital media content is driving the need for a solution that improves the interaction between people and media. In this work, we propose a novel interaction technique, QPalm, which allows the user to control media via a list menu shown on a distant display by drawing circles in the air with one hand. To manipulate a list menu remotely, QPalm includes two basic functions, browsing and choosing, realized by recognizing the user's palm performing circular and push motions in the air. The circular motion provides fluidity in scrolling a menu up and down, while the push motion is intuitive when the user decides to choose an item during a circular motion. Based on this design, we develop a vision system based on a stereo camera to track the user's palm without interference from intruders behind or next to the operating user. More specifically, the contributions of the work include: (1) an intuitive interaction technique, QPalm, for remote control with a list menu, and (2) a palm tracking algorithm that supports QPalm based merely on the depth and motion information of images, for practical considerations.
- Published
- 2008
- Full Text
- View/download PDF
12. A Content-Adaptive Up-Sampling Technique for Image Resolution Enhancement
- Author
-
C.-C.J. Kuo, Mei-Yin Shen, and Ming-Sui Lee
- Subjects
Computer science; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Edge enhancement; Subpixel rendering; Image (mathematics); Upsampling; Discrete cosine transform; Computer vision; Artificial intelligence; business; Image resolution; Unsharp masking; Block (data storage)
- Abstract
A content-adaptive technique is proposed in this work to upsample an image to an output image of higher resolution. The proposed technique is a block-based processing algorithm that offers flexibility in choosing the most suitable up-sampling method for a particular block type. Block classification is first conducted in the DCT domain to categorize each image block into one of several types: smooth areas, textures, edges, and others. For plain backgrounds and smooth surfaces, simple patches are used to enlarge the image size without degrading the resultant visual quality. The unsharp masking method is applied to textured regions to preserve high-frequency components. Since human eyes are more sensitive to edges, we adopt a more sophisticated technique to process edge blocks. That is, they are approximated by a facet model so that the image data at subpixel positions can be generated accordingly. A post-processing technique such as 1D directional unsharp masking can be used to further enhance edge sharpness. Experimental results are given to demonstrate the efficiency of the proposed technique.
- Published
- 2007
- Full Text
- View/download PDF
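Unsharp masking, which this abstract applies to textured regions, adds back a scaled difference between a signal and its blurred copy. The sketch below is a generic one-dimensional version, assuming a 3-tap [1, 2, 1]/4 blur with edge replication. The kernel, the `amount` parameter, and the function name are illustrative choices, not details from the paper.

```python
def unsharp_mask_1d(signal, amount=1.0):
    """Sharpen a 1D signal: out = x + amount * (x - blur(x)).

    blur is a 3-tap [1, 2, 1] / 4 filter with edge replication.
    Output values are not clipped to the input range.
    """
    n = len(signal)
    sharpened = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        blurred = 0.25 * left + 0.5 * signal[i] + 0.25 * right
        sharpened.append(signal[i] + amount * (signal[i] - blurred))
    return sharpened

# A step edge gains undershoot and overshoot on either side of the jump.
edge = unsharp_mask_1d([0.0, 0.0, 10.0, 10.0])  # [0.0, -2.5, 12.5, 10.0]
```

Flat regions pass through unchanged because the blurred copy equals the input there, so the sharpening only acts where the signal varies.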
13. A Quad-Tree Decomposition Approach to Cartoon Image Compression
- Author
-
Yi-Chen Tsai, Mei-Yin Shen, C.-C.J. Kuo, and Ming-Sui Lee
- Subjects
Lossless compression; Computer science; business.industry; Search engine indexing; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; File size; Shape coding; Computer vision; Entropy encoding; Artificial intelligence; business; ComputingMethodologies_COMPUTERGRAPHICS; Color Cell Compression; Data compression; Image compression
- Abstract
A quad-tree decomposition approach is proposed for cartoon image compression in this work. The proposed algorithm achieves excellent coding performance by using a unique quad-tree decomposition and shape coding method along with a GIF-like color indexing technique to efficiently encode large areas of the same color, which commonly appear in cartoon-type images. To reduce complexity, the input image is partitioned into small blocks and the quad-tree decomposition is applied independently to each block instead of the entire image. The LZW entropy coding method can be performed as a post-processing step to further reduce the coded file size. Experimental results demonstrate that the proposed method outperforms several well-known lossless image compression techniques for cartoon images that contain 256 colors or fewer.
- Published
- 2006
- Full Text
- View/download PDF
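The per-block quad-tree decomposition described in this abstract can be sketched in its textbook form: a block of a single color becomes a leaf, and any other block is split into four quadrants. The paper's shape coding and color indexing are not reproduced, and the function names and tree representation are assumptions for illustration.

```python
def quadtree(block, x=0, y=0, size=None):
    """Decompose a square block of colour indices (side a power of two).

    Returns a colour index for a uniform region, or a list of four subtrees
    ordered top-left, top-right, bottom-left, bottom-right.
    """
    if size is None:
        size = len(block)
    first = block[y][x]
    if all(block[y + j][x + i] == first
           for j in range(size) for i in range(size)):
        return first
    half = size // 2
    return [
        quadtree(block, x, y, half),
        quadtree(block, x + half, y, half),
        quadtree(block, x, y + half, half),
        quadtree(block, x + half, y + half, half),
    ]

def count_leaves(tree):
    """Number of uniform regions needed to describe the block."""
    if not isinstance(tree, list):
        return 1
    return sum(count_leaves(sub) for sub in tree)
```

A block with large flat areas yields few leaves, which is why this representation suits cartoon-style images.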
14. A DCT-Domain Video Alignment Technique for MPEG Sequences
- Author
-
Mei-Yin Shen, C.-C.J. Kuo, and Ming-Sui Lee
- Subjects
Motion compensation; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Image registration; Edge detection; Motion estimation; Computer Science::Multimedia; Discrete cosine transform; Computer vision; Artificial intelligence; business; Group of pictures; Mathematics; Block-matching algorithm; Data compression
- Abstract
An image/video registration technique for multiple compressed video inputs such as MPEG sequences is investigated. The proposed technique is based on the matching of discrete cosine transform (DCT) coefficients and motion vectors. First, the I frame of each input sequence is separated into the background and moving objects. For the background, coarse edge features are extracted by applying edge detectors of different characteristics to the luminance DC coefficients. Each detector generates a difference map for a single background. A threshold is determined for each difference map to produce a binary map. Then, alignment parameters are determined using the binary maps of input images generated by the same detector. For the moving object, alignment parameters can be fine-tuned by the motion information of all frames in the same group of pictures (GOP). Finally, the actual displacement in the pixel domain is estimated by the weighted average of alignment parameters from all background detectors and refinement parameters from motion information. Experimental results show that the proposed method significantly reduces the computational cost of image/video registration in comparison with traditional pixel-domain registration techniques while achieving a certain quality of composition.
- Published
- 2005
- Full Text
- View/download PDF
15. DCT-Domain Image Registration Techniques for Compressed Video
- Author
-
Mei-Yin Shen, C.-C. Jay Kuo, Ming-Sui Lee, and Akio Yoneyama
- Subjects
Pixel; business.industry; Feature extraction; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Image registration; Pattern recognition; Edge detection; Motion JPEG; Computer Science::Computer Vision and Pattern Recognition; Computer Science::Multimedia; Discrete cosine transform; Computer vision; Artificial intelligence; business; Transform coding; Data compression; Mathematics
- Abstract
A technique for image registration in compressed video, such as motion JPEG or the I-picture of MPEG, is investigated. The technique is based on DCT (discrete cosine transform) coefficient matching. First, the coarse edge features are extracted by applying several edge detectors to luminance DC coefficients. Each detector generates one difference map for a single input image. A threshold is set up for each difference map to produce a binary map. Then, the alignment parameters are determined based on the binary maps of both input images generated by the same detector. Finally, the actual displacement in the pixel domain is calculated by averaging parameters from all detectors. Experimental results show that the proposed method reduces the computational cost of image registration dramatically as compared with the pixel domain and edge-based DCT domain registration techniques, while achieving a certain quality of composition.
- Published
- 2005
- Full Text
- View/download PDF
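The pipeline in this abstract (difference maps on DC coefficients, thresholding to binary maps, alignment per detector, then averaging) can be sketched under several assumptions. The abstract does not name the edge detectors or the matching criterion, so this sketch uses two simple neighbor-difference detectors and an exhaustive search for the shift that maximizes binary-map overlap. All function names, the 8-pixel block size, and the parameters are illustrative.

```python
def binary_edge_map(dc, threshold, step=(0, 1)):
    """Threshold the absolute difference between each DC value and a neighbour.

    step is the (row, column) offset of the neighbour used by this detector.
    """
    dy, dx = step
    rows, cols = len(dc), len(dc[0])
    return [
        [1 if (r + dy < rows and c + dx < cols
               and abs(dc[r + dy][c + dx] - dc[r][c]) > threshold) else 0
         for c in range(cols)]
        for r in range(rows)
    ]

def best_shift(map_a, map_b, max_shift):
    """Return (dy, dx) such that map_b[r + dy][c + dx] best overlaps map_a[r][c]."""
    rows, cols = len(map_a), len(map_a[0])
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = sum(
                map_a[r][c] & map_b[r + dy][c + dx]
                for r in range(max(0, -dy), min(rows, rows - dy))
                for c in range(max(0, -dx), min(cols, cols - dx))
            )
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

def estimate_displacement(dc_a, dc_b, threshold, max_shift, block=8):
    """Average the per-detector shifts and convert block units to pixels."""
    steps = [(0, 1), (1, 0)]  # horizontal and vertical difference detectors
    shifts = [
        best_shift(binary_edge_map(dc_a, threshold, s),
                   binary_edge_map(dc_b, threshold, s), max_shift)
        for s in steps
    ]
    mean_dy = sum(s[0] for s in shifts) / len(shifts)
    mean_dx = sum(s[1] for s in shifts) / len(shifts)
    return mean_dy * block, mean_dx * block
```

Working on one DC value per block shrinks the search space by the square of the block size relative to a pixel-domain search, which is where the computational saving claimed in the abstract would come from.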
16. QPalm: A gesture recognition system for remote control with list menu.
- Author
-
Yu-Hsin Chang, Li-Wei Chan, Ju-Chun Ko, Ming-Sui Lee, Hsu, J., and Yi-Ping Hung
- Published
- 2008
- Full Text
- View/download PDF
17. A Content-Adaptive Up-Sampling Technique for Image Resolution Enhancement.
- Author
-
Ming-Sui Lee, Mei-Yin Shen, and Kuo, C.-C.J.
- Published
- 2007
- Full Text
- View/download PDF
18. A DCT-Domain Video Alignment Technique for MPEG Sequences.
- Author
-
Ming-Sui Lee, Shen, M., and Kuo, C.-C.J.
- Published
- 2005
- Full Text
- View/download PDF
19. Techniques for Flexible Image/Video Resolution Conversion with Heterogeneous Terminals.
- Author
-
Ming-Sui Lee, Mei-Yin Shen, Kuo, C. -C. Jay, and Yoneyama, Akio
- Subjects
*MULTIMEDIA systems; *INFORMATION storage & retrieval systems; *IMAGE retrieval; *INTERACTIVE multimedia; *DIGITAL video; *DIGITAL video standards; *OPTICAL resolution; *GEOMETRICAL optics
- Abstract
Multimedia capturing and display devices of different resolutions and aspect ratios can be easily connected by networks and, thus, there is a great need to develop techniques that facilitate flexible image/video format conversion and content adaptation among these heterogeneous terminals. Quality degradation due to down-sampling, up-sampling, coding/decoding, and some content adaptation mechanism (say, image mosaicking) in the transmission process is inevitable. It is desirable that multimedia contents can be easily captured, displayed, and seamlessly composed. Challenges and techniques to achieve this goal are reviewed first. Then, two specific topics, i.e., image/video mosaicking and super resolution (SR) conversion, are highlighted. As compared with previous work developed for these problems, the challenge under the current context is to strike a balance between low computational complexity and high quality of resultant image/video. Several new developments along this line are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
20. DCT-domain image registration techniques for compressed video.
- Author
-
Ming-Sui Lee, Meiyin Shen, Yoneyama, A., and Jay Kuo, C.-C.
- Published
- 2005
- Full Text
- View/download PDF
21. Color matching techniques for video mosaic applications.
- Author
-
Ming-Sui Lee, Meiyin Shen, and Jay Kuo, C.-C.
- Published
- 2004
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library