44 results for "Ming-Sui Lee"
Search Results
2. BIRD-PCC: Bi-directional Range Image-based Deep LiDAR Point Cloud Compression
- Author
-
Chia-Sheng Liu, Jia-Fong Yeh, Hao Hsu, Hung-Ting Su, Ming-Sui Lee, and Winston H. Hsu
- Subjects
FOS: Computer and information sciences, Computer Science - Robotics (cs.RO), Computer Science - Multimedia (cs.MM)
- Abstract
The large amount of data collected by LiDAR sensors raises the issue of LiDAR point cloud compression (PCC). Previous works on LiDAR PCC have used range image representations and followed the predictive coding paradigm to create a basic prototype of a coding framework. However, their prediction methods are inaccurate because they neglect invalid pixels in range images and omit future frames at each time step. Moreover, their handcrafted residual coding methods cannot fully exploit spatial redundancy. To remedy this, we propose the coding framework BIRD-PCC. Our prediction module is aware of the coordinates of invalid pixels in range images and takes a bidirectional scheme. We also introduce a deep-learned residual coding module that further exploits spatial redundancy within a residual frame. Experiments on the SemanticKITTI and KITTI-360 datasets show that BIRD-PCC outperforms other methods under most bitrate conditions and generalizes well to unseen environments. (Accepted to ICASSP 2023.)
- Published
- 2023
- Full Text
- View/download PDF
3. Objective evaluation of biomaterial effects after injection laryngoplasty – Introduction of artificial intelligence‐based ultrasonic image analysis
- Author
-
Yong-Wei Chen, Tsung-Lin Yang, Wen-Hsuan Tseng, Tzu-Yu Hsiao, Ming-Sui Lee, and Che-Chai Wang
- Subjects
Adult, Male, Time Factors, Imaging phantom, Laryngoplasty, Artificial Intelligence, Image Processing (Computer-Assisted), Humans, Medicine, Temporal change, Hyaluronic Acid, Ultrasonography (Interventional), Aged, Phantoms (Imaging), Vocal fold paralysis, Middle Aged, Glottal closure, Injection laryngoplasty, Otorhinolaryngology, Female, Ultrasonic sensor, Objective evaluation, Vocal Cord Paralysis
- Abstract
OBJECTIVE: Hyaluronic acid (HA) degrades over time. However, the effects of injection laryngoplasty (IL) for unilateral vocal fold paralysis (UVFP) have been observed to persist longer than expected from HA longevity. The purpose of this study was to develop a clinically practical methodology for objective evaluation of the temporal change in HA volume after IL using artificial intelligence (AI)-based ultrasonic assessment. DESIGN, SETTING AND PARTICIPANTS: Imaging phantoms simulating injected HA were built in different volumes to design the machine-learning algorithm. Subsequently, five adult patients who had undergone IL with HA for UVFP were recruited for clinical evaluation. MAIN OUTCOME MEASURES: The volume of injected HA estimated by the automatic algorithm, as well as voice outcomes, at 2 weeks and at 2 and 6 months after IL. RESULTS: On imaging phantoms, the algorithm described the contours on each frame well, and the volume could be estimated accordingly, with error rates of 0%-9.2%. The contours of the HA area were also captured in detail for all participants. The estimated volume decreased to an average of 65.76% remaining at 2 months and to a minimal amount at 6 months, while glottal closure remained improved. CONCLUSION: The volume change of injected HA over time was estimated non-invasively for individual patients by AI-based ultrasonic image analysis. The prolonged effect after treatment, outlasting HA longevity, was demonstrated objectively for the first time. This information helps achieve optimal cost-effectiveness of IL and improve patients' quality of life.
- Published
- 2021
- Full Text
- View/download PDF
4. Query-Driven Multi-Instance Learning
- Author
-
Yen-Chi Hsu, Cheng-Yao Hong, Tyng-Luh Liu, and Ming-Sui Lee
- Subjects
Pattern recognition, Iterative method, Computer science, Word2vec, General Medicine, Artificial intelligence, Machine learning, MNIST database
- Abstract
We introduce a query-driven approach (qMIL) to multi-instance learning, where the queries aim to uncover the class labels embodied in a given bag of instances. Specifically, it solves a multi-instance multi-label learning (MIML) problem under a more challenging setting than the conventional one: each MIML bag in our formulation is annotated only with a binary label indicating whether the bag contains an instance of a certain class, and the query is specified by the word2vec of a class label/name. To learn a deep-net model for qMIL, we construct a network component that computes a generalized compatibility measure for query-visual co-embedding and yields proper instance attentions for the given query. The bag representation is then formed as the attention-weighted sum of the instance embeddings and passed to the classification layer at the end of the network. In addition, the qMIL formulation flexibly extends the network to classify unseen class labels, leading to a new technique for solving the zero-shot MIML task through an iterative querying process. Experimental results on action classification over video clips and on three MIML datasets derived from MNIST, CIFAR10 and Scene demonstrate the effectiveness of our method.
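The attention pooling step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's network: the function name, the dot-product compatibility measure and the temperature parameter are assumptions; the paper learns the compatibility with a trained network component.

```python
import numpy as np

def qmil_bag_embedding(instances, query, temperature=1.0):
    """Sketch of query-driven attention pooling: score each instance
    embedding against the query vector, softmax the scores into
    attentions, and return the attention-weighted sum as the bag
    representation."""
    # instances: (n, d) instance embeddings; query: (d,) word2vec of a class name
    scores = instances @ query / temperature       # compatibility scores
    scores = scores - scores.max()                 # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()   # instance attentions
    return attn @ instances                        # (d,) bag representation
```

A bag whose instances include a strong match to the query is dominated by that instance in the pooled representation, which is what lets a binary bag-level label supervise instance-level attention.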
- Published
- 2020
- Full Text
- View/download PDF
5. Multi-instance learning for eosinophil quantification of sinonasal histopathology images
- Author
-
Yi-Tsen Lin, Ming-Sui Lee, and Te-Huei Yeh
- Subjects
General Medicine
- Published
- 2023
- Full Text
- View/download PDF
6. VR Sickness Assessment with Perception Prior and Hybrid Temporal Features
- Author
-
Ming-Sui Lee, Li-Chung Chuang, Po-Chen Kuo, and Dong-Yi Lin
- Subjects
Computer science, Feature extraction, Optical flow, Virtual reality, Motion (physics), Random forest, Perception, Pattern recognition (psychology), Simulator sickness, Computer vision, Artificial intelligence
- Abstract
Virtual reality (VR) sickness is one of the obstacles hindering the growth of the VR market. Different VR contents may cause varying degrees of sickness; estimating the degree of sickness objectively would add great value and help in designing VR contents. To address this problem, a novel content-based VR sickness assessment method is proposed that considers both a perception prior and hybrid temporal features. Based on the perception prior that the user's field of view narrows while watching videos, a Gaussian-weighted optical flow is calculated with a specified aspect ratio. To capture dynamic characteristics, hybrid temporal features including horizontal motion, vertical motion and the proposed motion anisotropy are adopted. In addition, a new dataset is compiled with one hundred VR sickness test samples, each of which comes with Discomfort Scores (DS) answered by the user and a Simulator Sickness Questionnaire (SSQ) collected at the end of the test. A random forest regressor is then trained on this dataset by feeding the hybrid temporal features of both the present and the previous minute. Extensive experiments conducted on the VRSA dataset demonstrate that the proposed method is comparable to the state-of-the-art in terms of effectiveness and efficiency.
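The Gaussian-weighted optical flow can be sketched as below. This is an illustrative guess at the construction, not the paper's implementation: the function name, the `sigma_ratio` parameterization and the use of a weighted mean magnitude are assumptions; only the anisotropic center-weighted Gaussian with a specified aspect ratio comes from the abstract.

```python
import numpy as np

def gaussian_weighted_flow(flow, sigma_ratio=0.25, aspect=16 / 9):
    """Weight the per-pixel optical-flow magnitude by an anisotropic
    Gaussian centred on the frame (mimicking a narrowed field of view)
    and return the weighted mean motion magnitude."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sig_y = sigma_ratio * h
    sig_x = sig_y * aspect                     # wider spread horizontally
    g = np.exp(-(((xs - w / 2) ** 2) / (2 * sig_x ** 2)
                 + ((ys - h / 2) ** 2) / (2 * sig_y ** 2)))
    mag = np.linalg.norm(flow, axis=2)         # per-pixel flow magnitude
    return (g * mag).sum() / g.sum()
```

Motion near the screen center, where the user is assumed to be looking, thus contributes far more to the sickness feature than peripheral motion.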
- Published
- 2021
- Full Text
- View/download PDF
7. Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows
- Author
-
Chu-Song Chen, Chia-Hao Chang, Yan-Jing Lei, Yi-Ping Hung, Ming-Sui Lee, and Peng Yua Kao
- Subjects
Computer science, Pattern recognition, Convolutional neural network, Activity recognition, First person, Artificial intelligence
- Abstract
First-person-view (FPV) cameras are finding wide use in daily life to record activities and sports. In this paper, we propose a succinct and robust 3D convolutional neural network (CNN) architecture accompanied with an ensemble-learning network for activity recognition with FPV videos. The proposed 3D CNN is trained on low-resolution (32 × 32) sparse optical flows using FPV video datasets consisting of daily activities. According to the experimental results, our network achieves an average accuracy of 90%.
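The input preparation implied by the abstract (low-resolution flow fields stacked into a clip for a 3D CNN) can be sketched as follows. The block-averaging downsample and the function name are assumptions for illustration; only the 32 × 32 flow input comes from the abstract.

```python
import numpy as np

def flow_clip_tensor(flows, out=32):
    """Block-average each dense optical-flow field down to an
    (out x out) grid and stack the frames into a (T, out, out, 2)
    clip tensor suitable as 3D-CNN input."""
    clips = []
    for f in flows:
        h, w, _ = f.shape
        bh, bw = h // out, w // out
        f = f[:bh * out, :bw * out]                       # crop to a multiple
        f = f.reshape(out, bh, out, bw, 2).mean(axis=(1, 3))
        clips.append(f)
    return np.stack(clips)                                # (T, out, out, 2)
```

Averaging over blocks both reduces resolution and suppresses flow noise, which suits the "sparse, low-resolution" input regime the paper advocates for efficiency.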
- Published
- 2021
- Full Text
- View/download PDF
8. Online CNN-based multiple object tracking with enhanced model updates and identity association
- Author
-
Xuejing Lei, C.-C. Jay Kuo, Weihao Gan, Shuo Wang, and Ming-Sui Lee
- Subjects
Computer science, Object (computer science), Convolutional neural network, Video tracking, Signal Processing, Trajectory, Benchmark (computing), Identity association, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering, Software
- Abstract
Online multiple object tracking (MOT) is a challenging problem due to occlusions and interactions among targets. An online MOT method with enhanced model updates and identity association is presented in this work to handle error drift and identity switches. The proposed MOT system consists of multiple single-object CNN (convolutional neural network)-based trackers, where shared convolutional layers are fixed and used to extract the appearance representation, while target-specific fully connected layers are updated online to distinguish the target from the background. Two model updates are developed to build an accurate tracker: when a target is visible and moving smoothly, we perform an incremental update based on its recent appearance; when a target drifts due to occlusion, we perform a refresh update to clear all previous memory of the target. Moreover, we introduce an enhanced online ID assignment scheme based on multi-level features to confirm the trajectory of each target. Experimental results demonstrate that the proposed online MOT method outperforms other existing online methods on the MOT17 and MOT16 benchmark datasets and achieves the best performance in terms of ID association.
- Published
- 2018
- Full Text
- View/download PDF
9. Online object tracking via motion-guided convolutional neural network (MGNet)
- Author
-
Ming-Sui Lee, C.-C. (Jay) Kuo, Weihao Gan, and Chi-Hao Wu
- Subjects
Computer science, Optical flow, Object (computer science), Convolutional neural network, Video tracking, Signal Processing, Media Technology, Benchmark (computing), RGB color model, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
Tracking-by-detection (TBD) is widely used in visual object tracking. However, many TBD-based methods ignore the strong motion correlation between the current and previous frames. In this work, a motion-guided convolutional neural network (MGNet) solution to online object tracking is proposed. The MGNet tracker is built upon the multi-domain convolutional neural network with two innovations: (1) a motion-guided candidate selection (MCS) scheme based on a dynamic prediction model is proposed to generate candidate regions accurately and efficiently, and (2) the spatial RGB and temporal optical-flow inputs are combined and processed in a unified end-to-end trained network rather than a two-branch network. We compare the performance of MGNet, MDNet and several state-of-the-art online object trackers on the OTB and VOT benchmark datasets, and demonstrate through extensive evaluation that MGNet captures the temporal correlation between consecutive frames more effectively.
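One plausible reading of motion-guided candidate selection is sketched below. This is not the paper's MCS scheme: shifting the previous box by the mean flow inside it and sampling Gaussian-perturbed candidates is an assumed, simplified stand-in for the dynamic prediction model, and all names and the noise model are hypothetical.

```python
import numpy as np

def motion_guided_candidates(prev_box, flow, n=64, scale=0.1, rng=None):
    """Shift the previous bounding box (x, y, w, h) by the mean optical
    flow inside it, then sample Gaussian-perturbed candidate boxes
    around the motion-predicted location."""
    rng = rng or np.random.default_rng(0)
    x, y, w, h = prev_box
    region = flow[int(y):int(y + h), int(x):int(x + w)]
    dx, dy = region.reshape(-1, 2).mean(axis=0)    # mean motion inside the box
    cx, cy = x + dx, y + dy                        # motion-predicted location
    jitter = rng.normal(0.0, scale, size=(n, 2)) * (w, h)
    return np.column_stack([cx + jitter[:, 0], cy + jitter[:, 1],
                            np.full(n, w), np.full(n, h)])  # (n, 4) candidates
```

Centering the candidate distribution on the motion prediction, rather than on the previous box, is what lets far fewer candidates cover a fast-moving target.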
- Published
- 2018
- Full Text
- View/download PDF
10. Intensity-aware GAN for Single Image Reflection Removal
- Author
-
Li-Chung Chuang, Nien-Hsin Chou, and Ming-Sui Lee
- Subjects
Computer science, Function (mathematics), Image (mathematics), Reflection (mathematics), Prior probability, Contrast (vision), Computer vision, Artificial intelligence, Intensity
- Abstract
Single image reflection removal is a challenging task in computer vision. Most existing approaches rely on carefully handcrafted priors to solve the problem. In contrast to these optimization-based methods, an intensity-aware GAN with dual generators is proposed to directly estimate the function that transforms the mixture image into the reflection image itself. From the observation that the reflection layer has more discriminating power in low-intensity regions than in high-intensity regions, the proposed architecture better describes the characteristics of the model. Moreover, a reflection-image synthesis method based on the screen blending model is also presented. Experimental results demonstrate that the reflection removal results on real cases are satisfactory compared with state-of-the-art methods.
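The screen blending model mentioned in the abstract is a standard compositing formula; a minimal sketch of synthesizing a mixture image from a background and a reflection layer follows (the function name is mine, and images are assumed normalized to [0, 1]):

```python
import numpy as np

def screen_blend(background, reflection):
    """Screen blending: 1 - (1 - B) * (1 - R). Brightens the background
    by the reflection layer without ever exceeding 1, mimicking how a
    reflection adds light on top of the transmitted scene."""
    return 1.0 - (1.0 - background) * (1.0 - reflection)
```

Because the result is always at least as bright as either input and never clips, screen blending gives more realistic training mixtures than simple addition.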
- Published
- 2019
- Full Text
- View/download PDF
11. A Learning-Based Prediction Model for Baby Accidents
- Author
-
Shao-Fu Lien, Ming-Sui Lee, and Peng-Jie Wang
- Subjects
030507 speech-language pathology & audiology ,03 medical and health sciences ,Computer science ,Statistics ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Learning based ,02 engineering and technology ,0305 other medical science - Abstract
According to statistics in the United Kingdom, more than two million babies and toddlers experience accidents every year. Regardless of where they happened, most of these accidents could have been predicted and prevented. To help prevent such injuries, a temporal-pyramid long short-term memory (TP-LSTM) network with a temporal attention mechanism is proposed to predict whether an accident will happen in the near future. The proposed network captures important information of the video at different temporal resolutions and selects the crucial frames that contribute most to the accident. Moreover, the proposed early exponential loss (EEL) function is incorporated to achieve better prediction. The baby video dataset (BVD), containing 670 videos, was collected from several video-sharing websites; 320 of the videos contain accidents and the rest do not. Experimental results show that the proposed network attains an average precision of 61.13% and foresees accidents 4.196 seconds before occurrence at 80% recall.
- Published
- 2019
- Full Text
- View/download PDF
12. Object tracking with temporal prediction and spatial refinement (TPSR)
- Author
-
Chi-Hao Wu, Weihao Gan, Ming-Sui Lee, and C.-C. Jay Kuo
- Subjects
Computer science, Template matching, Optical flow, Tracking system, Minimum bounding box, Video tracking, Signal Processing, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering, Spatial analysis, Software
- Abstract
A temporal prediction and spatial refinement (TPSR) method for online single object tracking (SOT) is proposed in this work. The TPSR tracking system consists of three cascaded modules: pre-processing (PP), temporal prediction (TP) and spatial refinement (SR). Illumination variation and camera shake, two challenging factors in tracking, are first compensated in the PP module. Then, a joint region-based template matching (TM) and pixel-wise optical flow (OF) scheme is adopted in the TP module, where the switch between TM and OF is conducted automatically; the two modes work in a complementary manner to handle different foreground and background situations. Finally, to overcome drift arising from the TP module, the bounding box location and size are fine-tuned using the local spatial information of the new frame in the SR module. The proposed TPSR tracking system offers state-of-the-art performance on a commonly used benchmark dataset. Highlights: A robust tracker based on temporal prediction and spatial refinement is proposed. Temporal prediction relies on two complementary components. Fine-tuning with local spatial information avoids error drift.
- Published
- 2016
- Full Text
- View/download PDF
13. Video object tracking and segmentation with box annotation
- Author
-
Ming-Sui Lee, Kaitai Zhang, Qin Huang, Yueru Chen, Jongmoo Choi, C.-C. Jay Kuo, and Ye Wang
- Subjects
Computer science, Object (computer science), Annotation, Minimum bounding box, Video tracking, Signal Processing, Segmentation, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering, Software
- Abstract
This paper presents a two-stage approach, track and then segment, to perform semi-supervised video object segmentation (VOS) with only bounding box annotations. The proposed reverse optimization for VOS (ROVOS) leverages a fully convolutional Siamese network to perform tracking and segmentation within the tracker. The segmentation cues reversely optimize the location of the tracker, and the object segmentation masks are produced online by the two-branch system. Experimental results on DAVIS 2016 and DAVIS 2017 demonstrate significant improvements of the proposed algorithm over state-of-the-art methods.
- Published
- 2020
- Full Text
- View/download PDF
14. Design Pseudo Ground Truth with Motion Cue for Unsupervised Video Object Segmentation
- Author
-
Siyang Li, C.-C. Jay Kuo, Jongmoo Choi, Ming-Sui Lee, Qin Huang, Yueru Chen, and Ye Wang
- Subjects
Ground truth, Computer science, Unsupervised segmentation, Object (computer science), Motion (physics), Video tracking, Benchmark (computing), Segmentation, Computer vision, Artificial intelligence
- Abstract
One major technical debt in video object segmentation is labeling the object masks for training instances. We therefore propose to prepare inexpensive yet high-quality pseudo ground truth, corrected with motion cues, for video object segmentation training. Our method conducts semantic segmentation using instance segmentation networks and then selects the segmented object of interest as the pseudo ground truth based on motion information. The pseudo ground truth is then exploited to fine-tune the pretrained objectness network to facilitate object segmentation in the remaining frames of the video. We show that the pseudo ground truth effectively improves segmentation performance, and this straightforward unsupervised video object segmentation method is more efficient than existing methods. Experimental results on DAVIS and FBMS show that the proposed method outperforms state-of-the-art unsupervised segmentation methods on various benchmark datasets, and the category-agnostic pseudo ground truth has great potential to extend to tracking multiple arbitrary objects.
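The selection step (picking the segmented object of interest based on motion information) can be sketched as below. The concrete criterion, IoU against a binary motion mask such as thresholded optical-flow magnitude, is an assumption for illustration, not the paper's exact rule.

```python
import numpy as np

def select_pseudo_gt(instance_masks, motion_mask):
    """Among candidate instance-segmentation masks, return the index and
    IoU of the mask that best overlaps a binary motion mask; that mask
    serves as the pseudo ground truth for fine-tuning."""
    motion = motion_mask.astype(bool)
    best, best_iou = None, -1.0
    for i, m in enumerate(instance_masks):
        m = m.astype(bool)
        inter = (m & motion).sum()
        union = (m | motion).sum()
        iou = inter / union if union else 0.0
        if iou > best_iou:
            best, best_iou = i, iou
    return best, best_iou
```

Because the criterion only asks "which segment moves like the foreground", it stays category-agnostic, which is what makes the pseudo ground truth extensible to arbitrary objects.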
- Published
- 2019
- Full Text
- View/download PDF
15. Stillness Moves
- Author
-
Pei Yi Lee, Han Hong Lin, Ping-Hsuan Han, Kuan Yin Lu, Wei-Zen Sun, Yi-Ping Hung, Amy Ming Sui Lee, Chia Hung Sun, and Yao Fu Jan
- Subjects
Movement, Education, Training system, Body movement, Body weight, Affect (psychology), Training, Visualization, Transfer of learning, Psychology, Human factors, Cognitive psychology
- Abstract
Body weight-transfer plays an important role in many exercises: body posture, movement and weight-transfer jointly determine how well a trainee performs exercises such as Tai-Chi. Following the traditional way of learning Tai-Chi, we propose Stillness Moves, a physical training system for Tai-Chi that captures and records users' skeleton movement and weight-transfer information to offer real-time and summary visual feedback. On top of this, we provide a gradual learning program that combines body-movement and weight-transfer learning. We evaluated our system in a user study comparing performance with and without weight-transfer guidance. The results demonstrate that weight-transfer guidance is beneficial for trainees learning Tai-Chi moves; for difficult moves, trainees should learn the weight-transfer first and then the body movement.
- Published
- 2018
- Full Text
- View/download PDF
16. A multilevel technique for automatic foreground extraction
- Author
-
Yi-Min Yang and Ming-Sui Lee
- Subjects
Probabilistic logic, Pattern recognition, Image segmentation, Mixture model, Latent Dirichlet allocation, Generative model, Robustness (computer science), Segmentation, Computer vision, Chinese restaurant process, Artificial intelligence, Mathematics
- Abstract
Foreground extraction is an important and challenging problem in many computer vision applications. Most existing algorithms either require user intervention as hard constraints or demand special inputs for extra information. Thus, an automatic foreground extraction algorithm from a single image is proposed in this paper. A Gaussian image pyramid is constructed, and the gradient vector flow (GVF) snake is applied at the coarsest level to generate a rough contour of the object, which is then upsampled and serves as the initial input to the GVF snake at the next level. This process repeats until the estimated contour is propagated to the finest level, where a binary mask is generated and becomes the initial constraint for segmentation. The proposed segmentation step includes two novel schemes, which simulate the Latent Dirichlet Allocation (LDA) generative model and a probabilistic stochastic process called the Chinese restaurant process. With these mechanisms, the Gaussian mixture models adaptively determine the number of components for foreground and background individually. As a result, the proposed method not only produces satisfactory foreground extraction automatically, with robustness and adaptability, but also serves as a good preprocessing step to improve performance and accuracy on other computer vision tasks.
- Published
- 2017
- Full Text
- View/download PDF
17. Techniques for flexible image/video resolution conversion with heterogeneous terminals
- Author
-
Akio Yoneyama, Mei-Yin Shen, C.-C.J. Kuo, and Ming-Sui Lee
- Subjects
Computer Networks and Communications, Computer science, Video capture, Image processing, Video processing, Content adaptation, Display resolution, Computer Science Applications, Display device, Video tracking, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, Image resolution, Computer hardware
- Abstract
Multimedia capturing and display devices of different resolutions and aspect ratios can be easily connected by networks, so there is a great need for techniques that facilitate flexible image/video format conversion and content adaptation among these heterogeneous terminals. Quality degradation due to down-sampling, up-sampling, coding/decoding and content adaptation mechanisms (say, image mosaicking) in the transmission process is inevitable, yet it is desirable that multimedia contents can be easily captured, displayed and seamlessly composed. Challenges and techniques toward this goal are reviewed first. Then, two specific topics, image/video mosaicking and super-resolution (SR) conversion, are highlighted. Compared with previous work on these problems, the challenge in the current context is to strike a balance between low computational complexity and high quality of the resultant image/video. Several new developments along this line are discussed.
- Published
- 2007
- Full Text
- View/download PDF
18. Optically active silica and polymeric materials for microcavity lasers and sensors
- Author
-
Nishita Deka, Vinh Diep, Ce Shi, A. Kovach, Kelvin Kuo, Simin Mehrabani, Ashley J. Maker, Eda Gungor, Ming-Sui Lee, and Andrea M. Armani
- Subjects
Materials science, Dopant, Nanoparticle, Laser, Photon upconversion, Raman laser, Optoelectronics, Whispering-gallery wave, Raman spectroscopy, Lasing threshold
- Abstract
Silica and doped-silica high-quality-factor (Q) optical resonators have demonstrated ultra-low-threshold lasers based on numerous mechanisms (e.g., rare-earth dopants, Raman). To date, the key focus has been on maintaining a high Q, as that determines the lasing threshold and linewidth. However, equally important criteria are lasing efficiency and wavelength; these parameters are governed by the material, not the cavity Q. Therefore, to fully address this challenge, it is necessary to develop new materials. We have synthesized a suite of silica and polymeric materials with nanoparticle and rare-earth dopants to enable microcavity lasers with emission from the near-IR to the UV. Additionally, the efficiencies and thresholds of many of these devices surpass previous work. Specifically, the silica sol-gel lasers are co- and tri-doped with metal nanoparticles (e.g., Ti, Al) and rare-earth materials (e.g., Yb, Nb, Tm) and are fabricated using conventional micro/nanofabrication methods. The intercalation of the metal in the silica matrix reduces the clustering of the rare-earth ions and reduces the phonon energy of the glass, improving efficiency and overall device performance. Additionally, the silica Raman gain coefficient is enhanced by the inclusion of the metal nanoparticles, which yields a lower-threshold, higher-efficiency silica Raman laser. Finally, we have synthesized several polymer films doped with metal (e.g., Au, Ag) nanoparticles and deposited them on the surface of our microcavity devices. By pumping at the plasmonic resonant wavelength of the particles, we are able to achieve plasmonic-enhanced upconversion lasing.
- Published
- 2015
- Full Text
- View/download PDF
19. Facial expression synthesis from a single image
- Author
-
Man-Chia Chang and Ming-Sui Lee
- Subjects
Facial expression, Face hallucination, Facial motion capture, Computer science, Feature (computer vision), Face (geometry), Three-dimensional face recognition, Computer vision, Artificial intelligence, Computer facial animation
- Abstract
Facial expression synthesis has drawn much attention in applications such as facial animation and human-computer interaction. Some expression synthesis methods operate in the 2D domain, where only images are taken as input but the muscle deformation associated with different expressions is usually neglected. Methods performed in the 3D domain generate more natural synthesized images but require a 3D model of the input and suffer from high computational complexity, which makes them inapplicable in certain situations. This paper proposes a facial expression synthesis method that combines the advantages of 2D and 3D methods to synthesize expressions on an input neutral facial image. More accurate geometry information is obtained from 3D models by applying a time-saving face model reconstruction method; the 2D expression is then synthesized using this 3D information to produce a natural facial image with the desired expression. To obtain the expressive image, the displacements of 48 facial feature points are utilized to approximate the displacements over the whole face. Experimental results demonstrate that the proposed system can generate facial images of various expressions with satisfactory quality.
- Published
- 2014
- Full Text
- View/download PDF
20. Opportunities for Persuasive Technology to Motivate Heavy Computer Users for Stretching Exercise
- Author
-
Yong-Xiang Chen, Shang-Hua Yang, Shih-Sung Lin, King-Jen Chang, Cheng-Min Jen, Shun-Wen Cheng, Shih-Yao Lin, Siek-Siang Chiang, Jau-Yih Tsauo, Yi-Ping Hung, Ming-Sui Lee, Wen-Ching Liao, Chia-Shiang Shih, Shu-Yun Chih, and Yu-Shan Lin
- Subjects
Multimedia, Computer science, Human–computer interaction, Social persuasion, Computer users, Persuasive technology
- Abstract
Reducing the negative effects of extended computer use is becoming increasingly important, and appropriate stretching has been demonstrated to yield benefits. We investigated opportunities for motivating heavy computer users to stretch by incorporating mobile and sensing technologies into a 1-on-1 social competition game. We implemented the Social Persuasion System for Stretching (SP-Stretch) and conducted a 4-week study with 25 heavy computer users. Based on the quantitative and qualitative results, we identify a number of design considerations and provide suggestions for future research.
- Published
- 2014
- Full Text
- View/download PDF
21. Haze effect removal from image via haze density estimation in optical model
- Author
-
Chia-Hung Yeh, Li-Wei Kang, Ming-Sui Lee, and Cheng Yang Lin
- Subjects
Haze ,Pixel ,Channel (digital image) ,Light ,business.industry ,Atmosphere ,Image processing ,Density estimation ,Models, Theoretical ,Image Enhancement ,Atomic and Molecular Physics, and Optics ,Optics ,Motion estimation ,Image Interpretation, Computer-Assisted ,Environmental science ,Scattering, Radiation ,Computer Simulation ,Bilateral filter ,business ,Artifacts ,Rain and snow mixed ,Algorithms ,Remote sensing - Abstract
Images and videos captured by optical devices are usually degraded by turbid media such as haze, smoke, fog, rain, and snow. Haze is the most common problem in outdoor scenes because of atmospheric conditions. This paper proposes a novel single-image dehazing framework to remove haze artifacts from images, built on two novel image priors: the pixel-based dark channel prior and the pixel-based bright channel prior. Based on the two priors and the haze optical model, atmospheric light is estimated via haze density analysis. The transmission map is then estimated and refined with a bilateral filter. As a result, high-quality haze-free images can be recovered with lower computational complexity than the state-of-the-art approach based on the patch-based dark channel prior.
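As a rough illustration of the optical-model pipeline the abstract describes (dark channel, atmospheric light, transmission, recovery), here is a minimal single-image sketch in Python. The `omega` and `t_min` parameters and the top-0.1% light estimate are common conventions assumed for the demo, and the bright channel prior and bilateral-filter refinement from the paper are omitted; this is not the paper's exact method.

```python
import numpy as np

def dehaze(img, omega=0.95, t_min=0.1):
    """Single-image dehazing sketch for the optical model
    I = J*t + A*(1 - t), with img normalized to [0, 1]."""
    # Pixel-based dark channel: per-pixel minimum over RGB (no patch).
    dark = img.min(axis=2)
    # Estimate atmospheric light A from the brightest dark-channel pixels.
    n = max(1, int(0.001 * dark.size))
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimate; omega keeps a trace of haze for realism.
    t = 1.0 - omega * (img / A).min(axis=2)
    t = np.clip(t, t_min, 1.0)
    # Recover scene radiance J and clamp to the valid range.
    J = (img - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0)
```

The `t_min` clamp guards against division by near-zero transmission in dense-haze regions, the same role the refinement step plays in the full framework.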
- Published
- 2013
22. Automatic trimap generation for digital image matting
- Author
-
Chang-Lin Hsieh and Ming-Sui Lee
- Subjects
business.industry ,Computer science ,Process (computing) ,Pattern recognition ,Image processing ,Image segmentation ,Image (mathematics) ,Upsampling ,Reduction (complexity) ,Digital image ,Computer vision ,Segmentation ,Artificial intelligence ,business - Abstract
Digital image matting has been one of the most popular topics in image processing in recent years. For most matting methods, the trimap serves as a key input, and its accuracy strongly affects the matting result. Most existing works did not pay much attention to acquiring a trimap; instead, they assumed the trimap was given, meaning the matting process usually involved user input. In this paper, an automatic trimap generation technique is proposed. First, the contour of the segmentation result is dilated to obtain an initial guess of the trimap, followed by alpha estimation. Then, a smart brush with dynamic width is applied by analyzing the structure of the foreground object to generate a second trimap: the brush size is enlarged if the object boundary contains fine details such as hair or fur, and reduced if the contour is just a simple curve or straight line. Moreover, by combining the trimap obtained in the first step with a downsampled version of the image, the uncertain (blurred) region is identified and a third trimap is formed. The final step combines these three trimaps by voting. Experimental results show that the trimap generated by the proposed method effectively improves the matting result. Moreover, the improved accuracy of the trimap reduces the region to be processed, so the matting procedure is accelerated.
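The first step of the pipeline, dilating the segmentation contour into an unknown band, can be sketched as follows. The fixed `band` width and the 4-neighbour morphology are illustrative assumptions; the paper varies the brush width adaptively and combines three trimaps by voting.

```python
import numpy as np

def _dilate(mask, it):
    """4-neighbour binary dilation (np.roll wraps at the border, so this
    sketch assumes the object stays away from the image edges)."""
    m = mask.copy()
    for _ in range(it):
        m = (m | np.roll(m, 1, 0) | np.roll(m, -1, 0)
               | np.roll(m, 1, 1) | np.roll(m, -1, 1))
    return m

def make_trimap(mask, band=2):
    """0 = background, 128 = unknown band, 255 = foreground core."""
    grown = _dilate(mask, band)          # expand past the contour
    core = ~_dilate(~mask, band)         # erosion as the dual of dilation
    trimap = np.zeros(mask.shape, dtype=np.uint8)
    trimap[grown] = 128                  # uncertain band around the contour
    trimap[core] = 255                   # confident foreground core
    return trimap
```

The matting solver then only needs to estimate alpha inside the 128-valued band, which is why a tighter band accelerates the whole procedure.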
- Published
- 2013
- Full Text
- View/download PDF
23. IrotateGrasp
- Author
-
Ming-Sui Lee, Yen-Ting Liu, Fang-I Hsiao, Yi-Ching Chiu, Mike Y. Chen, Lung-Pan Cheng, Che-Yang Wu, Meng Han Lee, and Hsiang-Sheng Liang
- Subjects
business.industry ,Computer science ,GRASP ,Usability ,law.invention ,Touchscreen ,law ,Orientation (geometry) ,Computer vision ,Adaptive user interface ,Artificial intelligence ,business ,Mobile device ,Rotation (mathematics) - Abstract
Automatic screen rotation improves the viewing experience and usability of mobile devices, but current gravity-based approaches do not support postures such as lying on one side, and manual rotation switches require explicit user input. iRotateGrasp automatically rotates the screens of mobile devices to match users' viewing orientations based on how users are grasping the devices. Our insight is that users' grasps are consistent within each orientation but differ significantly between orientations. Our prototype used a total of 44 capacitive sensors along the four sides and the back of an iPod Touch, and uses a support vector machine (SVM) to recognize grasps at 25 Hz. We collected six users' usage under 108 different combinations of posture, orientation, touchscreen operation, and left/right/both hands. Our offline analysis showed that our grasp-based approach is promising, with 80.9% accuracy when training and testing on different users, and up to 96.7% if users are willing to train the system. Our user study (N=16) showed that iRotateGrasp had an accuracy of 78.8% and was 31.3% more accurate than gravity-based rotation.
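The per-frame recognition step can be illustrated with synthetic data. Note the paper trains an SVM; this dependency-free sketch substitutes a nearest-centroid classifier, and the sensor layout in `sample_grasp` (11 distinct sensors per orientation) is invented purely for the demo.

```python
import numpy as np

N_SENSORS = 44  # sensor count from the abstract

def train_centroids(X, y):
    """One mean feature vector (centroid) per grasp orientation."""
    return {c: X[np.asarray(y) == c].mean(axis=0) for c in sorted(set(y))}

def predict(centroids, x):
    """Label of the nearest centroid in Euclidean distance."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

rng = np.random.default_rng(0)

def sample_grasp(orientation, n):
    """Synthetic capacitive readings: each orientation activates a
    distinct subset of sensors on top of low baseline noise."""
    x = rng.normal(0.1, 0.05, size=(n, N_SENSORS))
    x[:, orientation * 11:(orientation + 1) * 11] += 0.8
    return x

# Train on 20 frames per orientation (4 orientations).
X = np.vstack([sample_grasp(o, 20) for o in range(4)])
y = np.repeat(np.arange(4), 20)
model = train_centroids(X, y)
```

In the real system each 25 Hz sensor frame would be classified this way (by the SVM), and the screen rotates when the predicted orientation changes persistently.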
- Published
- 2013
- Full Text
- View/download PDF
24. Noncontact respiratory measurement of volume change using depth camera
- Author
-
Meng-Chieh Yu, Jia-Ling Liou, Shuenn-Wen Kuo, Yi-Ping Hung, and Ming-Sui Lee
- Subjects
Thorax ,medicine.medical_specialty ,Correlation coefficient ,business.industry ,medicine.medical_treatment ,Reproducibility of Results ,Sensitivity and Specificity ,Standard deviation ,law.invention ,Imaging, Three-Dimensional ,law ,Image Interpretation, Computer-Assisted ,medicine ,Respiratory Mechanics ,Humans ,Thoracotomy ,Radiology ,Respiratory system ,business ,Lung Volume Measurements ,Thoracic Wall ,Respiratory minute volume ,Spirometer ,Biomedical engineering ,Volume (compression) - Abstract
In this study, a system is developed to measure human chest wall motion for respiratory volume estimation without any physical contact. Based on a depth image sensing technique, respiratory volume is estimated by measuring morphological changes of the chest wall. We evaluated the system against a standard reference device, and the results show strong agreement in respiratory volume measurement (correlation coefficient r=0.966). The isovolume test shows small variations of the total respiratory volume during the isovolume maneuver (standard deviation
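The underlying computation, integrating chest-wall displacement over the depth-camera region of interest, reduces to a weighted sum. The sketch below assumes registered frames, depth in centimetres, and a known per-pixel physical footprint; the actual calibration is device-specific and not specified in the abstract.

```python
import numpy as np

def volume_change(depth_ref, depth_now, pixel_area_cm2):
    """Respiratory volume change (cm^3 = mL) between two registered
    depth frames over the chest ROI: per-pixel displacement toward
    the camera (cm) times each pixel's physical footprint (cm^2)."""
    dz_cm = depth_ref - depth_now   # positive where the chest rises
    return float(np.sum(dz_cm) * pixel_area_cm2)
```

Summing signed displacements means regions moving away from the camera correctly subtract from the estimate, which is what makes the isovolume maneuver (chest up, abdomen down) a natural sanity check.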
- Published
- 2013
25. Multiparameter Sleep Monitoring Using a Depth Camera
- Author
-
Yi-Ping Hung, Ming-Sui Lee, Jia-Ling Liou, Huan Wu, and Meng-Chieh Yu
- Subjects
medicine.anatomical_structure ,Supine position ,Respiratory rate ,Sleep monitoring ,Computer science ,medicine ,Breathing ,Sleep position ,Body movement ,Sleep (system call) ,Torso ,Simulation - Abstract
In this study, a depth analysis technique was developed to monitor a user's breathing rate, sleep position, and body movement during sleep without any physical contact. A cross-section method was proposed to detect the user's head and torso from a sequence of depth images. In the experiment, eight participants were asked to change sleep positions (supine and side-lying) every fifteen breathing cycles on the bed. The results showed that the proposed method is promising for detecting the head and torso across various sleeping postures and body shapes. In addition, a realistic overnight sleep monitoring experiment was conducted; its results demonstrated that the system can monitor sleep conditions in a realistic setting, with measurement accuracy better than in the first experiment. This study is important for providing a non-contact technology that measures multiple sleep conditions and helps users better understand their sleep quality.
- Published
- 2013
- Full Text
- View/download PDF
26. Multimedia-assisted breathwalk-aware system
- Author
-
Yi-Ping Hung, Ming-Sui Lee, Huan Wu, and Meng-Chieh Yu
- Subjects
Adult ,Engineering ,media_common.quotation_subject ,Biomedical Engineering ,Mobile computing ,Wearable computer ,Monitoring, Ambulatory ,Pilot Projects ,Walking ,computer.software_genre ,Feedback ,Humans ,Meditation ,Pace ,media_common ,Multimedia ,business.industry ,Respiration ,Signal Processing, Computer-Assisted ,Equipment Design ,Awareness ,Shoes ,Preferred walking speed ,Gait analysis ,Breathing ,business ,computer ,Mobile device - Abstract
Breathwalk is the practice of combining specific patterns of footsteps synchronized with breathing. In this study, we developed a multimedia-assisted Breathwalk-aware system that detects the user's walking and breathing conditions and provides appropriate multimedia guidance on a smartphone. Through the mobile device, the system enhances the user's awareness of walking and breathing behaviors. As an example application in slow technology, the system could help meditation beginners learn "walking meditation," a type of meditation that aims to take paces as slowly as possible, to synchronize footsteps with breathing, and to land every footstep toes first. In a pilot study, we developed a walking-aware system and evaluated whether the multimedia-assisted mechanism is capable of enhancing a beginner's walking awareness during walking meditation. Experimental results show that it effectively assists beginners in slowing their walking speed and decreasing incorrect footsteps. In a second experiment, we evaluated the Breathwalk-aware system to find a better feedback mechanism for learning Breathwalk techniques during walking meditation. The experimental results show that the combined visual-auditory mechanism works better during walking meditation than either the visual or the auditory mechanism alone.
- Published
- 2012
27. A low-complexity upsampling technique for H.264
- Author
-
Ming-Sui Lee and Wei-Chi Chen
- Subjects
Motion compensation ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Data_CODINGANDINFORMATIONTHEORY ,Upsampling ,Motion estimation ,Maximum a posteriori estimation ,Codec ,Computer vision ,Artificial intelligence ,business ,Algorithm ,Image resolution ,Block (data storage) ,Reference frame ,Block-matching algorithm ,Interpolation - Abstract
A hybrid up-sampling algorithm based on the prediction modes of H.264/AVC is proposed in this paper. Unlike earlier video codecs such as those of the MPEG family, H.264/AVC utilizes variable block sizes for motion estimation and motion compensation, which yields better precision and compression efficiency. According to the mode decision built into H.264/AVC, the macroblocks of each frame are divided into intra mode, skip mode, and others. Intra-mode macroblocks, which contain more detail, are up-sampled by MAP (maximum a posteriori) estimation, since this method performs best among existing super-resolution algorithms. Macroblocks coded in skip mode are assumed to be highly correlated with macroblocks in the reference frame, so those blocks are duplicated from the referenced blocks. The remaining macroblocks not only have correspondences with blocks in other frames but also contain relatively complicated content, so they are further analyzed into variable block sizes (16×16, 8×16, 16×8, 8×8, 8×4, 4×8, and 4×4). By adaptively applying different up-sampling methods to variable block sizes, the proposed method saves computation on smoother blocks and reserves it for complicated blocks, successfully reducing the overall complexity. Compared with traditional frame-based up-sampling methods, the experimental results demonstrate that the proposed algorithm up-samples videos more efficiently while preserving satisfactory visual quality.
- Published
- 2011
- Full Text
- View/download PDF
28. i − m − Breath: The Effect of Multimedia Biofeedback on Learning Abdominal Breath
- Author
-
Yi-Ping Hung, Ming-Sui Lee, Jin-Shing Chen, Su-Chu Hsu, King-Jen Chang, and Meng-Chieh Yu
- Subjects
medicine.medical_specialty ,Computer science ,medicine.medical_treatment ,digestive, oral, and skin physiology ,Breathing ,medicine ,Physical therapy ,Visual feedback ,Biofeedback ,Simulation - Abstract
Breathing is a natural and important exercise for human beings, and the right breathing method can make people healthier and even happier. i-m-Breath was developed to assist users in learning abdominal breathing; it uses Respiration Girth Sensors (RGS) to measure the user's breathing pattern and provides visual feedback to assist learning. In this paper, we study the effect of the biofeedback mechanism on learning abdominal breathing. We cooperated with the College of Medicine at National Taiwan University to conduct experiments exploring whether the biofeedback mechanism affects the learning of abdominal breathing. The results showed that i-m-Breath could help people shift their breathing habit from chest breathing to abdominal breathing, and in the future the system will be used in the hospital. Finally, this study is important for providing a biofeedback mechanism that helps users better understand their breathing patterns and improve their breathing habits.
- Published
- 2011
- Full Text
- View/download PDF
29. i-m-Space
- Author
-
Pei-Hsuan Chou, Jin-Yao Lin, Mike Y. Chen, Szu-Wei Wu, Shih-Yao Lin, Ju-Chun Ko, King-Jen Chang, Sue-Huei Chen, Wei-Ting Peng, Wei-Han Chen, Mei-Lan Chang, Chia Han Chang, Han-Hung Lin, Jin-Shing Chen, Yi-Ping Hung, Ming-Sui Lee, I-Ling Hu, Meng-Chieh Yu, and Yi-Yu Chung
- Subjects
medicine.medical_specialty ,Rehabilitation ,Multimedia ,Computer science ,business.industry ,medicine.medical_treatment ,Flexibility (personality) ,Traditional therapy ,Space (commercial competition) ,Biofeedback ,computer.software_genre ,medicine.disease ,Breast cancer ,medicine ,Medical physics ,business ,computer ,Interactive media - Abstract
This paper presents i-m-Space, an interactive multimedia rehabilitation space that helps the post-surgery recovery of breast cancer patients. Our goal is to improve patients' physical therapy and psychological relaxation experience through careful application of multimedia technology. i-m-Space consists of three types of breathing-based relaxation and three types of interactive exercise-based rehabilitation. Our interdisciplinary team includes medical professionals, multimedia engineers, designers, and artists. We have implemented i-m-Space in an experimental space in collaboration with a local breast cancer foundation. To evaluate i-m-Space, we recruited several patients who had recently recovered from breast cancer to use it and share their first-hand experiences. Our contributions include the following: 1) injecting a sense of fun and playfulness into traditional therapy to attract patients; 2) providing therapists with sufficient flexibility to personalize therapy sessions for each patient; and 3) maintaining patient safety.
- Published
- 2010
- Full Text
- View/download PDF
30. Image recovery of geometric distortion with multi-bit data embedding
- Author
-
Ming-Sui Lee and Yu-Hsiang Chiu
- Subjects
Discrete wavelet transform ,Transmission (telecommunications) ,Robustness (computer science) ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Bit error rate ,Computer vision ,Artificial intelligence ,business ,Digital watermarking ,Object detection ,Image (mathematics) - Abstract
Image transmission is sometimes accompanied by geometric distortions. A novel image recovery scheme carrying a multi-bit binary message is proposed in this paper. In the proposed scheme, several predefined templates are embedded into an image in the discrete wavelet transform domain. A blind template detection algorithm is performed on the geometrically distorted image to extract the locations of the templates and the hidden message, which are modeled probabilistically in a Bayesian network. Once the template locations are successfully detected, they serve as registration references in the recovery process. As a result, an image attacked by geometric distortions can be recovered according to the estimated displacements. The goal of this work is to develop a scheme that corrects various geometric distortions with a relatively low bit error rate.
- Published
- 2010
- Full Text
- View/download PDF
31. Touching the void
- Author
-
Hui-Shan Kao, Jane Hsu, Yi-Ping Hung, Liwei Chan, Ming-Sui Lee, and Mike Y. Chen
- Subjects
Multimedia ,Human–computer interaction ,Computer science ,Interface (computing) ,Direct touch ,Object (computer science) ,computer.software_genre ,computer ,Task (project management) - Abstract
In this paper, we explore the challenges of direct-touch interaction on intangible displays and investigate methodologies to improve it. Direct-touch interaction simplifies object manipulation because it combines the input and display into a single integrated interface. While tangible display-based direct-touch technology is commonplace, similar direct-touch interaction within an intangible display paradigm presents many challenges. Given the lack of tactile feedback, direct-touch interaction on an intangible display may perform poorly even on the simplest target acquisition tasks. To study this problem, we created a prototype intangible display. In an initial study, we collected user discrepancy data on the interpreted 3D locations of targets shown on the display. The results showed that participants performed poorly in determining the z-coordinate of the targets and were imprecise in their screen touches: thirty percent of positioning operations showed errors larger than 30 mm from the actual surface. This finding motivated a second study, in which we quantified task time in the presence of visual and audio feedback. Pseudo-shadow visual feedback was shown to improve both user performance and satisfaction.
- Published
- 2010
- Full Text
- View/download PDF
32. A Content-Adaptive Method for Single Image Dehazing
- Author
-
Ming-Sui Lee and Chao-Tsung Chu
- Subjects
Haze ,Pixel ,Transmission (telecommunications) ,business.industry ,Computer science ,Computer vision ,Artificial intelligence ,Content adaptive ,Single image ,business ,Image restoration ,Image (mathematics) - Abstract
A content-adaptive method for single image dehazing is proposed in this work. Since the degradation caused by haze depends on the depth of the scene, and the pixels in each specific part of an image (such as trees, buildings, or other objects) tend to lie at a similar depth from the camera, we assume that the haze degradation level within each region is uniform; that is, the transmission within each region should be similar as well. Based on this assumption, each input image is segmented into regions, the transmission is estimated for each region, and the estimate is then refined by soft matting. As a result, hazy images can be successfully recovered. The experimental results demonstrate that the proposed method performs satisfactorily.
- Published
- 2010
- Full Text
- View/download PDF
33. Transformational Breathing between Present and Past: Virtual Exhibition System of the Mao-Kung Ting
- Author
-
Chia-Ping Chen, Meng-Chieh Yu, Szu-Wei Wu, Xin Tong, Han-Hung Lin, Ju-Chun Ko, I-Ling Liu, Jiaping Wang, Yi-Ping Hung, Ming-Sui Lee, Chun-Ko Hsieh, Liang-Chun Lin, Yi-Yu Chung, Quo-Ping Lin, and Chu-Song Chen
- Subjects
Exhibition ,Artifact (archaeology) ,Transformational leadership ,Computer science ,Narrative ,Historical document ,Visual arts - Abstract
The Mao-Kung Ting is one of the most precious artifacts in the National Palace Museum. With a five-hundred-character inscription cast inside, the Mao-Kung Ting is regarded as a very important historical document, dating back to 800 B.C. Motivated by revealing the great nature of the artifact and interpreting it into a meaningful narrative, we propose an innovative Virtual Exhibition System to facilitate communication between the Mao-Kung Ting and audiences. We develop the Virtual Exhibition System around the following scenarios: "Breathing through the History" and "View-dependent display".
- Published
- 2010
- Full Text
- View/download PDF
34. An Efficient Upsampling Technique for Images and Videos
- Author
-
Chen-Wei Chang and Ming-Sui Lee
- Subjects
business.industry ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Subpixel rendering ,Reduction (complexity) ,Upsampling ,Discrete cosine transform ,Computer vision ,Artificial intelligence ,business ,Image resolution ,Block (data storage) ,Reference frame ,Mathematics - Abstract
A block-based upsampling method for images and videos is proposed in this work. Block classification is first conducted in the DCT domain to categorize 8x8 image blocks into several types: smooth areas, edges, and others. For plain backgrounds and smooth surfaces, simple patches are used to enlarge the image without degrading the resultant visual quality. Since human eyes are more sensitive to edges, a more sophisticated technique is applied to edge blocks: they are approximated by a facet model so that image data at subpixel positions can be generated accordingly. By taking temporal information into account, this concept extends to videos: to upsample an image block in the current frame, we may borrow the upsampled version of the corresponding block in the reference frame if the residual is tolerable. Experimental results demonstrate a great reduction in computational complexity while the output visual quality remains satisfactory.
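The DCT-domain block-classification step can be sketched as follows. The orthonormal 2-D DCT-II is standard; the energy threshold and the "low-frequency fraction" edge test are illustrative stand-ins for the paper's classifier, with thresholds chosen for the demo.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block via the 1-D basis matrix."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def classify_block(block, t_smooth=25.0, edge_ratio=0.7):
    """Label an 8x8 block 'smooth', 'edge', or 'other' from AC energy.
    Thresholds here are illustrative, not the paper's values."""
    coef = dct2(block.astype(float))
    ac = coef.copy()
    ac[0, 0] = 0.0                       # drop the DC term
    energy = float(np.sum(ac ** 2))
    if energy < t_smooth:
        return "smooth"
    # Edge blocks concentrate energy in the lowest AC frequencies.
    low = float(np.sum(ac[:2, :2] ** 2) + ac[0, 2] ** 2 + ac[2, 0] ** 2)
    return "edge" if low / energy > edge_ratio else "other"
```

Because the classification reads DCT coefficients directly, it can run on compressed data without a full decode, which is where the complexity savings come from.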
- Published
- 2009
- Full Text
- View/download PDF
35. QPalm: A gesture recognition system for remote control with list menu
- Author
-
Yi-Ping Hung, Ming-Sui Lee, Ju-Chun Ko, Yu-Hsin Chang, Jane Yung-jen Hsu, and Liwei Chan
- Subjects
Computer science ,Machine vision ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Interaction technique ,law.invention ,Digital media ,Circular motion ,law ,Human–computer interaction ,Gesture recognition ,Scrolling ,Computer vision ,Artificial intelligence ,business ,Remote control ,Stereo camera - Abstract
The coming ubiquity of digital media content is driving the need for solutions that improve the interaction between people and media. In this work, we propose a novel interaction technique, QPalm, which allows the user to control media via a list menu shown on a distant display by drawing circles in the air with one hand. To manipulate a list menu remotely, QPalm includes two basic functions, browsing and choosing, realized by recognizing the user's palm performing circular and push motions in the air. The circular motion provides fluidity in scrolling a menu up and down, while the push motion is intuitive when the user decides to choose an item during a circular motion. Based on this design, we developed a vision system based on a stereo camera that tracks the user's palm without interference from intruders behind or next to the operating user. More specifically, the contributions of this work include: (1) an intuitive interaction technique, QPalm, for remote control with a list menu, and (2) a palm tracking algorithm supporting QPalm that, for practical reasons, relies only on depth and motion information from images.
- Published
- 2008
- Full Text
- View/download PDF
36. A Content-Adaptive Up-Sampling Technique for Image Resolution Enhancement
- Author
-
C.-C.J. Kuo, Mei-Yin Shen, and Ming-Sui Lee
- Subjects
Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Edge enhancement ,Subpixel rendering ,Image (mathematics) ,Upsampling ,Discrete cosine transform ,Computer vision ,Artificial intelligence ,business ,Image resolution ,Unsharp masking ,Block (data storage) - Abstract
A content-adaptive technique is proposed in this work to upsample an image to an output image of higher resolution. The proposed technique is a block-based processing algorithm that offers the flexibility to choose the most suitable up-sampling method for each particular block type. Block classification is first conducted in the DCT domain to categorize each image block into several types: smooth areas, textures, edges, and others. For plain backgrounds and smooth surfaces, simple patches are used to enlarge the image without degrading the resultant visual quality. The unsharp masking method is applied to textured regions to preserve high-frequency components. Since human eyes are more sensitive to edges, we adopt a more sophisticated technique for edge blocks: they are approximated by a facet model so that image data at subpixel positions can be generated accordingly. A post-processing technique such as 1D directional unsharp masking can be used to further enhance edge sharpness. Experimental results are given to demonstrate the efficiency of the proposed technique.
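The unsharp masking step applied to textured regions is the classic sharpening identity: add back the difference between the image and a blurred copy. A minimal sketch, assuming a simple 3x3 box blur and an illustrative `amount` gain (the paper does not specify the kernel):

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    """Sharpen by boosting high frequencies: out = img + amount*(img - blur)."""
    # 3x3 box blur via edge-padded neighbourhood averaging.
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    blur = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return img + amount * (img - blur)
```

On a flat region the blur equals the image, so nothing changes; across an edge the difference term overshoots, which is exactly the perceived sharpening effect.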
- Published
- 2007
- Full Text
- View/download PDF
37. A Quad-Tree Decomposition Approach to Cartoon Image Compression
- Author
-
Yi-Chen Tsai, Mei-Yin Shen, C.-C.J. Kuo, and Ming-Sui Lee
- Subjects
Lossless compression ,Computer science ,business.industry ,Search engine indexing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,File size ,Shape coding ,Computer vision ,Entropy encoding ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Color Cell Compression ,Data compression ,Image compression - Abstract
A quad-tree decomposition approach to cartoon image compression is proposed in this work. The proposed algorithm achieves excellent coding performance by using a unique quad-tree decomposition and shape coding method, along with a GIF-like color indexing technique, to efficiently encode the large areas of uniform color that commonly appear in cartoon-type images. To reduce complexity, the input image is partitioned into small blocks and the quad-tree decomposition is applied to each block independently instead of to the entire image. LZW entropy coding can be performed as a post-processing step to further reduce the coded file size. Experimental results demonstrate that the proposed method outperforms several well-known lossless image compression techniques on cartoon images containing 256 colors or fewer.
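The recursive decomposition, splitting a square until every node holds at most two colours and stopping at 2x2, can be sketched directly. The leaf/node tuple encoding below is illustrative and omits the paper's shape coding, colour indexing, and LZW stages.

```python
def quadtree(img, x=0, y=0, size=None):
    """Quad-tree split of a square indexed-colour image (list of lists).
    A node becomes a leaf when its square holds at most two colours,
    or when it reaches the 2x2 minimum size."""
    if size is None:
        size = len(img)
    colors = {img[r][c] for r in range(y, y + size)
                        for c in range(x, x + size)}
    if len(colors) <= 2 or size == 2:
        return ("leaf", x, y, size, sorted(colors))
    h = size // 2
    # Recurse into the four quadrants, depth-first.
    return ("node", [quadtree(img, x,     y,     h),
                     quadtree(img, x + h, y,     h),
                     quadtree(img, x,     y + h, h),
                     quadtree(img, x + h, y + h, h)])
```

Large uniform regions collapse to single leaves, which is why this structure suits cartoon content with flat colour areas.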
- Published
- 2006
- Full Text
- View/download PDF
38. A DCT-Domain Video Alignment Technique for MPEG Sequences
- Author
-
Mei-Yin Shen, C.-C.J. Kuo, and Ming-Sui Lee
- Subjects
Motion compensation ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image registration ,Edge detection ,Motion estimation ,Computer Science::Multimedia ,Discrete cosine transform ,Computer vision ,Artificial intelligence ,business ,Group of pictures ,Mathematics ,Block-matching algorithm ,Data compression - Abstract
An image/video registration technique for multiple compressed video inputs such as MPEG sequences is investigated. The proposed technique is based on matching discrete cosine transform (DCT) coefficients and motion vectors. First, the I frame of each input sequence is separated into the background and moving objects. For the background, coarse edge features are extracted by applying edge detectors with different characteristics to the luminance DC coefficients. Each detector generates a difference map for a single background, and a threshold is determined for each difference map to produce a binary map. Alignment parameters are then determined using the binary maps of the input images generated by the same detector. For the moving objects, the alignment parameters can be fine-tuned using the motion information of all frames in the same group of pictures (GOP). Finally, the actual displacement in the pixel domain is estimated as the weighted average of the alignment parameters from all background detectors and the refinement parameters from motion information. Experimental results show that the proposed method significantly reduces the computational cost of image/video registration compared with traditional pixel-domain registration techniques while achieving a certain quality of composition.
- Published
- 2005
- Full Text
- View/download PDF
39. A reduced color approach to high quality cartoon coding
- Author
-
Mei-Yin Shen, Ming-Sui Lee, C.-C. Jay Kuo, and Yi-Chen Tsai
- Subjects
Color histogram ,Pixel ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Vector quantization ,Color balance ,Image processing ,Color quantization ,Web colors ,Computer vision ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Image compression ,Data compression - Abstract
An algorithm that integrates table indexing and quad-tree decomposition is proposed for cartoon image compression in this work. The proposed method has three steps. First, the colors in the color palette are selected based on the input image and a set of training images. Second, the input image is divided into blocks of size 16 by 16 and the number of colors inside each block is checked. If a block has one uniform color or exactly two colors, no further processing is required; otherwise, quad-tree decomposition is performed on the block, and the subdivision continues until all subblocks have either one or two colors. The code for each subblock is then output in depth-first order. If a subblock reaches size 2 x 2 and still contains more than two colors, no further subdivision is performed and a code indicating the colors of its four pixels is output. Finally, to further reduce the size, the data part of the output stream is losslessly compressed by the LZW method. Experimental results are given to demonstrate the superior performance of the proposed method for cartoon image compression.
- Published
- 2005
- Full Text
- View/download PDF
40. DCT-Domain Image Registration Techniques for Compressed Video
- Author
-
Mei-Yin Shen, C.-C. Jay Kuo, Ming-Sui Lee, and Akio Yoneyama
- Subjects
Pixel ,business.industry ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image registration ,Pattern recognition ,Edge detection ,Motion JPEG ,Computer Science::Computer Vision and Pattern Recognition ,Computer Science::Multimedia ,Discrete cosine transform ,Computer vision ,Artificial intelligence ,business ,Transform coding ,Data compression ,Mathematics - Abstract
A technique for image registration in compressed video, such as Motion JPEG or the I-pictures of MPEG, is investigated. The technique is based on DCT (discrete cosine transform) coefficient matching. First, coarse edge features are extracted by applying several edge detectors to the luminance DC coefficients. Each detector generates one difference map for a single input image, and a threshold is set for each difference map to produce a binary map. Then, the alignment parameters are determined based on the binary maps of both input images generated by the same detector. Finally, the actual displacement in the pixel domain is calculated by averaging the parameters from all detectors. Experimental results show that the proposed method dramatically reduces the computational cost of image registration compared with pixel-domain and edge-based DCT-domain registration techniques, while achieving a certain quality of composition.
- Published
- 2005
- Full Text
- View/download PDF
41. Compressed-domain registration techniques for MPEG video
- Author
-
C.-C. Jay Kuo, Mei-Yin Shen, and Ming-Sui Lee
- Subjects
Pixel ,business.industry ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image registration ,Image processing ,Edge detection ,Displacement (vector) ,Filter (video) ,Computer Science::Computer Vision and Pattern Recognition ,Motion estimation ,Computer vision ,Artificial intelligence ,business ,Mathematics - Abstract
A multi-scale DCT-domain image registration technique for two MPEG video inputs is proposed in this work. Several edge detectors are first applied to the luminance component of the DC coefficients to generate so-called difference maps for each input image. Then, a threshold is selected for each difference map to filter out regions of low activity. Following that, we estimate the displacement parameters by examining the difference maps of the two input images associated with the same edge detector. Finally, the ultimate displacement vector is calculated by averaging the parameters from all detectors. To achieve higher quality in the output mosaic, 1D alignment is applied locally to pixels around the displacement boundaries decided in the previous step. It is shown that the proposed method dramatically reduces computational complexity compared with pixel-based image registration techniques while achieving satisfactory composition. Moreover, we discuss how the overlapping region affects the alignment quality.
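The displacement search over thresholded DC-coefficient edge maps can be sketched as follows. A single wraparound-gradient detector and an exhaustive small-shift search stand in for the paper's multiple detectors, parameter averaging, and local 1D refinement; the mean-based threshold is likewise an assumption for the demo.

```python
import numpy as np

def align_dc_maps(ref, mov, max_shift=4):
    """Estimate the (dy, dx) shift between two luminance-DC grids by
    maximizing the overlap of thresholded gradient (edge) maps.
    Displacement is in DC-block units (one unit = 8 pixels)."""
    def edge_map(dc):
        # Wraparound first differences; threshold at the mean magnitude.
        gx = np.abs(dc - np.roll(dc, 1, axis=1))
        gy = np.abs(dc - np.roll(dc, 1, axis=0))
        g = gx + gy
        return g > g.mean()

    e_ref, e_mov = edge_map(ref), edge_map(mov)
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(e_mov, dy, 0), dx, 1)
            score = int(np.sum(e_ref & shifted))   # overlapping edge pixels
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

Searching over the DC grid instead of full-resolution pixels is what gives the 64x reduction in data, at the cost of block-level (8-pixel) displacement granularity before refinement.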
- Published
- 2005
- Full Text
- View/download PDF
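Two steps of the abstract above lend themselves to a short sketch: fusing the per-detector displacement estimates into one vector by averaging, and the local one-dimensional alignment of pixels around the seam. The function names, the SAD matching criterion, and the search range are assumptions for illustration:

```python
import numpy as np

def fuse_displacements(estimates):
    """Average the (dy, dx) estimates produced by the individual edge
    detectors into the final displacement vector, as the abstract describes."""
    arr = np.array(estimates, dtype=float)
    dy, dx = np.round(arr.mean(axis=0)).astype(int)
    return (int(dy), int(dx))

def refine_seam_1d(row_a, row_b, max_shift=3):
    """Locally align one boundary scanline of image A against the
    corresponding scanline of image B by minimizing the mean absolute
    difference over small 1-D shifts (a sketch of the local alignment)."""
    n = len(row_a)
    best, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, s), min(n, n + s)          # valid overlap for shift s
        err = np.mean(np.abs(row_a[lo:hi].astype(float)
                             - row_b[lo - s:hi - s].astype(float)))
        if err < best_err:
            best_err, best = err, s
    return best
```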
42. DCT-domain image registration techniques for compressed video
- Author
-
Mei-Yin Shen, Ming-Sui Lee, and C.-C. Jay Kuo
- Subjects
Orientation (computer vision) ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image registration ,Image processing ,Filter (signal processing) ,Image segmentation ,Motion JPEG ,Computer Science::Computer Vision and Pattern Recognition ,Computer Science::Multimedia ,Discrete cosine transform ,Computer vision ,Artificial intelligence ,business ,Image compression - Abstract
An image registration technique for compressed video, such as motion JPEG or the I-pictures of MPEG, based on the matching of DCT (discrete cosine transform) coefficients is investigated in this research. Several simple features, such as the DC value and a couple of low-frequency AC coefficients in the DCT domain, are first extracted to indicate the edge strength and orientation inside each block for image alignment purposes. Next, a coarse-level image segmentation task is conducted to filter out irrelevant regions. Then, for the regions of interest, a more detailed analysis is performed to obtain the edge map. Finally, the alignment parameters are determined from the information contained in the edge map. Experimental results show that the proposed method dramatically reduces the computational cost of image registration compared with pixel-domain registration techniques while maintaining acceptable composition quality.
- Published
- 2004
- Full Text
- View/download PDF
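The abstract's use of low-frequency AC coefficients to indicate per-block edge strength and orientation can be sketched as below. The interpretation — F(0,1) responding to horizontal intensity change (vertical edges) and F(1,0) to vertical change (horizontal edges) — is a standard property of the 8x8 DCT, but the exact feature combination is an assumption, not the paper's formula:

```python
import numpy as np

def block_edge_features(dct_block):
    """Estimate edge strength and orientation of one 8x8 block from its
    two lowest-frequency AC coefficients. `dct_block` is the 8x8 array
    of DCT coefficients, with the DC term at [0, 0]."""
    f01 = dct_block[0, 1]   # horizontal variation -> vertical edge
    f10 = dct_block[1, 0]   # vertical variation   -> horizontal edge
    strength = np.hypot(f01, f10)
    orientation = np.arctan2(f10, f01)   # radians; 0 = purely vertical edge
    return strength, orientation
```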
43. Pixel- and compressed-domain color matching techniques for video mosaic applications
- Author
-
Mei-Yin Shen, C.-C.J. Kuo, and Ming-Sui Lee
- Subjects
Color histogram ,Pixel ,Color difference ,Color normalization ,Computer science ,business.industry ,Color image ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Normalization (image processing) ,Video compression picture types ,High color ,Color depth ,Discrete cosine transform ,Computer vision ,Artificial intelligence ,business ,Histogram equalization - Abstract
Several color matching algorithms are proposed to merge two or more video inputs of smaller size into a single, larger video output with a wider field of view for video mosaic applications. The main challenge is to remove visible seam lines at image boundaries. All of the developed algorithms share the same basic idea but differ in implementation details: color differences between input images are first compensated using either histogram equalization or polynomial-based contrast stretching, and a linear filtering technique is then adopted to remove seam lines between image boundaries. The algorithms are developed in both the pixel and DCT domains. Compressed-domain processing is attractive because it reduces the computational complexity. Experimental results show that the color matching problem can be solved satisfactorily even in the compressed domain.
- Published
- 2004
- Full Text
- View/download PDF
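The histogram-equalization-style compensation mentioned in the abstract above can be sketched, for a single channel, as CDF-based histogram matching: each gray level of the source image is mapped to the reference level with the closest cumulative distribution value. This is a generic sketch of the idea, not the paper's exact algorithm:

```python
import numpy as np

def match_histogram(src, ref, levels=256):
    """Map the gray levels of `src` so its histogram approximates `ref`'s,
    by aligning the two images' cumulative distribution functions."""
    src_hist = np.bincount(src.ravel(), minlength=levels)
    ref_hist = np.bincount(ref.ravel(), minlength=levels)
    src_cdf = np.cumsum(src_hist) / src.size
    ref_cdf = np.cumsum(ref_hist) / ref.size
    # For each source level, find the reference level with the nearest CDF.
    lut = np.searchsorted(ref_cdf, src_cdf).clip(0, levels - 1).astype(np.uint8)
    return lut[src]
```

Applying this to each input before compositing brings their color distributions together, after which a light linear filter across the boundary suppresses any residual seam.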
44. Color matching techniques for video mosaic applications
- Author
-
Mei-Yin Shen, Ming-Sui Lee, and C.-C. Jay Kuo
- Subjects
Pixel ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Normalization (image processing) ,Image registration ,Image segmentation ,Computer Science::Computer Vision and Pattern Recognition ,Computer Science::Multimedia ,Discrete cosine transform ,Computer vision ,Artificial intelligence ,business ,Transform coding ,Data compression - Abstract
Color matching techniques are proposed to merge two or more video inputs of smaller size into a single, larger output video with a wider field of view for video mosaic applications. The main challenge is to remove the seam lines at image boundaries caused by the different color tones of the inputs. In this paper, color differences between input images are first compensated using a polynomial-based contrast stretching technique. Then, a linear filtering technique is adopted to remove the seam lines. The algorithms are developed in both the pixel domain and the DCT (discrete cosine transform) domain; the latter is attractive for its lower computational complexity. Experimental results demonstrate that the color matching problem can be solved satisfactorily in the compressed domain, even when the DCT blocks of the original input images are not aligned, and also for images taken with camera movement, as long as image registration is performed in advance. The proposed approach is applicable to MPEG-2 video.
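The two pixel-domain steps of the abstract above — polynomial-based contrast stretching fitted on the overlapping region, followed by a linear filter across the seam — can be sketched as below. The polynomial degree, the least-squares fit, and the cross-fade form of the "linear filtering" are assumptions chosen for illustration:

```python
import numpy as np

def fit_color_polynomial(overlap_src, overlap_ref, degree=2):
    """Fit a low-degree polynomial mapping source intensities to reference
    intensities over the overlapping region (polynomial contrast stretching)."""
    coeffs = np.polyfit(overlap_src.ravel().astype(float),
                        overlap_ref.ravel().astype(float), degree)
    return np.poly1d(coeffs)

def blend_seam(left, right, width=4):
    """Linearly cross-fade the 2*width overlapping columns of two inputs,
    a simple instance of linear filtering used to hide the seam line."""
    w = np.linspace(1.0, 0.0, 2 * width)                   # fade weights
    strip = left[:, -2 * width:] * w + right[:, :2 * width] * (1.0 - w)
    return np.hstack([left[:, :-2 * width], strip, right[:, 2 * width:]])
```

In the compressed-domain variant described by the abstract, the same corrections would be carried out on DCT coefficients rather than on decoded pixels.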