Author: "Yong-Jin Liu" / Topic: artificial intelligence - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yong-Jin Liu"' showing total 72 results

1. Motif-GCNs With Local and Non-Local Temporal Blocks for Skeleton-Based Action Recognition

Author: Yu-Hui Wen, Lin Gao, Hongbo Fu, Fang-Lue Zhang, Shihong Xia, and Yong-Jin Liu
Subjects: Computational Theory and Mathematics, Artificial Intelligence, Applied Mathematics, Computer Vision and Pattern Recognition, Software
Abstract: Recent works have achieved remarkable performance for action recognition with human skeletal data by utilizing graph convolutional models. Existing models mainly focus on developing graph convolutions to encode structural properties of the skeletal graph. Some recent works further take sample-dependent relationships among joints into consideration. However, the complex relationships are difficult to learn. In this paper, we propose a motif-based graph convolution method, which makes use of sample-dependent latent relations among non-physically connected joints to impose a high-order locality and assigns different semantic roles to physical neighbors of a joint to encode hierarchical structures. Furthermore, we propose a sparsity-promoting loss function to learn a sparse motif adjacency matrix for latent dependencies in non-physical connections. For extracting effective temporal information, we propose an efficient local temporal block. It adopts partial dense connections to reuse temporal features in local time windows, and enriches the variety of information flow by gradient combination. In addition, we introduce a non-local temporal block to capture global dependencies among frames. Comprehensive experiments on four large-scale datasets show that our model outperforms the state-of-the-art methods. Our code is publicly available at https://github.com/wenyh1616/SAMotif-GCN.
Published: 2023

2. SparseDGCNN: Recognizing Emotion From Multichannel EEG Signals

Author: Guanhua Zhang, Guozhen Zhao, Minjing Yu, Wenming Zheng, Yong-Jin Liu, and Dan Zhang
Subjects: medicine.diagnostic_test, Computer science, business.industry, Pattern recognition, Spectral bands, Electroencephalography, Convolutional neural network, Human-Computer Interaction, Constraint (information theory), Discriminative model, Scalability, medicine, Graph (abstract data type), Artificial intelligence, Affective computing, business, Software
Abstract: Emotion recognition from EEG signals has attracted much attention in affective computing. Recently, a novel dynamic graph convolutional neural network (DGCNN) model was proposed, which simultaneously optimized the network parameters and a weighted graph G characterizing the strength of functional relation between each pair of electrodes in the EEG recording equipment. In this paper, we propose a sparse DGCNN model which improves the DGCNN by imposing a sparseness constraint on G. Our work is based on an important observation: the tomography study reveals that different brain regions sampled by EEG electrodes may be related to different functions of the brain and then the functional relations among electrodes are possibly highly localized and sparse. However, introducing a sparseness constraint into the graph G makes the loss function of sparse DGCNN non-differentiable at some singular points. To ensure that the training process of sparse DGCNN converges, we apply the forward-backward splitting method. To evaluate the performance of sparse DGCNN, we compare it with four representative recognition methods as well as different features and spectral bands. The results show that (1) sparse DGCNN has consistently better accuracy than representative methods and has good scalability, and (2) DE, PSD and ASM features on γ bands convey the most discriminative emotional information, and fusion of separate features and frequency bands can improve recognition performance.
Published: 2023
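
The abstract of entry 2 names forward-backward splitting as the way to train under a non-differentiable sparseness constraint. The following is a minimal sketch of that general technique, not the authors' implementation: it assumes an L1 penalty as the sparseness term and uses a toy quadratic loss in place of the DGCNN loss. The names `soft_threshold`, `forward_backward_step` and `fit_sparse` are hypothetical.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def forward_backward_step(w, grad, lr, lam):
    """One forward-backward splitting step for f(w) + lam * ||w||_1."""
    # Forward step: plain gradient descent on the smooth part f.
    z = [wi - lr * gi for wi, gi in zip(w, grad)]
    # Backward step: proximal map of the non-smooth L1 part.
    return [soft_threshold(zi, lr * lam) for zi in z]


def fit_sparse(target, lam=0.5, lr=0.5, steps=200):
    """Minimize 0.5 * ||w - target||^2 + lam * ||w||_1 (toy smooth loss, an assumption)."""
    w = [0.0] * len(target)
    for _ in range(steps):
        grad = [wi - ti for wi, ti in zip(w, target)]  # gradient of the quadratic
        w = forward_backward_step(w, grad, lr, lam)
    return w


if __name__ == "__main__":
    # Small entries are driven exactly to zero; large ones are shrunk by lam.
    print(fit_sparse([2.0, 0.3, -1.0, -0.1]))
```

The backward step is what produces exact zeros, which a plain subgradient update would generally not do.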

3. Multi-Target Positive Emotion Recognition From EEG Signals

Author: Yulin Zhang, Guozhen Zhao, Yong-Jin Liu, Dan Zhang, and Guanhua Zhang
Subjects: medicine.diagnostic_test, Computer science, business.industry, 05 social sciences, Feature extraction, Linear model, Pattern recognition, Regression analysis, Electroencephalography, 050105 experimental psychology, Regression, Human-Computer Interaction, 03 medical and health sciences, 0302 clinical medicine, Ranking, Task analysis, medicine, 0501 psychology and cognitive sciences, Artificial intelligence, business, 030217 neurology & neurosurgery, Software, Rank correlation
Abstract: Compared with the widely studied negative emotions in which different classes are easy to distinguish, less attention has been paid to the recognition of positive emotions that are not fully independent. In this paper, we propose to recognize multiple positive emotions by analyzing brain activities and explore the neural representation of different positive emotions. Thirty-seven participants volunteered to participate in our study, in which their brain activities were recorded when watching five selected film clips. First, 150 well-known power features extracted from Electroencephalography (EEG) signals and 105 multimedia content analysis features were collected as the pool of candidate features. Second, based on the collected features, we propose to use a linear model and a nonlinear model to predict the percentage of five positive emotions. Then, percentage values were converted to ranking numbers and Kendall rank correlation coefficients were calculated. Our results showed that (1) an ensemble of regressor chains using LSTM as the unit regressor obtained both the best regression results and the best Kendall rank correlation coefficient using EEG features alone, and (2) top features from alpha frequency bands of EEG signals could represent different positive emotions. These results demonstrate the effectiveness of selective EEG features on recognizing different positive emotions.
Published: 2023
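
Entry 3 describes converting predicted percentages to ranks and scoring them with the Kendall rank correlation coefficient. A minimal sketch of that evaluation step follows. The abstract does not say which Kendall variant or tie-breaking rule was used, so this assumes tau-a with ties broken by original order. The helper names and the sample percentages are hypothetical.

```python
def to_ranks(values):
    """Rank 1 = largest value; ties are broken by original order (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            if prod > 0:
                score += 1      # pair ordered the same way in both rankings
            elif prod < 0:
                score -= 1      # pair ordered oppositely
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical predicted vs. self-reported percentages for five emotions.
    predicted = [0.40, 0.25, 0.15, 0.12, 0.08]
    reported = [0.35, 0.30, 0.10, 0.20, 0.05]
    print(kendall_tau(to_ranks(predicted), to_ranks(reported)))
```

With many tied ranks, a tie-corrected variant such as tau-b would be the more usual choice.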

4. Quality Metric Guided Portrait Line Drawing Generation From Unpaired Training Data

Author: Ran Yi, Yong-Jin Liu, Yu-Kun Lai, and Paul L. Rosin
Subjects: FOS: Computer and information sciences, Computer Science - Graphics, Computational Theory and Mathematics, Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Applied Mathematics, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, Computer Vision and Pattern Recognition, Graphics (cs.GR), Software
Abstract: Face portrait line drawing is a unique style of art which is highly abstract and expressive. However, due to its high semantic constraints, many existing methods learn to generate portrait drawings using paired training data, which is costly and time-consuming to obtain. In this paper, we propose a novel method to automatically transform face photos to portrait drawings using unpaired training data with two new features; i.e., our method can (1) learn to generate high quality portrait drawings in multiple styles using a single network and (2) generate portrait drawings in a "new style" unseen in the training data. To achieve these benefits, we (1) propose a novel quality metric for portrait drawings which is learned from human perception, and (2) introduce a quality loss to guide the network toward generating better looking portrait drawings. We observe that existing unpaired translation methods such as CycleGAN tend to embed invisible reconstruction information indiscriminately in the whole drawings due to significant information imbalance between the photo and portrait drawing domains, which leads to important facial features missing. To address this problem, we propose a novel asymmetric cycle mapping that enforces the reconstruction information to be visible and only embedded in the selected facial regions. Along with localized discriminators for important facial regions, our method well preserves all important facial features in the generated drawings. Generator dissection further explains that our model learns to incorporate face semantic information during drawing generation. Extensive experiments including a user study show that our model outperforms state-of-the-art methods. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence, https://doi.org/10.1109/TPAMI.2022.3147570, code: https://github.com/yiranran/QMUPD
Published: 2023

5. PPR-Net++: Accurate 6-D Pose Estimation in Stacked Scenarios

Author: Yong-Jin Liu, Wei Lv, Zhi Kai Dong, and Long Zeng
Subjects: business.industry, Computer science, Supervised learning, Bandwidth (signal processing), Centroid, Pattern recognition, Function (mathematics), ComputingMethodologies_PATTERNRECOGNITION, Control and Systems Engineering, Margin (machine learning), Point (geometry), Artificial intelligence, Electrical and Electronic Engineering, Cluster analysis, business, Pose
Abstract: Most supervised learning-based pose estimation methods for stacked scenes are trained on massive synthetic datasets. In most cases, the challenge is that the learned network on the training dataset is no longer optimal on the testing dataset. To address this problem, we propose a pose regression network PPR-Net++. It transforms each scene point into a point in the centroid space, followed by a clustering process and a voting process. In the training phase, a mapping function between the network's critical parameter (i.e., the bandwidth of the clustering algorithm) and the compactness of the centroid distributions is obtained. This function is used to adapt the bandwidth between centroid distributions of two different domains. In addition, to further improve the pose estimation accuracy, the network also predicts the confidence of each point, based on its visibility and pose error. Only the points with high confidence have the right to vote for the final object pose. In experiments, our method is trained on the IPA synthetic dataset and compared with the state-of-the-art algorithm. When tested on the public synthetic Sileane dataset, our method performs better on all eight objects, five of which improve by more than 5% in average precision (AP). On the IPA real dataset, our method outperforms it by a large margin of 20%. This lays a solid foundation for robot grasping in industrial scenarios.
Published: 2022

6. GAN-Based Multi-Style Photo Cartoonization

Author: Wang Zhao, Yu-Kun Lai, Ran Yi, Yong-Jin Liu, Zipeng Ye, Yezhi Shu, Mengfei Xia, and Yang Chen
Subjects: Network architecture, Exploit, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Semantics, Computer Graphics and Computer-Aided Design, GeneralLiterature_MISCELLANEOUS, Image (mathematics), Style (sociolinguistics), Signal Processing, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Architecture, business, Encoder, Software, ComputingMethodologies_COMPUTERGRAPHICS, Generator (mathematics)
Abstract: Cartoon is a common form of art in our daily life and automatic generation of cartoon images from photos is highly desirable. However, state-of-the-art single-style methods can only generate one style of cartoon images from photos and existing multi-style image style transfer methods still struggle to produce high-quality cartoon images due to their highly simplified and abstract nature. In this article, we propose a novel multi-style generative adversarial network (GAN) architecture, called MS-CartoonGAN, which can transform photos into multiple cartoon styles. MS-CartoonGAN uses only unpaired photos and cartoon images of multiple styles for training. To achieve this, we propose to use (1) a hierarchical semantic loss with sparse regularization to retain semantic content and recover flat shading in different abstract levels, (2) a new edge-promoting adversarial loss for producing fine edges, and (3) a style loss to enhance the difference between output cartoon styles and make training process more stable. We also develop a multi-domain architecture, where the generator consists of a shared encoder and multiple decoders for different cartoon styles, along with multiple discriminators for individual styles. By observing that cartoon images drawn by different artists have their unique styles while sharing some common characteristics, our shared network architecture exploits the common characteristics of cartoon styles, achieving better cartoonization and being more efficient than single-style cartoonization. We show that our multi-domain architecture can theoretically guarantee to output desired multiple cartoon styles. Through extensive experiments including a user study, we demonstrate the superiority of the proposed method, outperforming state-of-the-art single-style and multi-style image style transfer methods.
Published: 2022

7. An Efficient LSTM Network for Emotion Recognition From Multichannel EEG Signals

Author: Yong-Jin Liu, Hongan Wang, Jinyao Li, Guozhen Zhao, Xiaoming Deng, Yu-Kun Lai, Cuixia Ma, Xiaobing Du, and Guanhua Zhang
Subjects: Discriminator, medicine.diagnostic_test, business.industry, Computer science, Deep learning, Feature vector, Feature extraction, Pattern recognition, 02 engineering and technology, Electroencephalography, External Data Representation, Data modeling, Human-Computer Interaction, 03 medical and health sciences, 0302 clinical medicine, 0202 electrical engineering, electronic engineering, information engineering, medicine, Feature (machine learning), 020201 artificial intelligence & image processing, Artificial intelligence, business, 030217 neurology & neurosurgery, Software
Abstract: Most previous EEG-based emotion recognition methods studied hand-crafted EEG features extracted from different electrodes. In this paper, we study the relation among different EEG electrodes and propose a deep learning method to automatically extract the spatial features that characterize the functional relation between EEG signals at different electrodes. Our proposed deep model is called ATtention-based LSTM with Domain Discriminator (ATDD-LSTM) that can characterize nonlinear relations among EEG signals of different electrodes. To achieve state-of-the-art emotion recognition performance, the architecture of ATDD-LSTM has two distinguishing characteristics: (1) By applying the attention mechanism to the feature vectors produced by LSTM, ATDD-LSTM automatically selects suitable EEG channels for emotion recognition, which makes the learned model concentrate on the emotion related channels in response to a given emotion; (2) To minimize the significant feature distribution shift between different sessions and/or subjects, ATDD-LSTM uses a domain discriminator to modify the data representation space and generate domain-invariant features. We evaluate the proposed ATDD-LSTM model on three public EEG emotional databases (DEAP, SEED and CMEED) for emotion recognition. The experimental results demonstrate that our ATDD-LSTM model achieves superior performance on subject-dependent (for the same subject), subject-independent (for different subjects) and cross-session (for the same subject) evaluation.
Published: 2022

8. Deep Reinforcement Learning for Robot Collision Avoidance With Self-State-Attention and Sensor Fusion

Author: Yiheng Han, Irvin Haozhe Zhan, Wang Zhao, Jia Pan, Ziyang Zhang, Yaoyuan Wang, and Yong-Jin Liu
Subjects: Human-Computer Interaction, Control and Optimization, Artificial Intelligence, Control and Systems Engineering, Mechanical Engineering, Biomedical Engineering, Computer Vision and Pattern Recognition, Computer Science Applications
Published: 2022

9. Poisson Vector Graphics (PVG)-Guided Face Color Transfer in Videos

Author: Ying He, Fei Hou, Zhenchuan Huang, Qian Fu, Anxiang Zeng, Yong-Jin Liu, Qian Sun, and Juyong Zhang
Subjects: business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Poisson distribution, Computer Graphics and Computer-Aided Design, Vector graphics, symbols.namesake, Computer Science::Multimedia, symbols, Computer vision, Artificial intelligence, Diffusion curve, business, Software, Blossom algorithm
Abstract: This article presents a simple yet effective algorithm for automatically transferring face colors in portrait videos. We extract the facial features and vectorize the faces in the input video using Poisson vector graphics, which encodes the low-frequency colors as the boundary colors of diffusion curves, and the high-frequency colors as Poisson regions. Then, we transfer the face color of a reference image/video to the first frame of the input video by applying optimal mass transport between the boundary colors of diffusion curves. Next the boundary color of the first frame is transferred to the subsequent frames by matching the curves. Finally, with the original or modified Poisson regions, we render the video using an efficient random-access Poisson solver. Thanks to our efficient diffusion curve matching algorithm, transferring colors for the vectorized video takes less than 1 millisecond per frame. Our method is particularly desired for frequent transfer from multiple references due to its information reuse nature. The simple diffusion curve matching also greatly improves the performance of video vectorization, since we only need to solve an optimization problem for the first frame. Since our method does not require correspondence between the reference image/video and the input video, it is flexible and robust to handle faces with significantly different geometries and postures, which often pose challenges to the existing methods. Moreover, by manipulating Poisson regions, we can enhance or reduce the highlight and contrast so that the reference color can fit into the input video naturally. We demonstrate the efficacy of our method on image-to-video transfer and color swap in videos.
Published: 2021

10. SketchMaker: Sketch extraction and reuse for interactive scene sketch composition

Author: Fang Liu, Xiaoming Deng, Jiancheng Song, Yu-Kun Lai, Yong-Jin Liu, Hao Wang, Cuixia Ma, Shengfeng Qin, and Hongan Wang
Subjects: Human-Computer Interaction, Artificial Intelligence, W200
Abstract: Sketching is an intuitive and simple way to depict scenes with various object form and appearance characteristics. In the past few years, widely available touchscreen devices have increasingly made sketch-based human-AI co-creation applications popular. One key issue of sketch-oriented interaction is to prepare input sketches efficiently by non-professionals because it is usually difficult and time-consuming to draw an ideal sketch with appropriate outlines and rich details, especially for novice users with no sketching skills. Thus, sketching brings great obstacles for sketch applications in daily life. On the other hand, hand-drawn sketches are scarce and hard to collect. Although several large-scale sketch datasets provide sketch data resources, they usually have a limited number of objects and categories and do not let users collect new sketch materials according to their personal preferences. In addition, few sketch-related applications support the reuse of existing sketch elements. Thus, knowing how to extract sketches from existing drawings and effectively reuse them in interactive scene sketch composition will provide an elegant way for sketch-based image retrieval (SBIR) applications, which are widely used in various touchscreen devices. In this study, we first conduct a study on current SBIR to better understand the main requirements and challenges in sketch-oriented applications. Then we develop SketchMaker as an interactive sketch extraction and composition system to help users generate scene sketches via reusing object sketches in existing scene sketches with minimal manual intervention. Moreover, we demonstrate how SBIR improves from composited scene sketches to verify the performance of our interactive sketch processing system. We also include a sketch-based video localization task as an alternative application of our sketch composition scheme. Our pilot study shows that our system is effective and efficient, and provides a way to promote practical applications of sketches.
Published: 2022

11. View planning in robot active vision: A survey of systems, algorithms, and applications

Author: Yong-Jin Liu, Rui Zeng, Yuhui Wen, and Wang Zhao
Subjects: robotic, active vision, 0209 industrial biotechnology, Computer science, media_common.quotation_subject, 02 engineering and technology, Computer graphics, next-best view, 020901 industrial engineering & automation, Artificial Intelligence, view planning, 0202 electrical engineering, electronic engineering, information engineering, Quality (business), Active vision, Pose, media_common, Cognitive neuroscience of visual object recognition, sensor planning, 020207 software engineering, QA75.5-76.95, Object (computer science), Computer Graphics and Computer-Aided Design, Electronic computers. Computer science, Robot, Computer Vision and Pattern Recognition, State (computer science), Algorithm
Abstract: Rapid development of artificial intelligence motivates researchers to expand the capabilities of intelligent and autonomous robots. In many robotic applications, robots are required to make planning decisions based on perceptual information to achieve diverse goals in an efficient and effective way. The planning problem has been investigated in active robot vision, in which a robot analyzes its environment and its own state in order to move sensors to obtain more useful information under certain constraints. View planning, which aims to find the best view sequence for a sensor, is one of the most challenging issues in active robot vision. The quality and efficiency of view planning are critical for many robot systems and are influenced by the nature of their tasks, hardware conditions, scanning states, and planning strategies. In this paper, we first summarize some basic concepts of active robot vision, and then review representative work on systems, algorithms and applications from four perspectives: object reconstruction, scene reconstruction, object recognition, and pose estimation. Finally, some potential directions are outlined for future work.
Published: 2020

12. Interactions With Reconfigurable Modular Robots Enhance Spatial Reasoning Performance

Author: Yulin Zhang, Guozhen Zhao, Chun Yu, Yong-Jin Liu, Yuanchun Shi, and Minjing Yu
Subjects: Self-reconfiguring modular robot, Measure (data warehouse), Computer science, Spatial ability, 05 social sciences, 050301 education, Spatial intelligence, Mental rotation, Task (project management), User studies, 03 medical and health sciences, 0302 clinical medicine, Behavioral data, Artificial Intelligence, Human–computer interaction, 0503 education, 030217 neurology & neurosurgery, Software
Abstract: Reconfigurable modular robots (RMRobots) can change their shape and functionality (e.g., locomotion styles) to fit different environments, and have been widely investigated in applications, such as exploration and inspection. In this paper, we present a new application of RMRobots for improving human spatial ability which plays a significant role in developing an individual’s performance and achievement in science, technology, engineering, and mathematics (STEM). Two user studies are conducted, and the results show that: 1) the task performance of interacting with RMRobots has a significant positive relationship with mental rotation, a widely used measure of spatial ability; and 2) interacting with RMRobots can effectively improve the performance on a task related to spatial reasoning skills according to behavioral data and electroencephalograph (EEG) indices. Our presented study broadens RMRobot research in the area of human-robot interaction.
Published: 2020

13. 3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Normal Face Photos

Author: Yanan Sun, Mengfei Xia, Zipeng Ye, Yu-Kun Lai, Minjing Yu, Ran Yi, Juyong Zhang, and Yong-Jin Liu
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Entertainment industry, Computer Science - Computer Vision and Pattern Recognition, Representation (arts), Computer Graphics and Computer-Aided Design, Sketch, Domain (software engineering), Character (mathematics), Face (geometry), Signal Processing, Polygon mesh, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Set (psychology), business, Software
Abstract: Caricature is a type of artistic style of human faces that attracts considerable attention in the entertainment industry. So far a few 3D caricature generation methods exist and all of them require some caricature information (e.g., a caricature sketch or 2D caricature) as input. This kind of input, however, is difficult for non-professional users to provide. In this paper, we propose an end-to-end deep neural network model that generates high-quality 3D caricatures directly from a normal 2D face photo. The most challenging issue for our system is that the source domain of face photos (characterized by normal 2D faces) is significantly different from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and textures). To address this challenge, we: (1) build a large dataset of 5,343 3D caricature meshes and use it to establish a PCA model in the 3D caricature shape space; (2) reconstruct a normal full 3D head from the input face photo and use its PCA representation in the 3D caricature shape space to establish correspondences between the input photo and 3D caricature shape; and (3) propose a novel character loss and a novel caricature loss based on previous psychological studies on caricatures. Experiments including a novel two-level user study show that our system can generate high-quality 3D caricatures directly from normal face photos. Comment: Accepted by IEEE Transactions on Visualization and Computer Graphics
Published: 2021

14. GPU-Based Supervoxel Generation With a Novel Anisotropic Metric

Author: Zhonggui Chen, Xiaohu Guo, Yong-Jin Liu, Xiao Dong, and Junfeng Yao
Subjects: Flooding algorithm, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Optical flow, Graphics processing unit, Video processing, Image segmentation, Computer Graphics and Computer-Aided Design, Metric (mathematics), Computer vision, Segmentation, Artificial intelligence, Voronoi diagram, business, Software
Abstract: Video over-segmentation into supervoxels is an important pre-processing technique for many computer vision tasks. Videos are an order of magnitude larger than images. Most existing methods for generating supervoxels are either memory- or time-inefficient, which limits their application in subsequent video processing tasks. In this paper, we present an anisotropic supervoxel method, which is memory-efficient and can be executed on the graphics processing unit (GPU). Therefore, our algorithm achieves a good balance among segmentation quality, memory usage and processing time. In order to provide accurate segmentation for moving objects in video, we use the optical flow information to design a brand new non-Euclidean metric to calculate the anisotropic distances between seeds and voxels. To efficiently compute the anisotropic metric, we adjust the classic jump flooding algorithm (which is designed for parallel execution on the GPU) to generate anisotropic Voronoi tessellation in the combined color and spatio-temporal space. We evaluate our method and the representative supervoxel algorithms for their capability on segmentation performance, computation speed and memory efficiency. We also apply supervoxel results to the application of foreground propagation in videos to test the performance on solving practical problems. Experiments show that our algorithm is much faster than the existing methods, and achieves a good balance between segmentation quality and efficiency.
Published: 2021
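
Entry 14 builds on the classic jump flooding algorithm for Voronoi tessellation. The following is a minimal CPU sketch of classic 2D jump flooding only. It assumes a square grid whose side is a power of two and a plain squared Euclidean distance, and does not reproduce the paper's anisotropic, optical-flow-based metric or its GPU implementation. The function name `jump_flood` and its `dist2` parameter are hypothetical.

```python
def jump_flood(size, seeds, dist2=None):
    """Approximate Voronoi labels on a size x size grid (size: a power of two).

    seeds is a list of (x, y) cells; the result maps each cell to a seed index.
    dist2 is a pluggable squared-distance function (Euclidean by default).
    """
    if dist2 is None:
        dist2 = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    label = [[None] * size for _ in range(size)]
    for idx, (x, y) in enumerate(seeds):
        label[y][x] = idx

    step = size // 2
    while step >= 1:
        new = [row[:] for row in label]  # double buffer: read old, write new
        for y in range(size):
            for x in range(size):
                best = label[y][x]
                # Inspect the 3 x 3 neighbourhood at the current jump distance.
                for dy in (-step, 0, step):
                    for dx in (-step, 0, step):
                        ny, nx = y + dy, x + dx
                        if not (0 <= ny < size and 0 <= nx < size):
                            continue
                        cand = label[ny][nx]
                        if cand is None:
                            continue
                        if best is None or dist2((x, y), seeds[cand]) < dist2((x, y), seeds[best]):
                            best = cand
                new[y][x] = best
        label = new
        step //= 2  # halve the jump distance each pass
    return label


if __name__ == "__main__":
    for row in jump_flood(8, [(1, 1), (6, 5), (2, 6)]):
        print("".join(str(v) for v in row))
```

Each pass reads only the previous buffer, so every cell can be updated independently; that independence is what makes the scheme suitable for parallel execution. Jump flooding is approximate, so a few cells may differ from an exact nearest-seed assignment.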

15. Line drawings for face portraits from photos using global and local structure based GANs

Author: Yu-Kun Lai, Yong-Jin Liu, Mengfei Xia, Paul L. Rosin, and Ran Yi
Subjects: Structure (mathematical logic), business.industry, Computer science, Applied Mathematics, media_common.quotation_subject, 02 engineering and technology, Machine learning, computer.software_genre, Portrait, Computational Theory and Mathematics, Artificial Intelligence, Face (geometry), Line (geometry), 0202 electrical engineering, electronic engineering, information engineering, Image translation, 020201 artificial intelligence & image processing, Quality (business), Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Distance transform, Software, media_common, Generator (mathematics)
Abstract: Despite significant effort and notable success of neural style transfer, it remains challenging for highly abstract styles, in particular line drawings. In this paper, we propose APDrawingGAN++, a generative adversarial network (GAN) for transforming face photos to artistic portrait drawings (APDrawings), which addresses substantial challenges including highly abstract style, different drawing techniques for different facial features, and high perceptual sensitivity to artifacts. To address these, we propose a composite GAN architecture that consists of local networks (to learn effective representations for specific facial features) and a global network (to capture the overall content). We provide a theoretical explanation for the necessity of this composite GAN structure by proving that any GAN with a single generator cannot generate artistic styles like APDrawings. We further introduce a classification-and-synthesis approach for lips and hair where different drawing styles are used by artists, which applies suitable styles for a given input. To capture the highly abstract art form inherent in APDrawings, we address two challenging operations, namely (1) coping with lines with small misalignments while penalizing large discrepancy and (2) generating more continuous lines, by introducing two novel loss terms: one is a novel distance transform loss with nonlinear mapping and the other is a novel line continuity loss, both of which improve the line quality. We also develop dedicated data augmentation and pre-training to further improve results. Extensive experiments, including a user study, show that our method outperforms state-of-the-art methods, both qualitatively and quantitatively.
Published: 2021

16. Autoregressive Stylized Motion Synthesis with Generative Flow

Author: Yong-Jin Liu, Hongbo Fu, Zhipeng Yang, Yu-Hui Wen, Yanan Sun, and Lin Gao
Subjects: Computer science, business.industry, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Motion (geometry), Context (language use), Animation, Motion capture, Autoregressive model, Flow (mathematics), Content (measure theory), Artificial intelligence, business, Algorithm, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Motion style transfer is an important problem in many computer graphics and computer vision applications, including human animation, games, and robotics. Most existing deep learning methods for this problem are supervised and trained by registered motion pairs. In addition, these methods are often limited to yielding a deterministic output, given a pair of style and content motions. In this paper, we propose an unsupervised approach for motion style transfer by synthesizing stylized motions autoregressively using a generative flow model $\mathcal{M}$. $\mathcal{M}$ is trained to maximize the exact likelihood of a collection of unlabeled motions, based on an autoregressive context of poses in previous frames and a control signal representing the movement of a root joint. Thanks to invertible flow transformations, latent codes that encode deep properties of motion styles are efficiently inferred by $\mathcal{M}$. By combining the latent codes (from an input style motion S) with the autoregressive context and control signal (from an input content motion C), $\mathcal{M}$ outputs a stylized motion which transfers style from S to C. Moreover, our model is probabilistic and is able to generate various plausible motions with a specific style. We evaluate the proposed model on motion capture datasets containing different human motion styles. Experiment results show that our model outperforms the state-of-the-art methods, despite not requiring manually labeled training data.
Published: 2021

17. NP-completeness of optimal planning problem for modular robots

Author: Zipeng Ye, Yong-Jin Liu, and Minjing Yu
Subjects: Self-reconfiguring modular robot, 0209 industrial biotechnology, Mathematical optimization, Optimization problem, Computational complexity theory, Computer science, Approximation algorithm, 02 engineering and technology, 020901 industrial engineering & automation, Artificial Intelligence, Position (vector), 0202 electrical engineering, electronic engineering, information engineering, Robot, 020201 artificial intelligence & image processing, Completeness (statistics), Global optimization
Abstract: Self-reconfigurable modular robots (SRM-robots) can autonomously change their shape according to different tasks and work environments, and have received considerable attention recently. Many reshaping/reconfiguration algorithms have been proposed. In this paper, we present a theoretical analysis of the computational complexity of reshape planning for a kind of lattice-type 3D SRM-robots, whose modules are of cubic shape and can move by rotating on the surfaces of other modules. Different from previous NP-completeness studies on general chain-type robots (i.e. the motion of any chains and the location of modules can be arbitrary), we consider more practical constraints on modules’ shape (i.e. cubic shape), position (lying in 2D/3D grids) and motion (using orthogonal rotations) in this paper. We formulate the reshape planning problem of SRM-robots with these practical constraints as a (p, q) optimization problem, where p and q characterize two widely used metrics, i.e. the number of disconnecting/reconnecting operations and the number of reshaping steps. Proofs are presented, showing that this optimization problem is NP-complete. Therefore, rather than seeking a polynomial-time algorithm that finds globally optimal results, approximate solutions are most likely what can be obtained for this problem. We also present the upper and lower bounds for the 2-tuple (p, q), which are useful for evaluating approximation algorithms in future research.
Published: 2019

18. Energy-Efficient Coverage Path Planning for General Terrain Surfaces

Author: Yong-Jin Liu, Jun Wang, Chenming Wu, Xianfeng David Gu, Chengkai Dai, Charlie C. L. Wang, and Xiaoxi Gong
Subjects: Surface (mathematics), 0209 industrial biotechnology, Control and Optimization, Geodesic, Computer science, Computation, Biomedical Engineering, Terrain, 02 engineering and technology, symbols.namesake, 020901 industrial engineering & automation, Artificial Intelligence, 0502 economics and business, Motion planning, ComputingMethodologies_COMPUTERGRAPHICS, 050210 logistics & transportation, Mechanical Engineering, 05 social sciences, Computer Science Applications, Human-Computer Interaction, Fermat's spiral, Control and Systems Engineering, symbols, Computer Vision and Pattern Recognition, Heuristics, Algorithm, Efficient energy use
Abstract: This letter tackles the problem of energy-efficient coverage path planning for exploring general surfaces by an autonomous vehicle. Efficient algorithms are developed to generate paths on freeform 3-D surfaces according to a special design pattern, the height-extremity-aware Fermat spiral. By using the exact boundary-sourced geodesic distances, the method for generating Fermat spiral paths is first introduced to cover a general surface. Then, heuristics for energy efficiency are incorporated to add peak points of a height field as sources for geodesic computation. The paths generated by our method can significantly reduce the cost caused by gravity. Physical experiments have been conducted on different terrain surfaces to demonstrate the effectiveness of our approach.
Published: 2019

19. ParametricNet: 6DoF Pose Estimation Network for Parametric Shapes in Stacked Scenarios

Author: Wei Lv, Yong-Jin Liu, Long Zeng, and Xinyu Zhang
Subjects: Generalization, Computer science, business.industry, GRASP, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Centroid, Pattern recognition, Object (computer science), Regression, Point (geometry), Artificial intelligence, business, Pose, Parametric statistics
Abstract: Most industrial parts are parametric and their special properties are not fully explored yet. This paper proposes a new 6DoF pose estimation network for parametric shapes in stacked scenarios (ParametricNet). It treats a parametric shape, instead of a part object, as a category. The keypoints of individual instances are learned with point-wise regression and a Hough voting scheme, from which specific parameter values are calculated. Then, the template keypoints are obtained based on the computed parameter values and the parametric shape templates. Finally, the 6DoF pose is estimated by least-squares fitting between the individual instance’s and the template’s keypoints & centroid. On the public Sileane dataset, the average of APs of ParametricNet is 96%, compared with 82% for the state-of-the-art method. In addition, a new parametric dataset with four shape templates is constructed, in which the evaluated learning and generalization abilities of ParametricNet outperform the state-of-the-art methods. In particular, for the less symmetric shape, the mAP is improved by over 20%. Real-world experiments show that our method can grasp parametric shapes with unknown parameter values in stacked scenarios.
Published: 2021

20. Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms

Author: Xianye Ben, Yi Ren, Kidiyo Kpalma, Yong-Jin Liu, Junping Zhang, Weixiao Meng, Su-Jing Wang, Shandong University, Fudan University [Shanghai], Chinese Academy of Sciences [Beijing] (CAS), Institut d'Électronique et des Technologies du numéRique (IETR), Université de Nantes (UN)-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Harbin Institute of Technology (HIT), Tsinghua University [Beijing] (THU), Université de Nantes (UN)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), and Nantes Université (NU)-Université de Rennes 1 (UR1)
Subjects: FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Emotions, Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Facial recognition system, [SPI]Engineering Sciences [physics], Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Humans, Transient (computer programming), Macro, Video based, Structure (mathematical logic), Facial expression, Applied Mathematics, Spotting, Facial Expression, Computational Theory and Mathematics, Key (cryptography), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Algorithm, Software, Algorithms
Abstract: Unlike the conventional facial expressions, micro-expressions are involuntary and transient facial expressions capable of revealing the genuine emotions that people attempt to hide. Therefore, they can provide important information in a broad range of applications such as lie detection, criminal detection, etc. Since micro-expressions are transient and of low intensity, however, their detection and recognition are difficult and rely heavily on expert experience. Due to its intrinsic particularity and complexity, video-based micro-expression analysis is attractive but challenging, and has recently become an active area of research. Although there have been numerous developments in this area, thus far there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences between macro- and micro-expressions, then use these differences to guide our research survey of video-based micro-expression analysis in a cascaded structure, encompassing the neuropsychological basis, datasets, features, spotting algorithms, recognition algorithms, applications and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are addressed and discussed. Furthermore, after considering the limitations of existing micro-expression datasets, we present and release a new dataset - called micro-and-macro expression warehouse (MMEW) - containing more video samples and more labeled emotion types. We then perform a unified comparison of representative methods on CAS(ME) for spotting, and on MMEW and SAMM for recognition, respectively. Finally, some potential future research directions are explored and outlined.
Published: 2021

21. Configuration Space Decomposition for Learning-based Collision Checking in High-DOF Robots

Author: Wang Zhao, Yiheng Han, Jia Pan, and Yong-Jin Liu
Subjects: 0209 industrial biotechnology, business.industry, Computer science, Robotics, 02 engineering and technology, 021001 nanoscience & nanotechnology, Computer Science::Robotics, 020901 industrial engineering & automation, Margin (machine learning), Robot, Artificial intelligence, Configuration space, Decomposition method (constraint satisfaction), Motion planning, 0210 nano-technology, business, Algorithm, Robotic arm, Classifier (UML), Subspace topology
Abstract: Motion planning for robots of high degrees-of-freedom (DOFs) is an important problem in robotics, with sampling-based methods in configuration space $\mathcal{C}$ as one popular solution. Recently, machine learning methods have been introduced into sampling-based motion planning methods, which train a classifier to distinguish collision-free subspace from in-collision subspace in $\mathcal{C}$. In this paper, we propose a novel configuration space decomposition method and show two nice properties resulting from this decomposition. Using these two properties, we build a composite classifier that works compatibly with previous machine learning methods by using them as the elementary classifiers. Experimental results are presented, showing that our composite classifier outperforms state-of-the-art single-classifier methods by a large margin. A real application of motion planning in a multi-robot system in plant phenotyping using three UR5 robotic arms is also presented.
Published: 2020

22. NPRportrait 1.0: A Three-Level Benchmark for Non-Photorealistic Rendering of Portraits

Author: Paul L. Rosin, Yu-Kun Lai, David Mould, Ran Yi, Itamar Berger, Lars Doyle, Seungyong Lee, Chuan Li, Yong-Jin Liu, Amir Semmo, Ariel Shamir, Minjung Son, and Holger Winnemöller
Subjects: FOS: Computer and information sciences, Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, Computer Vision and Pattern Recognition, Computer Graphics and Computer-Aided Design
Abstract: Despite the recent upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer, the state of performance evaluation in this field is limited, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and neural style transfer) using the new benchmark dataset. Comment: 17 pages, 15 figures
Published: 2020

23. Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping

Author: Paul L. Rosin, Ran Yi, Yong-Jin Liu, and Yu-Kun Lai
Subjects: business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Portrait, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Classifier (UML), 0105 earth and related environmental sciences
Abstract: Portrait drawing is a common form of art with high abstraction and expressiveness. Due to its unique characteristics, existing methods achieve decent results only with paired training data, which is costly and time-consuming to obtain. In this paper, we address the problem of automatic transfer from face photos to portrait drawings with unpaired training data. We observe that due to the significant imbalance of information richness between photos and drawings, existing unpaired transfer methods such as CycleGAN tend to embed invisible reconstruction information indiscriminately in the whole drawings, leading to important facial features partially missing in drawings. To address this problem, we propose a novel asymmetric cycle mapping that enforces the reconstruction information to be visible (by a truncation loss) and only embedded in selective facial regions (by a relaxed forward cycle-consistency loss). Along with localized discriminators for the eyes, nose and lips, our method well preserves all important facial features in the generated portrait drawings. By introducing a style classifier and taking the style vector into account, our method can learn to generate portrait drawings in multiple styles using a single network. Extensive experiments show that our model outperforms state-of-the-art methods.
Published: 2020

24. Feature-Aware Uniform Tessellations on Video Manifold for Content-Sensitive Supervoxels

Author: Minjing Yu, Ran Yi, Wang Zhao, Yong-Jin Liu, Zipeng Ye, and Yu-Kun Lai
Subjects: Tessellation, Competitive analysis, business.industry, Computer science, Applied Mathematics, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Motion (geometry), Boundary (topology), Cascading Style Sheets, 02 engineering and technology, Manifold, Computational Theory and Mathematics, Artificial Intelligence, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Centroidal Voronoi tessellation, business, Algorithm, computer, Software, computer.programming_language
Abstract: Over-segmenting a video into supervoxels has strong potential to reduce the complexity of downstream computer vision applications. Content-sensitive supervoxels (CSSs) are typically smaller in content-dense regions (i.e., with high variation of appearance and/or motion) and larger in content-sparse regions. In this paper, we propose to compute feature-aware CSSs (FCSSs) that are regularly shaped 3D primitive volumes well aligned with local object/region/motion boundaries in video. To compute FCSSs, we map a video to a 3D manifold embedded in a combined color and spatiotemporal space, in which the volume elements of video manifold give a good measure of the video content density. Then any uniform tessellation on video manifold can induce CSS in the video. Our idea is that among all possible uniform tessellations on the video manifold, FCSS finds one whose cell boundaries well align with local video boundaries. To achieve this goal, we propose a novel restricted centroidal Voronoi tessellation method that simultaneously minimizes the tessellation energy (leading to uniform cells in the tessellation) and maximizes the average boundary distance (leading to good local feature alignment). Theoretically our method has an optimal competitive ratio $O(1)$, and its time and space complexities are $O(NK)$ and $O(N+K)$ for computing $K$ supervoxels in an $N$-voxel video. We also present a simple extension of FCSS to streaming FCSS for processing long videos that cannot be loaded into main memory at once. We evaluate FCSS, streaming FCSS and ten representative supervoxel methods on four video datasets and two novel video applications. The results show that our method simultaneously achieves state-of-the-art performance with respect to various evaluation criteria.
Published: 2020

25. Learning to Accelerate Decomposition for Multi-Directional 3D Printing

Author: Yong-Jin Liu, Chenming Wu, and Charlie C. L. Wang
Subjects: FOS: Computer and information sciences, Control and Optimization, Computer science, Additive manufacturing, Computation, Computer Vision and Pattern Recognition (cs.CV), Biomedical Engineering, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics, Computer Science - Graphics, Artificial Intelligence, Search algorithm, Decomposition (computer science), Beam diameter, Artificial neural network, Mechanical Engineering, Function (mathematics), Graphics (cs.GR), Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, Feature (computer vision), intelligent and flexible manufacturing, Computer Vision and Pattern Recognition, Clipping (computer graphics), Algorithm, Robotics (cs.RO)
Abstract: Multi-directional 3D printing has the capability of decreasing or eliminating the need for support structures. Recent work proposed a beam-guided search algorithm to find an optimized sequence of plane-clipping, which gives volume decomposition of a given 3D model. Different printing directions are employed in different regions to fabricate a model with tremendously less support (or even no support in many cases). To obtain optimized decomposition, a large beam width needs to be used in the search algorithm, leading to a very time-consuming computation. In this paper, we propose a learning framework that can accelerate the beam-guided search by using a smaller beam width than the original to obtain results with similar quality. Specifically, we use the results of beam-guided search with large beam width to train a scoring function for candidate clipping planes based on six newly proposed feature metrics. With the help of these feature metrics, both the current and the sequence-dependent information are captured by the neural network to score candidates of clipping. As a result, we can achieve a computational speedup of around 3x. We test and demonstrate our accelerated decomposition on a large dataset of models for 3D printing. Comment: 8 pages, accepted by IEEE Robotics and Automation Letters 2020
Published: 2020
Full Text: View/download PDF
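Note: the trade-off the abstract above describes (a wider beam finds better sequences but costs more computation) can be illustrated with a generic beam search. The following is only a minimal sketch on a toy problem. The names `beam_search`, `toy_candidates`, `toy_score` and the score table are hypothetical and are not the paper's clipping-plane search or its learned scoring function.

```python
from typing import Callable, Iterable, List, Tuple

Seq = Tuple[str, ...]

def beam_search(candidates: Callable[[Seq], Iterable[str]],
                score: Callable[[Seq], float],
                beam_width: int,
                steps: int) -> Seq:
    """Generic beam search: keep the `beam_width` best partial sequences per step."""
    beam: List[Seq] = [()]
    for _ in range(steps):
        # Extend every kept sequence by every candidate action.
        expanded = [seq + (c,) for seq in beam for c in candidates(seq)]
        if not expanded:
            break
        # Rank by score (higher is better) and prune to the beam width.
        expanded.sort(key=score, reverse=True)
        beam = expanded[:beam_width]
    return max(beam, key=score)

# Toy scores (an assumption for illustration): "a" looks better after one step,
# but the best two-step sequence starts with "b".
TOY_SCORES = {
    ("a",): 2, ("b",): 1,
    ("a", "a"): 3, ("a", "b"): 3, ("b", "a"): 2, ("b", "b"): 10,
}

def toy_candidates(seq: Seq) -> Iterable[str]:
    return ("a", "b")

def toy_score(seq: Seq) -> float:
    return TOY_SCORES[seq]

greedy = beam_search(toy_candidates, toy_score, beam_width=1, steps=2)
wide = beam_search(toy_candidates, toy_score, beam_width=2, steps=2)
print(greedy, toy_score(greedy))
print(wide, toy_score(wide))
```

With a beam width of 1 the search commits to the locally better first step and misses the best sequence; a width of 2 keeps both prefixes and finds it. A better scoring function, which is what the abstract proposes to learn, would let a narrow beam rank the promising prefix first.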

26. Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

Author: Wang Zhao, Shaohui Liu, Yezhi Shu, and Yong-Jin Liu
Subjects: FOS: Computer and information sciences, Computer science, Generalization, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Optical flow, Point cloud, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Computer Science - Robotics, Odometry, Depth map, 0202 electrical engineering, electronic engineering, information engineering, Visual odometry, Fundamental matrix (computer vision), Pose, 0105 earth and related environmental sciences, business.industry, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Robotics (cs.RO)
Abstract: In this work, we tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry applications. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on PoseNet architecture, our method recovers relative pose by directly solving for the fundamental matrix from dense optical flow correspondence and makes use of a two-view triangulation module to recover an up-to-scale 3D structure. Then, we align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and dense reprojection check. Our whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on the KITTI Odometry and NYUv2 datasets. Furthermore, we present some interesting findings on the limitation of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow. Comment: To appear in CVPR 2020
Published: 2020
Full Text: View/download PDF
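Note: the abstract above mentions aligning the scale of the predicted depth with the triangulated point cloud, but does not say how the alignment is computed. The sketch below assumes one common choice, a single scale factor taken as the median of per-point depth ratios. The function name `align_scale` and the sample values are hypothetical, and this may differ from the authors' implementation.

```python
from statistics import median
from typing import List, Sequence, Tuple

def align_scale(predicted: Sequence[float],
                triangulated: Sequence[float]) -> Tuple[float, List[float]]:
    """Rescale predicted depths to match triangulated depths.

    Assumption: one global scale factor, estimated as the median of the
    per-point ratios triangulated / predicted over valid (positive) pairs.
    """
    ratios = [t / p for p, t in zip(predicted, triangulated) if p > 0 and t > 0]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate a scale from")
    scale = median(ratios)
    return scale, [scale * p for p in predicted]

# Toy example: the prediction is exactly twice the triangulated depth.
scale, aligned = align_scale([1.0, 2.0, 4.0], [0.5, 1.0, 2.0])
print(scale, aligned)
```

The median is used here because it tolerates a few badly triangulated points; a least-squares fit would be an equally plausible assumption.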

27. Micro-expression recognition with small sample size by transferring long-term convolutional neural network

Author: Su-Jing Wang, Feng Xu, Xiaolan Fu, Xiaohua Huang, Xinyu Ou, Yong-Jin Liu, Bing-Jun Li, and Wen-Jing Yan
Subjects: Computer science, business.industry, Cognitive Neuroscience, Deep learning, Frame (networking), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Pattern recognition, 02 engineering and technology, Convolutional neural network, Computer Science Applications, Term (time), Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Transfer of learning, business, Network model
Abstract: Micro-expression is one of the important clues for detecting lies. Its most outstanding characteristics include short duration and low intensity of movement. Therefore, video clips of high spatial-temporal resolution are much more desirable than still images for providing sufficient detail. On the other hand, owing to the difficulty of collecting and encoding micro-expression data, the available sample size is small. In this paper, we use only 560 micro-expression video clips to evaluate the proposed network model: Transferring Long-term Convolutional Neural Network (TLCNN). TLCNN uses a deep CNN to extract features from each frame of micro-expression video clips, then feeds them to a Long Short-Term Memory (LSTM) network, which learns the temporal sequence information of micro-expressions. Due to the small sample size of micro-expression data, TLCNN uses two steps of transfer learning: (1) transferring from expression data and (2) transferring from single frames of micro-expression video clips, which can be regarded as “big data”. Evaluation on 560 micro-expression video clips collected from three spontaneous databases is performed. The results show that the proposed TLCNN is better than some state-of-the-art algorithms.
Published: 2018

28. Real-Time Assessment of the Cross-Task Mental Workload Using Physiological Measures During Anomaly Detection

Author: Guozhen Zhao, Yuanchun Shi, and Yong-Jin Liu
Subjects: Computer Networks and Communications, Computer science, Speech recognition, 05 social sciences, Human Factors and Ergonomics, Workload, Computer Science Applications, Task (project management), Human-Computer Interaction, Support vector machine, 03 medical and health sciences, 0302 clinical medicine, Behavioral data, Artificial Intelligence, Control and Systems Engineering, Activity detection, Signal Processing, Task analysis, 0501 psychology and cognitive sciences, Anomaly detection, Set (psychology), 050107 human factors, 030217 neurology & neurosurgery
Abstract: The ability to detect anomalies in perceived stimuli is critical to a broad range of practical and applied activities involving human operators. In this paper, we propose a real-time physiological-based system to assess the cross-task mental workload during anomaly detection. Forty participants were recruited to detect anomalous images from a set of different distracting images (Task I) and abnormal activities from surveillance videos (Task II). In Task I, the task difficulty levels were manipulated by changing the number of anomalies/distracting stimuli (15, 21, 28, or 36) with and without time constraints (i.e., 4 × 2 = 8 task difficulty levels). Physiological and behavioral data from four task difficulty levels were divided into four categories according to subjective ratings of the mental workload. The support vector machine (SVM) classifiers were trained on these data to predict the mental workload categories of: 1) the same four task difficulty levels (within level); and 2) the other four task difficulty levels in Task I (cross level). Within-level classifications (with an average of 95.29%) were more accurate than cross-level classifications (average of 72.2%), which were much more accurate than random level classifications (25%). In Task II, the same participants monitored one, two, or four video clips simultaneously in accordance with three task difficulty levels. The same physiological signals were processed for real-time recognition of a participant's mental workload after he or she completed each activity detection task. The three-class SVM classifiers were trained on physiological data from Task I to predict the mental workload categories of the Task II (cross task), achieving an overall classification accuracy of 53.83%, compared to a 33.33% accuracy at random. These results are discussed in terms of their implications for developing situation-aware recognition systems of the mental workload and adaptive human–computer interaction platforms.
Published: 2018

29. StarFont

Author: Ran Yi, Xu Yan, Yuntao Wang, Zhiyu Sun, and Yong-Jin Liu
Subjects: Class (computer programming), Character (computing), business.industry, Computer science, Deep learning, computer.software_genre, Multiple input, Font, Learning methods, Artificial intelligence, Chinese characters, business, computer, Natural language processing
Abstract: Font design has become an essential part of multimedia. It has the ability to convey the mood and intention of the designer. However, creating a new font for Chinese characters takes a lot of effort because the language contains over 20 thousand characters with complex morphological structures. Current font completion methods have many disadvantages. To address this problem, we propose StarFont, a font completion system that can automatically complete a whole font using a few-shot learning method. Our model takes several examples of a new font, learns the design style and applies it to the remaining characters to complete the font. Unlike existing models proposed for font generation, we treat each character, not the font, as a class and abandon reconstruction loss because the font's ground truth is easier to obtain. Moreover, we combine multiple input images to generate new images, while existing methods use a one-to-one approach. Compared to other deep learning-based font completion methods, our model requires fewer examples of the new font and generates better results. Both qualitative and quantitative evaluations show that our method is more advanced.
Published: 2019

30. Fast Computation of Content-Sensitive Superpixels and Supervoxels Using Q-Distances

Author: Ying He, Minjing Yu, Ran Yi, Zipeng Ye, and Yong-Jin Liu
Subjects: Tessellation (computer graphics), Tessellation, Geodesic, Computer science, business.industry, Computation, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 0211 other engineering and technologies, Approximation algorithm, 02 engineering and technology, Manifold, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Centroidal Voronoi tessellation, business, Algorithm, Time complexity, Distance, 021101 geological & geomatics engineering
Abstract: State-of-the-art research models the data of images and videos as low-dimensional manifolds and generates superpixels/supervoxels in a content-sensitive way, which is achieved by computing geodesic centroidal Voronoi tessellation (GCVT) on manifolds. However, computing exact GCVTs is slow due to computationally expensive geodesic distances. In this paper, we propose a much faster queue-based graph distance (called q-distance). Our key idea is that for manifold regions in which q-distances are different from geodesic distances, GCVT is prone to placing more generators in them, and therefore after a few iterations, the q-distance-induced tessellation is an exact GCVT. This idea works well in practice and we also prove it theoretically under moderate assumptions. Our method is simple and easy to implement. It runs 6-8 times faster than state-of-the-art GCVT computation, and has an optimal approximation ratio O(1) and a linear time complexity O(N) for N-pixel images or N-voxel videos. A thorough evaluation of 31 superpixel methods on five image datasets and 8 supervoxel methods on four video datasets shows that our method consistently achieves the best over-segmentation accuracy. We also demonstrate the advantage of our method on one image and two video applications.
Published: 2019

31. An Adaptive Filter for Deep Learning Networks on Large-Scale Point Cloud

Author: Yong-Jin Liu, Wang Zhao, and Ran Yi
Subjects: business.industry, Computer science, Feature vector, Deep learning, Feature extraction, Point cloud, Scale (descriptive set theory), 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Adaptive filter, Filter (video), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Point (geometry), Artificial intelligence, business, Algorithm, 0105 earth and related environmental sciences
Abstract: Recently some pioneering works such as PointNet and PointNet++ successfully introduced deep learning architectures into point cloud analysis. These novel networks take an irregular point cloud (i.e., a set of unordered points) as input, in which each point is represented by (x, y, z) coordinates plus some attributes including color, normal, and other local or global features. Despite their success on various tasks, the computational cost of these networks becomes extremely high for large-scale point clouds, e.g., containing hundreds of thousands or millions of points. Instead of uniform downsampling, in this paper, we propose a simple and novel filter that can efficiently filter any large-scale point cloud into thousands of representative points embedded in a high dimensional feature space, such that, without changing the existing deep learning networks and simply using our filter as a preprocessing step, these existing models can work with large-scale point clouds. Experimental results show that by using our proposed filter, the computational cost (measured by floating-point operations) of PointNet and PointNet++ is reduced 30-60 times and the accuracy of semantic segmentation (measured by mean IoU) on the ScanNet dataset is improved by 5%-15% on average.
Published: 2019

32. Ranking-Preserving Cross-Source Learning for Image Retargeting Quality Assessment

Author: Yong-Jin Liu, Yu-Kun Lai, Zipeng Ye, and Yiheng Han
Subjects: Computer science, business.industry, Quality assessment, Applied Mathematics, 02 engineering and technology, Machine learning, computer.software_genre, Computational Theory and Mathematics, Seam carving, Artificial Intelligence, Retargeting, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Software
Abstract: Image retargeting techniques adjust images into different sizes and have attracted much attention recently. Objective quality assessment (OQA) of image retargeting results is often desired to automatically select the best results. Existing OQA methods train a model using some benchmarks (e.g., RetargetMe), in which subjective scores evaluated by users are provided. Observing that it is challenging even for human subjects to give consistent scores for retargeting results of different source images (diff-source-results), in this paper we propose a learning-based OQA method that trains a General Regression Neural Network (GRNN) model based on relative scores - which preserve the ranking - of retargeting results of the same source image (same-source-results). In particular, we develop a novel training scheme with provable convergence that learns a common base scalar for same-source-results. With this source specific offset, our computed scores not only preserve the ranking of subjective scores for same-source-results, but also provide a reference to compare the diff-source-results. We train and evaluate our GRNN model using human preference data collected in RetargetMe. We further introduce a subjective benchmark to evaluate the generalizability of different OQA methods. Experimental results demonstrate that our method outperforms ten representative OQA methods in ranking prediction and has better generalizability to different datasets.
Published: 2019

33. SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network

Author: Xiaoming Deng, Fang Liu, Cuixia Ma, Yu-Kun Lai, Yong-Jin Liu, and Hongan Wang
Subjects: Boosting (machine learning), Categorization, business.industry, Sketch recognition, Computer science, 0202 electrical engineering, electronic engineering, information engineering, 020207 software engineering, 020201 artificial intelligence & image processing, 02 engineering and technology, Artificial intelligence, business, Generative adversarial network, Sketch
Abstract: Hand-drawn sketch recognition is a fundamental problem in computer vision, widely used in sketch-based image and video retrieval, editing, and reorganization. Previous methods often assume that a complete sketch is used as input; however, hand-drawn sketches in common application scenarios are often incomplete, which makes sketch recognition a challenging problem. In this paper, we propose SketchGAN, a new generative adversarial network (GAN) based approach that jointly completes and recognizes a sketch, boosting the performance of both tasks. Specifically, we use a cascade Encoder-Decoder network to complete the input sketch in an iterative manner, and employ an auxiliary sketch recognition task to recognize the completed sketch. Experiments on the Sketchy database benchmark demonstrate that our joint learning approach achieves competitive sketch completion and recognition performance compared with the state-of-the-art methods. Further experiments using several sketch-based applications also validate the performance of our method.
Published: 2019

34. APDrawingGAN: Generating Artistic Portrait Drawings From Face Photos With Hierarchical GANs

Author: Yong-Jin Liu, Yu-Kun Lai, Paul L. Rosin, and Ran Yi
Subjects: Painting, Similarity (geometry), business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, Portrait, Face (geometry), Computer graphics (images), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Set (psychology)
Abstract: Significant progress has been made with image stylization using deep learning, especially with generative adversarial networks (GANs). However, existing methods fail to produce high quality artistic portrait drawings. Such drawings have a highly abstract style, containing a sparse set of continuous graphical elements such as lines, and so small artifacts are much more exposed than for painting styles. Moreover, artists tend to use different strategies to draw different facial features and the lines drawn are only loosely related to obvious image features. To address these challenges, we propose APDrawingGAN, a novel GAN based architecture that builds upon hierarchical generators and discriminators combining both a global network (for images as a whole) and local networks (for individual facial regions). This allows dedicated drawing strategies to be learned for different facial features. Since artists' drawings may not have lines perfectly aligned with image features, we develop a novel loss to measure similarity between generated and artists' drawings based on distance transforms, leading to improved strokes in portrait drawing. To train APDrawingGAN, we construct an artistic drawing dataset containing high-resolution portrait photos and corresponding professional artistic drawings. Extensive experiments, including a user study, show that APDrawingGAN produces significantly better artistic drawings than state-of-the-art methods.
Published: 2019

35. Vectorization based color transfer for portrait images

Author: Anxiang Zeng, Qian Fu, Juyong Zhang, Fei Hou, Ying He, and Yong-Jin Liu (affiliations: School of Computer Science and Engineering; NTU-Alibaba Joint Research Institute, Singapore)
Subjects: 0209 industrial biotechnology, Artificial neural network, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Boundary (topology), 020207 software engineering, 02 engineering and technology, Residual, Computer Graphics and Computer-Aided Design, Industrial and Manufacturing Engineering, Computer Science Applications, Image (mathematics), Color Transfer, Set (abstract data type), 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Computer science and engineering [Engineering], Image tracing, Computer vision, Artificial intelligence, Portrait Images, business, Diffusion curve, Laplace operator
Abstract: This paper introduces a method for transferring colors between portrait images. Using a trained neural network to extract a facial mask, we vectorize each image with a set of sparse diffusion curves to encode the low-frequency colors, and use the Laplacian of residual colors to represent the high-frequency details. Then we apply optimal mass transport to transfer the boundary colors between the diffusion curves of the source and reference images. Finally, the original or modified Laplacians of colors are added to the transferred diffusion curve image. Unlike the existing methods that either require 3D information or assume the source and reference images have similar poses and dense correspondence, our method is computationally efficient and flexible, and can work for portrait images with large pose and color differences. This project was partially supported by Singapore Ministry of Education Grant RG26/17, NTU-Alibaba Joint Research Institute, Singapore, National Natural Science Foundation of China Grants (61872347, 61672481, 61725204, and U1736220), Special Plan for the Development of Distinguished Young Scientists of ISCAS, China (Y8RC535018), Youth Innovation Promotion Association CAS, China (No. 2018495) and the Royal Society-Newton Advanced Fellowship, China (NA150431).
Published: 2019

36. Attention-aware Multi-stroke Style Transfer

Author: Yuan Yao, Weidong Liu, Jun Wang, Jianqiang Ren, Xuansong Xie, and Yong-Jin Liu
Subjects: FOS: Computer and information sciences, Stylized fact, business.industry, Computer science, Deep learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Autoencoder, Salient, 0202 electrical engineering, electronic engineering, information engineering, Visual attention, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Swap (computer programming), 0105 earth and related environmental sciences
Abstract: Neural style transfer has drawn considerable attention from both academia and industry. Although visual effect and efficiency have been significantly improved, existing methods are unable to coordinate the spatial distribution of visual attention between the content image and the stylized image, or render diverse levels of detail via different brush strokes. In this paper, we tackle these limitations by developing an attention-aware multi-stroke style transfer model. We first propose to assemble a self-attention mechanism into a style-agnostic reconstruction autoencoder framework, from which the attention map of a content image can be derived. By performing multi-scale style swap on content features and style features, we produce multiple feature maps reflecting different stroke patterns. A flexible fusion strategy is further presented to incorporate the salient characteristics from the attention map, which allows integrating multiple stroke patterns into different spatial regions of the output image harmoniously. We demonstrate the effectiveness of our method and generate stylized images with multiple stroke patterns that are comparable to those of the state-of-the-art methods.
Published: 2019

37. A Main Directional Mean Optical Flow Feature for Spontaneous Micro-Expression Recognition

Author: Yong-Jin Liu, Guoying Zhao, Xiaolan Fu, Su-Jing Wang, Jin-Kai Zhang, and Wen-Jing Yan
Subjects: business.industry, Computer science, Feature vector, Feature extraction, Optical flow, 020207 software engineering, Pattern recognition, 02 engineering and technology, Facial recognition system, Human-Computer Interaction, Microexpression, Feature Dimension, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Noise (video), Artificial intelligence, business, Software
Abstract: Micro-expressions are brief facial movements characterized by short duration, involuntariness and low intensity. Recognition of spontaneous facial micro-expressions is a great challenge. In this paper, we propose a simple yet effective Main Directional Mean Optical-flow (MDMO) feature for micro-expression recognition. We apply a robust optical flow method on micro-expression video clips and partition the facial area into regions of interest (ROIs) based partially on action units. The MDMO is an ROI-based, normalized statistic feature that considers both local statistic motion information and its spatial location. One of the significant characteristics of MDMO is that its feature dimension is small. The length of an MDMO feature vector is $36 \times 2 = 72$, where $36$ is the number of ROIs. Furthermore, to reduce the influence of noise due to head movements, we propose an optical-flow-driven method to align all frames of a micro-expression video clip. Finally, an SVM classifier with the proposed MDMO feature is adopted for micro-expression recognition. Experimental results on three spontaneous micro-expression databases, namely SMIC, CASME and CASME II, show that the MDMO can achieve better performance than two state-of-the-art baseline features, i.e., LBP-TOP and HOOF.
Published: 2016
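
Illustrative note on the $36 \times 2 = 72$ feature length: the sketch below shows one way a "main direction, mean flow" descriptor could be assembled, with two numbers (magnitude and angle) per ROI. The abstract does not give the binning or normalization details, so the 8 direction bins, the function name `mdmo_feature`, and the input layout are assumptions for illustration only, and the normalization step mentioned in the abstract is omitted.

```python
import math

def mdmo_feature(flow_by_roi, num_bins=8):
    """Build a (magnitude, angle) pair per ROI from optical-flow vectors.

    flow_by_roi: list with one entry per ROI; each entry is a list of
    (u, v) flow vectors for the pixels in that ROI (hypothetical layout).
    num_bins: number of direction bins (assumed value, not from the abstract).
    """
    feature = []
    for vectors in flow_by_roi:
        if not vectors:
            feature.extend([0.0, 0.0])  # empty ROI contributes a zero pair
            continue
        # Group the flow vectors of this ROI by direction bin.
        bins = [[] for _ in range(num_bins)]
        for u, v in vectors:
            angle = math.atan2(v, u) % (2 * math.pi)
            index = min(int(angle / (2 * math.pi) * num_bins), num_bins - 1)
            bins[index].append((u, v))
        # The "main direction" is the bin holding the most vectors.
        main = max(bins, key=len)
        mean_u = sum(u for u, _ in main) / len(main)
        mean_v = sum(v for _, v in main) / len(main)
        # Keep the mean flow of the main direction in polar form.
        feature.extend([math.hypot(mean_u, mean_v), math.atan2(mean_v, mean_u)])
    return feature
```

With 36 ROIs the returned list has 72 entries, which matches the dimension stated in the abstract.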

38. A PMJ-inspired cognitive framework for natural scene categorization in line drawings

Author: Su-Jing Wang, Yong-Jin Liu, Qiufang Fu, Minjing Yu, and Xiaolan Fu
Subjects: Hierarchy, Visual perception, Computer science, business.industry, Cognitive Neuroscience, media_common.quotation_subject, Line drawings, Computational cognition, 020207 software engineering, 02 engineering and technology, computer.software_genre, Computer Science Applications, Categorization, Artificial Intelligence, Perception, 0202 electrical engineering, electronic engineering, information engineering, Natural (music), 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, computer, Natural language processing, media_common
Abstract: Humans' remarkable capacity for rapid natural scene categorization has been widely studied in neuroscience. Recently, a functional MRI (fMRI) study showed that in the human brain, decoding of natural scenes from line drawings was very similar to that from color photographs. In this paper, based on a recently proposed computational cognition model of Perception, Memory and Judgement (PMJ model), we investigate the computational model of line drawings and propose a PMJ-inspired cognitive framework for natural scene categorization in line drawings. The Ohio State University (OSU) dataset was used, which included 475 color photographs in six categories, i.e., beaches, city streets, forests, highways, mountains and offices, as well as 475 corresponding line drawings produced by trained artists. Experimental results show that our proposed cognitive framework achieves a 48.4% recognition rate in leave-one-out cross-validation, which is much higher than fMRI-data-driven decoding accuracy in the visual-processing hierarchy (29% in V1, 27% in V2+VP, 26% in V4, 29% in PPA and 23% in RSC).
Published: 2016

39. CFD: A Collaborative Feature Difference Method for Spontaneous Micro-Expression Spotting

Author: Yu-Kun Lai, Yiheng Han, Bing-Jun Li, and Yong-Jin Liu
Subjects: 0209 industrial biotechnology, Facial expression, Computer science, business.industry, Feature extraction, Pattern recognition, 02 engineering and technology, Spotting, Linear discriminant analysis, Expression (mathematics), 020901 industrial engineering & automation, Feature (computer vision), Face (geometry), Histogram, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: Micro-expression (ME) is a special type of human expression which can reveal the real emotion that people want to conceal. Spontaneous ME (SME) spotting is to identify the subsequences containing SMEs from a long facial video. The study of SME spotting has a significant importance, but is also very challenging due to the fact that in real-world scenarios, SMEs may occur along with normal facial expressions and other prominent motions such as head movements. In this paper, we improve a state-of-the-art SME spotting method called feature difference analysis (FD) in the following two aspects. First, FD relies on a partitioning of facial area into uniform regions of interest (ROIs) and computing features of a selected sequence. We propose a novel evaluation method by utilizing the Fisher linear discriminant to assign a weight for each ROI, leading to more semantically meaningful ROIs. Second, FD only considers two features (LBP and HOOF) independently. We introduce a state-of-the-art MDMO feature into FD and propose a simple yet efficient collaborative strategy to work with two complementary features, i.e., LBP characterizing texture information and MDMO characterizing motion information. We call our improved FD method collaborative feature difference (CFD). Experimental results on two well-established SME datasets SMIC-E and CASME II show that CFD significantly improves the performance of the original FD.
Published: 2018

40. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization

Author: Yang Chen, Yu-Kun Lai, and Yong-Jin Liu
Subjects: Painting, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Initialization, 020207 software engineering, 02 engineering and technology, Variation (game tree), GeneralLiterature_MISCELLANEOUS, Computer graphics, Feature (computer vision), Simple (abstract algebra), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Shading, Artificial intelligence, business, Generative grammar, ComputingMethodologies_COMPUTERGRAPHICS, Abstraction (linguistics)
Abstract: In this paper, we propose a solution to transforming photos of real-world scenes into cartoon style images, which is valuable and challenging in computer vision and computer graphics. Our solution belongs to learning based methods, which have recently become popular to stylize images in artistic forms such as painting. However, existing methods do not produce satisfactory results for cartoonization, due to the fact that (1) cartoon styles have unique characteristics with high level simplification and abstraction, and (2) cartoon images tend to have clear edges, smooth color shading and relatively simple textures, which exhibit significant challenges for texture-descriptor-based loss functions used in existing methods. In this paper, we propose CartoonGAN, a generative adversarial network (GAN) framework for cartoon stylization. Our method takes unpaired photos and cartoon images for training, which is easy to use. Two novel losses suitable for cartoonization are proposed: (1) a semantic content loss, which is formulated as a sparse regularization in the high-level feature maps of the VGG network to cope with substantial style variation between photos and cartoons, and (2) an edge-promoting adversarial loss for preserving clear edges. We further introduce an initialization phase, to improve the convergence of the network to the target manifold. Our method is also much more efficient to train than existing methods. Experimental results show that our method is able to generate high-quality cartoon images from real-world photos (i.e., following specific artists’ styles and with clear edges and smooth shading) and outperforms state-of-the-art methods.
Published: 2018

41. Human experience–inspired path planning for robots

Author: Wenyong Gong, Xiaohua Xie, and Yong-Jin Liu
Subjects: Service (business), 0209 industrial biotechnology, Computer science, business.industry, lcsh:Electronics, lcsh:TK7800-8360, 02 engineering and technology, lcsh:QA75.5-76.95, Computer Science Applications, Computer Science::Robotics, 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Robot, 020201 artificial intelligence & image processing, Motion planning, Artificial intelligence, lcsh:Electronic computers. Computer science, business, Software
Abstract: In this article, we present a human experience–inspired path planning algorithm for service robots. In addition to considering the path distance and smoothness, we emphasize the safety of robot navigation. Specifically, we build a speed field in accordance with several human driving experiences, like slowing down or detouring at a narrow aisle, and keeping a safe distance from obstacles. Based on this speed field, the path curvatures, path distance, and steering speed are all integrated to form an energy function, which can be efficiently minimized by the A* algorithm to seek the optimal path by resorting to an admissible heuristic function estimated from the energy function. Moreover, a simple yet effective fast path smoothing algorithm is proposed so as to ease the robot's steering. Several examples are presented, demonstrating the effectiveness of our human experience–inspired path planning method.
Published: 2018
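
Illustrative note on A* over a speed field: the sketch below is a simplified stand-in, not the article's energy function. It assumes a 4-connected grid in which each cell carries a speed (0 marks an obstacle), charges each move the travel time `1 / speed` of the cell entered, and uses straight-line distance divided by the maximum speed as the heuristic, which never overestimates the remaining travel time. Curvature and steering terms from the abstract are left out, and the name `plan_path` is hypothetical.

```python
import heapq
import math

def plan_path(speed, start, goal):
    """A* on a grid of cell speeds; returns a list of (row, col) or None."""
    rows, cols = len(speed), len(speed[0])
    max_speed = max(max(row) for row in speed)
    if max_speed <= 0:
        return None  # every cell is an obstacle

    def heuristic(cell):
        # Optimistic travel time: straight line at the highest speed.
        return math.hypot(cell[0] - goal[0], cell[1] - goal[1]) / max_speed

    best = {start: 0.0}          # lowest known travel time per cell
    parent = {}                  # back-pointers for path recovery
    frontier = [(heuristic(start), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if speed[nxt[0]][nxt[1]] <= 0:
                continue  # obstacle cell
            cost = best[cell] + 1.0 / speed[nxt[0]][nxt[1]]
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                parent[nxt] = cell
                heapq.heappush(frontier, (cost + heuristic(nxt), nxt))
    return None
```

In this toy setting a slow cell plays the role of a narrow aisle: the planner detours around it whenever the longer route has a lower total travel time.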

42. Semi-Continuity of Skeletons in Two-Manifold and Discrete Voronoi Approximation

Author: Yong-Jin Liu
Subjects: Sequence, Geodesic, Applied Mathematics, Boundary (topology), Omega, Manifold, Pattern Recognition, Automated, Combinatorics, Semi-continuity, Imaging, Three-Dimensional, Computational Theory and Mathematics, Artificial Intelligence, Euclidean geometry, Computer Graphics, Animals, Humans, Computer Vision and Pattern Recognition, Voronoi diagram, Algorithms, Software, Mathematics
Abstract: The skeleton of a 2D shape is an important geometric structure in pattern analysis and computer vision. In this paper we study the skeleton of a 2D shape in a two-manifold $\mathcal{M}$, based on a geodesic metric. We present a formal definition of the skeleton $S(\Omega)$ for a shape $\Omega$ in $\mathcal{M}$ and show several properties that make $S(\Omega)$ distinct from its Euclidean counterpart in $\mathbb{R}^2$. We further prove that for a shape sequence $\{\Omega_i\}$ that converges to a shape $\Omega$ in $\mathcal{M}$, the mapping $\Omega \rightarrow \overline{S}(\Omega)$ is lower semi-continuous. A direct application of this result is that we can use a set $P$ of sample points to approximate the boundary of a 2D shape $\Omega$ in $\mathcal{M}$, and the Voronoi diagram of $P$ inside $\Omega \subset \mathcal{M}$ gives a good approximation to the skeleton $S(\Omega)$. Examples of skeleton computation in topography and brain morphometry are illustrated.
Published: 2015

43. Cognitive mechanism related to line drawings and its applications in intelligent process of visual media: a survey

Author: Wenfeng Chen, Qiufang Fu, Ye Liu, Lexing Xie, Yong-Jin Liu, and Minjing Yu
Subjects: Cognitive science, Computational model, Visual perception, General Computer Science, Mechanism (biology), business.industry, Process (engineering), Computer science, Line drawings, 020207 software engineering, Cognition, 02 engineering and technology, Theoretical Computer Science, Visual media, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: Line drawings, as a concise form, can be recognized by infants and even chimpanzees. Recently, how the visual system processes line-drawings attracts more and more attention from psychology, cognitive science and computer science. The neuroscientific studies revealed that line drawings generate similar neural actions as color photographs, which give insights on how to efficiently process big media data. In this paper, we present a comprehensive survey on line drawing studies, including cognitive mechanism of visual perception, computational models in computer vision and intelligent process in diverse media applications. Major debates, challenges and solutions that have been addressed over the years are discussed. Finally some of the ensuing challenges in line drawing studies are outlined.
Published: 2015

44. Transforming photos to comics using convolutional neural networks

Author: Yang Chen, Yu-Kun Lai, and Yong-Jin Liu
Subjects: QA75, Painting, Computer science, business.industry, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Iterative reconstruction, 010501 environmental sciences, Comics, 01 natural sciences, Grayscale, Convolutional neural network, GeneralLiterature_MISCELLANEOUS, Image (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, ComputingMethodologies_COMPUTERGRAPHICS, 0105 earth and related environmental sciences
Abstract: In this paper, inspired by Gatys’s recent work, we propose a novel approach that transforms photos to comics using deep convolutional neural networks (CNNs). While Gatys’s method that uses a pre-trained VGG network generally works well for transferring artistic styles such as painting from a style image to a content image, for more minimalist styles such as comics, the method often fails to produce satisfactory results. To address this, we further introduce a dedicated comic style CNN, which is trained for classifying comic images and photos. This new network is effective in capturing various comic styles and thus helps to produce better comic stylization results. Even with a grayscale style image, Gatys’s method can still produce colored output, which is not desirable for comics. We develop a modified optimization framework such that a grayscale image is guaranteed to be synthesized. To avoid converging to poor local minima, we further initialize the output image using grayscale version of the content image. Various examples show that our method synthesizes better comic images than the state-of-the-art method.
Published: 2017

45. Intrinsic manifold SLIC: a simple and efficient method for computing content-sensitive superpixels

Author: Bing-Jun Li, Yong-Jin Liu, Minjing Yu, and Ying He (affiliation: School of Computer Science and Engineering)
Subjects: Iterative method, 02 engineering and technology, Image Segmentation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Sensitivity (control systems), Superpixel, Cluster analysis, Mathematics, business.industry, Applied Mathematics, 020207 software engineering, Pattern recognition, Image segmentation, Euclidean distance, Computational Theory and Mathematics, Computer Science::Computer Vision and Pattern Recognition, Content (measure theory), Computer science and engineering [Engineering], 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Centroidal Voronoi tessellation, business, Voronoi diagram, Software
Abstract: Superpixels are perceptually meaningful atomic regions that can effectively capture image features. Among various methods for computing uniform superpixels, simple linear iterative clustering (SLIC) is popular due to its simplicity and high performance. In this paper, we extend SLIC to compute content-sensitive superpixels, i.e., small superpixels in content-dense regions with high intensity or colour variation and large superpixels in content-sparse regions. Rather than using the conventional SLIC method that clusters pixels in $\mathbb{R}^5$, we map the input image $I$ to a 2-dimensional manifold $\mathcal{M} \subset \mathbb{R}^5$, whose area elements are a good measure of the content density in $I$. We propose a simple method, called intrinsic manifold SLIC (IMSLIC), for computing a geodesic centroidal Voronoi tessellation (GCVT), which is a uniform tessellation, on $\mathcal{M}$, which induces the content-sensitive superpixels in $I$. In contrast to the existing algorithms, IMSLIC characterizes the content sensitivity by measuring areas of Voronoi cells on $\mathcal{M}$. Using a simple and fast approximation to a closed-form solution, the method can compute the GCVT at a very low cost and guarantees that all Voronoi cells are simply connected. We thoroughly evaluate IMSLIC and compare it with eleven representative methods on the BSDS500 dataset and seven representative methods on the NYUV2 dataset. Computational results show that IMSLIC outperforms existing methods in terms of commonly used quality measures pertaining to superpixels such as compactness, adherence to boundaries, and achievable segmentation accuracy. We also evaluate IMSLIC and seven representative methods in an image contour closure application, and the results on two datasets, WHD and WSD, show that IMSLIC achieves the best foreground segmentation performance.
Published: 2017
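
Illustrative note on area elements as content density: the sketch below lifts each pixel to a point in $\mathbb{R}^5$ (two position coordinates plus three colour channels) and measures the area of the lifted unit pixel square as two triangles. The unscaled lift, the triangle split, and the names `lift`, `triangle_area`, and `cell_area` are assumptions for illustration; the abstract does not specify how the map or the area approximation is defined.

```python
import math

def triangle_area(p, q, r):
    """Area of triangle pqr in any dimension, via the Gram determinant."""
    a = [qi - pi for pi, qi in zip(p, q)]
    b = [ri - pi for pi, ri in zip(p, r)]
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    return 0.5 * math.sqrt(max(aa * bb - ab * ab, 0.0))

def lift(image, row, col):
    """Map a pixel to R^5: (row, col, c1, c2, c3). Scaling factors are omitted."""
    return (float(row), float(col), *image[row][col])

def cell_area(image, row, col):
    """Area on the lifted surface of the unit square with corner (row, col)."""
    p00 = lift(image, row, col)
    p01 = lift(image, row, col + 1)
    p10 = lift(image, row + 1, col)
    p11 = lift(image, row + 1, col + 1)
    return triangle_area(p00, p01, p11) + triangle_area(p00, p11, p10)
```

On a constant-colour patch the lifted square keeps unit area, while a colour jump stretches it; a uniform tessellation of the lifted surface therefore places smaller cells, when projected back to the image, where the colour varies most.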

46. A global energy optimization framework for 2.1D sketch extraction from monocular images

Author: Yong-Jin Liu, Xiaolan Fu, Matt Tianfu Wu, Kaiyun Li, and Cheng-Chi Yu
Subjects: Ground truth, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Real image, Computer Graphics and Computer-Aided Design, Sketch, Image (mathematics), Data set, Set (abstract data type), Modeling and Simulation, Segmentation, Computer vision, Geometry and Topology, Artificial intelligence, business, Global optimization, Software
Abstract: The 2.1D sketch is a layered image representation, which assigns a partial depth ordering of over-segmented regions in a monocular image. This paper presents a global optimization framework for inferring the 2.1D sketch from a monocular image. Our method only uses over-segmented image regions (i.e., superpixels) as input, without any information of objects in the image, since (1) segmenting objects in images is a difficult problem on its own and (2) the objective of our proposed method is to be generic as an initial module useful for downstream high-level vision tasks. This paper formulates the inference of the 2.1D sketch using a global energy optimization framework. The proposed energy function consists of two components: (1) one is defined based on the local partial ordering relations (i.e., figure-ground) between two adjacent over-segmented regions, which captures the marginal information of the global partial depth ordering and (2) the other is defined based on the same depth layer relations among all the over-segmented regions, which groups regions of the same object to account for the over-segmentation issues. A hybrid evolution algorithm is utilized to minimize the global energy function efficiently. In experiments, we evaluated our method on a test data set containing 100 diverse real images from Berkeley segmentation data set (BSDS500) with the annotated ground truth. Experimental results show that our method can infer the 2.1D sketch with high accuracy.
Published: 2014

47. For micro-expression recognition: Database and suggestions

Author: Xiaolan Fu, Yong-Jin Liu, Su-Jing Wang, Wen-Jing Yan, and Qi Wu
Subjects: Facial expression, Database, Computer science, Cognitive Neuroscience, media_common.quotation_subject, Frame (networking), Spotting, Deception, computer.software_genre, Computer Science Applications, Facial expression recognition, Artificial Intelligence, computer, media_common
Abstract: Micro-expression is gaining more attention in both the scientific field and the mass media. It represents genuine emotions that people try to conceal, thus making it a promising cue for lie detection. Since micro-expressions are considered almost imperceptible to naked eyes, researchers have sought to automatically detect and recognize these fleeting facial expressions to help people make use of such deception cues. However, the lack of well-established micro-expression databases might be the biggest obstacle. Although several databases have been developed, there may exist some problems either in the approach of eliciting micro-expression or the labeling. We built a spontaneous micro-expression database with rigorous frame spotting, AU coding and micro-expression labeling. This paper introduces how the micro-expressions were elicited in a laboratory situation and how the database was built with the guide of psychology. In addition, this paper proposes issues that may help researchers effectively use micro-expression databases and improve micro-expression recognition.
Published: 2014

48. A distributed computational cognitive model for object recognition

Author: Xiaolan Fu, Qiufang Fu, Ye Liu, and Yong-Jin Liu
Subjects: Cognitive model, Vision processing, General Computer Science, Computer science, Mechanism (biology), Cognitive neuroscience of visual object recognition, Cognition, Neurophysiology, Machine learning, Object (computer science), Object model, Artificial intelligence
Abstract: Based on cognitive functionalities in human vision processing, we propose a computational cognitive model for object recognition with detailed algorithmic descriptions. The contribution of this paper is twofold. First, we present a systematic review of psychological and neurophysiological studies, which provide collective evidence for a distributed representation of 3D objects in the human brain. Second, we present a computational model which simulates the distributed mechanism of the object vision pathway. Experimental results show that the presented computational cognitive model outperforms five representative 3D object recognition algorithms in computer science research.
Published: 2013

49. Optimal-Scaling-Factor Assignment for Patch-wise Image Retargeting

Author: Lexing Xie, Yong-Jin Liu, Xiaolan Fu, Xiao-Nan Luo, and Yun Liang
Subjects: Computer science, Cognitive neuroscience of visual object recognition, Image processing, Computer Graphics and Computer-Aided Design, Image (mathematics), Set (abstract data type), Computer graphics, Seam carving, Computer vision, Artificial intelligence, Scaling, Software
Abstract: Image retargeting adjusts images to arbitrary sizes such that they can be viewed on different displays. Content-aware image retargeting has been receiving increased attention. In particular, researchers have improved a patch-wise scaling method for image retargeting at the object level. The scaling partitions the image into rectangular patches of adaptive sizes, which are comparable to the sizes of the salient objects in the image. This partitioning is based on a visual-saliency map; accordingly, the method labels the patches as important or unimportant. Then, the method scales the important patches as uniformly as possible and stretches or squeezes the unimportant patches to fit the target size. A patch-based image-similarity measure finds the optimal set of scaling factors. In experiments, the improved method performed well for three image types: lines and edges, foreground objects, and geometric structures.
Published: 2013

50. Manifold SLIC: A Fast Method to Compute Content-Sensitive Superpixels

Author: Minjing Yu, Yong-Jin Liu, Ying He, and Cheng-Chi Yu
Subjects: Geodesic, Pixel, Software engineering, Pattern recognition, Engineering and technology, Image segmentation, Manifold, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Computer vision, Artificial intelligence, Centroidal Voronoi tessellation, Cluster analysis, Voronoi diagram, Time complexity, Mathematics
Abstract: Superpixels are perceptually meaningful atomic regions that can effectively capture image features. Among various methods for computing uniform superpixels, simple linear iterative clustering (SLIC) is popular due to its simplicity and high performance. In this paper, we extend SLIC to compute content-sensitive superpixels, i.e., small superpixels in content-dense regions (e.g., with high intensity or color variation) and large superpixels in content-sparse regions. Unlike the conventional SLIC method, which clusters pixels in R^5, we map the image I to a 2-dimensional manifold M ⊂ R^5, whose area elements are a good measure of the content density in I. We propose an efficient method to compute a restricted centroidal Voronoi tessellation (RCVT), which is a uniform tessellation, on M; this tessellation induces the content-sensitive superpixels in I. Unlike other algorithms that characterize content-sensitivity by geodesic distances, manifold SLIC tackles the problem by measuring areas of Voronoi cells on M, which can be computed at a very low cost. As a result, it runs 10 times faster than the state-of-the-art content-sensitive superpixel algorithm. We evaluate manifold SLIC and seven representative methods on the BSDS500 benchmark and observe that our method outperforms the existing methods.
Published: 2016
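
Note: The abstract of item 50 states that the area elements of the manifold M ⊂ R^5 measure content density in the image I. A minimal sketch of that idea follows, assuming the map Phi(u, v) = (u, v, c1, c2, c3) with three color channels and a finite-difference estimate of the first fundamental form. The function name `area_elements` and the discretization are illustrative assumptions, not the authors' implementation, which may approximate areas differently.

```python
import numpy as np

def area_elements(image):
    """Approximate per-pixel area of the surface Phi(u, v) = (u, v, color) in R^5.

    `image` is an (H, W, 3) array; the result is an (H, W) array.
    Hypothetical helper for illustration only.
    """
    img = np.asarray(image, dtype=float)
    # Finite-difference partial derivatives of each color channel
    # along the row axis (u) and the column axis (v).
    d_u, d_v = np.gradient(img, axis=(0, 1))
    # Entries of the first fundamental form; the leading 1.0 comes from
    # the (u, v) coordinates that Phi keeps unchanged.
    E = 1.0 + np.sum(d_u * d_u, axis=-1)
    G = 1.0 + np.sum(d_v * d_v, axis=-1)
    F = np.sum(d_u * d_v, axis=-1)
    # Area element sqrt(EG - F^2); clamp to guard against rounding error.
    return np.sqrt(np.maximum(E * G - F * F, 0.0))
```

Under these assumptions a flat region has area element 1 per pixel, while a region with strong color variation has a larger one, so a tessellation that is uniform in area on M would place smaller cells over content-dense parts of I.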
