248 results for "Chan, Antoni B."
Search Results
202. Clustering dynamic textures with the hierarchical EM algorithm
- Author
- Chan, Antoni B., Coviello, Emanuele, and Lanckriet, Gert R. G.
- Published
- 2010
- Full Text
- View/download PDF
203. Generalized Stauffer–Grimson background subtraction for dynamic scenes
- Author
- Chan, Antoni B., Mahadevan, Vijay, and Vasconcelos, Nuno
- Published
- 2010
- Full Text
- View/download PDF
204. Modeling Music as a Dynamic Texture
- Author
- Barrington, Luke, Chan, Antoni B., and Lanckriet, Gert
- Published
- 2010
- Full Text
- View/download PDF
205. Bayesian Poisson regression for crowd counting
- Author
- Chan, Antoni B. and Vasconcelos, Nuno
- Published
- 2009
- Full Text
- View/download PDF
206. Variational layered dynamic textures
- Author
- Chan, Antoni B. and Vasconcelos, Nuno
- Published
- 2009
- Full Text
- View/download PDF
207. Direct convex relaxations of sparse SVM
- Author
- Chan, Antoni B., Vasconcelos, Nuno, and Lanckriet, Gert R. G.
- Published
- 2007
- Full Text
- View/download PDF
208. Classifying Video with Kernel Dynamic Textures
- Author
- Chan, Antoni B. and Vasconcelos, Nuno
- Published
- 2007
- Full Text
- View/download PDF
209. Growing a bag of systems tree for fast and accurate classification.
- Author
- Coviello, Emanuele, Mumtaz, Adeel, Chan, Antoni B., and Lanckriet, Gert R. G.
- Abstract
The bag-of-systems (BoS) representation is a descriptor of motion in a video, where dynamic texture (DT) codewords represent the typical motion patterns in spatio-temporal patches extracted from the video. The efficacy of the BoS descriptor depends on the richness of the codebook, which directly depends on the number of codewords in the codebook. However, for even modest sized codebooks, mapping videos onto the codebook results in a heavy computational load. In this paper we propose the BoS Tree, which constructs a bottom-up hierarchy of codewords that enables efficient mapping of videos to the BoS codebook. By leveraging the tree structure to efficiently index the codewords, the BoS Tree allows for fast look-ups in the codebook and enables the practical use of larger, richer codebooks. We demonstrate the effectiveness of BoS Trees on classification of three video datasets, as well as on annotation of a music dataset. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
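The tree-indexed lookup described in entry 209 can be sketched in a few lines. This is a hypothetical toy, not the paper's implementation: real BoS codewords are dynamic-texture models compared with probabilistic similarities, whereas 1-D vectors and Euclidean distance stand in for them here, and a single tree level stands in for the full bottom-up hierarchy.

```python
import numpy as np

# Toy BoS-Tree-style lookup: group codewords under parent centroids so a
# query descends to the nearest parent and scans only that parent's children,
# instead of scanning the whole codebook.

def build_tree(codewords, branching=2):
    """One tree level: partition codewords and average each group."""
    order = np.argsort(codewords[:, 0])              # crude partition for the sketch
    groups = np.array_split(order, branching)
    parents = np.array([codewords[g].mean(axis=0) for g in groups])
    return parents, groups

def tree_lookup(x, codewords, parents, groups):
    """Descend to the nearest parent, then scan only its children."""
    p = int(np.argmin(np.linalg.norm(parents - x, axis=1)))
    kids = groups[p]
    return int(kids[np.argmin(np.linalg.norm(codewords[kids] - x, axis=1))])

codewords = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
parents, groups = build_tree(codewords)              # parents at 1.0 and 11.0
query = np.array([10.2])
print(tree_lookup(query, codewords, parents, groups))  # nearest codeword: index 3
```

With 2 parents and 3 children per group, the query touches 5 codewords instead of all 6; the savings grow with codebook size, which is the point of the BoS Tree.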
210. Adaptive figure-ground classification.
- Author
- Chen, Yisong, Chan, Antoni B., and Wang, Guoping
- Abstract
We propose an adaptive figure-ground classification algorithm to automatically extract a foreground region using a user-provided bounding-box. The image is first over-segmented with an adaptive mean-shift algorithm, from which background and foreground priors are estimated. The remaining patches are iteratively assigned based on their distances to the priors, with the foreground prior being updated online. A large set of candidate segmentations are obtained by changing the initial foreground prior. The best candidate is determined by a score function that evaluates the segmentation quality. Rather than using a single distance function or score function, we generate multiple hypothesis segmentations from different combinations of distance measures and score functions. The final segmentation is then automatically obtained with a voting or weighted combination scheme from the multiple hypotheses. Experiments indicate that our method performs at or above the current state-of-the-art on several datasets, with particular success on challenging scenes that contain irregular or multiple-connected foregrounds. In addition, this improvement in accuracy is achieved with low computational cost. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
211. Look over here.
- Author
- Cao, Ying, Lau, Rynson W. H., and Chan, Antoni B.
- Subjects
- PICTURES; COMIC books, strips, etc.; BALLOONS; PROBABILISTIC generative models; GRAPHICAL modeling (Statistics)
- Abstract
Picture subjects and text balloons are basic elements in comics, working together to propel the story forward. Japanese comics artists often leverage a carefully designed composition of subjects and balloons (generally referred to as panel elements) to provide a continuous and fluid reading experience. However, such a composition is hard to produce for people without the required experience and knowledge. In this paper, we propose an approach for novices to synthesize a composition of panel elements that can effectively guide the reader's attention to convey the story. Our primary contribution is a probabilistic graphical model that describes the relationships among the artist's guiding path, the panel elements, and the viewer attention, which can be effectively learned from a small set of existing manga pages. We show that the proposed approach can measurably improve the readability, visual appeal, and communication of the story of the resulting pages, as compared to an existing method. We also demonstrate that the proposed approach enables novice users to create higher-quality compositions with less time, compared with commercially available programs. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
212. Clustering Hidden Markov Models with Variational HEM.
- Author
- Coviello, Emanuele, Chan, Antoni B., and Lanckriet, Gert R. G.
- Subjects
- HIDDEN Markov models, VARIATIONAL approach (Mathematics), DATA analysis, ALGORITHMS, MATHEMATICAL optimization, GROUP theory
- Abstract
The hidden Markov model (HMM) is a widely-used generative model that copes with sequential data, assuming that each observation is conditioned on the state of a hidden Markov chain. In this paper, we derive a novel algorithm to cluster HMMs based on the hierarchical EM (HEM) algorithm. The proposed algorithm i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a "cluster center", that is, a novel HMM that is representative for the group, in a manner that is consistent with the underlying generative model of the HMM. To cope with intractable inference in the E-step, the HEM algorithm is formulated as a variational optimization problem, and efficiently solved for the HMM case by leveraging an appropriate variational approximation. The benefits of the proposed algorithm, which we call variational HEM (VHEM), are demonstrated on several tasks involving time-series data, such as hierarchical clustering of motion capture sequences, and automatic annotation and retrieval of music and of online hand-writing data, showing improvements over current methods. In particular, our variational HEM algorithm effectively leverages large amounts of data when learning annotation models by using an efficient hierarchical estimation procedure, which reduces learning times and memory requirements, while improving model robustness through better regularization. [ABSTRACT FROM AUTHOR]
- Published
- 2014
213. Clustering Dynamic Textures with the Hierarchical EM Algorithm for Modeling Video.
- Author
- Mumtaz, Adeel, Coviello, Emanuele, Lanckriet, Gert R. G., and Chan, Antoni B.
- Subjects
- PROBABILISTIC generative models, SPACETIME, LINEAR dynamical systems, EXPECTATION-maximization algorithms, VIDEO processing, COMPUTER vision, IMAGE segmentation, HEURISTIC algorithms
- Abstract
Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
214. Automatic stylistic manga layout.
- Author
- Cao, Ying, Chan, Antoni B., and Lau, Rynson W. H.
- Subjects
- MANGA (Art), LAYOUT (Printing), AUTOMATION, PRINTING, DESIGN templates, ART
- Abstract
Manga layout is a core component in manga production, characterized by its unique styles. However, stylistic manga layouts are difficult for novices to produce as it requires hands-on experience and domain knowledge. In this paper, we propose an approach to automatically generate a stylistic manga layout from a set of input artworks with user-specified semantics, thus allowing less-experienced users to create high-quality manga layouts with minimal efforts. We first introduce three parametric style models that encode the unique stylistic aspects of manga layouts, including layout structure, panel importance, and panel shape. Next, we propose a two-stage approach to generate a manga layout: 1) an initial layout is created that best fits the input artworks and layout structure model, according to a generative probabilistic framework; 2) the layout and artwork geometries are jointly refined using an efficient optimization procedure, resulting in a professional-looking manga layout. Through a user study, we demonstrate that our approach enables novice users to easily and quickly produce higher-quality layouts that exhibit realistic manga styles, when compared to a commercially-available manual layout tool. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
215. Counting People With Low-Level Features and Bayesian Regression.
- Author
- Chan, Antoni B. and Vasconcelos, Nuno
- Subjects
- DIGITAL image processing, FEATURE extraction, BAYESIAN analysis, REGRESSION analysis, TEXTURE analysis (Image processing), IMAGE segmentation, ROBUST control, GROUND penetrating radar
- Abstract
An approach to the problem of estimating the size of inhomogeneous crowds, which are composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking is proposed. Instead, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic-texture motion model. A set of holistic low-level features is extracted from each segmented region, and a function that maps features into estimates of the number of people per segment is learned with Bayesian regression. Two Bayesian regression models are examined. The first is a combination of Gaussian process regression with a compound kernel, which accounts for both the global and local trends of the count mapping but is limited by the real-valued outputs that do not match the discrete counts. We address this limitation with a second model, which is based on a Bayesian treatment of Poisson regression that introduces a prior distribution on the linear weights of the model. Since exact inference is analytically intractable, a closed-form approximation is derived that is computationally efficient and kernelizable, enabling the representation of nonlinear functions. An approximate marginal likelihood is also derived for kernel hyperparameter learning. The two regression-based crowd counting methods are evaluated on a large pedestrian data set, containing very distinct camera views, pedestrian traffic, and outliers, such as bikes or skateboarders. Experimental results show that regression-based counts are accurate regardless of the crowd size, outperforming the count estimates produced by state-of-the-art pedestrian detectors. Results on 2 h of video demonstrate the efficiency and robustness of the regression-based crowd size estimation over long periods of time. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
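The Bayesian Poisson regression in entry 215 is easy to sketch on synthetic counts. This is a hedged toy, not the paper's method: the paper derives a closed-form, kernelizable approximation to the posterior, whereas the sketch below just finds the MAP weights (Gaussian prior on the linear weights) by Newton's method.

```python
import numpy as np

# MAP sketch of Bayesian Poisson regression: counts y ~ Poisson(exp(x.w))
# with prior w ~ N(0, s2*I).  Synthetic data; plain Newton iterations.

def fit_map(X, y, s2=10.0, iters=30):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(np.clip(X @ w, -20, 20))     # Poisson rates (clipped for safety)
        grad = X.T @ (y - mu) - w / s2           # gradient of the log-posterior
        H = -X.T @ (mu[:, None] * X) - np.eye(X.shape[1]) / s2
        w = w - np.linalg.solve(H, grad)         # Newton update
    return w

rng = np.random.default_rng(1)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 1))])
y = rng.poisson(np.exp(X @ np.array([1.0, 0.5])))  # true weights (1.0, 0.5)
w_map = fit_map(X, y)
print(w_map.shape)
```

The exponential link keeps the predicted counts non-negative, which is the reason the paper prefers Poisson regression over real-valued Gaussian process outputs for crowd counts.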
216. Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures.
- Author
- Chan, Antoni B. and Vasconcelos, Nuno
- Subjects
- COMPUTER vision, LINEAR systems, MACHINE learning, CONTROL theory (Engineering), TIME series analysis, EXPECTATION-maximization algorithms
- Abstract
A dynamic texture is a spatio-temporal generative model for video, which represents video sequences as observations from a linear dynamical system. This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture. An expectation-maximization (EM) algorithm is derived for learning the parameters of the model, and the model is related to previous works in linear systems, machine learning, time-series clustering, control theory, and computer vision. Through experimentation, it is shown that the mixture of dynamic textures is a suitable representation for both the appearance and dynamics of a variety of visual processes that have traditionally been challenging for computer vision (for example, fire, steam, water, vehicle and pedestrian traffic, and so forth). When compared with state-of-the-art methods in motion segmentation, including both temporal texture methods and traditional representations (for example, optical flow or other localized motion representations), the mixture of dynamic textures achieves superior performance in the problems of clustering and segmenting video of such processes. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
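The linear dynamical system underlying the dynamic texture of entry 216 is straightforward to simulate. A hedged toy with made-up matrices and 2-pixel "frames" (a mixture of dynamic textures would first draw one of several (A, C) components at random):

```python
import numpy as np

# A dynamic texture is an LDS: hidden state x_{t+1} = A x_t + v_t and
# observed frame y_t = C x_t + w_t.  This samples one short "video".

def sample_dt(A, C, T, state_dim, rng, noise=0.1):
    x = rng.normal(size=state_dim)
    frames = []
    for _ in range(T):
        frames.append(C @ x + noise * rng.normal(size=C.shape[0]))
        x = A @ x + noise * rng.normal(size=state_dim)
    return np.array(frames)

rng = np.random.default_rng(0)
A = np.array([[0.9, -0.2], [0.2, 0.9]])   # stable, rotation-like dynamics
C = np.array([[1.0, 0.0], [0.5, 0.5]])    # maps hidden state to pixels
video = sample_dt(A, C, T=50, state_dim=2, rng=rng)
print(video.shape)
```

The eigenvalues of A have magnitude below 1, so the sampled sequence stays bounded; unstable dynamics would make the "texture" diverge.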
217. Supervised Learning of Semantic Classes for Image Annotation and Retrieval.
- Author
- Carneiro, Gustavo, Chan, Antoni B., Moreno, Pedro J., and Vasconcelos, Nuno
- Subjects
- SUPERVISED learning, IMAGE retrieval, EXPECTATION-maximization algorithms, IMAGE analysis, ALGORITHMS
- Abstract
A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
218. Human attention guided explainable artificial intelligence for computer vision models.
- Author
- Liu, Guoyang, Zhang, Jindi, Chan, Antoni B., and Hsiao, Janet H.
- Subjects
- ARTIFICIAL intelligence, OBJECT recognition (Computer vision), IMAGE recognition (Computer vision), COMPUTER simulation, KERNEL functions, COMPUTER vision
- Abstract
Explainable artificial intelligence (XAI) has been increasingly investigated to enhance the transparency of black-box artificial intelligence models, promoting better user understanding and trust. Developing an XAI that is faithful to models and plausible to users is both a necessity and a challenge. This work examines whether embedding human attention knowledge into saliency-based XAI methods for computer vision models could enhance their plausibility and faithfulness. Two novel XAI methods for object detection models, namely FullGrad-CAM and FullGrad-CAM++, were first developed to generate object-specific explanations by extending the current gradient-based XAI methods for image classification models. Using human attention as the objective plausibility measure, these methods achieve higher explanation plausibility. Interestingly, all current XAI methods when applied to object detection models generally produce saliency maps that are less faithful to the model than human attention maps from the same object detection task. Accordingly, human attention-guided XAI (HAG-XAI) was proposed to learn from human attention how to best combine explanatory information from the models to enhance explanation plausibility by using trainable activation functions and smoothing kernels to maximize the similarity between XAI saliency map and human attention map. The proposed XAI methods were evaluated on widely used BDD-100K, MS-COCO, and ImageNet datasets and compared with typical gradient-based and perturbation-based XAI methods. Results suggest that HAG-XAI enhanced explanation plausibility and user trust at the expense of faithfulness for image classification models, and it enhanced plausibility, faithfulness, and user trust simultaneously and outperformed existing state-of-the-art XAI methods for object detection models.
• Human attention guided XAI is proposed for more faithful and plausible explanations.
• Two gradient-based XAI methods are presented for explaining object detection models.
• Human attention is adopted as an objective plausibility measure for XAI evaluation.
• The generalization ability and robustness of the proposed XAI methods are evaluated. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
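The core idea of HAG-XAI in entry 218, combining a model's explanatory maps so the result better matches human attention, can be caricatured very simply. This is a deliberately loose sketch on synthetic maps: the paper learns trainable activation functions and smoothing kernels, whereas here a single mixing weight chosen by grid search stands in for all of that.

```python
import numpy as np

# Pick the mixture of two saliency maps that correlates best with a
# (synthetic) human attention map.

def correlation(a, b):
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
human = rng.random((8, 8))                  # stand-in human attention map
map_a = human + 0.3 * rng.random((8, 8))    # saliency map close to human attention
map_b = rng.random((8, 8))                  # unrelated saliency map

alphas = np.linspace(0, 1, 21)
scores = [correlation(w * map_a + (1 - w) * map_b, human) for w in alphas]
best = float(alphas[int(np.argmax(scores))])
print(best)
```

Because map_a is constructed to resemble the human map, the search puts nearly all the weight on it; with real saliency maps the learned combination is what trades off plausibility against faithfulness.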
219. Eye movement analysis of children's attention for midline diastema.
- Author
- Cho, Vanessa Y., Hsiao, Janet H., Chan, Antoni B., Ngo, Hien C., King, Nigel M., and Anthonappa, Robert P.
- Subjects
- EYE movements, HIDDEN Markov models, EYE, DIASTEMA (Teeth), AGE groups
- Abstract
No previous studies have investigated eye-movement patterns to show children's information processing while viewing clinical images. Therefore, this study aimed to explore children's and their educators' perception of a midline diastema by applying eye-movement analysis with hidden Markov models (EMHMM). A total of 155 children between 2.5 and 5.5 years of age and their educators (n = 34) viewed pictures with and without a midline diastema while a Tobii Pro Nano eye-tracker recorded their eye movements. Fixation data were analysed with EMHMM using both data-driven and fixed regions-of-interest (ROI) approaches. Two different eye-movement patterns were identified: an explorative pattern (76%), where the children's ROIs were predominantly around the nose and mouth, and a focused pattern (26%), where children's ROIs were precise, locating on the teeth with and without a diastema, and fixations transited among the ROIs with similar frequencies. Females had a significantly higher eye-movement preference than males for the image without a diastema. Comparisons between the different age groups showed a statistically significant difference in overall entropies: the 3.6–4.5y age groups exhibited higher entropies, indicating lower eye-movement consistency. In addition, children and their educators exhibited two specific eye-movement patterns: children with the explorative pattern looked at the midline diastema more often, while their educators focused on the image without a diastema. Thus, EMHMMs are valuable in analysing eye-movement patterns in children and adults. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
220. Individuals with insomnia misrecognize angry faces as fearful faces while missing the eyes: an eye-tracking study.
- Author
- Zhang, Jinxiao, Chan, Antoni B., Lau, Esther Yuet Ying, and Hsiao, Janet H.
- Published
- 2019
- Full Text
- View/download PDF
221. Modeling noisy annotations for crowd counting
- Author
- Wan, Jia and Chan, Antoni B.
222. Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations
- Author
- Liu, Ziquan, Cui, Yufei, and Chan, Antoni B.
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning (cs.LG), Statistics - Machine Learning (stat.ML)
- Abstract
Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice to regularize the complexity of networks. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU and max-pooling functions. As a result of homogeneity, the functions specified by the networks are invariant to the shifting of weight scales between layers. The ineffective regularizers are sensitive to such shifting and thus poorly regularize the model capacity, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is an upper bound on the input gradient of the network, so minimizing the improved regularizer also benefits adversarial robustness. Residual connections are also considered, and we show that our regularizer likewise forms an upper bound on the input gradients of such residual networks. We demonstrate the efficacy of our proposed regularizer on various datasets and neural network architectures at improving generalization and adversarial robustness.
- Comment
- 14 pages, 5 figures. Accepted by ICML 2021 Workshop on Adversarial Machine Learning
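The scale-shifting observation in entry 222 can be checked numerically in a few lines: for a two-layer ReLU network, scaling the first layer by c and the second by 1/c leaves the function unchanged but changes the weight-decay penalty. The product-of-norms quantity at the end is a simple invariant surrogate for illustration, not necessarily the paper's exact regularizer.

```python
import numpy as np

# ReLU is positively homogeneous, so relu(c*W1 x) = c*relu(W1 x) for c > 0:
# the rescaled network computes the same function, but ||W1||^2 + ||W2||^2
# is not invariant and so cannot pin down the intrinsic norm.

def forward(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0.0)          # two-layer ReLU network

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
x, c = rng.normal(size=3), 10.0

out_a, out_b = forward(x, W1, W2), forward(x, c * W1, W2 / c)
pen_a = (W1**2).sum() + (W2**2).sum()            # weight decay: changes under rescaling
pen_b = ((c * W1)**2).sum() + ((W2 / c)**2).sum()
inv_a = np.sqrt((W1**2).sum() * (W2**2).sum())   # product of layer norms:
inv_b = np.sqrt(((c * W1)**2).sum() * ((W2 / c)**2).sum())  # invariant to the shift

print(np.allclose(out_a, out_b), pen_b > pen_a, np.isclose(inv_a, inv_b))  # True True True
```

This is exactly the failure mode the abstract describes: the optimizer can shrink the weight-decay penalty by shifting scale between layers without changing the function at all.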
223. Neighbours Matter
- Author
- Wang, Qingzhong, Wang, Jiuniu, Chan, Antoni B., Huang, Siyu, Xiong, Haoyi, Li, Xingjian, and Dou, Dejing
224. Incorporating Side Information by Adaptive Convolution.
- Author
- Kang, Di, Dhar, Debarun, and Chan, Antoni B.
- Subjects
- CONVOLUTIONAL neural networks, MATHEMATICAL convolutions, COMPUTER vision, DEEP learning
- Abstract
Computer vision tasks often have side information available that is helpful to solve the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in deep learning based counting systems. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolution filter weights adapt to the current scene context via the side information. In particular, we model the filter weights as a low-dimensional manifold within the high-dimensional space of filter weights. The filter weights are generated using a learned "filter manifold" sub-network, whose input is the side information. With the help of side information and adaptive weights, the ACNN can disentangle the variations related to the side information, and extract discriminative features related to the current context (e.g. camera perspective, noise level, blur kernel parameters). We demonstrate the effectiveness of ACNN incorporating side information on 3 tasks: crowd counting, corrupted digit recognition, and image deblurring. Our experiments show that ACNN improves the performance compared to a plain CNN with a similar number of parameters and achieves similar or better than state-of-the-art performance on crowd counting task. Since existing crowd counting datasets do not contain ground-truth side information, we collect a new dataset with the ground-truth camera angle and height as the side information. We also perform ablation experiments, mainly for crowd counting, to study the helpfulness of the side information, and the effect of the placement of the adaptive convolutional layers in order to get insight about ACNNs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
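The filter-manifold idea of the ACNN in entry 224 can be sketched with a tiny generator network. This is illustrative only: the generator weights below are random and the filter is 1-D, whereas in the ACNN the filter-manifold sub-network is learned jointly with the rest of a 2-D convolutional network.

```python
import numpy as np

# Adaptive convolution sketch: a small "filter manifold" network maps side
# information (here a camera tilt in radians, a made-up scalar) to the
# weights of a 1-D convolution filter, so one layer adapts per scene context.

def filter_manifold(side, V, W):
    h = np.tanh(V @ np.array([side, 1.0]))       # hidden code from side info
    return W @ h                                 # emitted conv filter (length 3)

def adaptive_conv(signal, side, V, W):
    return np.convolve(signal, filter_manifold(side, V, W), mode="valid")

rng = np.random.default_rng(0)
V, W = rng.normal(size=(4, 2)), rng.normal(size=(3, 4))  # random stand-in generator
signal = rng.normal(size=10)

out_low = adaptive_conv(signal, 0.2, V, W)       # shallow camera tilt
out_high = adaptive_conv(signal, 1.2, V, W)      # steep camera tilt
print(out_low.shape, bool(np.allclose(out_low, out_high)))  # filters differ, so outputs differ
```

The same layer thus computes different filters for different contexts while sharing one low-dimensional generator, which is how the ACNN disentangles side-information-related variation.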
225. Scanpath modeling and classification with hidden Markov models.
- Author
- Coutrot, Antoine, Hsiao, Janet H., and Chan, Antoni B.
- Subjects
- EYE movements, INFORMATION processing, MACHINE learning, HIDDEN Markov models, DISCRIMINANT analysis, VISUAL perception
- Abstract
How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that the scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimuli-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. Firstly, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average of 55.9% correct classification rate (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Secondly, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average 81.2% correct classification rate (chance = 50%). HMMs make it possible to integrate bottom-up, top-down, and oculomotor influences into a single model of gaze behavior. This synergistic approach between behavior and machine learning will open new avenues for simple quantification of gazing behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
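The classification scheme in entry 225 can be caricatured with fully observed Markov chains instead of HMMs. This hedged toy drops the hidden states and discriminant analysis of the paper: each observer class is modeled by a transition matrix over regions of interest (ROIs), and a new scanpath goes to the class with the higher sequence log-likelihood. The ROI sequences are made up.

```python
import numpy as np

# Fit one transition matrix per class from example ROI sequences, then
# classify a new scanpath by comparing log-likelihoods under each class.

def fit_transitions(seqs, n_states):
    T = np.ones((n_states, n_states))           # add-one smoothing
    for s in seqs:
        for a, b in zip(s[:-1], s[1:]):
            T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)

def log_lik(seq, T):
    return sum(np.log(T[a, b]) for a, b in zip(seq[:-1], seq[1:]))

# ROIs 0/1/2; class A lingers within an ROI, class B keeps switching
class_a = [[0, 0, 0, 1, 1, 1, 2, 2], [1, 1, 1, 0, 0, 2, 2, 2]]
class_b = [[0, 1, 2, 0, 1, 2, 0, 1], [2, 1, 0, 2, 1, 0, 2, 1]]
Ta, Tb = fit_transitions(class_a, 3), fit_transitions(class_b, 3)

new_scanpath = [2, 2, 2, 1, 1, 0, 0, 0]         # lingering: looks like class A
print("A" if log_lik(new_scanpath, Ta) > log_lik(new_scanpath, Tb) else "A/B tie or B")
```

An HMM generalizes this by making the states latent (learned ROIs with emission densities), which is what lets the paper's method discover each observer's own regions rather than fixing them in advance.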
226. Optimal planning of municipal-scale distributed rooftop photovoltaic systems with maximized solar energy generation under constraints in high-density cities.
- Author
- Ren, Haoshan, Ma, Zhenjun, Chan, Antoni B., and Sun, Yongjun
- Subjects
- SOLAR energy, PHOTOVOLTAIC power systems, PHOTOVOLTAIC power generation, LINEAR programming, INTEGER programming, ELECTRIC power production
- Abstract
Deployment planning of distributed rooftop photovoltaic (PV) systems remains a critical challenge for high-density cities, due to complex shading effects and diversified rooftop availabilities. Furthermore, such planning for large-scale systems can be extremely complex due to the high dimensionality caused by the enormous number of buildings. To tackle this challenge, this study proposed an optimal planning strategy for municipal-scale distributed rooftop PV systems in high-density cities. The optimization problem was solved by integer linear programming, based on a high-accuracy characterization of solar energy potentials. By selecting proper rooftops for PV, electricity generation was maximized under the conflicting budget and peak-export-power constraints. A Hong Kong-based case study (including 582 real building rooftops) was conducted. The effectiveness of the proposed strategy was verified by comparison with 5,000,000 Monte-Carlo-generated alternatives. The strategy more effectively identified the proper rooftops for PV installations, achieving up to 17.7% improvements in performance-cost ratio. Furthermore, the optimal planning strategy was systematically compared with two heuristic planning methods, i.e., total-energy-prioritized and energy-intensity-prioritized methods. The strategy outperformed the heuristic methods by up to 23.3% by properly considering the trade-off between rooftop total energy and energy intensity. The developed strategy can be used to facilitate rooftop PV deployments and thus contribute to urban decarbonization.
• An optimal planning strategy is proposed for large-scale distributed rooftop PVs.
• High-dimensional optimal planning is solved by integer linear programming.
• Complex building shading effects and rooftop availabilities are considered.
• Improved performance of the planning strategy has been verified. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
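Stripped of the peak-export-power coupling, the selection problem in entry 226 reduces to a 0/1 knapsack: choose rooftops to maximize total generation under a budget. The costs and yields below are made up, and a small dynamic program stands in for the integer linear programming solver the paper uses over hundreds of real rooftops.

```python
# 0/1 knapsack over rooftops: maximize total PV generation subject to a
# budget, via the standard dynamic program over integer budget units.

def best_rooftops(costs, energy, budget):
    n = len(costs)
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                      # skip rooftop i
            if costs[i - 1] <= b:                            # or install on it
                take = best[i - 1][b - costs[i - 1]] + energy[i - 1]
                best[i][b] = max(best[i][b], take)
    return best[n][budget]

costs = [4, 3, 2, 5]      # installation cost per rooftop (budget units, made up)
energy = [7, 5, 4, 8]     # annual generation per rooftop (arbitrary units)
print(best_rooftops(costs, energy, budget=9))   # rooftops 0+1+2: cost 9, energy 16
```

A real ILP formulation adds the peak-export constraint as a second linear inequality over the same binary selection variables, which is why a generic solver is needed at the scale of 582 rooftops.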
227. RegGeoNet: Learning Regular Representations for Large-Scale 3D Point Clouds.
- Author
- Zhang, Qijian, Hou, Junhui, Qian, Yue, Chan, Antoni B., Zhang, Juyong, and He, Ying
- Subjects
- POINT cloud, DEEP learning, IMAGE processing, POINT processes, SOURCE code, POINT set theory, GRIDS (Cartography)
- Abstract
Deep learning has proven an effective tool for 3D point cloud processing. Currently, most deep set architectures are developed for sparse inputs (typically with a few thousand points), which are unable to provide sufficient structural statistics and semantic cues due to low resolutions. Since these architectures suffer from unacceptable computational and memory costs when consuming dense inputs, there is a pressing need in real-world applications to handle large-scale 3D point clouds. To bridge this gap, this paper presents a novel unsupervised neural architecture called RegGeoNet to parameterize an unstructured point set into a completely regular image structure dubbed as deep geometry image (DeepGI), such that spatial coordinates of unordered points are recorded in three-channel grid pixels. Intuitively, our goal is to embed irregular 3D surface points onto uniform 2D lattice grids, while trying to preserve local neighborhood consistency. Functionally, DeepGI serves as a generic representation modality for raw point cloud data and can be conveniently integrated into mature image processing pipelines. Driven by its unique structural characteristics, we are motivated to customize a set of efficient feature extractors that directly operate on DeepGIs for achieving a rich variety of downstream tasks. To demonstrate the potential and universality of our proposed learning paradigms built upon DeepGIs for large-scale point cloud processing, we conduct extensive experiments on various downstream tasks, including shape classification, object part segmentation, scene semantic segmentation, normal estimation, and geometry compression, where our frameworks achieve highly competitive performance, compared with state-of-the-art methods. The source code will be publicly available at https://github.com/keeganhk/RegGeoNet. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
228. Eye movement analysis with hidden Markov models (EMHMM) with co-clustering.
- Author
- Hsiao, Janet H., Lan, Hui, Zheng, Yueyuan, and Chan, Antoni B.
- Subjects
- HIDDEN Markov models, MARKOV processes, EYE movements, EYE tracking, COGNITIVE styles, DISCIPLINE of children
- Abstract
The eye movement analysis with hidden Markov models (EMHMM) method provides quantitative measures of individual differences in eye-movement patterns. However, it is limited to tasks where stimuli have the same feature layout (e.g., faces). Here we proposed to combine EMHMM with the data mining technique co-clustering to discover participant groups with consistent eye-movement patterns across stimuli for tasks involving stimuli with different feature layouts. By applying this method to eye movements in scene perception, we discovered explorative (switching between the foreground and background information or different regions of interest) and focused (mainly looking at the foreground with less switching) eye-movement patterns among Asian participants. Higher similarity to the explorative pattern predicted better foreground object recognition performance, whereas higher similarity to the focused pattern was associated with better feature integration in the flanker task. These results have important implications for using eye tracking as a window into individual differences in cognitive abilities and styles. Thus, EMHMM with co-clustering provides quantitative assessments of eye-movement patterns across stimuli and tasks. It can be applied to many other real-life visual tasks, making a significant impact on the use of eye tracking to study cognitive behavior across disciplines. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
229. Applying the Hidden Markov Model to Analyze Urban Mobility Patterns: An Interdisciplinary Approach.
- Author
-
Loo, Becky P. Y., Zhang, Feiyang, Hsiao, Janet H., Chan, Antoni B., and Lan, Hui
- Subjects
- *
HIDDEN Markov models , *URBAN research , *MARITAL status , *EYE movements , *INTERNET of things - Abstract
With the emergence of the Internet of Things (IoT), there has been a proliferation of urban studies using big data. Yet another type of urban research innovation, one involving interdisciplinary thinking and methods, remains underdeveloped. This paper represents an attempt to adopt a Hidden Markov Model (HMM) toolbox developed in Computer Science for the analysis of eye movement patterns in Psychology to answer urban mobility questions in Geography. The main idea is that both people's eye movements and travel behavior follow the stop-travel-stop pattern, which can be summarized using HMM. Methodological challenges were addressed by adjusting the HMM to analyze territory-wide travel survey data in Hong Kong, China. By using the adjusted toolbox to identify the activity-travel patterns of working adults in Hong Kong, two distinct groups of balanced (38.4%) and work-oriented (61.6%) lifestyles were identified. With some notable exceptions, working adults living in the urban core had a more work-oriented lifestyle. Those with a balanced lifestyle had a relatively compact zone of non-work activities around their homes but a relatively long commuting distance. Furthermore, working females tended to spend more time at home than their counterparts, regardless of their marital status and lifestyle. Overall, this interdisciplinary research demonstrates an attempt to integrate spatial, temporal, and sequential information for understanding people's behavior in urban mobility research. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
230. Hidden Markov model analysis reveals the advantage of analytic eye movement patterns in face recognition across cultures.
- Author
-
Chuk, Tim, Crookes, Kate, Hayward, William G., Chan, Antoni B., and Hsiao, Janet H.
- Subjects
- *
EYE tracking , *FACE perception , *CROSS-cultural differences , *HIDDEN Markov models , *INDIVIDUAL differences , *TASK performance - Abstract
It remains controversial whether culture modulates eye movement behavior in face recognition. Inconsistent results have been reported regarding whether cultural differences in eye movement patterns exist, whether these differences affect recognition performance, and whether participants use similar eye movement patterns when viewing faces from different ethnicities. These inconsistencies may be due to substantial individual differences in eye movement patterns within a cultural group. Here we addressed this issue by conducting individual-level eye movement data analysis using hidden Markov models (HMMs). Each individual’s eye movements were modeled with an HMM. We clustered the individual HMMs according to their similarities and discovered three common patterns in both Asian and Caucasian participants: holistic (looking mostly at the face center), left-eye-biased analytic (looking mostly at the two individual eyes in addition to the face center with a slight bias to the left eye), and right-eye-biased analytic (looking mostly at the right eye in addition to the face center). The frequency of participants adopting the three patterns did not differ significantly between Asians and Caucasians, suggesting little modulation from culture. Significantly more participants (75%) showed similar eye movement patterns when viewing own- and other-race faces than different patterns. Most importantly, participants with left-eye-biased analytic patterns performed significantly better than those using either holistic or right-eye-biased analytic patterns. These results suggest that active retrieval of facial feature information through an analytic eye movement pattern may be optimal for face recognition regardless of culture. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
231. Martial Arts, Dancing and Sports dataset: A challenging stereo and multi-view dataset for 3D human pose estimation.
- Author
-
Zhang, Weichen, Liu, Zhiguang, Zhou, Liuyang, Leung, Howard, and Chan, Antoni B.
- Subjects
- *
POSE estimation (Computer vision) , *MARTIAL arts , *DANCE , *MOTION capture (Human mechanics) , *GAUSSIAN function - Abstract
Human pose estimation is one of the most popular research topics in the past two decades, especially with the introduction of human pose datasets for benchmark evaluation. These datasets usually capture simple daily life actions. Here, we introduce a new dataset, the Martial Arts, Dancing and Sports (MADS), which consists of challenging martial arts actions (Tai-chi and Karate), dancing actions (hip-hop and jazz), and sports actions (basketball, volleyball, football, rugby, tennis and badminton). Two martial arts masters, two dancers and an athlete performed these actions while being recorded with either multiple cameras or a stereo depth camera. In the multi-view or single-view setting, we provide three color views for 2D image-based human pose estimation algorithms. For depth-based human pose estimation, we provide stereo-based depth images from a single view. All videos have corresponding synchronized and calibrated ground-truth poses, which were captured using a Motion Capture system. We provide initial baseline results on our dataset using a variety of tracking frameworks, including a generative tracker based on the annealing particle filter and robust likelihood function, a discriminative tracker using twin Gaussian processes [1], and hybrid trackers, such as the Personalized Depth Tracker [2]. The results of our evaluation suggest that discriminative approaches perform better than generative approaches when there are enough representative training samples, and that generative methods are more robust to diversity of poses, but can fail to track when the motion is too quick for the effective search range of the particle filter. The data and the accompanying code will be made available to the research community. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
232. Do portrait artists have enhanced face processing abilities? Evidence from hidden Markov modeling of eye movements.
- Author
-
Hsiao, Janet H., An, Jeehye, Zheng, Yueyuan, and Chan, Antoni B.
- Subjects
- *
HIDDEN Markov models , *EYE movements , *FACE perception , *MARKOV processes , *SELECTIVITY (Psychology) , *STIMULUS & response (Psychology) , *ART , *RESEARCH , *RESEARCH methodology , *MEDICAL cooperation , *EVALUATION research , *EYE , *COMPARATIVE studies , *VISUAL perception - Abstract
Recent research has suggested the importance of part-based information in face recognition in addition to global, whole-face information. Nevertheless, face drawing experience was reported to enhance selective attention to the eyes but did not improve face recognition performance, leading to speculations about limited plasticity in adult face recognition. Here we examined the mechanism underlying the limited advantage of face drawing experience in face recognition through the Eye Movement analysis with Hidden Markov Models (EMHMM) approach. We found that portrait artists showed more eyes-focused eye movement patterns and outperformed novices in face matching, and participants' drawing rating was correlated with both eye movement pattern and performance. In contrast, portrait artists did not outperform novices and did not differ from novices in eye movement pattern in either the face recognition or part-whole tasks, although the eyes-focused pattern was associated with better recognition performance and longer response times in the whole condition relative to the part condition. Interestingly, in contrast to the face recognition and part-whole tasks, participants' performance in face matching was predicted by their drawing rating but not eye movement pattern. These results suggested that artists' advantage in face processing is specific to tasks similar to their drawing experience such as face matching, and may be related to their better ability in extracting identity-invariant information between two faces rather than more eyes-focused eye movement patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
233. Gradient-Based Instance-Specific Visual Explanations for Object Specification and Object Discrimination.
- Author
-
Zhao C, Hsiao JH, and Chan AB
- Abstract
We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous works on classification activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to one-stage, two-stage, and transformer-based detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state-of-the-art in terms of both effectiveness and efficiency. We discuss two explanation tasks for object detection: 1) object specification: what is the important region for the prediction? 2) object discrimination: which object is detected? Aiming at these two aspects, we present a detailed analysis of the visual explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM. Furthermore, we investigate user trust in the explanation maps, how well the visual explanations of object detectors agree with human explanations (as measured through human eye gaze), and whether this agreement is related to user trust. Finally, we also propose two applications, ODAM-KD and ODAM-NMS, based on these two abilities of ODAM. ODAM-KD utilizes the object specification of ODAM to generate top-down attention for key predictions and instruct the knowledge distillation of object detection. ODAM-NMS considers the location of the model's explanation for each prediction to distinguish duplicate detected objects. A training scheme, ODAM-Train, is proposed to improve the quality of object discrimination and help with ODAM-NMS.
- Published
- 2024
- Full Text
- View/download PDF
234. Another Perspective of Over-Smoothing: Alleviating Semantic Over-Smoothing in Deep GNNs.
- Author
-
Li J, Zhang Q, Liu W, Chan AB, and Fu YG
- Abstract
Graph neural networks (GNNs) are widely used for analyzing graph-structural data and solving graph-related tasks due to their powerful expressiveness. However, existing off-the-shelf GNN-based models usually consist of no more than three layers. Deeper GNNs usually suffer from severe performance degradation due to several issues, including the infamous "over-smoothing" issue, which restricts the further development of GNNs. In this article, we investigate the over-smoothing issue in deep GNNs. We discover that over-smoothing not only results in indistinguishable embeddings of graph nodes, but also alters and even corrupts their semantic structures, dubbed semantic over-smoothing. Existing techniques, e.g., graph normalization, aim at handling the former concern but neglect the importance of preserving the semantic structures in the spatial domain, which hinders the further improvement of model performance. To alleviate this concern, we propose a cluster-keeping sparse aggregation strategy to preserve the semantic structure of embeddings in deep GNNs (especially for spatial GNNs). In particular, our strategy heuristically redistributes the extent of aggregation for all the nodes across layers, instead of aggregating them equally, so that deep layers aggregate concise yet meaningful information. Without any bells and whistles, it can be easily implemented as a plug-and-play structure of GNNs via weighted residual connections. Last, we analyze the over-smoothing issue on GNNs with weighted residual structures and conduct experiments to demonstrate performance comparable to the state of the art.
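The plug-and-play weighted-residual structure mentioned above can be illustrated with a toy NumPy propagation rule (a minimal sketch only: the mean aggregation, the function name, and the fixed mixing weight `alpha` are illustrative assumptions, not the paper's cluster-keeping strategy, which redistributes aggregation extents per node):

```python
import numpy as np

def propagate(adj, feats, n_layers, alpha):
    """Mean-aggregation GNN layers with a weighted residual connection.

    Each layer mixes the neighborhood average with the node's own
    embedding; alpha < 1 slows the collapse of all embeddings toward
    a common vector (over-smoothing) as depth grows.
    """
    a_norm = adj / adj.sum(axis=1, keepdims=True)   # row-normalized adjacency
    h = feats
    for _ in range(n_layers):
        h = alpha * (a_norm @ h) + (1 - alpha) * h  # weighted residual connection
    return h
```

With `alpha = 1` this reduces to plain mean aggregation, which over-smooths quickly in deep stacks; `alpha < 1` keeps node embeddings distinguishable for longer.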
- Published
- 2024
- Full Text
- View/download PDF
235. Generalized Characteristic Function Loss for Crowd Analysis in the Frequency Domain.
- Author
-
Shu W, Wan J, and Chan AB
- Abstract
Typical approaches that learn crowd density maps are limited to extracting the supervisory information from the loosely organized spatial information in the crowd dot/density maps. This paper tackles this challenge by performing the supervision in the frequency domain. More specifically, we devise a new loss function for crowd analysis called generalized characteristic function loss (GCFL). This loss carries out two steps: 1) transforming the spatial information in density or dot maps to the frequency domain; 2) calculating a loss value between their frequency contents. For step 1, we establish a series of theoretical foundations by extending the definition of the characteristic function for probability distributions to density maps, as well as proving some vital properties of the extended characteristic function. After taking the characteristic function of the density map, its information in the frequency domain is well-organized and hierarchically distributed, while in the spatial domain it is loosely organized and dispersed everywhere. In step 2, we design a loss function that fits the information organization in the frequency domain, allowing the exploitation of the well-organized frequency information for the supervision of crowd analysis tasks. The loss function can be adapted to various crowd analysis tasks through the specification of its window functions. In this paper, we demonstrate its power in three tasks: crowd counting, crowd localization, and noisy crowd counting. We show the advantages of our GCFL compared to other SOTA losses and its competitiveness with other SOTA methods through theoretical analysis and empirical results on benchmark datasets. Our codes are available at https://github.com/wbshu/Crowd_Counting_in_the_Frequency_Domain.
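The two steps of the loss can be sketched in NumPy for crowd counting (illustrative only: the brute-force frequency grid, flat window, and L1 comparison below are simplifying assumptions, not the paper's exact window functions or derivation):

```python
import numpy as np

def char_fn(density, freqs):
    """Characteristic function extended from probability distributions to
    density maps: phi(u, v) = sum_{x,y} D(x, y) * exp(i * (u*x + v*y))."""
    h, w = density.shape
    ys, xs = np.mgrid[0:h, 0:w]
    phi = np.empty((len(freqs), len(freqs)), dtype=complex)
    for a, u in enumerate(freqs):
        for b, v in enumerate(freqs):
            phi[a, b] = (density * np.exp(1j * (u * xs + v * ys))).sum()
    return phi

def gcf_loss(pred_density, dot_map, n_freq=8, bandwidth=1.0):
    """Step 1: move both maps to the frequency domain; step 2: compare
    their frequency contents (here a plain L1 over a flat window)."""
    freqs = np.linspace(-bandwidth, bandwidth, n_freq)
    return np.abs(char_fn(pred_density, freqs) - char_fn(dot_map, freqs)).mean()
```

Because the extended characteristic function at zero frequency equals the total count, count errors show up directly in the low-frequency content.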
- Published
- 2024
- Full Text
- View/download PDF
236. Modeling Noisy Annotations for Point-Wise Supervision.
- Author
-
Wan J, Wu Q, and Chan AB
- Abstract
Point-wise supervision is widely adopted in computer vision tasks such as crowd counting and human pose estimation. In practice, the noise in point annotations may significantly affect the performance and robustness of an algorithm. In this paper, we investigate the effect of annotation noise in point-wise supervision and propose a series of robust loss functions for different tasks. In particular, the point annotation noise includes spatial-shift noise, missing-point noise, and duplicate-point noise. The spatial-shift noise is the most common one and exists in crowd counting, pose estimation, visual tracking, etc., while the missing-point and duplicate-point noises usually appear in dense annotations, such as crowd counting. In this paper, we first consider the shift noise by modeling the real locations as random variables and the annotated points as noisy observations. The probability density function of the intermediate representation (a smooth heat map generated from dot annotations) is derived, and the negative log-likelihood is used as the loss function to naturally model the shift uncertainty in the intermediate representation. The missing-point and duplicate-point noises are further modeled empirically, under the assumption that such noise appears in high-density regions with high probability. We apply the method to crowd counting, human pose estimation, and visual tracking, propose robust loss functions for those tasks, and achieve superior performance and robustness on widely used datasets.
- Published
- 2023
- Full Text
- View/download PDF
237. Visual attention to own- versus other-race faces: Perspectives from learning mechanisms and task demands.
- Author
-
Hsiao JH and Chan AB
- Subjects
- Humans, Learning, Recognition, Psychology, Eye Movements, Racial Groups psychology, Pattern Recognition, Visual
- Abstract
Multiple factors have been proposed to contribute to the other-race effect in face recognition, including perceptual expertise and social-cognitive accounts. Here, we propose to understand the effect and its contributing factors from the perspectives of learning mechanisms that involve joint learning of visual attention strategies and internal representations for faces, which can be modulated by quality of contact with other-race individuals including emotional and motivational factors. Computational simulations of this process will enhance our understanding of interactions among factors and help resolve inconsistent results in the literature. In particular, since learning is driven by task demands, visual attention effects observed in different face-processing tasks, such as passive viewing or recognition, are likely to be task specific (although may be associated) and should be examined and compared separately. When examining visual attention strategies, the use of more data-driven and comprehensive eye movement measures, taking both spatial-temporal pattern and consistency of eye movements into account, can lead to novel discoveries in other-race face processing. The proposed framework and analysis methods may be applied to other tasks of real-life significance such as face emotion recognition, further enhancing our understanding of the relationship between learning and visual cognition., (© 2023 The Authors. British Journal of Psychology published by John Wiley & Sons Ltd on behalf of The British Psychological Society.)
- Published
- 2023
- Full Text
- View/download PDF
238. Author Correction: Understanding the role of eye movement consistency in face recognition and autism through integrating deep neural networks and hidden Markov models.
- Author
-
Hsiao JH, An J, Hui VKS, Zheng Y, and Chan AB
- Published
- 2023
- Full Text
- View/download PDF
239. On becoming socially anxious: Toddlers' attention bias to fearful faces.
- Author
-
Wang L, Hsiao JH, Chan AB, Cheung J, Hung S, and Au TK
- Subjects
- Humans, Child, Preschool, Infant, Anxiety, Anger, Happiness, Emotions, Facial Expression, Fear psychology
- Abstract
Early attention bias to threat-related negative emotions may lead children to overestimate dangers in social situations. This study examined its emergence and how it might develop in tandem with a known predictor, namely temperamental shyness, for toddlers' fear of strangers in 168 Chinese toddlers. Measurable individual differences in such attention bias to fearful faces were found and remained stable from age 12 to 18 months. When shown photos of paired happy versus fearful or happy versus angry faces, toddlers consistently gazed more initially at fearful faces and had longer initial fixations and total fixations at fearful faces compared with happy faces. However, they consistently gazed more initially at happy faces compared with angry faces, and had a longer total fixation at angry faces only at 18 months. Stranger anxiety at 12 months predicted attention bias to fearful faces at 18 months. Temperamentally shyer 12-month-olds went on to show stronger attention bias to fearful faces at 18 months, and their fear of strangers also increased more from 12 to 18 months. Together with prior research suggesting attention bias to angry or fearful faces foretelling social anxiety, the present findings point to likely positive feedback loops among attention bias to fearful faces, temperamental shyness, and stranger anxiety in early childhood. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
- Published
- 2023
- Full Text
- View/download PDF
240. On Distinctive Image Captioning via Comparing and Reweighting.
- Author
-
Wang J, Xu W, Wang Q, and Chan AB
- Abstract
Recent image captioning models are achieving impressive results based on popular metrics, i.e., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics, which only consider the overlap between the generated captions and human annotation, can result in using common words and phrases that lack distinctiveness, i.e., many similar images have the same caption. In this paper, we aim to improve the distinctiveness of image captions via comparing and reweighting with a set of similar images. First, we propose a distinctiveness metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric reveals that the human annotations of each image in the MSCOCO dataset are not equivalent in distinctiveness; however, previous works normally treat the human annotations equally during training, which could be a reason for generating less distinctive captions. In contrast, we reweight each ground-truth caption according to its distinctiveness during training. We further integrate a long-tailed weight strategy to highlight the rare words that contain more information, and sample captions from the similar image set as negative examples to encourage the generated sentence to be unique. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study.
- Published
- 2023
- Full Text
- View/download PDF
241. Understanding the role of eye movement consistency in face recognition and autism through integrating deep neural networks and hidden Markov models.
- Author
-
Hsiao JH, An J, Hui VKS, Zheng Y, and Chan AB
- Abstract
Greater eyes-focused eye movement pattern during face recognition is associated with better performance in adults but not in children. We test the hypothesis that higher eye movement consistency across trials, instead of a greater eyes-focused pattern, predicts better performance in children since it reflects capacity in developing visual routines. We first simulated visual routine development through combining deep neural network and hidden Markov model that jointly learn perceptual representations and eye movement strategies for face recognition. The model accounted for the advantage of eyes-focused pattern in adults, and predicted that in children (partially trained models) consistency but not pattern of eye movements predicted recognition performance. This result was then verified with data from typically developing children. In addition, lower eye movement consistency in children was associated with autism diagnosis, particularly autistic traits in social skills. Thus, children's face recognition involves visual routine development through social exposure, indexed by eye movement consistency., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
242. Understanding the collinear masking effect in visual search through eye tracking.
- Author
-
Hsiao JH, Chan AB, An J, Yeh SL, and Jingling L
- Subjects
- Aged, Attention, Fixation, Ocular, Humans, Reaction Time, Visual Perception, Young Adult, Eye-Tracking Technology, Saccades
- Abstract
Recent research has reported that, while both orientation contrast and collinearity increase target salience in visual search, a combination of the two counterintuitively masks a local target. Through eye-tracking and eye-movement analysis with hidden Markov models (EMHMM), here we showed that this collinear masking effect was associated with reduced eye-fixation consistency (as measured in entropy) at the central fixation cross prior to the search display presentation. As a decreased precision of saccade landing position is shown to be related to attention shift away from the saccadic target, our result suggested that the collinear masking effect may be related to attention shift to a non-saccadic-goal location in expectation of the search display before saccading to the central fixation cross. This attention shift may consequently interfere with attention capture by the collinear distractor containing the target, resulting in the masking effect. In contrast, although older adults had longer response times, more dispersed eye-movement pattern, and lower eye-movement consistency than young adults during visual search, the two age groups did not differ in the masking effect, suggesting limited contribution from ageing-related cognitive decline. Thus, participants' pre-saccadic attention shift prior to search may be an important factor influencing their search behavior., (© 2021. The Psychonomic Society, Inc.)
- Published
- 2021
- Full Text
- View/download PDF
243. Eye movement analysis with switching hidden Markov models.
- Author
-
Chuk T, Chan AB, Shimojo S, and Hsiao JH
- Subjects
- Humans, Individuality, Markov Chains, Probability, Eye Movements, Face
- Abstract
Here we propose the eye movement analysis with switching hidden Markov model (EMSHMM) approach to analyzing eye movement data in cognitive tasks involving cognitive state changes. We used a switching hidden Markov model (SHMM) to capture a participant's cognitive state transitions during the task, with eye movement patterns during each cognitive state being summarized using a regular HMM. We applied EMSHMM to a face preference decision-making task with two pre-assumed cognitive states (exploration and preference-biased periods), and we discovered two common eye movement patterns through clustering the cognitive state transitions. One pattern showed both a later transition from the exploration to the preference-biased cognitive state and a stronger tendency to look at the preferred stimulus at the end, and was associated with higher decision inference accuracy at the end; the other pattern entered the preference-biased cognitive state earlier, leading to earlier above-chance inference accuracy in a trial but lower inference accuracy at the end. This finding was not revealed by any other method. As compared with our previous HMM method, which assumes no cognitive state change (i.e., EMHMM), EMSHMM captured eye movement behavior in the task better, resulting in higher decision inference accuracy. Thus, EMSHMM reveals and provides quantitative measures of individual differences in cognitive behavior/style, making a significant impact on the use of eyetracking to study cognitive behavior across disciplines.
- Published
- 2020
- Full Text
- View/download PDF
244. Density-Preserving Hierarchical EM Algorithm: Simplifying Gaussian Mixture Models for Approximate Inference.
- Author
-
Lei Yu, Tianyu Yang, and Chan AB
- Abstract
We propose an algorithm for simplifying a finite mixture model into a reduced mixture model with fewer mixture components. The reduced model is obtained by maximizing a variational lower bound of the expected log-likelihood of a set of virtual samples. We develop three applications for our mixture simplification algorithm: recursive Bayesian filtering using Gaussian mixture model posteriors, KDE mixture reduction, and belief propagation without sampling. For recursive Bayesian filtering, we propose an efficient algorithm for approximating an arbitrary likelihood function as a sum of scaled Gaussians. Experiments on synthetic data, human location modeling, visual tracking, and vehicle self-localization show that our algorithm can be widely used for probabilistic data analysis, and is more accurate than other mixture simplification methods.
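The flavor of mixture simplification can be sketched for 1-D Gaussian mixtures (a hard-assignment variant with moment matching, written purely for illustration; the paper instead maximizes a variational lower bound over virtual samples, so the function names, KL criterion, and initialization below are assumptions):

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for 1-D Gaussians."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def reduce_gmm(w, mu, sigma, k, n_iters=20):
    """Greedy reduction of a 1-D Gaussian mixture to k components.

    E-step: assign each base component to the reduced component that
    minimizes KL(base || reduced); M-step: merge each group by matching
    its zeroth, first, and second moments.
    """
    idx = np.argsort(w)[-k:]                # initialize from the heaviest components
    W = w[idx].astype(float)
    M = mu[idx].astype(float)
    S = sigma[idx].astype(float)
    for _ in range(n_iters):
        d = kl_gauss(mu[:, None], sigma[:, None], M[None, :], S[None, :])
        z = d.argmin(axis=1)                # hard assignment of base components
        for j in range(k):
            g = z == j
            if not g.any():
                continue
            W[j] = w[g].sum()
            M[j] = (w[g] * mu[g]).sum() / W[j]
            S[j] = np.sqrt((w[g] * (sigma[g]**2 + (mu[g] - M[j])**2)).sum() / W[j])
    return W / W.sum(), M, S
```

Moment matching guarantees each merged component preserves the mass, mean, and variance of its group, which is why the reduced model stays close to the original where the components cluster.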
- Published
- 2019
- Full Text
- View/download PDF
245. Eye-movement patterns in face recognition are associated with cognitive decline in older adults.
- Author
-
Chan CYH, Chan AB, Lee TMC, and Hsiao JH
- Subjects
- Adolescent, Adult, Aged, Aged, 80 and over, Female, Humans, Male, Young Adult, Aging physiology, Cognitive Dysfunction physiopathology, Executive Function physiology, Facial Recognition physiology, Fixation, Ocular physiology
- Abstract
The Hidden Markov Modeling approach for eye-movement data analysis is able to quantitatively assess differences and similarities among individual patterns. Here we applied this approach to examine the relationships between eye-movement patterns in face recognition and age-related cognitive decline. We found that significantly more older than young adults adopted "holistic" patterns, in which most eye fixations landed around the face center, as opposed to "analytic" patterns, in which eye movements switched among the two eyes and the face center. Participants showing analytic patterns had better performance than those with holistic patterns regardless of age. Interestingly, older adults with lower cognitive status (as assessed by the Montreal Cognitive Assessment), particularly in executive and visual attention functioning (as assessed by Tower of London and Trail Making Tests) were associated with a higher likelihood of holistic patterns. This result suggests the possibility of using eye movements as an easily deployable screening assessment for cognitive decline in older adults.
- Published
- 2018
- Full Text
- View/download PDF
246. Enhanced figure-ground classification with background prior propagation.
- Author
-
Chen Y and Chan AB
- Abstract
We present an adaptive figure-ground segmentation algorithm that is capable of extracting foreground objects in a generic environment. Starting from an interactively assigned background mask, an initial background prior is defined and multiple soft-label partitions are generated from different foreground priors by progressive patch merging. These partitions are fused to produce a foreground probability map. The probability map is then binarized via threshold sweeping to create multiple hard-label candidates. A set of segmentation hypotheses is formed using different evaluation scores. From this set, the hypothesis with maximal local stability is propagated as the new background prior, and the segmentation process is repeated until convergence. Similarity voting is used to select a winner set, and the corresponding hypotheses are fused to yield the final segmentation result. Experiments indicate that our method performs at or above the current state-of-the-art on several data sets, with particular success on challenging scenes that contain irregular or multiple-connected foregrounds.
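The threshold-sweeping step that turns the fused probability map into multiple hard-label candidates can be sketched directly (a generic illustration; the threshold range and count are assumptions, and the paper's hypothesis scoring and stability propagation are not shown):

```python
import numpy as np

def threshold_sweep(prob_map, n_thresh=9):
    """Binarize a foreground probability map at evenly spaced thresholds,
    producing multiple hard-label segmentation candidates; downstream
    scoring then selects among these hypotheses."""
    thresholds = np.linspace(0.1, 0.9, n_thresh)
    candidates = [(prob_map >= t).astype(np.uint8) for t in thresholds]
    return candidates, thresholds
```

As the threshold rises, the candidate foreground can only shrink, so the sweep yields a nested family of hypotheses from permissive to conservative.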
- Published
- 2015
- Full Text
- View/download PDF
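The threshold-sweeping step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes only that a fused foreground probability map is binarized at several thresholds to produce hard-label candidate masks; the toy probability map and threshold values are made up for the example.

```python
import numpy as np

def sweep_thresholds(prob_map, thresholds):
    """Binarize a foreground probability map at several thresholds,
    producing one hard-label candidate mask per threshold."""
    return [(prob_map >= t).astype(np.uint8) for t in thresholds]

# Toy 4x4 probability map: a bright foreground blob in the centre.
prob = np.array([[0.1, 0.2, 0.2, 0.1],
                 [0.2, 0.8, 0.9, 0.2],
                 [0.2, 0.9, 0.7, 0.2],
                 [0.1, 0.2, 0.2, 0.1]])

candidates = sweep_thresholds(prob, thresholds=[0.5, 0.75])
# The lower threshold keeps the whole blob (4 pixels); the higher
# threshold keeps only the most confident pixels (3 pixels).
```

In the full algorithm, each such candidate mask would then be scored, and the most locally stable hypothesis fed back as the next background prior.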
247. A robust likelihood function for 3D human pose tracking.
- Author
-
Zhang W, Shang L, and Chan AB
- Subjects
- Algorithms, Computer Simulation, Humans, Models, Statistical, Walking, Image Processing, Computer-Assisted methods, Movement physiology, Posture physiology, Video Recording methods
- Abstract
Recent works on 3D human pose tracking using unsupervised methods typically focus on improving the optimization framework to find a better maximum of the likelihood function (i.e., the tracker). In contrast, in this paper, we focus on improving the likelihood function, by making it more robust and less ambiguous, thus making the optimization task easier. In particular, we propose an exponential chamfer distance for model matching that is robust to small pose changes, and a part-based model that is better able to localize partially occluded and overlapping parts. Using a standard annealing particle filter and a simple diffusion motion model, the proposed likelihood function obtains significantly lower error than other unsupervised tracking methods on the HumanEva dataset. Noting that the joint system of the tracker's body model is different from the joint system of the motion-capture ground-truth model, we propose a novel method for transforming between the two joint systems. Applying this bias correction, our part-based likelihood obtains results equivalent to state-of-the-art supervised tracking methods.
- Published
- 2014
- Full Text
- View/download PDF
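The exponential chamfer distance mentioned above can be sketched in a few lines. This is an illustrative reading, not the paper's exact formulation: it assumes each projected model point is matched to its nearest observed point and the distances are passed through a saturating exponential, so that large mismatches contribute a bounded penalty; the `sigma` bandwidth and point sets are made up for the example.

```python
import numpy as np

def exp_chamfer(model_pts, obs_pts, sigma=2.0):
    """Exponential chamfer similarity: mean over model points of
    exp(-d^2 / sigma^2), where d is the distance to the nearest
    observed point. Values lie in (0, 1]; because exp() saturates,
    distant outliers cannot dominate, making the score robust to
    small pose perturbations."""
    model_pts = np.asarray(model_pts, dtype=float)
    obs_pts = np.asarray(obs_pts, dtype=float)
    # Pairwise distances between model points and observed points.
    d = np.linalg.norm(model_pts[:, None, :] - obs_pts[None, :, :], axis=2)
    nearest = d.min(axis=1)  # nearest observed point for each model point
    return float(np.mean(np.exp(-nearest**2 / sigma**2)))

pts = [(0, 0), (1, 1), (2, 2)]
shifted = [(0, 1), (1, 2), (2, 3)]
perfect = exp_chamfer(pts, pts)       # identical point sets -> 1.0
nearby = exp_chamfer(pts, shifted)    # small shift -> slightly below 1.0
```

A raw chamfer distance would grow without bound as points drift apart; the exponential form instead flattens out, which is what makes the resulting likelihood less ambiguous for the particle filter.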
248. Understanding eye movements in face recognition using hidden Markov models.
- Author
-
Chuk T, Chan AB, and Hsiao JH
- Subjects
- Adolescent, Female, Humans, Male, Markov Chains, Models, Statistical, Probability, Young Adult, Eye Movements physiology, Face physiology, Pattern Recognition, Visual physiology, Recognition, Psychology physiology
- Abstract
We use a hidden Markov model (HMM) based approach to analyze eye movement data in face recognition. HMMs are statistical models that are specialized in handling time-series data. We conducted a face recognition task with Asian participants, and modeled each participant's eye movement pattern with an HMM, which summarized the participant's scan paths in face recognition with both regions of interest and the transition probabilities among them. By clustering these HMMs, we showed that participants' eye movements could be categorized into holistic or analytic patterns, demonstrating significant individual differences even within the same culture. Participants with the analytic pattern had longer response times, but did not differ significantly in recognition accuracy from those with the holistic pattern. We also found that correct and wrong recognitions were associated with distinctive eye movement patterns; the difference between the two patterns lies in the transitions rather than the locations of the fixations alone. (© 2014 ARVO.)
- Published
- 2014
- Full Text
- View/download PDF
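The HMM representation in the abstract above pairs regions of interest with a transition matrix, and a fixation sequence can be scored against such a model with the standard forward algorithm. The toy model below is a loose sketch under simplifying assumptions: the state names, ROI labels, and all probabilities are invented for illustration, and fixations are treated as already-assigned discrete ROI labels, whereas the paper learns Gaussian ROIs directly from fixation coordinates.

```python
import numpy as np

# Toy two-state eye-movement HMM over three discrete ROI labels
# (0 = left eye, 1 = right eye, 2 = face centre).
pi = np.array([0.6, 0.4])                  # initial state probabilities
A  = np.array([[0.7, 0.3],                 # state transition matrix
               [0.4, 0.6]])
B  = np.array([[0.45, 0.45, 0.10],         # emissions: "eye-switching" state
               [0.05, 0.05, 0.90]])        # emissions: "centre" state

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(fixation sequence | HMM)."""
    alpha = pi * B[:, obs[0]]
    log_l = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()          # rescale each step to avoid underflow
        log_l += np.log(s)
        alpha = alpha / s
    return log_l

holistic = [2, 2, 2, 2, 2]   # fixations clustered at the face centre
analytic = [0, 1, 0, 2, 1]   # fixations switching between the eyes

ll_holistic = log_likelihood(holistic, pi, A, B)
ll_analytic = log_likelihood(analytic, pi, A, B)
# Under this centre-heavy toy HMM, the holistic sequence scores higher.
```

Clustering participants, as in the paper, then amounts to grouping such per-participant HMMs by how well each model explains the others' fixation sequences.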