41 results for "Roland Goecke"
Search Results
2. Feature Map Augmentation to Improve Rotation Invariance in Convolutional Neural Networks
- Author
- Dharmendra Sharma, Dinesh Kumar, and Roland Goecke
- Subjects
Machine vision, Computer science, Computation, Deep learning, Feature extraction, Invariant (physics), Convolutional neural network, Computer vision, Artificial intelligence, Rotation (mathematics), MNIST database
- Abstract
Whilst it is a trivial task for a human vision system to recognize and detect objects with good accuracy, making computer vision algorithms achieve the same feat remains an active area of research. A human vision system recognizes objects seen once with high accuracy despite alterations to their appearance through transformations such as rotation, translation, scaling, distortion and occlusion, making it a remarkably spatially invariant biological vision system. To make computer algorithms such as Convolutional Neural Networks (CNNs) spatially invariant, one popular practice is to introduce variations in the data set through data augmentation. This achieves good results but comes with increased computation cost. In this paper, we address the rotation transformation and, instead of using data augmentation, propose a novel method that allows CNNs to improve rotation invariance by augmenting feature maps. This is achieved by creating a rotation transformer layer called the Rotation Invariance Transformer (RiT) that can be placed at the output end of a convolution layer. Incoming features are rotated by a given set of rotation parameters, which are then passed to the next layer. We test our technique on the benchmark CIFAR10 and MNIST datasets in a setting where our RiT layer is placed between the feature extraction and classification layers of the CNN. Our results show promising improvements in the network's ability to be rotation invariant across classes, with no increase in model parameters.
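The abstract does not come with code; as a rough sketch of the feature-map rotation idea (not the authors' implementation, and restricted to 90-degree rotations, which torch.rot90 supports without interpolation), a RiT-style layer might look like this:

```python
import torch
import torch.nn as nn

class RotationInvarianceTransformer(nn.Module):
    """Hypothetical sketch of a rotation-augmentation layer: incoming
    feature maps are rotated by a fixed set of angles and the rotated
    copies are stacked along the batch dimension, so subsequent layers
    see every rotated variant. Class name and details are assumptions."""

    def __init__(self, k_rotations=(0, 1, 2, 3)):
        super().__init__()
        self.k_rotations = k_rotations  # multiples of 90 degrees

    def forward(self, x):  # x: (N, C, H, W) feature maps
        rotated = [torch.rot90(x, k, dims=(2, 3)) for k in self.k_rotations]
        return torch.cat(rotated, dim=0)  # (N * len(k_rotations), C, H, W)

# Placed between feature extraction and classification, as in the paper:
# feats = backbone(images)
# feats = RotationInvarianceTransformer()(feats)
```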
- Published
- 2020
- Full Text
- View/download PDF
3. MSMCT: Multi-State Multi-Camera Tracker
- Author
- Behzad Bozorgtabar and Roland Goecke
- Subjects
Optimization problem, Computer science, Association (object-oriented programming), Variation (game tree), Similarity measure, Tracking (particle physics), Media Technology, Trajectory, Eye tracking, Computer vision, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
Visual tracking of multiple persons simultaneously is an important tool for group behaviour analysis. In this paper, we demonstrate that multi-target tracking in a network of non-overlapping cameras can be formulated in a framework, where the association among all given target hypotheses both within and between cameras is performed simultaneously. Our approach helps to overcome the fragility of multi-camera-based tracking, where the performance relies on the single-camera tracking results obtained at input level. In particular, we formulate an estimation of the target states as a multi-state graph optimization problem, in which the likelihood of each target hypothesis belonging to different identities is modeled. In addition, we learn the target-specific model to improve the similarity measure among targets based on the appearance cues. We also handle the occluded targets when there is no reliable evidence for the target’s presence and each target trajectory is expected to be fragmented into multiple tracks. An iterative procedure is proposed to solve the optimization problem, resulting in final trajectories that reveal the true states of the targets. The performance of the proposed approach has been extensively evaluated on challenging multi-camera non-overlapping tracking data sets, in which many difficulties, such as occlusion, viewpoint, and illumination variation, are present. The results of systematic experiments conducted on a large set of sequences show that the proposed approach outperforms several state-of-the-art trackers.
- Published
- 2018
- Full Text
- View/download PDF
4. Facial feature tracking: a psychophysiological measure to assess exercise intensity?
- Author
- Bradley Clark, Roland Goecke, Kevin G. Thompson, Kathleen H. Miles, and Julien D. Périard
- Subjects
Adult, Movement, Physical Exertion, Video Recording, Physical Therapy, Sports Therapy and Rehabilitation, Correlation, Physical medicine and rehabilitation, Heart Rate, Humans, Orthopedics and Sports Medicine, Computer vision, Lactic Acid, Exercise physiology, Facial movement, Exercise, Facial expression, Lactate threshold, Middle Aged, Bicycling, Intensity (physics), Face, Time and Motion Studies, Exercise intensity, Feature tracking, Perception, Artificial intelligence, Psychology, Head, Psychophysiology
- Abstract
The primary aim of this study was to determine whether facial feature tracking reliably measures changes in facial movement across varying exercise intensities. Fifteen cyclists completed three incremental-intensity cycling trials to exhaustion while their faces were recorded with video cameras. Facial feature tracking was found to be a moderately reliable measure of facial movement during incremental-intensity cycling (intra-class correlation coefficient = 0.65-0.68). Facial movement (whole face (WF), upper face (UF), lower face (LF) and head movement (HM)) increased with exercise intensity, from lactate threshold one (LT1) until attainment of maximal aerobic power (MAP) (WF 3464 ± 3364mm, P < 0.005; UF 1961 ± 1779mm, P = 0.002; LF 1608 ± 1404mm, P = 0.002; HM 849 ± 642mm, P < 0.001). UF movement was greater than LF movement at all exercise intensities (UF minus LF at: LT1, 1048 ± 383mm; LT2, 1208 ± 611mm; MAP, 1401 ± 712mm; P < 0.001). Significant medium to large non-linear relationships were found between facial movement and power output.
- Published
- 2017
- Full Text
- View/download PDF
5. Efficient multi-target tracking via discovering dense subgraphs
- Author
- Roland Goecke and Behzad Bozorgtabar
- Subjects
Smoothness (probability theory), BitTorrent tracker, Context (language use), Link (geometry), Tracking (particle physics), Discriminative model, Signal Processing, Trajectory, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Set (psychology), Software, Mathematics
- Abstract
Highlights: multi-target tracking is formulated as a dense subgraph discovery problem; both local and global cues are exploited to represent the tracklet affinity model; distinguishable appearance-based models are learned for the targets. In this paper, we cast multi-target tracking as a dense subgraph discovery problem on the undirected relation graph of all given target hypotheses. We aim to extract multiple clusters (dense subgraphs), in which each cluster contains a set of hypotheses of one particular target. In the presence of occlusion or similar moving targets, or when there is no reliable evidence for the target's presence, each target trajectory is expected to be fragmented into multiple tracklets. The proposed tracking framework can efficiently link such fragmented target trajectories to build a longer trajectory specifying the true states of the target. In particular, a discriminative scheme is devised via learning the targets' appearance models. Moreover, the smoothness characteristic of the target trajectory is utilised via a smoothness tracklet affinity model to increase the power of the proposed tracker to produce persistent target trajectories revealing the different targets' moving paths. The performance of the proposed approach has been extensively evaluated on challenging public datasets and also in the context of team sports (e.g. soccer, AFL), where team players tend to exhibit quick and unpredictable movements. Systematic experimental results conducted on a large set of sequences show that the proposed approach performs better than state-of-the-art trackers, in particular when dealing with occlusion and fragmented target trajectories.
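As a toy illustration of the dense-subgraph idea (not the authors' optimisation), greedy peeling is one standard way to extract a dense subgraph from a tracklet affinity matrix:

```python
import numpy as np

def densest_subgraph(affinity):
    """Greedy peeling (Charikar's 2-approximation) on a symmetric,
    non-negative affinity matrix with zero diagonal. Returns the index
    set with the highest average affinity. Illustrative only; the
    paper's optimisation over tracklet hypotheses is more involved."""
    A = np.array(affinity, dtype=float)
    active = list(range(len(A)))
    best_set, best_density = list(active), 0.0
    while active:
        sub = A[np.ix_(active, active)]
        degrees = sub.sum(axis=1)
        density = sub.sum() / (2 * len(active))
        if density > best_density:
            best_density, best_set = density, list(active)
        active.pop(int(np.argmin(degrees)))  # peel the weakest node
    return best_set

# Each returned cluster would correspond to the hypotheses of one target;
# removing it and re-running yields the next target's cluster.
```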
- Published
- 2016
- Full Text
- View/download PDF
6. Gait Estimation and Analysis from Noisy Observations
- Author
- Hafsa Ismail, Hanna Suominen, Ibrahim Radwan, and Roland Goecke
- Subjects
Weakness, Observer (quantum physics), Computer science, Movement, Early detection, Gait (human), Humans, Computer vision, Force platform, Gait, Ground truth, Regression analysis, Biomechanical Phenomena, Gait analysis, Artificial intelligence, Cadence, Motion measurement
- Abstract
People’s walking style – their gait – can be an indicator of their health as it is affected by pain, illness, weakness, and aging. Gait analysis aims to detect gait variations. It is usually performed by an experienced observer with the help of different devices, such as cameras, sensors, and/or force plates. Frequent gait analysis, to observe changes over time, is costly and impractical. This paper initiates an inexpensive gait analysis based on recorded video. Our methodology first estimates gait movements from predicted 2D joint locations that represent selected body parts in the videos, and then uses a long short-term memory (LSTM) regression model to predict 3D (Vicon) data, which was recorded simultaneously with the videos as ground truth. Feet movements estimated from video are highly correlated with the Vicon data, enabling gait analysis by measuring selected spatial gait parameters (step length, cadence, and walk base) from the estimated movements. Using inexpensive and reliable cameras to record, estimate and analyse a person’s gait can be helpful; early detection of changes facilitates early intervention.
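No code accompanies the abstract; a minimal PyTorch sketch of the described 2D-joints-to-Vicon LSTM regressor might look as follows (joint and marker counts, layer sizes and names are assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class JointsToViconLSTM(nn.Module):
    """Hypothetical LSTM regressor mapping a sequence of predicted 2D
    joint coordinates to 3D marker positions, per frame."""

    def __init__(self, n_joints=14, n_markers=39, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_joints * 2, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_markers * 3)

    def forward(self, joints_2d):        # (batch, time, n_joints*2)
        h, _ = self.lstm(joints_2d)
        return self.head(h)              # (batch, time, n_markers*3)

# Trained with an L2 loss against the synchronised Vicon recordings, e.g.
# loss = nn.functional.mse_loss(model(x), vicon_targets)
```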
- Published
- 2018
- Full Text
- View/download PDF
7. Ordered trajectories for human action recognition with large number of classes
- Author
- Roland Goecke and O. V. Ramana Murthy
- Subjects
Optical flow, Pattern recognition, Feature selection, Support vector machine, Bag-of-words model, Feature (computer vision), Signal Processing, Trajectory, Benchmark (computing), Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Representation (mathematics), Mathematics
- Abstract
Recently, a video representation based on dense trajectories has been shown to outperform other human action recognition methods on several benchmark datasets. The trajectories capture the motion characteristics of different moving objects in the spatial and temporal dimensions. In dense trajectories, points are sampled at uniform intervals in space and time and then tracked using a dense optical flow field over a fixed length of L frames (optimally 15), overlapping across the entire video. However, among these base (dense) trajectories, a few may continue for longer than the duration L, capturing motion characteristics of objects that may be more valuable than the information from the base trajectories. Thus, we propose a technique that searches for trajectories with a longer duration and refer to these as 'ordered trajectories'. Experimental results show that ordered trajectories perform much better than the base trajectories, both standalone and when combined. Moreover, the uniform sampling of dense trajectories does not discriminate objects of interest from the background or other objects. Consequently, a lot of information is accumulated that may not actually be useful. This can especially escalate when there is more data due to an increase in the number of action classes. We observe that our proposed trajectories remove some background clutter, too. We use a Bag-of-Words framework to conduct experiments on the benchmark HMDB51, UCF50 and UCF101 datasets, which contain the largest number of action classes to date. Further, we evaluate three state-of-the-art feature encoding techniques to study their performance on a common platform. Highlights: a technique that captures information of objects with longer duration; a feature selection-like approach that delivers better performance than several trajectory variants; removal of a large number of trajectories related to background noise; application to the action datasets HMDB51, UCF50 and UCF101, containing the largest number of classes to date.
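A toy sketch of the selection step plus Bag-of-Words encoding is given below; the function name and input format are hypothetical, and the real pipeline uses HOG/HOF/MBH trajectory descriptors with a codebook trained over the whole training set rather than per video:

```python
import numpy as np
from sklearn.cluster import KMeans

L = 15  # base trajectory length used by dense trajectories

def ordered_trajectory_bow(trajectories, k=100):
    """Keep only trajectories tracked for longer than the base duration
    L ('ordered trajectories'), then encode them with a Bag-of-Words
    histogram. 'trajectories' is assumed to be a list of
    (length, descriptor) pairs; k is a toy codebook size (typical BoW
    pipelines use thousands of words)."""
    kept = np.array([d for (length, d) in trajectories if length > L])
    codebook = KMeans(n_clusters=k, n_init=1).fit(kept)
    words = codebook.predict(kept)
    hist, _ = np.histogram(words, bins=np.arange(k + 1))
    return hist / max(hist.sum(), 1)  # normalised video-level descriptor
```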
- Published
- 2015
- Full Text
- View/download PDF
8. Joint Registration and Representation Learning for Unconstrained Face Identification
- Author
- Naoufel Werghi, Roland Goecke, Salman H. Khan, and Munawar Hayat
- Subjects
Artificial neural network, Computer science, Deep learning, Image registration, Convolutional neural network, Facial recognition system, Discriminative model, Feature (computer vision), Face (geometry), Computer vision, Artificial intelligence, Pose, Feature learning
- Abstract
Recent advances in deep learning have resulted in human-level performances on popular unconstrained face datasets, including Labeled Faces in the Wild and YouTube Faces. To further advance research, the IJB-A benchmark was recently introduced with additional challenges, especially in the form of extreme head poses. Registration of such faces is quite demanding and often requires laborious procedures such as facial landmark localization. In this paper, we propose a Convolutional Neural Network based data-driven approach that learns to simultaneously register and represent faces. We validate the proposed scheme on template-based unconstrained face identification. Here, a template contains multiple media in the form of images and video frames. Unlike existing methods, which synthesize all template media information at the feature level, we propose to keep the template media intact. Instead, we represent gallery templates by their trained one-vs-rest discriminative models and then employ a Bayesian strategy that optimally fuses the decisions of all media in a query template. We demonstrate the efficacy of the proposed scheme on the IJB-A, YouTube Celebrities and COX datasets, where our approach achieves significant relative performance boosts of 3.6%, 21.6% and 12.8%, respectively.
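A simplified stand-in for the gallery-model and fusion steps is sketched below, assuming precomputed face features; the paper's Bayesian fusion is replaced here by a plain sum of decision scores, so this illustrates the structure rather than the method itself:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_gallery_models(gallery_features, gallery_ids):
    """One-vs-rest discriminative model per gallery template identity
    (a simplified stand-in; the paper trains on learned CNN features)."""
    models = {}
    for gid in set(gallery_ids):
        y = np.array([1 if g == gid else 0 for g in gallery_ids])
        models[gid] = LinearSVC().fit(gallery_features, y)
    return models

def identify(query_media_features, models):
    """Fuse per-medium decisions over a query template by summing
    decision scores (a stand-in for the paper's Bayesian fusion)."""
    scores = {gid: sum(m.decision_function(f.reshape(1, -1))[0]
                       for f in query_media_features)
              for gid, m in models.items()}
    return max(scores, key=scores.get)  # best-matching gallery identity
```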
- Published
- 2017
- Full Text
- View/download PDF
9. Human Postural Sway Estimation from Noisy Observations
- Author
- Roland Goecke, Ibrahim Radwan, Hafsa Ismail, Gordon Waddington, and Hanna Suominen
- Subjects
Computer science, Process (computing), Balance test, Video camera, Video tracking system, Recurrent neural network, RGB color model, Elderly people, Computer vision, Artificial intelligence, Simulation, Balance (ability)
- Abstract
Postural sway is a reflection of the brain signals that are generated to control a person’s balance. During the process of ageing, the postural sway changes, which increases the likelihood of a fall. Thus far, expensive specialist equipment, such as a force plate, has been required to detect such changes over time, which makes the process costly and impractical. Our long-term goal is to investigate the use of inexpensive, everyday video technology as an alternative. This paper describes a study that establishes a 3-way correlation between the clinical gold standard (force plate), a highly accurate multi-camera 3D video tracking system (Vicon) and a standard RGB video camera. To this end, a dataset of 18 subjects performing the BESS balance test on the force plate was recorded, while simultaneously recording the 3D Vicon data and the RGB video camera data. Then, using Gaussian process regression and a recurrent neural network, models were built to predict the lateral postural sway in the force plate data from the RGB video data. The predicted results show high correlation with the actual force plate signals, which supports the hypothesis that lateral postural sway can be accurately predicted from video data alone. Detecting changes to a person’s postural sway can be used to improve elderly people’s lives by monitoring the likelihood of a fall and detecting its increase well before a fall occurs, so that countermeasures (e.g. exercises) can be put in place to prevent falls.
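A minimal sketch of the Gaussian-process half of this pipeline, assuming per-frame video features (e.g. tracked torso displacement) have already been extracted; the kernel choice is an assumption, not the paper's configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_sway_model(video_features, force_plate_sway):
    """Fit a GP regressor from video-derived features to the lateral
    force-plate sway signal recorded simultaneously."""
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(video_features, force_plate_sway)
    return gpr

# sway_pred, sway_std = model.predict(new_features, return_std=True)
# Agreement with the true force-plate signal can then be checked with
# np.corrcoef(sway_pred, sway_true)[0, 1].
```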
- Published
- 2017
- Full Text
- View/download PDF
10. Facial Performance Transfer via Deformable Models and Parametric Correspondence
- Author
- Abhinav Dhall, Roland Goecke, M. de la Hunty, and Akshay Asthana
- Subjects
Male, Computer science, Solid modeling, Computer graphics, Imaging, Three-Dimensional, Image texture, Humans, Computer Simulation, Computer vision, Graphics, Parametric statistics, Animation, Computer Graphics and Computer-Aided Design, Active appearance model, Cinematography, Face (geometry), Signal Processing, Female, Computer Vision and Pattern Recognition, Artificial intelligence, Focus (optics), Algorithms, Software
- Abstract
The issue of transferring facial performance from one person's face to another's has been an area of interest for the movie industry and the computer graphics community for quite some time. In recent years, deformable face models, such as the Active Appearance Model (AAM), have made it possible to track and synthesize faces in real time. Not surprisingly, deformable face model-based approaches for facial performance transfer have gained tremendous interest in the computer vision and graphics community. In this paper, we focus on the problem of real-time facial performance transfer using the AAM framework. We propose a novel approach of learning the mapping between the parameters of two completely independent AAMs, using them to facilitate the facial performance transfer in a more realistic manner than previous approaches. The main advantage of modeling this parametric correspondence is that it allows a "meaningful" transfer of both the nonrigid shape and texture across faces irrespective of the speakers' gender, shape, and size of the faces, and illumination conditions. We explore linear and nonlinear methods for modeling the parametric correspondence between the AAMs and show that the sparse linear regression method performs the best. Moreover, we show the utility of the proposed framework for a cross-language facial performance transfer that is an area of interest for the movie dubbing industry.
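The best-performing variant reported is sparse linear regression between the two AAM parameter spaces; a minimal sketch using scikit-learn's Lasso is given below (the regulariser strength and function names are assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

def learn_correspondence(src_params, tgt_params, alpha=0.01):
    """Learn a sparse linear map from source-actor AAM parameters to
    target-actor AAM parameters, using frames in which both performed
    corresponding expressions. Lasso accepts multi-target y, so one
    sparse regressor is fitted per target parameter dimension."""
    return Lasso(alpha=alpha).fit(src_params, tgt_params)

def transfer(model, src_frame_params):
    """Map one tracked source frame into the target's AAM space; the
    target AAM then synthesises the transferred performance."""
    return model.predict(src_frame_params.reshape(1, -1))
```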
- Published
- 2012
- Full Text
- View/download PDF
11. Regression based automatic face annotation for deformable model building
- Author
- Akshay Asthana, Simon Lucey, and Roland Goecke
- Subjects
Ground truth, Facial expression, Computer science, Image processing, Expression (mathematics), Active appearance model, Face (geometry), Signal Processing, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Correspondence problem, Software
- Abstract
A major drawback of statistical models of non-rigid, deformable objects, such as the active appearance model (AAM), is the required pseudo-dense annotation of landmark points for every training image. We propose a regression-based approach for automatic annotation of face images at arbitrary pose and expression, and for deformable model building using only the annotated frontal images. We pose the problem of learning the pattern of manual annotation as a data-driven regression problem and explore several regression strategies to effectively predict the spatial arrangement of the landmark points for unseen face images, with arbitrary expression, at arbitrary poses. We show that the proposed fully sparse non-linear regression approach outperforms other regression strategies by effectively modelling the changes in the shape of the face under varying pose and is capable of capturing the subtleties of different facial expressions at the same time, thus, ensuring the high quality of the generated synthetic images. We show the generalisability of the proposed approach by automatically annotating the face images from four different databases and verifying the results by comparing them with a ground truth obtained from manual annotations.
- Published
- 2011
- Full Text
- View/download PDF
12. Video and Image based Emotion Recognition Challenges in the Wild
- Author
- Abhinav Dhall, Roland Goecke, Tom Gedeon, Jyoti Joshi, and O. V. Ramana Murthy
- Subjects
Facial expression, Static image, Computer science, Speech recognition, Computer vision, Artificial intelligence, Emotion recognition, Video based, Expression (mathematics), Image based
- Abstract
The third Emotion Recognition in the Wild (EmotiW) challenge 2015 consists of audio-video based emotion and static image based facial expression classification sub-challenges, which mimic real-world conditions. The two sub-challenges are based on the Acted Facial Expressions in the Wild (AFEW) 5.0 and the Static Facial Expressions in the Wild (SFEW) 2.0 databases, respectively. The paper describes the data, baseline method, challenge protocol and the challenge results. A total of 12 and 17 teams participated in the video based emotion and image based expression sub-challenges, respectively.
- Published
- 2015
- Full Text
- View/download PDF
13. Discriminative Multi-Task Sparse Learning for Robust Visual Tracking Using Conditional Random Field
- Author
- Roland Goecke and Behzad Bozorgtabar
- Subjects
Conditional random field, Computer science, Pattern recognition, Discriminative model, Robustness (computer science), Video tracking, Eye tracking, Computer vision, Artificial intelligence, Linear combination, Particle filter, Sparse matrix
- Abstract
In this paper, we propose a discriminative multitask sparse learning scheme for object tracking in a particle filter framework. By representing each particle as a linear combination of adaptive dictionary templates, we utilise the correlations among different particles (tasks) to obtain a better representation and a more efficient scheme than learning each task individually. However, this model is completely generative and the designed tracker may not be robust enough to prevent the drifting problem in the presence of rapid appearance changes. We therefore use a Conditional Random Field (CRF) along with the multitask sparse model to extend our scheme to distinguish the object candidate from the background particle candidates. In this way, the number of particle samples is reduced significantly, while the tracker is made more robust. The proposed algorithm is evaluated on 11 challenging sequences; the results confirm the effectiveness of the approach, which significantly outperforms state-of-the-art trackers in terms of accuracy measures, including the centre location error and the overlap ratio.
- Published
- 2014
- Full Text
- View/download PDF
14. Joint sparsity-based robust visual tracking
- Author
- Roland Goecke and Behzad Bozorgtabar
- Subjects
BitTorrent tracker, Pattern recognition, Iteratively reweighted least squares, Norm (mathematics), Video tracking, Conjugate gradient method, Outlier, Eye tracking, Computer vision, Artificial intelligence, Particle filter, Mathematics
- Abstract
In this paper, we propose a new object tracker in a particle filter framework utilising a joint sparsity-based model. Based on the observation that a target can be reconstructed from several templates that are updated dynamically, we jointly analyse the representation of the particles under a single regression framework with a shared underlying structure. Two convex regularisations are combined and used in our model to enable sparsity as well as to facilitate coupling information between particles. Unlike previous methods, which consider a model commonality between particles or regard them as independent tasks, we simultaneously take into account a structure-inducing norm and an outlier-detecting norm. Such a formulation is shown to be more flexible in handling various types of challenges, including occlusion and cluttered background. To derive the optimal solution efficiently, we propose to use a Preconditioned Conjugate Gradient method, which is computationally affordable for high-dimensional data. Furthermore, an online updating procedure is included in the dictionary learning, which makes the proposed tracker less vulnerable to outliers. Experiments on challenging video sequences demonstrate the robustness of the proposed approach in handling occlusion, pose and illumination variation; it outperforms state-of-the-art trackers in tracking accuracy.
- Published
- 2014
- Full Text
- View/download PDF
15. Dense body part trajectories for human action recognition
- Author
- Ibrahim Radwan, Roland Goecke, and O. V. Ramana Murthy
- Subjects
Action (philosophy), Computer science, Benchmark (computing), Action recognition, Computer vision, Human body, Artificial intelligence, Pose, Dense body
- Abstract
Several techniques have been proposed for human action recognition from videos. It has been observed that incorporating mid-level information (the human body) and/or high-level information (pose estimation) in the computation of low-level features (trajectories) yields the best action recognition performance when the full body is presumed visible. However, in datasets with a large number of classes, where the full body may not be visible at all times, incorporating such mid- and high-level information is unexplored. Moreover, changes and developments in any stage will require recomputing all low-level features. We decouple mid-level and low-level feature computation and conduct a study on benchmark action recognition datasets such as UCF50, UCF101 and HMDB51, containing the largest number of action classes to date. Further, we employ a part-based model for static human body part detection in frames, thus also investigating classes where the full body is not present. We also track dense regions around the detected human body parts by Hungarian particle linking, thus mitigating most wrongly detected body parts and enriching the mid-level information.
- Published
- 2014
- Full Text
- View/download PDF
16. Automatic Prediction of Perceived Traits Using Visual Cues under Varied Situational Context
- Author
- Hatice Gunes, Roland Goecke, and Jyoti Joshi
- Subjects
Behavioral traits, Behavioral data, Perception, Credibility, Trait, Personality, Computer vision, Artificial intelligence, Big Five personality traits, Psychology, Sensory cue, Cognitive psychology
- Abstract
Automatic assessment of human personality traits is a non-trivial problem, especially when perception is formed over a fairly short duration of time. In this study, thin slices of behavioral data are analyzed. Perceived physical and behavioral traits are assessed by external observers (raters). Along with the big-five personality trait model, four new traits are introduced and assessed in this work. The relationship between the various traits is investigated to obtain a better understanding of observer perception and assessment. Perception change is also considered when participants interact with several virtual characters, each with a distinct emotional style. Encapsulating these observations and analyses, an automated system is proposed that first computes low-level visual features. Using these features, a separate model is trained for each trait and its performance is evaluated. Further, a weighted model based on rater credibility is proposed to address observer biases. Experimental results indicate that the weighted model shows a major improvement for the automatic prediction of perceived physical and behavioral traits.
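A toy sketch of the rater-credibility weighting idea follows; the credibility definition in the comment is an assumption, shown only to make the weighting concrete:

```python
import numpy as np

def credibility_weighted_labels(ratings, credibility):
    """Combine per-rater trait scores into one target label per clip,
    weighting each rater by a credibility score. 'ratings' has shape
    (n_raters, n_clips); the weighting scheme is an assumption, not
    the paper's exact model."""
    w = np.asarray(credibility, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(ratings)   # (n_clips,) weighted trait labels

# Example credibility: inverse mean absolute deviation from consensus.
# consensus = ratings.mean(axis=0)
# credibility = 1.0 / (np.abs(ratings - consensus).mean(axis=1) + 1e-6)
```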
- Published
- 2014
- Full Text
- View/download PDF
17. Thermal spatio-temporal data for stress recognition
- Author
- Abhinav Dhall, Nandita Sharma, Roland Goecke, and Tamás D. Gedeon
- Subjects
Biometrics, Local binary patterns, Computer science, Pattern recognition, Temporal database, Support vector machine, Face (geometry), Histogram, Signal Processing, Stress (linguistics), Pattern recognition (psychology), Computer vision, Artificial intelligence, Electrical and Electronic Engineering, Information Systems
- Abstract
Stress is a serious concern facing our world today, motivating the development of a better objective understanding through the use of non-intrusive means for stress recognition that reduce restrictions to natural human behavior. As an initial step in computer vision-based stress detection, this paper proposes a temporal thermal spectrum (TS) and visible spectrum (VS) video database, ANUStressDB - a major contribution to stress research. The database contains videos of 35 subjects watching stressed and not-stressed film clips validated by the subjects. We present the experiment and the process conducted to acquire videos of subjects' faces while they watched the films for the ANUStressDB. Further, a baseline model for stress detection based on computing a local binary patterns on three orthogonal planes (LBP-TOP) descriptor on VS and TS videos is presented. An LBP-TOP-inspired descriptor was used to capture dynamic thermal patterns in histograms (HDTP), which exploited spatio-temporal characteristics in TS videos. Support vector machines were used for our stress detection model. A genetic algorithm was used to select salient facial block divisions for stress classification and to determine whether certain regions of the subjects' faces showed better stress patterns. Results showed that a fusion of facial patterns from VS and TS videos produced statistically significantly better stress recognition rates than patterns from VS or TS videos used in isolation. Moreover, the genetic algorithm selection method led to statistically significantly better stress detection rates than classifiers that used all the facial block divisions. In addition, the best stress recognition rate was obtained from HDTP features fused with LBP-TOP features for TS and VS videos, using a hybrid of a genetic algorithm and a support vector machine stress detection model. The model produced an accuracy of 86%.
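A simplified per-block LBP-TOP descriptor is sketched below, using only the three central orthogonal slices of a video volume; real implementations aggregate over all slices and facial block divisions, and the parameters here are assumptions:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top(volume, P=8, R=1, bins=59):
    """Toy LBP-TOP: LBP histograms on the three orthogonal central
    planes (XY, XT, YT) of a greyscale (T, H, W) video block,
    concatenated into one descriptor."""
    T, H, W = volume.shape
    planes = [volume[T // 2],          # XY plane
              volume[:, H // 2, :],    # XT plane
              volume[:, :, W // 2]]    # YT plane
    hists = []
    for p in planes:
        codes = local_binary_pattern(p, P, R, method="nri_uniform")
        h, _ = np.histogram(codes, bins=bins, range=(0, bins))
        hists.append(h / max(h.sum(), 1))
    return np.concatenate(hists)

# Visible- and thermal-spectrum descriptors can then be fused by
# concatenation and fed to an SVM, mirroring the paper's pipeline.
```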
- Published
- 2014
- Full Text
- View/download PDF
18. A discriminative parts based model approach for fiducial points free and shape constrained head pose normalisation in the wild
- Author
- Gwen Littlewort, Abhinav Dhall, Roland Goecke, Karan Sikka, and Marian Stewart Bartlett
- Subjects
Propagation of uncertainty, Landmark, Computer science, Detector, Kinematics, Iterative reconstruction, Discriminative model, Face (geometry), Computer vision, Artificial intelligence, Texture mapping
- Abstract
This paper proposes a method for parts-based view-invariant head pose normalisation, which works well even in difficult real-world conditions. Handling pose is a classical problem in facial analysis. Recently, parts-based models have shown promising performance for facial landmark point detection ‘in the wild’. Leveraging the success of these models, the proposed data-driven regression framework computes a constrained normalised virtual frontal head pose. The response maps of a discriminatively trained part detector are used as texture information. These sparse texture maps are projected from non-frontal to frontal pose using block-wise structured regression. Finally, a facial kinematic shape constraint is achieved by applying a shape model. The advantages of the proposed approach are: (a) no explicit dependence on the outputs of a facial parts detector, thus avoiding any error propagation owing to their failure; (b) the application of a shape prior on the reconstructed frontal maps provides an anatomically constrained facial shape; and (c) modelling head pose as a mixture-of-parts model allows the framework to work without any prior pose information. Experiments are performed on the Multi-PIE and the ‘in the wild’ SFEW databases. The results demonstrate the effectiveness of the proposed method.
- Published
- 2014
- Full Text
- View/download PDF
19. Monocular Image 3D Human Pose Estimation under Self-Occlusion
- Author
- Abhinav Dhall, Ibrahim Radwan, and Roland Goecke
- Subjects
Computer science, Orientation (computer vision), Kinematics, Iterative reconstruction, 3D pose estimation, Articulated body pose estimation, Hallucinating, Computer vision, Artificial intelligence, Pose, Pruning (morphology)
- Abstract
In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts that are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitative experiments on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach in accurately reconstructing the 3D pose from a single image.
- Published
- 2013
- Full Text
- View/download PDF
20. Relative Body Parts Movement for Automatic Depression Analysis
- Author
- Abhinav Dhall, Roland Goecke, Jyoti Joshi, and Jeffrey F. Cohn
- Subjects
Motion analysis, Facial recognition system, Motion (physics), Support vector machine, Bag-of-words model, Motion estimation, Computer vision, Artificial intelligence, Affective computing, Psychology, Pose
- Abstract
In this paper, a human body part motion analysis based approach is proposed for depression analysis. Depression is a serious psychological disorder. The absence of an (automated) objective diagnostic aid for depression leads to a range of subjective biases in initial diagnosis and ongoing monitoring. Researchers in the affective computing community have approached the depression detection problem using facial dynamics and vocal prosody. Recent works in affective computing have shown the significance of body pose and motion in analysing the psychological state of a person. Inspired by these works, we explore a body part motion based approach. Relative orientation and radius are computed for the body parts detected using the pictorial structures framework. A histogram of relative parts motion is computed. To analyse the motion on a holistic level, space-time interest points are computed and a bag of words framework is learnt. The two histograms are fused and a support vector machine classifier is trained. The experiments conducted on a clinical database prove the effectiveness of the proposed method.
- Published
- 2013
- Full Text
- View/download PDF
21. Modeling Stress Using Thermal Facial Patterns: A Spatio-temporal Approach
- Author
- Nandita Sharma, Abhinav Dhall, Roland Goecke, and Tom Gedeon
- Subjects
Contextual image classification, Local binary patterns, Stress recognition, Pattern recognition, Facial recognition system, Support vector machine, Histogram, Stress (linguistics), Computer vision, Artificial intelligence, Emotion recognition, Psychology
- Abstract
Stress is a serious concern facing our world today, motivating the development of a better objective understanding using non-intrusive means for stress recognition. The aim of this work was to use thermal imaging of facial regions to detect stress automatically. The work uses facial regions captured in videos in the thermal (TS) and visible (VS) spectrums and introduces our database ANUStressDB. It describes the experiment conducted to acquire TS and VS videos of observers of stressed and not-stressed films for the ANUStressDB. Further, it presents an application of local binary patterns on three orthogonal planes (LBP-TOP) on VS and TS videos for stress recognition. It proposes a novel method to capture dynamic thermal patterns in histograms (HDTP) to utilise the thermal and spatio-temporal characteristics associated with TS videos. Individual-independent support vector machine classifiers were developed for stress recognition. Results show that a fusion of facial patterns from VS and TS videos produced significantly better stress recognition rates than patterns from only VS or TS videos.
- Published
- 2013
- Full Text
- View/download PDF
22. Can body expressions contribute to automatic depression analysis?
- Author
- Jyoti Joshi, Gordon Parker, Michael Breakspear, and Roland Goecke
- Subjects
Facial expression, Eye contact, Facial recognition system, Bag-of-words model, Gesture recognition, Computer vision, Artificial intelligence, Psychology, Cluster analysis, Sensory cue, Cognitive psychology, Gesture
- Abstract
Depression is one of the most common mental health disorders, with strong adverse effects on personal and social functioning. The absence of any objective diagnostic aid for depression leads to a range of subjective biases in initial diagnosis and ongoing monitoring. Psychologists use various visual cues in their assessment to quantify depression, such as facial expressions, eye contact and head movements. This paper studies the contribution of (upper) body expressions and gestures to automatic depression analysis. A framework based on space-time interest points and bag of words is proposed for the analysis of upper body and facial movements. Salient interest points are selected using clustering. The major contribution of this paper lies in the creation of a bag of body expressions and a bag of facial dynamics for assessing the contribution of different body parts to depression analysis. Head movement analysis is performed by selecting rigid facial fiducial points, and a new histogram of head movements is proposed. The experiments are performed on real-world clinical data where video clips of patients and healthy controls were recorded during interactive interview sessions. The results show the effectiveness of the proposed system in evaluating the contribution of various body parts to depression analysis.
- Published
- 2013
- Full Text
- View/download PDF
23. The Visual Object Tracking VOT2013 challenge results
- Author
- Mohamed El Helw, Yang Li, Lijun Cao, Samantha Yueying Lim, Behzad Bozorgtabar, Junge Zhang, Adam Gatt, Matej Kristan, Dorothy Monekosso, Karel Lebeda, ZhiHeng Niu, Dale A. Ward, Alfredo Petrosino, Paolo Remagnino, Xiaoqin Zhang, David Kearney, Jin Gao, Jianke Zhu, Weiming Hu, Ahmad Khajenezhad, Mario Edoardo Maresca, Gustavo Fernandez, Fatih Porikli, Luka Cehovin, Mei Kuan Lim, Bo Li, Ale Leonardis, Kaiqi Huang, Sebastien Wong, Jingjing Xiao, Cherkeng Heng, Roland Goecke, Georg Nebehay, Roman Pflugfelder, Rustam Stolkin, Anthony Milton, Shin'Ichi Satoh, Jiri Matas, Richard Bowden, Weihua Chen, Hamid R. Rabiee, Ali Zarezade, Sara Maher, Ali Soltani-Farani, Ahmed Salahledin, Michael Felsberg, Junliang Xing, Hakki Can Karaimer, Sebastien Poullot, Toma Vojir, and Chee Seng Chan
- Subjects
Protocol (science), VOT2013, Computer science, Visual object tracking challenge, Machine learning, image motion analysis, Field (computer science), computer vision, Visualization, Video tracking, cameras, Benchmark (computing), Eye tracking, Object appearance, Artificial intelligence, Other Computer and Information Science, object tracking
- Abstract
Visual tracking has attracted significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow the developments in the field. One of the reasons is the lack of commonly accepted annotated data sets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge, which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for the evaluation of single-object visual trackers, as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts at tracker benchmarking, the dataset is labeled per-frame by visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website (http://votchallenge.net).
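For readers unfamiliar with tracker evaluation, a toy version of overlap-based accuracy and failure-count robustness in VOT-style protocols is sketched below (the official toolkit additionally re-initialises the tracker after each failure, which this sketch omits):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def accuracy_robustness(pred_boxes, gt_boxes):
    """Accuracy: mean overlap over frames. Robustness proxy: number of
    frames where the tracker drifts off the target entirely (overlap 0)."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return overlaps.mean(), int((overlaps == 0).sum())
```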
- Published
- 2013
24. Regression Based Pose Estimation with Automatic Occlusion Detection and Rectification
- Author
- Roland Goecke, Jyoti Joshi, Abhinav Dhall, and Ibrahim Radwan
- Subjects
Ground truth, Computer science, Pattern recognition, 3D pose estimation, Object detection, Articulated body pose estimation, Computer graphics, Robustness (computer science), Computer vision, Artificial intelligence, Pose
- Abstract
Human pose estimation is a classic problem in computer vision. Statistical models based on part-based modelling and the pictorial structure framework have been widely used recently for articulated human pose estimation. However, the performance of these models has been limited due to the presence of self-occlusion. This paper presents a learning-based framework to automatically detect and recover self-occluded body parts. We learn two different models: one for detecting occluded parts in the upper body and another one for the lower body. To solve the key problem of knowing which parts are occluded, we construct Gaussian Process Regression (GPR) models to learn the parameters of the occluded body parts from their corresponding ground truth parameters. Using these models, the pictorial structure of the occluded parts in unseen images is automatically rectified. The proposed framework outperforms a state-of-the-art pictorial structure approach for human pose estimation on 3 different datasets.
- Published
- 2012
- Full Text
- View/download PDF
25. A SSIM-based approach for finding similar facial expressions
- Author
- Abhinav Dhall, Akshay Asthana, and Roland Goecke
- Subjects
Facial expression, Similarity (geometry), Feature vector, Pattern recognition, Facial recognition system, Expression (mathematics), Active appearance model, Metric (mathematics), Computer vision, Artificial intelligence, Computer facial animation, Mathematics
- Abstract
There are various scenarios where finding the most similar expression is the requirement rather than classifying one into discrete, pre-defined classes, for example, for facial expression transfer and facial expression based automatic album generation. This paper proposes a novel method for finding the most similar facial expression. Instead of the regular L2 norm distance, we investigate the use of the Structural SIMilarity (SSIM) metric for similarity comparison as a distance metric in a nearest neighbour unsupervised algorithm. The feature vectors are generated using Active Appearance Models (AAM). We also demonstrate how this technique can be extended and used for finding corresponding facial expression images across two or more subjects, which is useful in applications such as facial animation and automatic expression transfer. Person-independent facial expression performance results are shown on the Multi-PIE, FEEDTUM and AVOZES databases. We also compare the performance of the SSIM metric versus other distance metrics in a nearest neighbour search for finding the most similar facial expression to a given image.
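The core of the method reduces to a nearest-neighbour search under SSIM; a minimal sketch with scikit-image follows, where raw aligned greyscale faces stand in for the paper's AAM-derived feature vectors:

```python
import numpy as np
from skimage.metrics import structural_similarity

def most_similar_expression(query_face, gallery_faces):
    """Nearest-neighbour search using SSIM as the similarity measure,
    as the paper proposes in place of the L2 distance. Images are
    assumed to be aligned, equally sized greyscale faces in [0, 255]."""
    scores = [structural_similarity(query_face, g, data_range=255)
              for g in gallery_faces]
    return int(np.argmax(scores))  # index of the most similar expression
```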
- Published
- 2011
- Full Text
- View/download PDF
26. Emotion recognition using PHOG and LPQ features
- Author
- Abhinav Dhall, Tom Gedeon, Roland Goecke, and Akshay Asthana
- Subjects
Contextual image classification, Emotion classification, Feature extraction, k-means clustering, Pattern recognition, Facial recognition system, Histogram of oriented gradients, Feature (machine learning), Computer vision, Pyramid (image processing), Artificial intelligence, Mathematics
- Abstract
We propose a method for automatic emotion recognition as part of the FERA 2011 competition. The system extracts pyramid of histogram of gradients (PHOG) and local phase quantisation (LPQ) features for encoding the shape and appearance information. For selecting the key frames, K-means clustering is applied to the normalised shape vectors derived from constrained local model (CLM) based face tracking on the image sequences. Shape vectors closest to the cluster centers are then used to extract the shape and appearance features. We demonstrate the results on the SSPNET GEMEP-FERA dataset, which comprises both person-specific and person-independent partitions. For emotion classification, we use support vector machine (SVM) and large margin nearest neighbour (LMNN) classifiers and compare our results to the pre-computed FERA 2011 emotion challenge baseline.
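The key-frame selection step can be sketched directly from the description (the number of key frames is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_key_frames(shape_vectors, n_frames=5):
    """Cluster the normalised per-frame shape vectors from the face
    tracker with K-means and keep the frame closest to each cluster
    center; PHOG and LPQ features are then extracted from these
    key frames only."""
    km = KMeans(n_clusters=n_frames, n_init=10).fit(shape_vectors)
    keys = []
    for c in km.cluster_centers_:
        dists = np.linalg.norm(shape_vectors - c, axis=1)
        keys.append(int(np.argmin(dists)))
    return sorted(set(keys))
```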
- Published
- 2011
- Full Text
- View/download PDF
27. Pose Normalization via Learned 2D Warping for Fully Automatic Face Recognition
- Author
- Kinh Tieu, Roland Goecke, Michael Jones, Akshay Asthana, and Tim K. Marks
- Subjects
Computer science, Normalization (image processing), Pattern recognition, 3D pose estimation, Facial recognition system, Euler angles, Nonlinear system, Kriging, Three-dimensional face recognition, Computer vision, Artificial intelligence, Image warping
- Abstract
We present a novel approach to pose-invariant face recognition that handles continuous pose variations, is not database-specific, and achieves high accuracy without any manual intervention. Our method uses multidimensional Gaussian process regression to learn a nonlinear mapping function from the 2D shapes of faces at any non-frontal pose to the corresponding 2D frontal face shapes. We use this mapping to take an input image of a new face at an arbitrary pose and pose-normalize it, generating a synthetic frontal image of the face that is then used for recognition. Our fully automatic system for face recognition includes automatic methods for extracting 2D facial feature points and accurately estimating 3D head pose, and this information is used as input to the 2D pose-normalization algorithm. The current system can handle pose variation up to 45 degrees to the left or right (yaw angle) and up to 30 degrees up or down (pitch angle). The system demonstrates high accuracy in recognition experiments on the CMU-PIE, USF 3D, and Multi-PIE databases, showing excellent generalization across databases and convincingly outperforming other automatic methods.
- Published
- 2011
- Full Text
- View/download PDF
28. Linear Facial Expression Transfer with Active Appearance Models
- Author
- Miles de la Hunty, Roland Goecke, and Akshay Asthana
- Subjects
Computer graphics, Facial expression, Computer science, Face (geometry), Computer vision, Artificial intelligence, Facial recognition system, Expression (mathematics), Active appearance model
- Abstract
The issue of transferring facial expressions from one person's face to another's has been an area of interest for the movie industry and the computer graphics community for quite some time. In recent years, with the proliferation of online image and video collections and web applications, such as Google Street View, the question of preserving privacy through face de-identification has gained interest in the computer vision community. In this paper, we focus on the problem of real-time dynamic facial expression transfer using an Active Appearance Model framework. We provide a theoretical foundation for a generalisation of two well-known expression transfer methods and demonstrate the improved visual quality of the proposed linear extrapolation transfer method on examples of face swapping and expression transfer using the AVOZES data corpus. Realistic talking faces can be generated in real-time at low computational cost.
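The linear extrapolation transfer itself is a one-liner in AAM parameter space; a hedged sketch follows (variable names are illustrative, not the paper's notation):

```python
import numpy as np

def linear_expression_transfer(src_params, src_neutral, tgt_neutral):
    """Linear extrapolation transfer: the offset of the source's
    current expression from its neutral pose, measured in AAM
    parameter space, is added to the target's neutral parameters."""
    return tgt_neutral + (src_params - src_neutral)

# The target AAM renders the returned parameters back to pixels each
# frame, which is why the method runs in real time at low cost.
```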
- Published
- 2010
- Full Text
- View/download PDF
29. Facial Expression Based Automatic Album Creation
- Author
- Abhinav Dhall, Akshay Asthana, and Roland Goecke
- Subjects
Facial expression, Qualitative analysis, Index (publishing), Structural similarity, Computer science, Human visual system model, Computer vision, Artificial intelligence, Image (mathematics), Active appearance model
- Abstract
With simple, cost-effective imaging solutions being widely available these days, there has been an enormous rise in the number of images consumers take. Due to this increase, searching, browsing and managing images in multimedia systems has become more complex. One solution to this problem is to divide images into albums for meaningful and effective browsing. We propose a novel automated, expression-driven image album creation method for consumer image management systems. The system groups images of faces with similar expressions into albums. Facial expressions of the subjects are grouped into albums using the Structural Similarity Index measure, which is based on the theory of how easily the human visual system can extract the shape information of a scene. We also propose a search by similar expression, in which the user can create albums by providing example facial expression images. A qualitative analysis of the performance of the system is presented on the basis of a user study.
- Published
- 2010
- Full Text
- View/download PDF
30. Automatic frontal face annotation and AAM building for arbitrary expressions from a single frontal image only
- Author
- Akshay Asthana, Roland Goecke, and Asim Khwaja
- Subjects
Facial expression, Computer science, Image registration, Animation, Facial recognition system, Object detection, Active appearance model, Virtual image, Face (geometry), Computer vision, Artificial intelligence, Computer animation, Computer facial animation
- Abstract
In recent years, statistically motivated approaches for the registration and tracking of non-rigid objects, such as the Active Appearance Model (AAM), have become very popular. A major drawback of these approaches is that they require manual annotation of all training images which can be tedious and error prone. In this paper, a MPEG-4 based approach for the automatic annotation of frontal face images, having any arbitrary facial expression, from a single annotated frontal image is presented. This approach utilises the MPEG-4 based facial animation system to generate virtual images having different expressions and uses the existing AAM framework to automatically annotate unseen images. The approach demonstrates an excellent generalisability by automatically annotating face images from two different databases.
- Published
- 2009
- Full Text
- View/download PDF
31. Learning based automatic face annotation for arbitrary poses and expressions from frontal images only
- Author
- Novi Quadrianto, Roland Goecke, Tom Gedeon, and Akshay Asthana
- Subjects
Facial expression, Computer science, Pattern recognition, Facial recognition system, Active appearance model, Virtual image, Active shape model, Face (geometry), Computer vision, Artificial intelligence, Image warping, Pose
- Abstract
Statistical approaches for building non-rigid deformable models, such as the active appearance model (AAM), have enjoyed great popularity in recent years, but typically require tedious manual annotation of training images. In this paper, a learning based approach for the automatic annotation of visually deformable objects from a single annotated frontal image is presented and demonstrated on the example of automatically annotating face images that can be used for building AAMs for fitting and tracking. This approach employs the idea of initially learning the correspondences between landmarks in a frontal image and a set of training images with a face in arbitrary poses. Using this learner, virtual images of unseen faces at any arbitrary pose for which the learner was trained can be reconstructed by predicting the new landmark locations and warping the texture from the frontal image. View-based AAMs are then built from the virtual images and used for automatically annotating unseen images, including images of different facial expressions, at any random pose within the maximum range spanned by the virtually reconstructed images. The approach is experimentally validated by automatically annotating face images from three different databases.
- Published
- 2009
- Full Text
- View/download PDF
32. Biologically Inspired Contrast Enhancement Using Asymmetric Gain Control
- Author
- Roland Goecke and Asim Khwaja
- Subjects
Pixel, Computer science, Iterative reconstruction, Luminance, Receptive field, Human visual system model, Automatic gain control, Contrast (vision), Computer vision, Adaptive histogram equalization, Artificial intelligence
- Abstract
A neuro-physiologically inspired model for the contrast enhancement of images is presented. The contrast of an image is calculated using simulated on- and off-centre receptive fields, yielding two corresponding contrast maps. We propose an adaptive asymmetric gain control function that is applied to the two contrast maps, which are then used to reconstruct the image, resulting in its contrast enhancement. The image's mean luminance can be adjusted as desired by varying the asymmetry between the gain control factors of the two maps. The model performs local contrast enhancement in the contrast domain of an image, where it lends itself very naturally to such adjustments. Furthermore, the model is extended to colour images using the concept of colour-opponent receptive fields found in the human visual system. The colour model enhances the contrast directly in the colour space, without extracting the luminance information from it. Being neuro-physiologically plausible, the model can be beneficial in theorising about and understanding the gain control mechanisms in the primate visual system. We compare our results with the CLAHE algorithm.
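To make the on/off-centre idea concrete, here is a Python sketch that derives the two contrast maps from a Difference-of-Gaussians operator and recombines them with asymmetric gains. The specific gain function, sigma values and recombination are assumptions; the paper's adaptive gain control is not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def contrast_maps(image, sigma_c=1.0, sigma_s=3.0):
    """On- and off-centre contrast maps from a Difference-of-Gaussians operator."""
    dog = gaussian_filter(image, sigma_c) - gaussian_filter(image, sigma_s)
    on = np.maximum(dog, 0.0)    # on-centre responses (brighter than surround)
    off = np.maximum(-dog, 0.0)  # off-centre responses (darker than surround)
    return on, off

def enhance(image, gain_on=1.5, gain_off=1.2, sigma_c=1.0, sigma_s=3.0):
    """Reconstruct with asymmetrically amplified contrast; the asymmetry
    between gain_on and gain_off shifts the mean luminance, loosely
    following the paper's idea (this fixed gain is an assumption)."""
    on, off = contrast_maps(image, sigma_c, sigma_s)
    base = gaussian_filter(image, sigma_s)   # low-pass "surround" signal
    out = base + gain_on * on - gain_off * off
    return np.clip(out, 0.0, 1.0)

img = np.random.rand(64, 64)
print(enhance(img).shape)
```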
- Published
- 2009
- Full Text
- View/download PDF
33. A Quadratic Deformation Model for Facial Expression Recognition
- Author
-
Mohammad Obaid, Mark Billinghurst, Roland Goecke, Hartmut Seichter, and Ramakrishnan Mukundan
- Subjects
quadratic deformation models ,Facial expression ,active appearance models ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Facial recognition system ,Active appearance model ,Euclidean distance ,Quadratic equation ,Feature (computer vision) ,facial expression recognition ,Computer vision ,Artificial intelligence ,business ,Computer facial animation ,Computer animation ,facial feature tracking ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
In this paper, we propose a novel approach for recognizing facial expressions that combines an active appearance model based facial feature tracking system with quadratic deformation model representations of facial expressions. Thirty-seven facial feature points are tracked, based on the MPEG-4 facial animation parameters layout. The approach relies on the Euclidean distances between the tracked feature points and the reference deformed facial feature points of the six main expressions (smile, sad, fear, disgust, surprise and anger). An evaluation on 30 subjects, selected randomly from the Cohn-Kanade database, shows that the six main facial expressions can be recognized successfully with an overall recognition accuracy of 89%. The approach yields promising recognition rates and can be used in real-time applications.
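The distance-based decision rule is simple enough to sketch directly: classify the tracked 37-point shape as the expression whose reference deformed points are nearest in Euclidean distance. The reference shapes below are random placeholders, not the paper's quadratic deformation models.

```python
import numpy as np

EXPRESSIONS = ["smile", "sad", "fear", "disgust", "surprise", "anger"]

def classify_expression(tracked_points, reference_points):
    """Assign the expression whose reference deformed feature points lie
    closest (Euclidean distance) to the 37 tracked feature points.

    tracked_points   : (37, 2) array from the facial feature tracker.
    reference_points : dict mapping expression name -> (37, 2) array.
    """
    distances = {name: np.linalg.norm(tracked_points - ref)
                 for name, ref in reference_points.items()}
    return min(distances, key=distances.get)

refs = {name: np.random.rand(37, 2) for name in EXPRESSIONS}  # placeholders
print(classify_expression(np.random.rand(37, 2), refs))
```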
- Published
- 2009
- Full Text
- View/download PDF
34. Learning-based Face Synthesis for Pose-Robust Recognition from Single Image
- Author
-
Conrad Sanderson, Tamás D. Gedeon, Akshay Asthana, and Roland Goecke
- Subjects
Landmark ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Facial recognition system ,Active appearance model ,ComputingMethodologies_PATTERNRECOGNITION ,Histogram ,Face (geometry) ,Three-dimensional face recognition ,Computer vision ,Artificial intelligence ,Focus (optics) ,Face detection ,business - Abstract
Face recognition in real-world conditions requires the ability to deal with variations in pose, illumination and expression. In this paper, we focus on variations in head pose and use a computationally efficient regression-based approach for synthesising face images in different poses, which are used to extend the face recognition training set. In this data-driven approach, the correspondences between facial landmark points in frontal and non-frontal views are learnt offline from manually annotated training data via Gaussian Process Regression. We then use this learner to synthesise non-frontal face images from any unseen frontal image. To demonstrate the utility of this approach, two frontal face recognition systems (the commonly used PCA and the recent Multi-Region Histograms) are augmented with synthesised non-frontal views for each person. This synthesis and augmentation approach is experimentally validated on the FERET dataset, showing a considerable improvement in recognition rates for ±40° and ±60° views, while maintaining high recognition rates for ±15° and ±25° views.
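A compact sketch of the offline learning stage, using scikit-learn's GaussianProcessRegressor as a stand-in for the paper's Gaussian Process Regression; the RBF kernel, data shapes and placeholder training pairs are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
X_frontal = rng.random((40, 136))   # flattened frontal landmarks (placeholders)
Y_pose40 = X_frontal + 0.1 * rng.standard_normal((40, 136))  # same faces at +40 deg

# One regressor per target pose, learnt offline from annotated pairs.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gpr.fit(X_frontal, Y_pose40)

# Predict non-frontal landmarks for an unseen frontal face; the texture would
# then be warped to these points to synthesise the non-frontal gallery image.
pred = gpr.predict(rng.random((1, 136)))
print(pred.shape)
```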
- Published
- 2009
- Full Text
- View/download PDF
35. Optical flow estimation using Fourier Mellin Transform
- Author
-
Huy Tho Ho and Roland Goecke
- Subjects
business.industry ,Optical flow ,Image registration ,Image processing ,Filter (signal processing) ,symbols.namesake ,Fourier transform ,Motion estimation ,Phase correlation ,symbols ,Computer vision ,Artificial intelligence ,business ,Algorithm ,Smoothing ,Mathematics - Abstract
In this paper, we propose a novel method for computing the optical flow using the Fourier Mellin Transform (FMT). Each image in a sequence is divided into a regular grid of patches, and the optical flow is estimated by calculating the phase correlation of each pair of co-sited patches using the FMT. By applying the FMT in calculating the phase correlation, we are able to estimate not only pure translation, as in basic phase correlation techniques, but also the scale and rotation of image patches, i.e. full similarity transforms. Moreover, the motion parameters of each patch can be estimated to sub-pixel accuracy based on a recently proposed algorithm that fits a 2D esinc function to the phase correlation output. We further improve the estimated optical flow by smoothing the field with a vector weighted average filter. Finally, experimental results on publicly available data sets are presented, demonstrating the accuracy of our method and its improvements over previous optical flow methods.
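The translation core of the method is standard phase correlation between co-sited patches, sketched below in Python; the paper's FMT extension additionally recovers scale and rotation (see the rotation sketch under entry 41), and its sub-pixel esinc fitting is omitted here.

```python
import numpy as np

def phase_correlation_shift(patch_a, patch_b):
    """Integer-pixel translation between two co-sited patches via phase
    correlation: the peak of the inverse FFT of the normalised cross-power
    spectrum gives the displacement."""
    Fa, Fb = np.fft.fft2(patch_a), np.fft.fft2(patch_b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12      # normalised cross-power spectrum
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts into the signed range (-N/2, N/2].
    h, w = patch_a.shape
    dy = dy - h if dy > h // 2 else dy
    dx = dx - w if dx > w // 2 else dx
    return dy, dx

a = np.random.rand(32, 32)
b = np.roll(a, (3, -2), axis=(0, 1))
print(phase_correlation_shift(b, a))    # expect approximately (3, -2)
```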
- Published
- 2008
- Full Text
- View/download PDF
36. Image Reconstruction from Contrast Information
- Author
-
Asim Khwaja and Roland Goecke
- Subjects
Difference of Gaussians ,Pixel ,Iterative method ,business.industry ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Iterative reconstruction ,Computer Science::Computer Vision and Pattern Recognition ,Contrast (vision) ,Computer vision ,Deconvolution ,Artificial intelligence ,business ,Gradient descent ,Algorithm ,Image restoration ,Mathematics ,media_common - Abstract
An iterative algorithm for the reconstruction of natural images given only their contrast map is presented. The solution is neuro-physiologically inspired: the retinal cells, for the most part, transfer only contrast information to the cortex, which at some stage performs a reconstruction for perception. We provide an image reconstruction algorithm based on least squares error minimization using gradient descent, as well as the corresponding Bayesian framework for the underlying problem. Starting from an initial image, we compute its contrast map using the Difference of Gaussians (DoG) operator at each iteration, which is then compared to the contrast map of the original image, generating a contrast error map. This error map is processed by a non-linearity to deal with saturation effects. Pixel values are then updated in proportion to the resulting contrast errors. Using a least squares error measure, the result is a convex error surface with a single minimum, providing consistent convergence. Our experiments show that the algorithm's convergence is robust to the initial conditions, although its speed is not: a good initial estimate results in faster convergence. Finally, an extension of the algorithm to colour images is presented. We test the algorithm on images from the COREL public image database. The paper provides a novel approach to manipulating an image in its contrast domain.
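A minimal Python sketch of the iteration loop: compute the DoG contrast map of the current estimate, form the contrast error map against the target, and update pixels in proportion to it. The tanh saturation, step size, sigmas and iteration count are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog(image, sigma_c=1.0, sigma_s=2.0):
    """Difference-of-Gaussians contrast operator."""
    return gaussian_filter(image, sigma_c) - gaussian_filter(image, sigma_s)

def reconstruct_from_contrast(target_contrast, shape, lr=0.5, iters=300):
    """Iterative reconstruction: compare the current estimate's contrast map
    to the target map and update pixels in proportion to the contrast error
    (the saturating non-linearity is approximated here by tanh)."""
    x = np.full(shape, 0.5)                 # initial estimate: flat grey
    for _ in range(iters):
        error = target_contrast - dog(x)    # contrast error map
        x += lr * np.tanh(error)            # proportional pixel update
        x = np.clip(x, 0.0, 1.0)
    return x

original = np.random.rand(32, 32)
recon = reconstruct_from_contrast(dog(original), original.shape)
print(np.abs(dog(recon) - dog(original)).mean())   # residual contrast error
```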
- Published
- 2008
- Full Text
- View/download PDF
37. Quaternion Potential Functions for a Colour Image Completion Method Using Markov Random Fields
- Author
-
Roland Goecke and Huy Tho Ho
- Subjects
Hypercomplex number ,Markov chain ,business.industry ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,Image segmentation ,Digital image ,Computer vision ,Artificial intelligence ,Hidden Markov model ,business ,Quaternion ,Algorithm ,Mathematics - Abstract
An exemplar-based algorithm has recently been proposed to solve the image completion problem using a discrete global optimisation strategy based on Markov Random Fields. This algorithm can be applied to completing colour images by processing the three colour channels separately and combining the results. However, such an approach does not capture the correlations across the colour channels and may thus miss information important to the completion process. In this paper, we introduce the use of quaternions, or hypercomplex numbers, in estimating the potential functions for the image completion algorithm. The potential functions are calculated by correlating quaternion image patches based on the recently developed concepts of the quaternion Fourier transform and quaternion correlation. Experimental results for image completion are presented, demonstrating the improvements of the proposed approach over the monochromatic model.
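To give a feel for the quaternion representation, the sketch below encodes RGB pixels as pure quaternions (0, R, G, B) and accumulates a patch correlation in the spatial domain via the Hamilton product. The paper computes this efficiently with the quaternion Fourier transform; this direct-sum scoring and the scalar reduction are simplifying assumptions.

```python
import numpy as np

def quat_conj(q):
    """Conjugate of quaternions stored as (..., 4) arrays [w, x, y, z]."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def quat_mul(a, b):
    """Hamilton product of two (..., 4) quaternion arrays."""
    w1, x1, y1, z1 = np.moveaxis(a, -1, 0)
    w2, x2, y2, z2 = np.moveaxis(b, -1, 0)
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ], axis=-1)

def quaternion_patch_correlation(patch_a, patch_b):
    """Correlate two (H, W, 3) colour patches treated as fields of pure
    quaternions (0, R, G, B), so inter-channel structure is retained rather
    than correlating each colour channel separately."""
    qa = np.concatenate([np.zeros(patch_a.shape[:2] + (1,)), patch_a], axis=-1)
    qb = np.concatenate([np.zeros(patch_b.shape[:2] + (1,)), patch_b], axis=-1)
    total = quat_mul(qa, quat_conj(qb)).sum(axis=(0, 1))
    return np.linalg.norm(total)            # scalar similarity score

p = np.random.rand(8, 8, 3)
q = np.random.rand(8, 8, 3)
print(quaternion_patch_correlation(p, p), quaternion_patch_correlation(p, q))
```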
- Published
- 2007
- Full Text
- View/download PDF
38. Monocular and Stereo Methods for AAM Learning from Video
- Author
-
Roland Goecke and Jason Saragih
- Subjects
Monocular ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Active appearance model ,Stereopsis ,Visual Objects ,Computer vision ,Artificial intelligence ,Fundamental matrix (computer vision) ,business ,computer ,ComputingMethodologies_COMPUTERGRAPHICS ,computer.programming_language - Abstract
The active appearance model (AAM) is a powerful method for modeling deformable visual objects. One of the major drawbacks of the AAM is that it requires a training set of pseudo-dense correspondences over the whole database. In this work, we investigate the utility of stereo constraints for automatic model building from video. First, we propose a new method for automatic correspondence finding in monocular images which is based on an adaptive template tracking paradigm. We then extend this method to take the scene geometry into account, proposing three approaches, each accounting for the availability of the fundamental matrix and calibration parameters or the lack thereof. The performance of the monocular method was first evaluated on a pre-annotated database of a talking face. We then compared the monocular method against its three stereo extensions using a stereo database.
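A small Python sketch of the adaptive template tracking idea used for monocular correspondence finding: search a window for the best normalised cross-correlation match, then blend the matched patch back into the template so it adapts to appearance change. The search radius and adaptation rate alpha are assumed values, not from the paper.

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation between two equally-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return (a * b).sum() / denom

def track_adaptive(frame, template, prev_yx, radius=5, alpha=0.1):
    """One tracking step: find the best NCC match near the previous position,
    then update the template towards the matched patch (adaptive template)."""
    th, tw = template.shape
    best, best_yx = -2.0, prev_yx
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = prev_yx[0] + dy, prev_yx[1] + dx
            if y < 0 or x < 0:
                continue
            patch = frame[y:y + th, x:x + tw]
            if patch.shape != template.shape:
                continue
            score = ncc(patch, template)
            if score > best:
                best, best_yx = score, (y, x)
    matched = frame[best_yx[0]:best_yx[0] + th, best_yx[1]:best_yx[1] + tw]
    template = (1 - alpha) * template + alpha * matched   # adaptive update
    return best_yx, template

frame = np.random.rand(64, 64)
tmpl = frame[20:28, 30:38].copy()
print(track_adaptive(frame, tmpl, (19, 29))[0])   # expect (20, 30)
```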
- Published
- 2007
- Full Text
- View/download PDF
39. Fast voxel-based 2D/3D registration algorithm using a volume rendering method based on the shear-warp factorization
- Author
-
Paul Antoon Cyriel Desmedt, Graeme P. Penney, Juergen Weese, Roland Goecke, Heidrun Schumann, and Thorsten M. Buzug
- Subjects
Computer science ,business.industry ,Image registration ,Volume rendering ,Similarity measure ,computer.software_genre ,Imaging phantom ,Rendering (computer graphics) ,Voxel ,Computer vision ,Artificial intelligence ,business ,Algorithm ,Image resolution ,computer ,Image restoration - Abstract
2D/3D registration makes it possible to use pre-operative CT scans for navigation purposes during X-ray fluoroscopy guided interventions. We present a fast voxel-based method for this registration task, which uses a recently introduced similarity measure (pattern intensity). This measure is especially suitable for 2D/3D registration because it is robust with respect to structures, such as a stent, that are visible in the X-ray fluoroscopy image but not in the CT scan. The method uses only a part of the CT scan for the generation of digitally reconstructed radiographs (DRRs) to accelerate their computation. Nevertheless, computation time is crucial for intra-operative application, and a further speed-up is required because numerous DRRs must be computed. For that reason, the suitability of different volume rendering methods for 2D/3D registration has been investigated. A method based on the shear-warp factorization of the viewing transformation turned out to be especially suitable and forms the basis of the registration algorithm. The algorithm has been applied to images of a spine phantom and to clinical images. For comparison, registration results have been calculated using ray-casting. The shear-warp factorization based rendering method accelerates registration by a factor of up to seven compared to ray-casting, without degrading registration accuracy. Using a vertebra as the feature for registration, the computation time is in the range of 3-4 s (Sun UltraSparc, 300 MHz), which is acceptable for intra-operative application.
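The pattern intensity measure can be sketched compactly: on the difference image between the fluoroscopy image and a scaled DRR, it sums sigma^2 / (sigma^2 + d^2) over nearby pixel pairs, so residual structure lowers the score. This Python version uses wrap-around shifts via np.roll for brevity, and sigma and radius are illustrative values.

```python
import numpy as np

def pattern_intensity(diff_image, sigma=10.0, radius=3):
    """Pattern-intensity similarity on a difference image
    I_diff = fluoroscopy - s * DRR: sum sigma^2 / (sigma^2 + d^2) over all
    pixel pairs within the given radius, where d is the difference of
    I_diff values. Maximising it aligns DRR and X-ray image."""
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            if dy * dy + dx * dx > radius * radius:
                continue
            shifted = np.roll(diff_image, (dy, dx), axis=(0, 1))
            d = diff_image - shifted
            total += (sigma**2 / (sigma**2 + d**2)).sum()
    return total

drr = np.random.rand(64, 64)
print(pattern_intensity(drr - drr))   # identical images give the maximal score
```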
- Published
- 1999
- Full Text
- View/download PDF
40. Towards detection and tracking of on-road objects
- Author
-
Niklas Pettersson, Roland Goecke, and Lars Petersson
- Subjects
Engineering ,Boosting (machine learning) ,Pixel ,business.industry ,Robustness (computer science) ,Road surface ,Video tracking ,Computer vision ,AdaBoost ,Artificial intelligence ,business ,Phase detector ,Object detection - Abstract
In this paper, we present a system capable of detecting and tracking on-road objects, in particular vehicles, as a useful component of a driver assistance system. The system employs two different techniques in the detection phase to increase robustness. A large part of this paper is devoted to reducing the computational load of the overall algorithm by quickly excluding pixels above the horizon and on the road surface. The number of pixels requiring further, computationally expensive processing is reduced by up to 65% in the sequences used in the experimental evaluation. Objects are detected in the remaining image areas by an improved boosting approach of weak classifiers based on the well-known AdaBoost and RealBoost approaches. Tracking is then performed by periodically re-running the detection algorithm, while at other times using adaptable templates that allow for changes in shape and appearance as the car and the other vehicles travel along the road.
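For readers unfamiliar with boosting of weak classifiers, here is a minimal discrete AdaBoost over axis-aligned decision stumps in Python. It shows the general mechanism only; the paper's improved variant and its image features are not reproduced, and the toy data below is an assumption.

```python
import numpy as np

def adaboost_train(X, y, rounds=20):
    """Discrete AdaBoost with decision stumps. X: (n, d), y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # example weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                  # pick the stump with lowest weighted error
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)      # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 5))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = adaboost_train(X, y)
print((adaboost_predict(model, X) == y).mean())   # training accuracy
```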
41. Visual vehicle egomotion estimation using the Fourier-Mellin Transform
- Author
-
Akshay Asthana, Lars Petersson, Roland Goecke, and N. Pettersson
- Subjects
Sequence ,Pixel ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,Image registration ,Translation (geometry) ,symbols.namesake ,Fourier transform ,Motion estimation ,symbols ,Computer vision ,Artificial intelligence ,business ,Rotation (mathematics) - Abstract
This paper is concerned with the problem of estimating the motion of a single camera from a sequence of images, with vehicle egomotion estimation as the application scenario. Egomotion estimation has been an active area of research for many years, and various solutions have been proposed, many relying on optical flow or local image features to establish the spatial relationship between two images. A new method of egomotion estimation is presented, which makes use of the Fourier-Mellin Transform for registering images in a video sequence, from which the rotation and translation of the camera motion can be estimated. The Fourier-Mellin Transform provides an accurate and efficient way of computing the camera motion parameters. It is a global method that takes the contributions of all pixels into account. The performance of the proposed approach is compared to two variants of optical flow methods, and results are presented for a real-world video sequence taken from a moving vehicle.
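The rotation-recovery half of Fourier-Mellin registration can be sketched as follows: the FFT magnitude spectrum is translation invariant, and a rotation of the frame becomes a circular shift along the angle axis of the spectrum's log-polar resampling, recoverable by phase correlation. The grid sizes and interpolation below are illustrative assumptions, and scale estimation along the log-radius axis is omitted.

```python
import numpy as np
from scipy.ndimage import map_coordinates, rotate

def log_polar(image, angles=180, radii=100):
    """Resample an image onto a log-polar grid about its centre."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.linspace(0, np.pi, angles, endpoint=False)[:, None]
    r = np.exp(np.linspace(0, np.log(min(cy, cx)), radii))[None, :]
    ys, xs = cy + r * np.sin(theta), cx + r * np.cos(theta)
    return map_coordinates(image, [ys, xs], order=1)

def estimate_rotation(img_a, img_b, angles=180):
    """Fourier-Mellin rotation estimate between two frames via phase
    correlation of the log-polar magnitude spectra."""
    mag_a = np.abs(np.fft.fftshift(np.fft.fft2(img_a)))
    mag_b = np.abs(np.fft.fftshift(np.fft.fft2(img_b)))
    lp_a, lp_b = log_polar(mag_a, angles), log_polar(mag_b, angles)
    cross = np.fft.fft2(lp_a) * np.conj(np.fft.fft2(lp_b))
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    shift = np.unravel_index(np.argmax(corr), corr.shape)[0]
    if shift > angles // 2:
        shift -= angles
    return shift * 180.0 / angles           # rotation angle in degrees

a = np.random.rand(128, 128)
b = rotate(a, angle=8, reshape=False, order=1)
print(estimate_rotation(a, b))   # expect a magnitude near 8 (sign depends on convention)
```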