476 results for "Valstar, Michel"
Search Results
202. Digital innovations in L2 motivation: harnessing the power of the Ideal L2 Self
- Author
Adolphs, Svenja, Clark, Leigh, Glover, Tony, Henry, Alastair, Muir, Christine, and Valstar, Michel
- Subjects
motivation, digital animation, computer-assisted language learning, avatar, ideal self
- Abstract
Sustained motivation is crucial to learning a second language (L2), and one way to support this can be through the mental visualisation of ideal L2 selves (Dörnyei & Kubanyiova, 2014). This paper reports on an exploratory study which investigated the possibility of using technology to create representations of language learners’ ideal L2 selves digitally. Nine Chinese learners of L2 English were invited to three semi-structured interviews to discuss their ideal L2 selves and their future language goals, as well as their opinions on several different technological approaches to representing their ideal L2 selves. Three approaches were shown to participants: (a) 2D and 3D animations, (b) Facial Overlay, and (c) Facial Mask. Within these, several iterations were also included (e.g. with/without background or context). Results indicate that 3D animation currently offers the best approach in terms of realism and animation of facial features, and improvements to Facial Overlay could lead to beneficial results in the future. Approaches using the 2D animations and the Facial Mask approach appeared to have little future potential. The descriptive details of learners’ ideal L2 selves also provide preliminary directions for the development of content that might be included in future technology-based interventions.
203. Digital innovations in L2 motivation: harnessing the power of the Ideal L2 Self
- Author
Adolphs, Svenja, Clark, Leigh, Dörnyei, Zoltán, Glover, Tony, Henry, Alastair, Muir, Christine, Sánchez-Lozano, Enrique, and Valstar, Michel
- Abstract
Sustained motivation is crucial to learning a second language (L2), and one way to support this can be through the mental visualisation of ideal L2 selves (Dörnyei & Kubanyiova, 2014). This paper reports on an exploratory study which investigated the possibility of using technology to create representations of language learners’ ideal L2 selves digitally. Nine Chinese learners of L2 English were invited to three semi-structured interviews to discuss their ideal L2 selves and their future language goals, as well as their opinions on several different technological approaches to representing their ideal L2 selves. Three approaches were shown to participants: (a) 2D and 3D animations, (b) Facial Overlay, and (c) Facial Mask. Within these, several iterations were also included (e.g. with/without background or context). Results indicate that 3D animation currently offers the best approach in terms of realism and animation of facial features, and improvements to Facial Overlay could lead to beneficial results in the future. Approaches using the 2D animations and the Facial Mask approach appeared to have little future potential. The descriptive details of learners’ ideal L2 selves also provide preliminary directions for the development of content that might be included in future technology-based interventions.
204. Objective methods for reliable detection of concealed depression
- Author
Solomon, Cynthia, Valstar, Michel F., Morriss, Richard K., and Crowe, John
- Abstract
Recent research has shown that it is possible to automatically detect clinical depression from audio-visual recordings. Before considering integration in a clinical pathway, a key question that must be asked is whether such systems can be easily fooled. This work explores the potential of acoustic features to detect clinical depression in adults both when acting normally and when asked to conceal their depression. Nine adults diagnosed with mild to moderate depression as per the Beck Depression Inventory (BDI-II) and Patient Health Questionnaire (PHQ-9) were asked a series of questions and to read an excerpt from a novel aloud under two different experimental conditions. In one, participants were asked to act naturally and in the other, to suppress anything that they felt would be indicative of their depression. Acoustic features were then extracted from this data and analysed using paired t-tests to determine any statistically significant differences between healthy and depressed participants. Most features that were found to be significantly different during normal behaviour remained so during concealed behaviour. In leave-one-subject-out automatic classification studies of the 9 depressed subjects and 8 matched healthy controls, an 88% classification accuracy and 89% sensitivity were achieved. Results remained relatively robust during concealed behaviour, with classifiers trained on only non-concealed data achieving 81% detection accuracy and 75% sensitivity when tested on concealed data. These results indicate there is good potential to build deception-proof automatic depression monitoring systems.
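As an editorial aside on the evaluation protocol named above, the following is a minimal, illustrative sketch of leave-one-subject-out classification with subject-level accuracy and sensitivity, assuming placeholder acoustic features, a linear SVM, and an arbitrary number of speech segments per subject; none of these choices are taken from the paper itself.

```python
# Illustrative sketch only: leave-one-subject-out (LOSO) evaluation of a
# depression classifier from acoustic features, loosely following the setup in
# the abstract above (9 depressed subjects, 8 matched controls). The feature
# set, segment counts and the linear SVM are assumptions, not the authors' pipeline.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, n_segments, n_features = 17, 12, 20          # 9 depressed + 8 controls (assumed segmentation)
X = rng.normal(size=(n_subjects * n_segments, n_features))           # placeholder acoustic feature vectors
groups = np.repeat(np.arange(n_subjects), n_segments)                # one group id per subject
y = np.repeat((np.arange(n_subjects) < 9).astype(int), n_segments)   # 1 = depressed, 0 = control

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
correct = positives = true_positives = 0
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf.fit(X[train_idx], y[train_idx])
    # Majority vote over the held-out subject's segments gives one subject-level decision.
    pred = int(round(clf.predict(X[test_idx]).mean()))
    truth = int(y[test_idx][0])
    correct += int(pred == truth)
    positives += int(truth == 1)
    true_positives += int(pred == 1 and truth == 1)

print(f"subject-level accuracy    = {correct / n_subjects:.2f}")
print(f"subject-level sensitivity = {true_positives / positives:.2f}")
```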
205. Objective methods for reliable detection of concealed depression
- Author
Solomon, Cynthia, Valstar, Michel F., Morriss, Richard K., and Crowe, John
- Abstract
Recent research has shown that it is possible to automatically detect clinical depression from audio-visual recordings. Before considering integration in a clinical pathway, a key question that must be asked is whether such systems can be easily fooled. This work explores the potential of acoustic features to detect clinical depression in adults both when acting normally and when asked to conceal their depression. Nine adults diagnosed with mild to moderate depression as per the Beck Depression Inventory (BDI-II) and Patient Health Questionnaire (PHQ-9) were asked a series of questions and to read an excerpt from a novel aloud under two different experimental conditions. In one, participants were asked to act naturally and in the other, to suppress anything that they felt would be indicative of their depression. Acoustic features were then extracted from this data and analyzed using paired t-tests to determine any statistically significant differences between healthy and depressed participants. Most features that were found to be significantly different during normal behavior remained so during concealed behavior. In leave-one-subject-out automatic classification studies of the 9 depressed subjects and 8 matched healthy controls, an 88% classification accuracy and 89% sensitivity were achieved. Results remained relatively robust during concealed behavior, with classifiers trained on only non-concealed data achieving 81% detection accuracy and 75% sensitivity when tested on concealed data. These results indicate there is good potential to build deception-proof automatic depression monitoring systems.
206. Objective methods for reliable detection of concealed depression
- Author
Solomon, Cynthia, Valstar, Michel F., Morriss, Richard K., and Crowe, John
- Abstract
Recent research has shown that it is possible to automatically detect clinical depression from audio-visual recordings. Before considering integration in a clinical pathway, a key question that must be asked is whether such systems can be easily fooled. This work explores the potential of acoustic features to detect clinical depression in adults both when acting normally and when asked to conceal their depression. Nine adults diagnosed with mild to moderate depression as per the Beck Depression Inventory (BDI-II) and Patient Health Questionnaire (PHQ-9) were asked a series of questions and to read an excerpt from a novel aloud under two different experimental conditions. In one, participants were asked to act naturally and in the other, to suppress anything that they felt would be indicative of their depression. Acoustic features were then extracted from this data and analysed using paired t-tests to determine any statistically significant differences between healthy and depressed participants. Most features that were found to be significantly different during normal behaviour remained so during concealed behaviour. In leave-one-subject-out automatic classification studies of the 9 depressed subjects and 8 matched healthy controls, an 88% classification accuracy and 89% sensitivity were achieved. Results remained relatively robust during concealed behaviour, with classifiers trained on only non-concealed data achieving 81% detection accuracy and 75% sensitivity when tested on concealed data. These results indicate there is good potential to build deception-proof automatic depression monitoring systems.
207. Digital innovations in L2 motivation: harnessing the power of the Ideal L2 Self
- Author
Adolphs, Svenja, Clark, Leigh, Dörnyei, Zoltán, Glover, Tony, Henry, Alastair, Muir, Christine, Sánchez-Lozano, Enrique, and Valstar, Michel
- Abstract
Sustained motivation is crucial to learning a second language (L2), and one way to support this can be through the mental visualisation of ideal L2 selves (Dörnyei & Kubanyiova, 2014). This paper reports on an exploratory study which investigated the possibility of using technology to create representations of language learners’ ideal L2 selves digitally. Nine Chinese learners of L2 English were invited to three semi-structured interviews to discuss their ideal L2 selves and their future language goals, as well as their opinions on several different technological approaches to representing their ideal L2 selves. Three approaches were shown to participants: (a) 2D and 3D animations, (b) Facial Overlay, and (c) Facial Mask. Within these, several iterations were also included (e.g. with/without background or context). Results indicate that 3D animation currently offers the best approach in terms of realism and animation of facial features, and improvements to Facial Overlay could lead to beneficial results in the future. Approaches using the 2D animations and the Facial Mask approach appeared to have little future potential. The descriptive details of learners’ ideal L2 selves also provide preliminary directions for the development of content that might be included in future technology-based interventions.
208. Automatic detection of ADHD and ASD from expressive behaviour in RGBD data
- Author
Jaiswal, Shashank, Valstar, Michel F., Gillott, Alinda, and Daley, David
- Abstract
Attention Deficit Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD) are neurodevelopmental conditions which impact on a significant number of children and adults. Currently, the diagnosis of such disorders is done by experts who employ standard questionnaires and look for certain behavioural markers through manual observation. Such methods for their diagnosis are not only subjective, difficult to repeat, and costly but also extremely time-consuming. In this work, we present a novel methodology to aid diagnostic predictions about the presence/absence of ADHD and ASD by automatic visual analysis of a person’s behaviour. To do so, we conduct the questionnaires in a computer-mediated way while recording participants with modern RGBD (Colour+Depth) sensors. In contrast to previous automatic approaches which have focussed only on detecting certain behavioural markers, our approach provides a fully automatic end-to-end system to directly predict ADHD and ASD in adults. Using state-of-the-art facial expression analysis based on Dynamic Deep Learning and 3D analysis of behaviour, we attain classification rates of 96% for the Controls vs Condition (ADHD/ASD) groups and 94% for the Comorbid (ADHD+ASD) vs ASD-only group. We show that our system is a potentially useful time-saving contribution to the clinical diagnosis of ADHD and ASD.
209. Digital innovations in L2 motivation: harnessing the power of the Ideal L2 Self
- Author
Adolphs, Svenja, Clark, Leigh, Dörnyei, Zoltán, Glover, Tony, Henry, Alastair, Muir, Christine, Sánchez-Lozano, Enrique, and Valstar, Michel
- Abstract
Sustained motivation is crucial to learning a second language (L2), and one way to support this can be through the mental visualisation of ideal L2 selves (Dörnyei & Kubanyiova, 2014). This paper reports on an exploratory study which investigated the possibility of using technology to create representations of language learners’ ideal L2 selves digitally. Nine Chinese learners of L2 English were invited to three semi-structured interviews to discuss their ideal L2 selves and their future language goals, as well as their opinions on several different technological approaches to representing their ideal L2 selves. Three approaches were shown to participants: (a) 2D and 3D animations, (b) Facial Overlay, and (c) Facial Mask. Within these, several iterations were also included (e.g. with/without background or context). Results indicate that 3D animation currently offers the best approach in terms of realism and animation of facial features, and improvements to Facial Overlay could lead to beneficial results in the future. Approaches using the 2D animations and the Facial Mask approach appeared to have little future potential. The descriptive details of learners’ ideal L2 selves also provide preliminary directions for the development of content that might be included in future technology-based interventions.
210. Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features
- Author
Song, Siyang, Shen, Linlin, and Valstar, Michel F.
- Abstract
Depression is a serious mental disorder that affects millions of people all over the world. Traditional clinical diagnosis methods are subjective, complicated and need extensive participation of experts. Audio-visual automatic depression analysis systems predominantly base their predictions on very brief sequential segments, sometimes as little as one frame. Such data contains much redundant information, causes a high computational load, and negatively affects the detection accuracy. Final decision making at the sequence level is then based on the fusion of frame or segment level predictions. However, this approach loses longer term behavioural correlations, as the behaviours themselves are abstracted away by the frame-level predictions. We propose to use automatically detected human behaviour primitives, such as gaze directions and facial action units (AUs), as low-dimensional multi-channel time series data, which can then be used to create two sequence descriptors. The first calculates the sequence-level statistics of the behaviour primitives and the second casts the problem as a Convolutional Neural Network problem operating on a spectral representation of the multichannel behaviour signals. The results of depression detection (binary classification) and severity estimation (regression) experiments conducted on the AVEC 2016 DAIC-WOZ database show that both methods achieved significant improvement compared to the previous state of the art in terms of depression severity estimation.
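For readers wondering what a "spectral representation of the multichannel behaviour signals" could look like in code, here is a minimal sketch that turns per-frame behaviour primitives into per-channel amplitude spectra that a CNN could consume; the channel count, sampling rate, spectrum size and normalisation are assumptions for illustration, not the authors' exact design.

```python
# Illustrative sketch only: convert a multi-channel behaviour-primitive time
# series (e.g. per-frame AU intensities and gaze angles) into per-channel
# amplitude spectra, i.e. a 2-D "image" a CNN could take as input.
import numpy as np

def spectral_map(primitives: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """primitives: (n_frames, n_channels) time series -> (n_channels, n_bins) spectra."""
    x = primitives - primitives.mean(axis=0, keepdims=True)   # remove per-channel offset
    spectrum = np.abs(np.fft.rfft(x, axis=0))                  # amplitude spectrum per channel
    spectrum = spectrum[:n_bins].T                              # keep the lowest n_bins frequencies
    return spectrum / (spectrum.max() + 1e-8)                   # crude normalisation for a CNN input

# Example: 30 s of behaviour primitives at 30 fps, 12 assumed channels (AUs + gaze).
series = np.random.default_rng(1).normal(size=(900, 12))
print(spectral_map(series).shape)   # (12, 64)
```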
211. Predicting folds in poker using action unit detectors and decision trees
- Author
Vinkemeier, Doratha, Valstar, Michel F., and Gratch, Jonathan
- Abstract
Predicting how a person will respond can be very useful, for instance when designing a strategy for negotiations. We investigate whether it is possible for machine learning and computer vision techniques to recognize a person’s intentions and predict their actions based on their visually expressive behaviour, where in this paper we focus on the face. We have chosen as our setting pairs of humans playing a simplified version of poker, where the players are behaving naturally and spontaneously, albeit mediated through a computer connection. In particular, we ask if we can automatically predict whether a player is going to fold or not. We also try to answer the question of at what time point the signal for predicting if a player will fold is strongest. We use state-of-the-art FACS Action Unit detectors to automatically annotate the players’ facial expressions, which have been recorded on video. In addition, we use timestamps of when the player received their card and when they placed their bets, as well as the amounts they bet. Thus, the system is fully automated. We are able to predict whether a person will fold or not significantly better than chance based solely on their expressive behaviour starting three seconds before they fold.
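To make the "action unit detectors and decision trees" pairing concrete, the sketch below fits a decision tree on placeholder AU activations plus betting-event features to predict folds; the feature layout, window length and train/test split are assumptions, not the authors' system.

```python
# Illustrative sketch only: a decision tree over automatically detected AU
# activations and betting-event timing, predicting fold vs stay. All data here
# are random placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n_hands = 400
X = np.column_stack([
    rng.random((n_hands, 6)),         # mean AU detector outputs in a 3 s window (assumed)
    rng.random(n_hands),              # seconds since the card was received (assumed)
    rng.integers(1, 50, n_hands),     # bet amount (assumed)
])
y = rng.integers(0, 2, n_hands)       # 1 = fold, 0 = stay in

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {tree.score(X_te, y_te):.2f}")   # ~0.5 on random placeholder data
```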
212. ChaLearn Looking at People and Faces of the World: Face Analysis Workshop and Challenge 2016
- Author
Escalera, Sergio, Torres, Mercedes Torres, Martinez, Brais, Baro, Xavier, Escalante, Hugo Jair, Guyon, Isabelle, Tzimiropoulos, Georgios, Corneanu, Ciprian, Oliu, Marc, Bagheri, Mohammad Ali, and Valstar, Michel
- Abstract
We present the 2016 ChaLearn Looking at People and Faces of the World Challenge and Workshop, which ran three competitions on the common theme of face analysis from still images. The first one, Looking at People, addressed age estimation, while the second and third competitions, Faces of the World, addressed accessory classification and smile and gender classification, respectively. We present two crowd-sourcing methodologies used to collect manual annotations. A custom-built application was used to collect and label data about the apparent age of people (as opposed to the real age). For the Faces of the World data, the citizen-science Zooniverse platform was used. This paper summarizes the three challenges and the data used, as well as the results achieved by the participants of the competitions. Details of the ChaLearn LAP FotW competitions can be found at http://gesture.chalearn.org.
213. A functional regression approach to facial landmark tracking
- Author
Sánchez-Lozano, Enrique, Tzimiropoulos, Georgios, Martinez, Brais, De la Torre, Fernando, and Valstar, Michel
- Abstract
Linear regression is a fundamental building block in many face detection and tracking algorithms, typically used to predict shape displacements from image features through a linear mapping. This paper presents a Functional Regression solution to the least squares problem, which we coin Continuous Regression, resulting in the first real-time incremental face tracker. Contrary to prior work in Functional Regression, in which B-splines or Fourier series were used, we propose to approximate the input space by its first-order Taylor expansion, yielding a closed-form solution for the continuous domain of displacements. We then extend the continuous least squares problem to correlated variables, and demonstrate the generalisation of our approach. We incorporate Continuous Regression into the cascaded regression framework, and show its computational benefits for both training and testing. We then present a fast approach for incremental learning within Cascaded Continuous Regression, coined iCCR, and show that its complexity allows real-time face tracking, being 20 times faster than the state of the art. To the best of our knowledge, this is the first incremental face tracker that is shown to operate in real-time. We show that iCCR achieves state-of-the-art performance on the 300-VW dataset, the most recent, large-scale benchmark for face tracking.
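As a rough editorial illustration of the idea sketched in this abstract, and not the authors' exact derivation, the continuous-regression objective can be written as a least-squares integral over the domain of shape displacements, with the feature map linearised by a first-order Taylor expansion:

```latex
% Illustrative reconstruction, not the paper's exact formulation.
\min_{\mathbf{R}} \int p(\delta\mathbf{s})\,
  \bigl\| \delta\mathbf{s} - \mathbf{R}\,\phi(\mathbf{x}, \mathbf{s}^{*} + \delta\mathbf{s}) \bigr\|^{2}
  \, \mathrm{d}(\delta\mathbf{s}),
\qquad
\phi(\mathbf{x}, \mathbf{s}^{*} + \delta\mathbf{s}) \approx
  \phi(\mathbf{x}, \mathbf{s}^{*}) + \mathbf{J}\,\delta\mathbf{s}.
```

Under the Taylor approximation the integral depends only on low-order moments of p(δs), which is what yields a closed-form regressor R over the continuous domain of displacements.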
214. AVEC 2017 – Real-life depression, and affect recognition workshop and challenge
- Author
Ringeval, Fabien, Schuller, Björn, Valstar, Michel, Gratch, Jonathan, Cowie, Roddy, Scherer, Stefan, Mozgai, Sharon, Cummins, Nicholas, Schmitt, Maximilian, and Pantic, Maja
- Abstract
The Audio/Visual Emotion Challenge and Workshop (AVEC 2017) “Real-life depression, and affect” will be the seventh competition event aimed at comparison of multimedia processing and machine learning methods for automatic audiovisual depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the depression and emotion recognition communities, as well as the audiovisual processing communities, to compare the relative merits of the various approaches to depression and emotion recognition from real-life data. This paper presents the novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline system on the two proposed tasks: dimensional emotion recognition (time and value-continuous), and dimensional depression estimation (value-continuous).
215. The NoXi database: multimodal recordings of mediated novice-expert interactions
- Author
Cafaro, Angelo, Wagner, Johannes, Baur, Tobias, Dermouche, Soumia, Torres, Mercedes Torres, Pelachaud, Catherine, André, Elisabeth, and Valstar, Michel F.
- Abstract
We present a novel multi-lingual database of natural dyadic novice-expert interactions, named NoXi, featuring screen-mediated dyadic human interactions in the context of information exchange and retrieval. NoXi is designed to provide spontaneous interactions with emphasis on adaptive behaviors and unexpected situations (e.g. conversational interruptions). A rich set of audio-visual data, as well as continuous and discrete annotations are publicly available through a web interface. Descriptors include low level social signals (e.g. gestures, smiles), functional descriptors (e.g. turn-taking, dialogue acts) and interaction descriptors (e.g. engagement, interest, and fluidity).
216. Ask Alice: an artificial retrieval of information agent
- Author
Valstar, Michel F., Baur, Tobias, Cafaro, Angelo, Ghitulescu, Alexandru, Potard, Blaise, Wagner, Johannes, André, Elisabeth, Durieu, Laurent, Aylett, Matthew, Dermouche, Soumia, Pelachaud, Catherine, Coutinho, Eduardo, Schuller, Björn, Zhang, Yue, Heylen, Dirk, Theune, Mariët, and Waterschoot, Jelte van
- Abstract
We present a demonstration of the ARIA framework, a modular approach for rapid development of virtual humans for information retrieval that have linguistic, emotional, and social skills and a strong personality. We demonstrate the framework’s capabilities in a scenario where ‘Alice in Wonderland’, a popular English literature book, is embodied by a virtual human representing Alice. The user can engage in an information exchange dialogue, where Alice acts as the expert on the book, and the user as an interested novice. Besides speech recognition, sophisticated audio-visual behaviour analysis is used to inform the core agent dialogue module about the user’s state and intentions, so that it can go beyond simple chat-bot dialogue. The behaviour generation module features a unique new capability of being able to deal gracefully with interruptions of the agent.
217. Fusing deep learned and hand-crafted features of appearance, shape, and dynamics for automatic pain estimation
- Author
Egede, Joy Onyekachukwu, Valstar, Michel F., and Martinez, Brais
- Abstract
Automatic continuous time, continuous value assessment of a patient’s pain from face video is highly sought after by the medical profession. Despite the recent advances in deep learning that attain impressive results in many domains, pain estimation risks not being able to benefit from this due to the difficulty in obtaining data sets of considerable size. In this work we propose a combination of hand-crafted and deep-learned features that makes the most of deep learning techniques in small sample settings. Encoding shape, appearance, and dynamics, our method significantly outperforms the current state of the art, attaining an RMSE of less than 1 point on a 16-level pain scale, whilst simultaneously scoring a 67.3% Pearson correlation coefficient between our predicted pain level time series and the ground truth.
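For reference, the two figures of merit quoted here (RMSE on a 16-level pain scale and the Pearson correlation between predicted and ground-truth series) can be computed as in this short sketch; the series below are random placeholders, not the authors' data.

```python
# Illustrative sketch only: RMSE and Pearson correlation between a predicted
# pain-level time series and the ground truth, on placeholder data.
import numpy as np

rng = np.random.default_rng(3)
truth = rng.integers(0, 16, size=500).astype(float)            # ground-truth pain level per frame (0-15)
pred = np.clip(truth + rng.normal(0, 0.8, size=500), 0, 15)    # hypothetical continuous predictions

rmse = float(np.sqrt(np.mean((pred - truth) ** 2)))
pearson = float(np.corrcoef(pred, truth)[0, 1])
print(f"RMSE = {rmse:.2f}, Pearson r = {pearson:.3f}")
```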
218. A dynamic appearance descriptor approach to facial actions temporal modeling
- Author
Jiang, Bihan, Valstar, Michel, Martinez, Brais, and Pantic, Maja
- Abstract
Both the configuration and the dynamics of facial expressions are crucial for the interpretation of human facial behavior. Yet to date, the vast majority of reported efforts in the field either do not take the dynamics of facial expressions into account, or focus only on prototypic facial expressions of six basic emotions. Facial dynamics can be explicitly analyzed by detecting the constituent temporal segments of Facial Action Coding System (FACS) Action Units (AUs): onset, apex, and offset. In this paper, we present a novel approach to explicit analysis of temporal dynamics of facial actions using the dynamic appearance descriptor Local Phase Quantization from Three Orthogonal Planes (LPQ-TOP). Temporal segments are detected by combining a discriminative classifier for detecting the temporal segments on a frame-by-frame basis with Markov Models that enforce temporal consistency over the whole episode. The system is evaluated in detail on the MMI facial expression database, the UNBC-McMaster pain database, the SAL database, and the GEMEP-FERA dataset in database-dependent experiments, and in cross-database experiments using the Cohn-Kanade and SEMAINE databases. The comparison with other state-of-the-art methods shows that the proposed LPQ-TOP method outperforms the other approaches for the problem of AU temporal segment detection, and that overall AU activation detection benefits from dynamic appearance information.
219. A CNN cascade for landmark guided semantic part segmentation
- Author
Jackson, Aaron S., Valstar, Michel, and Tzimiropoulos, Georgios
- Abstract
This paper proposes a CNN cascade for semantic part segmentation guided by pose-specific information encoded in terms of a set of landmarks (or keypoints). There is a large amount of prior work on each of these tasks separately, yet, to the best of our knowledge, this is the first time in the literature that the interplay between pose estimation and semantic part segmentation is investigated. To address this limitation of prior work, in this paper, we propose a CNN cascade of tasks that firstly performs landmark localisation and then uses this information as input for guiding semantic part segmentation. We applied our architecture to the problem of facial part segmentation and report large performance improvement over the standard unguided network on the most challenging face datasets. Testing code and models will be published online at http://cs.nott.ac.uk/~psxasj/.
220. Topic switch models for dialogue management in virtual humans
- Author
Zhu, Wenjue, Chowanda, Andry, and Valstar, Michel F.
- Abstract
This paper presents a novel data-driven Topic Switch Model based on a cognitive representation of a limited set of topics that are currently in-focus, which determines what utterances are chosen next. The transition model was statistically learned from a large set of transcribed dyadic interactions. Results show that using our proposed model results in interactions that on average last 2.17 times longer compared to the same system without our model.
221. Cascaded continuous regression for real-time incremental face tracking
- Author
Sánchez Lozano, Enrique, Martinez, Brais, Tzimiropoulos, Georgios, and Valstar, Michel F.
- Abstract
This paper introduces a novel real-time algorithm for facial landmark tracking. Compared to detection, tracking has both additional challenges and opportunities. Arguably the most important aspect in this domain is updating a tracker's models as tracking progresses, also known as incremental (face) tracking. While this should result in more accurate localisation, how to do this online and in real time without causing a tracker to drift is still an important open research question. We address this question in the cascaded regression framework, the state-of-the-art approach for facial landmark localisation. Because incremental learning for cascaded regression is costly, we propose a much more efficient yet equally accurate alternative using continuous regression. More specifically, we first propose cascaded continuous regression (CCR) and show its accuracy is equivalent to the Supervised Descent Method. We then derive the incremental learning updates for CCR (iCCR) and show that it is an order of magnitude faster than standard incremental learning for cascaded regression, bringing the time required for the update from seconds down to a fraction of a second, thus enabling real-time tracking. Finally, we evaluate iCCR and show the importance of incremental learning in achieving state-of-the-art performance. Code for our iCCR is available from http://www.cs.nott.ac.uk/~psxes1.
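As general background on why incremental updates can be so much cheaper than retraining, the sketch below shows the standard rank-one (Sherman-Morrison) update of a ridge-regression solution; this is a generic building block offered purely for illustration, not the paper's iCCR derivation.

```python
# Illustrative sketch only: incremental ridge regression via a rank-one
# (Sherman-Morrison) update of the inverse Gram matrix, so each new sample is
# absorbed in O(d^2) time instead of refitting from scratch.
import numpy as np

class IncrementalRidge:
    def __init__(self, n_features: int, lam: float = 1.0):
        self.P = np.eye(n_features) / lam      # running inverse of (lam*I + sum_i x_i x_i^T)
        self.b = np.zeros(n_features)          # running sum_i y_i x_i

    def update(self, x: np.ndarray, y: float) -> None:
        Px = self.P @ x
        self.P -= np.outer(Px, Px) / (1.0 + x @ Px)   # Sherman-Morrison rank-one update
        self.b += y * x

    @property
    def weights(self) -> np.ndarray:
        return self.P @ self.b                 # ridge solution for all data seen so far

rng = np.random.default_rng(4)
true_w = rng.normal(size=5)
model = IncrementalRidge(5)
for _ in range(2000):
    x = rng.normal(size=5)
    model.update(x, float(x @ true_w + rng.normal(scale=0.01)))
print(np.round(model.weights, 2), np.round(true_w, 2))   # the two should roughly agree
```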
222. Playing with social and emotional game companions
- Author
Chowanda, Andry, Flintham, Martin, Blanchfield, Peter, and Valstar, Michel
- Abstract
This paper presents the findings of an empirical study that explores player game experience by implementing the ERiSA Framework in games. A study with an Action Role-Playing Game (RPG) was designed to evaluate player interactions with game companions, who were imbued with social and emotional skill by the ERiSA Framework. Players had to complete a quest in the Skyrim game, in which players had to use social and emotional skills to obtain a sword. The results clearly show that game companions who are capable of perceiving and exhibiting emotions are perceived to have personality and can forge relationships with the players, enhancing the player experience during the game.
223. Play SMILE Game with ERiSA: a user study on game companions
- Author
Chowanda, Andry, Blanchfield, Peter, Flintham, Martin D., and Valstar, Michel F.
- Abstract
This paper describes the evaluation of our fully integrated virtual game companions framework (ERiSA) [4]. We conducted three user studies with different scenarios using two versions of The Smile Game [4] in semi-public and public spaces. In our study, we show that the game companions' personality was successfully perceived by the participants while interacting and playing with the game companions. The game itself was the most popular conversation topic, with a total of 598 occurrences in our studies. Moreover, facial expressions were the most frequently performed type of attack in the game. Finally, from the large amount of video data collected, we aim to automatically learn the interaction rules and additional attack movements.
224. Objective methods for reliable detection of concealed depression
- Author
Solomon, Cynthia, Valstar, Michel F., Morriss, Richard K., and Crowe, John
- Abstract
Recent research has shown that it is possible to automatically detect clinical depression from audio-visual recordings. Before considering integration in a clinical pathway, a key question that must be asked is whether such systems can be easily fooled. This work explores the potential of acoustic features to detect clinical depression in adults both when acting normally and when asked to conceal their depression. Nine adults diagnosed with mild to moderate depression as per the Beck Depression Inventory (BDI-II) and Patient Health Questionnaire (PHQ-9) were asked a series of questions and to read an excerpt from a novel aloud under two different experimental conditions. In one, participants were asked to act naturally and in the other, to suppress anything that they felt would be indicative of their depression. Acoustic features were then extracted from this data and analysed using paired t-tests to determine any statistically significant differences between healthy and depressed participants. Most features that were found to be significantly different during normal behaviour remained so during concealed behaviour. In leave-one-subject-out automatic classification studies of the 9 depressed subjects and 8 matched healthy controls, an 88% classification accuracy and 89% sensitivity were achieved. Results remained relatively robust during concealed behaviour, with classifiers trained on only non-concealed data achieving 81% detection accuracy and 75% sensitivity when tested on concealed data. These results indicate there is good potential to build deception-proof automatic depression monitoring systems.
225. AV+EC 2015 – the first affect recognition challenge bridging across audio, video, and physiological data
- Author
Ringeval, Fabien, Schuller, Björn, Valstar, Michel, Jaiswal, Shashank, Marchi, Erik, Lalanne, Denis, Cowie, Roddy, and Pantic, Maja
- Abstract
We present the first Audio-Visual+Emotion recognition Challenge and workshop (AV+EC 2015) aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological emotion analysis. This is the 5th event in the AVEC series, but the very first Challenge that bridges across audio, video and physiological data. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio, video and physiological emotion recognition communities, to compare the relative merits of the three approaches to emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge, the dataset and the performance of the baseline system.
226. The automatic detection of chronic pain-related expression: requirements, challenges and a multimodal dataset
- Author
Aung, Min S.H., Kaltwang, Sebastian, Romera-Paredes, Bernardino, Martinez, Brais, Singh, Aneesha, Cella, Matteo, Valstar, Michel F., Meng, Hongying, Kemp, Andrew, Shafizadeh, Moshen, Elkins, Aaron, Kanakam, Natalie, Rothschild, Amshal de, Tyler, Nick, Watson, Paul J., Williams, Amanda C. de C., Pantic, Maja, and Bianchi-Berthouze, Nadia
- Abstract
Pain-related emotions are a major barrier to effective self-rehabilitation in chronic pain. Automated coaching systems capable of detecting these emotions are a potential solution. This paper lays the foundation for the development of such systems by making three contributions. First, through literature reviews, an overview of how chronic pain is expressed and the motivation for detecting it in physical rehabilitation is provided. Second, a fully labelled multimodal dataset containing high resolution multiple-view face videos, head mounted and room audio signals, full body 3-D motion capture and electromyographic signals from back muscles is supplied. Natural unconstrained pain-related facial expressions and body movement behaviours were elicited from people with chronic pain carrying out physical exercises. Both instructed and non-instructed exercises were considered to reflect different rehabilitation scenarios. Two sets of labels were assigned: level of pain from facial expressions annotated by eight raters and the occurrence of six pain-related body behaviours segmented by four experts. Third, through exploratory experiments grounded in the data, the factors and challenges in the automated recognition of such expressions and behaviour are described. The paper concludes by discussing potential avenues in the context of these findings, also highlighting differences between the two exercise scenarios addressed.
227. TRIC-track: tracking by regression with incrementally learned cascades
- Author
Wang, Xiaomeng, Valstar, Michel F., Martinez, Brais, Khan, Muhammad Haris, and Pridmore, Tony
- Abstract
This paper proposes a novel approach to part-based tracking by replacing local matching of an appearance model by direct prediction of the displacement between local image patches and part locations. We propose to use cascaded regression with incremental learning to track generic objects without any prior knowledge of an object’s structure or appearance. We exploit the spatial constraints between parts by implicitly learning the shape and deformation parameters of the object in an online fashion. We integrate a multiple temporal scale motion model to initialise our cascaded regression search close to the target and to allow it to cope with occlusions. Experimental results show that our tracker ranks first on the CVPR 2013 Benchmark.
228. Learning to transfer: transferring latent task structures and its application to person-specific facial action unit detection
- Author
Almaev, Timur, Martinez, Brais, and Valstar, Michel F.
- Abstract
In this article we explore the problem of constructing person-specific models for the detection of facial Action Units (AUs), addressing the problem from the point of view of Transfer Learning and Multi-Task Learning. Our starting point is the fact that some expressions, such as smiles, are very easily elicited, annotated, and automatically detected, while others are much harder to elicit and to annotate. We thus consider a novel problem: all AU models for the target subject are to be learnt using person-specific annotated data for a reference AU (AU12 in our case), and no data or little data regarding the target AU. In order to design such a model, we propose a novel Multi-Task Learning and the associated Transfer Learning framework, in which we consider both relations across subjects and AUs. That is to say, we consider a tensor structure among the tasks. Our approach hinges on learning the latent relations among tasks using one single reference AU, and then transferring these latent relations to other AUs. We show that we are able to effectively make use of the annotated data for AU12 when learning other person-specific AU models, even in the absence of data for the target task. Finally, we show the excellent performance of our method when small amounts of annotated data for the target tasks are made available.
229. Finding information about mental health in microblogging platforms: a case study of depression
- Author
Wilson, Max L., Ali, Susan Abubakir, and Valstar, Michel F.
- Abstract
Searching for online health information has been well studied in web search, but social media, such as public microblogging services, are well known for different types of tacit information: personal experience and shared information. Finding useful information in public microblogging platforms is an on-going hard problem and so, to begin to develop a better model of what health information can be found, Twitter posts using the word “depression” were examined as a case study of a search for a prevalent mental health issue. 13,279 public tweets were analysed using a mixed methods approach and compared to a general sample of tweets. First, a linguistic analysis suggested that tweets mentioning depression were typically anxious but not angry, and were less likely to be in the first person, indicating that most were not from individuals discussing their own depression. Second, to understand what types of tweets can be found, an inductive thematic analysis revealed three major themes: 1) disseminating information or links to information, 2) self-disclosing, and 3) the sharing of overall opinion; each had significantly different linguistic patterns. We conclude with a discussion of how different types of posts about mental health may be retrieved from public social media like Twitter.
230. Objective methods for reliable detection of concealed depression
- Author
Solomon, Cynthia, Valstar, Michel F., Morriss, Richard K., and Crowe, John
- Abstract
Recent research has shown that it is possible to automatically detect clinical depression from audio-visual recordings. Before considering integration in a clinical pathway, a key question that must be asked is whether such systems can be easily fooled. This work explores the potential of acoustic features to detect clinical depression in adults both when acting normally and when asked to conceal their depression. Nine adults diagnosed with mild to moderate depression as per the Beck Depression Inventory (BDI-II) and Patient Health Questionnaire (PHQ-9) were asked a series of questions and to read an excerpt from a novel aloud under two different experimental conditions. In one, participants were asked to act naturally and in the other, to suppress anything that they felt would be indicative of their depression. Acoustic features were then extracted from this data and analyzed using paired t-tests to determine any statistically significant differences between healthy and depressed participants. Most features that were found to be significantly different during normal behavior remained so during concealed behavior. In leave-one-subject-out automatic classification studies of the 9 depressed subjects and 8 matched healthy controls, an 88% classification accuracy and 89% sensitivity were achieved. Results remained relatively robust during concealed behavior, with classifiers trained on only non-concealed data achieving 81% detection accuracy and 75% sensitivity when tested on concealed data. These results indicate there is good potential to build deception-proof automatic depression monitoring systems.
231. ALTCAI: Enabling the Use of Embodied Conversational Agents to Deliver Informal Health Advice during Wizard of Oz Studies
- Author
Galvez Trigo, Maria J., Porcheron, Martin, Egede, Joy, Fischer, Joel E., Hazzard, Adrian, Greenhalgh, Chris, Bodiaj, Edgar, and Valstar, Michel
- Abstract
We present ALTCAI, a Wizard of Oz Embodied Conversational Agent that has been developed to explore the use of interactive agents as an effective and engaging tool for delivering health and well-being advice to expectant and nursing mothers in Nigeria. This paper briefly describes the motivation and context for its creation and ALTCAI’s various components, and presents a discussion on its adaptability and potential uses in other contexts, as well as on potential future work on extending its functionality.
232. Design and evaluation of virtual human mediated tasks for assessment of depression and anxiety
- Author
Egede, Joy O., Price, Dominic, Krishnan, Deepa B., Jaiswal, Shashank, Elliott, Natasha, Morriss, Richard, Galvez Trigo, Maria J., Nixon, Neil, Liddle, Peter, Greenhalgh, Christopher, and Valstar, Michel
- Abstract
Virtual human technologies are now being widely explored as therapy tools for mental health disorders including depression and anxiety. These technologies leverage the ability of the virtual agents to engage in naturalistic social interactions with a user to elicit behavioural expressions which are indicative of depression and anxiety. Research efforts have focused on optimising the human-like expressive capabilities of the virtual human, but less attention has been given to investigating the effect of virtual human mediation on the expressivity of the user. In addition, it is still not clear what an optimal task is or what task characteristics are likely to sustain long-term user engagement. To this end, this paper describes the design and evaluation of virtual human-mediated tasks in a user study of 56 participants. Half the participants complete tasks guided by a virtual human, while the other half are guided by text on screen. Self-reported PHQ-9 scores, biosignals and participants’ ratings of tasks are collected. Findings show that virtual human mediation influences behavioural expressiveness and that this observation differs for different depression severity levels. It further shows that virtual human mediation improves users’ disposition towards tasks.
233. Designing an adaptive embodied conversational agent for health literacy
- Author
Egede, Joy, Trigo, Maria J. Galvez, Hazzard, Adrian, Porcheron, Martin, Bodiaj, Edgar, Fischer, Joel E., Greenhalgh, Chris, and Valstar, Michel
- Abstract
Access to healthcare advice is crucial to promote healthy societies. Many factors shape how access might be constrained, such as economic status, education or, as the COVID-19 pandemic has shown, remote consultations with health practitioners. Our work focuses on providing pre/post-natal advice to maternal women. A salient factor of our work concerns the design and deployment of embodied conversational agents (ECAs) which can sense the (health) literacy of users and adapt to scaffold user engagement in this setting. We present an account of a Wizard of Oz user study of 'ALTCAI', an ECA with three modes of interaction (i.e., adaptive speech and text, adaptive ECA, and non-adaptive ECA). We compare reported engagement with these modes from 44 maternal women who have differing levels of literacy. The study shows that a combination of embodiment and adaptivity scaffolds reported engagement, but matters of health-literacy and language introduce nuanced considerations for the design of ECAs.
234. Digital innovations in L2 motivation: harnessing the power of the Ideal L2 Self
- Author
Adolphs, Svenja, Clark, Leigh, Dörnyei, Zoltán, Glover, Tony, Henry, Alastair, Muir, Christine, Sánchez-Lozano, Enrique, and Valstar, Michel
- Abstract
Sustained motivation is crucial to learning a second language (L2), and one way to support this can be through the mental visualisation of ideal L2 selves (Dörnyei & Kubanyiova, 2014). This paper reports on an exploratory study which investigated the possibility of using technology to create representations of language learners’ ideal L2 selves digitally. Nine Chinese learners of L2 English were invited to three semi-structured interviews to discuss their ideal L2 selves and their future language goals, as well as their opinions on several different technological approaches to representing their ideal L2 selves. Three approaches were shown to participants: (a) 2D and 3D animations, (b) Facial Overlay, and (c) Facial Mask. Within these, several iterations were also included (e.g. with/without background or context). Results indicate that 3D animation currently offers the best approach in terms of realism and animation of facial features, and improvements to Facial Overlay could lead to beneficial results in the future. Approaches using the 2D animations and the Facial Mask approach appeared to have little future potential. The descriptive details of learners’ ideal L2 selves also provide preliminary directions for the development of content that might be included in future technology-based interventions.
235. Predicting folds in poker using action unit detectors and decision trees
- Author
Vinkemeier, Doratha, Valstar, Michel F., and Gratch, Jonathan
- Abstract
Predicting how a person will respond can be very useful, for instance when designing a strategy for negotiations. We investigate whether it is possible for machine learning and computer vision techniques to recognize a person’s intentions and predict their actions based on their visually expressive behaviour, where in this paper we focus on the face. We have chosen as our setting pairs of humans playing a simplified version of poker, where the players are behaving naturally and spontaneously, albeit mediated through a computer connection. In particular, we ask if we can automatically predict whether a player is going to fold or not. We also try to answer the question of at what time point the signal for predicting if a player will fold is strongest. We use state-of-the-art FACS Action Unit detectors to automatically annotate the players’ facial expressions, which have been recorded on video. In addition, we use timestamps of when the player received their card and when they placed their bets, as well as the amounts they bet. Thus, the system is fully automated. We are able to predict whether a person will fold or not significantly better than chance based solely on their expressive behaviour starting three seconds before they fold.
236. Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features
- Author
Song, Siyang, Shen, Linlin, and Valstar, Michel F.
- Abstract
Depression is a serious mental disorder that affects millions of people all over the world. Traditional clinical diagnosis methods are subjective, complicated and need extensive participation of experts. Audio-visual automatic depression analysis systems predominantly base their predictions on very brief sequential segments, sometimes as little as one frame. Such data contains much redundant information, causes a high computational load, and negatively affects the detection accuracy. Final decision making at the sequence level is then based on the fusion of frame or segment level predictions. However, this approach loses longer term behavioural correlations, as the behaviours themselves are abstracted away by the frame-level predictions. We propose to use automatically detected human behaviour primitives, such as gaze directions and facial action units (AUs), as low-dimensional multi-channel time series data, which can then be used to create two sequence descriptors. The first calculates the sequence-level statistics of the behaviour primitives and the second casts the problem as a Convolutional Neural Network problem operating on a spectral representation of the multichannel behaviour signals. The results of depression detection (binary classification) and severity estimation (regression) experiments conducted on the AVEC 2016 DAIC-WOZ database show that both methods achieved significant improvement compared to the previous state of the art in terms of depression severity estimation.
237. AVEC 2016 – Depression, mood, and emotion recognition workshop and challenge
- Author
Valstar, Michel F., Gratch, Jonathan, Schuller, Björn, Ringeval, Fabien, Lalanne, Denis, Torres, Mercedes Torres, Scherer, Stefan, Stratou, Giota, Cowie, Roddy, and Pantic, Maja
- Abstract
The Audio/Visual Emotion Challenge and Workshop (AVEC 2016) "Depression, Mood and Emotion" will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multi-modal information processing and to bring together the depression and emotion recognition communities, as well as the audio, video and physiological processing communities, to compare the relative merits of the various approaches to depression and emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
238. ChaLearn Looking at People and Faces of the World: Face Analysis Workshop and Challenge 2016
- Author
Escalera, Sergio, Torres, Mercedes Torres, Martinez, Brais, Baro, Xavier, Escalante, Hugo Jair, Guyon, Isabelle, Tzimiropoulos, Georgios, Corneanu, Ciprian, Oliu, Marc, Bagheri, Mohammad Ali, and Valstar, Michel
- Abstract
We present the 2016 ChaLearn Looking at People and Faces of the World Challenge and Workshop, which ran three competitions on the common theme of face analysis from still images. The first one, Looking at People, addressed age estimation, while the second and third competitions, Faces of the World, addressed accessory classification and smile and gender classification, respectively. We present two crowd-sourcing methodologies used to collect manual annotations. A custom-built application was used to collect and label data about the apparent age of people (as opposed to the real age). For the Faces of the World data, the citizen-science Zooniverse platform was used. This paper summarizes the three challenges and the data used, as well as the results achieved by the participants of the competitions. Details of the ChaLearn LAP FotW competitions can be found at http://gesture.chalearn.org.
239. The NoXi database: multimodal recordings of mediated novice-expert interactions
- Author
Cafaro, Angelo, Wagner, Johannes, Baur, Tobias, Dermouche, Soumia, Torres, Mercedes Torres, Pelachaud, Catherine, André, Elisabeth, and Valstar, Michel F.
- Abstract
We present a novel multi-lingual database of natural dyadic novice-expert interactions, named NoXi, featuring screen-mediated dyadic human interactions in the context of information exchange and retrieval. NoXi is designed to provide spontaneous interactions with emphasis on adaptive behaviors and unexpected situations (e.g. conversational interruptions). A rich set of audio-visual data, as well as continuous and discrete annotations are publicly available through a web interface. Descriptors include low level social signals (e.g. gestures, smiles), functional descriptors (e.g. turn-taking, dialogue acts) and interaction descriptors (e.g. engagement, interest, and fluidity).
240. AVEC 2017--Real-life depression, and affect recognition workshop and challenge
- Author
-
Ringeval, Fabien, Schuller, Björn, Valstar, Michel, Gratch, Jonathan, Cowie, Roddy, Scherer, Stefan, Mozgai, Sharon, Cummins, Nicholas, Schmitt, Maximilian, and Pantic, Maja
- Abstract
The Audio/Visual Emotion Challenge and Workshop (AVEC 2017) “Real-life depression, and affect” will be the seventh competition event aimed at comparison of multimedia processing and machine learning methods for automatic audiovisual depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the depression and emotion recognition communities, as well as the audiovisual processing communities, to compare the relative merits of the various approaches to depression and emotion recognition from real-life data. This paper presents the novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline system on the two proposed tasks: dimensional emotion recognition (time and value-continuous), and dimensional depression estimation (value-continuous).
241. A functional regression approach to facial landmark tracking
- Author
-
Sánchez-Lozano, Enrique, Tzimiropoulos, Georgios, Martinez, Brais, De la Torre, Fernando, and Valstar, Michel
- Abstract
Linear regression is a fundamental building block in many face detection and tracking algorithms, typically used to predict shape displacements from image features through a linear mapping. This paper presents a Functional Regression solution to the least squares problem, which we coin Continuous Regression, resulting in the first real-time incremental face tracker. Contrary to prior work in Functional Regression, in which B-splines or Fourier series were used, we propose to approximate the input space by its first-order Taylor expansion, yielding a closed-form solution for the continuous domain of displacements. We then extend the continuous least squares problem to correlated variables, and demonstrate the generalisation of our approach. We incorporate Continuous Regression into the cascaded regression framework, and show its computational benefits for both training and testing. We then present a fast approach for incremental learning within Cascaded Continuous Regression, coined iCCR, and show that its complexity allows real-time face tracking, being 20 times faster than the state of the art. To the best of our knowledge, this is the first incremental face tracker that is shown to operate in real-time. We show that iCCR achieves state-of-the-art performance on the 300-VW dataset, the most recent, large-scale benchmark for face tracking.
- Full Text
- View/download PDF
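The continuous regression idea in entry 241 admits a compact numerical illustration. The Python/NumPy sketch below is not the authors' implementation: it assumes that features at a perturbed shape are given by their first-order Taylor expansion f0 + J·delta and derives, in closed form, the regressor that minimises the expected displacement error; f0, J, Sigma and all dimensions are toy placeholders.

import numpy as np

def continuous_regressor(f0, J, Sigma):
    # E[phi phi^T] with phi(delta) = f0 + J delta and delta ~ N(0, Sigma)
    M = np.outer(f0, f0) + J @ Sigma @ J.T
    # E[delta phi^T] = Sigma J^T
    C = Sigma @ J.T
    # Regressor minimising E || delta - R phi(delta) ||^2 over the continuous domain
    return C @ np.linalg.pinv(M)

rng = np.random.default_rng(0)
d_shape, d_feat = 4, 10
f0 = rng.normal(size=d_feat)                    # features at the true shape
J = rng.normal(size=(d_feat, d_shape))          # feature Jacobian w.r.t. shape perturbations
Sigma = 0.1 * np.eye(d_shape)                   # covariance of training perturbations

R_closed = continuous_regressor(f0, J, Sigma)

# Sampled (Monte Carlo) counterpart of the same least-squares problem
deltas = rng.multivariate_normal(np.zeros(d_shape), Sigma, size=20000)
feats = f0 + deltas @ J.T
R_sampled = deltas.T @ feats @ np.linalg.pinv(feats.T @ feats)
print(np.abs(R_closed - R_sampled).max())       # small: both solve the same problem

The Monte-Carlo comparison at the end only illustrates that sampling perturbations converges to the regressor the closed form gives directly, which is the computational advantage the abstract alludes to.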
242. Ask Alice: an artificial retrieval of information agent
- Author
-
Valstar, Michel F., Baur, Tobias, Cafaro, Angelo, Ghitulescu, Alexandru, Potard, Blaise, Wagner, Johannes, André, Elisabeth, Durieu, Laurent, Aylett, Matthew, Dermouche, Soumia, Pelachaud, Catherine, Coutinho, Eduardo, Schuller, Björn, Zhang, Yue, Heylen, Dirk, Theune, Mariët, and Waterschoot, Jelte van
- Abstract
We present a demonstration of the ARIA framework, a modular approach for rapid development of virtual humans for information retrieval that have linguistic, emotional, and social skills and a strong personality. We demonstrate the framework’s capabilities in a scenario where ‘Alice in Wonderland’, a popular English literature book, is embodied by a virtual human representing Alice. The user can engage in an information exchange dialogue, where Alice acts as the expert on the book, and the user as an interested novice. Besides speech recognition, sophisticated audio-visual behaviour analysis is used to inform the core agent dialogue module about the user’s state and intentions, so that it can go beyond simple chat-bot dialogue. The behaviour generation module features a unique new capability of being able to deal gracefully with interruptions of the agent.
243. Fusing deep learned and hand-crafted features of appearance, shape, and dynamics for automatic pain estimation
- Author
-
Egede, Joy Onyekachukwu, Valstar, Michel F., and Martinez, Brais
- Abstract
Automatic continuous time, continuous value assessment of a patient's pain from face video is highly sought after by the medical profession. Despite the recent advances in deep learning that attain impressive results in many domains, pain estimation risks not being able to benefit from this due to the difficulty in obtaining data sets of considerable size. In this work we propose a combination of hand-crafted and deep-learned features that makes the most of deep learning techniques in small sample settings. Encoding shape, appearance, and dynamics, our method significantly outperforms the current state of the art, attaining an RMSE of less than 1 point on a 16-level pain scale, whilst simultaneously scoring a 67.3% Pearson correlation coefficient between our predicted pain level time series and the ground truth.
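As a rough illustration of the feature-level fusion described in entry 243, the sketch below concatenates pre-extracted hand-crafted and deep descriptors per frame, fits a ridge regressor to the pain level, and reports RMSE and Pearson correlation. The descriptor arrays, the regulariser lam, and the choice of a plain linear model are assumptions for illustration, not the paper's actual pipeline.

import numpy as np

def fit_fused_regressor(hand_crafted, deep, pain, lam=1.0):
    # Feature-level fusion: concatenate per-frame descriptors and add a bias column
    X = np.hstack([hand_crafted, deep, np.ones((len(pain), 1))])
    # Ridge regression in closed form
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ pain)

def evaluate(W, hand_crafted, deep, pain):
    X = np.hstack([hand_crafted, deep, np.ones((len(pain), 1))])
    pred = X @ W
    rmse = np.sqrt(np.mean((pred - pain) ** 2))   # error on the 16-level pain scale
    r = np.corrcoef(pred, pain)[0, 1]             # Pearson correlation with ground truth
    return rmse, r

# Toy usage with random stand-ins for the real descriptors
rng = np.random.default_rng(0)
hc, dp = rng.normal(size=(200, 20)), rng.normal(size=(200, 64))
pain = rng.integers(0, 16, size=200).astype(float)
W = fit_fused_regressor(hc, dp, pain)
print(evaluate(W, hc, dp, pain))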
244. A CNN cascade for landmark guided semantic part segmentation
- Author
-
Jackson, Aaron S., Valstar, Michel, and Tzimiropoulos, Georgios
- Abstract
This paper proposes a CNN cascade for semantic part segmentation guided by pose-specific information encoded in terms of a set of landmarks (or keypoints). There is a large amount of prior work on each of these tasks separately, yet, to the best of our knowledge, this is the first time in the literature that the interplay between pose estimation and semantic part segmentation is investigated. To address this limitation of prior work, in this paper, we propose a CNN cascade of tasks that firstly performs landmark localisation and then uses this information as input for guiding semantic part segmentation. We applied our architecture to the problem of facial part segmentation and report large performance improvement over the standard unguided network on the most challenging face datasets. Testing code and models will be published online at http://cs.nott.ac.uk/~psxasj/.
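Entry 244's landmark-guided cascade can be sketched, under strong simplifying assumptions, as two small fully-convolutional stages in PyTorch: the first regresses one heatmap per landmark, the second segments parts from the image stacked with those heatmaps. Network depths, channel counts, and the numbers of landmarks and parts below are placeholders, not the architecture used in the paper.

import torch
import torch.nn as nn

class LandmarkGuidedSegmentation(nn.Module):
    """Stage 1 regresses landmark heatmaps; stage 2 segments facial parts
    from the image stacked with those heatmaps (pose-guided input)."""
    def __init__(self, n_landmarks=68, n_parts=7):
        super().__init__()
        self.landmark_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_landmarks, 3, padding=1),   # one heatmap per landmark
        )
        self.part_net = nn.Sequential(
            nn.Conv2d(3 + n_landmarks, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_parts, 3, padding=1),        # per-pixel part logits
        )

    def forward(self, image):
        heatmaps = self.landmark_net(image)
        guided = torch.cat([image, heatmaps], dim=1)     # landmark evidence guides segmentation
        return self.part_net(guided), heatmaps

# logits, heatmaps = LandmarkGuidedSegmentation()(torch.randn(1, 3, 128, 128))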
245. A dynamic appearance descriptor approach to facial actions temporal modeling
- Author
-
Jiang, Bihan, Valstar, Michel, Martinez, Brais, and Pantic, Maja
- Abstract
Both the configuration and the dynamics of facial expressions are crucial for the interpretation of human facial behavior. Yet to date, the vast majority of reported efforts in the field either do not take the dynamics of facial expressions into account, or focus only on prototypic facial expressions of six basic emotions. Facial dynamics can be explicitly analyzed by detecting the constituent temporal segments of Facial Action Coding System (FACS) Action Units (AUs): onset, apex, and offset. In this paper, we present a novel approach to explicit analysis of temporal dynamics of facial actions using the dynamic appearance descriptor Local Phase Quantization from Three Orthogonal Planes (LPQ-TOP). Temporal segments are detected by combining a discriminative classifier that detects the temporal segments on a frame-by-frame basis with Markov Models that enforce temporal consistency over the whole episode. The system is evaluated in detail over the MMI facial expression database, the UNBC-McMaster pain database, the SAL database, and the GEMEP-FERA dataset in database-dependent experiments, and in cross-database experiments using the Cohn-Kanade and SEMAINE databases. The comparison with other state-of-the-art methods shows that the proposed LPQ-TOP method outperforms the other approaches for the problem of AU temporal segment detection, and that overall AU activation detection benefits from dynamic appearance information.
- Full Text
- View/download PDF
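The second half of the pipeline in entry 245, enforcing temporal consistency of per-frame segment predictions with a Markov model, can be illustrated with a standard Viterbi decode. The transition matrix below is hand-set to encode the neutral -> onset -> apex -> offset cycle and is purely illustrative; the per-frame scores would come from a classifier over dynamic appearance features such as LPQ-TOP.

import numpy as np

STATES = ["neutral", "onset", "apex", "offset"]

# Illustrative (not learned) transition probabilities enforcing the AU segment order
TRANS = np.array([
    [0.90, 0.10, 0.00, 0.00],   # neutral
    [0.00, 0.80, 0.20, 0.00],   # onset
    [0.00, 0.00, 0.85, 0.15],   # apex
    [0.20, 0.00, 0.00, 0.80],   # offset
])

def viterbi(frame_log_likelihoods, trans=TRANS):
    """Most likely segment sequence given per-frame classifier scores
    (log-likelihoods, shape (n_frames, 4)) and the Markov transition model."""
    n_frames, n_states = frame_log_likelihoods.shape
    log_trans = np.log(trans + 1e-12)
    score = np.empty((n_frames, n_states))
    back = np.zeros((n_frames, n_states), dtype=int)
    score[0] = frame_log_likelihoods[0]          # uniform initial-state prior assumed
    for t in range(1, n_frames):
        for s in range(n_states):
            cand = score[t - 1] + log_trans[:, s]
            back[t, s] = int(np.argmax(cand))
            score[t, s] = cand[back[t, s]] + frame_log_likelihoods[t, s]
    path = [int(np.argmax(score[-1]))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]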
246. Topic switch models for dialogue management in virtual humans
- Author
-
Zhu, Wenjue, Chowanda, Andry, and Valstar, Michel F.
- Abstract
This paper presents a novel data-driven Topic Switch Model based on a cognitive representation of a limited set of topics that are currently in-focus, which determines what utterances are chosen next. The transition model was statistically learned from a large set of transcribed dyadic interactions. Results show that using our proposed model results in interactions that on average last 2.17 times longer compared to the same system without our model.
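A minimal, hypothetical reading of the data-driven transition model in entry 246 is a first-order Markov chain over topic labels estimated from transcripts. The sketch below learns such transition probabilities and samples the next topic; it does not reproduce the paper's cognitive notion of a limited set of in-focus topics, and the example topic labels are invented.

import numpy as np
from collections import defaultdict

def learn_topic_transitions(dialogues):
    """Estimate P(next topic | current topic) from topic-labelled transcripts,
    e.g. dialogues = [["greeting", "film", "film", "music"], ...]."""
    counts = defaultdict(lambda: defaultdict(int))
    for topics in dialogues:
        for cur, nxt in zip(topics, topics[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

def next_topic(model, current, rng=None):
    """Sample the topic of the next utterance; stay on topic if unseen."""
    rng = rng or np.random.default_rng()
    options = model.get(current, {current: 1.0})
    topics, probs = zip(*options.items())
    return rng.choice(list(topics), p=list(probs))

# model = learn_topic_transitions([["greeting", "film", "film", "music"],
#                                  ["greeting", "music", "film"]])
# print(next_topic(model, "film"))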
247. Cascaded continuous regression for real-time incremental face tracking
- Author
-
Sánchez Lozano, Enrique, Martinez, Brais, Tzimiropoulos, Georgios, and Valstar, Michel F.
- Abstract
This paper introduces a novel real-time algorithm for facial landmark tracking. Compared to detection, tracking has both additional challenges and opportunities. Arguably the most important aspect in this domain is updating a tracker's models as tracking progresses, also known as incremental (face) tracking. While this should result in more accurate localisation, how to do this online and in real time without causing a tracker to drift is still an important open research question. We address this question in the cascaded regression framework, the state-of-the-art approach for facial landmark localisation. Because incremental learning for cascaded regression is costly, we propose a much more efficient yet equally accurate alternative using continuous regression. More specifically, we first propose cascaded continuous regression (CCR) and show its accuracy is equivalent to the Supervised Descent Method. We then derive the incremental learning updates for CCR (iCCR) and show that it is an order of magnitude faster than standard incremental learning for cascaded regression, bringing the time required for the update from seconds down to a fraction of a second, thus enabling real-time tracking. Finally, we evaluate iCCR and show the importance of incremental learning in achieving state-of-the-art performance. Code for our iCCR is available from http://www.cs.nott.ac.uk/~psxes1.
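The incremental-learning aspect of entry 247 can be pictured with a linear regressor maintained from sufficient statistics, so each newly tracked frame updates the model without refitting on all past data. This is a generic recursive-least-squares sketch, not the paper's iCCR update; the dimensions and the ridge prior lam are placeholders.

import numpy as np

class IncrementalLinearRegressor:
    """Linear model kept up to date from sufficient statistics, so new
    tracking samples are folded in without retraining from scratch."""
    def __init__(self, d_feat, d_target, lam=1.0):
        self.A = lam * np.eye(d_feat)            # accumulates X^T X (ridge prior as initialisation)
        self.B = np.zeros((d_feat, d_target))    # accumulates X^T Y

    def update(self, x, y):
        """Add one (feature, target) pair, e.g. features and the landmark
        displacement observed on the latest tracked frame."""
        self.A += np.outer(x, x)
        self.B += np.outer(x, y)

    def weights(self):
        return np.linalg.solve(self.A, self.B)

# reg = IncrementalLinearRegressor(d_feat=128, d_target=10)
# reg.update(np.random.randn(128), np.random.randn(10)); W = reg.weights()

For strictly constant-time weight updates one would maintain the inverse of A with the Sherman-Morrison identity rather than re-solving the linear system at every frame.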
248. Playing with social and emotional game companions
- Author
-
Chowanda, Andry, Flintham, Martin, Blanchfield, Peter, and Valstar, Michel
- Abstract
This paper presents the findings of an empirical study that explores player game experience by implementing the ERiSA Framework in games. A study with an Action Role-Playing Game (RPG) was designed to evaluate player interactions with game companions, who were imbued with social and emotional skills by the ERiSA Framework. Players had to complete a quest in the Skyrim game, in which they had to use social and emotional skills to obtain a sword. The results clearly show that game companions who are capable of perceiving and exhibiting emotions are perceived to have personality and can forge relationships with the players, enhancing the player experience during the game.
249. Play SMILE Game with ERiSA: a user study on game companions
- Author
-
Chowanda, Andry, Blanchfield, Peter, Flintham, Martin D., and Valstar, Michel F.
- Abstract
This paper describes the evaluation of our fully integrated virtual game companions framework (ERiSA) [4]. We conducted three user studies with different scenarios using two versions of The Smile Game [4] in semi-public and public spaces. In our studies, the game companions' personality was successfully perceived by the participants while they interacted and played with the companions. The game itself was the most popular conversation topic, with a total of 598 occurrences across our studies. Moreover, facial expressions were the most frequently performed type of attack in the game. Finally, from the large amount of video data collected, we aim to automatically learn the interaction rules and additional attack movements.
- Full Text
- View/download PDF
250. AV+EC 2015--the first affect recognition challenge bridging across audio, video, and physiological data
- Author
-
Ringeval, Fabien, Schuller, Björn, Valstar, Michel, Jaiswal, Shashank, Marchi, Erik, Lalanne, Denis, Cowie, Roddy, and Pantic, Maja
- Abstract
We present the first Audio-Visual+ Emotion recognition Challenge and workshop (AV+EC 2015) aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological emotion analysis. This is the 5th event in the AVEC series, but the very first Challenge that bridges across audio, video and physiological data. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio, video and physiological emotion recognition communities, to compare the relative merits of the three approaches to emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge, the dataset and the performance of the baseline system.
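One simple way to probe the "fusion of approaches" question raised in entry 250 is weighted late fusion of per-modality predictions, with weights chosen on a development partition. The sketch below is only such a baseline: the modality names, the grid step, and the use of Pearson correlation as the selection criterion are assumptions for illustration, not the challenge's official protocol or metric.

import numpy as np
from itertools import product

def fuse(preds, weights):
    """Weighted late fusion of per-modality predictions of one affect dimension.
    preds is a dict such as {"audio": arr, "video": arr, "physio": arr}."""
    w = np.array([weights[m] for m in preds])
    stacked = np.stack([preds[m] for m in preds])
    return (w[:, None] * stacked).sum(axis=0) / w.sum()

def search_fusion_weights(preds, gold, step=0.25):
    """Grid-search fusion weights on a development partition, keeping the
    combination with the highest Pearson correlation to the gold standard."""
    best, best_r = None, -np.inf
    grid = np.arange(0.0, 1.0 + step, step)
    for combo in product(grid, repeat=len(preds)):
        if sum(combo) == 0:
            continue
        weights = dict(zip(preds, combo))
        r = np.corrcoef(fuse(preds, weights), gold)[0, 1]
        if r > best_r:
            best, best_r = weights, r
    return best, best_r

# Toy usage with random stand-ins for modality-specific arousal predictions
rng = np.random.default_rng(0)
gold = rng.normal(size=500)
preds = {m: gold + rng.normal(scale=s, size=500)
         for m, s in [("audio", 0.3), ("video", 0.5), ("physio", 0.8)]}
print(search_fusion_weights(preds, gold))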